Local Network Analysis

The curious case of communities in collections of connections

David Gleich

Purdue University

1. What is a community?

2. How do we detect a community?

use a score + optimization

3. Global community detection has a problem

4. Local community detection side-steps it

5. There are good algorithms for local community detection

neighborhoods, PageRank / spectral, flow-based methods

6. We can combine these ideas.

7. There aren’t just “one” set of communities in most empirical networks.

8. You can use a model to change the data you were given!

What is a community?

An algorithm.

A score.

An algorithm for optimizing a score.

“Physics”

(behavior of some process)

“Statistics and Probability”

(which often give either an algorithm or a score)

How do we detect a community?

An algorithm.

A score.

An algorithm for optimizing a score.

“Physics”

(behavior of some process)

“Statistics and Probability”

(which often give either an algorithm or a score)

How do we detect a community?

An algorithm.

A score.

An algorithm for optimizing a score.

“Physics”

(behavior of some process)

“Statistics and Probability”

(which often give either an algorithm or a score)

How do we detect a community?

min score(communities) s.t. we have a partition into communities

or

max score(communities) s.t. we have a partition into communities

Given any score function, there is a simple objective function

This setup has a problem and allows action at a distance

Local clustering is a way around this.

Global clustering

find all the communities

Local clustering

find the community for a single node

Local clustering is a way around this.

volume(S) is the sum of degrees for all nodes in S

volume*(S) is the sum of degrees for all nodes in S

unless S is more than half the total edges, when it flips

Depends on the entire graph!

Only depends on S

We will focus on conductance clustering

This can be generalized to weighted cuts and positive “per-node” volume weights.

(same caveat on the sum needs to be less than half the total)

There are good algorithms for local clustering: neighborhoods, ACL, Flow

Gleich & Seshadhri 2011

If a graph has a power-law degree distribution and high clustering coefficients, then there are vertex neighborhoods that are non-trivial conductance sets.

- Algorithm: Check conductance of all vertex neighborhoods (purely local!)

Andersen Chung & Lang 2006 (and many followups)

For a graph and a node , then we can find a set with nearby with conductance related to the best set containing .

Lang & Rao 2004 (and many followups)

For a graph and a set less than half volume, we can find the minimal conductance set contained within .

There are good algorithms for local conductance: neighborhoods, ACL, Flow

… paraphrased theorems …

There are good algorithms for local conductance: neighborhoods, ACL, Flow

Algorithm. Find all 1-hop neighborhoods. Compute their conductance.

This can be done with something like a triangle counting algorithm.

Find locally minimally groups,

conductance(neighborhood()) conductance(neighborhood()) where

The Andersen-Chung-Lang algorithm approximates a PageRank vector with a nice sparse algorithm.

There are good algorithms for local conductance: neighborhoods, ACL, Flow

the MQI method gives an

exact local clustering algorithm

Given and with volume() volume()/2, then find a set with

minimum condutance() s.t.

the MQI method gives an

exact local clustering algorithm

Given and with volume() volume()/2, then find a set with

minimum condutance() s.t.

if and if conductance() then

cut() volume() or

cut() - volume() 0

0 cut() - volume() + volume() volume()

0 cut() + volume() volume()

if we find a set where cut() + volume() volume()

then the minimum is less than

the MQI method gives an

exact local clustering algorithm

How MQI works.

Convergence

Theorem A. You always improve by at least one edge. So O(volume()*maxflow) time in the worst cast.

Theorem B. You can do binary search on in O(log(volume()) time.

… BUT …

We never see the worst case for Theorem A. Solving this is still an open problem.

the MQI method gives an

exact local clustering algorithm

FlowImprove. Andersen & Lang 2008 / LocalFlowImprove. Orecchia & Zhu 2014

minimize cut() / rvol(; , ) s.t rvol > 0

rvol() = volume() - volume()

If volume(R), then equivalent to MQI

If volume() / volume() then it's called FlowImprove

Theorem (Fountalakis, Meng, Gleich, Mahoney, 2023)

Fix . conductance(MQI()) conductance(LFI()) conductance(FI())

MQI clusters cannot grow. But LocalFlowImprove Clusters Can!

PageRank

Flow

there is a neat relationship between these flow problems and ACL-like PageRank problems by looking at flow as 1-norm and spectral / PageRank as 2-norm

minimize cut() - volume() s.t. …

minimize s.t …

minimize s.t. ...

Flow vs. spectral is just

1-norm vs. 2-norm

We can combine these ideas.

Let ACL grow, use MQI to refine.

Road networks make great examples!

The south is the reference set

Road networks make great examples!

The south is the reference set

LocalFlowImprove

finds California

Road networks make great examples!

The south is the reference set

LocalFlowImprove

finds California

Road networks make great examples!

The south is the reference set

FlowImprove

finds the northeast

… back to the entire graph …

Network community profiles show the results of many local clusters

Network Community Profile plots show conductance and size tradeoffs for many different communities.

Leskovec, Lang, Dasgupta, Mahoney, 2007-2009

We can combine these ideas.

Let ACL grow, use MQI to refine.

Network Community Profile plots show conductance and size tradeoffs for many different communities.

Leskovec, Lang, Dasgupta, Mahoney, 2007-2009

We initially get better and better clusters as they get bigger

We can combine these ideas.

Let ACL grow, use MQI to refine.

Network Community Profile plots show conductance and size tradeoffs for many different communities.

We initially get better and better clusters as they get bigger

Clusters get worse as they get bigger

Leskovec, Lang, Dasgupta, Mahoney, 2007-2009

We can combine these ideas.

Let ACL grow, use MQI to refine.

Network Community Profile plots show conductance and size tradeoffs for many different communities.

We initially get better and better clusters as they get bigger

Clusters get worse as they get bigger

At some point the volume denominator kicks in and they get slightly better

Leskovec, Lang, Dasgupta, Mahoney, 2007-2009

The finding and structure are reliable

Cond.

Size

Collaboration Network

Email Network

Facebook Interactions

~50 or 80 networks in the original paper

broadly holds for different clustering metrics too

something slightly different for very dense (avg deg >= 150) empirical networks

discussions of origins by Schoenebeck (2013) and Rohe (2018)

Networks in physics / CSE often have very good large conductance clusters

US Power Grid

Many good conductance clusters that are large

… one last point …

your data are not your model!

Remember to think about your problem

Benson, Gleich, Leskovec 2016.

Use Motifs to handle more general graph data.

your data are not your model!

Remember to think about your problem

Epasto, Lattanzi, & Paudice 2017. Partition neighbors of egonet. and build a new graph.

1. If there are really obvious communities, it tends not to matter what you do.

2. If there aren’t, it greatly matters!

3. Generally, there aren’t great communities.

Better than random? Yes. But where the details don’t matter? No.

4. Remember that the data you are given is not the model you need to use.

Building new models from this data is often useful!

5. Look at distributions / overlaps / etc. – a node is in many “communities”

Some lessions