The curious case of communities in collections of connections 
David Gleich
Purdue University 
1. What is a community?

2. How do we detect a community?
use a score + optimization

3. Global community detection has a problem

4. Local community detection side-steps it

5. There are good algorithms for local community detection
neighborhoods, PageRank / spectral, flow-based methods

6. We can combine these ideas.

7. There aren’t just “one” set of communities in most empirical networks.

8. You can use a model to change the data you were given! 
What is a community? 
What is a community? 
An algorithm.

A score.

An algorithm for optimizing a score. 

“Physics”
(behavior of some process) 

“Statistics and Probability” 
(which often give either an algorithm or a score)


How do we detect a community? 
An algorithm.

A score.

An algorithm for optimizing a score. 

“Physics”
(behavior of some process) 

“Statistics and Probability” 
(which often give either an algorithm or a score)


How do we detect a community? 
An algorithm.

A score.

An algorithm for optimizing a score. 

“Physics”
(behavior of some process) 

“Statistics and Probability” 
(which often give either an algorithm or a score)


How do we detect a community? 
How do we detect a community? 
min score(communities) s.t. we have a partition into communities 

or 

max score(communities) s.t. we have a partition into communities 
Given any score function, there is a simple objective function
This setup has a problem and allows action at a distance 
Local clustering is a way around this.


Global clustering
find all the communities 


Local clustering 
find the community for a single node 
Local clustering is a way around this.
volume(S) is the sum of degrees for all nodes in S

volume*(S) is the sum of degrees for all nodes in 
unless S is more than half the total edges, when it flips
Depends on the entire graph!
Only depends on S

We will focus on conductance clustering 



This can be generalized to weighted cuts and positive “per-node” volume weights. 

 
(same caveat on the sum needs to be less than half the total) 
There are good algorithms for local clustering: neighborhoods, ACL, Flow 
Gleich & Seshadhri 2011 
If a graph has a power-law degree distribution and high clustering coefficients, then there are vertex neighborhoods that are non-trivial conductance sets
- Algorithm: Check conductance of all vertex neighborhoods (purely local!) 

Andersen Chung & Lang 2006 (and many followups)
For a graph and a node , then we can find a set with nearby with conductance related to the best set containing

Lang & Rao 2004 (and many followups)
For a graph and a set less than half volume, we can find the minimal conductance set contained within .
There are good algorithms for local conductance: neighborhoods, ACL, Flow 
… paraphrased theorems … 
There are good algorithms for local conductance: neighborhoods, ACL, Flow 
Algorithm. Find all 1-hop neighborhoods. Compute their conductance. 
This can be done with something like a triangle counting algorithm. 
Find locally minimally groups, 
conductance(neighborhood()) conductance(neighborhood()) where

The Andersen-Chung-Lang algorithm approximates a PageRank vector with a nice sparse algorithm. 
There are good algorithms for local conductance: neighborhoods, ACL, Flow 
the MQI method gives an 
exact local clustering algorithm
Given and with volume() volume()/2, then find a set with

minimum condutance() s.t.
the MQI method gives an 
exact local clustering algorithm
Given and with volume() volume()/2, then find a set with

minimum condutance() s.t.

if and if conductance() then 

cut() volume() or
cut() - volume()

0 cut() - volume() + volume() volume()

0 cut() + volume() volume()

if we find a set where cut() + volume() volume(
then the minimum is less than
the MQI method gives an 
exact local clustering algorithm
How MQI works. 

Convergence

Theorem A. You always improve by at least one edge. So O(volume()*maxflow) time in the worst cast.

Theorem B. You can do binary search on in O(log(volume()) time. 

… BUT … 

We never see the worst case for Theorem A. Solving this is still an open problem. 
the MQI method gives an 
exact local clustering algorithm
FlowImprove. Andersen & Lang 2008 / LocalFlowImprove. Orecchia & Zhu 2014

minimize cut() / rvol(; , ) s.t rvol > 0

rvol() = volume() - volume()

If volume(R), then equivalent to MQI
If volume() / volume() then it's called FlowImprove

Theorem (Fountalakis, Meng, Gleich, Mahoney, 2023)
Fix . conductance(MQI()) conductance(LFI()) conductance(FI())
MQI clusters cannot grow. But LocalFlowImprove Clusters Can!
PageRank
Flow


there is a neat relationship between these flow problems and ACL-like PageRank problems by looking at flow as 1-norm and spectral / PageRank as 2-norm

minimize cut() - volume() s.t. … 

minimize s.t … 

minimize s.t. ...

Flow vs. spectral is just 
1-norm vs. 2-norm 
We can combine these ideas. 
Let ACL grow, use MQI to refine. 
Road networks make great examples!
The south is the reference set
Road networks make great examples!
The south is the reference set
LocalFlowImprove
finds California
Road networks make great examples!
The south is the reference set
LocalFlowImprove
finds California
Road networks make great examples!
The south is the reference set
FlowImprove
finds the northeast




… back to the entire graph … 

Network community profiles show the results of many local clusters
Network Community Profile plots show conductance and size tradeoffs for many different communities. 
Leskovec, Lang, Dasgupta, Mahoney, 2007-2009
We can combine these ideas. 
Let ACL grow, use MQI to refine. 
Network Community Profile plots show conductance and size tradeoffs for many different communities. 
Leskovec, Lang, Dasgupta, Mahoney, 2007-2009
We initially get better and better clusters as they get bigger 

We can combine these ideas. 
Let ACL grow, use MQI to refine. 
Network Community Profile plots show conductance and size tradeoffs for many different communities. 
We initially get better and better clusters as they get bigger 

Clusters get worse as they get bigger
Leskovec, Lang, Dasgupta, Mahoney, 2007-2009
We can combine these ideas. 
Let ACL grow, use MQI to refine. 
Network Community Profile plots show conductance and size tradeoffs for many different communities. 
We initially get better and better clusters as they get bigger 

Clusters get worse as they get bigger
At some point the volume denominator kicks in and they get slightly better
Leskovec, Lang, Dasgupta, Mahoney, 2007-2009
The finding and structure are reliable
Cond.
Size
Collaboration Network
Email Network
Facebook Interactions
~50 or 80 networks in the original paper
broadly holds for different clustering metrics too  
something slightly different for very dense (avg deg >= 150) empirical networks
discussions of origins by Schoenebeck (2013) and Rohe (2018)

Networks in physics / CSE often have very good large conductance clusters
US Power Grid
Many good conductance clusters that are large




… one last point … 
your data are not your model!
Remember to think about your problem
Benson, Gleich, Leskovec 2016. 
Use Motifs to handle more general graph data. 
your data are not your model!
Remember to think about your problem
Epasto, Lattanzi, & Paudice 2017. Partition neighbors of egonet. and build a new graph.
1. If there are really obvious communities, it tends not to matter what you do.

2. If there aren’t, it greatly matters!

3. Generally, there aren’t great communities. 
Better than random? Yes. But where the details don’t matter? No. 

4. Remember that the data you are given is not the model you need to use. 
Building new models from this data is often useful! 

5. Look at distributions / overlaps / etc. – a node is in many “communities” 
Some lessions