Real Time Association Mining for Large Networks

Existing methods for analysing local community structure in large graphs either rely on distributed computing facilities or incur run-times so long that they are impractical for exploratory work (Clauset, 2005; Bahmani, Chakrabarti, and Xin, 2011). We have developed a tool for real-time analysis of large graphs. Performance is vital for two reasons: the number of possible queries grows exponentially with the size of the network, and the analyst combines the results of previous queries with their own knowledge to decide what to query next, so each answer must arrive quickly enough to keep that loop interactive. To our knowledge, no existing tool provides this functionality. We define the association strength between two accounts as the Jaccard similarity of their neighbourhood graphs.
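As a minimal illustration of the association measure, the sketch below computes the exact Jaccard similarity between two neighbourhoods, each treated as a set of account identifiers. The function name, the set representation and the toy accounts are assumptions made for this example, not details taken from the tool itself.

```python
def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty neighbourhoods score 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy data: each account maps to the set of accounts
# in its neighbourhood.
neighbours = {
    "alice": {"u1", "u2", "u3", "u4"},
    "bob": {"u3", "u4", "u5"},
    "carol": {"u9"},
}

# alice and bob share 2 of 5 distinct neighbours.
strength = jaccard(neighbours["alice"], neighbours["bob"])  # 0.4
```

Computing this exactly for every pair of accounts scales quadratically with the number of accounts, which is the cost the fingerprinting approach is meant to avoid.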

To achieve real-time performance we store a MinHash fingerprint for every account of interest. The fingerprints allow the Jaccard similarity between any pair of accounts to be estimated, and they are far more compact than storing all pairwise Jaccard coefficients explicitly. They also allow rapid querying of associated accounts using Locality Sensitive Hashing (LSH). An LSH query outputs a weighted sub-graph, on which we run the WALKTRAP community detection algorithm (Pons and Latapy, 2005) before visualising the results.
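The fingerprint-and-query idea can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: the signature length, the banding scheme, the hash construction and all function names are assumptions chosen for the example, and the tool's actual parameters and implementation may differ.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 128              # signature length (assumed value)
BANDS = 32                    # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS    # signature rows per band


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(items):
    """MinHash fingerprint of a non-empty set: one minimum per seeded hash."""
    return tuple(min(_hash(seed, x) for x in items) for seed in range(NUM_HASHES))


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions, an estimate of the Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def build_lsh_index(signatures):
    """Bucket accounts by each band of their signature."""
    index = defaultdict(set)
    for account, sig in signatures.items():
        for band in range(BANDS):
            key = (band, sig[band * ROWS:(band + 1) * ROWS])
            index[key].add(account)
    return index


def query(index, signatures, account):
    """Return {candidate: estimated Jaccard} for accounts sharing any band."""
    sig = signatures[account]
    candidates = set()
    for band in range(BANDS):
        key = (band, sig[band * ROWS:(band + 1) * ROWS])
        candidates |= index.get(key, set())
    candidates.discard(account)
    return {c: estimate_jaccard(sig, signatures[c]) for c in candidates}


# Hypothetical toy neighbourhoods (sets of follower ids).
neighbourhoods = {
    "a": set(range(0, 200)),
    "b": set(range(0, 200)),      # identical to "a"
    "c": set(range(1000, 1200)),  # disjoint from "a"
}
signatures = {acc: minhash(items) for acc, items in neighbourhoods.items()}
index = build_lsh_index(signatures)
associated = query(index, signatures, "a")  # weighted edges from "a"
```

The dictionary returned by `query` gives the weighted edges around the queried account; repeating the query for each returned account would yield a weighted sub-graph. A community detection step could then be run on that sub-graph, for example with the `community_walktrap` method of the python-igraph library, although that choice of library is an assumption of this note rather than something the text specifies.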

Figure: Associations between major global celebrities as seen on Twitter
