Learning Attributes of Social Network Users

Many people have attempted to learn the attributes of social media users from the linguistic features of their posts. We take a different approach based on the hypothesis that (a) someone’s interests depend on their attributes and (b) what they follow or like describes (some of) their interests. We use this hypothesis to derive a machine learning model that predicts a user's attributes from what or whom they follow. We take attributes mined from user posts as labelled ground truth and use the users' connections as features. Unlike methods that rely on linguistic style, our method generalises across national and linguistic boundaries. A major challenge of working with social media data is that attributes can be inaccurate and social graphs contain many spurious edges that are not indicative of true interests. For this reason, we adopt a Bayesian classification paradigm, which offers a consistent framework for handling all forms of uncertainty.
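The idea of using connections as features and mined attributes as labels can be illustrated with a small sketch. This is not the model described on this page: it is a plain Bernoulli naive Bayes classifier with Beta-style smoothing, chosen only because it is one of the simplest Bayesian classifiers, and it does not model label noise or spurious edges. The account names, the age buckets and the `train` / `posterior` helpers are all hypothetical.

```python
import math
from collections import defaultdict

def train(labelled, alpha=1.0):
    """Count follows per class. labelled: list of (set_of_followed_accounts, label).
    alpha is a smoothing pseudo-count (an assumption of this sketch)."""
    class_counts = defaultdict(int)
    follow_counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for follows, label in labelled:
        class_counts[label] += 1
        for account in follows:
            follow_counts[label][account] += 1
            vocab.add(account)
    return class_counts, follow_counts, vocab, alpha

def posterior(model, follows):
    """Return P(label | follows) for every label seen in training."""
    class_counts, follow_counts, vocab, alpha = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        # Smoothed class prior.
        lp = math.log((n + alpha) / (total + alpha * len(class_counts)))
        for account in vocab:
            # Smoothed probability that a user with this label follows the account.
            p = (follow_counts[label][account] + alpha) / (n + 2 * alpha)
            lp += math.log(p if account in follows else 1.0 - p)
        log_scores[label] = lp
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    z = sum(math.exp(v - top) for v in log_scores.values())
    return {label: math.exp(v - top) / z for label, v in log_scores.items()}

# Toy labelled accounts (hypothetical): who they follow, and a mined age bucket.
labelled = [
    ({"band_a", "game_x"}, "under_30"),
    ({"band_a", "game_x", "news_y"}, "under_30"),
    ({"news_y", "garden_z"}, "30_plus"),
    ({"garden_z"}, "30_plus"),
]
model = train(labelled)

# Predict the age bucket of an unlabelled account from its follows alone.
post = posterior(model, {"band_a", "game_x"})
print(post)
```

Because the classifier returns a full posterior rather than a single label, uncertain accounts stay visibly uncertain. Handling inaccurate labels and spurious edges would need extra noise terms that this sketch leaves out.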

Probabilistic Graphical Model

Probabilistic graphical model for learning attributes from partially labelled accounts

Twitter age distribution

The learnt ages of 700 million Twitter accounts compared to a survey of 300 American users.
