Who’s on Facebook?

David Blei points me to this report by Lars Backstrom, Jonathan Chang, Cameron Marlow, and Itamar Rosenn on an estimate of the proportion of Facebook users who are white, black, hispanic, and asian (or, should I say, White, Black, Hispanic, and Asian).

Facebook users don’t specify race/ethnicity, but they do give their last name, and Backstrom et al. use Census data on the ethnic breakdowns of last names to estimate the proportion of Facebook users in each of several Census-defined ethnic categories. They present their results for several snapshots of Facebook from 2006 through 2009.
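To make the surname arithmetic concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the surname shares and user counts are made-up numbers rather than Census or Facebook data, and `estimate_proportions` is a hypothetical helper, not code from the report. It shows only the basic weighting step, in which each surname's user count is split across categories according to that surname's ethnic breakdown.

```python
from collections import defaultdict

# Hypothetical P(category | surname) table. Illustrative values only,
# NOT real Census figures. Each row sums to 1.
SURNAME_SHARES = {
    "Smith":  {"White": 0.73, "Black": 0.22, "Hispanic": 0.02, "Asian": 0.01, "Other": 0.02},
    "Garcia": {"White": 0.05, "Black": 0.01, "Hispanic": 0.91, "Asian": 0.01, "Other": 0.02},
    "Nguyen": {"White": 0.01, "Black": 0.00, "Hispanic": 0.01, "Asian": 0.96, "Other": 0.02},
}

# Hypothetical number of users with each surname (made up for this example).
USER_COUNTS = {"Smith": 1000, "Garcia": 400, "Nguyen": 100}


def estimate_proportions(user_counts, surname_shares):
    """Split each surname's user count across categories, then normalize."""
    totals = defaultdict(float)
    covered = 0  # users whose surname appears in the table
    for surname, n_users in user_counts.items():
        shares = surname_shares.get(surname)
        if shares is None:
            continue  # surname not in the table: skipped in this sketch
        covered += n_users
        for category, share in shares.items():
            totals[category] += n_users * share
    if covered == 0:
        raise ValueError("no users with a surname found in the table")
    return {category: total / covered for category, total in totals.items()}


estimates = estimate_proportions(USER_COUNTS, SURNAME_SHARES)
for category, proportion in sorted(estimates.items()):
    print(f"{category}: {proportion:.3f}")
```

Skipping surnames that are missing from the table is a simplification; a real analysis would need some rule for rare or unlisted names.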

Their analysis seems reasonable enough to me, even if it won’t be exactly right since the Facebook population is not a random sample of Americans within each ethnic category. The next step is to break things down by other variables, most obviously age, sex, education, and state of residence. Does the Census give last name data for any of these subcategories of the population?
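The within-category assumption can be made explicit with a small mixture-model sketch. This is only a guess at the general shape of such a model, not Backstrom et al.'s actual procedure: it assumes that the distribution of surnames within each category is the same on Facebook as in the reference data, and estimates only the category proportions by EM. The group labels, surname labels, probabilities, and counts are all invented for illustration, and `em_mixing_proportions` is a hypothetical name.

```python
# Hypothetical P(surname | category), assumed to be the same on Facebook as in
# the reference population. Made-up values; each row sums to 1.
NAME_GIVEN_CATEGORY = {
    "group_a": {"name_x": 0.7, "name_y": 0.2, "name_z": 0.1},
    "group_b": {"name_x": 0.1, "name_y": 0.3, "name_z": 0.6},
}

# Made-up surname counts, constructed to match proportions of 0.6 and 0.4.
NAME_COUNTS = {"name_x": 4600, "name_y": 2400, "name_z": 3000}


def em_mixing_proportions(name_counts, name_given_category, n_iter=200):
    """Estimate category proportions by EM, holding P(surname | category) fixed."""
    categories = list(name_given_category)
    # Start from equal proportions.
    pi = {c: 1.0 / len(categories) for c in categories}
    for _ in range(n_iter):
        expected = {c: 0.0 for c in categories}
        for name, count in name_counts.items():
            # E-step: posterior probability of each category given the surname.
            weights = {c: pi[c] * name_given_category[c].get(name, 0.0)
                       for c in categories}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # surname has zero probability in every category
            for c in categories:
                expected[c] += count * weights[c] / norm
        # M-step: proportions are the normalized expected counts.
        total = sum(expected.values())
        pi = {c: expected[c] / total for c in categories}
    return pi


proportions = em_mixing_proportions(NAME_COUNTS, NAME_GIVEN_CATEGORY)
for category, p in proportions.items():
    print(f"{category}: {p:.3f}")
```

If surnames within a category are distributed differently on Facebook than in the reference data, these estimates will be off, which is the caveat raised above.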

And then there’s lots more you can do, once you have these numbers; for example, you can estimate how often people in different groups (categorized by age, sex, ethnicity, etc.) log into Facebook, how many Facebook friends they have, and so forth. You can get all sorts of details, far beyond anything my collaborators and I have learned about social connections.

Also, a few minor comments:

1. Backstrom et al. appear to use the terms “white” and “Caucasian” interchangeably, which, as I’ve noted before, isn’t quite right, as most South Asians are “Caucasian” but not white. It’s not clear whether South Asians fall in the “Caucasian” or “Asian/Pacific Islander” category in this analysis.

2. The dotted lines in their very first graph are labeled as “the proportion of the Internet population” for each ethnic group. I’m just wondering: where did they get these numbers?

3. Also, along the same lines, could they give the link to the public data they used? I followed the link they did give, but it was a general Census website, and I wasn’t sure where one would go to find the full tables.

6 thoughts on “Who’s on Facebook?”

  1. Here is the URL to the census data:

    http://www.census.gov/genealogy/www/data/2000surn

    What I don't quite get about their approach (and I'm no expert on mixture modeling): aren't they using the census data to refine the census data? The only contribution the data snapshot provides is the rate of each last name, which I imagine could be used as a hypothesis test at each level (does the rate of Smiths in FB = general population rate?). But while that says something about the representative nature of FB data, I'm not sure how that lends information to the ethnic categories.

  2. outside physical anthropology it seems everyone uses 'caucasian' as a fancy synonym for *european* white. i recall in newsweek a story about how jesus and mary were "not caucasian" since they were middle eastern.

  3. Why census-defined categories?
    Also why stop at race, rather than drop down to the smallest ethnic components practical, since they're using last names?
    Finally, I assume they explain this, but how do they differentiate between blacks and whites that I presume have high last name overlap? Some sort of statistical technique to split the difference of people with the last name "Jackson" etc.?

  4. Michael,
    Thanks for the helpful link.

    From it:

    "In short, Facebook used statistical data from the Census on race-surname mappings to estimate the racial makeup of their user base. For example, if 73% of people named Smith are white, then multiply the number of Smiths by .73 and add that to the number of white people. In the blog post, they describe how this method assumes Facebook users are randomly sampled from the population, and they used a mixture model to correct for this error (though more details on the modeling would’ve been great)."
