If you’ve looked at the Genderizer queue lately, you’ll have noticed a lot of Person topics that have a name but no description or picture. That makes it hard to guess their gender and leaves you relying on the given name alone. It seemed to me that a computer could do this just as well as a human, so I asked our data team what they thought.
Brian Karlak looked into it and told me:
We have ~600K people in Freebase with both names and genders.
I took the first name as the first space-separated token in the /type/object/name of these people. Dreadfully simplistic, I know, but it will do for this analysis. It does mean that “first names” will include such honorifics as “King”, “Princess” and “Dr.”, but that’s OK — some correlate very well with gender.
I looked for first names with at least 100 exemplars. Of those, 693 first names correlated >99% with a particular gender, and 507 showed 100% correlation with a particular gender. Of those 507, 412 (over 80%) were male names.
There were 127 first names that had at least 100 exemplars but did not correlate consistently with a single gender. The most gender-ambiguous name was Andrea (nearly 50/50 split), followed by Ashley (48/52), Dana (53/47), Nicola (47/53), and Charlie (47/53).
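To make the analysis above concrete, here’s a minimal Python sketch of it. It assumes the (name, gender) pairs have already been pulled out of Freebase into memory and that genders are the plain strings "male" and "female"; the function names are hypothetical, not part of any Freebase API. Brian doesn’t say exactly what cutoff made a name “ambiguous”, so rather than guess one, the sketch ranks every eligible name by how close its split is to 50/50.

```python
from collections import Counter, defaultdict


def first_name(full_name):
    """Take the first whitespace-separated token of a name.

    As in the analysis above, honorifics such as "Dr." or "King"
    end up counted as first names too.
    """
    parts = full_name.split()
    return parts[0] if parts else None


def analyze(people, min_exemplars=100, threshold=0.99):
    """Build a first-name -> gender table from (name, gender) pairs.

    Returns (gendered, ranked):
      gendered maps each first name whose dominant gender share exceeds
        `threshold` to (gender, share);
      ranked lists (first name, male share) for every name with at least
        `min_exemplars` people, most ambiguous (closest to 50/50) first.
    """
    counts = defaultdict(Counter)
    for name, gender in people:
        token = first_name(name)
        if token is not None:
            counts[token][gender] += 1

    gendered, shares = {}, []
    for token, c in counts.items():
        total = sum(c.values())
        if total < min_exemplars:
            continue  # too few exemplars to say anything useful
        gender, top = c.most_common(1)[0]
        if top / total > threshold:
            gendered[token] = (gender, top / total)
        shares.append((token, c["male"] / total))

    # Sort by distance from an even split, so the most ambiguous come first.
    ranked = sorted(shares, key=lambda pair: abs(pair[1] - 0.5))
    return gendered, ranked
```

In this sketch, the “100% correlation” names are simply the entries in `gendered` whose share is exactly 1.0.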
Brian tells me we’ll be able to use this to assert the genders of people in Freebase, so you don’t have to.
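One way a table like Brian’s might be used to fill in missing genders is sketched below. It assumes only first names that clear the same >99% bar are trusted, and anything else is left for a human in the Genderizer queue. `infer_gender` and `toy_table` are hypothetical, and the table values are placeholders, not Freebase statistics.

```python
def infer_gender(full_name, name_table, min_share=0.99):
    """Guess a gender from the first token of full_name.

    name_table maps first name -> (gender, share), e.g. a table built
    from people whose gender is already known.
    Returns None when the name is unknown or not confidently gendered,
    leaving the person for a human to review.
    """
    parts = full_name.split()
    if not parts:
        return None
    entry = name_table.get(parts[0])
    if entry is None:
        return None
    gender, share = entry
    return gender if share > min_share else None


# Placeholder table for illustration only (not real Freebase figures).
toy_table = {"John": ("male", 1.0), "Mary": ("female", 0.995), "Sam": ("male", 0.6)}
for name in ["John Doe", "Mary Roe", "Sam Poe", "Quinn Loe"]:
    print(name, "->", infer_gender(name, toy_table))
```

Returning None instead of a best guess keeps low-confidence names like the placeholder “Sam” out of automatic assertions.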
If you’re interested in gendered names in Freebase, I recently created an app that does more or less what Brian did above. It’s called Gendered Names: tell it a given name, and it will show you the split between male and female based on what’s in Freebase, presented as a nice chart. Here’s what it thinks about the name “Evelyn”, for example:
Of course, as with any Acre app, you can view the source and clone it if you’d like to build something similar.

