If you’ve looked at the Genderizer queue lately, you’ll have noticed a lot of Person topics that have a name, but no description or picture. That makes it hard to guess their gender, since all you have to go on is the given name. It seemed to me that a computer could do this just as well as a human, so I asked our data team what they thought.
Brian Karlak looked into it and told me:
We have ~600K people in Freebase with both names and genders.
I took the first name as the first space-separated token in the /type/object/name of these people. Dreadfully simplistic, I know, but it will do for this analysis. It does mean that “first names” will include such honorifics as “King”, “Princess” and “Dr.”, but that’s OK — some correlate very well with gender.
I looked for first names with at least 100 exemplars. Of those, there were 693 first names that correlated >99% with a particular gender, including 507 that showed 100% correlation with a particular gender. Over 80% (412) of those 507 were male names.
There were 127 first names that had at least 100 exemplars but did not correlate consistently with a single gender. The most gender-ambiguous name was Andrea (nearly 50/50 split), followed by Ashley (48/52), Dana (53/47), Nicola (47/53), and Charlie (47/53).
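For anyone who wants to try this at home, here is a rough sketch of the kind of tally Brian describes. It is not his actual code: the function names, the (name, gender) input format and the toy data are all assumptions made for illustration, and it splits on any whitespace rather than strictly on single spaces.

```python
from collections import Counter, defaultdict


def first_token(name):
    """Return the first whitespace-separated token of a name, or None if there is none."""
    parts = name.split()
    return parts[0] if parts else None


def gender_splits(people, min_exemplars=100):
    """Tally genders per first token.

    `people` is an iterable of (name, gender) pairs (a hypothetical input format).
    Returns {token: (most_common_gender, share, total)} for every token
    that has at least `min_exemplars` people.
    """
    counts = defaultdict(Counter)
    for name, gender in people:
        token = first_token(name)
        if token is not None:
            counts[token][gender] += 1

    splits = {}
    for token, by_gender in counts.items():
        total = sum(by_gender.values())
        if total >= min_exemplars:
            gender, n = by_gender.most_common(1)[0]
            splits[token] = (gender, n / total, total)
    return splits


def consistent_names(splits, threshold=0.99):
    """Keep only tokens whose dominant gender share exceeds `threshold`."""
    return {token: gender
            for token, (gender, share, _total) in splits.items()
            if share > threshold}


if __name__ == "__main__":
    # Toy data only; the real analysis ran over ~600K Freebase people.
    sample = [("John Smith", "Male")] * 3 + [
        ("Andrea Bocelli", "Male"),
        ("Andrea Corr", "Female"),
    ]
    splits = gender_splits(sample, min_exemplars=2)
    print(splits)
    print(consistent_names(splits))
```

The 100-exemplar minimum and the 99% threshold come from Brian's numbers above; the toy run lowers the minimum only because the sample is tiny. Names that pass the minimum but not the threshold are the gender-ambiguous ones.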
Brian tells me we’ll be able to use this to assert the genders of people in Freebase, so you don’t have to.
If you’re interested in gendered names in Freebase, I recently created an app to do more or less what Brian is doing above. It’s called Gendered Names and if you tell it a given name, it will tell you the split between male and female based on what’s in Freebase and present it as a nice chart. Here’s what it thinks about the name “Evelyn” for example:
Of course, as with any Acre app, you can view the source and clone it if you’d like to build something similar.

September 10th, 2009 at 1:00 am
Hmm, I would be careful though.
For example, the name ‘Anne’ in English is pretty much always a female name. However in Dutch it is also a male name. So even though for 99% of the world population it may be pretty unambiguously a female name, in the context of a specific language or country it could be either a male or a female name.
So if there is regional information for a topic, it would be good to take this into account to improve the accuracy of this method. Maybe first do a project to derive regional information? :) Based on keywords and presence/article size in regional wikis you could get pretty far, too ;p.
I looked in the Gendered Names app, and it looks like Anne is male in 3% of the cases on Freebase. Of the examples given, most cases seem to be French, so I guess the Dutch Annes are ‘lucky’ that a bigger country also has Anne as a male name.
~Laurens
September 10th, 2009 at 1:11 am
What about using the information on Wikipedia here?
http://nl.wikipedia.org/wiki/Categorie:Jongensnaam
http://nl.wikipedia.org/wiki/Categorie:Meisjesnaam
http://en.wikipedia.org/wiki/Category:Masculine_given_names
http://en.wikipedia.org/wiki/Category:Feminine_given_names
~Laurens