You guessed it — that’s the number of topics currently in Freebase. We wanted to share with you some of the data loading activities that we’ve been working on recently at Metaweb.
In the last 90 days, we’ve seen the number of topics grow by 15% overall. Where has this growth occurred? Here’s a 90-day comparison of some types in Freebase:
| Type | 90-day change | 3/2/08 | 12/3/07 |
|---|---|---|---|
| Topic | 15% | 3,260,134 | 2,827,916 |
| Person | 20% | 675,675 | 561,365 |
| Location | 20% | 434,977 | 361,179 |
| Musical Artist | 13% | 348,750 | 309,674 |
| Company | 42% | 46,621 | 32,866 |
| Film | 14% | 38,038 | 33,354 |
| Book | 17% | 21,467 | 18,396 |
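
The 90-day change column can be reproduced from the two count columns. Here is a minimal Python sketch, assuming the change is simply (new - old) / old rounded to the nearest whole percent. The helper name `percent_change` is ours, for illustration only.

```python
# Topic counts copied from the table above: (count on 3/2/08, count on 12/3/07).
counts = {
    "Topic": (3_260_134, 2_827_916),
    "Person": (675_675, 561_365),
    "Location": (434_977, 361_179),
    "Musical Artist": (348_750, 309_674),
    "Company": (46_621, 32_866),
    "Film": (38_038, 33_354),
    "Book": (21_467, 18_396),
}


def percent_change(new: int, old: int) -> int:
    """Return the growth from old to new as a whole-number percentage.

    Assumption: the table rounds to the nearest percent.
    """
    return round((new - old) / old * 100)


# Compute the change for every type listed in the table.
changes = {name: percent_change(new, old) for name, (new, old) in counts.items()}

for name, pct in changes.items():
    print(f"{name}: {pct}%")
```

Under that assumption, the computed values match the percentages shown in the table.
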
Here’s another way of looking at Freebase topic additions. The size of a rectangle in the picture below is proportional to the number of topics for that type in Freebase. (Note that the topics depicted in the picture are limited to those that include the topic type.) Greenish colors indicate growth and reddish colors indicate decline, with brightness showing the degree of change. So, bright green means fast growth, dull green is slow growth, and black is no growth. Bright red means quick decline. You get the idea. (The reddish group_member decline is the result of some merges.)
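
The color scheme described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the picture: the function name `growth_to_rgb`, the linear brightness scale, and the `max_abs` cutoff at which a color reaches full brightness are all assumptions of ours.

```python
def growth_to_rgb(growth: float, max_abs: float = 0.5) -> tuple[int, int, int]:
    """Map a fractional growth rate (0.15 means +15%) to an (R, G, B) color.

    Positive growth gives green, decline gives red, and zero gives black.
    Brightness scales linearly with the size of the change and is capped
    at max_abs (a hypothetical cutoff, not taken from the post).
    """
    # Scale the clamped magnitude of the change to the 0..255 channel range.
    intensity = round(min(abs(growth), max_abs) / max_abs * 255)
    if growth > 0:
        return (0, intensity, 0)  # growth: shades of green
    if growth < 0:
        return (intensity, 0, 0)  # decline: shades of red
    return (0, 0, 0)              # no change: black


# Example: Company (+42%) comes out a brighter green than Musical Artist (+13%).
company_color = growth_to_rgb(0.42)
artist_color = growth_to_rgb(0.13)
```
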
Much of this growth comes from our ongoing, expanding efforts to take advantage of the growth in Wikipedia. As Jamie mentioned recently in his blog post on WEX, we’re getting about 1,700 new topics a day from Wikipedia. That’s about 50,000 new topics per month!
We’ve also been analyzing the structure in Wikipedia to make type and property assertions on Freebase topics created from Wikipedia articles. In the last few months, we’ve added over 1.5 million property assertions to these topics. Plus, we were able to create over 220,000 type assertions on them. For example, we typed Roger Clemens as a baseball player and pro athlete, based on information in his Wikipedia article. In addition, we asserted his playing position of pitcher, and one of the places he’s lived (Dayton, OH).
Another continually growing data resource for us is MusicBrainz, which adds about 1,000 new artists and 1,500 new releases per week. We’ve recently updated Freebase with MusicBrainz data to the tune of 73,888 new artists, 104,664 new releases, and 102,093 new albums.
We’ve also been targeting other data sources. Through these efforts, we’ve built up some interesting verticals, and we’re seeing developer interest in building applications on them. Here are some highlights of these data loads:
- Startups and venture-funded companies: we’ve added 1,548 companies (592 of them venture-backed), 1,184 venture investors, and 11,160 people (members of boards of directors and management teams). That brings us to a total of 2,535 venture-backed companies.
- California Zinfandels: 674 wines with vintages, regions, appellations, producers, and grapes. We’ve also added 259 wine producers.
- Automobiles: over 450 automotive models added in the last month. Details include body styles, fuel economy, transmission types, and colors.
- Retail store locations: 450 malls, 45,000 retail locations within those malls, and 3,700 Wal-Mart stores.
- Publicly traded companies: in the last few days, we’ve added 7,290 companies with ticker symbols and exchanges, bringing the company total to 46,621.
That’s all for now. We’ll keep you posted on a regular basis to let you know what new data we’ve loaded.
