Here’s an update on the growth of Freebase since our last data blog in early March. In addition to 11% overall growth in topics, we’ve added many images and property and type assertions from Wikipedia, along with new topics in several vertical areas.
Comparing some of the larger Freebase types with where we were in early March, we see 12% growth in people, driven partly by Toby’s addition of 60,000 directors and high-level employees of public companies:
| Type | Change | Topics on 5/22/08 | Topics on 3/2/08 |
|---|---|---|---|
| Topic | 11.2% | 3,624,075 | 3,260,134 |
| Person | 12.2% | 757,905 | 675,675 |
| Location | 7.8% | 469,091 | 434,977 |
| Musical Artist | 4.8% | 365,761 | 349,002 |
| Company | 6.6% | 49,714 | 46,621 |
| Film | 0.9% | 38,362 | 38,038 |
| Book | 2.3% | 21,965 | 21,467 |
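The Change column is relative growth from the March snapshot, (new − old) / old. The short Python sketch below recomputes it from the counts in the table; the dictionary and the `percent_change` helper are illustrative names, not part of Freebase. It also checks the roughly 40% Wikipedia share of new topics quoted further down.

```python
# Topic counts per type, copied from the table: (5/22/08, 3/2/08).
counts = {
    "Topic": (3_624_075, 3_260_134),
    "Person": (757_905, 675_675),
    "Location": (469_091, 434_977),
    "Musical Artist": (365_761, 349_002),
    "Company": (49_714, 46_621),
    "Film": (38_362, 38_038),
    "Book": (21_965, 21_467),
}


def percent_change(new: int, old: int) -> float:
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100


for type_name, (may, march) in counts.items():
    print(f"{type_name:<15} {percent_change(may, march):5.1f}%")

# Share of new topics that came from Wikipedia (147,772 is quoted in the post).
new_topics = counts["Topic"][0] - counts["Topic"][1]  # 363,941 new topics
print(f"Wikipedia share of new topics: {147_772 / new_topics:.1%}")
```

Each printed percentage rounds to the value in the table, and the Wikipedia share comes out at about 40.6%.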
In the March data blog, we graphically depicted changes in type assertions in Freebase. Here’s a similar picture that shows growth since then.
Recall that the size of each rectangle in the picture below is proportional to the number of topics of that type in Freebase. Greenish colors indicate growth and reddish colors indicate decline, with brightness showing the degree of change: bright green means fast growth, dull green means slow growth, and black means no growth.
Some of the green growth in the picture above comes from additional type assertions on existing topics (e.g., schools), and some from type assertions on new topics. About 40% of the roughly 363K new topics (147,772) came from Wikipedia. Across these new Wikipedia-derived topics and existing ones, we’ve made 242,073 new property assertions and 145,171 new type assertions. Two of the bright green areas also reflect the April data mobs on schools and public libraries.
The large brownish-red rectangle for albums reflects a refactoring of multi-disc albums, which collapsed about 10K of them into single albums.
You might have noticed more images in Freebase: we recently added 296,421 new images from Wikipedia and connected them to their associated topics.
Other areas that we’ve been working on include:
- California wines: 574 California Cabernet Sauvignon and Pinot Noir Wines and 216 Wine Producers
- Automobiles: 1,911 Automotive Model Years
- Public transit: 2,591 Transit Stops and 311 Transit Lines
- Celebrities: heights for 3,800 people, mostly celebrities
- Consumer products: 759 Digital Cameras
That’s all for now. We’re working on a bunch of data sets, including books and more schools, which we’ll blog about soon.
