Full data dumps are now available

We’re delighted to announce that there are now full data dumps of Freebase available for download.

Although Freebase has always had an open API to allow you to query our data, it hasn’t been so easy to get everything that’s there. Well, that’s just changed. We’re now offering full data dumps in two formats from the Freebase downloads page.

The first format is a set of tab-separated values (TSV) files, one per Freebase “type”. That is, you can grab the TSV for Films, Digital Cameras, or Shopping Malls and load them into your preferred application, whether that’s Excel or Postgres. You can also find tarballs of any domain (like Sports or Politics), or a giant tarball of the whole lot if you’re that way inclined.
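If you’d rather load one of these per-type files from a script than from Excel or Postgres, here’s a minimal Python sketch using the standard csv module. It assumes the first row of each TSV is a header naming the columns and that fields aren’t quoted; the column names in the sample are made up for illustration.

```python
import csv
import io
from typing import IO, Dict, Iterator


def read_type_tsv(stream: IO[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data row of a per-type TSV dump.

    Assumes the first row is a header naming the columns and that fields
    are not quoted (QUOTE_NONE keeps any literal quote characters as-is).
    """
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield dict(row)


if __name__ == "__main__":
    # Made-up two-column sample standing in for a file such as film.tsv.
    sample = "name\tinitial_release_date\nBlade Runner\t1982\n"
    for film in read_type_tsv(io.StringIO(sample)):
        print(film["name"], film["initial_release_date"])
```

To read a real download, open it with `open("film.tsv", newline="", encoding="utf-8")` (assuming UTF-8) and pass the file object to `read_type_tsv`.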

The data in those TSV files is denormalised (that is, you’ll find some of the same information in two places at once), and it doesn’t contain the GUIDs for every single piece of data, so you can’t use it to build your own graph. That’s why we’ve provided the second format: a text file full of quadruples that can be converted fairly easily to RDF or whatever you like. You might like to think of this as the “advanced” option.
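To give a feel for the “advanced” option, here’s a hypothetical Python sketch that turns quad lines into N-Triples, one common RDF serialisation. It assumes each line holds four tab-separated fields (source, property, destination, value), that links leave the value empty, and that text facts may carry a language path such as /lang/en in the destination slot. Check these assumptions against the sample on the downloads page before relying on them; the base IRI is a placeholder, not an official namespace.

```python
from typing import Optional

# Placeholder base IRI for illustration only; the dump does not prescribe one.
BASE = "http://example.org/freebase"


def to_iri(path: str) -> str:
    """Map a Freebase-style path (e.g. /type/object/name) to an N-Triples IRI."""
    return f"<{BASE}{path}>"


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def quad_to_ntriple(line: str) -> Optional[str]:
    """Convert one line in the assumed 'source, property, destination, value' layout.

    Returns None for lines that don't fit the assumed four-field layout.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 4:
        return None
    source, prop, dest, value = fields
    subject, predicate = to_iri(source), to_iri(prop)
    if value == "":
        if dest == "":
            return None  # nothing to point at
        obj = to_iri(dest)  # assumed: a link from source to another node
    else:
        obj = f'"{escape_literal(value)}"'
        if dest.startswith("/lang/"):
            obj += "@" + dest[len("/lang/"):]  # assumed: language-tagged text
    return f"{subject} {predicate} {obj} ."


if __name__ == "__main__":
    # Made-up quads; real ones would be read from the dump file line by line.
    for quad in ["/guid/1\t/type/object/name\t/lang/en\tBlade Runner\n",
                 "/guid/1\t/film/film/directed_by\t/guid/2\t\n"]:
        print(quad_to_ntriple(quad))
```

Streaming the dump one line at a time like this keeps memory use flat, which matters for a file of this size.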

We’ll be producing these data dumps every three months from now on.

If you’re interested in our data dumps, you might also want to read about our WEX (Wikipedia Extraction) downloads.

9 Responses to “Full data dumps are now available”

  1. Will Moffat Says:

    Hi, would it be possible to provide a sample of the 2nd (link) format? It would save downloading the full 500MB link file just to satisfy my curiosity.

  2. Moritz Stefaner Says:

    Extremely awesome, thanks a lot!

  3. skud Says:

    Will: good point! I’ll ask JG and post something on Monday.

    Moritz: thanks!

  4. Vivek Puri Says:

    How about data dumps for MySQL?

  5. skud Says:

    Vivek, you should be able to load the TSVs you’re interested in into MySQL using something like the “LOAD DATA INFILE” syntax described at http://dev.mysql.com/doc/refman/5.0/en/load-data.html … but you’d need one table per type, and it wouldn’t be normalised. I’m not sure that Freebase is very well expressed in an RDBMS. May I ask what eventual application you had in mind?

  6. Vivek Puri Says:

    We are working on a social media application where we want to integrate data from Freebase. Looking at the files, especially film.tsv, I don’t see much data for poster/images, while if you browse Freebase, you do see the data out there. I’m not sure if I’m looking at the right file, but if I am, am I correct in assuming that you will be providing that data somewhere?

    Also, for people who are not looking to import the data right away, it is kinda hard to tell what data each of those files contains. As for MySQL users, I think it would be a pain to create a table for each of those files if you don’t know the datatypes of their columns.

    Data dumps aside, I tried searching on Freebase but could not tell for sure whether API call output can be returned in XML format. Is that possible? The reason I ask is that the platform we are building on (Bungeeconnect) does not have a native JSON-to-XML parser, which makes our life hard.

  7. skud Says:

    Vivek, if you want to access the images, you’ll need to use our API. The data dumps only contain “facts” from the Freebase graph, not blobs like images or articles.

    If you’d like help using the API and perhaps some further discussion on how to convert to XML, you might want to join our Developers’ mailing list at http://lists.freebase.com/mailman/listinfo/developers

  8. Vivek Puri Says:

    I meant the URLs for images, not the images themselves. Anyway, I am already working with the API. Thanks!

  9. skud Says:

    Will, I forwarded you a sample of the quadruple format, and I’ve also put up a sample on the download page for anyone else who’s interested.
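Following up on the MySQL question in the comments: one way to avoid hand-writing a table per type is to generate the statements from each TSV’s header row. The Python sketch below is only a starting point built on assumptions: it declares every column as TEXT (since the dumps don’t state datatypes), names the table after the file, assumes the first line is a header, and uses the LOAD DATA INFILE syntax from the MySQL manual linked above. Depending on your setup, the LOCAL variant may need local_infile enabled on the client and server.

```python
import os
from typing import List


def quote_ident(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_string(text: str) -> str:
    """Quote a MySQL string literal, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def mysql_statements(tsv_path: str, header_line: str) -> List[str]:
    """Build CREATE TABLE and LOAD DATA statements for one per-type TSV file.

    Every column is declared TEXT because the dump does not state column types;
    the table is named after the file (e.g. film.tsv -> film).
    """
    table = os.path.splitext(os.path.basename(tsv_path))[0]
    columns = header_line.rstrip("\r\n").split("\t")
    column_defs = ",\n  ".join(f"{quote_ident(c)} TEXT" for c in columns)
    create = f"CREATE TABLE {quote_ident(table)} (\n  {column_defs}\n);"
    load = (
        f"LOAD DATA LOCAL INFILE {quote_string(tsv_path)}\n"
        f"INTO TABLE {quote_ident(table)}\n"
        "FIELDS TERMINATED BY '\\t'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 LINES;"  # skip the header row
    )
    return [create, load]


if __name__ == "__main__":
    # Hypothetical header; in practice read it with open(path).readline().
    for statement in mysql_statements("film.tsv", "name\tinitial_release_date\n"):
        print(statement)
```

Declaring everything as TEXT sidesteps the datatype question; once the data is in, you can inspect the values and ALTER the columns you care about to narrower types.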

About

Freebase is a free database of the world's information. This is the official Freebase blog.