Freebase at SIGMOD: a report from Kurt

[Kurt Bollacker sent me this report on his trip to the SIGMOD '08 conference in Vancouver last week.]

Jutta, Praveen, and I just spent the past 3-4 days at SIGMOD 2008, the big ACM database conference. It is a very academic conference, and one of the two most prominent in the database field. Here are a few things I thought worth sharing.

Our Demo:

We demo’ed Freebase during a “science fair” style system demonstration session to a variety of students, researchers, and many industry folks, including people from Google, Microsoft, Yahoo, and SAP.

Feedback was very positive, with far more folks already aware of who we are and what we do than at our demo a year ago at AAAI 2007. Questions ranged from the very general (e.g. “What is Freebase?”, “How are you going to make money?”) to the very specific (e.g. “What algorithms do you use to reconcile multiple data sources?”, “How complete is your movie data?”).

While most folks had not yet used Freebase beyond a little browsing, a few have started using us seriously. At least one academic at a university in Canada has crawled all of our movie data through the API for a research project. Also, a researcher at a major industrial research lab had already downloaded the graph dumps to use in optimizing their ad placement algorithms.

Feel of the Conference:

It’s an exciting time for database research. There were two distinct paradigms represented at the conference: the old school relational world (think IBM, Oracle, SAP) and the new Internet world (think Google, Yahoo, and new startups). Often, the old school was more interested in issues related to enterprise applications and optimization, while the new world was interested in new database architectures and the Web. It will be interesting to see how the interaction between these two sides evolves.

While there were plenty of the expected hard-core database algorithm/architecture papers (e.g. on query optimization and system performance), there were also several lines of novel work more closely related to Freebase, including:

Graph databases: There appears to be a small surge of research into graph databases, including query languages, performance issues, and representation. Column stores also got a little attention, including a map-reduce language for Hadoop and a presentation by Google on a service layer built on Bigtable.

Streams and temporal data: Folks there were very interested in mining streams of data, such as Web logs. Google gave a keynote talk on how they scale their algorithms for AdSense/AdWords placement optimization to the sets of log data they collect.

Geo data: Microsoft demo’ed their new geo index, probably spurred by the existence of PostGIS, and a few students demo’ed their research projects.

Provenance and versioning: There seems to be a nascent interest in the history of information in databases, but no real traction yet. One interesting note is that we seem to be the only folks in the world building a public “permanent” database. It was sometimes hard for folks to grasp the implications of building an “append only” database at scale, or why that’s so important for collaborative creation of data.

For example, maintaining provenance through schema refactoring has been a thorny problem for the data team for months. At SIGMOD, this issue seemed to be completely new to all of the folks I spoke to. In fact, I believe that we are at the forefront of research and understanding in (time) scalable provenance representation.
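
As a rough illustration of what “append only” with provenance can mean, here is a minimal sketch. It is not how Freebase is actually implemented: the class names, fields, and the example data are all hypothetical. The idea it shows is that nothing is ever overwritten or deleted. A correction is recorded as a retraction plus a new assertion, each tagged with who wrote it and when (here just a sequence number), so any past state can be reconstructed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    """One immutable log entry (hypothetical schema, for illustration only)."""
    subject: str
    prop: str
    value: str
    author: str              # provenance: who made this write
    seq: int                 # provenance: logical order of the write
    retracted: bool = False  # True means "this value no longer holds"


class AppendOnlyStore:
    """Toy append-only store: writes only ever add entries to the log."""

    def __init__(self) -> None:
        self._log: List[Assertion] = []

    def _append(self, subject, prop, value, author, retracted) -> Assertion:
        entry = Assertion(subject, prop, value, author, len(self._log), retracted)
        self._log.append(entry)
        return entry

    def assert_fact(self, subject, prop, value, author) -> Assertion:
        return self._append(subject, prop, value, author, False)

    def retract_fact(self, subject, prop, value, author) -> Assertion:
        # A "delete" is itself a new entry; the old one stays in the log.
        return self._append(subject, prop, value, author, True)

    def current(self, subject, prop, as_of: Optional[int] = None) -> List[Assertion]:
        """Replay the log (optionally only up to seq `as_of`) to get live values."""
        live = {}
        for entry in self._log:
            if as_of is not None and entry.seq > as_of:
                break
            if entry.subject == subject and entry.prop == prop:
                if entry.retracted:
                    live.pop(entry.value, None)
                else:
                    live[entry.value] = entry
        return list(live.values())

    def history(self, subject, prop) -> List[Assertion]:
        """Every write ever made for this subject/property, in order."""
        return [e for e in self._log if e.subject == subject and e.prop == prop]


store = AppendOnlyStore()
store.assert_fact("blade_runner", "director", "Ridley Scot", "alice")    # typo
store.retract_fact("blade_runner", "director", "Ridley Scot", "bob")     # fix, step 1
store.assert_fact("blade_runner", "director", "Ridley Scott", "bob")     # fix, step 2

print([e.value for e in store.current("blade_runner", "director")])
print([e.value for e in store.current("blade_runner", "director", as_of=0)])
```

Because the original entry is never destroyed, a collaborator's mistake can always be audited or undone, which is the property that matters for collaborative data creation. The cost is that the log only grows, and a schema refactoring has to carry the author and ordering information forward rather than rewrite it.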

