I’m in Portland, Oregon at the O’Reilly Open Source Convention. Jamie, Colin, and Toby are (or will be) here as well and are presenting on Thursday:
Machine Learning for Knowledge Extraction from Wikipedia & Other Semantically Weak Sources
Wikipedia contains a wealth of collective knowledge, but due to its semi-structured design and idiosyncratic markup, mining this resource is a formidable challenge. This session will examine techniques for mining semantically weak data sources for explicit facts.
The session will use WEX, a preprocessed normalization of Wikipedia designed to make this corpus easily accessible to developers interested in machine learning, natural language processing, or knowledge extraction. The process through which WEX is prepared will be discussed as a guide to creating mineable structures from semi-structured data, followed by approaches to machine extraction on structures of mixed data quality.
The session is targeted at intermediate developers with an interest in machine learning or knowledge extraction (though no experience is assumed with either).
The demonstrations leverage the power of Postgres 8.3’s XPath capability to simplify the programming model and present examples in Python, but the data and principles are compatible with any modern data infrastructure.
My presentation’s tomorrow, in a tutorial about People for Geeks, which is completely non-Freebase related, but if you see me round, please stop and say hi! I might be wearing a Freebase T-shirt, but if not, my hair’s purple and kind of hard to miss. I’d love to meet and chat to anyone who’s using Freebase and swap notes.
