Freebase provides full data dumps of all the current facts and assertions in our graph database. These are released on a quarterly basis and are available from http://download.freebase.com.

  • What formats are available?
  • Which format should I use?
  • How is text encoded?
  • Why are there sometimes GUIDs, and sometimes IDs?
  • Why aren't images and article descriptions included?
  • How can I tell if something has changed in a topic since the last data dump?
  • How often are data dumps released?
  • Are there any licensing requirements if I use information from a Freebase data dump?
  • How should I cite data dump information in a publication?

What formats are available?

TSV
A tab-separated file for each type in Freebase, suitable for loading into spreadsheets or database software. Each line in these files represents an instance of a Freebase type; the columns represent the available properties for the type. You may download the full set, or browse Freebase domains and types to find specific data sets.
  • The October 2008 full download is approximately 432 Mbytes compressed in the Bzip2 format.
  • The October 2008 browseable set contains 1348 TSV files in 74 domains.

Link Export
A full dump of Freebase assertions in a simple UTF-8 text format. The link export is a series of lines, one assertion per line. Each line is a tab-separated quadruple. An assertion is a statement of fact about the object. In any assertion, at least one of two of the four fields is present; some assertions carry both.
  • The October 2008 Link Export contains approximately 210 million assertions and is approximately 1220 Mbytes compressed in the Bzip2 format.

Which format should I use?

TSV is best suited for loading data into spreadsheets and database software. Link Export is best suited for post-processing into RDF or XML datasets.

How is text encoded?

Text is encoded as UTF-8, with backslash escaping of newline, tab, and backslash.

Why are there sometimes GUIDs, and sometimes IDs?

Previously, the data dump used the same algorithm as the Web client: when an ID could be generated for an object, that was preferred, but if no ID could be generated, a GUID was created. From October 2008 onward, data dumps will contain only GUIDs.

Why aren't images and article descriptions included?

The data dump only contains data from the Freebase graph database, which is limited to factual assertions. Images and article descriptions are contained in a separate content storage database. You can download the Freebase extraction of the English-language Wikipedia in machine-readable form. Since many of the article descriptions in Freebase come from Wikipedia, this will give you access to the majority of the descriptive article text.

How can I tell if something has changed in a topic since the last data dump?

You can use the as_of_time MQL query envelope parameter to make queries against the Freebase database as it existed in the past. See section 4.2.4.4 of the MQL Reference Guide, Making Queries in the Past, for more information.

How often are data dumps released?

Data dumps are released on a quarterly basis. For 2009, dumps will be released in January, April, July, and October.

Are there any licensing requirements if I use information from a Freebase data dump?

Freebase Data Dumps are provided free of charge for any purpose, with regular updates, by Metaweb Technologies. They are distributed, like Freebase itself, under the Creative Commons Attribution (CC-BY) license, and use is subject to the Freebase Terms of Service. If you include the data from these data dumps in a website or application, you must attribute us as described in our Licensing Policy.

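As an illustration of the as_of_time parameter mentioned above, the following Python sketch builds two query envelopes for the same query: one against the current graph and one against the graph as it stood on the date of a dump. Comparing the two results is one way to see whether a topic has changed. The envelope layout (a "query" key next to "as_of_time"), the topic ID, and the timestamp value are assumptions made for illustration only; section 4.2.4.4 of the MQL Reference Guide remains the authority on the accepted format. The sketch only constructs the envelopes and does not contact any service.

```python
import json


def build_envelope(query, as_of_time=None):
    """Wrap an MQL query in an envelope, optionally pinned to a past time.

    The envelope shape is an assumption for illustration; check the
    MQL Reference Guide for the exact format.
    """
    envelope = {"query": query}
    if as_of_time is not None:
        envelope["as_of_time"] = as_of_time
    return envelope


# Hypothetical query: the topic ID and requested property are placeholders.
topic_query = {"id": "/en/example_topic", "name": None}

# Envelope for the graph as of an assumed dump date (timestamp format assumed).
past_envelope = build_envelope(topic_query, as_of_time="2008-10-10")

# Envelope for the current graph: no as_of_time key at all.
current_envelope = build_envelope(topic_query)

print(json.dumps(past_envelope))
print(json.dumps(current_envelope))
```

Sending both envelopes to the query service and comparing the responses would show what changed in between; how the request is submitted is outside the scope of this sketch.
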
How should I cite data dump information in a publication?

If you'd like to cite these data dumps in a publication, you may use:

* Metaweb Technologies, Freebase Data Dumps, http://download.freebase.com/datadumps/, [date of dump used, such as October 10, 2008]

Or as BibTeX:

@misc{metaweb:datadumps, title = "Freebase Data Dumps" author...
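
For readers working with the Link Export described earlier, the following Python sketch shows one way to read it: stream a Bzip2-compressed file, split each line into its four tab-separated fields, and undo the backslash escaping of newline, tab, and backslash. The function names and the file path are hypothetical, and the fields are returned by position only, since this sketch makes no assumption about what each column is called.

```python
import bz2
import re

# Escape sequences named in the text-encoding answer: newline, tab, backslash.
_ESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def unescape(field):
    """Undo backslash escaping in a single left-to-right pass.

    A single pass matters: chained str.replace calls would misread an
    escaped backslash followed by the letter n as a newline.
    Unrecognized escape sequences are left untouched.
    """
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), field)


def parse_line(line):
    """Split one assertion line into a 4-tuple of unescaped fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError("expected 4 tab-separated fields, got %d" % len(fields))
    return tuple(unescape(f) for f in fields)


def read_assertions(path):
    """Yield assertions from a Bzip2-compressed, UTF-8 link export file."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            yield parse_line(line)
```

A call such as `for fields in read_assertions("link-export.bz2"): ...` (with a hypothetical file name) processes the dump one line at a time, so the whole file never has to be decompressed into memory. Empty fields are preserved as empty strings, which fits the note that some fields may be absent from an assertion.
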
