Basic Concepts

If you have never worked with Freebase data before, this section covers the basic concepts of Freebase data modeling that you cannot afford to miss.

Topics

Much of the data in Freebase corresponds to information you might find on Wikipedia. A Wikipedia article typically has a corresponding Freebase topic (as in "topic of discourse"). For example, the Wikipedia article on Bob Dylan corresponds to the Freebase topic on Bob Dylan. Of course, Freebase also contains many topics that have no counterpart in Wikipedia.

The term "topic" was chosen purposely for its vagueness, because there are all kinds of topics. Topics can range from

Some topics are important because they hold a lot of data (e.g., Wal-Mart), and some are important because they link to many other topics, potentially across different domains of information. For example, abstract topics such as love, poverty, and chivalry can be the subjects of books, poems, films, and so forth; so, given a book, we can find poems and films about the same subject(s) as the book.
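
To make this linking idea concrete, here is a minimal Python sketch under toy assumptions: the titles and subject assignments are hypothetical, and each work is modeled simply as a set of subject topics. Finding related works then reduces to checking for shared subjects.

```python
# Hypothetical toy data: each work maps to the set of subject topics it is about.
# These titles and subject assignments are illustrative, not real Freebase data.
BOOK_SUBJECTS = {
    "Book A": {"love", "poverty"},
    "Book B": {"chivalry"},
}
POEM_SUBJECTS = {
    "Poem X": {"love"},
    "Poem Y": {"war"},
}
FILM_SUBJECTS = {
    "Film P": {"poverty", "chivalry"},
    "Film Q": {"science"},
}


def related_works(book: str, catalog: dict[str, set[str]]) -> list[str]:
    """Return the works in `catalog` that share at least one subject with `book`."""
    subjects = BOOK_SUBJECTS[book]
    # A non-empty set intersection means the two works share a subject topic.
    return sorted(title for title, subs in catalog.items() if subs & subjects)


print(related_works("Book A", POEM_SUBJECTS))  # ['Poem X']
print(related_works("Book A", FILM_SUBJECTS))  # ['Film P']
```

The shared subject topics act as the bridge between the book, poetry, and film domains.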

Types and Properties

As mentioned in the introduction of this Getting Started guide, a single topic can have many aspects:

  • Bob Dylan was a songwriter, singer, performer, book author, and film actor;
  • Leonardo da Vinci was a painter, a sculptor, an anatomist, an architect, an engineer, ...;
  • Love is a book subject, film subject, play subject, poetry subject, ...;
  • Any city is a location, potentially a tourist destination, and an employer of civil servants.

To capture the multi-faceted nature of many topics, Freebase uses the concept of types. The topic about Bob Dylan is assigned several types: the song writer type, the music composer type, the music artist (singer) type, the book author type, etc. Each type carries its own set of properties relevant to that type. For example,

  • the music artist type contains properties that list all the albums Bob Dylan has produced, as well as all the musical instruments he was known to play;
  • the book author type contains properties that list all the books Bob Dylan has written or edited, as well as the school of thought or literary movement he belongs to;
  • the company type contains many properties for listing a company's founders, board members, parent company, divisions, employees, products, year-by-year revenue and profit records, etc.

Thus, a type can be thought of as a conceptual container of properties that are most commonly needed for describing a particular aspect of information. (You can think of a type as analogous to a relational table, and each "type" table has a foreign key into the one "identity" table that uniquely defines each topic.)
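
The relational analogy can be sketched with Python's built-in sqlite3 module. This only illustrates the analogy and is not how Freebase stores data: one identity table holds topics, and each hypothetical type table (music_artist, book_author, film_actor) holds a foreign key back to it, so listing a topic's types means checking which type tables reference that topic.

```python
import sqlite3

# Hypothetical type tables; in this analogy each Freebase type is one table.
TYPE_TABLES = ["music_artist", "book_author", "film_actor"]


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The single "identity" table: one row per topic.
    conn.execute("CREATE TABLE topic (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    for table in TYPE_TABLES:
        # Each type table holds a foreign key into the identity table; the
        # type's own property columns would sit next to it (omitted here).
        conn.execute(
            f"CREATE TABLE {table} (topic_id INTEGER PRIMARY KEY REFERENCES topic(id))"
        )
    conn.execute("INSERT INTO topic (id, name) VALUES (1, 'Bob Dylan')")
    for table in TYPE_TABLES:
        # Assign the Bob Dylan topic all three types.
        conn.execute(f"INSERT INTO {table} (topic_id) VALUES (1)")
    return conn


def types_of(conn: sqlite3.Connection, name: str) -> list[str]:
    """List the type tables that reference the topic with this name."""
    (topic_id,) = conn.execute("SELECT id FROM topic WHERE name = ?", (name,)).fetchone()
    return [
        table
        for table in TYPE_TABLES
        if conn.execute(f"SELECT 1 FROM {table} WHERE topic_id = ?", (topic_id,)).fetchone()
    ]


conn = build_db()
print(types_of(conn, "Bob Dylan"))  # ['music_artist', 'book_author', 'film_actor']
```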

Domains and IDs

Just as properties are grouped into types, types themselves are grouped into domains. Think of domains as the sections of your favorite newspaper: Business, Lifestyle, Arts and Entertainment, Politics, Economics, etc. Each domain is given an ID (identifier), e.g.,

  • /business is the ID of the Business domain
  • /music - the Music domain
  • /film - the Film domain
  • /medicine - the Medicine domain
  • etc.

The ID of a domain looks like a file path, or a path in a web address.

Each type is also given an ID, and its ID is based on the domain it belongs to. For example, the Company type belongs to the Business domain and is given the ID /business/company. Here are some other examples:

  • /music/album is the ID of the (Music) Album type, belonging to the Music domain
  • /film/actor - the Actor type in the Film domain
  • /medicine/disease - the Disease type in the Medicine domain

Just as a type inherits the beginning of its ID from its domain, a property also inherits the beginning of its ID from the type it belongs to. For example, the Industry property of the Company type (used for specifying which industry a company is in) is given the ID /business/company/industry. Here are some other examples:

  • /automotive/engine/horsepower is the ID of the Horsepower property of the (Automotive) Engine type
  • /astronomy/star/planet_s is the ID of the Planets property of the Star type (used for listing planets around a star)
  • /language/human_language/writing_system is the ID of the Writing System property of the Human Language type

Thus, domains, types, and properties are given IDs conceptually arranged in a file directory-like hierarchy.
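
Because each ID extends its parent's ID by one key, the type and domain can be recovered from a property ID by dropping keys from the end. Below is a minimal Python sketch, assuming the three-level /domain/type/property shape described above; parent_id and schema_parts are hypothetical helper names.

```python
def parent_id(schema_id: str) -> str:
    """Drop the last key: a property ID yields its type ID, a type ID its domain ID."""
    parent, sep, key = schema_id.rpartition("/")
    if not sep or not key or not parent:
        raise ValueError(f"{schema_id!r} has no parent ID in this sketch")
    return parent


def schema_parts(property_id: str) -> tuple[str, str]:
    """Return (domain ID, type ID) for a /domain/type/property ID."""
    keys = property_id.split("/")
    # A well-formed ID splits into ["", domain, type, property].
    if len(keys) != 4 or keys[0] != "" or not all(keys[1:]):
        raise ValueError(f"expected /domain/type/property, got {property_id!r}")
    type_id = parent_id(property_id)
    return parent_id(type_id), type_id


print(schema_parts("/business/company/industry"))
# ('/business', '/business/company')
```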

Namespaces, Keys, and Topic IDs

The file directory-like hierarchy of domain, type, and property IDs is just one application of a more general concept: namespaces and keys. A namespace is like a file directory, and a key is like a file name. Just as all file names within a particular file directory must be unique among themselves, all keys within a particular namespace must also be unique among themselves.

As a more specific example, /business is the namespace corresponding to the Business domain. Within it, Business-related types are given keys (e.g., company) that are unique among themselves. Each type's ID is formed by appending its key to the namespace's ID (e.g., /business/company).
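
The namespace/key idea can be modeled as a tree of dictionaries. The sketch below reflects our own assumptions, not Freebase's implementation: Namespace and resolve are hypothetical names, each namespace rejects duplicate keys, and an ID is resolved by following its keys one at a time from a root namespace.

```python
class Namespace:
    """A directory-like mapping from keys to children; keys must be unique."""

    def __init__(self) -> None:
        self.children: dict[str, object] = {}

    def add(self, key: str, child: object) -> None:
        # Like file names in one directory, keys in one namespace must be unique.
        if key in self.children:
            raise KeyError(f"key {key!r} already exists in this namespace")
        self.children[key] = child


def resolve(root: Namespace, id_: str) -> object:
    """Follow an ID such as /business/company one key at a time from the root."""
    node: object = root
    for key in id_.strip("/").split("/"):
        if not isinstance(node, Namespace) or key not in node.children:
            raise LookupError(f"cannot resolve {id_!r}")
        node = node.children[key]
    return node


root, business, company, en = Namespace(), Namespace(), Namespace(), Namespace()
root.add("business", business)
business.add("company", company)  # the Company type acts as a namespace too
company.add("industry", "Industry property")
root.add("en", en)
en.add("bob_dylan", "Bob Dylan topic")

print(resolve(root, "/business/company/industry"))  # Industry property
print(resolve(root, "/en/bob_dylan"))                # Bob Dylan topic
```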

There are several kinds of namespaces besides those that correspond to domains and types. The most important and frequently encountered one is the /en namespace. This is the English namespace, in which well-known topics can be given unique keys that form human-readable English IDs. For example, the prolific Bob Dylan is so well known that his topic in Freebase is given the key bob_dylan in the /en namespace, and so the topic's ID is /en/bob_dylan. This ID lets you reach his topic on Freebase.com with the simple URL

http://www.freebase.com/view/en/bob_dylan

We will continue this discussion in depth in the Namespaces and Keys section.

Topic GUIDs

While a topic may or may not be identifiable by a namespace/key ID, it can always be identified by its GUID (Globally Unique Identifier), written as 32 hexadecimal digits following a #. For example, the GUID of the movie Gone with the Wind is #9202a8c04000641f800000000081e23b. Each topic has exactly one GUID, and each GUID maps to exactly one topic.

So that GUIDs and namespace/key IDs can be used in a uniform manner, Freebase APIs can understand the virtual namespace /guid. That is, if you were to ask Freebase for the topic with the ID /guid/9202a8c04000641f800000000081e23b, then Freebase understands that you want the topic with the GUID #9202a8c04000641f800000000081e23b. For example, if you were to navigate your browser to

http://www.freebase.com/view/guid/9202a8c04000641f800000000081e23b

then Freebase would recognize the virtual /guid namespace, identify the topic with that GUID, try to find an /en key for the topic, and redirect you to a more human-readable URL

http://www.freebase.com/view/en/gone_with_the_wind_1939

Thus, for all practical purposes, you can think of /guid/9202a8c04000641f800000000081e23b as an ID composed of the key 9202a8c04000641f800000000081e23b in the namespace /guid.
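
Since a GUID and its /guid ID carry the same 32 hexadecimal digits, converting between the two notations is mechanical. Below is a minimal Python sketch; guid_to_id and id_to_guid are hypothetical helper names, and the only validation assumed is the 32-hex-digit format described above.

```python
import re
from typing import Optional

# 32 hexadecimal digits, per the GUID format described above.
HEX32 = re.compile(r"[0-9a-fA-F]{32}")
GUID_PREFIX = "/guid/"


def guid_to_id(guid: str) -> str:
    """Turn '#<32 hex digits>' into the equivalent '/guid/<32 hex digits>' ID."""
    if not guid.startswith("#") or not HEX32.fullmatch(guid[1:]):
        raise ValueError(f"not a GUID: {guid!r}")
    return GUID_PREFIX + guid[1:]


def id_to_guid(id_: str) -> Optional[str]:
    """Inverse of guid_to_id; returns None for other IDs such as /en/bob_dylan."""
    if id_.startswith(GUID_PREFIX):
        digits = id_[len(GUID_PREFIX):]
        if HEX32.fullmatch(digits):
            return "#" + digits
    return None
```

Applied to the Gone with the Wind GUID above, guid_to_id yields the /guid/... ID used in the URL, and id_to_guid recovers the original GUID. This only converts notation; finding the /en key for a topic still requires asking Freebase.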

More on Properties

The last basic concept to discuss involves a major difference between Freebase properties and their analogue in relational databases, namely table columns. For each row, a relational table column can hold only one value. For example, consider a typical "book" relational table with a column named "author". For each row in the "book" table, the "author" column can hold only one foreign key into an "author" table. If a book happens to have several authors, this simple schema design does not work, and we would have to add a new table to model authorships. That is, we would need one "book" table, one "author" table, and one "authorship" table to store the n-to-n relationships between books and authors. Moreover, the way you retrieve data changes quite radically when you switch from one schema design to the other.
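
The three-table design can be sketched with Python's built-in sqlite3 module. The titles and names are hypothetical placeholders; the point is that reading a book's authors requires joining through the authorship table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE author (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
-- The junction table stores the n-to-n book/author relationship.
CREATE TABLE authorship (
    book_id   INTEGER REFERENCES book(id),
    author_id INTEGER REFERENCES author(id),
    PRIMARY KEY (book_id, author_id)
);
INSERT INTO book   VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO author VALUES (1, 'Author X'), (2, 'Author Y');
-- Book A has two authors; Book B has one.
INSERT INTO authorship VALUES (1, 1), (1, 2), (2, 2);
""")


def authors_of(title: str) -> list[str]:
    """Fetch a book's authors by joining book -> authorship -> author."""
    rows = conn.execute(
        """
        SELECT author.name
        FROM book
        JOIN authorship ON authorship.book_id = book.id
        JOIN author     ON author.id = authorship.author_id
        WHERE book.title = ?
        ORDER BY author.name
        """,
        (title,),
    )
    return [name for (name,) in rows]


print(authors_of("Book A"))  # ['Author X', 'Author Y']
```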

In contrast with conventional database technologies, Freebase considers multi-value properties so desirable for modeling real-life data that it supports them by default. That is, when the /book/written_work/author property was created, it was assumed to allow multiple authors per book, and you can query a multi-value property and a single-value property in exactly the same way. There is no need to think about whether you must join with a third table that models the n-to-n relationship.
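
For contrast, here is a toy Python model of the multi-value-by-default approach, assuming a simple in-memory store keyed by (topic ID, property ID) in which every property holds a list of values. IDs under /hypothetical/ are made up; /book/written_work/author is the property named above. The same query call serves a book with two authors and a book with one.

```python
AUTHOR = "/book/written_work/author"

# (topic ID, property ID) -> list of values; every property is multi-value.
store: dict[tuple[str, str], list[str]] = {}


def add(topic_id: str, property_id: str, value: str) -> None:
    store.setdefault((topic_id, property_id), []).append(value)


def query(topic_id: str, property_id: str) -> list[str]:
    # Always returns a list, whether the property holds zero, one, or many values.
    return list(store.get((topic_id, property_id), []))


add("/hypothetical/book_a", AUTHOR, "/hypothetical/author_x")
add("/hypothetical/book_a", AUTHOR, "/hypothetical/author_y")
add("/hypothetical/book_b", AUTHOR, "/hypothetical/author_y")

print(query("/hypothetical/book_a", AUTHOR))  # ['/hypothetical/author_x', '/hypothetical/author_y']
print(query("/hypothetical/book_b", AUTHOR))  # ['/hypothetical/author_y']
```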

Summary

  • A type is a conceptual container of related properties commonly needed to describe a certain aspect of a topic.
  • A topic can be assigned one or more types (the default type being /common/topic).
  • Just as properties are grouped into types, types are grouped into domains.
  • Domains, types, and properties are given IDs in a namespace/key hierarchy.
  • Well-known topics are given human-readable English IDs in the /en namespace.
  • Topics are uniquely identified within Freebase by GUIDs.
  • Properties are multi-value by default, and multi-value properties and single-value properties can be queried in the same way.