This chapter explains, and demonstrates with examples, how to deliver MQL write queries to Metaweb's mqlwrite service. As a necessary prerequisite, it shows how to log in to Metaweb to obtain authentication credentials. The chapter also demonstrates how to use the upload service to upload content, such as images and HTML documents, to the Metaweb content store.
The examples in this chapter are written in Python. Some are extensions to the metaweb.py module of Example 4.15, adding support for the various write services. Other examples are command-line scripts that use metaweb.py to create Metaweb objects and upload content.
No registration or authentication is required to use the mqlread service, but writing to Metaweb requires that you log in first. Before we can cover mqlwrite, therefore, we must explain the login service.
Since Metaweb services are HTTP-based, Metaweb authentication is cookie-based. You log in to the sandbox server by making an HTTP request (either a GET or a POST) to the URL https://sandbox.freebase.com/api/account/login, passing your username and password as URL-encoded form parameters. If login is successful, Metaweb returns one or more HTTP cookies in the response headers. These cookies contain your authentication credentials, and you must pass these back to Metaweb in the HTTP request headers of all subsequent write and upload requests. To log into the main freebase.com server instead of the sandbox, use api.freebase.com instead of sandbox.freebase.com.
The names and values of the authentication cookies are an implementation detail rather than a specification detail, and are subject to change. To ensure success, your code should accept all cookies returned by the login service, and must present all of them to subsequent calls to the mqlwrite and upload services.
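The login request described above can be sketched with nothing but the standard library. This is a minimal illustration written for Python 3 (the chapter's own examples use Python 2), not the metaweb.py implementation: the helper names build_login_request and make_cookie_opener are hypothetical, and the request is only constructed, never sent.

```python
import http.cookiejar
import urllib.parse
import urllib.request

LOGIN_PATH = "/api/account/login"  # login path given in the text above


def build_login_request(host, username, password):
    """Return an unsent POST request for the login service (hypothetical helper)."""
    url = "https://%s%s" % (host, LOGIN_PATH)
    # Username and password travel as URL-encoded form parameters.
    body = urllib.parse.urlencode(
        {"username": username, "password": password}).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # Supplying a body makes urllib issue a POST rather than a GET.
    return urllib.request.Request(url, data=body, headers=headers)


def make_cookie_opener():
    """Return (opener, jar); the opener keeps and resends every cookie it sees."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


request = build_login_request("sandbox.freebase.com", "username", "secret")
opener, jar = make_cookie_opener()
# opener.open(request) would perform the login. It is not called here
# because it needs a live server and real credentials.
```

Because the opener accepts whatever cookies the server returns and presents all of them on later requests, the code never needs to know the cookie names.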
If you write your applications using a suitably high-level HTTP library (or run them in a browser), cookie handling may be performed automatically for you. In this case, you may want to add the parameter rememberme=1 to the login request; doing this will cause Metaweb to return persistent cookies rather than session cookies. In the metaweb.py module shown in Example 4.15, the Session class maintains a "cookie jar" for storing cookies, and the _http() utility method adds cookies to each HTTP request and retrieves new cookie values from each HTTP response. We can therefore add support for write services to the module without doing any explicit cookie management.
The cookies returned by the login service have a long lifetime, so it is possible for a script to log in once and then save the contents of its cookie jar to a file. On subsequent runs, it can load the cookie jar from the file rather than logging in again. The examples in this chapter, however, simply call the login service each time the script is executed.
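Saving and reloading a cookie jar might look like the following sketch, which uses Python 3's http.cookiejar.LWPCookieJar. It makes several assumptions: the file name is arbitrary, the helper names are hypothetical, and the cookie is built by hand with an invented name and value so that the example runs without contacting the login service.

```python
import http.cookiejar
import os
import tempfile


def save_cookies(jar, path):
    # ignore_discard=True also writes session cookies, which the jar
    # would otherwise leave out of the file.
    jar.save(path, ignore_discard=True)


def load_cookies(path):
    jar = http.cookiejar.LWPCookieJar()
    jar.load(path, ignore_discard=True)
    return jar


def make_cookie(name, value, domain):
    """Build a session cookie by hand, standing in for a login response."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


path = os.path.join(tempfile.mkdtemp(), "metaweb-cookies.txt")

# First run: pretend we logged in, then save the jar.
# The cookie name and value are invented for illustration.
first_run = http.cookiejar.LWPCookieJar()
first_run.set_cookie(make_cookie("example_credential", "abc123",
                                 "sandbox.freebase.com"))
save_cookies(first_run, path)

# Later run: restore the jar instead of logging in again.
second_run = load_cookies(path)
restored = [(c.name, c.value) for c in second_run]
```

A saved cookie file holds live credentials, so it deserves the same care as a password file.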
Example 6.1 shows the Python code for a Metaweb login() method. This method is written to be part of the metaweb.Session class of Example 4.15, and it depends on constants and utility methods defined in that example. If login fails, the login() method raises a metaweb.ServiceError exception. Otherwise it stores authentication credentials in the cookie jar of the Session object and returns silently.
Example 6.1. metaweb.py: the login service
def login(self, username, password):
    """
    Submit the username and password to the Metaweb login service.
    This causes Metaweb to return an authentication cookie which will
    be passed back to the server with all subsequent requests.
    Returns nothing, but raises ServiceError on failure.
    """
    # This is the URL that we POST our login request to
    url = "https://%s%s" % (self.host, LOGIN)
    # This header specifies how the request body is encoded
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    # This is the body of the request
    body = urllib.urlencode({'username':username, 'password':password})
    # POST the request body to the url. Store authentication cookies
    # returned with the response. Parse the response to test for success
    # and raise an error if login failed.
    self._check(self._fetch(url, headers, body))
The fact that this login() method can use the same _fetch() and _check() utility methods as the read services of Chapter 4 indicates that the Metaweb login service behaves very much like a read service. The HTTP response body is a JSON object that includes a code property. If the value of this property is "/api/status/ok", then the login was successful, and the Set-Cookie headers of the response contain the authentication credentials. Otherwise, the login failed, and the messages property contains an error message and other details.
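The success test on the response envelope can be sketched as follows. This is a Python 3 illustration rather than the _check() method of Example 4.15: ServiceError here is a local stand-in for metaweb.ServiceError, check_envelope is a hypothetical name, and the failure code and message in the sample are invented.

```python
import json


class ServiceError(Exception):
    """Stand-in for metaweb.ServiceError, which is defined in Example 4.15."""


def check_envelope(text):
    """Parse a JSON response body; return the envelope if it reports success."""
    envelope = json.loads(text)
    if envelope.get("code") != "/api/status/ok":
        # Pass the messages array along so the caller can see the details.
        raise ServiceError(envelope.get("messages"))
    return envelope


ok = check_envelope('{"code": "/api/status/ok"}')

# The failure code and message below are invented for illustration.
try:
    check_envelope('{"code": "/api/status/error", '
                   '"messages": [{"message": "login failed"}]}')
    failed = False
except ServiceError as e:
    failed = True
    details = e.args[0]
```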
Chapter 5 explained, in great detail, how to express write queries in MQL. Now we explain how to send those queries to Metaweb and retrieve the result. The Metaweb write service is mqlwrite. The path to this service is /api/service/mqlwrite, and using it is much like using mqlread. Follow these steps to perform a write:
- Place your write query into a request envelope object, as the value of the query property.
- Add any necessary envelope parameters to the envelope object.
- Serialize the envelope object to a JSON string, and URL encode the string.
- Make an HTTP POST request to /api/service/mqlwrite on the Metaweb server, configured as follows:
  - The body of the request should be the string query= followed by the URL-encoded and JSON serialized query envelope.
  - The request must have a Cookie header that contains the authentication cookies returned by the login service.
  - For security reasons, the request must also include an X-Metaweb-Request header. The value doesn't matter; this header must simply be present.
- When the response arrives, parse the Set-Cookie header to extract the mwLastWriteTime cookie and save it for use with subsequent read requests.
- The body of the response will be a JSON string. Parse this to obtain the response envelope.
- Check the code property of the response envelope. If it is "/api/status/ok", then the write query succeeded and the result property of the response envelope contains the query results. If the code property has any other value, then the write failed, and the messages property is an array of message objects that contain details.
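The request-building steps above can be sketched as a single function. This is a Python 3 illustration under stated assumptions, not the module's write() method: build_write_request is a hypothetical name, the request is constructed but never sent, and the Cookie header is left to a cookie-aware opener (such as one built with urllib.request.HTTPCookieProcessor) to add at send time.

```python
import json
import urllib.parse
import urllib.request

WRITE_PATH = "/api/service/mqlwrite"  # service path given in the text


def build_write_request(host, query, **envelope_params):
    """Return an unsent POST request for mqlwrite (hypothetical helper)."""
    # Steps 1 and 2: put the query and any parameters in an envelope.
    envelope = {"query": query}
    envelope.update(envelope_params)
    # Step 3: serialize to JSON, then URL-encode as the query= form field.
    body = urllib.parse.urlencode({"query": json.dumps(envelope)})
    # Step 4: POST with the required custom header; its value is arbitrary.
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "X-Metaweb-Request": "True",
    }
    return urllib.request.Request(
        "https://%s%s" % (host, WRITE_PATH),
        data=body.encode("ascii"), headers=headers)


request = build_write_request(
    "sandbox.freebase.com",
    {"create": "unless_exists", "name": "my test object #1"})

# Decode the body again to confirm that the envelope round-trips.
decoded = urllib.parse.parse_qs(request.data.decode("ascii"))
envelope = json.loads(decoded["query"][0])
```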
In addition to special cookie and HTTP header requirements, there are two important differences between mqlwrite and mqlread. First, mqlwrite responds only to POST requests, not GET requests. Second, it does not support a callback URL parameter. These differences mean that it is impossible to invoke the mqlwrite service from client-side JavaScript code using a <script> tag.
The sub-sections that follow demonstrate the use of mqlwrite with Python code, explain how to submit multiple named queries in a single mqlwrite invocation, and provide details about mqlwrite envelope parameters, the X-Metaweb-Request header, and the mwLastWriteTime cookie.
Example 6.2 is a write() method intended to be inserted into the metaweb.Session class of Example 4.15. It uses a number of the utility methods defined in that previous example, and the fact that these utilities can be shared between mqlread and mqlwrite code demonstrates how similar these two services are. In particular, recall that the _fetch utility method invokes the _http method, which does automatic cookie management. This means that our code does not have to explicitly handle authentication credentials or the mwLastWriteTime cookie. Also, remember that _http automatically does an HTTP POST when a body is supplied for the request, _fetch does JSON parsing of the HTTP result, and _check raises a ServiceError if the code property of the response is not /api/status/ok.
Example 6.2. metaweb.py: invoking the mqlwrite service
def write(self, q, **options):
    """
    Submit the MQL write q and return the result as a Python object.
    Options specify envelope parameters. Authentication credentials,
    from a previous call to login() must exist in the cookie jar.
    Raises ServiceError if the query fails.
    """
    # We're requesting this URL
    url = 'https://%s%s' % (self.host, WRITE)
    # These headers identify how the query is encoded in the request body
    # and guard against XSS attacks.
    headers = {
        'Content-Type': 'application/x-www-form-urlencoded',
        'X-Metaweb-Request': 'True'
    }
    # Gather options that apply to this query
    opts = self._getopts("use_permission_of", **options)
    # Build the query envelope, adding options to it
    envelope = {'query': q}
    envelope.update(opts)
    # JSON encode the envelope
    encoded = self._dumpjson(envelope)
    # Use the encoded envelope as the value of the query parameter in
    # the body of the request.
    body = urllib.urlencode({'query':encoded})
    # Now do the POST and parse and check the response
    response = self._check(self._fetch(url, headers, body))
    # Return the result from the response envelope
    return response['result']
Example 6.1 and Example 6.2 are helpful methods, but they are just utilities, not real-world examples of how you might write to Metaweb. For the sake of example, let's suppose that you are a coin collector and you think that freebase.com should include information about each of the coins issued by the US Mint under its 50 State Quarters program. First, visit the US Mint's website to find out when each state quarter was released, how many were minted for each state, and when each state became a state.
Next, create a Metaweb type to model this data. Log in to sandbox.freebase.com and create a new type named "US State Quarter" in your default domain. For our example, we'll use the type id /user/docs/default_domain/us_state_quarter.
Next, give your type four properties:
- A property named "State", of /location/us_state, to specify the state with which the quarter is associated.
- A property named "Release", of /type/datetime, to specify the date on which the quarter was released into circulation.
- A property named "Statehood", of /type/datetime, to specify when the state gained statehood.
- A property named "Mintage", of /type/int, to specify how many quarters were minted.
Optionally, make each of these properties unique by clicking the "Restrict to one value" checkbox.
Next, you need to get your data into manageable form. Extract data from the US Mint site, and arrange it in a plain text file named quarters.txt that looks like the following:
Delaware,1999-01-04,1787-12-07,774824000
Pennsylvania,1999-03-08,1787-12-12,707332000
New Jersey,1999-05-17,1787-12-18,662228000
Georgia,1999-07-19,1788-01-02,939932000
Connecticut,1999-10-12,1788-01-09,1346624000
Massachusetts,2000-01-03,1788-02-06,1163784000
Each line in this file is the data for a single quarter. Fields are separated by commas. The first field is the name of the state. The second and third fields are the release date and statehood date for that state. And the fourth field is the mintage for that state quarter.
With our type created, and the data in this format, we can now write a simple script to upload the data to sandbox.freebase.com. Example 6.3 shows how to use the metaweb.py module to do this. Note that you need to insert your own Freebase username and password into the script to make it work for you.
Example 6.3. quarters.py: writing a data set to Metaweb
import metaweb                 # Use the metaweb module

USERNAME = 'username'          # Put your Freebase username and password here
PASSWORD = 'secret'

# The ID for our US State Quarter type depends on our username
TYPEID = '/user/' + USERNAME + '/default_domain/us_state_quarter'

# Create a metaweb.Session object to interact with the sandbox server
sandbox = metaweb.Session("sandbox.freebase.com")

# Make sure we can log in before we go any further
sandbox.login(USERNAME, PASSWORD)

# We will be creating multiple quarters in a single MQL query.
# We start with an empty array and add MQL writes to it in the loop below
query = []

f = open("quarters.txt", "r")  # Open our file of quarter data
for line in f:                 # Loop through lines of the file
    # Break each line into fields
    fields = line.strip().split(',')
    # This query creates a single quarter
    q = {'create':'unless_exists',              # Create a new object
         'id':None,                             # And return its id
         'type':["/common/topic", TYPEID],      # Make it a topic and a quarter
         'name': fields[0] + ' State Quarter',  # The object's name
         'state': {'connect':'update',          # Connect to...
                   'type':'/location/us_state', # a US State object...
                   'name':fields[0]},           # with this name.
         'release': fields[1],                  # Release date
         'statehood': fields[2],                # Statehood date
         'mintage': int(fields[3])}             # How many minted
    # Add this write to the array of writes
    query.append(q)
f.close()                      # Close the data file

# Now send our one big query to Metaweb and get the result
result = sandbox.write(query)

# Display the id of the Metaweb object for each state
for r in result:
    print "%s %s: %s" % (r['create'], r['state']['name'], r['id'])
As we saw in Chapter 5, the MQL write grammar allows multiple independent writes to be specified in a single query as elements of a JSON array, and we took advantage of this fact in Example 6.3 to create all of our US State Quarter objects in a single invocation of mqlwrite.
Like mqlread, the mqlwrite service also allows multiple queries to be submitted using the queries parameter instead of the query parameter. To use the queries parameter, place one or more regular query envelope objects inside a new "outer envelope" object. The property names used in the outer envelope are arbitrary and are re-used in the response envelope. This is exactly the same for mqlwrite as it is for mqlread.
Suppose we want to create two objects in a single call to mqlwrite. Here are two envelopes that can accomplish that:
2 Writes in 1 Query:

{
  "query":[{
    "create":"unless_exists",
    "name":"my test object #1"
  },{
    "create":"unless_exists",
    "name":"my test object #2"
  }]
}

2 Queries in 1 Envelope:

{
  "q1":{
    "query":{
      "create":"unless_exists",
      "name":"my test object #3"
    }
  },
  "q2":{
    "query":{
      "create":"unless_exists",
      "name":"my test object #4"
    }
  }
}
When you include multiple writes in a single query, the writes are executed atomically: they all succeed or they all fail. As a result, they are not allowed to depend on each other, and there is no way to tell what order they are executed in.
If you submit multiple queries using the queries parameter, they are not atomic. Each one succeeds or fails on its own: the outer response envelope includes a separate response envelope for each query, and each of these inner response envelopes has its own code property to indicate the success or failure of that query. Note, however, that queries are unordered, and there is no guarantee that they will be executed in the order in which they are written. Since execution order cannot be predicted, queries passed in the same invocation of mqlwrite should not depend on each other.
Note that the read() method of Example 4.15 accepts multiple read queries and uses the queries parameter to mqlread. The write() method of Example 6.2 allows only a single write query and uses the query parameter instead. It would not be difficult, however, to modify the write() method to use the queries parameter.
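One way to approach that modification is sketched below in Python 3. It is an illustration, not part of metaweb.py: build_queries_body and split_results are hypothetical helpers, and the sample response is hand-written in the shape the text describes, with an invented failure code and message.

```python
import json
import urllib.parse


def build_queries_body(named_queries):
    """Wrap each query in its own envelope and encode the outer envelope."""
    outer = {name: {"query": q} for name, q in named_queries.items()}
    # The outer envelope is sent as the queries= form field.
    return urllib.parse.urlencode({"queries": json.dumps(outer)})


def split_results(outer_response, names):
    """Separate results from failures by checking each inner envelope's code."""
    results, failures = {}, {}
    for name in names:
        inner = outer_response[name]
        if inner.get("code") == "/api/status/ok":
            results[name] = inner.get("result")
        else:
            failures[name] = inner.get("messages")
    return results, failures


body = build_queries_body({
    "q1": {"create": "unless_exists", "name": "my test object #3"},
    "q2": {"create": "unless_exists", "name": "my test object #4"},
})

# Hand-written sample response; the failure code and message are invented.
sample_response = {
    "q1": {"code": "/api/status/ok",
           "result": {"create": "created", "name": "my test object #3"}},
    "q2": {"code": "/api/status/error",
           "messages": [{"message": "illustrative failure"}]},
}
results, failures = split_results(sample_response, ["q1", "q2"])
```

Because the queries are not atomic, a caller has to inspect both dictionaries: one query can fail while its siblings succeed.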
Each query envelope passed to mqlwrite includes a property named query to specify the MQL query to be executed. Any other properties within the envelope object are known as envelope parameters and provide additional input to mqlwrite. At the time of this writing [24], the only supported envelope parameter for mqlwrite is use_permission_of.
By default, new objects created with MQL write queries are given a permission property of /boot/all_permission, which makes them modifiable by any Metaweb user. Once an object has been created, it is not possible to alter its permission property, so if you want to create an object with restricted write permissions, you must do so when the object is created.
The use_permission_of envelope parameter does exactly this. The value of this parameter should be an object id. The permission of the specified object is reused and becomes the permission of any objects created by the query. Note that this parameter is not named use_permission and its value should not be the id of a /type/permission object. Instead, specify the id of an object (typically a user, domain or type) whose permission you want to copy.
As an example, recall from Chapter 5 that we created a namespace /user/docs/music/notes to hold names for instances of our /user/docs/music/note type. We created that namespace with a MQL write query, and submitted the query using the query editor on the sandbox server. The query editor didn't set the use_permission_of parameter, and we ended up with a globally writeable namespace. It is more likely that we would want that namespace to be restricted to the same set of users that have permission to modify the /user/docs/music domain, so we should have submitted the query using this envelope:
{
  "query": {
    "id":"/user/docs/music",
    "/type/namespace/keys": {
      "value":"notes",
      "namespace": {
        "create":"unless_connected",
        "type":"/type/namespace",
        "unique":false
      }
    }
  },
  "use_permission_of":"/user/docs/music"
}
The metaweb.py module uses Python's named arguments for specifying envelope parameters, and we can execute the write above with code like this:
import metaweb

username = "username"
password = "secret"
domain = "/user/" + username + "/music"

sandbox = metaweb.Session("sandbox.freebase.com", use_permission_of=domain)
sandbox.login(username, password)
sandbox.write({"id":domain,
               "/type/namespace/keys": {
                   "value":"notes",
                   "namespace": { "create":"unless_connected",
                                  "type":"/type/namespace",
                                  "unique":False }}})
The Metaweb mqlwrite service (and also the upload service documented later in this chapter) requires a custom HTTP request header, named X-Metaweb-Request, to be present in all requests. The value of the header is ignored, but if the header is not present in the request, the request will not be processed.
The requirement that the custom header be present is a security measure to prevent cross-site scripting (XSS) attacks. As a practical matter, it means that the mqlwrite and upload services cannot be invoked via HTML form submission, since there is no way to tell a web browser to add a custom header like this when POSTing a form.
For efficiency, Metaweb servers cache query results. Suppose you issue a read query and get a result. Then, another Metaweb user performs a write that alters the data you queried. If you re-issue your query, you are likely to get the same results (now cached) that you got the last time. After some time, the cached result will time out, and you'll see the new data written by the other user, but this will not happen right away.
If you're mixing reads and writes yourself, however, you always want your read queries to return the data you've just written. Metaweb uses a cookie-based scheme to ensure that this happens. Any time you do a write with mqlwrite (or upload), the service returns a mwLastWriteTime cookie. Your web browser (or client code) then presents this cookie when it makes any subsequent read requests. This tells the mqlread service that it may not return any cached result older than the most recent write you've done. Your results may not arrive as quickly, but they will be consistent with the writes you've performed.
If you do your reads and writes using the freebase.com client, this cache management is invisible to you because your web browser handles cookies automatically. Similarly, the metaweb.py module developed in this chapter and Chapter 4 manages cookies automatically in the _http utility method, and the results returned by the read() method will be consistent with any previous writes made with the write() method.
If you develop your own library for working with mqlread and mqlwrite, you'll need to be aware of this caching issue and handle this mwLastWriteTime cookie appropriately.
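For a library that does not use a cookie jar, the handling could be sketched as follows in Python 3. The function names are hypothetical and the cookie value is invented; the point is only that the value is copied from the write response to later read requests without being interpreted.

```python
from http.cookies import SimpleCookie


def extract_last_write_time(set_cookie_header):
    """Return the mwLastWriteTime value from a Set-Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    morsel = cookie.get("mwLastWriteTime")
    # coded_value is the value exactly as it appeared in the header.
    return morsel.coded_value if morsel is not None else None


def read_headers(last_write_time):
    """Build the extra request headers to send with a subsequent read."""
    if last_write_time is None:
        return {}
    return {"Cookie": "mwLastWriteTime=" + last_write_time}


# The cookie value below is invented; treat real values as opaque strings.
header = "mwLastWriteTime=opaque-value-123; Path=/"
value = extract_last_write_time(header)
headers = read_headers(value)
```

A real client would merge this header with its authentication cookies rather than replace them.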
Sometimes automatic cookie management is not enough to get proper caching behavior. Suppose you run one script to perform some writes, and then run another script to do some reads. Unless you saved the contents of the cookie jar after doing the writes in the first script, and initialized the cookie jar of the second script from the saved state before doing the reads, you may get cached results that do not include the newly written data. Even trickier is the case where you do writes in the client (adding a property to a type, for example) and then do reads in a script. Unless you can make your script share the cookies of your web browser, you're likely to run into trouble. Another possible source of difficulty is when you perform reads in your web browser to verify the success of writes you've done in a script. If you check the state of an object with your web browser, and then run a script to modify that object, and then check the object again in your browser, you may not see the modification you've just made.
To solve these problems you need a way to obtain a current mwLastWriteTime cookie. The touch service does just that. If you make an HTTP GET request to /api/service/touch, the Metaweb server will send you a fresh mwLastWriteTime cookie that allows any subsequent reads to see the current state of the database. At the time of this writing [25], you can do this in the freebase.com client by typing F8 to open the "Dev Tools" box at the bottom of the web page. Once you've done that, scroll to the bottom and click the "Refresh cache" link that has appeared. You can then type F8 again to remove the Dev Tools.
Example 6.4 shows how to invoke the touch service in Python. It defines a simple touch() method to be added to the metaweb.Session class of Example 4.15:
Example 6.4. metaweb.py: invoking the touch service
def touch(self):
    """
    Defeat caching by requesting a current mwLastWriteTime cookie.
    This method returns nothing.
    """
    self._fetch("https://" + self.host + TOUCH)
Despite its name, the value of the mwLastWriteTime cookie is not just a timestamp: it is an opaque value that contains cache information other than just the last write time. For this reason, it is not possible to simply create your own value for the cookie – you have to obtain a valid value from mqlwrite, upload, or touch.
The unique feature of Metaweb is the way that it stores relationships between objects. But, like any database, it can also store large chunks of data, such as long HTML documents or binary image files. In Chapter 4 we learned about the trans service for retrieving content. Here, we'll learn how to upload content to be stored in Metaweb. The service for uploading content is named upload, and has the URL path /api/service/upload. The upload service responds only to HTTP POST requests. The data to be uploaded is passed in the body of the request. The MIME type of the content, as well as the encoding of textual content, is specified in the HTTP Content-Type request header.
Example 6.5 shows an upload() method for our metaweb.py module. It expects two arguments: a string of content (Python strings can be binary data or textual data) and the MIME type for the content. upload() creates a new /type/content object to hold your content and returns the id (in the /guid pseudo-namespace) of that object to you. This guid can be used to retrieve the content with the /api/trans/raw service (see Chapter 4).
Example 6.5. metaweb.py: uploading content to Metaweb
def upload(self, content, type):
    """
    Upload the specified content (and give it the specified type).
    Return the guid of the /type/content object that represents it.
    The returned guid can be used to retrieve the content with
    /api/trans/raw.
    """
    # This is the URL we POST content to
    url = 'https://%s%s'%(self.host, UPLOAD)
    # These are the HTTP headers
    headers = {
        'Content-Type': type,
        'X-Metaweb-Request': 'True'
    }
    # POST the request, parse the response, check for success
    # This will raise an exception on failure
    response = self._check(self._fetch(url, headers, content))
    # Return the id of the uploaded content
    return response['result']['id']
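For textual content, the encoding can be stated alongside the MIME type in the Content-Type header. The Python 3 sketch below shows one way to keep the declared charset and the actual bytes in agreement. The helper name build_upload_request is hypothetical, the charset parameter syntax is the standard HTTP form and is assumed (not confirmed by this chapter) to be what the upload service expects, and the request is constructed but not sent.

```python
import urllib.request

UPLOAD_PATH = "/api/service/upload"  # service path given in the text


def build_upload_request(host, text, mime_type="text/html", charset="utf-8"):
    """Return an unsent POST request that uploads text (hypothetical helper)."""
    # Encode the text with the same charset that the header declares.
    body = text.encode(charset)
    headers = {
        "Content-Type": "%s; charset=%s" % (mime_type, charset),
        "X-Metaweb-Request": "True",
    }
    return urllib.request.Request(
        "https://%s%s" % (host, UPLOAD_PATH), data=body, headers=headers)


document = "<html><title>Caf\u00e9</title></html>"
request = build_upload_request("sandbox.freebase.com", document)
```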
To demonstrate the upload() method of Example 6.5, let's continue with our US State Quarter example. The US Mint has images of each of the state quarters on its website. Let's upload those images to sandbox.freebase.com, and make them visible through the /common/topic/image property of each quarter object. Example 6.6 shows how to do this.
In order to understand this code, you have to know that the upload service handles images specially. When you upload images, the /type/content object is also given the type /common/image with various properties filled out. Because your uploaded /type/content is co-typed as /common/image, you can link directly to image content from /common/topic/image.
Example 6.6. quarterpix.py: uploading images to Metaweb
import metaweb         # Our metaweb utilities
import urllib2         # For downloading images from the mint server

USERNAME = 'username'  # Put your Freebase username and password here
PASSWORD = 'secret'

# The ID for our US State Quarter type depends on our username
TYPEID = '/user/' + USERNAME + '/default_domain/us_state_quarter'

# Create a metaweb.Session object to interact with the sandbox server
sandbox = metaweb.Session("sandbox.freebase.com")

# Make sure we can log in before we go any further
sandbox.login(USERNAME, PASSWORD)

# All the image files are beneath this URL
imagedir = 'http://www.usmint.gov/images/mint_programs/50sq_program/states/'

# This dictionary maps state name to image name.
images = { 'Delaware': 'DE_winner.gif',
           'Pennsylvania': 'PA_winner.gif',
           'New Jersey': 'NJ_winner.gif',
           'Georgia': 'GA_winner.gif',
           'Connecticut': 'CT_winner.gif',
           'Massachusetts': 'MA_winner.gif'}

# Loop through the states
for state,filename in images.items():
    # First, download the image from the Mint's website
    image = urllib2.urlopen(imagedir + filename)
    type = image.info()['Content-Type']
    content = image.read()
    # Now upload it to Metaweb
    id = sandbox.upload(content, type)
    # Define a write query to link the quarter object to the uploaded image
    query = { 'type': TYPEID,
              'state':state,
              '/common/topic/image': { 'id':id, 'connect':'insert' }}
    # Submit the query and get the result
    result = sandbox.write(query)
    # Output the result
    print "%s: %s %s" %(state, result['/common/topic/image']['connect'], id)
Once you have run the code in Example 6.6, use the freebase.com client on the sandbox server to view your state quarter objects (refreshing the cache if necessary). You'll see that there are now images on the page.
It is fairly easy to upload an image and make it visible to users of freebase.com. It is a little trickier to do the same for textual content. When you upload a document, a /type/content object is created for that document. This allows the content to be retrieved with /api/trans/raw, but it doesn't allow it to be viewed in any natural way in the client. To accomplish that, you must create a /common/document object to reference the content, and (optionally) a /common/topic object to reference the document. Example 6.7 shows how you can do this.
Example 6.7. uploaddoc.py: uploading HTML documents to Metaweb
import sys, re, metaweb

# Read the content of the file specified on the command line
# It must be an HTML file with a <title>
filename = sys.argv[1]
try:
    f = open(filename)
    doc = f.read()
    f.close()
except Exception, e:
    sys.exit(e)

# Search through the document for a title
try: title = re.search("(?i)<title>(.*)</title>", doc).group(1)
except: sys.exit("Document has no title")

# Log in to the sandbox server.
USERNAME = 'username'  # Put your own username and password here
PASSWORD = 'secret'
sandbox = metaweb.Session("sandbox.freebase.com")
sandbox.login(USERNAME, PASSWORD)

# Upload the document content to Metaweb.
# Note that we hardcode the text/html content type.
content_id = sandbox.upload(doc, "text/html")

# Submit a MQL write query to create a /common/topic and /common/document
# for the uploaded content
result = sandbox.write({'create':'unless_exists',
                        'type':'/common/topic',
                        'id':None,
                        'name':title,
                        'article' : { 'create':'unless_exists',
                                      'type':'/common/document',
                                      'id':None,
                                      'content':content_id }})

# Tell the user what we did
print "Uploaded %s: %s\n\tcontent: %s\n\tdocument: %s %s\n\ttopic: %s %s" % (
    filename, title, content_id,
    result['article']['create'], result['article']['id'],
    result['create'], result['id'])
This Python program expects the name of an HTML file as a command-line argument. It reads the file and determines the document title by searching (using a regular expression) for a <title> tag. It uploads the document text with the upload() method of Example 6.5. Then it submits a MQL write query to create a /common/topic that refers to a /common/document that refers to the uploaded content. It uses the document title as the name of the /common/topic object.
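The title search in Example 6.7 only matches a title that sits on one line. A slightly more tolerant variant might look like the Python 3 sketch below, which is an illustration rather than a replacement for the example: extract_title is a hypothetical helper, and a regular expression remains a rough tool compared with a real HTML parser.

```python
import re

# Non-greedy match so that only the first title element is captured;
# DOTALL lets the title text span several lines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the document title with whitespace collapsed, or None."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    # Collapse line breaks and runs of spaces into single spaces.
    return " ".join(match.group(1).split())


title = extract_title("<html><TITLE>\n  State\n Quarters </TITLE></html>")
missing = extract_title("<p>no title here</p>")
```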