Developer Documentation

api/service/geosearch

The api/service/geosearch API can be used to query the geo index in PostGIS. It currently serves three styles of queries:

  1. retrieving a shape for a topic
  2. retrieving shapes in proximity to a topic
  3. counting types of topics in proximity to a topic

Table of Contents

  1. Definitions
  2. Formats Returned
  3. Limiting the Scope of Results
  4. Simplifying the Shapes Returned
  5. Specifying Locations
  6. Retrieving a Shape for a Topic
  7. Finding Topics in Proximity to a Topic or Location
  8. Sorting Results
  9. Counting Results
  10. Performance Considerations

1. Definitions

shape
A shape is any GeoJSON geometry specified at [1], including Point.
topic
A topic or a location is a graph topic with geolocation data such as a longitude/latitude geocode or a geometry blob containing a more complex shape.
topic type
A topic type is a type id, such as /location/citytown, that a topic may be typed with in the graph.

2. Formats Returned

By default, results are returned in geojson format [2]. Other supported formats are json (the typical Metaweb JSON format), kml (to work with Google Earth), kml/maps (to work with Google Maps), ids (to return only the Freebase ids of matches) and guids (to return only the Freebase guids of matches).

format
Use the format parameter to make a selection. Support for more formats, such as shape, may be added at a later time.

The geosearch API uses both a geo index and the graph to produce its results. The query is first sent to the geo index, which returns the guids of matches along with the geo data they were indexed with. These guids are then sent to the graph via a MQL query to extract more information about the matching topics, such as name, thumbnail image and id.

mql_output
This MQL query can be overridden with the mql_output parameter to add a filter constraint or to extract more data from the graph, for example.

By default, the MQL query used is:

[ {
    "guid": None,
    "id": None,
    "name": None,
    "type": [],
    "/common/topic/image": [ {
        "guid": None,
        "id": None,
        "type": "/common/image",
        "optional": True,
        "limit": 1,
        "index": None,
        "sort": "index",
        } ]
   } ]

Depending on the format selected, the MQL results are then inserted into the final query results as follows. When the format chosen is json or geojson, the MQL results are inserted via a properties dictionary.

When kml is selected, these MQL results are processed to extract the name, the Freebase URL and the Freebase thumbnail for the topic into the feature's description. The MQL results are also inserted into an <ExtendedData> element tree under each <Placemark> as specified at [6]. Any / characters in MQL key names are converted to - when used in XML element names.

The MQL query step is skipped when mql_output is "null" or when format is ids or guids.
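
The default query shown earlier in this section is written with Python literals (None, True). The following is a minimal sketch of preparing a custom mql_output value, assuming the service expects the query as JSON text. The extra property /location/location/containedby is only an illustration and is not part of the default query.

```python
import json

# Default-style query extended with one extra property.
# "/location/location/containedby" is an illustrative choice, not a default.
mql_output = [{
    "guid": None,
    "id": None,
    "name": None,
    "type": [],
    "/location/location/containedby": [],
}]

# json.dumps serializes Python's None as JSON null.
mql_output_json = json.dumps(mql_output, sort_keys=True)
print(mql_output_json)
```

The resulting string still has to be URL-encoded before it is placed in a query string.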

3. Limiting the Scope of Results

The scope of a geo query is something to pay close attention to as such a query can easily return the entire database.

limit
The optional limit parameter, defaulting to 20, can be used to retrieve only up to a given number of matches. To run an unlimited query, use limit=0.
start
The optional start argument can be specified to return results starting at a given position in the result set.
type
The optional type parameter may be used to return geo data only for topics of a given type. The index only contains information about the direct types of a topic. Included and otherwise inherited types are not indexed.
property
The optional property parameter may be used to return data only for topics which have a value for or are a value of the given property. This parameter accepts multiple comma-separated properties and a topic must match at least one of the properties to be considered a match.
geometry_type
The optional geometry_type parameter may be used to return geo data only for topics that have geometry data of the given geometry type name(s). The valid names are point, multipoint, linestring, multilinestring, polygon and multipolygon.
outer_bounds
The optional outer_bounds parameter may be used to restrict the matches to those within the given bounds. It takes the same style of values as the location parameter.
intersect
The optional intersect parameter may be used to restrict the matches to a set of topics. This parameter may specify a MQL query returning indexed topic guids or a JSON list of guids.
mql_filter
The optional mql_filter parameter may be used to filter the results of the PostGIS query according to some MQL criteria. When used in conjunction with start or limit, the service ensures that the requested number of results is returned, if that many exist. If an unlimited query is run, the MQL filter is applied to all PostGIS matches in the database.
as_of_time
The as_of_time parameter may be used together with mql_filter to further constrain the search. The syntax of this parameter is documented in the MQL documentation [8].
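
As a rough sketch of how limit and start combine to page through a large result set, the helper below builds one query string per page. The helper name page_queries is hypothetical, the relative geosearch? path follows the examples in this document, and a zero-based start position is an assumption that this document does not confirm.

```python
from urllib.parse import urlencode

def page_queries(params, page_size, pages):
    """Yield one query string per page, using start and limit."""
    for page in range(pages):
        # Assumption: start is a zero-based offset into the result set.
        paged = dict(params, start=page * page_size, limit=page_size)
        # Keep "/" readable in ids and type names.
        yield "geosearch?" + urlencode(paged, safe="/")

base = {
    "location": "/en/california",
    "type": "/dining/restaurant",
    "inside": "true",
}
queries = list(page_queries(base, page_size=20, pages=3))
for query in queries:
    print(query)
```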

4. Simplifying the Shapes Returned

accessor
The optional accessor parameter makes it possible to simplify the returned shape by extracting its bounding box, its convex hull or its shell. It accepts one of envelope, hull or shell, respectively. The shell of a shape is a shape that was enlarged by 0.1 degree before being simplified again with a tolerance of 0.1 degree.
simplify
The optional simplify parameter uses the PostGIS ST_Simplify() function to simplify complex shapes with the Douglas-Peucker algorithm [7]. It takes a floating point number, a so-called tolerance value, expressed in degrees, whose meaning is best understood from the visualizations at [7]. The right tolerance to use depends on the actual geographical size of the shape used in the query.

Douglas-Peucker shape simplification introduces a new trade-off between performance and accuracy. A hull, while less precise, is guaranteed to contain the entire original shape. A simplified shape, on the other hand, may have lost some area. Again, this can be visualized by reading [7].
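
To make the role of the tolerance value concrete, here is a plain Python sketch of the Douglas-Peucker algorithm applied to a single line. It is not the PostGIS implementation: it treats coordinates as planar values in degrees, and the function names are chosen for this example only.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def simplify(points, tolerance):
    """Keep the end points; recurse on the farthest point if it exceeds tolerance."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        # Every intermediate point is close enough to be dropped.
        return [first, last]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # the split point appears in both halves

line = [(0.0, 0.0), (1.0, 0.05), (2.0, -0.05), (3.0, 1.0), (4.0, 0.0)]
print(simplify(line, 0.1))  # small tolerance: only one near-collinear point is dropped
print(simplify(line, 2.0))  # large tolerance: only the end points remain
```

A larger tolerance removes more vertices, which is why the right value depends on the geographical size of the shape.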

5. Specifying Locations

All three styles of queries described above normally require that one or more locations be specified via the location or the mql_input parameters.

If neither location nor mql_input is specified, the queries are run against the whole world; the generated SQL contains no geo constraints and the only supported value for inside is true or 1.

location

The location parameter specifies a single location and accepts a variety of input formats:

  1. An id for a graph topic such as /en/california that was indexed because it had location data.
  2. A guid for a graph topic starting with # that was indexed because it had location data. (Note that # is not a valid URL character and must be encoded as %23).
  3. One or more terms such as Berkeley or San Francisco that are passed to the Lucene relevance server to retrieve the actual topic id. The optional location_type parameter may be specified to help the Lucene relevance query return the desired topic. This parameter defaults to /location/location.
  4. A GeoJSON shape as a dictionary, as specified at [1].
  5. A bounding box as a list [x0, y0, x1, y1] where (x0, y0) and (x1, y1) specify two diagonally opposite points of a rectangle. x denotes a longitude and y, a latitude.
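
The last two input formats can be sketched as follows, assuming that both are passed to the location parameter as JSON text. The coordinates are rough illustrative values and the helper name bounding_box is hypothetical. The helper does not handle boxes that cross the 180 degree meridian.

```python
import json

def bounding_box(points):
    """Return [x0, y0, x1, y1] for a list of (longitude, latitude) pairs."""
    lons = [lon for lon, lat in points]
    lats = [lat for lon, lat in points]
    return [min(lons), min(lats), max(lons), max(lats)]

# Illustrative (longitude, latitude) pairs, not authoritative coordinates.
corners = [(-122.52, 37.70), (-122.35, 37.83), (-122.45, 37.75)]
bbox = bounding_box(corners)

# A GeoJSON Point lists longitude first, then latitude.
point = {"type": "Point", "coordinates": [-122.45, 37.75]}

print(json.dumps(bbox))
print(json.dumps(point))
```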

mql_input

The mql_input parameter specifies a MQL query to run against the graph to collect the guids of matching topics to query the geo index with. This query may be single or multi cardinality. If a topic was not indexed because it had no location data, the results may be empty or may not contain any geo data. For example, the MQL query about all cities named "San Francisco":

[
  {
    "name" : "San Francisco",
    "type" : "/location/citytown"
  }
]

finds five such cities, only one of which has actual geo data and occurs in the geo index.

Please note that both location and mql_input usually take values that must be encoded for URL use. Python's urllib.quote() does the trick:

>>> import urllib
>>> urllib.quote('[{ "name" : "San Francisco","type" : "/location/citytown" }]')
'%5B%7B%20%22name%22%20%3A%20%22San%20Francisco%22%2C%22type%22%20%3A%20%22/location/citytown%22%20%7D%5D'
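
urllib.quote() is the Python 2 spelling. In Python 3 the same function lives in urllib.parse, and the following sketch produces the same encoding. The relative geosearch? path follows the examples in this document.

```python
from urllib.parse import quote

mql_input = '[{ "name" : "San Francisco","type" : "/location/citytown" }]'

# By default quote() leaves "/" unescaped and percent-encodes the rest.
encoded = quote(mql_input)
print("geosearch?mql_input=" + encoded)
```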

6. Retrieving a Shape for a Topic

The simplest possible query is to retrieve one or several shapes for a given topic. When several shapes are present for a topic, they're sorted in decreasing order of dimension (see ST_Dimension() function at [3]).

location
For this kind of query, the location parameter can only be an id, a guid or a topic name. The mql_input parameter may be used to retrieve the shapes associated with the matches of a MQL query.
For example:

Retrieving the shapes for San Francisco: geosearch?location=San+Francisco&location_type=/location/citytown

Retrieving the shapes for San Francisco in KML format: geosearch?location=San+Francisco&location_type=/location/citytown&format=kml

Retrieving one shape for San Francisco, typically its outline: geosearch?location=San+Francisco&location_type=/location/citytown&limit=1

Retrieving the bounding box for San Francisco: geosearch?location=San+Francisco&location_type=/location/citytown&limit=1&accessor=envelope

Retrieving the shapes of all the cities called Berkeley: geosearch?mql_input=[{"name":"Berkeley","type":"/location/citytown"}]

7. Finding Topics in Proximity to a Topic or Location

The more complex queries supported involve finding topics in proximity to a given anchor topic or location. Two styles of proximity query are supported:

  1. distance-based queries via the within parameter
  2. containment-based queries via the inside, operator or function parameters

One of these parameters must be specified for a proximity query. If more than one is used, function overrides operator which overrides inside which overrides within.
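
The override order can be expressed as a small lookup. This is only a client-side sketch for reasoning about which parameter will take effect; the function name effective_constraint is hypothetical and says nothing about how the service is implemented.

```python
# Highest precedence first, as described above.
PRECEDENCE = ("function", "operator", "inside", "within")

def effective_constraint(params):
    """Return the (name, value) of the proximity parameter that wins, or None."""
    for name in PRECEDENCE:
        if name in params:
            return name, params[name]
    return None

print(effective_constraint({"within": "5", "inside": "true"}))
print(effective_constraint({"inside": "true", "function": "intersects"}))
```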

within
The within parameter accepts a floating point number denoting the radius in kilometers searched from the anchor location. When more than one shape is indexed for the anchor topic, the lowest dimension shape, typically a point, is used for this style of query.
inside
The inside parameter accepts one of true or false to search inside or outside the shape of the anchor location. When more than one shape is indexed for the anchor topic, the highest dimension shape, typically a multipolygon, is used for this style of query. This style of query makes no sense when only a point is available for the anchor topic.
Note:
For performance reasons, when inside or within are specified, the actual shapes used to evaluate the geo-constraint are 1/10th degree larger simplified shells of the actual shapes indexed. To use the actual, exact, shapes in the index, use one of operator or function below instead.
operator
The operator parameter makes it possible to control which PostGIS operator is used to compare the bounding boxes of two locations. It accepts one of &<, &>, <<, >>, &<|, |&>, <<|, |>>, ~=, @, ~ or &&. These operators correspond to the PostGIS relationship operators documented at [5]. To negate an operator, the optional negate parameter may be used.
function
The function parameter makes it possible to control which PostGIS function is used to compare two locations. It accepts one of equals, disjoint, intersects, touches, crosses, within, overlaps, covers or coveredBy. These names correspond to the ST_<name> PostGIS relationship functions documented at [4]. To negate a function, the optional negate parameter may be used.
For example:

Finding the restaurants in San Francisco: geosearch?location=San+Francisco&location_type=/location/citytown&type=/dining/restaurant&inside=true&indent=1

Finding the restaurants in San Francisco and returning KML for Google Maps. Enter this URL into http://maps.google.com to see the results rendered on a map: geosearch?location=San+Francisco&location_type=/location/citytown&type=/dining/restaurant&inside=true&format=kml/maps

Finding the restaurants within 5 km of Berkeley and returning KML: geosearch?location=/en/berkeley_california&type=/dining/restaurant&within=5&format=kml

8. Sorting Results

order_by
The results of proximity queries may be sorted by relevance (the matching topics' graph link count) or by distance via the optional order_by parameter. By default, the order in which results are returned depends on the query and is determined by the PostgreSQL query planner.

If a consistent order is required, but neither relevance nor distance matter, order_by=uid can be used to request the results be sorted in the order they are stored in the PostgreSQL table.

For example:

Finding the 100 most relevant restaurants in California outside of San Francisco and returning KML: geosearch?location=San+Francisco&location_type=/location/citytown&type=/dining/restaurant&outer_bounds=/en/california&inside=false&format=kml&limit=100&order_by=relevance

Finding the 100 restaurants in California that are closest to San Francisco but not in it and returning KML for Google Maps. Enter this URL into http://maps.google.com to see the results rendered on a map: geosearch?location=San+Francisco&location_type=/location/citytown&type=/dining/restaurant&outer_bounds=/en/california&inside=false&format=kml/maps&limit=100&order_by=distance

When geojson or json is returned, the results include the relevance value or the distance value the query results were ordered by.

9. Counting Results

count
The results of a proximity query may be counted instead of retrieved as location data by using the optional count parameter. The counts represent the number of indexed topics of a given type. When no type is specified, the counts represent the number of indexed topics of all types, sorted in decreasing order.
For example:

Counting the restaurants in California outside of San Francisco: geosearch?location=San+Francisco&location_type=/location/citytown&type=/dining/restaurant&outer_bounds=/en/california&inside=false&count=1&indent=1

Counting all indexed locations in San Francisco: geosearch?location=San+Francisco&location_type=/location/citytown&inside=true&count=1&indent=1

10. Performance Considerations

When working with complex shapes, containment queries can become quite slow. When the inside parameter is used, the hull of each shape is used in place of the shape itself, which greatly speeds up queries at the expense of some accuracy.

When the function or operator parameter is used, the actual complex shape is used instead, ensuring an accurate, albeit much slower, query.

Depending on the shape, using its convex hull can have considerable precision drawbacks. In particular, multipolygons become one simple polygon that contains all the in-between parts. For example, the hull around France contains pieces of the Mediterranean, Italy, Switzerland and Germany because of the position of Corsica.
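
The effect can be reproduced on a toy multipolygon. In this sketch two axis-aligned squares stand in for a mainland and an island, and a point in the gap between them falls inside the convex hull although it belongs to neither square. The hull is computed with Andrew's monotone chain, chosen here for brevity; it is not a statement about how PostGIS computes hulls.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def inside_convex(point, hull):
    """True if point is inside or on a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], point) >= 0 for i in range(n))

def box_corners(box):
    x0, y0, x1, y1 = box
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

def inside_box(point, box):
    x0, y0, x1, y1 = box
    return x0 <= point[0] <= x1 and y0 <= point[1] <= y1

# Toy multipolygon: a large "mainland" square and a small "island" square.
mainland = (0, 0, 4, 4)
island = (6, 0, 7, 1)
hull = convex_hull(box_corners(mainland) + box_corners(island))

sea_point = (5, 1)  # lies in the gap between the two squares
in_shape = inside_box(sea_point, mainland) or inside_box(sea_point, island)
in_hull = inside_convex(sea_point, hull)
print(in_shape, in_hull)
```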

Another performance/accuracy trade-off is possible with the optional simplify parameter, which triggers the use of the PostGIS ST_Simplify() function to simplify complex shapes with the Douglas-Peucker algorithm [7]. It takes a floating point number, a so-called tolerance value, expressed in degrees, whose meaning is best understood from the visualizations at [7]. The right tolerance to use depends on the actual geographical size of the shape used in the query.

Douglas-Peucker shape simplification introduces a new trade-off between performance and accuracy. A hull, while less precise, is guaranteed to contain the entire original shape. A simplified shape, on the other hand, may have lost some area. Again, this can be visualized by reading [7].

[1] http://wiki.geojson.org/GeoJSON_draft_version_6#Geometries
[2] http://wiki.geojson.org/GeoJSON_draft_version_6
[3] http://postgis.refractions.net/docs/ch06.html#id2595672
[4] http://postgis.refractions.net/docs/ch06.html#id2594839
[5] http://postgis.refractions.net/docs/ch06.html#id2597154
[6] http://code.google.com/apis/kml/documentation/extendeddata.html#opaquedata
[7] http://marblemice.com/2007/09/12/douglas-peuker-line-simplification-explained/
[8] http://mql.freebaseapps.com/ch02.html#typedatetime