to get (image) statistics
-
-
I want to get some basic statistics about images, e.g. how many images there are in Freebase and how many objects or topics have images. Right now my query looks like this:
{"type":"/common/image",
"id":null,
"limit":10000
}
and then I parse the result to see how many are returned. Then I change "limit" and see when it breaks. Is there a more systematic way of doing this? Thanks!
-
The query times out because there are a large number of /common/image topics. Usually, the way to request counts is:
{
"id" : null,
"return" : "count",
"type" : "/common/image"
}
But this will still time out on requests with a large number of topics. You could query for the instance count, which is tallied nightly:
{
"/freebase/type_profile/instance_count" : null,
"id" : "/common/image"
}
but that isn't real-time.
Lastly, you could count yourself by using a cursor and counting until the end of the result set:
{
"cursor": true,
"query": [{
"id" : null,
"type" : "/common/image"
}]
}
To find topics that have an image, you can query for them like so:
[
{
"id" : null,
"image" : [
{
"id" : null
}
],
"type" : "/common/topic"
}
]
Again, the result set is very large and will time out, so you will need to cursor and count yourself.
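A minimal Python sketch of the "cursor and count yourself" approach described above. The endpoint and the `queries=` parameter with a named envelope are the ones used elsewhere in this thread; the reply shape (a "result" list plus a "cursor" that becomes false on the last page) is an assumption, and `build_envelope`, `http_fetch` and `count_with_cursor` are hypothetical helper names, not part of any Freebase library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the Perl script later in this thread.
MQLREAD_URL = "http://www.freebase.com/api/service/mqlread"


def build_envelope(query, cursor=True):
    """Wrap an MQL query in a cursor-enabled envelope."""
    return {"cursor": cursor, "query": query}


def http_fetch(envelope):
    """Send one envelope under the name "qname" and return its reply.

    Assumption: the service answers with JSON keyed by the same name.
    """
    params = urlencode({"queries": json.dumps({"qname": envelope})})
    with urlopen(MQLREAD_URL + "?" + params) as response:
        return json.load(response)["qname"]


def count_with_cursor(query, fetch=http_fetch):
    """Page through `query` and return the total number of results."""
    total = 0
    cursor = True  # True asks the service to open a new cursor
    while cursor:
        reply = fetch(build_envelope(query, cursor))
        total += len(reply["result"])
        # Assumption: "cursor" is a string while pages remain, false at the end.
        cursor = reply.get("cursor")
    return total
```

Something like `count_with_cursor([{"id": None, "type": "/common/image"}])` would then count images page by page. The `fetch` argument is injectable so the loop can be tried against canned replies without touching the network.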
-
Thanks, that's really helpful!
Now, if I'm not mistaken, there are about 700K images and 330K topics with images. Does that imply that, on average, each of those topics has 2 images? Thanks.
-
No, not necessarily - a /common/image topic may not be linked to a /common/topic topic (for whatever reason). You could use the last query to find out how many images each /common/topic topic has by counting the ids in the "image" clause.
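Counting the ids in the "image" clause could look roughly like this in Python. It is only a sketch: `images_per_topic` is a hypothetical helper, and the sample rows use made-up ids shaped like the results of the topics-with-images query.

```python
from collections import Counter


def images_per_topic(rows):
    """Map number-of-images to number-of-topics for a list of result rows."""
    return Counter(len(row.get("image") or []) for row in rows)


# Illustrative rows only (made-up ids), not real Freebase data.
sample_rows = [
    {"id": "/en/example_a", "image": [{"id": "/m/img1"}]},
    {"id": "/en/example_b", "image": [{"id": "/m/img1"}, {"id": "/m/img2"}]},
    {"id": "/en/example_c", "image": [{"id": "/m/img3"}]},
]

histogram = images_per_topic(sample_rows)
print(dict(histogram))  # prints {1: 2, 2: 1}
```

Feeding every page of a cursored query through the same function and adding the counters together gives the distribution over the whole result set.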
-
Thank you. Then if a /common/image object is not linked to any /common/topic object, why is it there and how can it be presented to end users? Thanks.
-
You can just break the link between an image and a topic and not have the image deleted. Just for fun, I ran a quick analysis:
Found 716008 /common/topic topics with image links (out of ~4 million topics)
Note that more than one topic can be linked to the same image.
650326 have one img
63058 have two imgs
2296 have three imgs
328 have four or more imgs
-
Thank you. It seems I don't know enough MQL; let me know if there's a reference other than this website, since I don't want to keep bothering you like this.
I use the following query, embedded in a Perl script, to count the number of topics with at least one image, and the result is about 438K, much lower than yours. By the way, how can I get the number of topics with exactly 1, 2, or 3 images? Can this be requested directly in the query?
{
"qname":{
"cursor":true,
"query":[{
"id":null,
"limit":1000,
"type":"/common/topic",
"image":[{
"id":null
}]
}]
}
}
-
Some MQL reference material can be found here. I iterated through topics with images and recorded how many image links each one had. Currently, you can't constrain the query to return only topics with exactly n images. I'm not sure why your results differ so greatly from mine; I'll try to investigate a little more.
-
Thanks. I counted the number of iterations of the code above and multiplied it by 1000 (since I set the limit to 1000) to get an approximate count.
-
Well, I got essentially the same results as reported above using metaweb.py. Are you using your own Perl module, or one from our API library?
-
The following is the code I used. Have a look only if you're free. I print out $count to see the number of topics with images, and $count is about 438.
Afterwards I grepped the entire query result and got numbers close to yours:
692001 topics with image
626805 topics with 1 image
62827 topics with 2 images
2210 topics with 3 images
and so on...
#!/usr/bin/perl
use URI::Escape;
# This module provides the uri_escape function used below
# Build the Metaweb query, using string manipulation
# CAUTION: the use of string manipulation here makes this script vulnerable
# to MQL injection attacks when the command-line argument includes JSON.
#$band = $ARGV[0]; # This is the band or musician whose albums are to be listed
my $cursor = 'true';
my $query;
my $pre =
'{
"qname":{
"cursor":';
my $post = ',
"query":[{
"id":null,
"limit":1000,
"type":"/common/topic",
"image":[{
"return":"count",
"id":null
}]
}]
}
}';
my $allresult = '';
my $count = 0;
my $result = '';
until($cursor =~ /error/){
#while($cursor){
$query = $pre . $cursor . $post;
$escaped = uri_escape($query);
$baseurl='http://www.freebase.com/api/service/mqlread'; # Base URL for queries
$url = $baseurl . "?queries=" . $escaped;
$auth = 'metaweb-user=###Enter Your cookie data here###';
#$result = `curl -s --cookie \'$auth\' $url`;
$result = `curl -s $url`;
$result =~ /\/api\/status\/(.*)/;
$status = $1;
$result =~ /cursor\":\s*(\".*\")/;
$cursor = $1;
$count = $count + 1;
print $result . "\n";
}
# Use regular expressions to extract the album list from the HTTP response
#$result =~ s/^.*"album"\s*:\s*\[\s*([^\]]*)\].*$/$1/s;
#$result =~ s/[ \t]*"[ \t,]*//g;
#print $allresult ."\n";
print 'done'."\n";
# Finally, display the list of albums
#print "$result\n";
#
# vim: ts=2 sw=2
-
I suspect /api/service/mqlread is returning a timeout or error. My Perl is rusty, but the until loop doesn't look like it's checking for a 200 OK return response; it just looks like if you don't find a cursor, you assume the counting is complete and exit.
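For illustration, here is a rough Python sketch of a loop that keeps a failed request apart from the end of the result set. It assumes each reply carries a "code" field such as "/api/status/ok" (the status string the Perl script greps for), a "result" list, and a "cursor" that is false on the last page. `MqlError`, `fetch_page` and `count_topics` are hypothetical names, and `fetch` stands for whatever function actually performs the HTTP request.

```python
import time


class MqlError(Exception):
    """Raised when a page cannot be fetched successfully."""


def fetch_page(fetch, envelope, retries=3, delay=0.0):
    """Return one successful reply, retrying failed requests."""
    for _ in range(retries):
        reply = fetch(envelope)
        if reply.get("code") == "/api/status/ok":
            return reply
        time.sleep(delay)  # back off before trying the same cursor again
    raise MqlError("no successful reply after %d attempts" % retries)


def count_topics(query, fetch, retries=3):
    """Count results page by page; stop only when the cursor runs out."""
    total = 0
    cursor = True
    while cursor:
        reply = fetch_page(fetch, {"cursor": cursor, "query": query}, retries)
        total += len(reply["result"])  # count real rows, not pages * limit
        cursor = reply.get("cursor")
    return total
```

With this shape, a timeout either gets retried with the same cursor or raises an error, so it can't be mistaken for the end of the data. Summing the rows actually returned also avoids assuming that every page holds exactly 1000 results.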
-
Hi parity, minor point: you don't need to send your authentication cookie for doing mqlread (only for mqlwrite).
-