Discussions on rictic
-
-
Hi Raymond --
Great project -- definitely fits in with Vivek Kundra and the trend towards "Open Government"!
Can you give us a sense of what you'd like to do at Hack Day? It looks like you may need some data loaded. Peter Burns (aka rictic) will be showing off his new spreadsheet loader, which might be a help ...
Brian
-
I've heard there's talk of a general chart view, which would let things like this be done in the client.
Though for now, sounds like a mash with the genealogy viewer? Or this Google API?
-
Thanks for posting here. I'm writing up a longer post right now, but let me list a few things I'd love help with:
1) to do the reconciliation of government agencies to Freebase, I built a primitive Acre app to help me apply Freebase suggest on a lot of items: http://suggest2reconcile.freebaseapps.com/ -- see source: http://acre.freebase.com/#app=/user/rdhyee/suggest2reconcile&file=index and a background writeup of the idea: http://lists.freebase.com/pipermail/developers/2009-June/003014.html Refining this app would be very useful!
2) as part of the reconciliation process, coming up with a good way to figure out from the suggest API whether a given suggestion is given with high confidence or not would be helpful. Tom Morris has some ideas in http://lists.freebase.com/pipermail/developers/2009-June/003015.html
3) writing the data back from the reconciliation would be very useful. The data behind http://labs.dataunbound.com/doc/2009/06/govt.treeview.v0.1.html is http://labs.dataunbound.com/doc/2009/06/OMB_A_11_C_reconciled.v0.1.xml -- how to model the OMB codes and apply them to the government agencies in Freebase? How about the entities I couldn't find in Freebase -- should we create new entities for them?
4) Re what Spencer wrote: yes, I'd love to see someone come up with a better visualization than what I have at http://labs.dataunbound.com/doc/2009/06/govt.treeview.v0.1.html -- especially if there is a generic viewer.
More later...but I hope this helps.
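One way to approach item 2 above is to compare the best suggestion against the runner-up. The sketch below is only an illustration: it assumes each suggestion comes back with a non-negative numeric relevance score, and the function name, parameters, and thresholds are all hypothetical. It is not the approach from the linked mailing-list post.

```python
def is_confident(candidates, min_score=0.0, min_ratio=2.0):
    """Decide whether the top suggestion can be accepted without review.

    candidates: list of (name, score) pairs in any order.
    The score scale and both thresholds are assumptions for illustration.
    """
    if not candidates:
        return False
    scores = sorted((score for _, score in candidates), reverse=True)
    top = scores[0]
    # Reject if even the best match scores too low.
    if top <= min_score:
        return False
    # A single candidate above the floor has no competitor.
    if len(scores) == 1:
        return True
    # Accept only if the best match clearly beats the runner-up.
    return top >= min_ratio * scores[1]
```

Anything that fails the check would go to a person for a decision, so the thresholds only control how much is left for hand review.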
-
In very* alpha stage right at this moment is my generic treeviewer for any phylogeny pattern - the sweet animated visualization courtesy of the JavaScript InfoVis Toolkit http://thejit.org/
*I'm learning JavaScript as I go along
-
For the reconciliation process you can work with the algorithms or the data or both. The advantage to cleaning up the data is that it will probably be useful for other users as well. For example, the United States Interagency Council on Homelessness is listed under Interagency Council on Homelessness with no alias for its official name. You could make the name matcher more clever by trying combinations of U.S., US, United States, or empty prefixes, but the next person, including humans looking things up by hand, will just have the same problem. By adding an alias of United States Interagency Council on Homelessness to Freebase, you (hopefully) take care of the problem once and for all. You probably want to consistently make the topic name either the official agency name or the common name and use the other as the alias.
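For the algorithm side, the prefix trick could look something like the sketch below. It is a minimal illustration only: the helper name and prefix list are hypothetical and nothing here calls a Freebase API.

```python
# Hypothetical prefix list taken from the examples in this post.
PREFIXES = ["United States", "U.S.", "US", ""]

def name_variants(name):
    """Return the name with each known prefix swapped in, including no prefix."""
    base = name.strip()
    # Strip a leading prefix if one is present, trying longer prefixes first.
    for prefix in sorted((p for p in PREFIXES if p), key=len, reverse=True):
        if base.lower().startswith(prefix.lower() + " "):
            base = base[len(prefix):].strip()
            break
    # Re-attach every prefix; the empty prefix yields the bare name.
    return [f"{p} {base}".strip() for p in PREFIXES]
```

Each variant would then be tried against the name lookup in turn. This only helps the person running the script, which is why the alias fix is the better long-term answer.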
For missing agencies, I'd say yes, definitely add them if you're reasonably sure they're really missing. If you're mistaken, they can always be merged back together.
What's the scale of the problem here? If there are only a few hundred agencies, it's probably just easier to grind things out by hand rather than spending lots of time programming (unless it's a learning exercise that will be useful in other contexts).
-
Adding aliases is a great idea -- thanks!
The immediate scale is several hundred agencies, but I'm hoping to expand the tree down to more levels in the federal government as well as connect other entities. Moreover, I'd like to apply these techniques to other programs -- doing oodles of hand-reconciliation of stuff to Freebase loses its charm rather quickly!
-
-
-
Hey, I liked your demo at the FUG the other day. I'm trying reconciliation myself, and I just recommended it to someone else on this thread:
http://sourceforge.net/mailarchive/message.php?msg_id=480C9957.7060404%40monkeyhelper.com
-
SF apparently doesn't include my messages in their archive, so try here instead
-
Thanks Drew!
I didn't have a good way of linking to this at the UG, but the full documentation is here: http://www.freebase.com/view/guid/9202a8c04000641f8000000007beed56
The service is now running on the main freebase.com site: http://www.freebase.com/dataserver/reconciliation/ so with that out of the way, expect a blog post about it soon.
I'd love any feedback about the service and how well it performs for your data. :)
-
-
-
Hi Peter. I'm curious why you changed the Scientific Name for Dog to just "familiaris". All the other subspecies have the full name. I changed it back to "Canis lupus familiaris".
-
Hey Jeff,
I'm unfamiliar with scientific naming schemes for organism classifications, so I'm probably wrong. My thought was that it aided in composability of scientific names. It sort of violated some database normalization impulse of mine; we've already captured that Dogs are a subspecies of lupus (itself a species in the genus Canis), so having the scientific name for species and subspecies include that redundant data could make it more difficult to work with scientific names programmatically. It also fell in line with the usage of the Scientific Name field of higher-level Organism Classifications.
If that's not how it should be, I think I messed up pretty much all of the species and subspecies under Caninae...
-
jg is planning a big data load of species/genus/family etc. from Wikipedia, cross-correlated with other species databases. This will fill in all the empty fields. You can check out the Organism Classification discussion.
The data load will use the standard full names like "Canis lupus familiaris". If you want to make a case for the truncated names, you might want to post it there. But I think it's pretty standard to show the full names. See, for example:
Wikispecies
ITIS
One might argue that if your database is smart enough to know that the species name "lupus" should be concatenated with the parent "canis" to get "canis lupus" (but that "canis" should not be concatenated with its parent "canidae" to get "canidae canis") then it would be smart enough to extract out the "lupus" from "canis lupus" if that's all you want.
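To make the two directions concrete, here is a rough sketch. The data model is hypothetical (it is not how Freebase stores Organism Classifications), and keying taxa by their bare name is a simplification, since epithets like "lupus" are not unique across genera.

```python
# Hypothetical model: each taxon stores only its own name, rank, and parent.
TAXA = {
    "Canidae": {"rank": "family", "parent": None},
    "Canis": {"rank": "genus", "parent": "Canidae"},
    "lupus": {"rank": "species", "parent": "Canis"},
    "familiaris": {"rank": "subspecies", "parent": "lupus"},
}

# Only ranks below genus are prefixed with their parent's name.
COMPOSED_RANKS = {"species", "subspecies"}

def full_name(key):
    """Build the full scientific name by walking up to the genus."""
    taxon = TAXA[key]
    if taxon["rank"] in COMPOSED_RANKS and taxon["parent"]:
        return f"{full_name(taxon['parent'])} {key}"
    return key

def short_name(full):
    """Recover the last epithet from a stored full name."""
    return full.split()[-1]
```

Going from the full name to the short one is a single split, while going the other way needs the rank rule and a walk up the tree, which supports storing the full name.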
-