I am using Freebase data for entity resolution / reference reconciliation tasks, and a probabilistic type "hierarchy" would be useful. Although Freebase does not impose a strict hierarchy (or DAG) on types, in practice there is a sort of probabilistic type "hierarchy" (really a digraph).
For example, topics typed "Mountain" are often, but not always, also typed "Location". Thus P(Location | Mountain) may be high, whereas P(Mountain | Location) may be low.
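One natural way to make this asymmetry concrete, assuming the probabilities are estimated as plain empirical frequencies over topics (no smoothing), is to write n(Y) for the number of topics typed Y and n(X, Y) for the number typed both X and Y. Both conditionals then share a numerator and differ only in the denominator:

```latex
P(X \mid Y) \approx \frac{n(X, Y)}{n(Y)}, \qquad
P(Y \mid X) \approx \frac{n(X, Y)}{n(X)} = P(X \mid Y)\,\frac{n(Y)}{n(X)}
```

With X = Location and Y = Mountain, n(Location) is much larger than n(Mountain), so P(Mountain | Location) stays small even when P(Location | Mountain) is close to 1.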
This project has two parts. First, compute (or approximate/estimate?) P(X | Y) for all (or most) pairs X, Y, where X and Y each range over the 5,000+ types in Freebase. Computing this for a single fixed X is trivial, but how do we do it quickly for all X, Y pairs in Freebase?
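A minimal sketch of the all-pairs computation, assuming each topic's set of type IDs has already been extracted (e.g. from a data dump); the input format, function name, and toy topics below are hypothetical. A single pass over the topics counts n(Y) and every co-occurring ordered pair n(X, Y), so only pairs that actually co-occur are stored and the result is naturally sparse. The cost per topic is quadratic in that topic's number of types, which is usually small.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple


def conditional_type_probs(
    topic_types: Iterable[Set[str]],
) -> Dict[Tuple[str, str], float]:
    """Return a sparse map (x, y) -> P(x | y), estimated as n(x, y) / n(y),
    for every pair of types that co-occur on at least one topic."""
    single = Counter()  # n(Y): number of topics typed Y
    pair = Counter()    # n(X, Y): number of topics typed both X and Y
    for types in topic_types:
        single.update(types)
        # Ordered pairs, so both P(X|Y) and P(Y|X) get their numerator.
        pair.update(permutations(types, 2))
    return {(x, y): n_xy / single[y] for (x, y), n_xy in pair.items()}


# Hypothetical toy data: each set is one topic's types.
TOPICS = [
    {"/location/location", "/geography/mountain"},
    {"/location/location", "/geography/mountain"},
    {"/geography/mountain"},
    {"/location/location"},
    {"/location/location", "/location/citytown"},
]

probs = conditional_type_probs(TOPICS)
print(probs[("/location/location", "/geography/mountain")])  # 2 of 3 mountains
print(probs[("/geography/mountain", "/location/location")])  # 2 of 4 locations
```

At full Freebase scale the same two counts could instead be produced by a self-join of a (topic, type) table grouped by type pair, or by a MapReduce job emitting one record per co-occurring pair.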
Second, given this (possibly sparsely sampled) matrix of probabilities, how do we filter it down to a meaningful subset for programmatic use and/or visual display, and produce a useful (graph-based?) visualization of it?
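One simple filtering approach, sketched here under assumptions: keep an edge Y -> X only when P(X | Y) exceeds a threshold and Y has enough topics for the estimate to be trustworthy, then emit the survivors as Graphviz DOT text for display. The thresholds, function names, and the probability/support numbers below are hypothetical illustrations, not measured Freebase values.

```python
from typing import Dict, List, Tuple


def filter_edges(
    probs: Dict[Tuple[str, str], float],
    support: Dict[str, int],
    min_prob: float = 0.8,
    min_support: int = 2,
) -> List[Tuple[str, str, float]]:
    """Keep edges y -> x where P(x | y) >= min_prob and type y has at
    least min_support topics, so rare types do not produce noisy edges."""
    edges = [
        (y, x, p)
        for (x, y), p in probs.items()
        if p >= min_prob and support.get(y, 0) >= min_support
    ]
    return sorted(edges)


def to_dot(edges: List[Tuple[str, str, float]]) -> str:
    """Render edges as a Graphviz digraph; an edge y -> x reads
    'topics typed y are usually also typed x'."""
    lines = ["digraph types {"]
    for y, x, p in edges:
        lines.append(f'  "{y}" -> "{x}" [label="{p:.2f}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical inputs: P(x | y) keyed by (x, y), and n(y) per type.
PROBS = {
    ("/location/location", "/geography/mountain"): 0.9,
    ("/geography/mountain", "/location/location"): 0.05,
    ("/location/location", "/location/citytown"): 1.0,
}
SUPPORT = {
    "/geography/mountain": 50,
    "/location/location": 1000,
    "/location/citytown": 1,
}

edges = filter_edges(PROBS, SUPPORT)
dot = to_dot(edges)
print(dot)
```

Here the low-probability Location -> Mountain edge is dropped by the probability threshold and the CityTown edge by the support threshold. For larger graphs, a further pruning step such as removing edges already implied by a two-hop path could keep the drawing readable.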
I've already done this for P(Person | Y) for all Y, but I'm curious (a) how to do it efficiently and elegantly :), (b) what the schema looks like, and (c) how we might then filter this data and view it in visual form!