NNDB Links for People

  1.  
    1. I would like to add about 20,000 links from people to their NNDB pages. To do this, I've created the NNDB Profile Page type.

      I've gone through all the people on Freebase and matched them to their NNDB pages by name. Where several people share the same name, I simply skip those pages.
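      A minimal Python sketch of the name matching described above. It assumes exact string comparison of names and skips any name that appears more than once on either side. The function name, input shapes, and sample IDs are hypothetical; this is not the script actually used for the load.

```python
from collections import defaultdict


def match_by_name(freebase_people, nndb_pages):
    """Pair Freebase topic IDs with NNDB page IDs by exact name (hypothetical helper).

    freebase_people: iterable of (topic_id, name) pairs
    nndb_pages: iterable of (nndb_id, name) pairs
    Returns {topic_id: nndb_id}. Any name shared by more than one person
    on either side is skipped rather than guessed.
    """
    topics_by_name = defaultdict(list)
    for topic_id, name in freebase_people:
        topics_by_name[name].append(topic_id)

    pages_by_name = defaultdict(list)
    for nndb_id, name in nndb_pages:
        pages_by_name[name].append(nndb_id)

    matches = {}
    for name, topic_ids in topics_by_name.items():
        page_ids = pages_by_name.get(name, [])
        # Keep only unambiguous one-to-one matches.
        if len(topic_ids) == 1 and len(page_ids) == 1:
            matches[topic_ids[0]] = page_ids[0]
    return matches


if __name__ == "__main__":
    # Hypothetical sample data; the IDs are placeholders, not real records.
    people = [
        ("/en/person_a", "Paul Newman"),
        ("/en/person_b", "John Smith"),
        ("/en/person_c", "John Smith"),
    ]
    pages = [("123/000012345", "Paul Newman"), ("456/000067890", "John Smith")]
    print(match_by_name(people, pages))
```

      Skipping ambiguous names trades coverage for precision: a missing link is easy to add later, while a wrong one is harder to spot.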

      Would it be appropriate to upload this data to the sandbox?

      1. It would indeed. Please go ahead.

      2. Ok, the complete set of links has been written to the sandbox. Please check out the results and let me know if I can write them to the main database.

      3. The data looks good!

        However, I lied when I said earlier on the mailing list that the IMDb Profile Page model was the way to go. We can now use keys into an external database to generate URIs, which also provides uniqueness checking. I am converting our IMDb references to this form. It would be great if you could hold off on the final NNDB load and model it that way; I will be happy to show you how once I figure it out myself. (-:
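        To illustrate the uniqueness checking mentioned above, here is a minimal Python sketch of a key namespace that refuses to let two topics claim the same external ID. The class, its methods, and the "/authority/nndb" path are hypothetical and only model the idea; they are not Freebase's actual implementation.

```python
class KeyNamespace:
    """Hypothetical model of a namespace of external keys (e.g. NNDB IDs).

    Each key can name at most one topic, so the namespace itself rejects
    a second topic claiming the same external ID.
    """

    def __init__(self, path):
        self.path = path
        self._topic_by_key = {}

    def add_key(self, key, topic_id):
        current = self._topic_by_key.get(key)
        if current is not None and current != topic_id:
            raise ValueError(f"key {key!r} in {self.path} already names {current}")
        # Re-adding the same key to the same topic is harmless.
        self._topic_by_key[key] = topic_id

    def topic_for(self, key):
        return self._topic_by_key.get(key)


if __name__ == "__main__":
    ns = KeyNamespace("/authority/nndb")  # hypothetical namespace path
    ns.add_key("123/000012345", "/en/person_a")
    try:
        ns.add_key("123/000012345", "/en/person_b")
    except ValueError as err:
        print(err)
```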

        1. Have you made any progress on this? I've looked at the documentation on enumerations, but I can't figure out how to apply it to my model.
      4. Sounds like a great way to model these things. I look forward to learning how to use this new technique.

      5. Oh, yeah! It’s totally done and I forgot to come back here.

        First, I made namespaces (/authority/imdb/film, name, character). Then I made properties that enumerate those namespaces and attached URI templates to them.

        Check out the IMDb profile property on film. It expects Enumeration as its type. Then you have to get a little fancy and switch to the admin view. Set the property's Enumeration property to the expected namespace. Then co-type the property as a Foreign Key Property and attach a URI template to it. Give that template the type URI Template and fill in both the canonical template (used to generate and recognize URIs) and the other templates (used only for recognition).

        The schema UI will support this at some point… just not yet, as it’s kind of a power-user thing.
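        The split between the canonical template (generate and recognize) and the other templates (recognize only) can be sketched in Python as below. The "{key}" placeholder syntax, the single-placeholder restriction, and the NNDB URL patterns are assumptions for illustration, not the actual Freebase template format.

```python
import re

PLACEHOLDER = "{key}"  # assumed placeholder syntax, for illustration only


def expand(template, key):
    """Generate a URI from the canonical template."""
    return template.replace(PLACEHOLDER, key)


def _template_regex(template):
    # Assumes exactly one placeholder: escape the literal text around it
    # and capture whatever sits in its place.
    before, after = template.split(PLACEHOLDER)
    return re.compile("^" + re.escape(before) + "(.+)" + re.escape(after) + "$")


def recognize(uri, canonical, others=()):
    """Return the key if uri matches the canonical or a recognition-only template."""
    for template in (canonical, *others):
        match = _template_regex(template).match(uri)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical templates and key.
    canonical = "http://www.nndb.com/people/{key}/"
    others = ["http://nndb.com/people/{key}/"]
    print(expand(canonical, "123/000012345"))
    print(recognize("http://nndb.com/people/123/000012345/", canonical, others))
```

        Only the canonical template is ever used to build links, so every generated URI has one consistent form, while the extra templates let variant URLs found elsewhere still map back to the same key.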

      6. Ok, I've created the namespace, I've set up the enumeration property on the NNDB Person type and I've attached a URI template to that property.

        Then I added a sample key to the Paul Newman topic, and the NNDB link shows up as expected. Unfortunately, the ID contains a forward slash, which gets escaped and breaks the link. I went back to the Foreign Key Property and explicitly disabled URI encoding, but that hasn't fixed it. Any ideas on how to handle this?

        Coincidentally, tsegaran added a foreign key to a NYT page on the same Paul Newman topic. His key also contains forward slashes, but he seems to have entered the weblink separately rather than using a URI template.
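        A small Python sketch of why escaping breaks a key that contains a slash: percent-encoding the whole key turns "/" into "%2F", so the generated link no longer has the path the site expects. It uses the standard library's urllib.parse.quote; the template, key, and fill helper are hypothetical.

```python
from urllib.parse import quote

TEMPLATE = "http://www.nndb.com/people/{key}/"  # assumed template, for illustration


def fill(template, key, encode=True, safe=""):
    """Substitute key into template, optionally percent-encoding it first.

    With safe="" every "/" in the key becomes "%2F"; with safe="/" or
    encode=False the slash is left as a path separator.
    """
    value = quote(key, safe=safe) if encode else key
    return template.replace("{key}", value)


if __name__ == "__main__":
    key = "123/000012345"  # hypothetical ID containing a slash
    print(fill(TEMPLATE, key))                # slash escaped
    print(fill(TEMPLATE, key, safe="/"))      # slash kept as a separator
    print(fill(TEMPLATE, key, encode=False))  # no encoding at all
```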

      7. Ah, yes, the character escaping with keys and URI templates... I am in the process of converting the NYT keys to use URI templating, and I ran into the same problem. There's a bug filed to make the UI handle escaped URLs properly; I'll post back when there's a status update.

        Toby (tsegaran) actually added a key and created a separate weblink (it doesn't use URI templating). Once I implement the URI templating, I'll remove the superfluous weblink.

        BTW, good work!

      8. This sounds like a simple bug; you're right that the NYTimes links were added separately, and they'll probably need to be fixed. It may be too late to get a fix into next week's release, but I'll try. For my own and others' reference, this is CLI-4538 in our bug system.
      9. Ok, thanks guys. I'll watch for CLI-4538 in the release notes.
      10. Looks like everything is working fine now in the new release. Thanks for fixing this.
      11. I've uploaded a new version of the data to the sandbox. If no one has any objections, I'll add it to the main site.

        Is there a limit on how many writes I can do on the main site? Will I be able to make 19,000 writes in a day?

      12. I believe the normal limit is 10,000 writes per day. I'll see if I can get your limit increased.
      13. Thanks, Brian. The exact count should be 19,619 writes to the API, with 2 properties updated in each.
      14. Any luck getting my limit increased?
      15. I'll try to get it done before the sandbox refresh (Monday PM PST) so you can test against sandbox once more before going live.

      16. Shawn, your limit is now 25K, on sandbox and www.  Happy loading!
      17. Thanks for updating the limit. I ran one more test on sandbox, then uploaded the data to www and exceeded my limit. I guess the 25k is combined across sandbox and www, so I'll upload the rest tomorrow.
      18. I tried running it again today and exceeded my limit again about halfway through. Is it 25k writes or 25k facts? This only happens on www; the sandbox was able to write the whole dataset at once.
      19. Just noticed that the "limit exceeded" error message says that my max_writes is 10,000 per day. I guess www is still using the old limit.
      20. Hmmm, we recently moved datacenters, and in the shuffle your previously raised write limit might have been reverted. I'll check it out and get back to you...
      21. Also, are you adding keys to 25K topics, or fewer? Co-typing a topic and adding the NNDB key is 2 primitives, so if you are trying to do that for 25K topics, we really need to set your limit to 50K.
      22. I'm adding the NNDB Person type to the topic and adding the key, so if the limit is on the number of primitives rather than the number of writes then I would need a limit of 50k.
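        The write-limit arithmetic in this thread can be checked with a short Python sketch. It assumes, as suggested above, that each topic costs 2 primitives (type plus key) and that a topic's two writes should land on the same day; if the limit actually counts API writes rather than primitives, pass per_topic=1. All function names are hypothetical.

```python
import math


def primitives_needed(topics, per_topic=2):
    """Total primitives when each topic gets a type plus a key (2 per topic)."""
    return topics * per_topic


def days_needed(topics, daily_limit, per_topic=2):
    """Days to finish if each topic's writes must land on the same day."""
    topics_per_day = daily_limit // per_topic
    return math.ceil(topics / topics_per_day)


def daily_batches(topic_ids, daily_limit, per_topic=2):
    """Split topic IDs into chunks that each fit under one day's limit."""
    size = daily_limit // per_topic
    return [topic_ids[i:i + size] for i in range(0, len(topic_ids), size)]


if __name__ == "__main__":
    topics = 19_619  # topic count given earlier in the thread
    print(primitives_needed(topics))
    for limit in (10_000, 25_000, 50_000):
        print(limit, days_needed(topics, limit))
```

        With the thread's figures, 19,619 topics at 2 primitives each come to 39,238 primitives: a 25K daily limit needs two days, a 50K limit one day, and the old 10K limit four days.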

