Freebase API to be retired #38

Open
thatandromeda opened this issue Dec 26, 2014 · 30 comments

Comments

@thatandromeda
Member

Google's retiring the Freebase API on 30 June 2015. Parts of this code depend on Freebase. What's the fallback?

@edsu
Member

edsu commented May 17, 2015

Thanks @thatandromeda, it looks like this really is happening on June 30th. Wikidata has a Wikidata Suggest type of service on its roadmap, but who knows if it will be ready in time. Some work that needs to be done:

  • adjust schema to use wikidata ids
  • port over current freebase ids to wikidata ids
  • adjust curation interface to use wikidata instead of freebase

@edsu
Member

edsu commented May 17, 2015

This API call is being used by Wikidata's search, and seems to have the basics of what we would need in the UI to select employers and tags.

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=encyc&format=json&language=en&type=item&continue=0

The API also accepts a JSON-P callback parameter, which may help get around cross-origin restrictions (JavaScript from jobs.code4lib.org that wants to talk to wikidata.org):

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=encyc&format=json&language=en&type=item&continue=0&callback=foo
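
A minimal Python sketch of how a suggest lookup might use this call. The helper names `build_search_url` and `extract_suggestions` are hypothetical, and the response shape (a `search` list whose items carry `id`, `label` and `description`) is an assumption based on what the URL above returns. No request is made here; only the URL construction and response parsing are shown.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(text, language="en", limit=10):
    """Build a wbsearchentities URL for the text typed so far."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "format": "json",
        "language": language,
        "type": "item",
        "limit": limit,
    }
    return WIKIDATA_API + "?" + urlencode(params)


def extract_suggestions(response):
    """Reduce a decoded JSON response to (id, label, description) tuples."""
    # label and description are treated as optional (an assumption).
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]
```

The tuples could then feed whatever widget replaces Freebase Suggest in the curation interface.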

@edsu
Member

edsu commented May 17, 2015

One possible way to map our Freebase IDs to Wikidata IDs: https://gist.github.com/edsu/c95c9ae9f60ecdf80077

@tfmorris

Google has said that the shutdown will be delayed. I'm pretty sure it was mentioned on the Freebase mailing list, but I can't find the thread right now. If you look at the Wikidata Freebase project page, you'll see the same info:

  • In Q2 2015, a new KG-based Google API will be launched
  • Earliest three months later, the Freebase website will close (planned for Q3 2015)

Because we're already inside the three month window for June 30, the API retirement won't be happening then.

I'd suggest deferring planning of your migration strategy until things are a little clearer, but here are a few random thoughts:

  • Wikidata includes Freebase IDs, so the need to migrate away from them isn't urgent
  • WDQ is experimental and the Wikidataians are debating what the "real" query API will look like
  • There is no good Wikidata-based Freebase Suggest replacement yet, as far as I know
  • Google has said that they'll be making available Knowledge Graph replacements for Freebase Search & Freebase Suggest, but hasn't published the transition plan (which they said would be available by the end of March, 2015)

The whole thing is kind of a mess, but it seems unlikely that Freebase will get shut down without a fair amount of notice, so I'd hold off committing to a transition plan until both Wikidata and Google firm up their plans.

If/when you need to map Freebase IDs to Wikidata IDs, this bulk dump might be easier to use than an API.

@edsu
Member

edsu commented May 18, 2015

Thanks for those details @tfmorris; I didn't know that the announcement on the Freebase website was out of date. Still, I think it should be doable to use the wbsearchentities API call for the suggest portion, and to use WDQ as a temporary way to turn a few thousand Freebase IDs into Wikidata IDs. I'd like to rip this bandaid off now rather than wait, but we'll see, since I'm the only person actually maintaining shortimer at this point and I have other things contending for my attention.

@tfmorris

@edsu - I think it's early days still for Wikidata and I have concerns about performance and stability of the API, but it's your call. I'd be happy to generate the ID mapping table for you, if that helps.

At a DPLA Hackathon a few years ago, we hacked up Freebase Suggest to work with the DPLA API. You might consider doing something similar for Wikidata. Suggest is actually one of the nicer autocomplete widgets out there (in my opinion).

https://github.com/scande3/dpla-discovery
http://static.digitalcommonwealth.org/dpla-discovery/

I don't know if you constrain your Suggest searches by type, etc, but if you're using the Freebase schema at all (types or properties), mapping to the Wikidata schema is another task that needs to be added to the list.

@edsu
Member

edsu commented May 18, 2015

The API may change, but it's hard to imagine it going away entirely after all the integration work that has gone on at Wikimedia. I'm ok with things changing -- in fact that's the best situation, because it means the service isn't dying, and people are working on it. Alas, the writing is definitely on the wall for Freebase.

The suggestions are constrained by type in a few places in shortimer: by employer and location. I see that wbsearchentities has a type parameter that could be used similarly, maybe. If a mapping of types/properties is put together that would be very useful. I think I will be OK with mapping the IDs, but I will be in touch if it gets tricky.

@edsu
Member

edsu commented Dec 23, 2015

It looks like there may be a path forward using the Google Knowledge Graph, which now has an API. They are also planning to add a suggest widget similar to the one Freebase offers, which is so important to the workflow here in shortimer.

Apparently even the freebase identifiers are being used, so there may not be a whole lot of cleanup work that needs to happen in the shortimer database. I think I would prefer to use Wikidata on principle, but it may be easier to transition to the Knowledge Graph.

@tfmorris

I think using the KG Suggest is the right call. The KG Search API is much less powerful than the old Freebase Search API, but it should be fine for this application. The Wikidata Refine Reconciliation Service uses wbsearchentities followed by WDQ/SPARQL internally, and the search doesn't appear to me to be very robust.

One of the things that I've got on my (long) list of spare time projects is to improve the coverage of matching for Freebase<->Wikidata mappings, which will help provide an escape path if it's needed in the future (plus having the Wikidata reconciliation service for OpenRefine should help with these types of mapping tasks).

BTW, the beta SPARQL endpoint is much faster than the experimental WDQ API, and the data is more current, if you ever have a need to query Wikidata.

@tfmorris

p.s. My interpretation is that the 3 month clock doesn't start until the KG Suggest API is available too, so there's still some time...

@edsu
Member

edsu commented Dec 23, 2015

@tfmorris thanks for your comments. If you notice KG Suggest get announced and remember this issue it would be really helpful if you can add a note here. I feel like I only accidentally noticed the KG API announcement!

@edsu
Member

edsu commented Jan 27, 2016

In preparation for the move, the shortimer db should be updated to store the Freebase Machine ID (mid) instead of the id that comes back from the suggest API. This will involve looking them up again.

@tfmorris

Freebase switched to MIDs for most purposes a while ago, so you may find that the IDs coming back from the Suggest API were MIDs already.

If you have historical /en/... IDs, you can look up the MID with this query:

https://www.googleapis.com/freebase/v1/mqlread/?lang=%2Flang%2Fen&query=%5B%7B+%22id%22%3A+%22%2Fen%2Fharvard_university%22%2C+%22mid%22%3A+null+%7D%5D

Replace the (encoded) /en/harvard_university with the link that you want to look up. If you've got a list of IDs, I'd be happy to look them up for you and generate a crosswalk.
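
As a rough Python sketch, the encoded URL can be generated from an ID instead of edited by hand. The function name `mid_lookup_url` is hypothetical, and the endpoint is the Freebase API being retired in this thread, so this only helps while it still answers.

```python
import json
from urllib.parse import urlencode

MQLREAD = "https://www.googleapis.com/freebase/v1/mqlread/"


def mid_lookup_url(freebase_id):
    """Build the mqlread URL that asks for the MID of one Freebase ID."""
    # json.dumps adds the surrounding quotes; "mid": null is the value
    # the query asks Freebase to fill in.
    query = '[{ "id": %s, "mid": null }]' % json.dumps(freebase_id)
    return MQLREAD + "?" + urlencode({"lang": "/lang/en", "query": query})
```

Calling `mid_lookup_url("/en/harvard_university")` should reproduce the URL quoted above, so looping over a list of historical IDs gives one lookup URL per ID.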

BTW, haven't heard anything additional on shutdown timeframes...

edsu added a commit that referenced this issue Jan 28, 2016
over time the db has accumulated a fair number of subjects and
employers with duplicate names, which causes problems for
views that use a slugified version of the name.

this commit tightens up the lookups to use the freebase id
and also includes a new command line utility to help diagnose
and correct these duplicates.

refs #38
@edsu
Member

edsu commented Jan 28, 2016

@tfmorris thanks for the update! I did get the database converted over to the mids. I looked them up by resolving URLs like:

https://www.googleapis.com/freebase/v1/topic/{freebase_id}

which seemed to work pretty well still...

@edsu
Member

edsu commented May 30, 2016

It looks like the new Knowledge Graph Search Widget is available. Also some of the old Freebase API calls are starting to fail now, for example getting the location for an organization.

@edsu
Member

edsu commented Sep 10, 2016

Well, now the old Freebase APIs for looking up Employers and Locations are dead. So people can't enter in new jobs. I guess it would be good to move over to the Knowledge Graph API now ;-)

@edsu
Member

edsu commented Sep 10, 2016

@tfmorris @danbri do you happen to know (or know someone who might know) why topical things like "Semantic Web" don't show up in the Knowledge Graph Search Widget? I get lots of books but not the topic. I even tried a Search API call to see if I could find the topic in there, but I couldn't find it in 200 results.

Using the JSON-LD context I can see that Google have URIs for entities which is cool. So I can easily turn the old Freebase IDs into Knowledge Graph URIs. For example here's the URI for Semantic Web:

https://g.co/kg/m/076k0

So I can see the entity "Semantic Web" is in the Knowledge Graph, but how can I get the search widget to return it? Would one of the available entity types work?
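
The ID-to-URI conversion mentioned above amounts to a prefix join. A small Python sketch, assuming every stored ID is a `/m/...` MID; `mid_to_kg_uri` is a hypothetical name, and the prefix is taken from the example URI in this comment:

```python
KG_PREFIX = "https://g.co/kg"


def mid_to_kg_uri(mid):
    """Turn a Freebase MID such as /m/076k0 into a Knowledge Graph URI."""
    # Reject anything that is not a MID, e.g. historical /en/... IDs.
    if not mid.startswith("/m/"):
        raise ValueError("not a Freebase MID: %r" % mid)
    return KG_PREFIX + mid
```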

@edsu
Member

edsu commented Sep 10, 2016

Maybe this is the push I need to move over to using Wikidata....

@danbri

danbri commented Sep 10, 2016

I don't know but I'll see what I can find out

@danbri

danbri commented Sep 10, 2016

(and +1 for Wikidata, regardless)

@danbri

danbri commented Sep 10, 2016

From a quick guess, is it only returning entities whose types are in https://developers.google.com/knowledge-graph/ (and mapped there to schema.org)?

@edsu
Member

edsu commented Sep 11, 2016

Hmm, that does seem to be the case? Here are the types returned in the first 200 results when searching for 'semantic web' from the search API:

% curl --silent 'https://kgsearch.googleapis.com/v1/entities:search?query=semantic+web&key=AIzaSyDnh2jo5mhnf1EyIs2VQwc9H_bq1_RAgsE&limit=200&indent=True' | jq -r '.itemListElement[].result["@type"][]' - | sort | uniq -c | sort -rn
 124 Thing
  64 Person
  26 Organization
  21 Corporation
  20 Book
   8 Place
   4 EducationalOrganization
   3 CollegeOrUniversity
   1 Movie
   1 CivicStructure
   1 BookSeries
   1 AdministrativeArea

Unfortunately it seems like a lot of terms used to tag jobs in shortimer are rendered invisible in the KG search api ...
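
For anyone who would rather run the same tally from Python than from jq, here is a minimal sketch. It assumes the response has already been fetched and decoded, and that each `result` carries `@type` as a list, as the jq expression above implies. The name `count_types` is hypothetical.

```python
from collections import Counter


def count_types(response):
    """Tally every @type value across the results of one search response."""
    counts = Counter()
    for element in response.get("itemListElement", []):
        # A result with several types is counted once under each type.
        counts.update(element.get("result", {}).get("@type", []))
    return counts.most_common()
```

Because a result can carry several types at once, the counts can add up to more than the number of results, which matches the listing above.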

edsu added a commit that referenced this issue Sep 22, 2016
this is step one in moving from freebase to wikidata. I added wikidata_id
to the Employer, Location and Subject models. Then I added a migration to
look up the existing entities in Wikidata using Wikidata's SPARQL endpoint.
The matching logic thus far is:

1. Look up entity using the Freebase ID
2. Use the name of the entity to derive the Wikipedia URL and look that up
3. Search for the label

The next step is to purge entities that don't have Wikidata IDs, and then
to create new suggest functionality that uses Wikidata instead of Freebase.

refs #38
refs #57
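
The matching logic listed in the commit above could be sketched roughly as follows. This is not the migration code from the wikidata branch: it assumes the three steps are tried in order until one matches, that the Wikipedia title is the entity name with underscores, and that a caller-supplied `run_query` function sends the SPARQL and returns a list of item IDs. All function names are hypothetical, and the URL encoding Wikidata expects for unusual titles is not checked here.

```python
from urllib.parse import quote


def _literal(text):
    # Escape backslashes and quotes so the name is a valid SPARQL string.
    return text.replace("\\", "\\\\").replace('"', '\\"')


def by_freebase_id(mid):
    # P646 is the Wikidata property that holds a Freebase ID.
    return 'SELECT ?item WHERE { ?item wdt:P646 "%s" } LIMIT 1' % _literal(mid)


def by_wikipedia_url(name):
    # Assumption: the English Wikipedia title is the name with underscores.
    url = "https://en.wikipedia.org/wiki/" + quote(name.replace(" ", "_"))
    return "SELECT ?item WHERE { <%s> schema:about ?item } LIMIT 1" % url


def by_label(name):
    return 'SELECT ?item WHERE { ?item rdfs:label "%s"@en } LIMIT 1' % _literal(name)


def find_wikidata_id(name, mid, run_query):
    """Try each strategy in order and return the first item ID found."""
    queries = []
    if mid:
        queries.append(by_freebase_id(mid))
    queries.append(by_wikipedia_url(name))
    queries.append(by_label(name))
    for query in queries:
        hits = run_query(query)
        if hits:
            return hits[0]
    return None
```

Entities for which `find_wikidata_id` returns `None` would be the ones the next step proposes to purge.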
@edsu
Member

edsu commented Sep 22, 2016

I've been doing some preliminary work trying to migrate things to Wikidata. If you are interested you can track the work over on the wikidata branch.

@edsu
Member

edsu commented Sep 25, 2016

Wikidata does offer an autosuggest API, but it doesn't allow you to limit by particular entity types (locations, organizations, etc). This leads to a lot of noise when looking things up. I also tried using the SPARQL endpoint with regex filters, but it seemed very unstable: there were lots of 502 errors. Perhaps that was just something else going on at the time, but it doesn't inspire much confidence as a foundation to build on.

Actually, it does look like other people were experiencing problems.

@edsu
Member

edsu commented Oct 4, 2016

So, even with the Wikidata SPARQL endpoint back to functioning normally, it can still take multiple seconds for regex queries (what is needed for autosuggest) to come back. Unfortunately this won't be good enough. The wbsearchentities API call is fast, but it doesn't return much information and can't be limited to entities of a particular type (Locations, Organizations, etc).

So, my current thinking is to use the entities that have already been collected in jobs.code4lib.org and run autosuggest against them, and let people enter new entities as needed. This will have the downside that they aren't mapped to Google Knowledge Graph or Wikidata, but I just don't have the cycles to do that at the moment...and the site risks dying completely if it's not possible to post new jobs.
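
A minimal sketch of what autosuggest over the already-collected entities might look like, assuming the names fit in memory and that prefix matching is enough. The class name `LocalSuggest` is hypothetical and not part of shortimer.

```python
from bisect import bisect_left


class LocalSuggest:
    """Case-insensitive prefix lookup over names already in the database."""

    def __init__(self, names):
        # Sort by lowercased name so each prefix maps to one contiguous run.
        self._entries = sorted((name.lower(), name) for name in set(names))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=10):
        prefix = prefix.lower()
        # Binary search finds the first key that could start with the prefix.
        start = bisect_left(self._keys, prefix)
        matches = []
        for key, name in self._entries[start:]:
            if not key.startswith(prefix) or len(matches) >= limit:
                break
            matches.append(name)
        return matches
```

An entity that someone types and that is not in the list would simply be added as a new, unmapped name, which is the trade-off described above.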

@sprater

sprater commented Oct 12, 2016

Could the Geonames service be used to look up institutions and locations? It has a rich and snappy API, and support for linked data: http://www.geonames.org/

@edsu
Member

edsu commented Oct 18, 2016

It could, but that's only part of the puzzle. Unfortunately I don't have the bandwidth to fully address this problem. I'm planning on shutting the site down on November 1st after making static snapshots of the data and website available on Internet Archive.

@darvid7

darvid7 commented Aug 16, 2018

Hi! Sorry to ping this thread. I came across this while trying to figure out how to map Freebase MIDs to their entities without downloading and searching the 200 GB data dump.
Does anyone know if the Google Knowledge Graph API contains Freebase MIDs and whether it can be queried using them?
Thanks!

@danbri

danbri commented Aug 16, 2018 via email

@tfmorris

tfmorris commented Oct 7, 2018

A late reply to the late question (I apparently had this accidentally muted - yay, gmail keyboard shortcuts).

The Freebase MIDs were retained in the Google Knowledge Graph and can be used for lookups. The /g IDs (as opposed to the /m IDs, which are MIDs) post-date Freebase. As @danbri mentioned, some of them have been mapped to Wikidata entities, but only a small fraction of them. The Google Knowledge Graph will have many more (but the mapping to Wikidata is potentially more useful, if it exists).
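
A hedged Python sketch of such a lookup, building the request URL only. The endpoint is the one used in the curl call earlier in this thread; the `ids` parameter comes from Google's Knowledge Graph Search API reference rather than from this thread, and `kg_lookup_url` is a hypothetical name.

```python
from urllib.parse import urlencode

KG_SEARCH = "https://kgsearch.googleapis.com/v1/entities:search"


def kg_lookup_url(mids, api_key):
    """Build a Knowledge Graph Search URL that looks entities up by ID."""
    # doseq=True repeats the ids parameter once per MID.
    params = {"ids": list(mids), "key": api_key}
    return KG_SEARCH + "?" + urlencode(params, doseq=True)
```

For example, `kg_lookup_url(["/m/076k0"], api_key)` would ask for the "Semantic Web" entity mentioned earlier, given a valid API key.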


6 participants