Populate internal country database using semantic web #14

Open
cgendreau opened this issue May 16, 2013 · 4 comments

@cgendreau
Contributor

It would be interesting to expand narwhal so that it can build an up-to-date, well-maintained knowledge base of country names, their alternative representations (possibly multilingual) and mappings to known misspellings, using linked open data (Semantic Web).

This could be done starting from a Semantic Web URI, something like:
http://dbpedia.org/page/Category:Member_states_of_the_United_Nations

A country could then be identified with a URI such as http://dbpedia.org/resource/Canada
The name of a country in different languages could be populated using "owl:sameAs".
The known misspellings could be handled using SKOS.

For performance reasons, we'd like this thesaurus to be embedded in the library, but with the capacity to be refreshed periodically with data pulled from external resources (as is currently the case through the gbif-parser).
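As a rough illustration of the "embedded, but refreshable" idea, here is a minimal Python sketch (not narwhal code) that turns a SPARQL result set into an in-memory label table. The sample data is hand-written in the W3C SPARQL JSON results format and the function name `build_label_index` is hypothetical; a real refresh would fetch the results from an endpoint and ship the resulting table as a snapshot inside the library.

```python
import json

# Assumption: hand-written sample in the W3C SPARQL JSON results format.
# A real refresh would download this from a SPARQL endpoint instead.
SAMPLE_RESULTS = json.loads("""
{
  "head": {"vars": ["country", "label"]},
  "results": {"bindings": [
    {"country": {"type": "uri", "value": "http://dbpedia.org/resource/Canada"},
     "label": {"type": "literal", "xml:lang": "en", "value": "Canada"}},
    {"country": {"type": "uri", "value": "http://dbpedia.org/resource/Canada"},
     "label": {"type": "literal", "xml:lang": "de", "value": "Kanada"}},
    {"country": {"type": "uri", "value": "http://dbpedia.org/resource/Germany"},
     "label": {"type": "literal", "xml:lang": "fr", "value": "Allemagne"}}
  ]}
}
""")


def build_label_index(results):
    """Map each country URI to a {language tag: label} dictionary."""
    index = {}
    for row in results["results"]["bindings"]:
        uri = row["country"]["value"]
        # Plain literals may carry no language tag; store them under "".
        lang = row["label"].get("xml:lang", "")
        index.setdefault(uri, {})[lang] = row["label"]["value"]
    return index


# The embedded snapshot is just this dictionary; refreshing means rebuilding it.
LABELS = build_label_index(SAMPLE_RESULTS)
print(LABELS["http://dbpedia.org/resource/Canada"])
```

Keeping the snapshot as a plain dictionary means lookups never touch the network; only the periodic refresh does.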

Benefits:

  • Access to the different labels used for a concept in different
    languages (see rdfs:label at http://dbpedia.org/page/Canada).
  • Distinguish a country name in another language from a typo, so that
    names in other languages are not reported as errors.
  • Information about where the country is (e.g. continent, hemisphere)
    without any geospatial query.
  • Opens the door to validation using dates (think Russia vs. USSR).
  • Uses Semantic Web standards, allowing biodiversity applications to benefit from them in the near future.
  • The same concept can be expanded to states, provinces and municipalities.
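
To make the "other language vs. typo" distinction concrete, here is a small Python sketch under stated assumptions: the labels and misspellings are illustrative values typed in by hand (in practice they might come from rdfs:label and something like skos:hiddenLabel), and the names `normalize`, `build_lookup` and `classify` are hypothetical, not part of narwhal.

```python
CANADA = "http://dbpedia.org/resource/Canada"

# Illustrative data only: valid labels per language, and known misspellings.
LABELS = {CANADA: {"en": "Canada", "de": "Kanada", "es": "Canadá"}}
MISSPELLINGS = {CANADA: ["Cananda", "Canda"]}


def normalize(name):
    """Collapse whitespace and ignore case so lookups are forgiving."""
    return " ".join(name.split()).casefold()


def build_lookup(labels, misspellings):
    """Build one table from normalized name to (uri, kind, language)."""
    lookup = {}
    for uri, variants in misspellings.items():
        for variant in variants:
            lookup[normalize(variant)] = (uri, "misspelling", None)
    # Labels are added last, so a valid label wins over a same-spelled typo.
    for uri, by_lang in labels.items():
        for lang, label in by_lang.items():
            lookup[normalize(label)] = (uri, "label", lang)
    return lookup


LOOKUP = build_lookup(LABELS, MISSPELLINGS)


def classify(name):
    """Return (uri, kind, language); kind is 'label', 'misspelling' or 'unknown'."""
    return LOOKUP.get(normalize(name), (None, "unknown", None))


print(classify("Kanada"))     # valid label in another language: not an error
print(classify(" cananda "))  # known misspelling: report and suggest the URI
print(classify("Atlantis"))   # not in the thesaurus at all
```

A validator built this way would only flag the "misspelling" and "unknown" cases, and could still resolve a misspelling to the right country.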
@rdmpage

rdmpage commented May 17, 2013

Have you had a look at GeoNames? Lots of Semantic Web goodness if that's your thing; see http://www.geonames.org/ontology/documentation.html

@tucotuco

As sources for names and synonyms, there are also the Getty Thesaurus of Geographic Names (http://www.getty.edu/vow/TGNSearchPage.jsp) and GADM (http://www.gadm.org/).

For misspellings, I have accumulated nearly 5000 variants on values mapped to the Darwin Core term country and have provided the corresponding ISO 3166-2 country code for all of the ones for which that is possible. This list is growing as we pass additional data through validation for VertNet.

@peterdesmet
Member

Just stumbled upon this tool, which might be of help here: http://okfnlabs.org/blog/2013/05/16/nomenklatura-matching-service-reconciliation-made-easy.html

@cgendreau
Contributor Author

I think this is worth mentioning: http://community.gbif.org/pg/file/read/34059/
