This repository has been archived by the owner on Nov 10, 2022. It is now read-only.

Reconcile by coordinates #101

Open
wetneb opened this issue Mar 3, 2021 · 5 comments

Comments

@wetneb
Owner

wetneb commented Mar 3, 2021

Posted by @VojtechDostal at OpenRefine/OpenRefine#3663:

Reconciliation by string matching is useful in many cases, but it is currently (to my knowledge) impossible to find the closest items to the matched object.

Proposed solution

Use case: I have a list of buildings with coordinates (lat, lon). I'd like to find the closest item(s) to those coordinates. Additionally, I'd like to be able to filter results by class (subclass of: building) and suggest only those. High-confidence matches (very close, with corresponding names) could be auto-matched.

Alternatives considered

I don't know of any alternative way/hack to load the closest item to given coordinates. However, the Wikidata SPARQL service has a distance service and I think there is also a special API call for exactly this.
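To make the use case concrete, here is a minimal sketch of "closest item to given coordinates" done locally with the haversine formula. It is not how the Wikidata distance service or any reconciliation service is implemented; the function names and the candidate tuple layout are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def closest_items(lat, lon, candidates, limit=3):
    """Return up to `limit` (distance_km, item_id) pairs, nearest first.

    `candidates` is an iterable of (item_id, lat, lon) tuples (hypothetical layout).
    """
    scored = [(haversine_km(lat, lon, clat, clon), item_id)
              for item_id, clat, clon in candidates]
    return sorted(scored)[:limit]
```

A class filter (subclass of: building) would be applied to the candidate list before scoring, and a distance threshold could decide which matches are safe to auto-match.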

@gitonthescene

gitonthescene commented Apr 7, 2021

FWIW, if you don't mind running your own reconciliation service, I've just written a geo scoring plugin for csv-reconcile.

With this you could, say, run a SPARQL query to find the coordinate locations of the points you're looking to match against, export the result as a TSV file, and use that file to run csv-reconcile.

You can get the service up and running as simply as the following:

$ python -m venv serverenv
$ source serverenv/bin/activate
$ python -m pip install csv-reconcile
$ python -m pip install csv-reconcile-geo
$ csv-reconcile --init-db query.tsv item coord --scorer geo 

Here item is the name of the column containing the QIDs and coord is the name of the coordinate column in well-known text (WKT) format, the default export format for coordinates.
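For readers unfamiliar with WKT, the coordinate column holds values of the form Point(longitude latitude), with longitude first. The sketch below shows one way to read such a value into a (lat, lon) pair; `parse_wkt_point` is a hypothetical helper, not part of csv-reconcile or csv-reconcile-geo, and it only handles simple 2D points.

```python
import re

# Matches e.g. "Point(14.42 50.08)"; case-insensitive, tolerant of extra spaces.
_POINT_RE = re.compile(r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE)


def parse_wkt_point(text):
    """Parse a 2D WKT point and return (lat, lon) as floats.

    WKT stores the x coordinate (longitude) first, so the order is swapped here.
    Raises ValueError if the text is not a simple point.
    """
    match = _POINT_RE.match(text)
    if match is None:
        raise ValueError(f"not a WKT point: {text!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon
```

The coordinates in any example call are placeholders; the main thing to watch for is the longitude-first order, which is the reverse of the usual "lat, lon" convention.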

This was just my first pass at it. There's certainly room for improvement, but it may suit your immediate needs.

@VojtechDostal

@gitonthescene Sounds great! I'll give it a shot at the first opportunity

@VojtechDostal

VojtechDostal commented Apr 13, 2021

@gitonthescene Could you please help me with this? I am a bit disoriented and not sure I correctly understand the overall idea of running 'my own' reconciliation service. Am I right in assuming that I need to load file number 1 into OpenRefine, load file number 2 on the command line via the commands above, add the reconciliation service "http://127.0.0.1:5000/reconcile" to OpenRefine, and then reconcile?
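In that workflow OpenRefine is the client of the local service. As a rough sketch, assuming the service follows the standard Reconciliation Service API, a client sends a form-encoded `queries` parameter holding a JSON batch. The helper name and the coordinate value are hypothetical, and sending a WKT point as the query string is an assumption about what the geo scorer expects. Nothing is sent over the network here.

```python
import json
from urllib.parse import parse_qs, urlencode

ENDPOINT = "http://127.0.0.1:5000/reconcile"  # address quoted in the question above


def build_reconcile_body(values, limit=3):
    """Build the form-encoded body of a batch reconciliation request.

    Each value becomes one query keyed q0, q1, ... (hypothetical helper).
    """
    queries = {f"q{i}": {"query": value, "limit": limit}
               for i, value in enumerate(values)}
    return urlencode({"queries": json.dumps(queries)})


# Placeholder coordinate in WKT form (an assumption, see above).
body = build_reconcile_body(["Point(14.42 50.08)"])

# Round-trip the body to show what the service would receive.
decoded = json.loads(parse_qs(body)["queries"][0])
```

OpenRefine builds and posts such requests itself once the service URL has been added, so this is only meant to show what travels between the two.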

I think I was able to start virtualenv on my system (I am on Windows and "source" did not work, but I think I was able to find a solution at https://stackoverflow.com/questions/8921188/issue-with-virtualenv-cannot-activate) and then I was able to install csv-reconcile and csv-reconcile-geo. However, this is what I get when I run the program:

(venv) C:\Users\vojte\Downloads>csv-reconcile --init-db query.tsv item coord --scorer geo
c:\users\vojte\venv\lib\site-packages\normality\__init__.py:72: ICUWarning: Install 'pyicu' for better text transliteration.
  text = ascii_text(text)
Traceback (most recent call last):
  File "C:\Users\vojte\AppData\Local\Programs\Python\Python37-32\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\vojte\AppData\Local\Programs\Python\Python37-32\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\vojte\venv\Scripts\csv-reconcile.exe\__main__.py", line 7, in <module>
  File "c:\users\vojte\venv\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\vojte\venv\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\vojte\venv\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\vojte\venv\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\vojte\venv\lib\site-packages\csv_reconcile\__init__.py", line 210, in main
    initdb.init_db()
  File "c:\users\vojte\venv\lib\site-packages\csv_reconcile\initdb.py", line 76, in init_db
    (mid, word) + tuple(matchFields))
sqlite3.IntegrityError: UNIQUE constraint failed: reconcile.id

My query.tsv is from https://w.wiki/3BV9

What do you think is happening? Sorry to spam the issue with my questions
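The `UNIQUE constraint failed: reconcile.id` error means the same id was inserted into the database twice. One plausible cause, which is an assumption and not confirmed in this thread, is that query.tsv lists some item on more than one row, for example an item with several coordinate values. A minimal sketch for keeping only the first row per id before running --init-db (the function names and the output file are hypothetical):

```python
import csv


def first_row_per_id(rows, id_column):
    """Yield each row whose id has not been seen before, preserving order."""
    seen = set()
    for row in rows:
        key = row[id_column]
        if key in seen:
            continue
        seen.add(key)
        yield row


def dedupe_tsv(src_path, dst_path, id_column="item"):
    """Copy a TSV file, dropping rows that repeat an earlier id. Returns rows kept."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(first_row_per_id(reader, id_column))
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping the first row is an arbitrary choice; for items that really have several coordinates it may make more sense to resolve the duplicates in the SPARQL query itself.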

@wetneb
Owner Author

wetneb commented Apr 13, 2021

Perhaps this discussion could be moved to the csv-reconcile project? Unrelated discussions might put people off :)

@VojtechDostal

created as new issue here: gitonthescene/csv-reconcile#3
sorry for this @wetneb :)
