Filtering #140

Open
VicXue opened this issue Jul 11, 2019 · 11 comments

@VicXue

VicXue commented Jul 11, 2019

Hi,

I'm currently working on a project that requires me to retrieve tree areas from aerial images, and I found this project really promising. However, I also found that some of the OpenStreetMap areas that end up as masks contain false positives. For example, mask data imported from LINZ's topographic maps is usually shifted from the actual tree areas. With the current options in the config.json file, is there any way to filter such data out of my training dataset? I don't want too many false positives.

Here is my current config file:
"country": "new_zealand", "bounding_box": [175.1848,-37.8868,175.4386,-37.6892], "zoom": 18, "classes": [ { "name": "Woods", "filter": ["==", "natural", "wood"] } ], "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=MY_ACCESS_TOKEN", "background_ratio": 1, "ml_type": "segmentation"

Thanks

@drewbo
Contributor

drewbo commented Jul 11, 2019

If you need to make manual changes to the vector data prior to creating the raster label masks, I'd recommend a slightly different workflow:

  • use osmium to extract your area of interest from the country file:
osmium extract -p aoi.geojson new-zealand-latest.osm.pbf -o my_aoi.pbf
  • use osmium again to filter to just the tags you'd like:
osmium tags-filter my_aoi.pbf w/natural=wood -o woods.pbf
  • convert the pbf to GeoJSON with ogr2ogr:
ogr2ogr -f GeoJSON woods.geojson woods.pbf multipolygons
  • use your favorite tool to edit the GeoJSON (I'm fond of http://geojson.io)
  • save the edited GeoJSON and use it as the input file to label-maker via the geojson parameter in config.json
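If you only need to drop a handful of known-bad polygons, the editing step can also be scripted instead of done by hand. The following is a minimal Python sketch, not part of label-maker. It assumes the GeoJSON came from the ogr2ogr step above and that each feature carries an `osm_id` or `osm_way_id` property (check the property names in your own file); `drop_features` and `clean_file` are hypothetical helper names.

```python
import json
from typing import Iterable


def drop_features(geojson: dict, bad_ids: Iterable) -> dict:
    """Return a copy of a FeatureCollection without the listed OSM ids.

    Assumption: ids live in the "osm_id" (relations) or "osm_way_id"
    (closed ways) properties; adjust the names if your export differs.
    """
    bad = {str(i) for i in bad_ids}
    kept = []
    for feature in geojson.get("features", []):
        props = feature.get("properties") or {}
        ids = {str(props[k]) for k in ("osm_id", "osm_way_id") if props.get(k) is not None}
        if ids & bad:
            continue  # skip features you flagged as misaligned
        kept.append(feature)
    return {**geojson, "features": kept}


def clean_file(src: str, dst: str, bad_ids: Iterable) -> int:
    """Write a cleaned copy of src to dst and return how many features were removed."""
    with open(src) as f:
        data = json.load(f)
    cleaned = drop_features(data, bad_ids)
    with open(dst, "w") as f:
        json.dump(cleaned, f)
    return len(data.get("features", [])) - len(cleaned["features"])
```

For example, `clean_file("woods.geojson", "woods_clean.geojson", ["123456789"])` (with a placeholder way ID) writes a cleaned copy that you could then point the geojson parameter at.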

Sorry if that's a lot of steps! Let me know if this works out. This would be a nice example to feature if you get the workflow working.

@VicXue
Author

VicXue commented Jul 11, 2019

Thanks for the hint. I have a few quick questions before I spend time on that, just to make things clear, since I'm not really familiar with this topic.

  1. This workflow sounds like it is all about manual changes. Is there any automatic way to filter out masks contributed by a specific user? Or can we use the tag information in OpenStreetMap to filter out certain features (masks)?

  2. What is the main advantage of this workflow compared with making the changes directly in OpenStreetMap using its web editor?

Thank you

@drewbo
Contributor

drewbo commented Jul 11, 2019

You can use osmium's tags-filter command to filter on any tag, as long as that information is present in the data.

I'd only make the change directly in OSM if (1) these are actually inaccurate labels (not just false positives for your specific task) and (2) you can wait a few days for the change to appear in either the OSM QA tiles or a pbf extract.
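If you'd rather filter after converting to GeoJSON, something like the following Python sketch works on the ogr2ogr output. It assumes GDAL's OSM driver wrote a few tags as their own columns (such as `natural`) and packed the remaining tags into an `other_tags` string of `"key"=>"value"` pairs. The helper names are hypothetical, and the `source` tag and `LINZ` value in the usage line are illustrative placeholders, not values confirmed to exist in the New Zealand data.

```python
import re
from typing import Callable, Dict, Optional

# One "key"=>"value" pair; backslash escapes are allowed inside the quotes.
_PAIR = re.compile(r'"((?:[^"\\]|\\.)*)"=>"((?:[^"\\]|\\.)*)"')


def _unescape(s: str) -> str:
    return re.sub(r"\\(.)", r"\1", s)


def parse_other_tags(raw: Optional[str]) -> Dict[str, str]:
    """Parse an (assumed) HSTORE-style other_tags string into a dict."""
    if not raw:
        return {}
    return {_unescape(k): _unescape(v) for k, v in _PAIR.findall(raw)}


def feature_tags(feature: dict) -> Dict[str, str]:
    """Merge the separate property columns and other_tags into one dict."""
    props = dict(feature.get("properties") or {})  # copy so the input is not mutated
    tags = parse_other_tags(props.pop("other_tags", None))
    tags.update({k: str(v) for k, v in props.items() if v is not None})
    return tags


def filter_by_tags(geojson: dict, reject: Callable[[Dict[str, str]], bool]) -> dict:
    """Return a copy of the collection without features for which reject(tags) is True."""
    kept = [f for f in geojson.get("features", []) if not reject(feature_tags(f))]
    return {**geojson, "features": kept}
```

For example, `filter_by_tags(data, lambda tags: "LINZ" in tags.get("source", ""))` would drop every feature whose (hypothetical) source tag mentions LINZ.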

@VicXue
Author

VicXue commented Jul 15, 2019

@drewbo Hi Drew, I have tried to modify my personal dataset using your workflow. There are a few things that I want to mention and ask.

  1. Thank you for recommending the osmium-tool. It is very useful for building a personal dataset, especially when you want to filter data by tags.

  2. I want to filter data by contributor, and I found that 'osmium changeset-filter' can be used for this purpose, since it offers a -u option for filtering changesets by the contributor's username. However, the pbf file I downloaded from https://download.geofabrik.de/australia-oceania/new-zealand.html does not seem to contain metadata such as usernames. Is there an alternative way to filter data by username?

  3. geojson.io seems to be the popular option for editing GeoJSON files. However, I'm not sure why it tries to make every multipolygon editable when the user clicks the edit button. When I tried to modify polygons in my area of interest, it simply froze and I couldn't change any polygon. I'm not sure if this only happens to me.

  4. I mentioned in Tile images sometimes are corrupted #141 that tile images downloaded by label-maker are sometimes corrupted. The same thing happens in geojson.io when you use satellite imagery, which sometimes makes it impossible to modify polygon masks in the corrupted regions.

@drewbo
Contributor

drewbo commented Jul 16, 2019

@VicXue

  • For 2, this is a consequence of GDPR: usernames are stripped from the public download, which doesn't require signing up. There's another link where you can sign up and download a pbf with all the metadata (h/t @geohacker).
  • For 3, I'm not sure exactly what is happening but you can try geojson.net or QGIS to see if that makes things easier.
  • For 4, see the update in Tile images sometimes are corrupted #141
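Once you have an extract that still includes metadata, filtering by contributor can be sketched in Python on the exported GeoJSON. This assumes each feature has an `osm_user` property, which is an assumption: GDAL's OSM driver only writes user fields if they are enabled in its osmconf.ini. Note also that OSM metadata records the user who made the latest version of each element, not necessarily the person who originally imported it. `drop_by_user` is a hypothetical helper; it raises an error when no feature has the field, which is what you would expect from a metadata-stripped public extract.

```python
from typing import Iterable


def drop_by_user(geojson: dict, users: Iterable[str], field: str = "osm_user") -> dict:
    """Drop features whose latest editor (per the `field` property) is in `users`."""
    features = geojson.get("features", [])
    if features and not any(field in (f.get("properties") or {}) for f in features):
        # Typical symptom of a public extract with user metadata stripped.
        raise ValueError(f"no feature has a {field!r} property; was user metadata removed?")
    blocked = set(users)
    kept = [f for f in features if (f.get("properties") or {}).get(field) not in blocked]
    return {**geojson, "features": kept}
```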

@VicXue
Author

VicXue commented Jul 30, 2019

@drewbo
I have managed to follow the pipeline and use my own data for training. However, when I include the geojson parameter in the config.json file, the download command throws an OSError:

Saving QA tiles to /filter.mbtiles

Traceback (most recent call last):
  File "/usr/local/bin/label-maker", line 11, in <module>
    load_entry_point('label-maker==0.5.1', 'console_scripts', 'label-maker')()
  File "/usr/local/lib/python3.6/dist-packages/label_maker/main.py", line 97, in cli
    download_mbtiles(dest_folder=dest_folder, **config)
  File "/usr/local/lib/python3.6/dist-packages/label_maker/download.py", line 32, in download_mbtiles
    for line in r:
  File "/usr/lib/python3.6/gzip.py", line 374, in readline
    return self._buffer.readline(size)
  File "/usr/lib/python3.6/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.6/gzip.py", line 463, in read
    if not self._read_gzip_header():
  File "/usr/lib/python3.6/gzip.py", line 411, in _read_gzip_header
    raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'<?')

However, if I first download the mbtiles without the geojson parameter in config.json and then run the labels command with the geojson parameter included, the program runs without error and the masks are retrieved successfully. Is this supposed to happen? Also, at which step does label-maker use the geojson file? Sorry for asking such basic questions. I have attached my config.json file and the GeoJSON file I created to this reply.
config.tar.gz
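For context on the traceback: gzip data always starts with the two bytes `1f 8b`, while the response here began with `<?`, which looks like the start of an XML document. That is plausibly an error response, given that (as drewbo notes later in this thread) the geojson option overwrites the country parameter. A minimal Python sketch of a clearer check before decompressing, using hypothetical helper names, could look like this:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def describe_payload(head: bytes) -> str:
    """Guess what a downloaded payload is from its first bytes."""
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    if head.lstrip().startswith(b"<"):
        return "markup"  # XML/HTML, e.g. an error document
    return "unknown"


def gunzip_or_explain(data: bytes) -> bytes:
    """Decompress gzip data, or fail with a message showing what arrived instead."""
    kind = describe_payload(data[:16])
    if kind != "gzip":
        raise ValueError(f"expected gzip data but got {kind}: {data[:80]!r}")
    return gzip.decompress(data)
```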

@drewbo
Contributor

drewbo commented Jul 30, 2019

@VicXue the geojson parameter is meant for reading from a local file instead of using the download command to obtain QA tiles, so the two should be used separately.

@VicXue
Author

VicXue commented Jul 30, 2019

@drewbo Does this mean I need to prepare separate config.json files for different commands? Is there a reason why the download command takes the geojson parameter into account in the first place? Couldn't it just ignore parameters it doesn't need?

@drewbo
Contributor

drewbo commented Jul 30, 2019

Ah, rereading the thread, I now understand the problem better. The geojson feature is a bit of a hack: it assumes it isn't being used for downloads and overwrites the country parameter. So for now I'd use two different config.json files, and I'll also open this as a bug.
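Until that is fixed, one way to keep the two config files in sync is to generate both from a single base file. This is a minimal Python sketch, not a label-maker feature; the output names `config_download.json` and `config_label.json` are hypothetical.

```python
import json
from pathlib import Path
from typing import Tuple


def split_config(base: dict) -> Tuple[dict, dict]:
    """Return (download_config, labels_config) derived from one base config."""
    download_cfg = {k: v for k, v in base.items() if k != "geojson"}  # download must not see geojson
    labels_cfg = dict(base)  # labels keeps the local GeoJSON input
    return download_cfg, labels_cfg


def write_configs(base_path: str, out_dir: str) -> None:
    """Write the two (hypothetically named) config files into out_dir."""
    base = json.loads(Path(base_path).read_text())
    download_cfg, labels_cfg = split_config(base)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "config_download.json").write_text(json.dumps(download_cfg, indent=2))
    (out / "config_label.json").write_text(json.dumps(labels_cfg, indent=2))
```

You would then pass each file to its own command, e.g. `label-maker download --dest <folder> --config config_download.json` and `label-maker labels --dest <folder> --config config_label.json` (check `label-maker --help` for the exact flags in your version).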

@nnRichterNN

Hi,
is it possible that line 75 of label.py should be
"if op.exists(filtered_geo):" instead of "if not op.exists(filtered_geo):"
when using a geojson file for a polygon bounding box?

@drewbo
Contributor

drewbo commented Sep 6, 2019

@nnRichterNN I think it's correct as is but it's a little bit of a hack:

  • In the standard case (starting from QA tiles), we won't already have a GeoJSON, so we use tippecanoe-decode to create one
  • In the geojson case, we skip this portion because we're counting the input GeoJSON as the filtered geo
  • In both cases, after that step, we have a file which we can convert to mbtiles using tippecanoe
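The branching described above can be outlined in a few lines of Python. This is a simplified, hypothetical sketch for illustration only: `plan_label_steps` returns descriptions of the steps instead of running tippecanoe-decode or tippecanoe, and it is not the actual code in label.py.

```python
import os.path as op
from typing import List


def plan_label_steps(filtered_geo: str) -> List[str]:
    """Hypothetical outline of the label-preparation branch."""
    steps = []
    if not op.exists(filtered_geo):
        # Standard case: no GeoJSON yet, so decode the QA tiles into one.
        steps.append("tippecanoe-decode QA tiles -> " + filtered_geo)
    # geojson case: the user's file already is filtered_geo, so decoding is skipped.
    steps.append("tippecanoe " + filtered_geo + " -> mbtiles")
    return steps
```

With the geojson option the input file already exists at `filtered_geo`, so only the tippecanoe step remains, which is why the check reads `if not op.exists(filtered_geo):`.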
