Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[search][generator] External postcodes. #8179

Merged
merged 5 commits into from
May 22, 2024
Merged

[search][generator] External postcodes. #8179

merged 5 commits into from
May 22, 2024

Conversation

vng
Copy link
Member

@vng vng commented May 15, 2024

Postcodes for US and UK (GB).

Fixes #717
A partial fix for #1876

@vng vng added Search Search Generator Related to generator tool. labels May 15, 2024
@vng vng requested a review from biodranik May 15, 2024 01:18
@vng vng force-pushed the vng-postcodes branch 4 times, most recently from b0f8805 to d393dd9 Compare May 15, 2024 02:27
continue;

CHECK_EQUAL(valueMapping.size(), index, ());
keyValuePairs.emplace_back(search::NormalizeAndSimplifyString(fields[fieldIndices.m_postcodeIndex]),
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you capitalize all the letters in postcodes? Lowercase postcodes look strange:

i can't find any postcode formats where lowercase letters are used: https://en.wikipedia.org/wiki/List_of_postal_codes

(thank you so much for working on this, i think it'll make a lot of people happy!)

Copy link
Member

@biodranik biodranik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's betatest it.

@@ -232,6 +233,21 @@ void Trim(std::string_view & sv)
sv = {};
}

void Trim(std::string_view & s, std::string_view anyOf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: returning sv here and above may work slightly faster and have less braces.

s.remove_prefix(i);

i = s.find_last_not_of(anyOf);
ASSERT(i != std::string_view::npos, ());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This assert looks unnecessary.

@@ -648,9 +648,14 @@ bool BuildSearchIndexFromDataFile(std::string const & country, feature::Generate
writer->Seek(endOffset);
}

/// @todo FilesContainerW::Write with section overriding invalidates current container instance
/// (@see FilesContainerW::GetWriter), thus we can't make 2 consecutive Write calls.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be merged into the master separately ASAP?

@@ -76,6 +76,8 @@ NEED_BUILD_WORLD_ROADS: false
# FOOD_TRANSLATIONS_URL:
# SRTM_PATH:
# ISOLINES_PATH:

# Local path (not url!) to .csv files.
# UK_POSTCODES_URL:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# UK_POSTCODES_URL:
# UK_POSTCODES_PATH:

@@ -217,6 +217,48 @@ UNIT_TEST(FilesContainer_RewriteExisting)
FileWriter::DeleteFileX(fName);
}

/// @todo To make this test work, need to review FilesContainerW::GetWriter logic.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with this logic/test?

bool res = true;
if (topmostCountry == "United Kingdom" && !FLAGS_uk_postcodes_dataset.empty())
if (!FLAGS_uk_postcodes_dataset.empty() && strings::StartsWith(country, "UK_"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any idea on how to pass any other external databases with postal codes in the same format without introducing a command-line flag for each country? E.g. if I want to add swiss postal codes? Maybe simply pass a folder with files starting with prefixes matching our countries?

{
// Zip;City;State;Latitude;Longitude;Timezone;Daylight savings time flag;geopoint
// * Get data source from here: https://simplemaps.com/data/us-zips
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is all attribution present in data/copyright.html? @rtsisyk can you please double-check it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory we can use Nominatim and not simplemaps.com, but:

По поводу Nominatim dataset. Первый же entry по US оттуда: 00586,18.343697,-67.028434
Google Maps говорит что в этих координатах 00685
В simplemaps.com такого кода нет.
Google Maps такой код находит только в Африке.


namespace search
{
using namespace std;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meh...

@vng
Copy link
Member Author

vng commented May 22, 2024

@RedAuburn
Is "WC2" https://overpass-turbo.eu/s/1LQt a real postal code? Or it is used like a prefix?
I don't see it on the nominatim data source, only WC2A .. WC2R

@vng vng merged commit 2f1f05a into master May 22, 2024
15 checks passed
@vng vng deleted the vng-postcodes branch May 22, 2024 20:06
@RedAuburn
Copy link
Sponsor Member

RedAuburn commented May 23, 2024

@RedAuburn Is "WC2" https://overpass-turbo.eu/s/1LQt a real postal code? Or it is used like a prefix? I don't see it on the nominatim data source, only WC2A .. WC2R

It's a postcode area, which contains prefixes inside like WC2A etc. Doesn't need search support IMO 👍️ https://en.wikipedia.org/wiki/WC_postcode_area

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Generator Related to generator tool. Search Search
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support of UK post codes in the search
3 participants