Tags normalization #401
Hi @FrancescoManfredi, first of all thanks for your input and the blog post! Super fascinating. I personally don't have much time these days to tackle the issue, but if no one picks it up by mid-May I'll try to tackle it myself.
Many tags refer to the same concept with different wording, or with different casing/styling for the same words.
It might be a good idea to add a normalization pipeline for the tags in each company.
Here is a mapping from original to normalized tags in the form of a Python dict (easily convertible to any other format) that might be useful as a starting point: https://github.com/FrancescoManfredi/AIRV-analysis/blob/main/tags_repl.py
I'm the author of that mapping, and this is an invitation to make use of it in any way you prefer.
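To make the idea concrete, here is a rough sketch of what such a normalization step could look like. It is only an illustration: `TAGS_REPL` is a tiny made-up stand-in for the dict in `tags_repl.py` (the real keys and values may be shaped differently), and `normalize_tag` / `normalize_tags` are hypothetical helper names, not anything that exists in this project.

```python
# Hypothetical stand-in for the mapping in tags_repl.py; entries are invented.
TAGS_REPL = {
    "artificial intelligence": "ai",
    "a.i.": "ai",
    "machine-learning": "machine learning",
}


def normalize_tag(tag, mapping):
    """Return the normalized form of a single tag."""
    # Try the tag exactly as written first, in case the mapping
    # is keyed by the original spelling.
    if tag in mapping:
        return mapping[tag]
    # Otherwise collapse whitespace and ignore case before the lookup.
    cleaned = " ".join(tag.split()).casefold()
    return mapping.get(cleaned, cleaned)


def normalize_tags(tags, mapping):
    """Normalize a company's tags, dropping duplicates and keeping order."""
    seen = set()
    result = []
    for tag in tags:
        normalized = normalize_tag(tag, mapping)
        if normalized and normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


company_tags = ["AI", "Artificial Intelligence", " machine-learning ", "A.I."]
print(normalize_tags(company_tags, TAGS_REPL))
```

Tags that are missing from the mapping fall back to their lowercased, whitespace-collapsed form, so the mapping only needs to cover the cases where the wording itself differs.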