
Dashes in category/file names make retrieval in pycorpora difficult #236

Open
serin-delaunay opened this issue Nov 26, 2016 · 1 comment

Comments

@serin-delaunay · Contributor

serin-delaunay commented Nov 26, 2016

At the moment, corpora contains categories such as "film-tv" and files such as "materials/abridged-body-fluids". With tools like pycorpora, these names cause problems: the user cannot retrieve such files with the standard attribute syntax, e.g. pycorpora.category_name.file_name['key'], because "-" is not a legal character in Python identifiers.
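A quick way to see the problem, sketched with the standard library's ast and keyword modules: Python parses pycorpora.film-tv as a subtraction (pycorpora.film minus a variable tv), not as an attribute lookup. The helper invalid_names is hypothetical, not part of pycorpora or corpora; it simply flags names that cannot be written directly as attribute names.

```python
import ast
import keyword

# Attribute access with a dash is parsed as subtraction: (pycorpora.film) - tv
expr = ast.parse("pycorpora.film-tv", mode="eval").body
print(type(expr).__name__, type(expr.op).__name__)  # BinOp Sub


def invalid_names(names):
    """Hypothetical helper: return names unusable as Python attribute names."""
    # A name must be a valid identifier and must not be a reserved keyword
    return [n for n in names if not n.isidentifier() or keyword.iskeyword(n)]


print(invalid_names(["film-tv", "materials", "abridged-body-fluids", "tv_shows"]))
# ['film-tv', 'abridged-body-fluids']
```

A check like this could also serve the second option below, e.g. as a lint step over category and file names.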
In pycorpora I can work around this as follows:
getattr(pycorpora, 'film-tv').tv_shows['tv_shows']
pycorpora.materials.get_file('abridged-body-fluids')['abridged body fluids']
However, this isn't ideal. Either pycorpora and similar libraries should perform these workarounds internally (for instance, by translating "-" to "_"), or corpora should restrict category and file names to strings that are valid identifiers in common languages such as JavaScript, Python, and C.
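A minimal sketch of the first option, translating "_" back to "-" on attribute lookup. DashTolerantNamespace is a hypothetical class, not pycorpora's actual implementation (the fix referenced later in this thread may work differently), and the data below is placeholder content rather than real corpora files.

```python
class DashTolerantNamespace:
    """Hypothetical wrapper: exposes dashed names via underscore attribute access."""

    def __init__(self, entries):
        # entries maps real names (possibly containing "-") to values
        self._entries = dict(entries)

    def __getattr__(self, attr):
        # Called only when normal attribute lookup fails.
        # Read _entries via __dict__ so a half-built object cannot recurse.
        entries = self.__dict__.get("_entries", {})
        # Try the exact spelling first, then the underscore-to-dash translation
        for candidate in (attr, attr.replace("_", "-")):
            if candidate in entries:
                return entries[candidate]
        raise AttributeError(f"no entry named {attr!r}")

    def __dir__(self):
        # Advertise identifier-safe spellings so tab completion finds them
        safe = {name.replace("-", "_") for name in self._entries}
        return sorted(set(super().__dir__()) | safe)


# Placeholder data standing in for the real JSON files
corpora = DashTolerantNamespace({
    "film-tv": DashTolerantNamespace({"tv_shows": {"tv_shows": ["placeholder show"]}}),
    "materials": DashTolerantNamespace(
        {"abridged-body-fluids": {"abridged body fluids": ["placeholder fluid"]}}
    ),
})

print(corpora.film_tv.tv_shows["tv_shows"])
print(corpora.materials.abridged_body_fluids["abridged body fluids"])
```

One design caveat: the translation replaces every underscore, so a name that mixes dashes and underscores (say "a-b_c") cannot be reached this way, since a_b_c maps to "a-b-c". Trying the exact spelling first means a file whose real name already uses underscores, like tv_shows, always takes precedence.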
I've opened a similar issue in pycorpora: aparrish/pycorpora#11.

@hugovk · Collaborator

hugovk commented Nov 28, 2016

This has been fixed (but not yet released) in pycorpora. See aparrish/pycorpora#11 (comment).

Labels: none yet · Projects: none yet · Development: no branches or pull requests · 2 participants