Context for ngrams? #171
Comments
Hi @DTchebotarev, this is not currently a feature, but I appreciate that padding sequences is a common task in deep learning. I've been dragging my feet on getting DL models into textacy, but when I do, I'd expect to include useful adjacent functionality like this as well.
Padding sequences is common even outside deep learning: it gives more context to an n-gram (e.g. it marks the n-gram as text-initial).
I recently implemented something like this in a keyterm extraction algorithm: lines 247 to 251 in commit 794be59.
Is it possible to add context to ngram extraction?
For example, currently running
list(textacy.Doc('I like green eggs and ham.').to_terms_list(ngrams=3, as_strings=True))
returns a list
['-PRON- like green', 'like green egg', 'egg and ham']
But I would ideally like to have the option to specify something like
list(textacy.Doc('I like green eggs and ham.').to_terms_list(ngrams=3, as_strings=True, left_pad=True, right_pad=True))
and have it return something along the lines of
['<s2> <s1> -PRON-', '<s1> -PRON- like', '-PRON- like green', 'like green egg', 'egg and ham', 'and ham </s1>', 'ham </s1> </s2>']
I don't think this is possible in textacy currently, so I guess this is a feature request.
Also any ideas for a workaround are greatly appreciated :)
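One possible workaround, sketched below, is to extract the lemmatized tokens first and then pad the token list by hand before sliding a window over it. The `padded_ngrams` helper and the sentinel names (`<s1>`, `<s2>`, `</s1>`, `</s2>`) are hypothetical, following the example in this issue; they are not part of textacy's API.

```python
def padded_ngrams(tokens, n, left_pad=True, right_pad=True):
    """Yield n-grams (as space-joined strings) over a token list padded
    with sentinel markers, so edge n-grams keep positional context."""
    if left_pad:
        # e.g. for n=3: ["<s2>", "<s1>"] prepended before the first token
        tokens = ["<s%d>" % i for i in range(n - 1, 0, -1)] + tokens
    if right_pad:
        # e.g. for n=3: ["</s1>", "</s2>"] appended after the last token
        tokens = tokens + ["</s%d>" % i for i in range(1, n)]
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i : i + n])

# Lemmas as textacy/spaCy would produce them for the example sentence
tokens = ["-PRON-", "like", "green", "egg", "and", "ham"]
print(list(padded_ngrams(tokens, 3)))
```

This reproduces the desired output above, plus the interior trigram 'green egg and', which every full trigram enumeration of the sentence would include.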