I work with natural language processing and often need to know which words are used in a given corpus. Many dictionaries are composed of word stems, so the stems have to be extracted from sentences before the words can be looked up. Specific words can also be key clues or carry important information, which makes it necessary to extract the sentences that use them or to process sentence information based on them.
In this context, I have developed a class called AutoLemmatizer in stem/wordnet.py that automatically performs tokenization and part-of-speech-based lemmatization, returning the lemmas of all words used in a sentence.
I also considered converting 'n't' to 'not', but have not implemented it because I am not sure it is a good idea.
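To make the trade-off concrete, here is a minimal sketch of what that conversion could look like. It is not part of this PR; `expand_negation` is a hypothetical helper, and it assumes Treebank-style tokens in which contractions are already split (for example "don't" into "do" and "n't").

```python
def expand_negation(tokens):
    """Replace the contraction token "n't" with "not" (hypothetical helper).

    Assumes the input is already tokenized Treebank-style, so that
    "don't" arrives as ["do", "n't"].
    """
    return ["not" if token.lower() == "n't" else token for token in tokens]


# Regular case: the remaining token is already a real word.
print(expand_negation(["do", "n't"]))   # ['do', 'not']

# Irregular case: "can't" is split into "ca" + "n't", so the first half
# stays as the non-word "ca" unless it is handled separately.
print(expand_negation(["ca", "n't"]))   # ['ca', 'not']
```

The irregular forms ("ca", "wo", "sha") are the main reason I am unsure: replacing only "n't" leaves fragments that are not lemmas of anything.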
>>> from nltk.stem import AutoLemmatizer
>>> auto_wnl = AutoLemmatizer()
>>> print(auto_wnl.auto_lemmatize('Proverbs are short sentences drawn from long experience.'))
['Proverbs', 'be', 'short', 'sentence', 'draw', 'from', 'long', 'experience', '.']
>>> print(auto_wnl.auto_lemmatize('proverbs are short sentences drawn from long experience.'))
['proverb', 'be', 'short', 'sentence', 'draw', 'from', 'long', 'experience', '.']
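For reviewers who want to see the general technique without reading the diff, the tokenize, tag, and lemmatize pipeline can be approximated with calls that already exist in NLTK. This is only a sketch of the idea, not the code in this PR: the names `penn_to_wordnet_pos` and `auto_lemmatize_sketch` are made up for illustration, and the tag mapping (anything that is not an adjective, verb, or adverb is treated as a noun) is an assumption.

```python
def penn_to_wordnet_pos(tag):
    """Map a Penn Treebank tag to a WordNet POS letter (illustrative mapping)."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # assumption: everything else is lemmatized as a noun


def auto_lemmatize_sketch(sentence):
    """Tokenize, POS-tag, and lemmatize a sentence with existing NLTK calls.

    Needs NLTK plus its tokenizer, tagger, and WordNet data to be installed;
    the imports are kept local so the mapping above can be used without them.
    """
    from nltk import pos_tag, word_tokenize
    from nltk.stem import WordNetLemmatizer

    wnl = WordNetLemmatizer()
    tagged = pos_tag(word_tokenize(sentence))
    return [wnl.lemmatize(token, pos=penn_to_wordnet_pos(tag)) for token, tag in tagged]
```

Passing the POS to `WordNetLemmatizer.lemmatize` is what allows a verb form such as 'drawn' to be reduced to 'draw'; with the default noun POS it would be left unchanged.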
Resolves: nltk:#3257