This project contains Urdu characters and some preprocessing functions

Tarequzzaman/Urdu-resource-NLP

Urdu-resource-NLP

This repo contains a preprocessor, stopwords, and other functionality needed for Urdu NLP work.

  1. urdu.py contains URDU_DIACRITICS, URDU_DIGIT, URDU_PUNCTUATIONS, URDU_EXTRA_CHARACTER, URDU_ALPHABET, and URDU_STOPWORDS.

  2. The notebook preprocessor.ipynb contains some examples of preprocessing.

  3. capture_phone_or_email_from_text.py contains two functions that accept a string, check whether a phone number or an email address is present in the text, and return a boolean value:
    0 -> Not found
    1 -> Found
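
The check described in item 3 can be sketched as below. This is a minimal illustration, not the code in capture_phone_or_email_from_text.py: the function names `contains_email` and `contains_phone` and both regular expressions are assumptions made for this example, and the phone pattern is deliberately loose (7 to 15 digits, optionally separated by spaces or hyphens).

```python
import re

# Hypothetical patterns for illustration; the repository's own patterns may differ.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Optional "+", then 7 to 15 digits with optional single spaces or hyphens between them.
# In Python 3 str patterns, \d matches any Unicode decimal digit,
# so Urdu digits such as ۰-۹ are matched as well as 0-9.
PHONE_PATTERN = re.compile(r"\+?\d(?:[\s-]?\d){6,14}")


def contains_email(text: str) -> int:
    """Return 1 if the text appears to contain an email address, else 0."""
    return int(EMAIL_PATTERN.search(text) is not None)


def contains_phone(text: str) -> int:
    """Return 1 if the text appears to contain a phone number, else 0."""
    return int(PHONE_PATTERN.search(text) is not None)
```

For example, `contains_email("رابطہ: someone@example.com")` and `contains_phone("فون 0300-1234567")` each return 1, while both functions return 0 for a text with no contact details. Returning an integer rather than `True`/`False` mirrors the 0/1 convention described above.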
