This script creates a list of unique words from Persian text. Words can be sorted by frequency or alphabetical order. This is a new project, so there could be major bugs in the code.

AshkanArabim/persian-word-extractor

persian-word-extractor

This script creates a list of unique words from Persian text. Words can be sorted by how often they appear in the source.txt file or in alphabetical order. Words with accent marks are excluded from the results. This is a new project, so there could be major bugs in the code.

Features:

  • sort by frequency or alphabetical order

  • extract words from source.txt or online links
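The two sort orders, and the exclusion of words with accent marks, can be illustrated with a minimal sketch. This is not the code in main.py: the function name `extract_words`, the Unicode ranges, and the tokenization rule are all assumptions chosen for illustration, and the alphabetical order here is plain code-point order, which does not place letters such as پ, چ, ژ, ک, گ and ی where the Persian alphabet does.

```python
import re
from collections import Counter

# Assumed character sets (not taken from main.py):
LETTERS = "\u0621-\u063A\u0641-\u064A\u066E\u066F\u0671-\u06D3"  # Arabic-script letters
MARKS = "\u064B-\u0652\u0670"  # accent marks (harakat, shadda, sukun, superscript alef)
ZWNJ = "\u200C"  # zero-width non-joiner, used inside many Persian words

TOKEN_RE = re.compile(f"[{LETTERS}{MARKS}{ZWNJ}]+")
MARK_RE = re.compile(f"[{MARKS}]")


def extract_words(text, order="frequency"):
    """Return unique words, most frequent first or in code-point order."""
    # Trim stray ZWNJ characters from token edges.
    tokens = (t.strip(ZWNJ) for t in TOKEN_RE.findall(text))
    # Drop empty tokens and any word that carries an accent mark.
    counts = Counter(t for t in tokens if t and not MARK_RE.search(t))
    if order == "alphabetical":
        # Code-point order, only an approximation of Persian alphabetical order.
        return sorted(counts)
    # Ties keep the order in which the words first appeared.
    return [word for word, _ in counts.most_common()]


if __name__ == "__main__":
    sample = "کتاب خوب است. کتاب بد است؟ کتاب"
    print(extract_words(sample))
    print(extract_words(sample, order="alphabetical"))
```

Counting with `Counter` keeps the frequency information available even when the alphabetical order is requested, so both orders come from a single pass over the text.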

How to use:

  1. Create a file named 'source.txt' in the root directory and paste the source text into it.
  2. Run 'main.py'.
  3. Follow the CLI instructions.
  4. Results will be written to 'output.txt' in the root directory.
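The file workflow in the steps above can be sketched roughly as follows. The function name `run`, the naive whitespace tokenization, and the one-word-per-line output format are assumptions made for this example and may differ from what main.py does. Only the file names come from the steps above.

```python
from collections import Counter
from pathlib import Path


def run(source="source.txt", output="output.txt"):
    """Read the source file, count words, and write them out by frequency."""
    # Read as UTF-8 explicitly; the platform default encoding may not
    # decode Persian text correctly.
    text = Path(source).read_text(encoding="utf-8")
    # Naive whitespace split, for illustration only (punctuation is not removed).
    counts = Counter(text.split())
    words = [word for word, _ in counts.most_common()]
    # Assumed output format: one word per line.
    Path(output).write_text("\n".join(words) + "\n", encoding="utf-8")
    return len(words)
```

Calling `run()` with no arguments reads 'source.txt' and writes 'output.txt' in the current working directory, and returns the number of unique words written.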

Feel free to tweak the code to suit your needs.

How did I use it?

I ran this script on a large body of Persian text to extract words for contribution to Monkeytype. I added the "Persian 1k" & "Persian 5k" tests. My first open-source contribution!!
