
This repository contains the assignments completed for the CS 613: Natural Language Processing course at IIT Gandhinagar during Semester-1 2021-22.


devanshuThakar/Natural-Language-Processing


Natural Language Processing

This repository contains the assignments completed for CS 613: Natural Language Processing, a course offered at IIT Gandhinagar during Semester-1 2021-22.

Crawling Data

In this assignment, data was scraped from Twitter using the twint library. The scraped tweets relate to India and discuss four topics: Pollution, Climate Change, Eco Friendly and Flood.

A word cloud was produced from the data for each topic (Pollution, Climate Change, Eco Friendly and Flood). The word cloud for Pollution is shown below.

[Figure: word cloud for the Pollution topic]
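A word cloud sizes each word by how often it occurs, so the underlying step is a word-frequency count. The sketch below illustrates that step with the Python standard library only. The function name `word_frequencies`, the short `STOP_WORDS` set and the sample tweets are hypothetical and are not taken from the assignment code.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "in", "of", "and", "to", "for", "on"}

def word_frequencies(tweets):
    """Count lowercase word tokens across tweets, skipping URLs, @mentions and stop words."""
    counts = Counter()
    for text in tweets:
        # Replace URLs and @mentions with a space so they do not become tokens.
        cleaned = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        tokens = re.findall(r"[a-z']+", cleaned)
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 1)
    return counts

# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Air pollution in Delhi is rising again https://example.com",
    "@someone Pollution levels in Delhi are severe",
    "Delhi needs cleaner air",
]
freqs = word_frequencies(sample_tweets)
print(freqs.most_common(3))
```

A mapping like `freqs` could then be handed to a word-cloud renderer, for example `WordCloud.generate_from_frequencies` in the `wordcloud` package. Whether the assignment used that package is not stated here.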

Processing and Understanding Data

This part covers a statistical analysis of the data: computing the frequency distribution of words, validating the language annotation assigned by Twitter, and fitting the data to Heaps' law.

According to Heaps' law, the vocabulary size $|V|$ and the number of tokens $N$ are related by the following expression: $$|V| = K N^{\beta}$$ where $K$ and $\beta$ are parameters estimated from the data. The plot is shown below:

[Figure: plot of the Heaps' law fit]
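One common way to estimate $K$ and $\beta$ is to take logarithms, which gives $\log |V| = \log K + \beta \log N$, and then fit a straight line by least squares. The sketch below does this in plain Python. It is an illustration under that assumption, not necessarily the fitting method used in the assignment, and the names `vocabulary_growth` and `fit_heaps` are hypothetical.

```python
import math

def vocabulary_growth(tokens):
    """Return (N, |V|) pairs: tokens read so far and distinct tokens seen so far."""
    seen = set()
    points = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        points.append((n, len(seen)))
    return points

def fit_heaps(points):
    """Least-squares fit of log|V| = log K + beta * log N; returns (K, beta)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                        # slope of the log-log line
    K = math.exp(mean_y - beta * mean_x)    # intercept, mapped back from log space
    return K, beta

# Synthetic points generated from |V| = 5 * N^0.6, used only to sanity-check the fit.
synthetic = [(n, 5.0 * n ** 0.6) for n in (10, 100, 1000, 10000)]
K, beta = fit_heaps(synthetic)
print(round(K, 3), round(beta, 3))
```

On real data, the points would come from `vocabulary_growth` applied to the tokenized tweets. Fitting in log space weights small and large $N$ differently from a direct nonlinear fit, so the two approaches can give slightly different parameters.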
