Budget Text Analysis

Budget documents from different local governments are collected, preprocessed, and analyzed to draw conclusions that may help Guilford County better organize its budget documents and make better budget decisions in the future.

Introduction

Text analysis is commonly defined as the automated process that allows machines to extract and classify information from text. Businesses might use text analysis to extract specific information such as keywords, names, and organizations. They may also tag text by topic or point of view, or classify it as positive or negative. In this project, our goal is to perform text analysis on seven budget documents from counties and cities across North Carolina. By applying text analysis methods such as topic modelling and sentiment analysis, the team hopes to extract meaningful information from these documents. In addition, the team aims to build a text generation tool that may assist Guilford County in producing its budget documents.
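To make the "positive or negative" classification concrete, here is a minimal lexicon-based sentiment sketch. It is not the team's method: the word lists `POSITIVE` and `NEGATIVE` and the helpers `sentiment_score` and `label` are hypothetical, chosen only for illustration. A real analysis would more likely use an established lexicon or a trained model.

```python
import re

# Hypothetical toy word lists, for illustration only; a real analysis
# would use an established sentiment lexicon or a trained model.
POSITIVE = {"increase", "growth", "improve", "surplus", "efficient", "support"}
NEGATIVE = {"decrease", "deficit", "shortfall", "cut", "decline", "risk"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / number of tokens, in [-1, 1]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return (pos - neg) / len(tokens)


def label(text):
    """Map the score to a coarse positive/negative/neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("Revenue growth will support new parks."))
print(label("A projected deficit forces a cut in services."))
```

Normalizing by token count keeps long and short paragraphs on a comparable scale; the obvious weakness is that it ignores negation ("no deficit") and context.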

Technologies

Python v3.7
Anaconda v3.4.3
NumPy v1.8.2
Pandas v0.25.1

Data Source

The data was obtained from the following organizations as PDF files and then converted into CSV files.
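Once a budget PDF has been converted to CSV, a first preprocessing pass might look like the sketch below. The CSV layout (a `page` column and a `text` column, one paragraph per row), the small `STOPWORDS` set, and the helper names `load_paragraphs`, `preprocess`, and `top_keywords` are all assumptions for illustration, not the project's actual file format or code.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; libraries such as NLTK
# provide much fuller ones.
STOPWORDS = {"the", "and", "of", "to", "for", "in", "a", "is", "will", "be"}


def load_paragraphs(csv_file, column="text"):
    """Read one text column from a CSV converted from a budget PDF (assumed layout)."""
    reader = csv.DictReader(csv_file)
    return [row[column] for row in reader if row.get(column)]


def preprocess(paragraph):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", paragraph.lower())
    return [t for t in tokens if t not in STOPWORDS]


def top_keywords(paragraphs, n=3):
    """Count cleaned tokens across all paragraphs and return the n most frequent."""
    counts = Counter()
    for p in paragraphs:
        counts.update(preprocess(p))
    return counts.most_common(n)


# In-memory stand-in for a converted CSV file (made-up sample text).
sample = io.StringIO(
    "page,text\n"
    "1,The general fund budget for education will increase.\n"
    "2,Education and public safety remain the largest budget items.\n"
)
paragraphs = load_paragraphs(sample)
print(top_keywords(paragraphs, n=2))
```

With real files, `io.StringIO` would be replaced by `open(path, newline="")`, as the `csv` module documentation recommends.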

Goals

Understand the budget text data and analyze its scope.
Use topic modelling techniques to discover abstract topics.
Use NLP methods such as sentiment analysis to extract subjective information.
Find a proper way to quantify the similarities between the budget documents.
Develop a text generation tool.
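One common way to quantify similarity between documents, offered here as a possible approach rather than the one the team chose, is to compare TF-IDF vectors with cosine similarity. The sketch below uses the plain weighting tf × log(N / df); the helper names and the three sample "documents" are hypothetical. Library implementations such as scikit-learn's `TfidfVectorizer` apply idf smoothing by default, so their scores will differ somewhat.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "county budget for schools and libraries",
    "city budget for schools and roads",
    "water utility rate study",
]
vecs = tfidf_vectors(docs)
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine(vecs[i], vecs[j]), 3))
```

Terms that appear in every document get a weight of zero, so boilerplate shared by all budgets does not inflate similarity; the two school-related documents score higher than either does against the utility document.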

Contributors

Team Tasks

Data Collection: Sultan Al Bogami
Data Pre-Processing: Everyone
Exploratory Text Analysis: Everyone
Statistical Text Analysis:
- original: Naseeb Thapaliya, Miguel Gaspar and Sultan Al Bogami
- sentiment: Akash Meghani and Unnati Khivasara
Topic Modeling: Naseeb Thapaliya, Sultan Al Bogami and Miguel Gaspar
Sentiment Analysis: Akash Meghani and Unnati Khivasara
Corpus Similarity: Sultan Al Bogami
Machine Learning:
- original: Naseeb Thapaliya, Miguel Gaspar and Sultan Al Bogami
- sentiment: Akash Meghani and Unnati Khivasara
Next Word Recommender (optional): Everyone
Evaluation: Everyone
Deployment: Everyone
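The optional next word recommender could start from something as simple as a bigram model: count which words follow each word in the budget text, then suggest the most frequent followers. This is a minimal sketch under that assumption, not the team's design; `build_bigram_model`, `recommend`, and the training sentences are hypothetical.

```python
import re
from collections import Counter, defaultdict


def build_bigram_model(texts):
    """Map each word to a Counter of the words that directly follow it."""
    model = defaultdict(Counter)
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for current, nxt in zip(tokens, tokens[1:]):
            model[current][nxt] += 1
    return model


def recommend(model, word, k=3):
    """Return up to k most frequent next words after `word` (empty if unseen)."""
    # .get avoids inserting unseen words into the defaultdict.
    return [w for w, _ in model.get(word.lower(), Counter()).most_common(k)]


texts = [
    "the general fund supports public safety",
    "the general fund balance remains stable",
    "the capital fund supports new schools",
]
model = build_bigram_model(texts)
print(recommend(model, "general"))
print(recommend(model, "fund"))
```

A bigram model only looks one word back, which keeps it small and easy to explain; longer n-grams or a neural language model would give better suggestions at the cost of more data and complexity.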
