ENTRANT: A Large Financial Dataset for Table Understanding

This project extracts and cleans tables from financial xlsx files downloaded from EDGAR and converts them to JSON with bi-tree positional information and metadata.

Related dataset:

Zavitsanos, E., Mavroeidis, D., Spyropoulou, E., Fergadiotis, M., & Paliouras, G. (2024). ENTRANT: A Large Financial Dataset for Table Understanding [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10667088

Install

Before starting, it is recommended to create and activate a virtual environment, for example with conda or virtualenv.
Install the dependencies with pip install -r requirements.txt.

Usage

For table extraction from EDGAR:

Place the xls files in a directory named data in the project's root.
Create a directory named output to store the results.
Run extract_tables_multiprocess.py.
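
The steps above can also be scripted. The following is a minimal sketch, assuming the script is launched from the project root and needs no command-line arguments (none are documented here). The helper names prepare_workspace and run_extraction are hypothetical and not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def prepare_workspace(root: Path) -> list[Path]:
    """Create data/ and output/ under root and return the spreadsheets found in data/."""
    data_dir = root / "data"
    output_dir = root / "output"
    data_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Accept both .xls and .xlsx files, ignoring the case of the extension.
    return sorted(
        p for p in data_dir.iterdir() if p.suffix.lower() in {".xls", ".xlsx"}
    )


def run_extraction(root: Path) -> int:
    """Run the extraction script from the project root and return its exit code."""
    # Assumption: the script takes no required arguments.
    result = subprocess.run(
        [sys.executable, "extract_tables_multiprocess.py"], cwd=root
    )
    return result.returncode
```

Calling prepare_workspace(Path(".")) before run_extraction(Path(".")) gives a quick check that the input files are in place before the longer multiprocess run starts.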

For downloading Excel reports:

See fetch_reports.py.
Respect EDGAR's fair usage rules when downloading.
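
As an illustration of what fair usage can look like in code: the SEC's published guidance asks automated clients to identify themselves in the User-Agent header and to stay at or below 10 requests per second (check the current guidance, as the limit may change). The sketch below is not taken from fetch_reports.py. The names Throttle, build_request, and fetch, as well as the contact string format, are hypothetical.

```python
import time
import urllib.request

# Assumption: the request rate limit published by the SEC at the time of writing.
MAX_REQUESTS_PER_SECOND = 10


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous request: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url, contact):
    """Build a request whose User-Agent identifies the caller."""
    return urllib.request.Request(url, headers={"User-Agent": contact})


def fetch(url, contact, throttle):
    """Download one URL, waiting first if the rate limit requires it."""
    throttle.wait()
    with urllib.request.urlopen(build_request(url, contact), timeout=30) as response:
        return response.read()
```

A single Throttle(MAX_REQUESTS_PER_SECOND) instance would be shared across all calls to fetch so that the limit applies to the whole download session rather than to each URL separately.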

Data

Data is hosted at Zenodo: https://zenodo.org/records/10667088

Tests

Use pytest to run the unit tests.

Contributing

See CONTRIBUTING.md.

License

The project is licensed under the Creative Commons Attribution 4.0 license.
