19 Jun 15:45

2.0.0 Beta 1

Added Grid-based Data Extraction and Corpus Querying

This update extends the application's analytical capabilities with automated, background extraction of structured data from documents, improving efficiency and scalability.

We've added several models on the backend:

Extract: Represents a headless, background annotation task linked to a Corpus and Fieldset.
Fieldset: Defines a reusable set of fields for Extracts, linked to Columns.
Column: Represents a discrete data structure to extract from a document, with various properties like query, match_text, output_type, and more.
Datacell: Represents extracted data for each column and document, storing data as JSON.
LanguageModel: Represents a language model to be used in the extraction process.
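
As a rough illustration of how these models relate to one another, here is a minimal sketch using plain Python dataclasses rather than the project's actual Django models. Only the model names and the Column properties query, match_text and output_type come from the notes above; every other field name, and the record() helper, is a hypothetical stand-in.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class LanguageModel:
    """A language model available to the extraction process."""
    name: str  # hypothetical field


@dataclass
class Column:
    """One discrete piece of data to extract from each document."""
    name: str  # hypothetical field
    query: str
    output_type: str = "str"
    match_text: Optional[str] = None


@dataclass
class Fieldset:
    """A reusable set of columns that can be shared across extracts."""
    name: str  # hypothetical field
    columns: List[Column] = field(default_factory=list)


@dataclass
class Datacell:
    """The extracted value for one (document, column) pair, stored as JSON."""
    document_id: int
    column_name: str
    data: str  # JSON-encoded payload


@dataclass
class Extract:
    """A background extraction task tying a corpus to a fieldset."""
    corpus_id: int
    fieldset: Fieldset
    datacells: List[Datacell] = field(default_factory=list)

    def record(self, document_id: int, column: Column, value: Any) -> Datacell:
        """Store one extracted value as a JSON-encoded datacell (hypothetical helper)."""
        cell = Datacell(document_id, column.name, json.dumps({"data": value}))
        self.datacells.append(cell)
        return cell
```

In this sketch, an extract grid is simply the set of datacells keyed by document and column: one row per document, one column per Column in the Fieldset.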

Improved Test Suite

LlamaIndex is now tested with vcr.py, so we have realistic tests and mocks for the corpus query and corpus extract tasks.
Added many GraphQL query and endpoint tests.

New GUI Elements

There is now an extract tab, along with a number of GUI elements that make it easy to construct an extract grid made up of documents, corpora and reusable columns.
Within the Corpus view, there is a query tab you can use to ask questions of the corpus.

What's Changed

Add Data Extraction by @JSv4 in #117

Full Changelog: v1.3.0...v2.0.0b1

Contributors

JSv4

04 Jun 04:01

JSv4

v1.3.0

ef648e4

Add Nlm Parser

The major feature is the addition of an nlm-ingestor microservice, which will eventually replace the PAWLs preprocessor entirely (the preprocessor has periodic issues with certain document types). This allows us to import layout blocks along with the document and token layers.

What's Changed

Add Documentation on Annotation Creation Logic + Component(s) by @JSv4 in #113
Create overview.md by @JSv4 in #114
Add Nlm-ingestor by @JSv4 in #115
Add Structural Annotations and Vector Embeddings by @JSv4 in #116

Full Changelog: v1.2.2...v1.3.0

Contributors

JSv4

13 Sep 04:50

JSv4

v1.2.2

be3de1c

Upgrade Parser

I moved the PAWLs parser to its own repo and now point my dependency there. I also noticed that, in my work to improve outputs where PDF image quality is bad, I had made some changes that went beyond bug fixes. While these did improve the results, I inadvertently introduced a scaling issue in the token coordinate system: the tokens were offset from the image, so labeling was effectively broken. To fix the scaling issue, I rolled back the OCR quality workarounds in my new repo. These can be added back later but, for now, OpenContracts functionality is restored.
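
To illustrate the general class of bug described here, the following is a minimal sketch, not the parser's actual code. It assumes tokens were detected on a rendered image whose pixel dimensions differ from the PDF page dimensions, in which case every token box has to be mapped back into page space before it lines up with the page. The TokenBox type and rescale_token function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TokenBox:
    """Axis-aligned token box: top-left corner plus width and height."""
    x: float
    y: float
    width: float
    height: float


def rescale_token(
    box: TokenBox,
    image_size: Tuple[float, float],
    page_size: Tuple[float, float],
) -> TokenBox:
    """Map a box from rendered-image pixels into page coordinates."""
    scale_x = page_size[0] / image_size[0]
    scale_y = page_size[1] / image_size[1]
    return TokenBox(
        x=box.x * scale_x,
        y=box.y * scale_y,
        width=box.width * scale_x,
        height=box.height * scale_y,
    )
```

If a step like this is skipped, or applied with the wrong image size after the image has been upscaled for OCR, tokens end up offset from the page in the way described above.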

What's Changed

Fix Broken Coordinate System in Parser by @JSv4 in #112

Full Changelog: v1.2.1...v1.2.2

Contributors

JSv4

13 May 03:49

JSv4

v1.2.1

c2b2902

Add Annotated Document Import Mutation

Created a new format that encapsulates a document's PDF, its text, its PAWLs tokens and all of its annotations, so that everything can be imported in a single API call. This will be useful for remote clients that process a document and then want to upload multiple annotations simultaneously. It will also support a planned feature to export single annotated documents in addition to entire corpuses.
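
As a rough picture of what such a bundle could look like on the client side, here is a minimal sketch that packs the four pieces named above into one JSON string. The key names (pdf_base64, text, pawls_tokens, annotations) and the function name are assumptions made for illustration; the actual import format and mutation arguments may differ.

```python
import base64
import json
from typing import Any, Dict, List


def build_annotated_doc_payload(
    pdf_bytes: bytes,
    text: str,
    pawls_pages: List[Dict[str, Any]],
    annotations: List[Dict[str, Any]],
) -> str:
    """Bundle a document and its annotations into one JSON string.

    All key names below are hypothetical, not the project's real schema.
    """
    payload = {
        # Binary PDF content is base64-encoded so it can travel inside JSON.
        "pdf_base64": base64.b64encode(pdf_bytes).decode("ascii"),
        "text": text,
        "pawls_tokens": pawls_pages,
        "annotations": annotations,
    }
    return json.dumps(payload)
```

A remote client could build this string once per processed document and send it in a single request, instead of uploading the document and each annotation separately.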

What's Changed

Added import task to import a single annotated doc. Also added a test. by @JSv4 in #110

Full Changelog: v1.2.0...v1.2.1

Contributors

JSv4

10 Mar 06:10

JSv4

v1.2.0

bce304a

Add More Export Formats

The main feature addition here is the ability to export documents into FUNSD-style annotations that can easily be loaded into LayoutLM-style models. There is also a LangChain export, but it's not fully baked yet. At the moment, it just exports full document text and metadata. This release also comes with a number of bug fixes.
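
For readers unfamiliar with the FUNSD layout, here is a minimal sketch of turning labeled token groups into a FUNSD-style structure: a "form" list whose entries carry an id, label, text, bounding box, words and linking. The input shape (a list of dicts with "label" and "words") and the function name are assumptions made for illustration; this is not the exporter shipped in this release.

```python
from typing import Any, Dict, List


def to_funsd(annotations: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Convert labeled word groups into a FUNSD-style dict.

    Each input annotation is assumed to look like:
    {"label": "question", "words": [{"text": "Date", "box": [x0, y0, x1, y1]}]}
    """
    form = []
    for idx, ann in enumerate(annotations):
        words = [{"text": w["text"], "box": list(w["box"])} for w in ann["words"]]
        # The entity box is the smallest box enclosing all of its words.
        box = [
            min(w["box"][0] for w in words),
            min(w["box"][1] for w in words),
            max(w["box"][2] for w in words),
            max(w["box"][3] for w in words),
        ]
        form.append(
            {
                "id": idx,
                "label": ann["label"],
                "text": " ".join(w["text"] for w in words),
                "box": box,
                "words": words,
                "linking": [],  # relations between entities are left empty here
            }
        )
    return {"form": form}
```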

What's Changed

Fix Quickstart Docs by @JSv4 in #84
Fix Django Auth by @JSv4 in #86
Add Export Format Choice GUI by @JSv4 in #88
Quickstart updated to include steps to configure .env files. by @JSv4 in #89
Add Funsd Export by @JSv4 in #92

Full Changelog: v1.1.0...v1.2.0

Contributors

JSv4

28 Feb 04:31

JSv4

v1.1.0

44d85ff

v1.1.0 - Add Metadata Annotations and Improve Parser

Initial release of a version of OpenContracts that supports "metadata" annotations - essentially data fields the user (or API) can populate. Long-term, it'd be great to support multiple data types but, for now, this is just string data. I've also rebuilt the document processing pipeline for higher performance and more robust handling of extreme variations in document sizes. Every document is split into single pages, and the pages are then added to a queue for processing. I still need to add some documentation on how to "tune" Celery workers for your machine. I'd suggest starting with --concurrency=1 (single threaded) and then scaling the celery worker service via Docker Compose, but there are probably other approaches that would work too.
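
The page-splitting idea can be sketched with the standard library alone. This is only an illustration of the queueing pattern, with queue.Queue standing in for the Celery task queue the project actually uses; the function names and the (document id, page number) task shape are hypothetical.

```python
import queue
from typing import List, Tuple

PageTask = Tuple[str, int]  # (document id, 1-based page number)


def enqueue_pages(doc_id: str, page_count: int, tasks: "queue.Queue[PageTask]") -> None:
    """Split a document into single-page tasks and add them to the queue."""
    for page_number in range(1, page_count + 1):
        tasks.put((doc_id, page_number))


def drain(tasks: "queue.Queue[PageTask]") -> List[PageTask]:
    """Process queued pages one at a time, as a single worker would."""
    processed: List[PageTask] = []
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            break
        processed.append(task)  # a real worker would parse the page here
        tasks.task_done()
    return processed
```

Because the unit of work is a page rather than a document, a very large document becomes many small tasks instead of one long-running task, which is what lets workers be scaled out independently of document size.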

What's Changed

Bump sphinx from 5.0.2 to 5.3.0 by @dependabot in #29
Bump mypy from 0.982 to 0.991 by @dependabot in #32
Bump postgres from 15.0 to 15.1 in /compose/production/postgres by @dependabot in #31
Bump flake8 from 4.0.1 to 5.0.4 by @dependabot in #30
Bump django-celery-beat from 2.2.1 to 2.4.0 by @dependabot in #13
Bump django-model-utils from 4.2.0 to 4.3.1 by @dependabot in #38
Bump traefik from v2.9.4 to 2.9.5 in /compose/production/traefik by @dependabot in #40
Bump django-environ from 0.8.1 to 0.9.0 by @dependabot in #36
Bump django-stubs from 1.12.0 to 1.13.0 by @dependabot in #39
Add .dockerignore by @JSv4 in #50
Bump actions/checkout from 3.1.0 to 3.2.0 by @dependabot in #48
Bump traefik from 2.9.5 to 2.9.6 in /compose/production/traefik by @dependabot in #47
Bump redis from 3.5.3 to 4.4.0 by @dependabot in #45
Bump drf-extra-fields from 3.2.1 to 3.4.1 by @dependabot in #42
Bump sphinx from 5.3.0 to 6.1.0 by @dependabot in #55
Bump actions/checkout from 3.2.0 to 3.3.0 by @dependabot in #54
Bump actions/setup-node from 3.5.1 to 3.6.0 by @dependabot in #53
Bump psycopg2 from 2.9.3 to 2.9.5 by @dependabot in #52
Bump argon2-cffi from 21.1.0 to 21.3.0 by @dependabot in #44
Add Backend Tweaks for Metadata Annotations by @JSv4 in #56
Bump flake8 from 5.0.4 to 6.0.0 by @dependabot in #59
Bump pytz from 2022.5 to 2022.7 by @dependabot in #57
Bump pillow from 9.2.0 to 9.4.0 by @dependabot in #58
Add GUI Elements to Filter on Metadata and Thumbnails for Docs by @JSv4 in #75
Bump djangorestframework-stubs from 1.4.0 to 1.8.0 by @dependabot in #78
Bump postgres from 15.1 to 15.2 in /compose/production/postgres by @dependabot in #72
Bump redis from 4.4.0 to 4.5.1 by @dependabot in #71
Make Document Processing Pipeline More Fault Tolerant by @JSv4 in #79

Full Changelog: v1.0.1...v1.1.0

Contributors

JSv4 and dependabot

20 Nov 00:25

JSv4

v1.0.1

87ad0b3

v1.0.1 - Added API Token Authorization

New Features:

This release adds an API Token Authorization mechanism so you can more easily integrate OpenContracts into backend services and infrastructure.

Chores:

A number of packages have been upgraded. See below.

What's Changed

Updated codecov badge. by @JSv4 in #10
Added frontend .env file samples and guidance. by @JSv4 in #11
Bump actions/checkout from 3.0.2 to 3.1.0 by @dependabot in #5
Bump crispy-bootstrap5 from 0.6 to 0.7 by @dependabot in #8
Bump black from 22.6.0 to 22.10.0 by @dependabot in #9
Bump traefik from v2.8.7 to v2.9.1 in /compose/production/traefik by @dependabot in #2
Bump mypy from 0.971 to 0.982 by @dependabot in #7
Bump responses from 0.21.0 to 0.22.0 by @dependabot in #4
Bump postgres from 14.5 to 15.0 in /compose/production/postgres by @dependabot in #1
Bump actions/setup-node from 3.4.1 to 3.5.1 by @dependabot in #3
Update Tests and Remove Configs by @JSv4 in #16
Bump flake8-isort from 4.1.1 to 5.0.0 by @dependabot in #6
Bump django-coverage-plugin from 2.0.2 to 2.0.3 by @dependabot in #15
Bump typing-extensions from 4.3.0 to 4.4.0 by @dependabot in #18
Bump pytz from 2021.3 to 2022.5 by @dependabot in #17
Bump coverage from 6.2 to 6.5.0 by @dependabot in #14
Add Cookie Consent by @JSv4 in #19
Bump pydantic from 1.9.1 to 1.10.2 by @dependabot in #23
Bump scikit-learn from 1.1.1 to 1.1.3 by @dependabot in #22
Bump django-debug-toolbar from 3.2.2 to 3.7.0 by @dependabot in #21
Bump traefik from v2.9.1 to v2.9.4 in /compose/production/traefik by @dependabot in #20
Bump pytest-cov from 3.0.0 to 4.0.0 by @dependabot in #26
Bump django-storages[boto3] from 1.12.3 to 1.13.1 by @dependabot in #24
Bump celery from 5.2.1 to 5.2.7 by @dependabot in #25
Add an API Token Auth Mechanism by @JSv4 in #33
Update Test Env File by @JSv4 in #34

Full Changelog: v1.0.0...v1.0.1

Contributors

JSv4 and dependabot

24 Oct 04:13

JSv4

v1.0.0

cb6f7c8

First Public Release

Initial public release, with sample deployments including Gremlin Analyzers.

Releases: JSv4/OpenContracts

v2.0.0 b1 - Add Data Extract and Corpus Querying
