DALAI-project

DALAI - Using artificial intelligence to improve the quality and usability of digital records

The included repositories contain code and model files for tools developed as part of the DALAI project (September 2021 - August 2023). The project was funded by the European Regional Development Fund’s “Sustainable growth and jobs 2012–2020” programme and the City of Mikkeli.

The tools share a common aim: to help automate the digitisation and description of cultural heritage materials held by archives and other memory organisations.

More information on the included repositories:

| Repository | Domain | Content |
| --- | --- | --- |
| CornerAPI | Image Classification | Code for an API that detects torn corners and edges in document images. |
| EmptyAPI | Image Classification | Code for an API that detects empty pages in document images. |
| PostitAPI | Image Classification | Code for an API that detects post-it/sticky notes in document images. |
| WritingtypeAPI | Image Classification | Code for an API that classifies document images based on the writing type(s) (handwritten, typewritten, combination) they contain. |
| FaultyImageAPI | Image Classification | Code for an API that combines the classification models listed above. |
| NER_API | Named Entity Recognition | Code for an API that performs named entity recognition on Finnish text input. |
| Train_BERT_NER | Named Entity Recognition | Code for training a Finnish named entity recognition (NER) model based on the BERT language model. |
| Empty_training | Image Classification | Code for training a neural network model to detect empty pages in document images. |
| Train_document_classification | Image Classification | Code for training a neural network model to classify input documents into distinct classes based on the type/format of the document. |
| Train_fault_detection | Image Classification | Code for training a neural network model to detect faults such as folded corners or sticky notes in document images. |
| Train_writing_type | Image Classification | Code for training a neural network model to classify document images based on the writing type(s) (handwritten, typewritten, combination) they contain. |
| Table_segmentation | Image Segmentation | Code for segmenting table structures and detecting text content in document images. |

Some of the tools are also available via the Arkkiivi web user interface.

Repositories

Showing 10 of 17 repositories
  • Annif_API (Public)

    Instructions and pretrained models for using the Annif (https://annif.org/) software for automatic subject indexing as a local service.

    Apache-2.0 · Updated Oct 13, 2023
  • .github (Public)

    Updated Oct 5, 2023
  • Table_segmentation (Public)

    Code for segmenting table structures and detecting text content in document images.

    Python · MIT · Updated Sep 19, 2023
  • Train_BERT_NER (Public)

    Code for training a Finnish named entity recognition (NER) model based on BERT.

    Python · MIT · Updated Sep 12, 2023
  • Document_segmentation (Public)

    A model for segmenting scanned document images.

    AGPL-3.0 · Updated Sep 11, 2023
  • Arkkiivi_UI (Public)

    User interface for the Arkkiivi web application.

    TypeScript · MIT · Updated Aug 29, 2023
  • FaultyImageAPI (Public)

    API that combines the empty page, post-it, folded corner and writing type detection models.

    Python · MIT · Updated Aug 24, 2023
  • EmptyAPI (Public)

    API for detecting empty document images.

    Python · MIT · Updated Aug 24, 2023
  • CornerAPI (Public)

    API for a machine learning model trained to detect folded or torn corners and edges in scanned document images.

    Python · MIT · Updated Aug 24, 2023
  • PostitAPI (Public)

    API for a machine learning model trained to detect post-it/sticky notes in scanned document images.

    Python · MIT · Updated Aug 24, 2023
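
As an illustration of what using Annif "as a local service" can look like, the sketch below queries the suggest endpoint of Annif's REST API from Python. It is not taken from the Annif_API repository: the base URL assumes a default local instance on port 5000, and the project ID `yso-fi` in the usage note is a hypothetical placeholder that should be replaced with the ID of whichever pretrained project is actually loaded.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Annif is running locally (for example via `annif run`) on port 5000.
ANNIF_BASE_URL = "http://localhost:5000/v1"


def build_suggest_request(project_id: str, text: str, limit: int = 5,
                          threshold: float = 0.0) -> urllib.request.Request:
    """Build (but do not send) a POST request to Annif's suggest endpoint."""
    url = f"{ANNIF_BASE_URL}/projects/{urllib.parse.quote(project_id)}/suggest"
    # The suggest endpoint accepts form-encoded parameters.
    body = urllib.parse.urlencode(
        {"text": text, "limit": limit, "threshold": threshold}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_suggestions(payload: bytes) -> list[tuple[str, float]]:
    """Extract (label, score) pairs from a suggest response, best score first."""
    results = json.loads(payload).get("results", [])
    pairs = [(r["label"], float(r["score"])) for r in results]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def suggest_subjects(project_id: str, text: str) -> list[tuple[str, float]]:
    """Send the request to the local service and return the parsed suggestions."""
    request = build_suggest_request(project_id, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_suggestions(response.read())
```

With a local Annif instance running, `suggest_subjects("yso-fi", "Mikkelin kaupungin arkiston asiakirjat")` would return the suggested subject labels with their scores. Building and parsing are kept separate from the network call so that each part can be checked without a running service.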
