minorg/doc2sdo

doc2sdo

Extract entities from PDF and text documents and transform them to schema.org resources in RDF.

Installation

pip install doc2sdo

Usage

From the command line

doc2sdo path/to/your.pdf >output.ttl
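To convert a whole folder of PDFs, the command above can be wrapped in a short script. This is a minimal sketch, assuming only that the `doc2sdo` command is on the PATH and writes its RDF to standard output, as in the example above. The helper names `output_path_for` and `convert_all` are hypothetical and not part of doc2sdo.

```python
from pathlib import Path
import subprocess
from typing import List


def output_path_for(pdf_path: Path, output_dir: Path) -> Path:
    """Map e.g. docs/report.pdf to <output_dir>/report.ttl."""
    return output_dir / (pdf_path.stem + ".ttl")


def convert_all(input_dir: Path, output_dir: Path) -> List[Path]:
    """Run the doc2sdo command once per PDF in input_dir (hypothetical helper).

    Each invocation's standard output is redirected to its own .ttl file,
    mirroring `doc2sdo path/to/your.pdf >output.ttl`.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(input_dir.glob("*.pdf")):
        ttl_path = output_path_for(pdf_path, output_dir)
        with ttl_path.open("wb") as ttl_file:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(["doc2sdo", str(pdf_path)], stdout=ttl_file, check=True)
        written.append(ttl_path)
    return written
```

Calling `convert_all(Path("pdfs"), Path("ttl"))` would then write one `.ttl` file per PDF found in `pdfs`; both directory names are placeholders.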

From Python

from pathlib import Path
import sys

from doc2sdo import doc2sdo

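# doc2sdo() yields the entities extracted from the document.
for thing in doc2sdo(Path("/path/to/your.pdf")):
    # Write each entity's RDF graph to standard output as bytes.
    thing.resource.graph.serialize(sys.stdout.buffer)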
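The same loop can write to a file instead of standard output. The sketch below assumes only what the example above shows: each yielded object exposes `resource.graph`, and that graph's `serialize` method accepts a binary file object. The helper name `write_things` is hypothetical and not part of doc2sdo.

```python
from typing import Any, BinaryIO, Iterable


def write_things(things: Iterable[Any], destination: BinaryIO) -> int:
    """Serialize each thing's graph to destination; return how many were written.

    Hypothetical helper: relies only on the `thing.resource.graph.serialize(...)`
    call shown in the usage example above.
    """
    count = 0
    for thing in things:
        thing.resource.graph.serialize(destination)
        count += 1
    return count
```

With doc2sdo installed, this could be used by opening a file with `open("output.ttl", "wb")` and passing it, together with `doc2sdo(Path("/path/to/your.pdf"))`, to `write_things`.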

Development

Prerequisites

Install dependencies

script/bootstrap

Run tests

script/test
