Karelian - Finnish dictionary. Scraped from hard-to-use data to easy-to-use data.

stscoundrel/sanakirju

Sanakirju

Karelian - Finnish dictionary with over 90 000 words. Transforms hard-to-use data into an easy-to-use format for Node.js.

Sanakirju is a starting point that offers you the complete dataset as JSON. Use it as you like, perhaps as a website, app, Twitter bot or in whatever other way you see fit.

Examples:

Install

yarn add sanakirju

Read from XML files.

All the dictionary entries are provided as an XML dataset licensed under CC BY 4.0. Sanakirju parses this data from XML to JSON and returns the whole set.

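const sanakirju = require('sanakirju')

// `await` is not allowed at the top level of a CommonJS module,
// so the call is wrapped in an async function.
async function main() {
  // Get dataset from xml.
  const dictionary = await sanakirju.fromXML()

  console.log(dictionary)
}

main().catch(console.error)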
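
Since the whole set is returned on every call, it may be convenient to store the result once and read it back later. The following is a minimal sketch of that idea, not part of Sanakirju itself: `saveDataset` and `loadDataset` are hypothetical helper names, the file path is an arbitrary choice, and nothing is assumed about the shape of the dataset beyond it being JSON-serializable.

```javascript
const fs = require('fs')

// Hypothetical helper (not part of sanakirju): write any
// JSON-serializable dataset to disk and return the path used.
function saveDataset(dataset, filePath) {
  fs.writeFileSync(filePath, JSON.stringify(dataset, null, 2), 'utf8')
  return filePath
}

// Hypothetical helper: read a previously saved dataset back from disk.
function loadDataset(filePath) {
  return JSON.parse(fs.readFileSync(filePath, 'utf8'))
}

// Possible usage inside an async function, assuming sanakirju is installed:
//   const dictionary = await require('sanakirju').fromXML()
//   saveDataset(dictionary, './dictionary.json')
//   const cached = loadDataset('./dictionary.json')
```

Synchronous file calls keep the sketch short; a long-running service would more likely use the promise-based `fs.promises` API instead.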

Sources.

Words & translations are from Karjalan Kielen Sanakirja, created by the Institute for the Languages of Finland. The original material is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).

The data format of the original entries has been altered by the Sanakirju Simplifier tool.