Middle Egyptian Dictionary Parser

Parses and combines three Middle Egyptian dictionaries for addition to a database: Mark Vygus (2012 and 2018 editions), Paul Dickson, and a third lexicon originally from OpenGlyph, which I found through the Morris Franken dataset for "Automatic Egyptian Hieroglyph Recognition by Retrieving Images as Texts".

Stages of the Project Thus Far:

1. Read in the PDF files from Vygus, Dickson, and the lexicon.
2. Parse the different display formats and clean up the results.
3. Display the entries as Unicode.
4. Realize that the Unicode is entirely unformatted, and implement a trigram model to restore the formatting lost when the PDFs were read in.
   4.1 - Given formatted texts, preprocess them and parse all formatted trigrams.
   4.2 - Map the formatted trigrams back to words in the dictionary.
5. Add caching and serialization to the trigram model to speed up database generation.
6. Attempt to implement formatted Unicode, only to realize that no fonts support it; it exists only in the Unicode 12 specification. Migrate the application to RESJs, which takes somewhat longer to render but allows for glyph formatting.
7. Add formatted transliteration, so that, for example, a dotted h (ḥ) displays in place of a capital H.
8. Work on improving the parts of speech, which weren't standardized between the two texts.
9. Realize that keyword search over translations is slow and that Mongo text indexing is not working. Create a custom keyword indexer for the application.
   9.1 - Iterate over all dictionary entries' translations, remove stop words, and file each entry under its keywords.
   9.2 - When a search is conducted, remove stop words from the query, then look up the remaining words in the pre-built index. Intersect or union the returned entries based on the user's configuration.
10. Add an advanced search field over the Gardiner signs that displays matching signs as the user types, to help new users onboard.
11. Add a Gardiner Sign List description page.
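The README does not describe the trigram model from 4.1 and 4.2 in detail. The C# sketch below shows one way such a model could work; it is not the project's implementation. It assumes the lost formatting is the set of Manuel de Codage operators between Gardiner codes (`-` for horizontal sequence, `:` for stacking, `*` for juxtaposition), learns from formatted strings which operator pair occurs inside each sign trigram, and restores operators in an unformatted sequence by majority vote. `TrigramFormatter`, `Train`, and `Format` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch, not the project's model: restores MdC operators ("-", ":", "*")
// between Gardiner codes using counts of formatted sign trigrams.
public sealed class TrigramFormatter
{
    // "G17 N35 A1" -> how often each (first operator, second operator) pair was seen.
    private readonly Dictionary<string, Dictionary<(string, string), int>> _counts = new();

    // Splits "G17:N35-A1" into signs [G17, N35, A1] and operators [":", "-"].
    private static (List<string> Signs, List<string> Ops) Tokenize(string formatted)
    {
        string[] parts = Regex.Split(formatted, "([-:*])"); // captured operators are kept
        var signs = new List<string>();
        var ops = new List<string>();
        for (int i = 0; i < parts.Length; i++)
            (i % 2 == 0 ? signs : ops).Add(parts[i]);
        return (signs, ops);
    }

    // 4.1 (sketch): count every formatted trigram in a formatted training string.
    public void Train(string formatted)
    {
        var (signs, ops) = Tokenize(formatted);
        for (int i = 0; i + 2 < signs.Count; i++)
        {
            string key = string.Join(" ", signs.GetRange(i, 3));
            if (!_counts.TryGetValue(key, out var variants))
                _counts[key] = variants = new Dictionary<(string, string), int>();
            var pair = (ops[i], ops[i + 1]);
            variants[pair] = variants.GetValueOrDefault(pair) + 1;
        }
    }

    // 4.2 (sketch): each known trigram votes for the operators in the two gaps it covers.
    public string Format(IReadOnlyList<string> signs)
    {
        var votes = new Dictionary<string, int>[Math.Max(signs.Count - 1, 0)];
        for (int g = 0; g < votes.Length; g++) votes[g] = new Dictionary<string, int>();

        for (int i = 0; i + 2 < signs.Count; i++)
        {
            string key = $"{signs[i]} {signs[i + 1]} {signs[i + 2]}";
            if (!_counts.TryGetValue(key, out var variants)) continue;
            var (first, second) = variants.OrderByDescending(kv => kv.Value).First().Key;
            votes[i][first] = votes[i].GetValueOrDefault(first) + 1;
            votes[i + 1][second] = votes[i + 1].GetValueOrDefault(second) + 1;
        }

        var sb = new StringBuilder(signs.Count > 0 ? signs[0] : "");
        for (int g = 0; g < votes.Length; g++)
        {
            // Gaps that no known trigram covers fall back to a plain horizontal join.
            string op = votes[g].Count > 0
                ? votes[g].OrderByDescending(kv => kv.Value).First().Key
                : "-";
            sb.Append(op).Append(signs[g + 1]);
        }
        return sb.ToString();
    }
}
```

For example, after `Train("G17:N35-A1")`, `Format(new[] { "G17", "N35", "A1" })` returns `"G17:N35-A1"`, while an unseen sequence such as `{ "X1", "Y1" }` comes back as `"X1-Y1"`.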
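A minimal sketch of the formatted transliteration step, assuming the source transliterations use Manuel de Codage letter conventions (for example `H` for ḥ, `x` for ḫ, `A` for ꜣ). `Transliteration.ToUnicode` is a hypothetical helper, and its table covers only a few common letters rather than the project's full mapping.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: converts an ASCII transliteration written with Manuel de Codage
// letter conventions into Egyptological Unicode characters.
public static class Transliteration
{
    // Assumed MdC letter -> Unicode mapping (partial).
    private static readonly Dictionary<char, string> MdcToUnicode = new()
    {
        ['A'] = "\uA723", // ꜣ  egyptological alef
        ['a'] = "\uA725", // ꜥ  egyptological ain
        ['H'] = "\u1E25", // ḥ  h with dot below
        ['x'] = "\u1E2B", // ḫ  h with breve below
        ['X'] = "\u1E96", // ẖ  h with line below
        ['S'] = "\u0161", // š
        ['T'] = "\u1E6F", // ṯ  t with line below
        ['D'] = "\u1E0F", // ḏ  d with line below
    };

    // Replaces each mapped letter; all other characters pass through unchanged.
    public static string ToUnicode(string mdc)
    {
        var sb = new StringBuilder(mdc.Length);
        foreach (char c in mdc)
            sb.Append(MdcToUnicode.TryGetValue(c, out var u) ? u : c.ToString());
        return sb.ToString();
    }
}
```

With this table, `ToUnicode("Htp")` yields `ḥtp` and `ToUnicode("anx")` yields `ꜥnḫ`.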
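The keyword indexer in 9.1 and 9.2 is essentially an inverted index. Below is an in-memory C# sketch of that idea. `KeywordIndex`, its tiny stop-word list, and the use of plain string entry ids are illustrative assumptions, not the project's `KeywordGenerator` API (which writes its results to the database).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of an inverted keyword index over entry translations.
public sealed class KeywordIndex
{
    // Illustrative stop-word list; a real list would be much longer.
    private static readonly HashSet<string> StopWords = new() { "a", "an", "the", "of", "to", "in", "be" };

    // keyword -> ids of entries whose translation contains it
    private readonly Dictionary<string, HashSet<string>> _index = new();

    // Split text into lowercase words and drop stop words.
    private static IEnumerable<string> Keywords(string text) =>
        text.Split(new[] { ' ', ',', ';', '.', '(', ')' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.ToLowerInvariant())
            .Where(w => !StopWords.Contains(w))
            .Distinct();

    // 9.1: file the entry under every keyword in its translation.
    public void Add(string entryId, string translation)
    {
        foreach (var word in Keywords(translation))
        {
            if (!_index.TryGetValue(word, out var ids))
                _index[word] = ids = new HashSet<string>();
            ids.Add(entryId);
        }
    }

    // 9.2: look up each remaining query word, then intersect (match all) or union (match any).
    public ISet<string> Search(string query, bool matchAll)
    {
        var hits = Keywords(query)
            .Select(w => _index.TryGetValue(w, out var ids) ? ids : new HashSet<string>())
            .ToList();
        var result = new HashSet<string>();
        if (hits.Count == 0) return result;
        result.UnionWith(hits[0]); // copy, so the index's own sets are never mutated
        foreach (var set in hits.Skip(1))
        {
            if (matchAll) result.IntersectWith(set);
            else result.UnionWith(set);
        }
        return result;
    }
}
```

Intersection is the stricter "match all words" mode, and union returns entries matching any remaining word. Because the per-word sets are precomputed, a search costs one lookup per query word plus the set operations.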
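One possible shape for the as-you-type Gardiner sign search: match the typed text against sign codes (as a prefix) and descriptions (anywhere), and sort codes naturally so that A2 comes before A10. The `GardinerSign` record and `SignSearch.Filter` are hypothetical, since the format of the project's sign list is not described in this README.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical data shape for one row of the sign list.
public sealed record GardinerSign(string Code, string Description);

// Hypothetical sketch of an as-you-type filter over the Gardiner sign list.
public static class SignSearch
{
    // Codes are a category prefix plus a number ("A1", "G17", "Aa15"). Sorting by
    // (prefix length, prefix, number) gives A1, A2, A10, ..., Z1, Aa1 instead of the
    // plain string order A1, A10, A2.
    private static (int PrefixLength, string Prefix, int Number, string Rest) SortKey(string code)
    {
        Match m = Regex.Match(code, @"^([A-Za-z]+)(\d+)(.*)$");
        return m.Success
            ? (m.Groups[1].Length, m.Groups[1].Value, int.Parse(m.Groups[2].Value), m.Groups[3].Value)
            : (code.Length, code, 0, "");
    }

    // Match the query as a code prefix or anywhere in the description, case-insensitively.
    public static List<GardinerSign> Filter(IEnumerable<GardinerSign> signs, string query)
    {
        string q = query.Trim();
        return signs
            .Where(s => s.Code.StartsWith(q, StringComparison.OrdinalIgnoreCase)
                     || s.Description.Contains(q, StringComparison.OrdinalIgnoreCase))
            .OrderBy(s => SortKey(s.Code))
            .ToList();
    }
}
```

Running the filter on every keystroke is cheap for a list of roughly a thousand signs; a larger list would call for a prefix tree instead of a linear scan.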

Planned Stages of the Project Going Forward:

1. Add mobile responsiveness.
2. Add login with two types of user: admin and editor.
   2.1 - Editors can propose formatting changes, pending admin approval.
   2.2 - Admins can view a queue of requested changes and approve or deny them.
   2.3 - Changes are not pushed to the database until the entire queue has been reviewed, to prevent unnecessary expense.
3. Begin work on a tagger.
4. Begin work on a translation scheme.
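Steps 2.1 to 2.3 describe a review queue whose approved changes are written only after every request has been reviewed. A minimal in-memory C# sketch under that reading; `ChangeQueue`, `ProposedChange`, and the `writeBatch` callback (standing in for the actual database write) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: editors queue changes, admins approve or deny each one, and
// approved changes are written in a single batch only once nothing is left pending.
public enum ReviewState { Pending, Approved, Denied }

public sealed class ProposedChange
{
    public string EntryId { get; init; } = "";
    public string NewFormatting { get; init; } = "";
    public ReviewState State { get; set; } = ReviewState.Pending;
}

public sealed class ChangeQueue
{
    private readonly List<ProposedChange> _changes = new();

    // 2.1: an editor proposes a formatting change for admin review.
    public ProposedChange Propose(string entryId, string newFormatting)
    {
        var change = new ProposedChange { EntryId = entryId, NewFormatting = newFormatting };
        _changes.Add(change);
        return change;
    }

    // 2.2: an admin approves or denies one change.
    public void Review(ProposedChange change, bool approve) =>
        change.State = approve ? ReviewState.Approved : ReviewState.Denied;

    // 2.3: writes nothing while any change is pending; otherwise passes all approved
    // changes to the writer in one batch and clears the queue.
    public bool TryCommit(Action<IReadOnlyList<ProposedChange>> writeBatch)
    {
        if (_changes.Any(c => c.State == ReviewState.Pending)) return false;
        var approved = _changes.Where(c => c.State == ReviewState.Approved).ToList();
        if (approved.Count > 0) writeBatch(approved);
        _changes.Clear();
        return true;
    }
}
```

Collapsing all approved changes into one `writeBatch` call is what keeps the number of database round trips, and so the cost, down.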

Initial Documentation

To Create a Single Dictionary (here, Vygus 2018):

Dictionary<string, DictionaryEntry> entries = new Dictionary<string, DictionaryEntry>();
VygusFactory fact = new VygusFactory();
fact.Create2018Instance(entries).ParseAll();

To Create All Dictionaries:

MiddleEgyptianDictionary med = new MiddleEgyptianDictionary();
med.CreateDictionaries();

To Create the Keyword Generator for the Dictionaries:

KeywordGenerator keywordGenerator = new KeywordGenerator();
keywordGenerator.GenerateKeywordsFromEntries(med.GetEntries());

To Write Dictionaries and Keywords to the Database:

DbManager manager = new DbManager();
// Run both writes concurrently, then block until both finish; WaitAll reports failures from either task.
var entriesTask = Task.Run(async () => { await manager.WriteEntriesToDbAsync(med.GetEntries()); });
var keywordsTask = Task.Run(async () => { await manager.WriteKeywordsToDbAsync(keywordGenerator.GetKeywordSearchList()); });
Task.WaitAll(entriesTask, keywordsTask);

To regenerate the formatted dictionary from scratch, delete ~/data_output/gardinerToMDC.txt. To regenerate the trigrams from scratch, delete ~/data_output/Trigrams.txt. Do not delete ~/data_output/gardinerSignList.txt.
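These cache files follow a load-or-rebuild pattern: if the file exists it is reused, and if it has been deleted the data is rebuilt and saved again. A hypothetical C# sketch of that pattern, assuming a simple tab-separated key/value format (the actual formats of gardinerToMDC.txt and Trigrams.txt are not documented here, and `FileCache.LoadOrBuild` is not a project API):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical sketch of the "delete the file to rebuild" pattern. Assumes one
// "key<TAB>value" pair per line, unique keys, and no tabs or newlines inside keys.
public static class FileCache
{
    public static Dictionary<string, string> LoadOrBuild(
        string path, Func<Dictionary<string, string>> build)
    {
        if (File.Exists(path))
        {
            // Cache hit: parse the saved lines back into a dictionary.
            return File.ReadLines(path)
                .Select(line => line.Split('\t', 2))
                .Where(parts => parts.Length == 2)
                .ToDictionary(parts => parts[0], parts => parts[1]);
        }

        // Cache miss (for example, the file was deleted): rebuild and serialize.
        var data = build();
        Directory.CreateDirectory(Path.GetDirectoryName(Path.GetFullPath(path))!);
        File.WriteAllLines(path, data.Select(kv => $"{kv.Key}\t{kv.Value}"));
        return data;
    }
}
```

Only the first call after a deletion pays the cost of `build`; later runs read the file.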
