Retrogram

License: MIT

Retrofitted word vectors with Word2Vec skip-gram model

This project is inspired by Mittens, which extends the GloVe model to synthesize general-purpose representations from specialised datasets. The resulting word representations are arguably more context-aware than the pre-trained embeddings.

However, the GloVe objective requires a co-occurrence matrix of size V² to be held in memory, where V is the size of the domain-adapted vocabulary. This makes the method difficult to scale as the vocabulary grows.
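To see the scale of the difference, here is a back-of-the-envelope comparison with illustrative numbers (the vocabulary size, embedding dimension, and float32 storage below are assumptions, not values from this repository):

```python
# Illustrative memory comparison: a dense V x V co-occurrence matrix
# (GloVe) versus the V x E weight matrix needed by skip-gram.
V = 100_000          # hypothetical domain vocabulary size
E = 300              # hypothetical embedding dimension
BYTES = 4            # float32 entries

glove_bytes = V * V * BYTES        # V^2 co-occurrence counts
skipgram_bytes = V * E * BYTES     # V x E embedding weights

print(f"GloVe co-occurrence matrix: {glove_bytes / 2**30:.1f} GiB")
print(f"Skip-gram weight matrix:    {skipgram_bytes / 2**20:.1f} MiB")
```

At these (hypothetical) sizes the co-occurrence matrix needs roughly 37 GiB, while the skip-gram weights fit in about 114 MiB.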

Replacing the GloVe model with skip-gram reduces the matrix held in memory to size V×E, where E is the embedding dimension, determined by the pre-trained word embeddings being used.
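The core idea can be sketched as follows. This is a minimal, self-contained illustration of skip-gram with negative sampling seeded from pre-trained vectors, not the repository's implementation; the toy sizes, random "pre-trained" matrix, and `sgns_step` helper are all assumptions for the sake of the sketch:

```python
import numpy as np

# Only two V x E matrices are kept in memory (input and output vectors);
# no V x V co-occurrence matrix is ever built.
rng = np.random.default_rng(0)
V, E = 10, 4  # toy vocabulary and embedding sizes

# Hypothetical pre-trained vectors; in practice these would be loaded
# from an existing embedding file (e.g. GloVe or Word2Vec format).
pretrained = rng.normal(scale=0.1, size=(V, E)).astype(np.float32)

W_in = pretrained.copy()   # input vectors, seeded from the pre-trained ones
W_out = rng.normal(scale=0.01, size=(V, E)).astype(np.float32)  # context vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_step(center, context, negatives, lr=0.05):
    """One skip-gram-with-negative-sampling update for a (center, context) pair."""
    grad_in = np.zeros(E, dtype=np.float32)
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sigmoid(W_in[center] @ W_out[word])
        g = lr * (label - score)
        grad_in += g * W_out[word]       # accumulate gradient for the center word
        W_out[word] += g * W_in[center]  # update the context/negative word
    W_in[center] += grad_in

# (center, context) pairs drawn from the domain corpus drive the updates,
# gradually adapting the pre-trained geometry to the specialised data.
sgns_step(center=2, context=5, negatives=[1, 7, 9])
```

After a step, only the rows touched by the training pair move away from their pre-trained values; the rest of the vocabulary keeps its general-purpose geometry until the domain corpus mentions it.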