node2vec-arxiv

An experiment applying node2vec to arXiv paper metadata.
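node2vec samples second-order biased random walks over a graph and then trains a skip-gram model on those walks. The sketch below covers only the walk step for an unweighted graph, using the return parameter p and the in-out parameter q from the node2vec paper. node2vec_walk is a hypothetical, illustrative helper, not the code that main.py runs.

```python
import random
from typing import Dict, Hashable, List, Optional


def node2vec_walk(adj: Dict[Hashable, List[Hashable]], start: Hashable, length: int,
                  p: float = 1.0, q: float = 1.0,
                  rng: Optional[random.Random] = None) -> list:
    """Sample one node2vec walk of at most `length` nodes on an unweighted graph.

    `adj` maps each node to the list of its neighbours (undirected graph assumed).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = adj.get(cur, [])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        prev_neighbors = set(adj.get(prev, []))
        weights = []
        for x in neighbors:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in prev_neighbors:
                weights.append(1.0)      # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move outward (distance 2 from previous node)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk
```

For example, on a toy author/paper graph such as {"paper1": ["alice", "bob"], "alice": ["paper1"], "bob": ["paper1"]}, node2vec_walk(adj, "alice", 10, p=0.5, q=2.0) returns a list of 10 node ids. A low p makes the walk more likely to step back, while a high q keeps it close to where it started (BFS-like) and a low q pushes it outward (DFS-like).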

Installation

Prerequisites

Python ≥ 3.6

Provision a Virtual Environment

Create and activate a virtual environment (conda)

conda create --name py36_node2vec-arxiv python=3.6
source activate py36_node2vec-arxiv

If pip is configured in your conda environment, install the dependencies from the project root directory:

pip install -r requirements.txt

Get the arXiv dataset

The dataset used in this repository should be downloaded from Kaggle.

From the project root directory, create a folder named data and place the downloaded file arxivData.json in it.

Running the code

Now that the environment is set up and the dataset is available, you can run the code with the following command:

python main.py

By default, this uses the arxivData.json file as input and writes the following embedding files to the same data folder:

kg_node2vec_embed.emb: the embedding file, with the node id in the first column followed by the vector dimensions
kg_node2vec_label.tsv: a mapping of node id to node label
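A minimal sketch for reading these files back into Python, assuming the .emb file is whitespace-separated (as in word2vec-style text output, possibly starting with a "<num_nodes> <dim>" header line) and that the label file holds one tab-separated node id and label per line, without a header. Neither layout is confirmed beyond the description above; parse_embeddings and parse_labels are hypothetical helpers, not part of this repository.

```python
from typing import Dict, Iterable, List


def parse_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'node_id v1 v2 ... vd' lines into {node_id: vector}."""
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.split()  # splits on any whitespace (spaces or tabs)
        if not parts:
            continue  # skip blank lines
        # Assumption: a word2vec-style "<num_nodes> <dim>" header may come first.
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def parse_labels(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'node_id<TAB>label' lines into {node_id: label} (no header assumed)."""
    labels = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        node_id, _, label = line.partition("\t")
        labels[node_id] = label
    return labels
```

Both functions accept any iterable of lines, so an open file handle works, e.g. parse_embeddings(open("data/kg_node2vec_embed.emb", encoding="utf-8")).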

To simplify visualisation, we also output embeddings and labels in the format expected by the TensorFlow Projector tool. Note that, for the purpose of the blog post, these files are filtered to Author nodes only.

kg_node2vec_tf_proj.tsv: an embedding file in TensorFlow Projector format (vectors only, without label or id)
kg_node2vec_label.tsv: a label file in TensorFlow Projector format

Visualising the embeddings

Use the TensorFlow Projector to visualise the embeddings: load the embedding file as the vectors and the label file as the metadata.

pyvandenbussche/node2vec-arxiv

Folders and files

Latest commit

History

Repository files navigation

node2vec-arxiv

Installation

Prerequisites

Provision a Virtual Environment

Get ArXiv dataset

Running the code

Visualising the embeddings

About

Topics

Resources

License

Stars

Watchers

Forks

Languages