Create a Gephi Citation Graph based on Text Analysis of PDFs from Zotero

jaks6/citation_map

Create a Citation Graph based on Simplistic Text Analysis

Inspired by A.R. Siders' R Script from this ResearchGate question

Based on dpapathanasiou's example script for pdfminer

Takes article collections exported from Zotero as .csv and creates Gephi-compatible files for graph edges and nodes based on citations

Principle:

  • Let A be a set of known articles
  • For any a in A, let title_a be its title, and text_a be its text content
  • For any x in A and y in A with x != y:
    • cites(x,y) is true if title_y appears in text_x

For the above to work, both titles and texts are normalized (punctuation, whitespace, and special characters are removed), and we assume that title_y appears in text_x only if x lists y in its references section. A title that is merely mentioned elsewhere in the text is therefore also counted as a citation.
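The principle above can be sketched in a few lines of Python. This is a minimal illustration, not the code in analyze_papers.py: the names `normalize` and `find_citations`, the dictionary layout, and the exact normalization rules (Unicode NFKD folding, then keeping only ASCII letters and digits) are assumptions made for the example.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase and keep only ASCII letters and digits."""
    # NFKD splits accented letters into base letter + combining mark and
    # expands ligatures such as "ﬁ" into "fi"; PDF extraction often emits both.
    text = unicodedata.normalize("NFKD", text).lower()
    # Drop punctuation, whitespace, combining marks and other special characters.
    return re.sub(r"[^a-z0-9]", "", text)


def find_citations(articles):
    """Return the set of (x, y) pairs for which cites(x, y) holds.

    `articles` maps an article key to {"title": ..., "text": ...}.
    """
    norm = {
        key: (normalize(a["title"]), normalize(a["text"]))
        for key, a in articles.items()
    }
    edges = set()
    for x, (_, text_x) in norm.items():
        for y, (title_y, _) in norm.items():
            # x != y: every paper contains its own title.
            # Empty titles are skipped because "" is a substring of any text.
            if x != y and title_y and title_y in text_x:
                edges.add((x, y))
    return edges


articles = {
    "a": {"title": "Graph Mining: A Survey",
          "text": "Graph Mining: A Survey. We review ..."},
    "b": {"title": "Fast Paths",
          "text": "Fast Paths ... References [1] Graph mining - a survey, 2010."},
}
print(find_citations(articles))  # {('b', 'a')}
```

Note that with this particular normalization a title written entirely in a non-Latin script normalizes to an empty string and is ignored, and very short or generic titles can match by accident.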

Usage:

  1. Export the list of articles from Zotero as a .csv file (the articles should have file attachments)
  2. Run analyze_papers.py zotero_file.csv
  3. The script should produce two files, Edges_titles.csv and Nodes_titles.csv, in the folder "gephi"
  4. Load them into Gephi with "Import Spreadsheet"
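For orientation, here is a sketch of what Gephi-compatible node and edge files can look like when written from Python. The file and folder names come from the steps above; the function name `write_gephi_files` and the column layout are assumptions. Gephi's spreadsheet importer recognizes `Id` and `Label` columns for nodes and `Source`, `Target`, and `Type` columns for edges, but the files produced by analyze_papers.py may contain different or additional columns.

```python
import csv
from pathlib import Path


def write_gephi_files(titles, edges, out_dir="gephi"):
    """Write Nodes_titles.csv and Edges_titles.csv into out_dir.

    titles: dict mapping a node id to the article title
    edges:  iterable of (citing_id, cited_id) pairs
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Node table: one row per article.
    with open(out / "Nodes_titles.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Id", "Label"])
        for node_id, title in titles.items():
            writer.writerow([node_id, title])

    # Edge table: one directed edge per citation, from citing to cited article.
    with open(out / "Edges_titles.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Type"])
        for source, target in sorted(edges):
            writer.writerow([source, target, "Directed"])

    return out
```

Calling `write_gephi_files({"a": "Paper A", "b": "Paper B"}, {("b", "a")})` would create the folder "gephi" with one edge from b to a. The nodes file is normally imported before the edges file so that the labels are attached to the right ids.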

Notes

  • Tested with Python 3
  • Uses the pdfminer library
  • You can set the number of processes the script uses to parse the PDFs with the --processes parameter (default: 4)
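A command-line option like --processes is commonly wired to a worker pool as sketched below. This is an assumption about the general pattern, not a copy of analyze_papers.py: `build_parser`, `extract_text`, and `parse_all` are hypothetical names, and `extract_text` is a placeholder for the pdfminer extraction step.

```python
import argparse
from multiprocessing import Pool


def build_parser():
    """Command-line interface: one positional CSV path plus --processes."""
    parser = argparse.ArgumentParser(
        description="Build a citation graph from a Zotero CSV export."
    )
    parser.add_argument("zotero_csv", help="CSV file exported from Zotero")
    parser.add_argument(
        "--processes", type=int, default=4,
        help="number of worker processes used to parse the PDFs (default: 4)",
    )
    return parser


def extract_text(pdf_path):
    """Hypothetical placeholder for extracting the text of one PDF."""
    return pdf_path, ""  # real code would return the extracted text here


def parse_all(pdf_paths, processes=4):
    """Parse PDFs in parallel and return a dict of path -> text."""
    # Each PDF is independent, so the paths can be spread over the workers.
    with Pool(processes=processes) as pool:
        return dict(pool.map(extract_text, pdf_paths))


# Equivalent to: analyze_papers.py zotero_file.csv --processes 8
args = build_parser().parse_args(["zotero_file.csv", "--processes", "8"])
print(args.zotero_csv, args.processes)  # zotero_file.csv 8
```

In a real script, the call to `parse_all(paths, processes=args.processes)` belongs under an `if __name__ == "__main__":` guard, which multiprocessing requires on platforms that start workers by spawning a fresh interpreter.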
