viniciusarruda/visl2dcg

VISL/PennTreebank to DCG converter

This project converts a corpus in the PennTreebank format into a definite clause grammar (DCG) that can be run in Prolog.
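To illustrate the kind of transformation involved, here is a minimal sketch, not the project's actual implementation. It reads one bracketed PennTreebank-style tree and emits one DCG rule per tree node. The function names `parse_ptb` and `dcg_rules` are hypothetical, and lowercasing the labels to obtain valid Prolog atoms is an assumption; the real converter may name and format its rules differently.

```python
import re


def parse_ptb(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (terminal)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def dcg_rules(tree, rules=None):
    """Collect DCG rules top-down, skipping rules already seen."""
    rules = [] if rules is None else rules
    label, children = tree
    rhs = []
    for child in children:
        if isinstance(child, tuple):
            # Assumption: lowercase the label so it is a plain Prolog atom.
            # Labels with punctuation (e.g. NP-SBJ) would need quoting.
            rhs.append(child[0].lower())
        else:
            word = child.replace("\\", "\\\\").replace("'", "\\'")
            rhs.append("['{}']".format(word))
    rule = "{} --> {}.".format(label.lower(), ", ".join(rhs))
    if rule not in rules:
        rules.append(rule)
    for child in children:
        if isinstance(child, tuple):
            dcg_rules(child, rules)
    return rules


if __name__ == "__main__":
    tree = parse_ptb("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
    print("\n".join(dcg_rules(tree)))
```

With rules of this shape loaded in Prolog, a query such as `phrase(s, [the, dog, barks]).` would check whether the sentence is covered by the grammar.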

Adjustments and improvements

The project is still in development. Upcoming updates will address the following tasks:

  • Enable PennTreebank format
  • Compute probability and frequency count for rules
  • Reorder the rules for better efficiency and remove loops
  • Generate the probability for the parse tree
  • Generate the grammar with argument structure
  • Add an option for rule cutting, which prunes rules whose frequency is below a given threshold
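Several of the planned tasks above (frequency counts, rule probabilities, parse tree probability, and rule cutting) can be sketched together. The README does not say how these will be computed, so the following assumes the usual relative-frequency estimate, where a rule's probability is its count divided by the total count of rules with the same left-hand side, and a tree's probability is the product of the probabilities of the rules it uses. All function names and the `(lhs, rhs)` rule representation are hypothetical.

```python
import math
from collections import Counter


def rule_statistics(observed_rules):
    """Return (counts, probabilities) for an iterable of (lhs, rhs) rules.

    Assumption: relative-frequency estimation,
    P(lhs -> rhs) = count(lhs -> rhs) / count(lhs).
    """
    counts = Counter(observed_rules)
    lhs_totals = Counter()
    for (lhs, _rhs), n in counts.items():
        lhs_totals[lhs] += n
    probs = {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}
    return counts, probs


def prune_rules(counts, threshold):
    """Keep only the rules seen at least `threshold` times."""
    return {rule: n for rule, n in counts.items() if n >= threshold}


def tree_probability(rules_used, probs):
    """Multiply the probabilities of every rule used in one parse tree."""
    return math.prod(probs[rule] for rule in rules_used)


if __name__ == "__main__":
    observed = (
        [("np", ("dt", "nn"))] * 3
        + [("np", ("nn",))]
        + [("vp", ("vbz",))]
    )
    counts, probs = rule_statistics(observed)
    print(probs[("np", ("dt", "nn"))])
    print(prune_rules(counts, threshold=2))
```

Note that after pruning, the remaining counts would need to be passed through the estimation step again if the probabilities for each left-hand side are expected to sum to one.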

💻 Requirements

This project was tested with Python 3.8. To install the dependencies, run:

pip install -r requirements.txt

☕ Using the DCG converter

To use the DCG converter, run the main.py script with the following arguments:

usage: main.py [-h] --file_path FILE_PATH --file_format {VISL,PennTreebank,TigerXML} --output_folder OUTPUT_FOLDER [--graphviz]

optional arguments:
  -h, --help            show this help message and exit
  --file_path FILE_PATH
                        File path in the specified format.
  --file_format {VISL,PennTreebank,TigerXML}
                        File format.
  --output_folder OUTPUT_FOLDER
                        Output folder.
  --graphviz            A boolean switch to render the tree in graphviz

Example of usage:

python main.py --file_path ../dataset/Bosque_CF_8.0.PennTreebank_utf8.ptb --file_format PennTreebank --output_folder ../output
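For readers who want to call the converter from another script or mirror its interface, the usage message above corresponds to an `argparse` setup roughly like the sketch below. This is reconstructed from the help text only and is not the actual contents of main.py; the `build_parser` function name is hypothetical, and treating all three value-taking options as required is inferred from the usage line showing them without square brackets.

```python
import argparse


def build_parser():
    """Build a parser matching the usage message shown in this README."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--file_path", required=True,
                        help="File path in the specified format.")
    parser.add_argument("--file_format", required=True,
                        choices=["VISL", "PennTreebank", "TigerXML"],
                        help="File format.")
    parser.add_argument("--output_folder", required=True,
                        help="Output folder.")
    parser.add_argument("--graphviz", action="store_true",
                        help="A boolean switch to render the tree in graphviz")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list rather than sys.argv, for illustration.
    args = build_parser().parse_args([
        "--file_path", "../dataset/Bosque_CF_8.0.PennTreebank_utf8.ptb",
        "--file_format", "PennTreebank",
        "--output_folder", "../output",
    ])
    print(args.file_format, args.graphviz)
```

Because `--graphviz` is a `store_true` switch, it defaults to off and takes no value; passing it on the command line turns rendering on.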
