HannaAbiAkl/NeSy-Code-Generation-Workflow

NeSy-Code-Generation-Workflow

A neuro-symbolic workflow for generating controlled synthetic data for a code comment dataset

This is the official code repository for the paper "NeSy is alive and well: A LLM-driven symbolic approach for better code comment data generation and classification".

data

This directory contains three data files:

  • Seed data: The data provided by the IRSE 2023 shared task organizers for training the ML models.
  • ChatGPT-generated data: Data generated directly by an LLM assistant (ChatGPT in this case), used to measure the gain in model performance from data augmentation.
  • Symbolic-generated data: Data generated by a script that ChatGPT wrote after learning a set of symbolic rules, used to measure the gain in model performance from data augmentation.
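The three files support a simple comparison: train on the seed data alone, then on the seed data augmented with each synthetic set. The sketch below shows one way to assemble those training sets. It assumes each example is a (comment, code, label) record with labels such as "Useful" / "Not Useful"; the `Example` type, the `augment` helper, the deduplication step, and the sample records are all hypothetical and do not reflect the repository's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One hypothetical training record: a comment, its code, and a label."""
    comment: str
    code: str
    label: str  # assumed label set: "Useful" / "Not Useful"


def augment(seed, synthetic):
    """Return the seed examples followed by the synthetic ones, dropping
    any (comment, code) pair that has already been seen."""
    seen = set()
    merged = []
    for ex in [*seed, *synthetic]:
        key = (ex.comment.strip(), ex.code.strip())
        if key not in seen:
            seen.add(key)
            merged.append(ex)
    return merged


# Invented records standing in for the three data files.
seed = [Example("/* check for null */", "if (p == NULL) return;", "Not Useful")]
chatgpt = [Example("/* free buffer before exit to avoid a leak */", "free(buf);", "Useful")]
symbolic = [
    Example("/* check for null */", "if (p == NULL) return;", "Not Useful"),  # duplicate of seed
    Example("/* increment i */", "i++;", "Not Useful"),
]

# One training set per experiment: baseline plus one per synthetic source.
variants = {
    "seed": augment(seed, []),
    "seed+chatgpt": augment(seed, chatgpt),
    "seed+symbolic": augment(seed, symbolic),
}
for name, data in variants.items():
    print(name, len(data))
```

Dropping exact duplicates keeps a synthetic set from inflating the weight of examples already present in the seed data; whether the original experiments do this is not stated here.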

experiments

This directory contains the code for training and evaluating ML models on all datasets. It also implements the data augmentation techniques that use the synthetic data.
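A minimal sketch of the train-and-evaluate comparison, contrasting a model trained on seed data alone with one trained on seed plus synthetic data. It swaps in a toy multinomial Naive Bayes over comment tokens and an F1 score on a hypothetical "Useful" class; the models, features, and metrics used in the repository may differ, and every comment in the toy data is invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphabetic tokens; a deliberately crude feature extractor."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Toy multinomial Naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n / total)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def f1(y_true, y_pred, positive):
    """F1 score of the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(train, test, positive="Useful"):
    """Train on (comment, label) pairs and report F1 on a fixed test set."""
    model = NaiveBayes().fit([c for c, _ in train], [l for _, l in train])
    preds = [model.predict(c) for c, _ in test]
    return f1([l for _, l in test], preds, positive)


# Invented toy data; the test set stays fixed across both runs.
seed = [("increment counter", "Not Useful"),
        ("avoid overflow when buffer is full", "Useful")]
synthetic = [("prevent race when lock is held", "Useful"),
             ("set value", "Not Useful")]
test = [("prevent leak when lock is held", "Useful"),
        ("set counter", "Not Useful")]

baseline = evaluate(seed, test)
augmented = evaluate(seed + synthetic, test)
print(f"seed only:        F1={baseline:.2f}")
print(f"seed + synthetic: F1={augmented:.2f}")
```

Holding the test set fixed and varying only the training data isolates the effect of augmentation, which is the comparison the synthetic data files are meant to support.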

src

This directory contains the source material, such as the symbolic rules framework and the symbolic script used for synthetic data generation.
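To illustrate how symbolic rules can drive data generation, the sketch below encodes each rule as a code template, a comment template, and the label the rule implies, then expands the templates over a set of slot fillers. The `Rule` type, the two rules, the label names, and `generate` are hypothetical; they are not the repository's rules framework or the ChatGPT-written script.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical symbolic rule: templates plus the label the rule implies."""
    name: str
    code_template: str
    comment_template: str
    label: str
    slots: dict  # slot name -> list of candidate fillers


RULES = [
    # A comment that merely restates the statement adds no information.
    Rule("restates_code", "{var}++;", "/* increment {var} */", "Not Useful",
         {"var": ["i", "count", "idx"]}),
    # A comment that states why a check exists carries intent.
    Rule("explains_intent", "if ({ptr} == NULL) return -1;",
         "/* {ptr} may be NULL when {reason}; bail out early */", "Useful",
         {"ptr": ["buf", "node"], "reason": ["allocation fails", "the list is empty"]}),
]


def expand(rule):
    """Yield every (comment, code, label) triple the rule's fillers allow."""
    names = list(rule.slots)
    for values in itertools.product(*(rule.slots[n] for n in names)):
        binding = dict(zip(names, values))
        # str.format ignores unused keyword arguments, so each template
        # only consumes the slots it mentions.
        yield (rule.comment_template.format(**binding),
               rule.code_template.format(**binding),
               rule.label)


def generate(rules, n, seed=0):
    """Sample up to n distinct synthetic examples across all rules, reproducibly."""
    pool = [ex for rule in rules for ex in expand(rule)]
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))


for comment, code, label in generate(RULES, 3):
    print(label, "|", comment, "|", code)
```

Because every example comes from an explicit rule, the label of each generated pair is known by construction, and seeding `random.Random` makes a generation run reproducible.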
