Java code naturalness calculation via n-gram language models.

Ahmedfir/ngramlineloc

n-gram Naturalness-based lines ranking

Call ngram/tuna_fl_request.py to rank the lines of Java files by cross-entropy (naturalness). A higher cross-entropy indicates a less natural, more surprising line.
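For reference, the cross-entropy of a line is conventionally defined in the code-naturalness literature as the average negative log-probability that the language model assigns to its tokens. The notation below (a line $\ell$ with tokens $t_1, \dots, t_N$ and a model of order $n$) is introduced here for illustration only; the actual tokenization and smoothing are determined by the Tuna-based jar, not by this formula.

$$
H(\ell) = -\frac{1}{N} \sum_{i=1}^{N} \log_2 P\left(t_i \mid t_{i-n+1}, \dots, t_{i-1}\right)
$$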

The script mainly calls the jar java-n-gram-line-level-1.0.0-jar-with-dependencies.jar, built from https://github.com/Ahmedfir/java-n-gram-line-level.git, which in turn uses the Tuna APIs: https://github.com/electricalwind/tuna.git

## Typical usage

  • input: a list of Java files from a project, whose lines will be ranked.
  • training input: the remaining Java files of that same project.
  • output: a list of the lines of the input files, ranked by cross-entropy.
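The input/training/output flow above can be illustrated with a toy sketch. This is not the Tuna implementation or the interface of tuna_fl_request.py: it assumes a whitespace tokenizer, a bigram model with add-one smoothing, and a ranking from least to most natural. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(line):
    # Assumption: naive whitespace tokenizer; a real Java lexer is finer-grained.
    return line.split()


def train_bigram(training_lines):
    """Count bigram contexts and bigrams over the training lines."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for line in training_lines:
        tokens = ["<s>"] + tokenize(line)  # <s> marks the start of a line
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    # +1 reserves probability mass for a single "unseen token" class.
    return contexts, bigrams, len(vocab) + 1


def cross_entropy(line, contexts, bigrams, vocab_size):
    """Average negative log2 probability per token (add-one smoothing)."""
    tokens = ["<s>"] + tokenize(line)
    if len(tokens) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total -= math.log2(p)
    return total / (len(tokens) - 1)


def rank_lines(input_lines, training_lines):
    """Return (cross_entropy, line_number, line) tuples, least natural first."""
    contexts, bigrams, vocab_size = train_bigram(training_lines)
    scored = [
        (cross_entropy(line, contexts, bigrams, vocab_size), number, line)
        for number, line in enumerate(input_lines, start=1)
        if line.strip()  # skip blank lines
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    # Training lines would come from the rest of the project's Java files.
    training = ["int i = 0 ;", "int j = 0 ;", "int k = 0 ;"]
    to_rank = ["int x = 0 ;", "launch rockets now !"]
    for entropy, number, line in rank_lines(to_rank, training):
        print(f"{entropy:.3f}  line {number}: {line}")
```

In this sketch the line that shares no tokens with the training data receives the higher cross-entropy and is therefore ranked first.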

## Example usage

This library was implemented to conduct the study of the naturalness captured by CodeBERT:

@article{khanfir2022codebertnt,
    title={CodeBERT-nt: code naturalness via CodeBERT},
    author={Khanfir, Ahmed and Jimenez, Matthieu and Papadakis, Mike and Traon, Yves Le},
    journal={arXiv preprint arXiv:2208.06042},
    year={2022}
}
