AndrewDarnall/Distributed-Systems-Engineering-Project
Distributed-Systems-Engineering-Project

Temporary Description

The goal of the project is to experiment with and fine-tune the CodeBERT masked language model (MLM), and then deploy it in a simple microservice-based web application built with the Spring Boot framework.

The Data

To fine-tune the CodeBERT MLM I used 36 open-source Java-based projects. The choice of projects was not random: a model that must successfully predict a masked token in code needs to be trained on high-quality code, i.e. code written by highly experienced software engineers, either at FAANG companies or at the Apache Software Foundation. The code from these projects is used as both the training set and the test set; the specifics of which projects serve which role are in the experiment's notebooks.
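As a rough illustration of the masked-language-modeling objective the notebooks fine-tune on, the sketch below corrupts a tokenized Java snippet with RoBERTa-style `<mask>` tokens (the mask token CodeBERT's tokenizer uses). The function name, masking rate, and whitespace tokenization are illustrative assumptions, not details taken from the notebooks.

```python
import random

MASK = "<mask>"  # RoBERTa-style mask token, as used by CodeBERT's tokenizer

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace a fraction of tokens with the mask token.

    Returns the corrupted sequence and a parallel list of labels:
    the original token where a mask was inserted, None elsewhere.
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)  # position not scored during training

    return masked, labels

# Toy example: naive whitespace tokenization of a Java signature
tokens = "public static void main ( String [ ] args )".split()
corrupted, labels = mask_tokens(tokens)
```

In the real pipeline the tokenization would come from CodeBERT's own subword tokenizer rather than `str.split`, but the training signal is the same: the model sees `corrupted` and is scored only on the positions recorded in `labels`.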

The Sources

  • GitHub

The Projects

  • Apache Flink
  • Apache Hadoop
  • Apache Kafka
  • Apache Tomcat
  • Apache Storm
  • Apache Hive
  • Apache HBase
  • Apache NiFi
  • Apache Shiro
  • Apache Calcite
  • Apache Iceberg
  • Apache Ignite
  • Apache Kylin
  • Apache Maven
  • Apache Cassandra
  • Apache Curator
  • Apache Parquet
  • Apache IoTDB
  • Apache CXF
  • Apache NetBeans
  • Apache Atlas
  • Apache Flume
  • Apache Dubbo
  • Apache ActiveMQ
  • Apache Jena
  • Apache Pulsar
  • Apache OpenNLP
  • Apache TomEE
  • Apache Accumulo
  • Google Guava
  • Google Oauth-Java-Client
  • IBM Watson Java-SDK
  • Facebook Business SDK
  • Microsoft Authentication Library for Java
  • Spring Boot Framework

About

Fine-tuning CodeBERT MLM on Java-based enterprise projects
