The goal of this project is to experiment with and fine-tune the CodeBERT masked language model (MLM), and then deploy it in a simple microservice-based web application written with the Spring Boot framework.
To fine-tune the CodeBERT MLM I used 36 open-source Java projects. The choice of projects was not random: a model that must successfully predict a masked token in code should be trained on high-quality code, i.e. code written by highly experienced software engineers, either at FAANG companies or at the Apache Software Foundation. The code from these projects serves as both the training set and the test set; the exact split between the two is documented in the experiment notebooks.
All projects were sourced from GitHub:
- Apache Flink
- Apache Hadoop
- Apache Kafka
- Apache Tomcat
- Apache Storm
- Apache Hive
- Apache HBase
- Apache NiFi
- Apache Shiro
- Apache Calcite
- Apache Iceberg
- Apache Ignite
- Apache Kylin
- Apache Maven
- Apache Cassandra
- Apache Curator
- Apache Parquet
- Apache IoTDB
- Apache CXF
- Apache NetBeans
- Apache Atlas
- Apache Flume
- Apache Dubbo
- Apache ActiveMQ
- Apache Jena
- Apache Pulsar
- Apache OpenNLP
- Apache TomEE
- Apache Accumulo
- Google Guava
- Google Oauth-Java-Client
- IBM Watson Java-SDK
- Facebook Business SDK
- Microsoft Authentication Library in Java
- Spring Boot Framework
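The fine-tuning objective is standard masked language modelling: a fraction of the tokens in each training example is corrupted, and the model learns to recover the originals. Below is a minimal sketch of BERT-style masking (15% of tokens selected; of those, 80% replaced with the mask token, 10% with a random token, 10% left unchanged) in plain Python. The token list, seed, and `<mask>` string here are illustrative only and are not taken from the project's notebooks, where a subword tokenizer would produce the actual inputs.

```python
import random

MASK = "<mask>"  # RoBERTa-style mask token, as used by CodeBERT's tokenizer

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """BERT-style corruption: each token is selected with probability
    mask_prob; a selected token becomes MASK 80% of the time, a random
    vocabulary token 10%, and stays unchanged 10%. Returns the corrupted
    token list and per-position labels (the original token where selected,
    None elsewhere) for computing the MLM loss."""
    rng = random.Random(seed)
    vocab = list(set(tokens))  # toy vocabulary drawn from the example itself
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)          # 80%: mask out
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)           # 10%: keep original
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels

code = "public static void main ( String [ ] args )".split()
masked, labels = mask_tokens(code)
```

The model is then trained to predict the original token at each non-None label position, which is exactly the signal a deployed fill-mask endpoint later exploits to suggest the missing token in a code snippet.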