ISE-Research/NotebookCU

Code Comprehension Predictor Service

This repository contains the source code for a code comprehension predictor service.

Usage

To run the code, install the requirements by executing the following command:

pip install -r requirements.txt

After installing the requirements, you can run the CLI or API and start using the service.

Data Requirements

To use the functionality provided in this repository, you will need CSV files containing the notebook code and markdown cell data. These files are published alongside DistilKaggle: a distilled dataset of Kaggle Jupyter notebooks and A Predictive Model to Identify Effective Metrics for the Comprehension of Computational Notebooks.

Use the download links below to get started.

Folder Structure

  • src: Contains the main code that provides code comprehension prediction and metrics evaluation.
    • src/core: contains the core Python modules; these classes and functions do the actual work behind the CLI and API interfaces.
    • src/utils: contains helper modules, such as config.py, which centralizes all configuration.
    • src/notebooks: contains the base notebook files that support the paper's results.
  • dataframes: Contains the raw cell data of the Jupyter notebooks selected for training the models. For example, code.csv holds the source code of each code cell, and markdown.csv holds the markdown cell contents.
  • metrics: Contains CSV files with metrics of the selected Jupyter notebooks for training the models. For instance, code_cell_metrics.csv contains metrics of each code cell, markdown_cell_metrics.csv contains metrics of each markdown cell, and notebook_metrics.csv holds the aggregated metrics of all cells in each notebook.
  • notebooks: Stores input notebooks to be scored by the predictor.
  • models: Stores the trained models.
  • logs: Keeps the log files.
  • cache: Holds cached data.

CLI

First, cd into the src directory, then run cli.py:

cd src
export PYTHONPATH="$(pwd)"
python cli.py --help

Use --help on each subcommand for further instructions. Some example invocations are provided below.

python cli.py
python cli.py extract-dataframe-metrics --help
python cli.py extract-dataframe-metrics --chunk-size 100 --limit-chunk-count 5
python cli.py extract-dataframe-metrics ../dataframes/markdown.csv ../metrics/markdown_cell_metrics.csv --chunk-size 100 --limit-chunk-count 5 --file-type markdown
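The --chunk-size and --limit-chunk-count options suggest the CSV dataframes are processed in fixed-size row chunks rather than loaded whole. A minimal stdlib sketch of that pattern (the column names and in-memory CSV are illustrative assumptions, not the repository's actual schema):

```python
import csv
import io
from itertools import islice

def iter_chunks(reader, chunk_size, limit_chunk_count=None):
    """Yield successive lists of up to chunk_size rows from a csv.DictReader."""
    produced = 0
    while limit_chunk_count is None or produced < limit_chunk_count:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk
        produced += 1

# Tiny stand-in for a file like dataframes/markdown.csv (hypothetical columns).
sample = "cell_id,source\n" + "\n".join(f"{i},text {i}" for i in range(7))
reader = csv.DictReader(io.StringIO(sample))

chunks = list(iter_chunks(reader, chunk_size=3, limit_chunk_count=5))
print([len(c) for c in chunks])  # 7 rows in chunks of 3 -> [3, 3, 1]
```

Processing in chunks keeps memory bounded on large dataframes, and capping the chunk count gives a quick way to smoke-test a run on a small sample.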

python cli.py aggregate-metrics --help
python cli.py aggregate-metrics ../metrics/code_cell_metrics.csv ../metrics/markdown_cell_metrics.csv ../metrics/notebook_metrics_lite.csv

python cli.py extract-notebook-metrics --help
python cli.py extract-notebook-metrics ../notebooks/file.ipynb ../notebooks/results.json
python cli.py extract-notebook-metrics ../notebooks/file.ipynb ../notebooks/results.csv
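An .ipynb file is plain JSON (nbformat v4 stores a top-level "cells" list, each entry with a "cell_type" and "source"), so a per-cell pass like the one extract-notebook-metrics performs can be sketched with the stdlib. The metric computed here, cell and line counts per cell type, is only illustrative, not the repository's actual metric set:

```python
import json

def cell_metrics(notebook_json: str) -> dict:
    """Count cells and source lines per cell type in a v4 .ipynb document."""
    nb = json.loads(notebook_json)
    stats = {}
    for cell in nb.get("cells", []):
        kind = cell.get("cell_type", "unknown")
        source = cell.get("source", [])
        # "source" may be a single string or a list of line strings.
        lines = source.splitlines() if isinstance(source, str) else source
        entry = stats.setdefault(kind, {"cells": 0, "lines": 0})
        entry["cells"] += 1
        entry["lines"] += len(lines)
    return stats

# Minimal two-cell notebook in nbformat v4 layout.
demo = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "code", "source": ["import math\n", "print(math.pi)\n"]},
        {"cell_type": "markdown", "source": ["# Title\n"]},
    ],
})
print(cell_metrics(demo))
# {'code': {'cells': 1, 'lines': 2}, 'markdown': {'cells': 1, 'lines': 1}}
```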

python cli.py predict ../notebooks/file.ipynb cat_boost ../models/catBoostClassifier.withOutPT.sf50.sr20.combined_score.v2.model 
python cli.py predict ../notebooks/file.ipynb cat_boost ../models/catBoostClassifier.withPT.sf50.sr20.combined_score.v2.model --pt-score 10

FastAPI

First, cd into the src directory, then run main.py:

cd src
export PYTHONPATH="$(pwd)"
python main.py

Once the server is running, the interactive API documentation is available at http://localhost:8000/docs.

Docker

Use the command below to build and run the image with Docker Compose:

docker compose up --build
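For reference, a compose file for a service like this typically maps the FastAPI port shown above; the service name, build context, and volume mount here are assumptions for illustration, not the repository's actual file:

```yaml
services:
  notebookcu:                    # hypothetical service name
    build: .                     # build the image from the repository Dockerfile
    ports:
      - "8000:8000"              # expose the FastAPI port used above
    volumes:
      - ./models:/app/models     # hypothetical mount for trained models
```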
