
code for kaggle competition Microsoft malware classification


kamilburda/kaggle_Microsoft_Malware

 
 


kaggle_Microsoft_Malware

Code that won the [Kaggle Microsoft Malware Classification Competition](https://www.kaggle.com/c/malware-classification). Much of the credit goes to teammate daxiongshu for organizing everything!

Please see the PDF for a description of our methods and for instructions on running the code. The code relies heavily on [XGBoost](https://github.com/dmlc/xgboost).

This is a fork of the original winning code ported to Python 3 with updated dependencies. This fork requires Python 3.6.1 to run properly.

Installation

  1. Clone the repository

  2. (Optional, highly recommended) Create a virtual environment

  3. Install the required packages:

    python -m pip install -r requirements.txt
    
  4. Install pypy for Python 3

  5. Add the directory containing the pypy executable to your PATH variable, so that pypy can be run as pypy [arguments] without specifying its full path
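Steps 4 and 5 can be verified before running anything from the PDF. The following is a minimal sketch, not part of this repository: the helper names `find_on_path` and `describe_environment` are hypothetical, and it only reports what the `pypy` command resolves to and which Python version is running.

```python
import shutil
import sys


def find_on_path(command="pypy"):
    """Return the full path that `command` resolves to on PATH, or None."""
    return shutil.which(command)


def describe_environment(command="pypy"):
    """Collect the interpreter version and the resolved location of `command`."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {"python_version": version, "pypy_path": find_on_path(command)}


if __name__ == "__main__":
    info = describe_environment()
    # The README states that this fork requires Python 3.6.1.
    print("Python version:", info["python_version"])
    if info["pypy_path"] is None:
        print("pypy was not found on PATH; revisit step 5.")
    else:
        print("pypy resolves to:", info["pypy_path"])
```

If the script reports that pypy was not found, the PATH change from step 5 has not taken effect in the current shell.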

Usage

To train a model and perform predictions, see the PDF.

If you split the dataset into your own train and test sets and want to assess prediction performance, run

prediction_performance.py [path to predictions] [path to true labels]

where [path to predictions] is a path to the CSV containing predictions generated by one of the models, and [path to true labels] is a path to the CSV containing test labels (having the same structure as the CSV for the train labels on the Kaggle site).
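This README does not say which metric prediction_performance.py reports. Purely as an illustration of how such an assessment can work, the sketch below computes the multi-class logarithmic loss, the metric the Kaggle competition used for scoring. It is not the repository's script. The function name `multiclass_log_loss` is hypothetical, and the CSV layouts are assumptions: a labels file with `Id` and `Class` columns (classes numbered from 1), and a predictions file with an `Id` column followed by one probability column per class, in class order.

```python
import csv
import math


def multiclass_log_loss(predictions_path, labels_path, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each sample."""
    # Assumed labels layout: header "Id,Class", with Class numbered from 1.
    with open(labels_path, newline="") as labels_file:
        true_class = {row["Id"]: int(row["Class"])
                      for row in csv.DictReader(labels_file)}

    total = 0.0
    count = 0
    with open(predictions_path, newline="") as predictions_file:
        reader = csv.DictReader(predictions_file)
        # Assumed predictions layout: "Id" plus one column per class, in order.
        class_columns = [name for name in reader.fieldnames if name != "Id"]
        for row in reader:
            probabilities = [float(row[name]) for name in class_columns]
            row_sum = sum(probabilities)
            # Rescale the row to sum to 1, then clip to avoid log(0).
            p = probabilities[true_class[row["Id"]] - 1] / row_sum
            p = min(max(p, eps), 1.0 - eps)
            total -= math.log(p)
            count += 1

    if count == 0:
        raise ValueError("the predictions file contains no rows")
    return total / count
```

Lower values are better: a model that assigns probability 1 to every true class scores close to 0, while uniform predictions over nine classes score ln 9.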
