lschoe/random_forest

Random Forests in MPyC

This repository implements machine learning on private data: a model can be trained and used while the underlying data stays secret. It uses the MPyC library to run a secure multi-party computation (MPC) that trains a forest of decision trees with an algorithm similar to the C4.5 machine learning algorithm.
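
For readers unfamiliar with C4.5, the sketch below shows the split criterion that the textbook algorithm uses, the gain ratio, on ordinary (non-secret) Python data. It is only an illustration of the cleartext idea: the function names are hypothetical, nothing here uses MPyC, and the secure variant in this repository may select splits differently.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(rows, labels, attribute):
    """Gain ratio obtained by splitting on the attribute at index `attribute`."""
    total = len(labels)
    # Group the labels by the value each row has for this attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum((len(p) / total) * math.log2(len(p) / total)
                      for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels):
    """Index of the attribute with the highest gain ratio."""
    return max(range(len(rows[0])), key=lambda a: gain_ratio(rows, labels, a))
```

A decision tree is grown by calling `best_attribute` on the samples at a node, splitting on the chosen attribute, and recursing on each partition. In the MPC setting the same decisions have to be made on secret-shared values, which is one reason the secure computation is slower.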

Installation

Install Python 3.7, then invoke:

pip install -r requirements.txt

Usage

The spect.py and balance.py files each show how to specify a dataset and train a random forest on it. Run these examples as follows:

python spect.py
python balance.py

Please keep in mind that these computations are much slower than their non-MPC counterparts.

Tests

Run the tests by invoking:

pytest

Run tests in watch mode:

ptw [-c]

(The -c flag causes the screen to be cleared before each run.)

Profiling

Profile the spect.py example with cProfile and view the results in snakeviz:

pip install snakeviz
python -m cProfile -o spect.stats spect.py
snakeviz spect.stats
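
If a browser-based viewer is not convenient, a stats file written by cProfile can also be read with the standard-library pstats module. The following is a minimal sketch; `top_functions` and `profile_call` are hypothetical helper names and are not part of this repository.

```python
import cProfile
import io
import pstats


def profile_call(func, stats_path):
    """Profile a single call to `func` and write the raw stats to `stats_path`."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(stats_path)


def top_functions(stats_path, limit=10):
    """Return a text report of up to `limit` entries, sorted by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(stats_path, stream=buffer)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


# Example, assuming spect.stats was produced by the cProfile command above:
# print(top_functions("spect.stats"))
```

Sorting by cumulative time puts the functions whose calls (including everything they call) take longest at the top, which is usually the quickest way to see where a slow run spends its time.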

Thanks

This algorithm was developed as part of the SODA project. Many thanks to Mark Abspoel, Daniel Escudero and Nikolaj Volgushev for designing the decision tree algorithm for MPC (see chapter 6 of this SODA document). Many thanks also to Berry Schoenmakers, who developed MPyC and helped us throughout the implementation of this algorithm.