
This is a Go implementation of the random forest algorithm for classification and regression. Both the random forest and the decision tree are usable as standalone Go packages. The CLI can fit a model from a CSV file and make predictions from a previously fitted model. The CSV parser is rather limited: only numeric feature values are accepted.



go get



A model can be fitted from a CSV file. The label or target value should be in the first column, and the remaining columns should be numeric features. The file may contain a header row; if one is present, the column names are used as variable names in the variable importance report (see below). For example, the iris data would appear as:
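The layout described above (label in the first column, numeric features after it, optional header row) can be sketched in Go as follows. This is only an illustration: the sample rows and their column order are assumptions, and `parseLabeled` is a hypothetical helper, not the parser that rf uses.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// sample is illustrative data in the layout described above: label first,
// numeric features after it, with a header row. The column order is an
// assumption made for this sketch.
const sample = `Species,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width
setosa,5.1,3.5,1.4,0.2
versicolor,7.0,3.2,4.7,1.4
virginica,6.3,3.3,6.0,2.5
`

// parseLabeled splits each record into a label (column 0) and numeric
// features (remaining columns). When hasHeader is true, the first record
// is treated as column names. Hypothetical helper, not rf's own parser.
func parseLabeled(text string, hasHeader bool) ([]string, []string, [][]float64, error) {
	records, err := csv.NewReader(strings.NewReader(text)).ReadAll()
	if err != nil {
		return nil, nil, nil, err
	}
	var names, labels []string
	var features [][]float64
	if hasHeader && len(records) > 0 {
		names = records[0][1:]
		records = records[1:]
	}
	for i, rec := range records {
		row := make([]float64, 0, len(rec)-1)
		for j, field := range rec[1:] {
			v, perr := strconv.ParseFloat(strings.TrimSpace(field), 64)
			if perr != nil {
				// Non-numeric feature values are rejected, as in the README.
				return nil, nil, nil, fmt.Errorf("row %d, feature %d: %w", i+1, j+1, perr)
			}
			row = append(row, v)
		}
		labels = append(labels, rec[0])
		features = append(features, row)
	}
	return names, labels, features, nil
}

func main() {
	names, labels, features, err := parseLabeled(sample, true)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("features:", names)
	for i := range labels {
		fmt.Println(labels[i], features[i])
	}
}
```

Returning an error on the first non-numeric feature keeps the sketch in line with the limitation noted earlier, that only numeric feature values are accepted.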


Assuming these data are in a file named iris.csv, a model would be fitted with the following command:

rf -d iris.csv -f iris.model


-d, --data arg example data

-f, --final_model arg (=rf.model) file to output fitted model

--var_importance arg file to output variable importance estimates

--trees arg (=10) number of trees to include in forest

--min_split arg (=2) minimum number of samples required to split an internal node

--min_leaf arg (=1) minimum number of samples in newly created leaves

--max_features arg (=-1) number of features to consider when looking for the best split; -1 defaults to √(number of features)

--impurity arg (=gini) the measure to use for evaluating candidate splits, must be gini or entropy

--workers arg (=1) number of workers for fitting trees

-c, --classification force parser to use integer/numeric labels for classification
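To make the --impurity and --max_features options above more concrete, here is a minimal Go sketch of the two impurity measures and of the √(# features) default. The function names are hypothetical, and the base-2 logarithm, the rounding down, and the lower bound of 1 are assumptions of this sketch rather than documented behavior of rf.

```go
package main

import (
	"fmt"
	"math"
)

// gini returns the Gini impurity 1 - sum(p_k^2) for a node with the given
// per-class sample counts.
func gini(counts []int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	sumSq := 0.0
	for _, c := range counts {
		p := float64(c) / float64(total)
		sumSq += p * p
	}
	return 1 - sumSq
}

// entropy returns the Shannon entropy -sum(p_k * log2(p_k)) in bits.
// Empty classes contribute nothing.
func entropy(counts []int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / float64(total)
		h -= p * math.Log2(p)
	}
	return h
}

// defaultMaxFeatures resolves a --max_features style value. A positive value
// is used as given (capped at nFeatures); anything else falls back to the
// square root of the feature count. Rounding down and the minimum of 1 are
// assumptions of this sketch.
func defaultMaxFeatures(maxFeatures, nFeatures int) int {
	if maxFeatures > 0 {
		if maxFeatures > nFeatures {
			return nFeatures
		}
		return maxFeatures
	}
	k := int(math.Sqrt(float64(nFeatures)))
	if k < 1 {
		k = 1
	}
	return k
}

func main() {
	counts := []int{50, 50, 50} // three equally sized classes
	fmt.Printf("gini=%.4f entropy=%.4f\n", gini(counts), entropy(counts))
	fmt.Println("features per split for 4 features:", defaultMaxFeatures(-1, 4))
}
```

Both measures are zero for a pure node and largest when the classes are evenly mixed, which is why either can be used to rank candidate splits.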

Regression is also supported: the CSV parser detects whether the first column is numeric or categorical. If the class labels look like numbers:


the parser will get confused and fit a regression model (it can't tell the difference between "1" and 1). If this happens, try running with the --classification flag.
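The ambiguity can be illustrated with a short Go sketch. It assumes a detection rule of the form "treat the labels as a regression target if every label parses as a number"; that rule and the names `looksNumeric` and `chooseTask` are assumptions for illustration, not rf's actual implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// looksNumeric reports whether every label parses as a floating-point
// number. Note that strconv.ParseFloat also accepts strings such as "NaN"
// and "Inf".
func looksNumeric(labels []string) bool {
	for _, s := range labels {
		if _, err := strconv.ParseFloat(s, 64); err != nil {
			return false
		}
	}
	return len(labels) > 0
}

// chooseTask picks "regression" when the labels look numeric, unless
// forceClassification is set, which mirrors the intent of --classification.
func chooseTask(labels []string, forceClassification bool) string {
	if forceClassification || !looksNumeric(labels) {
		return "classification"
	}
	return "regression"
}

func main() {
	numericClasses := []string{"1", "2", "3", "1"}
	fmt.Println(chooseTask(numericClasses, false))                   // regression
	fmt.Println(chooseTask(numericClasses, true))                    // classification
	fmt.Println(chooseTask([]string{"setosa", "virginica"}, false)) // classification
}
```

Under such a rule, class labels encoded as 1, 2, 3 are indistinguishable from a numeric target, so an explicit override is the only reliable way to request classification.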


After the input data is parsed and the forest fitted, rf will write a diagnostic report to stderr.

Fit 10 trees using 150 examples in 0.00 seconds

Variable Importance
Petal.Length   : 0.55
Petal.Width    : 0.40
Sepal.Width    : 0.03
Sepal.Length   : 0.02

Confusion Matrix
               setosa         versicolor     virginica
setosa         50             1              0
versicolor     0              46             5
virginica      0              3              45

Overall Accuracy: 94.00%

The confusion matrix and overall accuracy are estimated from the out-of-bag samples of each tree in the forest. The report shows up to 20 variables in the variable importance section, in decreasing order of importance. If your data have more predictors, the importance estimates for all variables can be written to a CSV file using the --var_importance flag.
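As a quick check on how the overall accuracy relates to the confusion matrix, the following Go sketch sums the diagonal and divides by the total count. The helper name `accuracy` is hypothetical; the counts are the ones from the report above.

```go
package main

import "fmt"

// accuracy returns the fraction of examples on the diagonal of a square
// confusion matrix, i.e. the predictions that match the true class.
func accuracy(m [][]int) float64 {
	correct, total := 0, 0
	for i, row := range m {
		for j, n := range row {
			total += n
			if i == j {
				correct += n
			}
		}
	}
	if total == 0 {
		return 0
	}
	return float64(correct) / float64(total)
}

func main() {
	// Counts copied from the report above (setosa, versicolor, virginica).
	m := [][]int{
		{50, 1, 0},
		{0, 46, 5},
		{0, 3, 45},
	}
	fmt.Printf("Overall Accuracy: %.2f%%\n", 100*accuracy(m)) // prints "Overall Accuracy: 94.00%"
}
```

The diagonal holds 50 + 46 + 45 = 141 of the 150 examples, which matches the reported 94.00%.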

For a regression model:

Fit 10 trees using 506 examples in 0.01 seconds

Variable Importance
rm             : 0.38
lstat          : 0.30
nox            : 0.12
crim           : 0.05
dis            : 0.03
ptratio        : 0.03
tax            : 0.02
age            : 0.02
black          : 0.02
rad            : 0.01
indus          : 0.01
zn             : 0.01
chas           : 0.00

Mean Squared Error: 15.677
R-Squared: 81.487%

The mean squared error is computed from the out-of-bag samples of each tree in the forest. The variable importance is reported in the same manner as for classification.
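The Go sketch below illustrates the two ideas behind these numbers: which examples are out of bag for a tree, and how mean squared error and R-squared are computed from predictions. It assumes the usual definitions (bootstrap sampling with replacement, R² = 1 - SS_res/SS_tot); the function names are hypothetical, and how rf aggregates out-of-bag predictions across trees is not shown.

```go
package main

import (
	"fmt"
	"math/rand"
)

// bootstrap draws n indices with replacement using draw (for example
// rng.Intn) and returns the in-bag draws together with the out-of-bag
// indices that were never drawn.
func bootstrap(n int, draw func(int) int) (inBag, oob []int) {
	seen := make([]bool, n)
	inBag = make([]int, n)
	for i := range inBag {
		inBag[i] = draw(n)
		seen[inBag[i]] = true
	}
	for i, s := range seen {
		if !s {
			oob = append(oob, i)
		}
	}
	return inBag, oob
}

// mse returns the mean squared error between targets y and predictions yhat.
func mse(y, yhat []float64) float64 {
	sum := 0.0
	for i := range y {
		d := y[i] - yhat[i]
		sum += d * d
	}
	return sum / float64(len(y))
}

// rSquared returns 1 - SS_res/SS_tot, the coefficient of determination.
func rSquared(y, yhat []float64) float64 {
	mean := 0.0
	for _, v := range y {
		mean += v
	}
	mean /= float64(len(y))
	ssRes, ssTot := 0.0, 0.0
	for i := range y {
		ssRes += (y[i] - yhat[i]) * (y[i] - yhat[i])
		ssTot += (y[i] - mean) * (y[i] - mean)
	}
	return 1 - ssRes/ssTot
}

func main() {
	rng := rand.New(rand.NewSource(1))
	inBag, oob := bootstrap(10, rng.Intn)
	fmt.Println("in bag:", inBag, "out of bag:", oob)

	// Toy targets and predictions, made up for this sketch.
	y := []float64{1, 2, 3, 4}
	yhat := []float64{1, 2, 3, 6}
	fmt.Printf("MSE: %.3f  R-Squared: %.3f%%\n", mse(y, yhat), 100*rSquared(y, yhat))
}
```

Because each tree never sees its out-of-bag examples during fitting, errors measured on them act as a built-in validation estimate without a separate hold-out set.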


Predictions can be made from a previously fitted model. The data for making predictions should be in a CSV file with a format similar to the data used to fit the model; however, the first column will be ignored.

rf -d iris.csv -p iris_predictions.csv -f iris.model


-d, --data arg example data

-p, --predictions arg file to output predictions

-f, --final_model arg (=rf.model) file with previously fitted model


Documentation for the two packages, forest and tree, can be found on godoc. tree implements classification trees, while forest implements random forests using tree. See rf.go in this repository for an example of using the forest package.




[1] Louppe, G. (2014) "Understanding Random Forests: From Theory to Practice" (PhD thesis)

[2] Breiman, L. (2001) "Random Forests", Machine Learning, 45(1), 5-32


