Robustats

Robustats is a Python library for high-performance computation of robust statistical estimators.

The functions that compute the robust estimators are implemented in C for speed and called from Python.

Estimators implemented in the library:

  • Weighted Median (time complexity: O(n)) [1, 2, 3]
  • Medcouple (time complexity: O(n * log(n))) [4, 5, 6, 7]
  • Mode (time complexity: O(n * log(n))) [8]
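
To make the first estimator concrete, here is a minimal pure-Python sketch of the (lower) weighted median as defined in [1, 3]. It is an illustration only, not the library's C implementation: it sorts the data and therefore runs in O(n * log(n)) rather than O(n), and the function name `weighted_median_reference` is hypothetical.

```python
def weighted_median_reference(values, weights):
    """Return the lower weighted median of `values` (illustrative sketch).

    Hypothetical helper, not part of Robustats. It sorts the data, so it
    costs O(n * log(n)), unlike the O(n) selection-based approach in [1].
    """
    if len(values) == 0 or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equally long")

    # Pair each value with its weight and order the pairs by value.
    pairs = sorted(zip(values, weights))
    half_total = sum(weights) / 2.0

    # Walk through the sorted values until half of the total weight is covered.
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half_total:
            return value

    # Only reachable through floating-point rounding: fall back to the maximum.
    return pairs[-1][0]


x = [1.1, 5.3, 3.7, 2.1, 7.0, 9.9]
weights = [1.1, 0.4, 2.1, 3.5, 1.2, 0.8]
print(weighted_median_reference(x, weights))  # 2.1
```

A sketch like this can be handy for cross-checking results on small inputs, where the extra cost of sorting does not matter.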

How to Install

This library requires Python 3.

You can install the library using pip.

pip install robustats

You can also install the library directly from GitHub using the following command.

pip install -e 'git+https://github.com/FilippoBovo/robustats.git#egg=robustats'

Alternatively, clone the repository, then install and test the Robustats package as follows.

git clone https://github.com/FilippoBovo/robustats.git
cd robustats
pip install -e .
python -m unittest

How to Use

This is an example of how to use the Robustats library in Python.

import numpy as np
import robustats


# Weighted Median
x = np.array([1.1, 5.3, 3.7, 2.1, 7.0, 9.9])
weights = np.array([1.1, 0.4, 2.1, 3.5, 1.2, 0.8])

weighted_median = robustats.weighted_median(x, weights)

print("The weighted median is {}".format(weighted_median))
# Output: The weighted median is 2.1


# Medcouple
x = np.array([0.2, 0.17, 0.08, 0.16, 0.88, 0.86, 0.09, 0.54, 0.27, 0.14])

medcouple = robustats.medcouple(x)

print("The medcouple is {}".format(medcouple))
# Output: The medcouple is 0.7749999999999999


# Mode
x = np.array([1., 2., 2., 3., 3., 3., 4., 4., 5.])

mode = robustats.mode(x)

print("The mode is {}".format(mode))
# Output: The mode is 3.0

How to Contribute

If you wish to contribute to this library, please follow the patterns and style of the rest of the code.

Also install the Git hooks.

git config core.hooksPath .githooks

Tips:

  • In C, use malloc to allocate memory on the heap instead of declaring arrays on the stack: with large arrays, stack allocation can overflow the stack and cause a segmentation fault.
  • Avoid recursion where possible to limit space complexity. Use loops instead.
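
The second tip can be illustrated in Python with a loop-based sketch of the half-sample mode, one of the estimators discussed in [8], which is often written recursively. This is an assumption-laden illustration rather than the library's C code: the function name `half_sample_mode` and the tie-breaking choices are hypothetical.

```python
import math


def half_sample_mode(values):
    """Estimate the mode with the half-sample method (illustrative sketch).

    Hypothetical helper, not part of Robustats. The shrinking window is
    tracked with two indices in a loop, so no recursion is needed.
    """
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")

    lo, hi = 0, len(data)  # the current window is data[lo:hi]
    while hi - lo > 3:
        half = math.ceil((hi - lo) / 2)
        # Find the narrowest run of `half` consecutive sorted points.
        best_start = lo
        best_width = data[lo + half - 1] - data[lo]
        for start in range(lo + 1, hi - half + 1):
            width = data[start + half - 1] - data[start]
            if width < best_width:  # assumption: the first narrowest run wins
                best_start, best_width = start, width
        lo, hi = best_start, best_start + half

    window = data[lo:hi]
    if len(window) == 1:
        return float(window[0])
    if len(window) == 2:
        return (window[0] + window[1]) / 2
    # Three points left: average the closer pair, or take the middle on a tie.
    left_gap = window[1] - window[0]
    right_gap = window[2] - window[1]
    if left_gap < right_gap:
        return (window[0] + window[1]) / 2
    if right_gap < left_gap:
        return (window[1] + window[2]) / 2
    return float(window[1])


print(half_sample_mode([1., 2., 2., 3., 3., 3., 4., 4., 5.]))  # 3.0
```

Because the window is narrowed in place by updating `lo` and `hi`, the extra memory used beyond the sorted copy stays constant, which is the point of preferring loops over recursion.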

References

[1] Cormen, Leiserson, Rivest, Stein - Introduction to Algorithms (3rd Edition).

[2] Cormen - Introduction to Algorithms (3rd Edition) - Instructor's Manual.

[3] Weighted median on Wikipedia.

[4] G. Brys; M. Hubert; A. Struyf (November 2004). "A Robust Measure of Skewness". Journal of Computational and Graphical Statistics. 13 (4): 996–1017.

[5] Donald B. Johnson; Tetsuo Mizoguchi (May 1978). "Selecting The Kth Element In X + Y And X1 + X2 +...+ Xm". SIAM Journal on Computing. 7 (2): 147–153.

[6] Medcouple implementation in Python by Jordi Gutiérrez Hermoso.

[7] Medcouple on Wikipedia.

[8] David R. Bickel, Rudolf Frühwirth. "On a fast, robust estimator of the mode: Comparisons to other robust estimators with applications", Computational Statistics & Data Analysis, Volume 50, Issue 12, 2006, Pages 3500-3530, ISSN 0167-9473.