The Matrix Profile has the potential to revolutionize time series data mining because of its generality, versatility, simplicity and scalability. In particular, it has implications for time series motif discovery, time series joins, shapelet discovery (classification), density estimation, semantic segmentation, visualization, rule discovery, clustering, etc.
- src: Folder containing the contextual matrix profile codebase (i.e., folder cmp) and necessary packages (i.e., distancematrix).
- tests: Contains Python-based test cases to validate source code.
- pyproject.toml: Contains metadata about the project and configurations for additional tools used to format, lint, type-check, and analyze Python code.
Create a virtual environment, activate it, and install the dependencies:
- Makefile
make setup
- Linux:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
- Windows:
python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
Run the main script through the console:
source .venv/bin/activate
python -m src.cmp.main
You should see output like the following in the terminal:
CONTEXT 1 : Subsequences of 05:45 h (m = 23) that start in [00:00,01:00) (ctx_from00_00_to01_00_m05_45)
99.997% 0.0 sec
- Cluster 1 (1.181 s) -> 1 anomalies
- Cluster 2 (0.508 s) -> 3 anomalies
- Cluster 3 (0.473 s) -> 4 anomalies
- Cluster 4 (0.658 s) -> 5 anomalies
- Cluster 5 (-) -> no anomalies green
At the end of the execution you can find the results in the results folder.
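The context header in the sample output reports the window length both as a duration (05:45 h) and as a number of observations (m = 23). A minimal sketch of that conversion follows, assuming 15-minute sampling (4 observations per hour, the value used in the context definition comments). The helper name `window_duration` is hypothetical and is not part of the repository.

```python
def window_duration(m, obs_per_hour=4):
    """Convert a window length in observations to an HH:MM string.

    Assumes evenly spaced observations; obs_per_hour=4 means 15-minute data.
    """
    minutes = m * 60 // obs_per_hour
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


print(window_duration(23))  # 05:45
```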
Build the Docker image:
- Makefile
make docker-build
- Linux:
docker build -t cmp .
Run the Docker image:
- Makefile
make docker-run
- Linux:
docker run cmp
At the end of the execution you can find the results in the results folder.
# 2) User Defined Context
# We want to find all the subsequences that start within a user-defined context
# (e.g., from 00:00 to 02:00, i.e., 2 hours) and cover the rest of the day.
# To avoid overlapping, the window length is defined as the whole day of
# observation minus the context length.
# Note: GeneralStaticManager comes from the distancematrix package; data,
# obs_per_day and obs_per_hour are expected to be defined earlier in the script.
# - Beginning of the context [hours] (here 17:00)
context_start = 17
# - End of the context [hours] (here 19:00)
context_end = 19
# - Context time window length [hours]
m_context = context_end - context_start  # 2
# - Time window length [observations]
# Formula: m = obs_per_day - obs_per_hour * m_context
# e.g., m = 96 [observations] - 4 [observations/hour] * 2 [hours] = 88 [observations] = 22 [hours]
# Here m is set by hand (a guess) instead of being computed from the formula.
m = 20
# Context Definition:
# example FROM 00:00 TO 02:00
# - m_context = 2 [hours]
# - obs_per_hour = 4 [observations/hour]
# - context_start = 0 [hours]
# - context_end = context_start + m_context = 0 [hours] + 2 [hours] = 2 [hours]
contexts = GeneralStaticManager([
    range(
        # FROM [observations] = x * 96 [observations] + 0 [hours] * 4 [observations/hour]
        (x * obs_per_day) + context_start * obs_per_hour,
        # TO [observations] = x * 96 [observations] + (0 [hours] + 2 [hours]) * 4 [observations/hour]
        (x * obs_per_day) + (context_start + m_context) * obs_per_hour)
    for x in range(len(data) // obs_per_day)
])
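The index arithmetic in the context definition can be checked without the library. The sketch below builds the same per-day ranges in plain Python and evaluates the window-length formula from the comments. The function names `context_ranges` and `window_length` are hypothetical helpers for illustration, and the values 96 observations per day and 4 observations per hour are taken from the comments above.

```python
def context_ranges(n_obs, obs_per_day, obs_per_hour, context_start, m_context):
    """Return one range of subsequence start indices per full day."""
    n_days = n_obs // obs_per_day
    return [
        range(
            day * obs_per_day + context_start * obs_per_hour,
            day * obs_per_day + (context_start + m_context) * obs_per_hour,
        )
        for day in range(n_days)
    ]


def window_length(obs_per_day, obs_per_hour, m_context):
    """Window length in observations: the whole day minus the context."""
    return obs_per_day - obs_per_hour * m_context


# Three days of 15-minute data, context from 00:00 to 02:00.
ranges = context_ranges(
    n_obs=3 * 96, obs_per_day=96, obs_per_hour=4, context_start=0, m_context=2
)
print([(r.start, r.stop) for r in ranges])  # [(0, 8), (96, 104), (192, 200)]
print(window_length(96, 4, 2))  # 88
```

With context_start = 17, as in the snippet above, the first day's range becomes range(68, 76), i.e. the subsequences starting between 17:00 and 19:00.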
You can cite this work using the following reference or this BibTeX file.
Chiosa, Roberto, et al. "Towards a self-tuned data analytics-based process for an automatic context-aware detection and diagnosis of anomalies in building energy consumption timeseries." Energy and Buildings 270 (2022): 112302.
This code is licensed under the MIT License - see the LICENSE file for details.