COSTI

Article: https://ieeexplore.ieee.org/abstract/document/10032331

COSTI performs classification of sequences of temporal intervals, with or without intensity values.

Example sequence from Musekey dataset (intensity value = velocity of a key press)

You can use COSTI when your data can be expressed as:

sequence	channel	start	end	intensity
int <0-inf>	int <1-inf>	float	float	float, optional
0	1	20.3	20.7	122.3
0	1	20.7	65.4	66
0	2	15	40.2	0.3
1	3	0	28.3	2736.3
...

Events for the same sequence in the same channel should not overlap. All sequences are treated as if they started from zero. Data does not need to be sorted or normalized.
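
The non-overlap requirement can be checked before running COSTI. The sketch below is not part of the repository; `find_overlaps` is a hypothetical helper that takes rows shaped like the table above. It assumes that intervals which only touch (one ends exactly where the next starts, as in the first two rows of the table) do not count as overlapping.

```python
from collections import defaultdict


def find_overlaps(events):
    """Report overlapping neighbours within each (sequence, channel) pair.

    `events` is an iterable of (sequence, channel, start, end, ...) tuples;
    fields after `end` (e.g. intensity) are ignored. An empty result means
    that no two events in the same sequence and channel overlap.
    """
    by_key = defaultdict(list)
    for event in events:
        sequence, channel, start, end = event[:4]
        by_key[(sequence, channel)].append((start, end))

    overlaps = []
    for key, intervals in by_key.items():
        intervals.sort()  # order by start time
        for (s1, e1), (s2, e2) in zip(intervals, intervals[1:]):
            # Assumption: touching intervals (e1 == s2) are allowed.
            if s2 < e1:
                overlaps.append((key, (s1, e1), (s2, e2)))
    return overlaps


rows = [
    (0, 1, 20.3, 20.7, 122.3),
    (0, 1, 20.7, 65.4, 66),
    (0, 2, 15, 40.2, 0.3),
    (1, 3, 0, 28.3, 2736.3),
]
print(find_overlaps(rows))
```

Only neighbouring intervals (after sorting by start time) are compared, which is enough to detect whether any overlap exists, but it does not list every overlapping pair.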

Usage example

import time
import numpy as np
from sklearn.linear_model import RidgeClassifierCV

import costi
import load_data
from train_test_split import cross_validate
from transform_to_input_format import transform_to_input_format


if __name__ == '__main__':

    x, y = load_data.load_musekey_unconstrained()

    cv_folds = cross_validate(x, y, 10)
    x_train, y_train, x_test, y_test = cv_folds[0]

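    start_time = time.time()

    # Convert interval data into flat vectors of timestamps, channels, value changes and sequence start indexes
    train_timestamps, train_channels, train_values, train_examples_s = transform_to_input_format(x_train)
    test_timestamps, test_channels, test_values, test_examples_s = transform_to_input_format(x_test)

    # Longest timestamp observed across the train and test sets
    max_sequence_duration = max(np.max(train_timestamps), np.max(test_timestamps))

    # Fit the COSTI transform on the training set and keep its parameters for the test set
    x_train_transformed, parameters = costi.fit_and_transform_train(
        train_timestamps, train_channels, train_values, train_examples_s, max_sequence_duration)

    # Note: the `normalize` argument was removed from RidgeClassifierCV in scikit-learn 1.2,
    # so this call needs an older scikit-learn release
    classifier = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10), normalize=True)
    classifier.fit(x_train_transformed, y_train)

    # Transform the test set with the parameters obtained from the training set
    x_test_transformed = costi.transform_test(
        test_timestamps, test_channels, test_values, test_examples_s, parameters)

    score = classifier.score(x_test_transformed, y_test)
    elapsed_time = time.time() - start_time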

    print(f"Accuracy: {score}")
    print(f"Time: {elapsed_time}")

Input format

For example input files, see the data directory. Data from files can be loaded with load_data.py and then transformed with transform_to_input_format.py, as in the example above. Alternatively, you can create

train_timestamps, train_channels, train_values, train_examples_s
test_timestamps, test_channels, test_values, test_examples_s

on your own. These are 1-d vectors, so the information about all sequences is concatenated. train_timestamps holds every timestamp at which any value changes (an event starts or ends). train_channels holds the number of the channel (counting from 0) in which the change happens. train_values holds the value of the change: if an event with intensity 15 begins, the value is 15, and when it ends, the value is -15. If your data does not contain intensity values, set the values to 1 for event starts and -1 for event ends. Finally, train_examples_s holds the indexes at which the data for each sequence starts. So train_examples_s[0] is always zero (the start index of the first sequence), train_examples_s[1] is the index where the data for the second sequence begins, and, for n sequences, train_examples_s[n] = train_timestamps.shape[0] = train_channels.shape[0] = train_values.shape[0].
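
As an illustration of the layout described above, here is a minimal sketch that builds the four vectors from a list of interval rows. It is not the repository's transform_to_input_format; `build_input_vectors` is a hypothetical helper, the channel numbers in the rows are assumed to already count from 0, and ordering the changes chronologically within each sequence is an assumption rather than a documented requirement.

```python
import numpy as np


def build_input_vectors(rows):
    """Build (timestamps, channels, values, examples_s) from interval rows.

    `rows` is a list of (sequence, channel, start, end) or
    (sequence, channel, start, end, intensity) tuples.
    """
    sequence_ids = sorted({row[0] for row in rows})
    timestamps, channels, values, examples_s = [], [], [], [0]

    for sequence in sequence_ids:
        changes = []
        for row in rows:
            if row[0] != sequence:
                continue
            channel, start, end = row[1], row[2], row[3]
            # Without intensity values, use +1 for a start and -1 for an end.
            intensity = row[4] if len(row) > 4 else 1.0
            changes.append((start, channel, intensity))   # event starts
            changes.append((end, channel, -intensity))    # event ends
        # Assumption: changes are ordered by time within a sequence.
        changes.sort(key=lambda change: change[0])
        for timestamp, channel, value in changes:
            timestamps.append(timestamp)
            channels.append(channel)
            values.append(value)
        # Index at which the next sequence starts (or the total length).
        examples_s.append(len(timestamps))

    return (np.array(timestamps, dtype=float),
            np.array(channels, dtype=int),
            np.array(values, dtype=float),
            np.array(examples_s, dtype=int))


rows = [
    (0, 0, 1.0, 3.0, 15.0),
    (0, 1, 2.0, 4.0, 7.0),
    (1, 0, 0.0, 5.0, 2.0),
]
timestamps, channels, values, examples_s = build_input_vectors(rows)
print(timestamps)   # [1. 2. 3. 4. 0. 5.]
print(channels)     # [0 1 0 1 0 0]
print(values)       # [ 15.   7. -15.  -7.   2.  -2.]
print(examples_s)   # [0 4 6]
```

With two sequences, examples_s has three entries, and the last one equals the length of the other three vectors.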
