Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle unevenly sampled time series #32

Open
JanJereczek opened this issue Apr 20, 2023 · 5 comments
Open

Handle unevenly sampled time series #32

JanJereczek opened this issue Apr 20, 2023 · 5 comments
Labels
design Library wide design discussion hard

Comments

@JanJereczek
Copy link
Contributor

JanJereczek commented Apr 20, 2023

Is your feature request related to a problem? Please describe.

Empirical data are often unevenly sampled. In the state of the art, an interpolation is often performed but this might bias the result.

In TransitionIndicators.jl we want to offer both options: (1) the user can interpolate their data and pass it or (2) directly pass their unevenly sampled time series, in which case no interpolation happens and all the difficulties are handled internally. (Option 2) might have reduced features compared to (Option 1) but introduces no error.

We won't focus too much on (Option 2) for now but defining the basics for the API to handle it should be a step in near future.

Describe the solution you'd like

Here goes an idea of solution. It does not rely on defining an iterator (maybe a downside?) but has a quite sparse syntax:

using TransitionIndicators

abstract type TimeSeries end

struct EvenspacedTS <: TimeSeries
    nt::Int64
    t::AbstractVector
    spacedim::Int64
    x::AbstractArray
    dt::Real
end

struct UnevenspacedTS <: TimeSeries
    nt::Int64
    t::AbstractVector
    spacedim::Int64
    x::AbstractArray
end

function TimeSeries(t, x)
    nt = length(t)
    spacedim = ndims(x) - 1
    if isequispaced(t)
        dt = mean(diff(t))
        return EvenspacedTS(nt, t, spacedim, x, dt)
    else
        println(
            "\n" *
            " Unevenly spaced time series are work in progress. " * 
            "If you spot bugs, please open an issue on GitHub!\n " * 
            "Bare in mind that unevenly spaced time series come with limited functionalities! \n"
        )
        return UnevenspacedTS(nt, t, spacedim, x)
    end
end

struct WindowView
    width::Int
    stride::Int
    windows::Vector{UnitRange{Int}}
end

function window_view(x, wv::WindowView)
    return [view(x, window) for window in wv.windows]
end

function WindowView(ts::EvenspacedTS; width::Int = default_window_width(ts), stride::Int = default_window_stride(ts))
    si = width:stride:ts.nt
    windows = [k-width+1:k for k in si]
    return WindowView(width, stride, windows)
end
default_window_width(ts::EvenspacedTS) = ts.nt ÷ 100
default_window_stride(ts::EvenspacedTS) = 1

function WindowView(ts::UnevenspacedTS; twidth::Real = default_window_width(ts), stride::Int = default_window_stride(ts))
    ibegin = find_nearest(twidth, ts.t)
    if ibegin < 1
        ibegin = 1
    end
    windows = [findspan(t, twidth, ts.t) for t in ts.t[ibegin:stride:end]]
    return WindowView(0, stride, windows)
end
default_window_width(ts::UnevenspacedTS) = last(ts.t) / 100
default_window_stride(ts::UnevenspacedTS) = 1

function findspan(t, twidth, t_uneven)
    i1 = find_nearest(t-twidth, t_uneven)
    i2 = find_nearest(t, t_uneven)
    if i1 == i2
        error("Time series not dense enough before t=$t for the chosen window width")
    else
        return i1:i2
    end
end

function find_nearest(t, tvec::AbstractVector)
    return argmin((tvec .- t).^2)
end

t = 0:0.1:100
x = sin.(t)
ts = TimeSeries(t, x)
wv = WindowView(ts)
y = zeros(eltype(x), length(wv.windows))
@btime map!(var, y, window_view(x, wv))

tu = cumsum(0.2 .* rand(length(t)))
xu = sin.(t)
tsu = TimeSeries(tu, xu)
wvu = WindowView(tsu)
yu = zeros(eltype(x), length(wvu.windows))
@btime map!(var, yu, window_view(xu, wvu))

Describe alternatives you've considered

Additional context

One open question is whether the window width on unevenly time series should be defined by the number of points or by a time span. I think the latter one is more rigorous.

@JanJereczek JanJereczek added the design Library wide design discussion label Apr 21, 2023
@Datseris
Copy link
Member

Datseris commented Apr 21, 2023

Oof, i've thought about this extensively but I haven't written anything here. Sorry, it would have saved you a lot of effort...

From my eyes, the plan is to create a package TimeseriesViewers.jl that defines a Timeseries type and then has on top of it the viewers, e.g. windowed viewers or sub-scale viewers.

Timeseries is an AbstractVector, a wrapper, that wraps a vector and a time vector. It has two concrete subtypes, an evenly and unevenly sampeld. The goal is to allow to ways to access Timeseries: by index, or by time. We can rely on Interpolations.jl to access the unevenly sampled version via interpolating.

The WindowViewers then produce views over timeseries with a given window an stride, in real time units, or integer units for convenience if the timeseris is evenly sampled.

The main takeaway here is that TransitionIndicators.jl will not care at all whether the timeseries is evenly sampled or not, because it gets in a Timeseries and a window viewer, with windows always defined in real time units. Thus, TransitionIndicators.jl works as it does now being agnostic to the nature of the timeseries. I think this is the best design.

@Datseris
Copy link
Member

Datseris commented Apr 21, 2023

it is unfortunate that you went though a lot of time to write your code, but I really think the best way forwards is to just rely on Interpolations.jl for creating windows of an unevenly sampled timeseries...

@JanJereczek
Copy link
Contributor Author

From a design perspective I agree with you. This would be the easiest. However, one might want to avoid interpolation in some cases, as it biases e.g. the AR1-regression coefficient and the power density spectrum.

Going the way you propose means closing that option, but it might be a choice we want to make. Let's talk about it live at some point.

@Datseris
Copy link
Member

However, one might want to avoid interpolation in some cases, as it biases e.g. the AR1-regression coefficient and the power density spectrum

Not sure what you mean here. Avoiding interpolation when? The user can choose to not construct the interpolation object, and rather just given in an abstract vector directly, in which case no interpolation would/coould happen.

@JanJereczek
Copy link
Contributor Author

We can rely on Interpolations.jl to access the unevenly sampled version via interpolating.

I am referring to this sentence. I understood that you (always) want to access unevenly-spaced time series by interpolations. But maybe that was not your point. Let's talk about it live to make sure we understand each other.

@Datseris Datseris added the hard label Oct 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design Library wide design discussion hard
Projects
None yet
Development

No branches or pull requests

2 participants