Multivariate time series data #65

Open
onchiptech opened this issue May 20, 2024 · 13 comments

Comments

@onchiptech

onchiptech commented May 20, 2024

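How can I pre-train the Lag-Llama model on multivariate time series data?

For example:

import numpy as np
import pandas as pd
from gluonts.dataset.common import ListDataset

num_time_steps = 300
data = [
    {
        # Recent pandas versions no longer accept freq= on Timestamp;
        # the frequency is passed to ListDataset below instead.
        "start": pd.Timestamp("2020-01-01"),
        # Two fields: temperature and humidity, shape (2, num_time_steps)
        "target": np.random.randn(2, num_time_steps),
    }
]

dataset = ListDataset(data, freq="D", one_dim_target=False)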

@RikiSot

RikiSot commented May 20, 2024

You can load a dataframe in long format with an "item_id" column, as shown in Colab Demo 1:

import pandas as pd
from gluonts.dataset.pandas import PandasDataset

url = (
    "https://gist.githubusercontent.com/rsnirwan/a8b424085c9f44ef2598da74ce43e7a3"
    "/raw/b6fdef21fe1f654787fa0493846c546b7f9c4df2/ts_long.csv"
)
df = pd.read_csv(url, index_col=0, parse_dates=True)
df
                      target  item_id
2021-01-01 00:00:00  -1.3378  A
2021-01-01 01:00:00  -1.6111  A
2021-01-01 02:00:00  -1.9259  A
2021-01-01 03:00:00  -1.9184  A
2021-01-01 04:00:00  -1.9168  A
...                      ...  ...
2021-01-10 19:00:00   1.2349  J
2021-01-10 20:00:00   1.1525  J
2021-01-10 21:00:00   1.1485  J
2021-01-10 22:00:00   1.3248  J
2021-01-10 23:00:00   1.1657  J

Check the GluonTS documentation for more info.
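
For data that starts out in wide format (one column per variable), a reshape into the long format above might look like the following. This is a minimal sketch: the column names `temperature` and `humidity` and the helper `to_gluonts_dataset` are hypothetical, and the `PandasDataset.from_long_dataframe` call follows the GluonTS documentation, so it is worth checking against your installed version.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one column per variable, daily timestamps.
index = pd.date_range("2020-01-01", periods=300, freq="D")
rng = np.random.default_rng(0)
wide = pd.DataFrame(
    {"temperature": rng.normal(size=300), "humidity": rng.normal(size=300)},
    index=index,
)

# Reshape to long format: one row per (timestamp, variable) pair, with the
# variable name stored in "item_id" and its value in "target".
long_df = (
    wide.rename_axis("timestamp")
    .reset_index()
    .melt(id_vars="timestamp", var_name="item_id", value_name="target")
    .set_index("timestamp")
)


def to_gluonts_dataset(df: pd.DataFrame):
    """Wrap a long-format frame as a GluonTS dataset (requires gluonts)."""
    from gluonts.dataset.pandas import PandasDataset

    # Each distinct item_id becomes its own univariate series.
    return PandasDataset.from_long_dataframe(df, target="target", item_id="item_id")
```

Each variable then becomes a separate univariate series identified by its `item_id`, which is how the demo data above is organised.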

@CoCoNuTeK

Could you please explain: by multivariate, do you mean that you still have only one target variable to predict, but want to add other independent variables (features) that the target depends on, so the model can capture more information than the patterns of the target variable alone?

@CoCoNuTeK

So the models only appear univariate, but if they are implemented using the GluonTS library, you can always add static or dynamic extra features?

@onchiptech
Author

Yes. My dataset is stock prices with four inputs (features): open, high, low, and close. The model has to predict only future "low" values (a single target).

@ashok-arjun
Contributor

Unfortunately, the current model only supports taking the variable to be predicted as input. It does not allow external covariates (other variables) at the moment.

@ashok-arjun
Contributor

You can, however, load multivariate time series data and then do univariate forecasting separately for each variable, precisely as described by @RikiSot.

@CoCoNuTeK

I thought that since it is based on the GluonTS library it might also work with covariates. Thanks for the info; I was about to start my preprocessing for covariate forecasting, so this saved me time. Is there any intention to build a newer model/paper in the future that allows covariates and is trained on more time series? Transformers are great because you can also use the missing-values layer, so preprocessing is much easier as there is no need to impute values.

@ashok-arjun
Contributor

ashok-arjun commented May 23, 2024

So you can pretrain with covariates if you have a large pretraining set, and then finetune with the same covariates. If you have a large enough pretraining set where the covariates are consistent (or at least finite), you can modify the code to add covariates, and then pretrain the model.

But it is not possible to use the released pretrained model downstream with covariates, as we did not consider covariates in our formulation. Building a foundation model that allows the use of any new covariates for zero-shot inference is a difficult problem, as we have to come up with a design that allows for it.
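
On the data side, GluonTS data entries conventionally carry covariates that are only observed in the past in a `past_feat_dynamic_real` field next to `target`. An entry for such a pretraining set could be shaped as below. This is only a sketch with made-up values and a hypothetical `make_entry` helper; the Lag-Llama code would still have to be modified to read the field.

```python
import numpy as np
import pandas as pd


def make_entry(target: np.ndarray, covariates: np.ndarray, start: str, item_id: str) -> dict:
    """Build one GluonTS-style data entry (hypothetical helper).

    target:     shape (T,), the single series to forecast
    covariates: shape (num_covariates, T), aligned with target in time
    """
    if covariates.shape[1] != target.shape[0]:
        raise ValueError("covariates and target must cover the same time steps")
    return {
        "start": pd.Period(start, freq="D"),
        "target": target,
        # GluonTS field name for real-valued covariates observed only in the past
        "past_feat_dynamic_real": covariates,
        "item_id": item_id,
    }


rng = np.random.default_rng(0)
T = 300
entry = make_entry(
    target=rng.normal(size=T),           # e.g. the daily "low"
    covariates=rng.normal(size=(3, T)),  # e.g. open, high, close
    start="2020-01-01",
    item_id="stock_0",
)
```

Keeping the same number and order of covariate rows across every entry is what "consistent" covariates would mean in practice here.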

@onchiptech
Author

Thank you @ashok-arjun. Forecasting each variable separately works, but it might miss out on valuable inter-feature relationship information. Instead of treating each variable independently, I want to create a unified time series by adjusting the frequency. Here’s how I want to proceed:

I have daily data with 4 variables (let’s call them A, B, C, and D). I will hack this data by converting it to a 6-hour frequency, with timestamps at 6-hour intervals (e.g., 00:00, 06:00, 12:00, 18:00). This creates a new univariate time series that places the adjusted values of A, B, C, and D at successive 6-hour intervals. But will it work seamlessly with lags?
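
The interleaving described above could be sketched as follows. The column names and values are placeholders, and the row-major ordering (A, B, C, D for the first day, then A, B, C, D for the next, and so on) is one assumed layout; any per-variable adjustment or scaling is left out.

```python
import numpy as np
import pandas as pd

# Placeholder daily data with four variables.
days = pd.date_range("2020-01-01", periods=5, freq="D")
daily = pd.DataFrame(
    np.arange(20, dtype=float).reshape(5, 4),
    index=days,
    columns=["A", "B", "C", "D"],
)

# Row-major flattening puts each day's A, B, C, D values next to each other.
values = daily.to_numpy().reshape(-1)

# Four 6-hour slots per day: 00:00 -> A, 06:00 -> B, 12:00 -> C, 18:00 -> D.
six_hourly_index = pd.date_range(
    days[0], periods=len(values), freq=pd.Timedelta(hours=6)
)
interleaved = pd.Series(values, index=six_hourly_index, name="target")
```

Reading back a single variable is then a matter of taking every fourth value, for example `interleaved.iloc[2::4]` for C.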

@CoCoNuTeK

That means the benchmarks against the other models were done on univariate predictions, which basically tests how effectively the models can capture the cyclic dependencies of the target variable without any further covariate information?

@CoCoNuTeK

You can only load multivariate time series data, which means it is still just one variable per series, so no covariates. You can put multiple time series inside one df for training, but that is different from being able to predict the target using covariates, as @ashok-arjun explained.

@ashok-arjun
Contributor

Yes, forecasting each variable separately would miss out on inter-variable information. But if you ultimately only care about forecasts, you might get great forecasts from just univariate models (which is what a lot of papers show). I'd recommend trying it out anyway.

Yes, interleaving the variables into a single series is one idea if you really want to consider inter-variable information. It would work seamlessly with lags, as the "lags" we consider don't rely on the frequency itself; lags of many possible frequencies are considered.
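
To illustrate how lag indices line up in such an interleaved series (a toy sketch with hypothetical variable names, not Lag-Llama's actual lag set): with four variables per day, lags that are multiples of 4 point at earlier values of the same variable, while the other lags point at the other variables.

```python
variables = ["A", "B", "C", "D"]  # hypothetical interleaving order
num_vars = len(variables)


def variable_at(position: int) -> str:
    """Return which variable sits at a given position of the interleaved series."""
    return variables[position % num_vars]


def describe_lag(position: int, lag: int) -> str:
    """Describe what a lag refers to, seen from a given position."""
    source = variable_at(position - lag)
    days_back = (position // num_vars) - ((position - lag) // num_vars)
    return f"{variable_at(position)} looks at {source}, {days_back} day(s) back"


# Seen from a "C" value (position 10 is slot C of the third day):
for lag in (1, 2, 4, 8):
    print(lag, describe_lag(10, lag))
```

Whether a given model benefits from this depends on which lag indices it actually uses, which this sketch does not assume.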

Still, I'd recommend first trying to forecast variables independently and benchmarking with that method, so you can check if the inter-variable information increases forecast accuracy a lot or not.
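
A comparison along these lines could be as simple as scoring both approaches on the same held-out window. The sketch below uses mean absolute error and made-up numbers; `forecast_independent` and `forecast_interleaved` are hypothetical arrays standing in for the point forecasts of the two approaches.

```python
import numpy as np


def mae(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Mean absolute error between actual values and point forecasts."""
    return float(np.mean(np.abs(actual - forecast)))


# Made-up held-out values of the target variable (e.g. "low") and two forecasts.
actual = np.array([10.0, 11.0, 12.0, 13.0])
forecast_independent = np.array([10.5, 10.5, 12.5, 12.5])  # univariate model
forecast_interleaved = np.array([10.0, 11.5, 12.0, 13.5])  # interleaved series

scores = {
    "independent": mae(actual, forecast_independent),
    "interleaved": mae(actual, forecast_interleaved),
}
```

For the interleaved series, the forecast of a single variable would first have to be picked out of the combined forecast (every fourth step) before scoring it against the same actual values.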

@ashok-arjun
Contributor

Yes, that is correct: the benchmarks were done on univariate prediction, without covariate information. We limited the scope of this paper to that. But I agree that there's so much more that can be done, which I expect we'll see in future work :)
