[Data Preprocessing] Questions about the data preprocessing procedure. #2

Open
customtiy13 opened this issue Oct 24, 2023 · 4 comments

Comments

@customtiy13

I would greatly appreciate it if you could elaborate on how to process the dataset.

In the Datasets section, it says that all datasets are processed as a sliding window view, and the format is composed of 4 numpy.ndarray objects.

Could you explain what "x, y, x_offset, y_offset" mean or, better yet, release the preprocessing code?

Thank you very much for your time and attention to my inquiries.

@Echo-Ji
Owner

Echo-Ji commented Oct 24, 2023

Hi, thank you for your attention. Here's a refined explanation:

  • x represents historical data samples, with each sample having the shape (#lookback_window, #nodes, #flow_types).
  • y denotes the labels, which are composed of future data samples. Each y sample has the shape (#predict_horizon, #nodes, #flow_types).
  • x_offset indicates the offsets related to the lookback window of x. For instance, if the current time is 10:00, the offset of 8:00 is -2 when working on an hourly basis but -4 when dealing with 30-minute intervals. It's important to note that we consider the most recent time index as having a 0 offset.
  • y_offset represents the offsets for the prediction horizon of y. The horizon is typically set to 1 in our configuration.

For instance, if you are using data from the previous 12 time steps to forecast the next one, the offsets for x and y should be [-11, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0] and [1], respectively.
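The description above can be sketched in a few lines of NumPy. This is only an illustration, not the repository's preprocessing code: the function name `make_windows`, its parameters, and the random input array are assumptions made for the example.

```python
import numpy as np

def make_windows(data, lookback=12, horizon=1):
    """Build sliding-window samples (hypothetical helper, not from the repo).

    data: array of shape (#timesteps, #nodes, #flow_types).
    """
    x_offset = np.arange(-(lookback - 1), 1)   # [-11, ..., -1, 0]
    y_offset = np.arange(1, horizon + 1)       # [1] when horizon == 1
    min_t = -x_offset.min()                    # first index with a full lookback window
    max_t = data.shape[0] - y_offset.max()     # exclusive upper bound for the current time
    x = np.stack([data[t + x_offset] for t in range(min_t, max_t)])
    y = np.stack([data[t + y_offset] for t in range(min_t, max_t)])
    return x, y, x_offset, y_offset

# Made-up data: 100 timesteps, 4 nodes, 2 flow types.
data = np.random.rand(100, 4, 2)
x, y, x_offset, y_offset = make_windows(data)
print(x.shape, y.shape)  # (#samples, 12, 4, 2) and (#samples, 1, 4, 2)
```

Each sample in `x` is the slice of `data` at the current time plus `x_offset`, and the matching sample in `y` is the slice at the current time plus `y_offset`.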

I hope this explanation addresses your question. If you have further questions, please do not hesitate to ask me. 😄

@Echo-Ji Echo-Ji closed this as completed Nov 4, 2023
@Echo-Ji Echo-Ji reopened this Mar 4, 2024
@jnjnjmjm

Hi, I have some questions about the x_offsets sequence. I would appreciate it if you could take some time to answer them.
I am debugging the code on the NYCBike_1 dataset and get an x_offsets sequence like [-73, -72, -71, -70, -69, -49, -48, -47, -46, -45, -25, -24, -23, -22, -21, -3, -2, -1, 0].
According to the description in the paper, I think the first 15 timesteps are data from the past 3 days. In that case, the last 4 timesteps would be the past 4 hours' data, rather than the previous 2 hours' data as described in the paper.
This confuses me. Is this part of the paper described accurately, or have I misunderstood the meaning of the data?
Thank you for taking the time to address my doubts.

@Echo-Ji
Owner

Echo-Ji commented Apr 16, 2024

You are right. It is the previous 2 hours of data in the other three datasets, but the previous 4 hours of data in the NYCBike_1 dataset.
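One way to read the NYCBike_1 x_offsets sequence quoted in this thread is as three 5-step windows, each centred on the target's time of day in one of the previous 3 days, followed by the 4 most recent steps. The sketch below reproduces the sequence under that reading. It assumes hourly sampling (24 steps per day) and a target at offset 1; the function `build_x_offsets` and all of its parameters are hypothetical and are not taken from the repository.

```python
def build_x_offsets(steps_per_day=24, n_days=3, half_width=2, closeness=4, horizon=1):
    """Reconstruct the offsets (hypothetical helper, assumed decomposition)."""
    offsets = []
    # One window per previous day, centred on the target's time of day.
    for d in range(n_days, 0, -1):
        center = horizon - d * steps_per_day
        offsets.extend(range(center - half_width, center + half_width + 1))
    # The most recent `closeness` steps, ending at the current time (offset 0).
    offsets.extend(range(-(closeness - 1), 1))
    return offsets

nyc_bike_1 = build_x_offsets()
print(nyc_bike_1)
```

Under these assumptions the last 4 offsets cover 4 hourly steps, which matches the 4 hours of recent data confirmed above for NYCBike_1.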

@zhangruiouc

Hello, when I was looking at the NYCBike2 configuration file, NYCBike2.yaml, I found that input_length is 8 + 9 * 3 = 35. If the input uses the traffic flow from the two hours before the current moment plus the traffic flow near the current time of day in the previous three days, then with the dataset's 30-minute sampling rate, shouldn't input_length be 2h/30min + 9 * 3 = 31? I hope to get your answer. Thank you very much!
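The arithmetic in this question can be laid out explicitly. The snippet only restates the numbers quoted in the comment (8, 9, 3, and a 30-minute sampling interval); the variable names are made up for illustration, and it is not an answer from the maintainers.

```python
step_minutes = 30     # sampling interval quoted in the comment
recent_steps = 8      # first term of 8 + 9 * 3
per_day_steps = 9     # steps taken from each previous day
n_days = 3            # number of previous days

# Input length as quoted from the configuration file.
input_length = recent_steps + per_day_steps * n_days
# Time span covered by the 8 recent steps at a 30-minute interval.
recent_hours = recent_steps * step_minutes / 60
# Input length if the recent part covered only 2 hours.
length_if_two_hours = (2 * 60) // step_minutes + per_day_steps * n_days

print(input_length, recent_hours, length_if_two_hours)
```

With these numbers, 8 recent steps at 30 minutes each span 4 hours, whereas a 2-hour span would need only 4 steps, which is the source of the 35 versus 31 discrepancy raised above.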
