Data standardization question in data_loader.py #414

Closed
tjnkyqcy opened this issue May 21, 2024 · 2 comments

Comments

@tjnkyqcy

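In data_loader.py:

    if self.scale:
        train_data = df_data[border1s[0]:border2s[0]]
        self.scaler.fit(train_data.values)
        data = self.scaler.transform(df_data.values)

Why are the standardization parameters computed only on the training data train_data, and then used to standardize the entire dataset df_data? Shouldn't fit be computed on the full dataset, and the full dataset then standardized with those parameters?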

@ztxtech

ztxtech commented May 21, 2024

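Because the model is fit on the training set. If you standardized using the full dataset, the standardization parameters would change and the model would fit worse, because it would no longer be fitting the same target.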

@wuhaixu2016
Collaborator

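Thanks for the answer. The main consideration here is to avoid data leakage. In a real application we only know the training data; the test data is unknown, so even its mean and std are unavailable. That is why the normalization statistics are extracted from the training set only.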
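
To make the leakage point concrete, here is a minimal sketch using only the Python standard library. `SimpleScaler`, the toy `series`, and `train_end` are hypothetical names for illustration and are not taken from the repository; the scaler is assumed to do plain z-scoring with a population standard deviation and only mimics the `fit`/`transform` interface seen in the snippet above.

```python
from statistics import fmean, pstdev


class SimpleScaler:
    """Hypothetical stand-in for self.scaler: z-score with fit/transform."""

    def fit(self, values):
        # Store the statistics of whatever data is passed in.
        self.mean = fmean(values)
        self.std = pstdev(values)  # population std (assumption)
        return self

    def transform(self, values):
        # Apply the stored statistics; nothing is re-estimated here.
        return [(v - self.mean) / self.std for v in values]


# Toy series whose level shifts upward; the last 4 points act as the test split.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0, 12.0, 14.0, 16.0]
train_end = 6  # illustrative border, playing the role of border2s[0]
train = series[:train_end]

# Leakage-free: statistics come from the training slice only,
# then the same statistics are applied to the whole series.
train_scaler = SimpleScaler().fit(train)
scaled = train_scaler.transform(series)

# Leaky variant: the statistics now depend on future (test) values.
full_scaler = SimpleScaler().fit(series)

print("train-only mean/std:", train_scaler.mean, train_scaler.std)
print("full-data mean/std: ", full_scaler.mean, full_scaler.std)
```

With the train-only fit, changing the test values cannot change the mean and std the model was trained against, which matches what is available at deployment time. With the full-data fit, the statistics shift whenever the test values do, so information from the test split reaches the training inputs.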
