I'm having a decoding problem while converting data. #112

Open
ZZZZZZZZeng opened this issue Nov 28, 2022 · 1 comment
@ZZZZZZZZeng

When I convert the Yelp dataset on Windows, I get a decoding error during conversion:

Traceback (most recent call last):
  File "run.py", line 40, in <module>
    datasets.convert_inter()
  File "D:\学业\研究生\数据集\数据集转换程序\RecSysDatasets-master\conversion_tools\src\extended_dataset.py", line 4581, in convert_inter
    for _ in fin:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1909: illegal multibyte sequence

@Sherry-XLL Sherry-XLL self-assigned this Feb 7, 2023
@Sherry-XLL
Member

@ZZZZZZZZeng Hello, thanks for your attention to our repository.

The UnicodeDecodeError occurs because the file's encoding does not match the encoding Python uses to read it. When a file is opened without an explicit encoding, Python falls back to a platform-dependent default; on your Windows system that default is gbk, as the traceback shows. The data file is encoded in UTF-8, so decoding it as gbk fails. The solution is to add encoding='utf-8' to the open() call for the file being read at the line where the error is reported.
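As a rough illustration of the fix, here is a minimal, self-contained sketch. The helper `count_lines` and the temporary sample file are hypothetical and only stand in for the real reading loop; the actual code in `extended_dataset.py` may look different. It writes a small UTF-8 file, reads it with an explicit `encoding='utf-8'`, and then reads it again as gbk to mimic the Windows default from the traceback.

```python
import os
import tempfile


def count_lines(path, encoding="utf-8"):
    """Count lines in a text file, decoding it with an explicit encoding."""
    # Passing encoding= overrides the platform-dependent default.
    with open(path, "r", encoding=encoding) as fin:
        return sum(1 for _ in fin)


# Write a small UTF-8 sample file containing a non-ASCII character
# (hypothetical data, not taken from the Yelp dataset).
fd, sample_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w", encoding="utf-8") as fout:
    fout.write('{"text": "price: 5\u20ac "}\n')

# Reading with the file's real encoding works on any platform.
utf8_count = count_lines(sample_path)

# Reading the same bytes as gbk mimics the default from the traceback.
try:
    count_lines(sample_path, encoding="gbk")
    gbk_failed = False
except UnicodeDecodeError:
    gbk_failed = True

os.remove(sample_path)
```

In the conversion script, the equivalent change would be adding `encoding='utf-8'` to whichever `open()` call creates the `fin` object iterated at the line shown in the traceback.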

Please comment if you have further questions.
