Custom dataset keeps throwing errors, sad.. #3665

Closed
MissQueen opened this issue May 9, 2024 · 1 comment
Labels
solved This problem has been already solved.

Comments

@MissQueen

The dataset looks like this:

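[
  {
    "instruction": "Answer the following question: What is your view on the future of computer science?\n",
    "output": "I think the future of computer science could lead to two extreme directions of development, depending on how we manage ongoing technological progress. On one hand, computer science will further drive innovation and development across many fields, changing our way of life and our industries. On the other hand, advances in artificial intelligence, machine learning, and big data may bring negative effects, for example automation causing unemployment and social inequality.\nIn terms of technological progress, I believe the following are the future trends in computer science:\n1. Artificial intelligence and machine learning: As deep learning, neural networks, and related techniques advance, machine learning will become a mainstream area of computer science. AI will play an important role in healthcare, finance, manufacturing, transportation, and other fields, and automation will greatly help improve efficiency and output.\n2. Blockchain: As the technology underpinning Bitcoin, blockchain is bringing greater security and transparency to all industries. It is expected to play an important role in logistics, finance, government, and other fields.\n3. Quantum computing: The arrival of quantum computers will make computation much faster than on classical computers, which will change how we process data and information. This will provide more accurate and faster solutions for healthcare, materials science, cryptography, energy, and other fields.\nHowever, these developments may also bring some negative effects. For example, automation and machine learning may cause the loss of some jobs and industries, and small and medium-sized enterprises will struggle to cope with these changes, leading to social inequality. In addition, as AI and machine learning become so widespread, the pressure to protect personal privacy will also increase.\nTherefore, I think we need to take measures to mitigate these negative effects as much as possible, for example:\n1. Reskilling and education: To adapt to automation and technological change, we need to invest in education and training to ensure people have up-to-date skills and knowledge.\n2. Policy making: Governments should take measures to help communities and industries affected by technological change.\n3. Privacy and data management: Governments and companies must take measures to protect user data. Only with transparency about and protection of data can progress in computer science succeed.\nFinally, I think we should take a balanced approach to managing the development of computer science. We need to see clearly the benefits technology brings while also being guided by human needs, carefully balancing technological development with social responsibility."
  },
  {
    "instruction": "Write a function that converts the first letter of every word in a string to uppercase.\nhello world, how are you today?",
    "output": "First, we need to split the string into words, which can be done with Python's split() function. Then we can iterate over all the words and convert each one's first letter to uppercase using Python's capitalize() function. Finally, we can use the join() function to join all the words into a single string.\nBelow is a Python function that implements this:\n\ndef capitalize_words(string):\n words = string.split() # split the string into words\n capitalized_words = [word.capitalize() for word in words] # capitalize the first letter of each word\n return ' '.join(capitalized_words) # join all the words into one string\n\nUsing the function above, we can convert the given string \"hello world, how are you today?\" into \"Hello World, How Are You Today?\", as shown below:\n\nstring = \"hello world, how are you today?\"\ncapitalized_string = capitalize_words(string)\nprint(capitalized_string) # Output: Hello World, How Are You Today?\n\nI hope this helps!"
  }
]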
The entry I added to dataset_info.json looks like this:
"security_tongyong_1": {
  "file_name": "security_tongyong_cqia_ultrainteract_240419_r4_199999.json",
  "columns": {
    "prompt": "instruction",
    "response": "output"
  }
}
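A quick sanity check on this mapping, offered only as a sketch: it assumes the data file is a JSON array of objects and that the names under `columns` (`instruction`, `output`) are expected as keys in every record. The helper `missing_columns` and the two-row `sample` are hypothetical and not part of any library.

```python
import json


def missing_columns(records, columns):
    """Return (row_index, column_name) pairs for columns absent from a record."""
    problems = []
    for index, record in enumerate(records):
        for name in columns:
            if name not in record:
                problems.append((index, name))
    return problems


# Hypothetical two-row sample; the second row lacks "output".
sample = json.loads('[{"instruction": "a", "output": "b"}, {"instruction": "c"}]')
print(missing_columns(sample, ["instruction", "output"]))  # [(1, 'output')]
```

To run it against the real file, the records could be read with `json.load` and passed in place of `sample`.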
Loading it keeps failing with this error:
`
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 121, in _generate_tables
pa_table = paj.read_json(
File "pyarrow/_json.pyx", line 308, in pyarrow._json.read_json
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Column() changed from object to array in row 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/datasets/builder.py", line 1973, in _prepare_split_single
for _, table in generator:
File "/opt/conda/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py", line 161, in _generate_tables
pa_table = pa.Table.from_pydict(mapping)
File "pyarrow/table.pxi", line 1812, in pyarrow.lib._Tabular.from_pydict
File "pyarrow/table.pxi", line 5275, in pyarrow.lib._from_pydict
File "pyarrow/array.pxi", line 374, in pyarrow.lib.asarray
File "pyarrow/array.pxi", line 344, in pyarrow.lib.array
File "pyarrow/array.pxi", line 42, in pyarrow.lib._sequence_to_array
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object

`
I'm losing it.... what did I get wrong? T^T
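One possible way to narrow this down, as a sketch rather than a confirmed diagnosis: the second traceback (`ArrowTypeError: Expected bytes, got a 'dict' object`) is raised while an Arrow table is being built from the parsed records, which suggests that some record in the file holds a nested object where the other rows hold a string. The two records shown above look fine, so the offending row, if there is one, may be further into the file. The helpers `find_non_string_fields` and `check_file` below are hypothetical names, and the field names are taken from the `columns` mapping.

```python
import json


def find_non_string_fields(records, fields):
    """Return (row_index, field, type_name) for every value that is not a str."""
    problems = []
    for index, record in enumerate(records):
        for field in fields:
            value = record.get(field)  # a missing key shows up as NoneType
            if not isinstance(value, str):
                problems.append((index, field, type(value).__name__))
    return problems


def check_file(path, fields=("instruction", "output")):
    """Load a JSON array of objects from `path` and report non-string fields."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    return find_non_string_fields(records, fields)


# Hypothetical sample: the second record stores a dict under "output".
sample = [
    {"instruction": "q1", "output": "a1"},
    {"instruction": "q2", "output": {"text": "a2"}},
]
print(find_non_string_fields(sample, ["instruction", "output"]))  # [(1, 'output', 'dict')]
```

Calling `check_file` with the path to the dataset file would list the row indices to inspect, if any.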

@hiyouga
Owner

hiyouga commented May 9, 2024

@hiyouga hiyouga added the solved This problem has been already solved. label May 9, 2024
@hiyouga hiyouga closed this as completed May 11, 2024