feat: add a loader to qa_generation.py to support QA-pair generation from structured data #22

drunken-boat · 2023-05-06T10:00:42Z

DoctorGLM/DoctorGLM/qa_generation.py is really handy!!

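I modified qa_generation.py and used it to generate a JSON file of QA pairs from structured data in another domain. If anyone else runs into the same situation, the code below may help. You will need to change file_path and templ.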

(This doesn't really seem to count as a PR, so I'm posting it as an issue. If that's not appropriate, I'm happy to discuss!)

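import json

import pandas as pd
from langchain.document_loaders import DataFrameLoader

# Assumes `model` and `tokenizer` are already loaded, as in the original qa_generation.py.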

file_path = 'data.csv'

df = pd.read_csv(file_path)
print(df.head())  # preview the first rows to check the column names

# "from_name" is the content column in my CSV; the remaining columns become metadata.
loader = DataFrameLoader(df, page_content_column="from_name")

docs = loader.load()

idx = 0
qa_dict = {}

for d in docs:
	idx += 1
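	# Pass the whole Document (page content plus row metadata) into the prompt.
	# Use d.page_content instead to send only the content column.
	text = d
	# The prompt stays in Chinese because the parsing below looks for Chinese markers
	# in the reply. Replace "xx" with your own domain.
	templ = f"""你是一个聪明的助理。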

		给你一段xx相关的技术标准，你必须依据表格想出一个问题和一个对应的答案。

		你想出的问题可以被用来测试xx的专业能力。

		你想出的问题和答案必须和所给文本相关。

		当你想出问题和答案后，你必须用以下格式回复：

		```
		[
			"问题": "$你想出的问题放在这",
			"答案": "$你想出的答案放在这"
		]
		```

		所有在 ``` 中间的内容就是你要回答的格式。

		请想出一个问题与一个答案，用以上指定的列表回复，对于以下文本：
		----------------
		{text}"""

	response, history = model.chat(
		 tokenizer, templ, history=[], max_length=2048)

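	# Retry while the reply contains phrases that usually mark an unusable answer
	# ("which of the following", "language model", "text", "the following is").
	while_count = 0
	if_good = True
	while ('以下哪' in response) or ('语言模型' in response) or ('文本' in response) or ('以下是' in response):
		response, history = model.chat(
			tokenizer, templ, history=[], max_length=2048)
		while_count += 1
		# Give up on this record once the retry count exceeds 10.
		if while_count > 10:
			if_good = False
			break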
	print(response)

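	try:
		if if_good:
			# Expected reply shape: "问题：...答案：..."; [3:] drops the leading "问题：" label.
			question = response.split('答案：')[0][3:]
			answer = response.split('答案：')[1]
			qa_dict[idx] = {'问题': question, '答案': answer}
	except IndexError:
		# The reply has no "答案：" marker, so skip this record.
		pass
	# Rewrite the JSON file after every record so partial results survive an interruption.
	with open('qa_dict.json', 'w', encoding='utf-8') as f:
		json.dump(qa_dict, f, indent=4, ensure_ascii=False)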
	
print("QA pairs saved to qa_dict.json")
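
In case it helps: splitting on '答案：' only works when the model replies with exactly that full-width label, so replies in the bracketed format requested by the prompt get skipped. Below is a minimal sketch of a more tolerant parser. The helper name parse_qa and the regular expression are my own assumptions, not part of qa_generation.py, and it only assumes the reply contains a "问题" label followed later by a "答案" label.

```python
import re
from typing import Optional

# Hypothetical helper, not part of qa_generation.py.
# Assumption: the reply has a "问题" label and a later "答案" label, each optionally
# quoted and followed by either a full-width or an ASCII colon.
QA_PATTERN = re.compile(
    r'"?问题"?\s*[:：]\s*(?P<q>.+?)\s*"?答案"?\s*[:：]\s*(?P<a>.+)',
    re.DOTALL,
)


def parse_qa(response: str) -> Optional[dict]:
    """Return {'问题': ..., '答案': ...}, or None if the reply does not match."""
    match = QA_PATTERN.search(response)
    if match is None:
        return None
    # Trim quotes, commas, brackets and backticks left over from the list format.
    question = match.group("q").strip(' \n\t",，')
    answer = match.group("a").strip(' \n\t",]`')
    if not question or not answer:
        return None
    return {"问题": question, "答案": answer}


print(parse_qa("问题：什么是高血压？答案：血压持续升高的一种疾病。"))
print(parse_qa("抱歉，我无法回答。"))
```

Inside the loop this could stand in for the two split calls, storing the result in qa_dict[idx] only when it is not None.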
