This repository has been archived by the owner on Apr 3, 2024. It is now read-only.

Task Fine-Tuning - Datasets, Examples, etc #19

Open
6 tasks
Glavin001 opened this issue Feb 23, 2023 · 3 comments

Comments

@Glavin001

Glavin001 commented Feb 23, 2023

🎯 Goal: Help a developer go from idea to production-ready custom large-language model in record time!

Problem

In the LLM landscape, LangChain has support for:

There remains a gap in fine-tuning support: education, tooling, and usable examples (like the Prompts in Hub).

When to use fine-tuning?

I found @daveshap's YouTube video "OpenAI Q&A: Finetuning GPT-3 vs Semantic Search - which to use, when, and why?" incredibly informative, especially this comparison:

| Fine-tuning | Semantic Search/Embeddings |
| --- | --- |
| Slow, difficult, expensive | Fast, easy, cheap |
| Prone to confabulation | Recalls exact information |
| Teaches new task, not new information | Adding new information is a cinch |
| Requires constant retraining | Adding new vectors is easy |
| Not scalable | Infinitely scalable |
| Does not work for Question-Answering | Solves half of Question-Answering |
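
To make the "adding new vectors is easy" row concrete, here is a minimal sketch of semantic search. It is an illustration only: `embed` is a toy bag-of-words stand-in for a real embedding model, and `TinyIndex` is a hypothetical name, not a LangChain class. The point it shows is that new information is added by appending a vector, with no retraining step.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """Hypothetical in-memory vector store, for illustration only."""

    def __init__(self):
        self.entries = []  # list of (text, vector) pairs

    def add(self, text):
        # Adding knowledge is just storing one more vector.
        self.entries.append((text, embed(text)))

    def search(self, query):
        # Return the stored text whose vector is closest to the query.
        q = embed(query)
        return max(self.entries, key=lambda entry: cosine(q, entry[1]))[0]


index = TinyIndex()
index.add("The office is closed on public holidays.")
index.add("Refunds are processed within five business days.")
best = index.search("how long do refunds take")

# New information arrives later: no retraining, just another add().
index.add("Support is available by email on weekends.")
newest = index.search("is support available on weekends")
```

A real setup would swap `embed` for an actual embedding model and the list for a vector database; the add-then-search flow stays the same.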

🎯 There is still a purpose to fine-tuning: when you want to teach a new task/pattern.

For example, patterns that fine-tuning helps with:

  • ChatGPT: short user query => long machine answer
  • Email
  • Novel / Fiction

I think LangChain and the community have an opportunity to build tools that make dataset generation for fine-tuning easier, provide educational examples, and provide ready-made datasets for bootstrapping production-ready applications.
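
As a rough idea of what a dataset-generation helper could look like, here is a minimal sketch that turns (prompt, completion) pairs into JSONL, one training example per line. The `prompt`/`completion` field names follow a layout commonly used for GPT-3-style fine-tuning and are an assumption here; `to_jsonl` and the sample pairs are hypothetical and not part of LangChain.

```python
import json


def to_jsonl(pairs):
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    lines = []
    for prompt, completion in pairs:
        # Field names are an assumed convention; adjust to the target trainer.
        record = {"prompt": prompt.strip(), "completion": completion.strip()}
        lines.append(json.dumps(record, ensure_ascii=False) + "\n")
    return "".join(lines)


# Made-up examples of the "short query => long answer" pattern.
examples = [
    ("What does this error mean: NameError: name 'x' is not defined?",
     "Python could not find a variable called x. Define it before use "
     "or check the spelling of the name."),
    ("Write a polite email declining a meeting.",
     "Hi, thank you for the invitation. Unfortunately I am unavailable "
     "at that time. Could we find another slot next week?"),
]

jsonl_text = to_jsonl(examples)
```

Writing `jsonl_text` to a file gives a dataset that can be inspected line by line before any training run.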

Proposal

  • Recreate @daveshap's examples using LangChain and add the results to the Hub!
  • Question-Answering with related docs (from semantic search) and a personality (multiple options?)
  • Debugging issues from code errors
@Glavin001
Author

@daveshap: What do you think about this idea? I've been inspired by your YouTube videos recently while using LangChain. I think it would be an incredible win for the community to combine our efforts to build incredible products with LLMs!

@dhruv-anand-aintech

Isn't this a matter of integrating SetFit (https://huggingface.co/blog/setfit) into LangChain?

@rreddy-flowinc

rreddy-flowinc commented May 5, 2023

would love this incorporation^
