Task Fine-Tuning - Datasets, Examples, etc #19

Glavin001 · 2023-02-23T23:13:51Z

🎯 Goal: Help a developer go from idea to a production-ready custom large language model in record time!

Problem

In the LLM landscape, LangChain has support for:

- ✅ Prompt Engineering => langchain-hub/prompts
- ✅ Chains / Agents => langchain-hub/chains
- 🚧 Semantic Search / Embeddings => Customized Embedding Hub - Examples, Datasets, Pre-Trained Matrices #18
- ❌ Fine-Tuning

There remains a gap in fine-tuning support across education, tooling, and usable examples (like the Prompts in the Hub).

When to use fine-tuning?

I found @daveshap's YouTube video "OpenAI Q&A: Finetuning GPT-3 vs Semantic Search - which to use, when, and why?" incredibly informative, especially this comparison:

Fine-tuning	Semantic Search/Embeddings
Slow, difficult, expensive	Fast, easy, cheap
Prone to confabulation	Recalls exact information
Teaches new task, not new information	Adding new information is a cinch
Requires constant retraining	Adding new vectors is easy
Not scalable	Infinitely scalable
Does not work for Question-Answering	Solves half of Question-Answering
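To make the "adding new vectors is easy" row concrete, here is a minimal sketch of an in-memory semantic search index. It is not LangChain's API: `TinyIndex` and the 3-dimensional vectors are hypothetical stand-ins, and a real setup would get its vectors from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyIndex:
    """Hypothetical toy index: stores (text, vector) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, text, vector):
        # Adding new information is a single append; nothing is retrained.
        self.items.append((text, vector))

    def search(self, query_vector, k=1):
        # Rank every stored item by similarity to the query, best first.
        ranked = sorted(
            self.items,
            key=lambda item: cosine(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


index = TinyIndex()
# Hand-made toy vectors (assumption): real embeddings have many more dimensions.
index.add("Refund policy: 30 days", [1.0, 0.0, 0.0])
index.add("Shipping takes 5 days", [0.0, 1.0, 0.0])
index.add("Support hours: 9-5", [0.0, 0.0, 1.0])  # new fact, added later

top = index.search([0.1, 0.0, 0.9], k=1)
print(top)
```

The stored text is returned verbatim, which is the "recalls exact information" property in the table; a fine-tuned model would have to be retrained to absorb the same fact.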

🎯 There is still a purpose to fine-tuning: when you want to teach a new task/pattern.

For example, patterns which fine-tuning helps with:

- ChatGPT: short user query => long machine answer
- Email
- Novel / Fiction

I think LangChain and the community have an opportunity to build tools that make dataset generation for fine-tuning easier, provide educational examples, and provide ready-made datasets for bootstrapping production-ready applications.
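As a rough illustration of what such a dataset-generation tool might produce, the sketch below turns (prompt, completion) pairs into JSONL, the one-record-per-line layout commonly used for prompt/completion fine-tuning. The helper name `to_jsonl`, the separator, the stop marker, and the sample pairs are all assumptions made for this example, not an existing LangChain feature.

```python
import json


def to_jsonl(pairs, separator="\n\n###\n\n", stop=" END"):
    """Serialize (prompt, completion) pairs as JSONL text.

    separator and stop are illustrative defaults (assumptions): a fixed
    marker ending every prompt and a fixed marker ending every completion.
    """
    lines = []
    for prompt, completion in pairs:
        record = {
            "prompt": prompt + separator,
            "completion": " " + completion + stop,
        }
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines) + "\n"


# Made-up sample pairs showing the "short query => long answer" pattern.
pairs = [
    ("Write a short email declining a meeting.",
     "Hi Sam, thanks for the invite. I can't make it this week."),
    ("Write a short email confirming a delivery date.",
     "Hi Alex, your order is confirmed for Friday."),
]

jsonl_text = to_jsonl(pairs)
print(jsonl_text)
```

Keeping the task pattern in the data (same separator, same stop marker, consistent tone) is what the fine-tune is meant to learn; the facts themselves are better served by semantic search.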

Proposal

- Recreate examples @daveshap made using LangChain and add results to the Hub!
  - Creative Writing Coach (Part 1, Part 2)
  - Email Generator
  - Write a coherent novel (Part 4)
- Question-Answering with related docs (from semantic search) and a personality (multiple options?)
- Debugging issues from code errors
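For the Question-Answering item, one possible shape for a single training example is sketched below: the related docs (assumed to come from a semantic search step) and the chosen personality go into the prompt, and the desired answer becomes the completion. `build_qa_example`, the prompt layout, and the sample data are hypothetical choices for illustration only.

```python
def build_qa_example(question, related_docs, personality, answer):
    """Assemble one prompt/completion record for a QA-with-personality task."""
    # Number the retrieved docs so an answer can refer back to them.
    context = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(related_docs, start=1)
    )
    prompt = (
        f"Personality: {personality}\n\n"
        f"Related documents:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return {"prompt": prompt, "completion": " " + answer}


# Made-up sample data; in practice related_docs would be search results.
example = build_qa_example(
    question="How long do refunds take?",
    related_docs=["Refund policy: 30 days", "Support hours: 9-5"],
    personality="friendly and concise",
    answer="Refunds are accepted within 30 days. Happy to help further!",
)
print(example["prompt"])
```

Here the fine-tune would teach the answering style, while the facts still arrive through the retrieved documents.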

Glavin001 · 2023-02-23T23:16:27Z

@daveshap: What do you think about this idea? I've been inspired by your YouTube videos recently while using LangChain. I think it would be an incredible win for the community to combine our efforts to build incredible products with LLMs!

dhruv-anand-aintech · 2023-03-17T07:02:26Z

Isn't this a matter of integrating SetFit (https://huggingface.co/blog/setfit) into LangChain?

rreddy-flowinc · 2023-05-05T18:37:31Z

would love this incorporation^
