
This repository contains a project that aims to fine-tune a Large Language Model (LLM) for generating advertisements in the Amharic language. By utilizing data from 25 publicly available channels, the pre-training phase of the model is extended, and the model is fine-tuned for ad generation based on brand information and product briefs.


abdimussa87/LLM_Finetuning_For_Amharic_Ad_Generation

 
 


LLM Finetuning For Amharic Ad Generation

About

This project aims to fine-tune an LLM so that it can understand the Amharic language and create an advertisement in Amharic given brand information and a product brief. It utilizes messages exported from 25 publicly available channels to extend the model's pre-training phase, and later fine-tunes the model to generate ads.

Usage

At this point, you'll need the raw channel-message data in a directory named data/raw. Then follow these steps to clean the data and prepare it for the model:

  • pip install -r requirements.txt
  • inside parse_and_save.ipynb, run the process_raw_data function to keep only the necessary fields from the raw data: id, text, and date
  • inside cleaning.ipynb, run the clean_parsed_data function to get the cleaned data, with emojis, symbols, newlines, and extra spaces removed
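The cleaning stage described above can be sketched as follows. The function names (clean_text, clean_parsed_messages) and the exact emoji/symbol regex are illustrative assumptions, not the notebook's actual code:

```python
import re

# Hypothetical sketch of clean_parsed_data's core logic: strip emojis and
# symbols, drop newlines, collapse extra spaces, and keep only id/text/date.
EMOJI_SYMBOL_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"   # pictographs and emoji
    "\U00002600-\U000027BF"   # misc symbols and dingbats
    "\U0001F1E6-\U0001F1FF"   # regional-indicator (flag) characters
    "]+",
    flags=re.UNICODE,
)

def clean_text(text: str) -> str:
    text = EMOJI_SYMBOL_PATTERN.sub(" ", text)  # remove emojis/symbols
    text = text.replace("\n", " ")              # drop newlines
    text = re.sub(r"\s+", " ", text)            # collapse extra spaces
    return text.strip()

def clean_parsed_messages(messages):
    """Keep only id, text, date; drop messages that are empty after cleaning."""
    cleaned = []
    for m in messages:
        text = clean_text(m.get("text") or "")
        if text:
            cleaned.append({"id": m["id"], "text": text, "date": m["date"]})
    return cleaned
```

A message whose text is only emojis is dropped entirely, since it carries no usable ad text for the model.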

To test inference with the model, follow these steps:

  1. Accept the Llama 2 license on Hugging Face and download the model.
  2. Download the Amharic fine-tune from Hugging Face.
  3. Clone this GitHub repository.
  4. Then, inside inference/run_inf.py:
  • change MAIN_PATH to the path of the folder you downloaded in step 1
  • change peft_model to the path you cloned in step 2
  • go to your Llama 2 folder (from step 1) and replace the tokenizer-related files with the ones from step 2
  • set quantization=True inside the main function, before the load_model function call
  5. Finally, run the inference/run_inf.py file.
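For orientation, the wiring the steps above describe typically follows the standard transformers + peft loading pattern sketched below. The paths, variable names, and 4-bit quantization config are assumptions for illustration — check inference/run_inf.py for the actual names and loading code. Treat this as a configuration sketch; it is not runnable without the downloaded weights.

```python
# Sketch of the loading pattern run_inf.py follows; the paths and the
# quantization flag are the pieces the steps above ask you to edit.
# Requires: pip install transformers peft accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

MAIN_PATH = "path/to/llama-2-folder"       # folder downloaded in step 1
PEFT_MODEL = "path/to/amharic-finetune"    # path from step 2 (adapter + tokenizer files)

quantization = True  # as in step 4: set before load_model is called
bnb_config = BitsAndBytesConfig(load_in_4bit=True) if quantization else None

tokenizer = AutoTokenizer.from_pretrained(MAIN_PATH)  # tokenizer files replaced per step 4
base = AutoModelForCausalLM.from_pretrained(
    MAIN_PATH,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, PEFT_MODEL)  # attach the Amharic adapter
```

The tokenizer files are taken from the fine-tune rather than the base model because an Amharic fine-tune typically extends the Llama 2 vocabulary, and the base tokenizer would not match the adapter's embeddings.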

References
