Best LLM model for captioning a big dataset #59

BenDes21 · 2024-02-25T10:53:00Z

Hi there,

I'm going to caption my big dataset (50K HD images) for my realistic SDXL checkpoint.

I would like to know what you think is the best LLM for captioning this kind of image (basically a human doing an action or pose in a specific location), and also whether I should use this caption format: "girl, bedroom, bikini, sit down, bed, black hair, plant in the background, etc." instead of "the image shows a girl in a bedroom sitting on a bed with a plant in the background, etc." for better results.

Thanks!

jhc13 · 2024-02-28T16:52:58Z

Maintainer

> what's the best LLM for captioning this kind of image

Either CogVLM or CogAgent seems to have the strongest general performance.

> and also whether I should use this caption format: "girl, bedroom, bikini, sit down, bed, black hair, plant in the background, etc." instead of "the image shows a girl in a bedroom sitting on a bed with a plant in the background, etc." for better results

I think both formats are fine. The captioning model might not always follow the format you specify, though.
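
Since the model might drift from the requested format, a quick check over the generated captions could help catch the ones that need a second pass. The following is only a rough sketch: `is_tag_style` and `normalize_tags` are hypothetical helper names, and the "at most four words per tag" rule is an assumed heuristic, not something any captioning tool enforces.

```python
def is_tag_style(caption: str) -> bool:
    """Heuristic: tag-style captions are short comma-separated phrases, not a sentence."""
    parts = [p.strip() for p in caption.split(",") if p.strip()]
    if len(parts) < 2:
        # No commas at all, so this is most likely a single sentence.
        return False
    # Assumed rule of thumb: each tag has at most four words and the caption
    # does not end with a period.
    short_parts = all(len(p.split()) <= 4 for p in parts)
    return short_parts and not caption.strip().endswith(".")


def normalize_tags(caption: str) -> str:
    """Lowercase, trim, and de-duplicate tags while keeping their original order."""
    seen = set()
    tags = []
    for part in caption.split(","):
        tag = " ".join(part.lower().split())
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return ", ".join(tags)
```

Captions that fail `is_tag_style` when tags were requested could be regenerated or fixed by hand, while the rest could be passed through `normalize_tags` before training.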

4 replies

BenDes21 Feb 28, 2024
Author

Thanks! Unfortunately, my setup is not powerful enough for the Cog models. Do you think LLaVA 1.6 would be OK for realistic datasets of girl models? Thanks

jhc13 Feb 28, 2024
Maintainer

I haven't tested LLaVA 1.6 much yet, but I heard it's good. It will probably work fine for your relatively simple use case. I am still waiting for it to be supported in Transformers so I can add it to TagGUI.
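
For a dataset of this size, the captioning loop itself could be sketched roughly as below, whichever model ends up being used. This is an illustration under assumptions: `caption_image` is a hypothetical stand-in for the actual model call, and storing each caption in a `.txt` file with the same name as the image is an assumed layout that should be checked against what the training script expects.

```python
from pathlib import Path
from typing import Callable

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}


def caption_dataset(image_dir: Path, caption_image: Callable[[Path], str]) -> int:
    """Write a same-name .txt caption next to each image; return how many were written."""
    written = 0
    for image_path in sorted(image_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            continue
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists():
            # Already captioned on an earlier run, so skip it.
            continue
        # caption_image is a placeholder for the real model call (assumption).
        caption = caption_image(image_path).strip()
        caption_path.write_text(caption + "\n", encoding="utf-8")
        written += 1
    return written
```

Skipping images that already have a caption file means an interrupted run over 50K images can be restarted without redoing finished work.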

BenDes21 Feb 29, 2024
Author

Thanks! Do you know when it will be available in TagGUI? :)

jhc13 Feb 29, 2024
Maintainer

I'm waiting for it to be added to Transformers (pull request).
