Either CogVLM or CogAgent seems to have the strongest general performance.
I think both formats are fine. The captioning model might not always follow the format you specify, though.
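Since the model may drift from the requested format, one option is to check each caption after generation and flag the ones that need a retry or manual fix. Here is a minimal sketch, assuming the tag-style format from the question. The function names, the list of sentence openers, and the word-count threshold are all hypothetical choices for illustration, not part of any captioning library.

```python
import re

# Illustrative assumption: phrases that suggest a full-sentence caption.
SENTENCE_OPENERS = ("the image", "this image", "the photo", "a photo of", "an image of")


def is_tag_caption(caption: str, max_words_per_tag: int = 4) -> bool:
    """Return True if the caption looks like a comma-separated tag list."""
    text = caption.strip().lower()
    if not text or text.startswith(SENTENCE_OPENERS):
        return False
    tags = [t.strip() for t in text.rstrip(".").split(",") if t.strip()]
    if len(tags) < 2:
        return False
    # Long comma-free fragments usually mean the model wrote a sentence.
    return all(len(t.split()) <= max_words_per_tag for t in tags)


def normalize_tags(caption: str) -> str:
    """Lowercase, collapse whitespace, and drop duplicate tags, keeping order."""
    seen = []
    for tag in caption.lower().rstrip(". ").split(","):
        tag = re.sub(r"\s+", " ", tag).strip()
        if tag and tag not in seen:
            seen.append(tag)
    return ", ".join(seen)
```

Captions that fail `is_tag_caption` could be sent back to the model with a stricter prompt, and the ones that pass can go through `normalize_tags` before training. The heuristic is deliberately crude, so spot-check a sample of your own data before relying on it.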
Hi there,
I'm going to caption my large dataset (50K HD images) for my realistic SDXL checkpoint.
I would like to know which model you think is best for captioning this kind of image (basically a human performing an action or pose in a specific location). I would also like to know whether I should use this caption format: "girl, bedroom, bikini, sit down, bed, black hair, plant in the background, etc." instead of "the image shows a girl in a bedroom sitting down on a bed with a plant in the background, etc." for better results.
Thanks!