About region caption #48

Open
mu-jin-meng opened this issue May 8, 2024 · 1 comment

Comments

@mu-jin-meng

The generated results only describe the region's content; they don't answer the specified prompt.
[Screenshot]
Result:
[Screenshot]

@kanguyen-vn

The model wasn't really trained to perform region-level reasoning; it was only trained to do region-level captioning. If you look at the region-level dataset classes, they only use the REGION_QUESTIONS and REGION_GROUP_QUESTIONS prompt templates as questions for LLM training, and those are all captioning questions. If you want region-level reasoning capabilities, GLaMM might not be the best solution for you. If you don't need segmentation masks in the output, I'd try something like Shikra, for example.
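To illustrate why a reasoning-style prompt gets a caption back, here is a minimal sketch of how a region-level dataset might turn region captions into training (question, answer) pairs. It assumes questions are sampled from captioning templates and the answer is always the caption text. The template strings, the `<region_i>` tokens, and the `build_region_sample` helper are hypothetical placeholders, not GLaMM's actual code or prompts.

```python
import random

# Hypothetical stand-ins for captioning-style prompt templates; these are
# NOT the actual REGION_QUESTIONS / REGION_GROUP_QUESTIONS strings from GLaMM.
REGION_QUESTIONS = [
    "Describe the region <region> in detail.",
    "What is shown in <region>?",
]
REGION_GROUP_QUESTIONS = [
    "Describe each of these regions: <regions>.",
]


def build_region_sample(region_captions, rng):
    """Hypothetical helper: turn region captions into one (question, answer) pair.

    One region uses a REGION_QUESTIONS template; several regions use a
    REGION_GROUP_QUESTIONS template. In both cases the target answer is just
    the caption text, so every training question is a captioning request.
    """
    if len(region_captions) == 1:
        question = rng.choice(REGION_QUESTIONS).replace("<region>", "<region_0>")
        answer = region_captions[0]
    else:
        tokens = ", ".join(f"<region_{i}>" for i in range(len(region_captions)))
        question = rng.choice(REGION_GROUP_QUESTIONS).replace("<regions>", tokens)
        answer = " ".join(region_captions)
    return question, answer


rng = random.Random(0)
q, a = build_region_sample(["a dog lying on a couch"], rng)
print(q)
print(a)
```

Under this assumption, a question such as "what is the person in this region about to do?" has no counterpart in the training data, which would be consistent with the model falling back to describing the region.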
