Detection score of the segment #42

Open
zhouzhou-zheng opened this issue May 22, 2024 · 2 comments

@zhouzhou-zheng

When testing, I input a 150s video into the model.

Test Scenario 1: The input video is of a woman dancing, and the query text is "a woman is dancing." The model correctly detects the corresponding segment, which meets expectations.

Test Scenario 2: The input video does not contain any clips of a woman dancing; it is just a video of a woman sitting on a chair. The query text is "a woman is dancing," yet the model still detects a corresponding segment, which does not meet expectations.

Test Scenario 3: The input is a combination of videos from Scenario 1 and Scenario 2. The query text is "a man is playing basketball." There are no men or basketball scenes in the video, but among the top 10 results, there are still segments with high scores.

My question is, for a test video and a query text, is there always a highly scored positive segment detected? What is the reason for this phenomenon? Is it because during your training, each video always has at least one segment that corresponds to the query text as a positive example?

@wjun0830
Owner

Great point! The problem you describe isn't addressed in this work.
You may want to try our new model: https://github.com/wjun0830/CGDETR
It may partially address this problem.

@wjun0830
Owner

We also agree with your explanation: the problem arises because, in training, every input video includes at least one segment that corresponds to the text query.
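
As a possible stopgap on the user side, one could filter predictions by score before treating any segment as a match. The sketch below is only an illustration under assumptions: the `(start_sec, end_sec, score)` tuple format, the function names, and the threshold value are all hypothetical and are not taken from this repository. Because the model never saw query-absent videos in training, its scores may not separate the two cases at all (Scenario 3 reports high scores for an absent query), so any threshold would need to be checked on your own data.

```python
from typing import List, Tuple

# Hypothetical prediction format: (start_sec, end_sec, score).
# This is an assumption for illustration, not the repository's output format.
Window = Tuple[float, float, float]


def filter_windows(windows: List[Window], min_score: float) -> List[Window]:
    """Keep windows whose score is at least min_score, highest score first."""
    kept = [w for w in windows if w[2] >= min_score]
    return sorted(kept, key=lambda w: w[2], reverse=True)


def query_is_present(windows: List[Window], min_score: float) -> bool:
    """Treat the query as present only if some window passes the threshold."""
    return len(filter_windows(windows, min_score)) > 0


if __name__ == "__main__":
    # Made-up example values, not real model output.
    predictions = [(10.0, 25.0, 0.91), (40.0, 55.0, 0.32), (70.0, 80.0, 0.75)]
    print(filter_windows(predictions, min_score=0.5))
    print(query_is_present(predictions, min_score=0.95))
```

If no threshold separates present from absent queries on a held-out set, this filter will not help, which is consistent with the explanation above.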
