Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Would the model work on videos where there are multiple actions and these are to be temporally detected? #10

Open
burhanusman opened this issue Dec 18, 2018 · 1 comment

Comments

@burhanusman
Copy link

Hi,
Nice project.
I was wondering would the model perform well on videos where an agent performs multiple actions throughout the video. Especially when these actions might have an inherent order (like receiving the bill and paying cash).

I am also confused with output. Does the model output one label for a set of 32 frames?
Thanks a lot!

@seiffertacfr
Copy link

Hi,
Sorry I missed your comment. Yes the output is currently a single label for 32 frames, although the input doesn't necessarily have to be 32 frames long.
To approach your problem, you could have the model constantly running, starting a new sequence every n frames (ie overlapping windows of frames) and outputting a sequence of the most likely actions it is observing. However, in order to include any high level knowledge, such as inherent action order, you would need to use this model as part of a higher order system, such as a Hidden Markov Model or Markov Chain, where the system learns or is told prior how various states, or 'actions' are related

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants