PyTorch code for the CVPR'2020 paper "Screencast Tutorial Video Understanding"

KunpengLi1994/PsTuts

Screencast Tutorial Video Understanding

PyTorch code for the paper "Screencast Tutorial Video Understanding" [pdf], accepted at CVPR 2020. [Project page]

Introduction

Screencast tutorials are videos created by people to teach how to use software applications or demonstrate procedures for accomplishing tasks. Compared to other tutorial media such as text, they are very popular among both novice and experienced users for learning new skills because of their visual guidance and ease of understanding. In this paper, we propose visual understanding of screencast tutorials as a new research problem to the computer vision community. We collect a new dataset of Adobe Photoshop video tutorials and annotate it with both low-level and high-level semantic labels. We introduce a bottom-up pipeline to understand Photoshop video tutorials. We leverage state-of-the-art object detection algorithms with domain-specific visual cues to detect important events in a video tutorial and segment it into clips according to the detected events. We propose a visual cue reasoning algorithm for two high-level tasks: video retrieval and video captioning. We conduct extensive evaluations of the proposed pipeline. Experimental results show that it is effective in terms of understanding video tutorials. We believe our work will serve as a starting point for future research on this important application domain of video understanding.

[Figure: pipeline]
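The event-based segmentation step described above can be illustrated with a minimal sketch. This is not the code from this repository: the function name `segment_by_events`, the `min_len` threshold, and the rule of merging a short tail into the previous clip are all assumptions made for illustration. It only shows the general idea of cutting a tutorial into clips at detected event timestamps.

```python
from typing import List, Tuple


def segment_by_events(
    event_times: List[float],
    duration: float,
    min_len: float = 1.0,  # hypothetical minimum clip length in seconds
) -> List[Tuple[float, float]]:
    """Split the interval [0, duration] into clips at event timestamps.

    Events closer than `min_len` to the previous cut are ignored, and a
    tail shorter than `min_len` is merged into the last clip.
    """
    # Keep only events strictly inside the video, sorted and de-duplicated.
    cuts = sorted({t for t in event_times if 0.0 < t < duration})

    clips: List[Tuple[float, float]] = []
    start = 0.0
    for t in cuts:
        if t - start >= min_len:
            clips.append((start, t))
            start = t

    if duration - start >= min_len or not clips:
        # The remaining tail is long enough (or it is the only clip).
        clips.append((start, duration))
    else:
        # Merge a too-short tail into the previous clip.
        clips[-1] = (clips[-1][0], duration)
    return clips
```

For example, `segment_by_events([5.0, 5.2, 12.0], 20.0)` treats the events at 5.0 and 5.2 seconds as one cut and returns three clips covering the whole video.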

The general structure of our visual cue reasoning (VCR) method for text-to-video retrieval and tutorial video captioning is shown below. The tutorial encoding is generated by considering correlations between different visual cues as well as video frames.

[Figure: model]

Text-Tutorial Clip Retrieval

Code, extracted features, a pretrained model, and documentation for the text-to-tutorial clip retrieval task are in the matching_code/ folder.
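As a rough illustration of what text-to-clip retrieval means at inference time, the sketch below ranks clips by cosine similarity between a query embedding and clip embeddings. It is a generic example, not the VCR model or the scoring code in matching_code/: the helper names `cosine` and `rank_clips`, the clip ids, and the toy vectors are all hypothetical, and it assumes that text and clips have already been embedded in a shared space.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def rank_clips(
    query_vec: Sequence[float],
    clip_vecs: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the top_k (clip_id, score) pairs, highest score first."""
    scored = [(clip_id, cosine(query_vec, vec)) for clip_id, vec in clip_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings; real features would come from the trained encoders.
    clips = {"clip_a": [1.0, 0.0], "clip_b": [0.0, 1.0], "clip_c": [1.0, 1.0]}
    for clip_id, score in rank_clips([1.0, 0.0], clips, top_k=2):
        print(clip_id, round(score, 3))
```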

Tutorial Clip Captioning

Code, extracted features, a pretrained model, and documentation for the tutorial clip captioning task are in the captioning_code/ folder.

Source Data

The source videos and annotations can be downloaded from https://drive.google.com/drive/folders/1osWW6dnsnvlWNseOtivIdhdpVct1r38x?usp=sharing, where "video_clips.zip" contains the video clips produced by temporal segmentation and "whole_video.zip" contains the original complete tutorials.
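After downloading the archives manually, they can be unpacked with the Python standard library. The sketch below is a convenience example only: the helper name `extract_archive` and the `data` output directory are assumptions, and the internal layout of the archives is not documented here, so the function simply lists whatever files it finds.

```python
import zipfile
from pathlib import Path
from typing import List, Union


def extract_archive(archive_path: Union[str, Path], out_dir: Union[str, Path]) -> List[str]:
    """Extract a zip archive into out_dir/<archive stem> and list its files."""
    archive = Path(archive_path)
    target = Path(out_dir) / archive.stem  # e.g. data/video_clips
    target.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)

    # Return relative paths of the extracted files, sorted for stable output.
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )


# Example usage, assuming both archives are in the current directory:
# extract_archive("video_clips.zip", "data")
# extract_archive("whole_video.zip", "data")
```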

Reference

If you find this code useful, please cite the following paper:

@inproceedings{li2020pstuts,
  title={Screencast Tutorial Video Understanding},
  author={Li, Kunpeng and Fang, Chen and Wang, Zhaowen and Kim, Seokhwan and Jin, Hailin and Fu, Yun},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

License

Apache License 2.0
