
How do I run inference on my own dataset? #30

Open
DestoryVIP opened this issue Nov 24, 2021 · 3 comments

Comments

@DestoryVIP

First of all, thank you for your work. I am trying to use the MSVD pretrained model weights to run video captioning inference on a dataset I prepared myself.

I first extracted high-dimensional features from my own dataset with resnet152 and saved them as .npy files.
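For reference, a minimal sketch of that extraction step using torchvision's pretrained ResNet-152 (the frame paths and output location below are hypothetical, and frame sampling from the video is assumed to have been done already):

    import numpy as np
    import torch
    import torchvision.models as models
    import torchvision.transforms as transforms
    from PIL import Image

    model = models.resnet152(pretrained=True)
    model.fc = torch.nn.Identity()  # drop the classifier; keep the 2048-d pooled features
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    # hypothetical sampled frames for one video
    frame_paths = ["frames/1297/000.jpg", "frames/1297/001.jpg"]
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in frame_paths])
    with torch.no_grad():
        feats = model(batch)              # shape: (num_frames, 2048)
    np.save("features/1297.npy", feats.numpy())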
Then, following the format of caption_test.json in the MSVD dataset, I filled in the ids and file_names from my own dataset (since I only need to run inference and do not need evaluation, the annotations are empty).
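A hypothetical sketch of such a file, with empty annotations since only inference is needed; the field names follow the description above, and the exact schema should be checked against the original MSVD caption_test.json:

    import json

    data = {
        "images": [
            {"id": 1297, "file_name": "video1297"},  # hypothetical entry
        ],
        "annotations": [],  # empty: inference only, no evaluation
    }
    with open("caption_test.json", "w") as f:
        json.dump(data, f)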

Then I filled in the correct paths in config/video_caption/msvd/base_caption.yaml, but now it raises an error saying there is no file 1297.npy.
So my question is: why does it still not read the data from my caption_test.json after I modified it?

@DestoryVIP (Author)

I would like to ask: what do your pkl files contain? What do video_id, tokens_ids, and target_ids mean?

@HanielF commented Jan 31, 2022

but now it raises an error saying there is no file 1297.npy

You can check line 80 in xmodaler/datasets/videos/msvd.py, and replace it with feat_path = os.path.join(self.feats_folder, 'video' + video_id + '.npy').

Part of the code is as follows:

    # ...
        def __call__(self, dataset_dict):
            dataset_dict = copy.deepcopy(dataset_dict)
            video_id = dataset_dict['video_id']

            # line 80: builds the feature path from the bare video id,
            # so it looks for e.g. '1297.npy' rather than 'video1297.npy'
            feat_path = os.path.join(self.feats_folder, video_id + '.npy')
            content = read_np(feat_path)
    # ...

@winnechan (Collaborator)

I would like to ask: what do your pkl files contain? What do video_id, tokens_ids, and target_ids mean?

If you want to run inference with your own data, you should preprocess your *.json files with the provided script https://github.com/YehLi/xmodaler/blob/master/tools/msvd_preprocess.py to generate the corresponding *.pkl files, since the class MSVDDataset only reads *.pkl files:

datalist = pickle.load(open(self.anno_file, 'rb'), encoding='bytes')
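To sanity-check the generated file, you can load it the same way and print one entry (the file name below is hypothetical; use whichever *.pkl msvd_preprocess.py produced):

    import pickle

    datalist = pickle.load(open("msvd_caption_anno_test.pkl", "rb"), encoding="bytes")
    print(len(datalist))
    print(datalist[0])  # expect fields like 'video_id', 'tokens_ids', 'target_ids'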

video_id is a pre-defined unique id for each video (you can customize this for your own data). tokens_ids is the vocabulary-index sequence of the first n-1 words of a sentence (assuming the sentence has n words in total), and target_ids is the vocabulary-index sequence of the last n-1 words, used as the ground-truth predictions for training. This way the model is trained to predict the i-th word given the previous i-1 words.
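As a toy illustration of that one-position shift (the words and ids here are made up; the real ids come from the vocabulary built by msvd_preprocess.py):

    vocab = {"<BOS>": 0, "a": 1, "man": 2, "is": 3, "cooking": 4, "<EOS>": 5}
    sentence = ["<BOS>", "a", "man", "is", "cooking", "<EOS>"]  # n = 6 tokens
    ids = [vocab[w] for w in sentence]  # [0, 1, 2, 3, 4, 5]

    tokens_ids = ids[:-1]  # first n-1 tokens (model input):    [0, 1, 2, 3, 4]
    target_ids = ids[1:]   # last n-1 tokens (training target): [1, 2, 3, 4, 5]
    # At step i the model reads tokens_ids[:i+1] and is trained to predict target_ids[i].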
