
How to generate pose.pkl for other datasets? #2

AndrewChiyz opened this issue Feb 25, 2021 · 5 comments
AndrewChiyz commented Feb 25, 2021

Hi, thank you for releasing the code.

I have a few questions about the 3D pose data preparation. According to the paper, the 3D poses of images in the DeepFashion and iPER datasets are predicted using "MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation", and I wonder how to generate the same data structure you used for DeepFashion and iPER. For example, when I load poses.pkl of the iPER dataset and read it with the following script,

import pickle

# Load the pose annotations and inspect each nesting level of the dictionary.
with open('poses.pkl', 'rb') as fb:
    pose_data = pickle.load(fb)

print(pose_data.keys())
print(pose_data['001'].keys())
print(pose_data['001']['24'].keys())
print(pose_data['001']['24']['1'].keys())
print(len(pose_data['001']['24']['1']['0001']))

Output:
dict_keys(['001', '002', '003', '004', '005', '006', '007', '008', '009', '010', '011', '012', '013', '014', '015', '016', '017', '018', '019', '020', '021', '022', '023', '024', '025', '026', '027', '028', '029', '030'])
dict_keys(['10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '1', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '2', '30', '31', '32', '33', '3', '4', '5', '6', '7', '8', '9'])
dict_keys(['1', '2'])
dict_keys(['0001', '0002', '0003', ..., '0526', '0527'])  (527 keys in total)
19

So, the iPER dataset contains videos of 30 people, with two videos collected for each person. The first video contains 527 frames, and for each frame 19 3D keypoints are generated. What is the meaning of the second output? Also, how can I generate the poses.pkl data file for the DeepFashion dataset or another custom dataset?

Sorry for the simple questions.

Thank you!

AndrewChiyz changed the title from "Training" to "How to generate pose.pkl for other datasets?" on Feb 25, 2021
MKnoche (Owner) commented Feb 25, 2021

The second output is the clothing: some people in iPER wear several different clothing styles.

The keys in poses.pkl follow the same structure as the dataset. For iPER, a frame is saved as <data_dir>/<person>/<clothing>/<video>/<frame>.png, and the corresponding pose is poses['<person>']['<clothing>']['<video>']['<frame>'].
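
For example, with the pose_data variable from your snippet above, the pose for the frame stored at <data_dir>/001/24/1/0001.png is:

person, clothing, video, frame = '001', '24', '1', '0001'
pose = pose_data[person][clothing][video][frame]  # the 19 keypoints of this frame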

AndrewChiyz (Author) commented

Thanks for your prompt reply! :)

As mentioned in the paper, there are 103 clothing styles in the iPER dataset, so why does poses.pkl contain only 33 clothing styles? Does this mean the training set is split by clothing style rather than by person?

I also wonder which MeTRAbs pre-trained model you applied. I tried to generate 3D pose estimates using the metrabs_multiperson_smpl_combined model (provided by https://github.com/isarandi/metrabs). It outputs very large values in the third dimension of the 3D keypoints, around 1800~2000, but in poses.pkl the third-dimension values only range from about -20 to 30. Is there some setting that explains that difference?

Thank you!

MKnoche (Owner) commented Feb 25, 2021

Person 001 has 33 clothing styles; the other persons have different numbers of clothing styles. For training, reposing is only performed between pairs that show the same person and the same clothing style.

MeTRAbs returns the result in the camera coordinate system in meters; this must be transformed into pixel space before training or applying the model. You get the x and y coordinates by projecting the 3D pose onto the image using the camera's intrinsic parameters. The z coordinate is scaled with the same factor as x and y and then shifted such that the mean z coordinate of a pose is 0.
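
A minimal numpy sketch of this transformation, assuming a pinhole intrinsic matrix K (the function name and the weak-perspective scale, focal length over mean depth, are illustrative, not code from this repo):

import numpy as np

def camera_to_pixel_pose(pose_cam, K):
    # pose_cam: (J, 3) joints in camera coordinates (meters), as MeTRAbs returns them.
    # K: 3x3 camera intrinsic matrix.
    proj = pose_cam @ K.T                    # perspective projection
    xy = proj[:, :2] / proj[:, 2:3]          # x, y in pixels
    # Scale z with (approximately) the same factor that maps x and y to
    # pixels: the focal length divided by the mean depth of the pose.
    z = pose_cam[:, 2] * K[0, 0] / pose_cam[:, 2].mean()
    z -= z.mean()                            # shift so the mean z coordinate is 0
    return np.concatenate([xy, z[:, None]], axis=1)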

isarandi commented Feb 25, 2021

Hi Andrew, thanks for your interest in our work! I can give you a few more details about how the poses were generated, as I did this part of the work. There were actually multiple pose estimation models involved.

But first of all, in principle, you can use any sufficiently high-quality 3D human pose estimation method to create pseudo-ground truth if no poses are available for your dataset.

The exact model I used to generate these particular poses has not been released, because the development of MeTRAbs ran in parallel to this work. However, for completeness, the steps were as follows:

  1. A MeTRAbs variant was trained with a ResNet-v2-152 backbone on Human3.6M, MPI-INF-3DHP, CMU-Panoptic and 3DPW for 3D supervision and COCO+MPII for 2D supervision. The "variant" refers to the fact that this model had a minor modification compared to the final MeTRAbs paper (the last ResNet block was not shared between the 2D and 3D prediction branches, but this turned out to be unnecessary).
  2. The model from Step 1 was then applied to the iPER dataset with 10-crop test-time augmentation to generate "pseudo-ground truth" poses (a sketch of the crop-averaging idea follows this list).
  3. To generate poses.pkl, the absolute 3D poses obtained in Step 2 were converted into a 2.5D-like representation, with pixel-based XY coordinates and the Z coordinate corresponding to depth (actually a logarithmic representation, but this is less important). The Z coordinate was centered to be zero-mean. These are the poses that guide the warping and go into the pose encoder.
  4. Steps 1-3 were also applied to the Fashion dataset; however, here the model training was augmented with upper- and lower-body image cropping, since the Fashion dataset contains images where not the whole body is visible (as opposed to iPER, where all images are full-body).
  5. A MeTRo (root-relative) model with a ResNet-v2-50 backbone was trained on the iPER pseudo-ground truth from Step 2, plus COCO and MPII for 2D supervision. This model is used to evaluate the generated images, measuring how similar the pose estimated from a generated image is to the pose predicted from the ground-truth image. This is the model whose checkpoint is linked in the README.
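
To make Step 2 concrete, here is a rough sketch of the 10-crop averaging idea; predict_pose stands in for the trained model, and the sketch glosses over details such as compensating for each crop's shift of the principal point:

import numpy as np

def ten_crop_average(image, predict_pose, crop_hw, flip_pairs):
    # image: (H, W, 3) array; predict_pose: callable returning a (J, 3) pose.
    # crop_hw: (crop_height, crop_width); flip_pairs: left/right joint index pairs.
    h, w = image.shape[:2]
    ch, cw = crop_hw
    offsets = [(0, 0), (0, w - cw), (h - ch, 0), (h - ch, w - cw),
               ((h - ch) // 2, (w - cw) // 2)]   # four corner crops + center crop
    preds = []
    for top, left in offsets:
        crop = image[top:top + ch, left:left + cw]
        for flipped in (False, True):            # each crop plus its mirror image
            pose = predict_pose(crop[:, ::-1] if flipped else crop)
            if flipped:
                pose = pose.copy()
                pose[:, 0] *= -1                 # undo the mirroring in x
                for l, r in flip_pairs:
                    pose[[l, r]] = pose[[r, l]]  # swap left/right joints back
            preds.append(pose)
    return np.mean(preds, axis=0)                # average the 10 estimates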

In principle, I could share these models as well, but they require somewhat different code from that in the MeTRAbs repo (as I've been converting a lot of it to TensorFlow 2). Furthermore, the released 3DPW Challenge winning version of MeTRAbs is actually stronger than the preliminary model used in this work. One important issue, though, is that the currently released model in the MeTRAbs repo only predicts the SMPL joints (as those were needed for 3DPW), while this pose warping work benefits a lot from the COCO/CMU-Panoptic facial keypoints too (eyes, ears, nose) for correctly handling head orientation. But I will release a COCO-joints based MeTRAbs model in the near future too.

AndrewChiyz (Author) commented

Thanks a lot for the detailed reply! @isarandi @MKnoche :)

So the MeTRAbs model for 3D pose estimation is trained with a minor modification, and the 3D pseudo-ground-truth poses are estimated by ensembling results with 10-crop test-time augmentation. Then a MeTRo model is trained on the pseudo-ground-truth annotations on iPER, plus COCO and MPII for 2D supervision.

Could you please provide more details on the evaluation protocol? For example, how is the dataset split into train and test sets (for iPER and DeepFashion, respectively)? And how are the quantitative SSIM and LPIPS results in Table 2 obtained? Are they calculated by averaging the values (say, SSIM) over all generated images or frames in the test set?

Thank you!
