Some questions about a model trained on the 768-size TED dataset #54

Zenobia7 · 2022-08-01T07:06:19Z

First of all, thank you very much for providing the code. I have run into a few small problems while retraining and would like to ask how to deal with them. My questions are as follows:
1. Why do reconstruction mode and train mode give almost the same L1 loss value?
2. Using the 768-size TED dataset, is it normal that parts with finer detail, such as hands and faces, are not recovered well? If so, could you suggest some solutions?
3. When the motion is large, the optical flow map is not very accurate.
4. Are there any precautions to take when preparing a new dataset?
Those are all my questions for now. Looking forward to your reply.

AliaksandrSiarohin · 2022-08-01T21:12:42Z

Hi, sorry, but your questions are really confusing:

1. I don't get the question. There is no L1 in train mode.
2. What is 768?
3. Could you provide an example?
4. It depends on what objects will be in the new dataset.

Zenobia7 · 2022-08-04T08:24:55Z

I computed the L1 loss on the reconstruction results of train mode, and the reconstruction results of avd mode give almost the same value, so I think avd mode is not effective.
I cropped the TED dataset to 768*768.
The new dataset consists of half-speaker videos. Some videos from the new dataset are below. The new dataset is highly heterogeneous and diverse.
https://user-images.githubusercontent.com/28126038/182800076-b9e4dea5-d927-41cd-ab7d-038e2cfccbf3.mp4
https://user-images.githubusercontent.com/28126038/182800140-632904d1-27e7-4a4a-9ec2-142fc59e01b5.mp4
https://user-images.githubusercontent.com/28126038/182800340-c7f54217-72a0-4a01-99d4-6cd7c4ec64e9.mp4

3. Train mode visualization results:

0gks6ceq4eQ.004737.004870.mp4.mp4

avd mode visualization results:

0gks6ceq4eQ.004737.004870.mp4.mp4

Train log visualization:

Could you provide your training log? I would like to compare it with mine. Thank you. Is anything still unclear?
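
For readers who want to reproduce the comparison described in this comment, here is a minimal sketch of how an L1 score between a reconstructed video and its ground truth could be computed. It is not taken from the repository: the function names `mean_l1` and `per_frame_l1` are hypothetical, and the sketch assumes both videos are already loaded as NumPy arrays of shape (frames, height, width, channels) with values in [0, 1].

```python
import numpy as np


def mean_l1(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error over all frames, pixels, and channels.

    Hypothetical helper: assumes both arrays have shape (T, H, W, C).
    """
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    diff = np.abs(pred.astype(np.float64) - target.astype(np.float64))
    return float(diff.mean())


def per_frame_l1(pred: np.ndarray, target: np.ndarray) -> np.ndarray:
    """One L1 value per frame, to spot frames where two outputs diverge."""
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")
    diff = np.abs(pred.astype(np.float64) - target.astype(np.float64))
    # Flatten everything except the frame axis, then average per frame.
    return diff.reshape(diff.shape[0], -1).mean(axis=1)


if __name__ == "__main__":
    # Synthetic stand-ins for a ground-truth clip and two reconstructions.
    rng = np.random.default_rng(0)
    ground_truth = rng.random((4, 8, 8, 3))
    recon_a = np.clip(ground_truth + 0.05, 0.0, 1.0)
    recon_b = np.clip(ground_truth - 0.05, 0.0, 1.0)
    print("recon_a L1:", mean_l1(recon_a, ground_truth))
    print("recon_b L1:", mean_l1(recon_b, ground_truth))
```

Two outputs can have nearly identical mean L1 and still differ visibly in a few frames, so the per-frame values are often more informative than the single average.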

AliaksandrSiarohin · 2022-08-04T08:35:35Z

Reconstruction does not make sense for avd, since it is specifically designed for the cross-identity case, where the shapes of the objects can differ.
There is no explicit handling of parts that are not visible most of the time; I guess you will have to devise some way of handling that.
I can't see what bothers you in the optical flow map.
Unfortunately, I don't have the logs anymore.
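
As one possible direction for the hands-and-faces point above (an assumption on the editor's part, not something the repository provides or the maintainer suggested), a reconstruction error could be weighted more heavily inside a mask covering the detailed regions. The sketch below only shows the weighting arithmetic in NumPy. The function name `weighted_l1`, the `detail_weight` parameter, and the mask itself are hypothetical, and the mask would have to come from some external hand or face detector.

```python
import numpy as np


def weighted_l1(pred: np.ndarray, target: np.ndarray,
                mask: np.ndarray, detail_weight: float = 5.0) -> float:
    """Region-weighted mean absolute error for a single frame.

    Hypothetical helper, not part of the repository.
    pred, target: float arrays of shape (H, W, C).
    mask: array of shape (H, W) in [0, 1], 1 inside detailed regions
          (for example hands or faces), 0 elsewhere.
    """
    if pred.shape != target.shape or mask.shape != pred.shape[:2]:
        raise ValueError("pred/target must match and mask must be (H, W)")
    # Weight is 1 outside the mask and detail_weight inside it.
    weights = 1.0 + (detail_weight - 1.0) * mask.astype(np.float64)
    # Per-pixel error, averaged over channels.
    err = np.abs(pred.astype(np.float64) - target.astype(np.float64)).mean(axis=-1)
    # Normalize by the total weight so the result stays on the L1 scale.
    return float((weights * err).sum() / weights.sum())


if __name__ == "__main__":
    pred = np.zeros((4, 4, 3))
    target = np.zeros((4, 4, 3))
    target[:2, :2, :] = 1.0          # all of the error sits in a 2x2 corner
    mask = np.zeros((4, 4))
    mask[:2, :2] = 1.0               # the same corner is marked as "detail"
    print("plain L1:   ", weighted_l1(pred, target, np.zeros((4, 4))))
    print("weighted L1:", weighted_l1(pred, target, mask))
```

Whether such a weighting actually helps on a given dataset would need to be checked experimentally; the sketch only illustrates how errors in small regions can be kept from being averaged away.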

Zenobia7 · 2022-08-05T02:50:06Z

Thank you for your prompt reply.

If there is no problem with the optical flow map, why are the reconstruction details still unclear? Is it because the generator is not strong enough, or because the information in the optical flow map is not fully utilized?
Do you think it is OK to use half-speaker videos with complex backgrounds and inconsistent heights in my self-built dataset? At the moment the loss decreases rapidly at first and then stops decreasing.