
Some doubts after testing Champ #72

Open
ZZfive opened this issue Apr 16, 2024 · 8 comments

ZZfive commented Apr 16, 2024

Without data preprocessing, I used a random picture as ref_image and the provided motion_6 for inference. The result is shown below. The consistency of the character's movements is very good, but the character's face is badly damaged. This is probably because, without preprocessing, the human body in ref_image and the figure in the motion guidance are not aligned.

grid_wguidance.mp4

Because the paper mentions that Champ was evaluated on the UBC Fashion dataset, I selected the following video from that dataset as the guidance motion in order to test the data preprocessing pipeline.

91D23ZVV6NS.mp4

Following the data preprocessing doc, after completing the environment setup, I was able to extract the required depth, normal, semantic_map, and dwpose features from the motion guidance video. However, I ran into one problem: the semantic_map output was missing two frames for some reason. Have you encountered this during data preprocessing? Since the 14-second motion guidance video has 422 frames in total, the difference between adjacent frames is small, so for the two missing semantic_map frames I simply copied the previous frame as a substitute.
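A minimal sketch of the frame-filling workaround described above, assuming frames are numbered files like 0000.png (the naming pattern and directory layout are assumptions, not necessarily the preprocessing script's actual output):

```python
import shutil
from pathlib import Path

def fill_missing_frames(frame_dir, total_frames, pattern="{:04d}.png"):
    """Fill gaps in a numbered frame sequence (e.g. semantic_map frames
    dropped during preprocessing) by copying the nearest previous frame."""
    frame_dir = Path(frame_dir)
    for i in range(total_frames):
        target = frame_dir / pattern.format(i)
        if target.exists():
            continue
        # walk backwards to the closest existing frame and duplicate it
        for j in range(i - 1, -1, -1):
            source = frame_dir / pattern.format(j)
            if source.exists():
                shutil.copy(source, target)
                break
```

Duplicating a neighbouring frame is only acceptable because adjacent frames differ very little; for larger gaps, interpolating between the surrounding frames would be safer.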

In the figure below, the left side is the first frame of the guidance motion video (960×1254), and the right side is the reference image (451×677). The middle is the depth map of the first frame after data preprocessing; you can see that it has been resized to 451×677 and the body parts are much better aligned.
[image: guidance motion frame (left), aligned depth map (middle), reference image (right)]

However, running inference with the data preprocessed from the above reference image and guidance motion video gives very poor results, as shown below. There is a lot of jitter in the video, and the characters' faces and bodies are severely distorted.

animation.mp4

Can somebody tell me the reason for the poor performance, or offer some suggestions for improvement? Thanks.

@zhou-linpeng


Can you show your grid_wguidance.mp4 results? I think flickering in your condition maps is causing results like this.


ZZfive commented Apr 16, 2024

> Can you show your grid_wguidance.mp4 results? I think flickering in your condition maps is causing results like this.

I just discovered that the reference image I used was an RGBA image, which caused a size-mismatch error when saving grid_wguidance.mp4; that is why I could not provide it above. After converting the reference image to RGB, I got the grid_wguidance.mp4 shown below. As you guessed, it has serious flickering. I followed the doc for the data preprocessing steps; what problems could cause the flickering in the condition maps?

grid_wguidance.mp4
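For what it's worth, the RGBA pitfall can be guarded against with a small check before inference. This is an illustrative sketch (the `ensure_rgb` helper and file names are hypothetical, not part of Champ):

```python
from PIL import Image

def ensure_rgb(path_in, path_out):
    """Convert a (possibly RGBA or palette) reference image to 3-channel RGB.
    An RGBA image carries a 4th alpha channel, which can trigger shape
    mismatches when frames are stacked into the output video grid."""
    img = Image.open(path_in)
    if img.mode != "RGB":
        img = img.convert("RGB")
    img.save(path_out)
    return img.mode
```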

zhanghongyong123456 commented Apr 16, 2024


My results also flicker badly. Here is my grid_wguidance.mp4:

grid_wguidance.mp4


faiimea commented Apr 16, 2024

I followed every step of the data_process pipeline, and both the background flicker and the facial distortion appeared in my generated video. I also ran video generation using the transferd_result produced by data_process together with the reference image provided in the source code, and the same problems occurred. I suspect the alignment of the video with the image may be causing the problem.

I would like to know if there is any way to solve the facial distortion and the background flicker. Also, what images are stored under the 'champ/transferd_result/visualized_imgs' path? What I currently observe is a superposition of the normal image and the reference image, but I don't know what that means. Please let me know if I did something wrong that caused the visualized_imgs error.

@zhou-linpeng


grid_wguidance_anyone.mp4

Here is my result. You can apply some deflicker methods to your condition maps.
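One simple deflicker approach, sketched here as an illustration (this is not Champ's code, and plain averaging only suits continuous maps such as depth or normal, not discrete semantic labels), is a centered temporal moving average over the condition-map frames:

```python
import numpy as np

def temporal_smooth(frames, window=3):
    """Reduce frame-to-frame flicker in a condition-map sequence by
    replacing each frame with a centered moving average over `window`
    neighbouring frames (window should be odd)."""
    frames = np.asarray(frames, dtype=np.float32)
    half = window // 2
    smoothed = np.empty_like(frames)
    n = len(frames)
    for i in range(n):
        # clamp the averaging window at the sequence boundaries
        lo, hi = max(0, i - half), min(n, i + half + 1)
        smoothed[i] = frames[lo:hi].mean(axis=0)
    return smoothed.astype(np.uint8)
```

Stronger alternatives include Gaussian temporal filtering or dedicated video-deflicker tools; for semantic maps, a per-pixel temporal median over the label values is a safer choice than averaging.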

ZZfive commented Apr 16, 2024

> Here is my result. You can apply some deflicker methods to your condition maps.

What deflicker methods can I try? Can you tell me?


faiimea commented Apr 16, 2024

The first video uses ref-07.png and motion-02.
The second video uses ref-07.png and my processed video.

322714326-39fcd7dc-9795-462a-9409-9a5f35141ca1.mp4
grid_wguidance.mp4

And the face distortion looks like this:
[screenshot: example of the face distortion]

@subazinga (Contributor)

We will release an SMPL smoothing feature soon, maybe this week, to address the flicker problem.
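Until that feature lands, the general idea of smoothing per-frame SMPL parameters can be sketched roughly as follows (an illustrative Gaussian temporal filter, not the maintainers' implementation; note that axis-angle rotations are not a vector space, so a real implementation would smooth rotations in quaternion space):

```python
import numpy as np

def smooth_smpl_params(params, sigma=2.0):
    """Apply a 1-D Gaussian filter along the time axis of a per-frame
    SMPL parameter array (frames x dims), damping high-frequency jitter."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    # pad with edge values so the ends are not pulled toward zero
    padded = np.pad(params, ((radius, radius), (0, 0)), mode="edge")
    out = np.empty_like(params, dtype=np.float64)
    for d in range(params.shape[1]):
        out[:, d] = np.convolve(padded[:, d], kernel, mode="valid")
    return out
```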
