SMPL motion question #93

moo1o · 2024-04-24T06:34:03Z

Hello. Thanks to the SMPL & Rendering document, I was able to obtain good motion information. I have two questions:

1. What is the reason for, or the meaning of, fitting SMPL to the video and the reference image?
2. The motion sequence I rendered in Blender from the video's SMPL differs from the motion in the actual video. What is the reason? Also, how can we make it match the actual video motion as closely as possible?

Leoooo333 · 2024-04-26T03:45:12Z

Hi @moo1o. Thanks for the good questions! Both are central to the story of Champ.

Question 1:

SMPL is used here to enhance the 3D consistency of the condition maps. There are many great works that directly predict normals, depth, and so on from a single RGB image, but they largely ignore temporal and 3D consistency. For example, when we fit SMPL to a video, adjacent frames keep a consistent human body geometry: you get a whole body with a head and two hands🕺, instead of the flickering condition maps that those prediction networks produce without any 3D constraint on the human.
SMPL is much more than a mesh. It is widely used in both industry and academia, so using SMPL extends Champ's downstream applications. For example, there are many Text(audio)-to-SMPL and text-to-texture-map works that could easily be integrated into Champ at inference time. If you are familiar with Blender, you can do lots of creative things with Champ. You can try this CEB Blender add-on.
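To make the flicker argument concrete, here is a minimal sketch of one way to compare the temporal stability of two sets of condition maps for the same clip. It assumes the maps are stored as a NumPy array of shape (T, H, W, C) with values in a common range; `temporal_flicker` is a hypothetical helper for illustration, not part of Champ.

```python
import numpy as np


def temporal_flicker(maps: np.ndarray) -> float:
    """Mean absolute change between consecutive frames.

    maps: array of shape (T, H, W, C), one condition map (e.g. normal or
    depth) per frame. Hypothetical helper, not a Champ API.
    """
    if maps.shape[0] < 2:
        return 0.0  # a single frame cannot flicker
    # Frame-to-frame differences along the time axis.
    diffs = np.abs(np.diff(maps.astype(np.float64), axis=0))
    return float(diffs.mean())


# Synthetic illustration: identical geometry every frame vs. per-frame noise.
rng = np.random.default_rng(0)
base = rng.random((1, 64, 64, 3))
consistent = np.repeat(base, 8, axis=0)  # same map for all 8 frames
flickering = consistent + 0.05 * rng.standard_normal(consistent.shape)
print(temporal_flicker(consistent))  # 0.0
print(temporal_flicker(flickering) > temporal_flicker(consistent))  # True
```

On real footage the body moves, so the score is never zero; it is only meaningful as a relative comparison, for example per-frame predicted normals vs. normals rendered from the fitted SMPL sequence on the same video.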

Question 2:

Yeah, they are not perfectly aligned. You may look at Human Mesh Recovery works to follow the state of the art. That said, 4D-Humans, which we use in Champ, is accurate and robust enough, except for hands and facial expressions. If you want more realistic motion, you could try directly predicting depth and normal maps, while still using the semantic map from SMPL.
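One way to check how far a fitted SMPL sequence drifts from the video is the 2D reprojection error: project the SMPL joints into the image and compare them with 2D keypoints detected on the same frame. The sketch below assumes a simple pinhole camera with the principal point at the image center, SMPL joints already expressed in camera coordinates (z > 0), and 2D keypoints from some off-the-shelf detector in the same joint order. `project_points` and `reprojection_error` are hypothetical helpers, not part of Champ or 4D-Humans.

```python
import numpy as np


def project_points(joints_cam: np.ndarray, focal: float,
                   img_w: int, img_h: int) -> np.ndarray:
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixels.

    Assumes the principal point is the image center and all z > 0.
    """
    x, y, z = joints_cam[:, 0], joints_cam[:, 1], joints_cam[:, 2]
    u = focal * x / z + img_w / 2.0
    v = focal * y / z + img_h / 2.0
    return np.stack([u, v], axis=-1)


def reprojection_error(joints_cam: np.ndarray, keypoints_2d: np.ndarray,
                       conf: np.ndarray, focal: float,
                       img_w: int, img_h: int) -> float:
    """Confidence-weighted mean pixel distance between projected SMPL
    joints and detected 2D keypoints (same joint order, shape (N, 2))."""
    proj = project_points(joints_cam, focal, img_w, img_h)
    dist = np.linalg.norm(proj - keypoints_2d, axis=-1)
    return float((dist * conf).sum() / max(conf.sum(), 1e-8))


# Toy example: the first joint is off by (3, 4) px, the second matches.
joints = np.array([[0.0, 0.0, 5.0], [0.5, 0.0, 5.0]])
keypoints = np.array([[259.0, 260.0], [356.0, 256.0]])
print(reprojection_error(joints, keypoints, np.array([1.0, 1.0]),
                         focal=1000.0, img_w=512, img_h=512))  # 2.5
```

Tracking this error per frame and per joint is a quick way to see whether the misalignment is concentrated in particular frames or in the hands mentioned above.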

AricGamma assigned subazinga Apr 26, 2024
