About LF-VILA code in PatchEmbed3D of video encoder #36

musicman217 · 2024-03-23T02:42:55Z

The padding in PatchEmbed3D doesn't look right to me, or maybe I'm misreading it:

        # padding
        _, _, D, H, W = x.size()
        if H % self.patch_size[0] != 0:
            x = F.pad(x, (0, 0, 0, self.patch_size[1] - H % self.patch_size[1]))
        if W % self.patch_size[1] != 0:
            x = F.pad(x, (0, 0, 0, 0, 0, self.patch_size[0] - D % self.patch_size[0]))

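Since patch_size=[1, 8, 8], where 8x8 is HxW in the implementation, shouldn't the padding be applied along the H and W dimensions?
The conditions H % self.patch_size[0] != 0 and W % self.patch_size[1] != 0 confuse me: they test H and W against indices 0 and 1, but the first pad uses patch_size[1] for H, and the second pad is computed from D and patch_size[0] and is applied to the D dimension.
Thanks a lot!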
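
For comparison, here is a minimal sketch of what the padding might look like if each dimension were checked against its own patch_size entry. It assumes x has shape (B, C, D, H, W) and patch_size is ordered (D, H, W); the helper name pad_to_patch_multiple is hypothetical and is not taken from the LF-VILA repository. The one thing it relies on is that F.pad reads its pad tuple starting from the last dimension, so the order is (W_left, W_right, H_top, H_bottom, D_front, D_back).

```python
import torch
import torch.nn.functional as F


def pad_to_patch_multiple(x, patch_size):
    """Zero-pad a (B, C, D, H, W) tensor so D, H and W are multiples of patch_size.

    Hypothetical helper for illustration; patch_size is assumed to be (D, H, W).
    """
    pd, ph, pw = patch_size
    _, _, D, H, W = x.size()

    # Amount to append at the end of each axis (0 when already divisible).
    pad_d = (pd - D % pd) % pd
    pad_h = (ph - H % ph) % ph
    pad_w = (pw - W % pw) % pw

    if pad_d or pad_h or pad_w:
        # F.pad consumes pairs from the last dimension backwards: W, then H, then D.
        x = F.pad(x, (0, pad_w, 0, pad_h, 0, pad_d))
    return x


if __name__ == "__main__":
    x = torch.zeros(2, 3, 4, 30, 45)
    print(pad_to_patch_multiple(x, (1, 8, 8)).shape)  # torch.Size([2, 3, 4, 32, 48])
```

With patch_size=[1, 8, 8] the D axis never needs padding, so only H and W are extended, which matches the expectation in the question above.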
