EPSilon proposes an efficient point sampling strategy for avatar generation from monocular video, achieving results comparable to state-of-the-art models while significantly reducing inference latency.
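To give a flavor of the idea, the sketch below shows a generic empty-space-skipping point sampler along camera rays, a common way to cut rendering cost. The occupancy grid, thresholds, bounds, and function names are our own illustrative assumptions; this is a minimal sketch, not EPSilon's actual sampling strategy.

```python
# Illustrative sketch only: a generic empty-space-skipping point sampler.
# The occupancy grid, thresholds, and names below are assumptions for
# illustration and do not reproduce EPSilon's actual method.
import numpy as np

def sample_points_with_occupancy(ray_o, ray_d, occupancy, bbox_min, bbox_max,
                                 n_coarse=64, keep_thresh=0.5):
    """Sample n_coarse candidates along a ray, then keep only those whose
    occupancy-grid cell is likely occupied (skipping empty space)."""
    near, far = 0.1, 4.0                                    # assumed ray bounds
    depths = near + np.linspace(0.0, 1.0, n_coarse) * (far - near)
    pts = ray_o[None, :] + depths[:, None] * ray_d[None, :]  # (n, 3)

    # Map each point to its grid cell inside the bounding box.
    res = np.array(occupancy.shape)
    rel = (pts - bbox_min) / (bbox_max - bbox_min)           # in [0, 1]
    idx = np.clip((rel * res).astype(int), 0, res - 1)
    occ = occupancy[idx[:, 0], idx[:, 1], idx[:, 2]]

    keep = occ >= keep_thresh                                # drop empty samples
    return pts[keep], depths[keep]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = (rng.random((32, 32, 32)) > 0.7).astype(float)    # toy occupancy grid
    pts, d = sample_points_with_occupancy(
        np.zeros(3), np.array([0.0, 0.0, 1.0]), grid,
        bbox_min=np.array([-1.0, -1.0, 0.0]),
        bbox_max=np.array([1.0, 1.0, 4.0]))
    print(f"kept {len(pts)} of 64 candidate samples")
```

Only the surviving samples would be passed to the (expensive) neural field evaluation, which is where the latency savings come from.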
Along with the input image, we visualize the reconstructed image in both RGB and mesh representations. Moreover, we show the depth image of the mesh.
In the figure above, we visualize novel view generation and novel pose generation for four subjects from the People-Snapshot dataset. While achieving rendering roughly 20 times faster than the baseline, our model robustly generates novel content for the given subjects.