Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity

https://export.arxiv.org/abs/2009.02119

motion, GAN, graphics, KAIST, ETRI, 2020