Nimos
Well-Known Member
Some might remember the video I posted in another thread. Now someone has taken it a step further and added voice as well, all based on a single image.
We proposed EMO, an expressive audio-driven portrait-video generation framework. Input a single reference image and the vocal audio, e.g. talking and singing, our method can generate vocal avatar videos with expressive facial expressions, and various head poses, meanwhile, we can generate videos with any duration depending on the length of input video.
It's difficult not to be impressed by it, and to wonder what this will be like in 5 years given how fast things are moving.