The term "text-to-motion generation" refers to the task of generating sequences of human motion from textual input. This task poses considerable challenges, stemming chiefly from the wide range of possible motions, the sensitivity of human perception to such motion, and the difficulty of articulating and characterising motion effectively. Existing generative methods for text-to-motion synthesis suffer from either substandard quality or constrained expressiveness. To overcome this, we introduce a diffusion model-based framework for text-driven motion generation.
The model offers multiple benefits. Firstly, it employs probabilistic mapping, generating motions by refining noise-corrupted inputs through a denoising process that naturally introduces variation. Secondly, it provides multi-level manipulation capabilities, allowing it to interpret detailed instructions regarding individual body parts and to synthesise motions of varying lengths in response to text prompts that change over time.
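To make the first point concrete, the following is a minimal sketch of a DDPM-style reverse (denoising) sampling loop of the kind such diffusion frameworks build on. All names here (the `model` interface, the number of steps, the linear noise schedule, the pose dimensions) are illustrative assumptions, not the thesis's actual implementation:

```python
import torch

n_steps = 1000
betas = torch.linspace(1e-4, 0.02, n_steps)   # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)     # cumulative products of alphas

@torch.no_grad()
def sample_motion(model, text_emb, seq_len=60, dim=72, device="cpu"):
    """Generate one motion clip (seq_len frames x dim pose features)."""
    x = torch.randn(1, seq_len, dim, device=device)   # start from pure noise x_T
    for t in reversed(range(n_steps)):
        t_batch = torch.full((1,), t, device=device, dtype=torch.long)
        eps_hat = model(x, t_batch, text_emb)         # predicted noise, text-conditioned
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_hat) / torch.sqrt(alphas[t])
        if t > 0:
            # fresh noise at every step is what makes the text-to-motion
            # mapping probabilistic: one prompt can yield many distinct motions
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x
```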
The model's performance is evaluated using metrics such as FID (Fréchet Inception Distance), Diversity, and MultiModality, which measure the quality and diversity of the generated motion samples.
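As a worked illustration of the FID metric mentioned above: it is the Fréchet distance between Gaussian fits of the real and generated feature distributions. In motion benchmarks the features typically come from a pretrained motion encoder rather than an image Inception network; the sketch below assumes `real_feats` and `gen_feats` are (N, D) NumPy arrays of such features:

```python
import numpy as np
from scipy import linalg

def fid(real_feats: np.ndarray, gen_feats: np.ndarray) -> float:
    """Frechet distance ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2})."""
    mu1, mu2 = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma1 = np.cov(real_feats, rowvar=False)
    sigma2 = np.cov(gen_feats, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)  # matrix square root
    covmean = covmean.real        # discard tiny imaginary parts from numerics
    return float(np.sum((mu1 - mu2) ** 2)
                 + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

A lower FID indicates that the generated motion features lie closer in distribution to the real ones.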