We present MS2Mesh-XR, a novel multi-modal sketch-to-mesh generation pipeline that enables users to create realistic 3D objects in extended reality (XR) environments from hand-drawn sketches assisted by voice inputs. Specifically, users sketch objects intuitively with natural hand movements in mid-air within a virtual environment. Voice inputs are interpreted into text prompts, and ControlNet uses these prompts together with the drawn sketches to infer realistic candidate images. Users then review and select their preferred image, which is subsequently reconstructed into a detailed 3D mesh by the Convolutional Reconstruction Model (CRM). Notably, the proposed pipeline generates a high-quality 3D mesh in under 20 seconds, enabling immersive visualization and manipulation in runtime XR scenes. By leveraging natural user inputs and cutting-edge generative AI, our approach can significantly facilitate XR-based creative production and enhance user experiences.
MS2Mesh-XR integrates hand-drawn sketches with voice inputs to rapidly generate realistic 3D meshes for natural user interactions in XR environments.
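For concreteness, the sketch below outlines the three pipeline stages described above: voice is transcribed into a text prompt, the mid-air sketch and prompt condition ControlNet image generation, and the selected image is reconstructed into a mesh. This is a minimal illustration under assumptions rather than the authors' released code: the Whisper and diffusers ControlNet calls are real APIs, while `crm_reconstruct` is a hypothetical placeholder for the CRM inference step, and the checkpoint names and file paths are assumed for the example.

```python
"""Minimal sketch of an MS2Mesh-XR-style pipeline (illustrative, not the authors' code)."""
import torch
import whisper                                    # openai-whisper: speech-to-text
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline


def transcribe_voice(audio_path: str) -> str:
    """Interpret the recorded voice command as a text prompt."""
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"].strip()


def sketch_to_image(sketch: Image.Image, prompt: str) -> Image.Image:
    """Infer a realistic image from the sketch, conditioned on the text prompt."""
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")
    return pipe(prompt, image=sketch, num_inference_steps=30).images[0]


def crm_reconstruct(image: Image.Image):
    """Hypothetical placeholder for the Convolutional Reconstruction Model (CRM):
    substitute the actual single-image-to-mesh inference call here."""
    raise NotImplementedError("plug in the CRM image-to-mesh model here")


if __name__ == "__main__":
    prompt = transcribe_voice("voice_command.wav")          # e.g. "a wooden chair"
    sketch = Image.open("midair_sketch.png").convert("RGB")  # rendered mid-air strokes
    image = sketch_to_image(sketch, prompt)                  # user reviews/selects this image
    mesh = crm_reconstruct(image)                            # load the resulting mesh into the XR scene
```

In the actual system, the sketch would be rasterized from the user's mid-air strokes and the resulting mesh streamed back into the runtime XR scene for manipulation.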
@misc{tong2024ms2meshxrmultimodalsketchtomeshgeneration,
title={MS2Mesh-XR: Multi-modal Sketch-to-Mesh Generation in XR Environments},
author={Yuqi Tong and Yue Qiu and Ruiyang Li and Shi Qiu and Pheng-Ann Heng},
year={2024},
eprint={2412.09008},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.09008},
}