We present MS2Mesh-XR, a novel multi-modal sketch-to-mesh generation pipeline that enables users to create realistic 3D objects in extended reality (XR) environments from hand-drawn sketches assisted by voice inputs. Specifically, users sketch objects intuitively with natural hand movements in mid-air within a virtual environment. Voice inputs are interpreted into text prompts, and ControlNet uses these prompts together with the drawn sketches to infer realistic candidate images. Users then review and select their preferred image, which is subsequently reconstructed into a detailed 3D mesh by the Convolutional Reconstruction Model (CRM). Notably, the proposed pipeline generates a high-quality 3D mesh in under 20 seconds, enabling immersive visualization and manipulation in runtime XR scenes. By leveraging natural user inputs and cutting-edge generative AI, our approach can significantly facilitate XR-based creative production and enhance user experiences.
MS2Mesh-XR integrates hand-drawn sketches with voice inputs to rapidly generate realistic 3D meshes for natural user interactions in XR environments.
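For concreteness, the sketch below outlines the three pipeline stages described above: voice is transcribed into a text prompt, the mid-air sketch and prompt condition ControlNet image generation, and the selected image is reconstructed into a mesh. This is a minimal illustration under assumptions rather than the authors' released code: the Whisper and diffusers ControlNet calls are real APIs, while `crm_reconstruct` is a hypothetical placeholder for the CRM inference step, and the checkpoint names and file paths are assumed for the example.

```python
"""Minimal sketch of an MS2Mesh-XR-style pipeline (illustrative, not the authors' code)."""
import torch
import whisper                                    # openai-whisper: speech-to-text
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline


def transcribe_voice(audio_path: str) -> str:
    """Interpret the recorded voice command as a text prompt."""
    model = whisper.load_model("base")
    return model.transcribe(audio_path)["text"].strip()


def sketch_to_image(sketch: Image.Image, prompt: str) -> Image.Image:
    """Infer a realistic image from the sketch, conditioned on the text prompt."""
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")
    return pipe(prompt, image=sketch, num_inference_steps=30).images[0]


def crm_reconstruct(image: Image.Image):
    """Hypothetical placeholder for the Convolutional Reconstruction Model (CRM):
    substitute the actual single-image-to-mesh inference call here."""
    raise NotImplementedError("plug in the CRM image-to-mesh model here")


if __name__ == "__main__":
    prompt = transcribe_voice("voice_command.wav")          # e.g. "a wooden chair"
    sketch = Image.open("midair_sketch.png").convert("RGB")  # rendered mid-air strokes
    image = sketch_to_image(sketch, prompt)                  # user reviews/selects this image
    mesh = crm_reconstruct(image)                            # load the resulting mesh into the XR scene
```

In the actual system, the sketch would be rasterized from the user's mid-air strokes and the resulting mesh streamed back into the runtime XR scene for manipulation.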
@misc{tong2024ms2meshxrmultimodalsketchtomeshgeneration,
title={MS2Mesh-XR: Multi-modal Sketch-to-Mesh Generation in XR Environments},
author={Yuqi Tong and Yue Qiu and Ruiyang Li and Shi Qiu and Pheng-Ann Heng},
year={2024},
eprint={2412.09008},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.09008},
}