Tencent has unveiled HunyuanWorld-Voyager, its latest AI model, which needs only a single image to generate an interactive virtual world.
Tencent introduced the model in a technical paper and a GitHub repository. Voyager converts still images into 3D worlds in which you can ‘move around’, and it does so without a traditional, separate 3D reconstruction process.
From one image to 3D
To train the model, Tencent collected more than 100,000 video clips from various datasets. From a single image, Voyager generates a sequence of RGB and depth videos, combining visual and geometric information to build a virtual world that unfolds as you move through it. Demos are available on the project’s GitHub page.
Strictly speaking, Voyager’s output is not ‘real’ 3D. The model generates a series of short 2D video frames that are stitched together to create the illusion of a 3D world. To keep those frames consistent, Voyager maintains a cache of 3D points (a ‘world cache’) that grows automatically as more frames are generated: points for previously unseen or modified regions are added, and redundant information is removed. This keeps the world you ‘walk around’ in consistent.
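The article does not explain how the cache decides which points are redundant. As a rough illustration only, the Python sketch below uses one common technique, voxel-grid deduplication: space is divided into small cubes, and a new point is kept only if its cube is still empty. The `WorldCache` class, its `voxel_size` parameter and the `(x, y, z, (r, g, b))` point format are hypothetical and are not taken from Tencent’s code.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical point format: (x, y, z, (r, g, b))
Point = Tuple[float, float, float, Tuple[int, int, int]]
Voxel = Tuple[int, int, int]


class WorldCache:
    """Hypothetical point cache that grows as frames are generated.

    Space is split into cubic voxels of side `voxel_size`. Each voxel keeps
    at most one point, so a point landing in an occupied voxel is treated as
    redundant and discarded (an assumption, not Voyager's documented rule).
    """

    def __init__(self, voxel_size: float = 0.05) -> None:
        self.voxel_size = voxel_size
        self._voxels: Dict[Voxel, Point] = {}

    def _key(self, p: Point) -> Voxel:
        # Integer index of the voxel that contains the point's position.
        return (
            math.floor(p[0] / self.voxel_size),
            math.floor(p[1] / self.voxel_size),
            math.floor(p[2] / self.voxel_size),
        )

    def add_frame_points(self, points: Iterable[Point]) -> int:
        """Insert the points of one generated frame; return how many were new."""
        added = 0
        for p in points:
            key = self._key(p)
            if key not in self._voxels:  # region not yet covered by the cache
                self._voxels[key] = p
                added += 1
            # Otherwise the voxel is already filled, so the point is redundant.
        return added

    def points(self) -> List[Point]:
        return list(self._voxels.values())

    def __len__(self) -> int:
        return len(self._voxels)


if __name__ == "__main__":
    cache = WorldCache(voxel_size=0.1)
    frame1 = [(0.05, 0.05, 1.05, (255, 0, 0)), (0.55, 0.05, 1.05, (0, 255, 0))]
    # The first point of frame2 falls in the same voxel as frame1's first point.
    frame2 = [(0.06, 0.04, 1.07, (250, 5, 0)), (1.05, 0.05, 1.05, (0, 0, 255))]
    print(cache.add_frame_points(frame1))  # 2
    print(cache.add_frame_points(frame2))  # 1
    print(len(cache))  # 3
```

A voxel grid keeps memory roughly proportional to the explored volume rather than to the number of frames, which is one plausible way a cache can grow with exploration while staying compact.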
A notable feature of Voyager is that it generates RGB and depth images simultaneously. Because every color frame comes with matching per-pixel depth, no separate 3D reconstruction process is needed, and developers can use the resulting 3D content directly in applications such as simulations, virtual environments, and digital product presentations.
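To see why per-pixel depth removes the need for a reconstruction step, consider a standard pinhole camera model: a pixel at column u and row v with depth z corresponds to the 3D point x = (u - cx) * z / fx, y = (v - cy) * z / fy. The Python sketch below applies that formula to lift an RGB-D frame into a colored point cloud. The function name `unproject_rgbd` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration; Tencent’s pipeline may use a different camera model.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def unproject_rgbd(
    rgb: List[List[RGB]],
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[float, float, float, RGB]]:
    """Lift every pixel with valid depth into a colored 3D point (camera frame).

    Hypothetical helper assuming a pinhole camera with focal lengths fx, fy
    and principal point (cx, cy), all in pixels.
    """
    if len(rgb) != len(depth) or any(len(a) != len(b) for a, b in zip(rgb, depth)):
        raise ValueError("rgb and depth must have the same shape")
    points = []
    for v, (rgb_row, depth_row) in enumerate(zip(rgb, depth)):
        for u, (color, z) in enumerate(zip(rgb_row, depth_row)):
            if z <= 0:  # zero or negative depth marks a missing value
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z, color))
    return points


if __name__ == "__main__":
    rgb = [[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]]
    depth = [[2.0, 0.0], [1.0, 4.0]]  # the 0.0 pixel is skipped
    for p in unproject_rgbd(rgb, depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5):
        print(p)
```

Because the geometry falls out of a per-pixel formula like this, the color and depth frames already carry everything needed to place content in 3D, rather than requiring a separate multi-view reconstruction.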
According to Tencent, Voyager can also be used for applications such as 3D style transfer, video depth estimation, and creating virtual worlds for training and simulation. Tencent also cites benchmark results that it says show the model scoring highly on camera control, spatial consistency, and visual quality.
Genie-us
The code is available on GitHub and Hugging Face, but Tencent does not release the model without conditions. The company’s license excludes the European Union, the United Kingdom, and South Korea, and commercial applications that can reach more than 100 million users are subject to additional license conditions.
Tencent’s Voyager looks very similar to Genie 3, which Google unveiled last month. Genie 3 generates its virtual world in real time as you move through it and can also remember where you have already been.