Structured 3D Latents for Scalable and Versatile 3D Generation

* Generated by TRELLIS, using its image-to-3D asset capabilities.

TL;DR: A native 3D generative model built on a unified Structured Latent representation and Rectified Flow Transformers,
enabling versatile and high-quality 3D asset creation.

We introduce a novel 3D generation method for versatile and high-quality 3D asset creation.
The cornerstone is a unified Structured LATent (SLAT) representation which allows decoding to different output formats, such as Radiance Fields, 3D Gaussians, and meshes.
This is achieved by integrating a sparsely-populated 3D grid with dense multiview visual features extracted from a powerful vision foundation model,
comprehensively capturing both structural (geometry) and textural (appearance) information while maintaining flexibility during decoding.
We employ rectified flow transformers tailored for SLAT as our 3D generation models and train models with up to 2 billion parameters on a large 3D asset dataset of 500K diverse objects.
Our model generates high-quality results with text or image conditions, significantly surpassing existing methods, including recent ones at similar scales.
We showcase flexible output format selection and local 3D editing capabilities which were not offered by previous models.
Code, model, and data will be released.

NOTE: The appearance and geometry shown on this page are rendered from 3D Gaussians and meshes, respectively.
GLB files are extracted by baking appearance from 3D Gaussians to meshes.

Generation | Text to 3D Asset

All text prompts are generated by GPT-4.

Generation | Image to 3D Asset

Image prompts are either generated by DALL-E 3 or extracted from SA-1B.

TRELLIS can generate variants of a given 3D asset that are coherent with given text prompts.

Editing | Local Manipulation

TRELLIS can manipulate targeted local regions of a given 3D asset according to given text or image prompts.

Application | 3D Art Designs

By compositing the high-quality 3D assets generated by TRELLIS,
users can create complex and vibrant 3D art designs with ease.

We introduce Structured LATents (SLAT), a unified 3D latent representation for high-quality, versatile 3D generation.
SLAT marries sparse structures with powerful visual representations.
It defines local latents on the active voxels that intersect the object's surface.
The local latents are encoded by fusing and processing image features from densely rendered views of the 3D asset and attaching them to the active voxels.
These features, derived from powerful pretrained vision encoders, capture detailed geometric and visual characteristics, complementing the coarse structure provided by the active voxels.
Different decoders can then be applied to map SLAT to diverse 3D representations of high quality.
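
To make the data layout concrete, here is a minimal sketch of a SLAT-like structure: a sparse set of active voxels, each carrying one local latent vector. This is an illustration, not the TRELLIS implementation. The names StructuredLatent, voxel_index, and build_slat are hypothetical, the per-point features stand in for the image features from rendered views, and plain averaging is an assumed substitute for the learned fusion and encoding step.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Voxel = Tuple[int, int, int]


@dataclass
class StructuredLatent:
    """Toy stand-in for SLAT: one latent vector per active voxel."""
    resolution: int
    latents: Dict[Voxel, List[float]]  # active voxel index -> local latent


def voxel_index(point: Sequence[float], resolution: int) -> Voxel:
    """Map a point in the unit cube [0, 1]^3 to its integer voxel index."""
    i, j, k = (min(int(c * resolution), resolution - 1) for c in point)
    return (i, j, k)


def build_slat(surface_points, point_features, resolution) -> StructuredLatent:
    """Attach features to the voxels that contain surface samples.

    point_features[i] stands in for the image features observed at
    surface_points[i]. Features that land in the same voxel are averaged,
    which is an assumption made for this sketch (the real model uses a
    learned encoder).
    """
    sums: Dict[Voxel, List[float]] = {}
    counts: Dict[Voxel, int] = {}
    for point, feat in zip(surface_points, point_features):
        v = voxel_index(point, resolution)
        if v not in sums:
            sums[v] = [0.0] * len(feat)
            counts[v] = 0
        sums[v] = [s + f for s, f in zip(sums[v], feat)]
        counts[v] += 1
    latents = {v: [s / counts[v] for s in sums[v]] for v in sums}
    return StructuredLatent(resolution, latents)


# Three surface samples; the first two fall into the same voxel.
points = [(0.10, 0.10, 0.10), (0.20, 0.20, 0.20), (0.90, 0.90, 0.90)]
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
slat = build_slat(points, features, resolution=4)
print(len(slat.latents), "active voxels out of", slat.resolution ** 3)
```

Only voxels touched by the surface store a latent, so memory grows with the surface area of the object and not with the full grid volume.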

Building on SLAT, we train a family of large 3D generation models, dubbed TRELLIS, with text prompts or images as conditions.
A two-stage pipeline is applied: it first generates the sparse structure of SLAT and then generates the latent vectors for the non-empty cells.
We employ rectified flow transformers as our backbone models and adapt them properly to handle the sparsity in SLAT.
We train TRELLIS with up to 2 billion parameters on a large dataset of carefully collected 3D assets.
TRELLIS can create high-quality 3D assets with detailed geometry and vivid texture, significantly surpassing previous methods.
Moreover, it can easily generate 3D assets with different output formats to meet diverse downstream requirements.
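
The two-stage sampling described above can be sketched in a few lines. This is a toy under stated assumptions, not the released code: it uses a common rectified-flow formulation (x_t = (1 - t) * x_0 + t * noise, integrated with Euler steps from t = 1 to t = 0), replaces the transformer with an analytic velocity field for a single-point data distribution, and obtains the sparse structure by thresholding a dense score grid. The names euler_sample, toy_velocity, and generate are hypothetical.

```python
import random


def euler_sample(velocity, x, steps=10):
    """Integrate dx/dt = velocity(x, t) from t = 1 (noise) down to t = 0."""
    dt = 1.0 / steps
    for k in range(steps):
        t = (steps - k) / steps  # runs from 1.0 down to 1 / steps, never 0
        v = velocity(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(target):
    """Exact velocity field when the data distribution is the single point `target`.

    With x_t = (1 - t) * x_0 + t * noise, the velocity (noise - x_0)
    equals (x_t - x_0) / t. A trained network would replace this function.
    """
    def velocity(x, t):
        return [(xi - ti) / t for xi, ti in zip(x, target)]
    return velocity


def generate(resolution, latent_dim, occupancy_target, latent_target, rng):
    """Two-stage sampling: sparse structure first, then per-cell latents."""
    # Stage 1: sample one occupancy score per grid cell, then threshold it
    # (thresholding a dense grid is an assumption of this sketch).
    n_cells = resolution ** 3
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    scores = euler_sample(toy_velocity(occupancy_target), noise)
    active = [i for i, s in enumerate(scores) if s > 0.5]
    # Stage 2: sample a latent vector for the non-empty cells only.
    latents = {}
    for i in active:
        noise = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        latents[i] = euler_sample(toy_velocity(latent_target), noise)
    return latents


rng = random.Random(0)
occupancy = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # 2x2x2 grid, two filled cells
result = generate(2, 2, occupancy, [0.25, -0.5], rng)
print(sorted(result))
```

Because the second stage runs only on the cells selected by the first, the cost of generating latents scales with the number of non-empty cells and not with the full grid.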

If you find our work useful, please consider citing:

@article{xiang2024structured,
title = {Structured 3D Latents for Scalable and Versatile 3D Generation},
author = {Xiang, Jianfeng and Lv, Zelong and Xu, Sicheng and Deng, Yu and Wang, Ruicheng and
Zhang, Bowen and Chen, Dong and Tong, Xin and Yang, Jiaolong},
journal = {arXiv preprint arXiv:2412.01506},
year = {2024}
}
