Food3D: Text-Driven Customizable 3D Food Generation
with Gaussian Splatting

Qian Jiang ① Shaowen Yao ① Shuqiang Jiang ②

① Yunnan University

② Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences


Abstract

Realistic 3D food creation is essential for nutritional assessment, advertising, and other applications. Existing text-to-3D models initialize a 3D representation and then optimize it, using a text-to-image model to supervise its rendered views, to obtain the final 3D representation. In this work, we introduce Food3D, a novel framework for 3D food generation that addresses two main limitations of current models. First, limitations of initialization in 3D generation: poor initialization can leave the generated 3D food without crucial details and realism, reducing its quality. To solve this, we propose a general method based on Mamba initialization, named Food3D-G, which improves the initialization process and enhances the overall quality of the generated 3D food. Second, limitations of text-to-image models: current text-to-3D models often use text-to-image models for supervision, but there is a significant gap between the images these models generate and real images, particularly for complex foods. The generated images fail to capture fine details and textures, which degrades the quality of the generated 3D food. To address this, we propose a customizable method for generating personalized 3D food, named Food3D-C. Food3D-C uses a dual-branch diffusion model to capture more details, improving results for complex foods. In Food3D, both proposed methods use 3D Gaussian splatting (3D GS) and a Schedulable Interval Score Matching (S-ISM) algorithm to improve shape and texture generation. Extensive experiments show that Food3D achieves state-of-the-art performance, with significant improvements in detail, shape accuracy, and realism.
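The generic pipeline described in the abstract (initialize a representation, render it, let a supervising model push the render toward the prompt, update) can be illustrated with a toy sketch. This is not the Food3D implementation: the parameter vector stands in for a 3D representation, `render` is a hypothetical elementwise renderer, and `guidance_grad` is a hypothetical stand-in for text-to-image supervision (the source uses 3D GS and S-ISM, neither of which is reproduced here). The only point is to show why a better initialization helps under a fixed optimization budget.

```python
import random

# Toy stand-in for "what the prompt describes". Purely illustrative.
TARGET = [0.8, 0.2, 0.5, 0.9]

def render(params, view):
    # Hypothetical renderer: each "view" scales the parameters elementwise.
    return [p * v for p, v in zip(params, view)]

def guidance_grad(image, view):
    # Hypothetical supervision signal: gradient of
    # 0.5 * ||image - render(TARGET, view)||^2 with respect to the image.
    target_image = render(TARGET, view)
    return [i - t for i, t in zip(image, target_image)]

def optimize(init, steps=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    params = list(init)
    for _ in range(steps):
        # Sample a random "camera view", render, and query the supervision.
        view = [rng.uniform(0.5, 1.0) for _ in params]
        image = render(params, view)
        g_img = guidance_grad(image, view)
        # Chain rule through the toy renderer: d image_k / d params_k = view_k.
        params = [p - lr * g * v for p, g, v in zip(params, g_img, view)]
    return params

def error(params):
    # Euclidean distance between the parameters and the toy target.
    return sum((p - t) ** 2 for p, t in zip(params, TARGET)) ** 0.5

coarse_init = [0.7, 0.3, 0.4, 0.8]  # roughly right, like a good initializer
poor_init = [0.0, 0.0, 0.0, 0.0]    # uninformative starting point

print("coarse init error:", error(optimize(coarse_init)))
print("poor init error:  ", error(optimize(poor_init)))
```

With the same number of steps and the same sampled views, the run that starts closer to the target ends closer to it. That is the intuition behind improving initialization, shown here only in a simplified setting.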

Network

Food3D-C

Comparison between Food3D-C and baseline methods.

DreamFusion Magic3D Fantasia3D LucidDreamer Food3D-C (Ours)

a photo of Scrambled eggs with tomatoes, a classic Chinese dish.

a photo of Kung Pao Chicken, a classic Chinese dish.

a photo of Fried shredded pork with green pepper, a classic Chinese dish.

a photo of Stir-fried potato shreds with green pepper, a classic Chinese dish.

Food3D-G

The following figure shows the 3D results obtained with different initialization methods. The food generated by the other methods deviates from the prompt, whereas our MambaInit produces results that match it more closely.

Comparison between MambaInit and baseline methods.

Shap-E Point-E DreamInit MambaInit (Ours)

A hamburger and a cup of cola on a plate.

A hamburger and a bag of French fries placed on a plate.

Bread and milk on a plate.

A glass of cola and a glass of juice.

Comparison between Food3D-C and baseline methods.

Comparison between Food3D and baseline methods.

The top two rows show the comparison results of Food3D-C with baselines, while the bottom two rows show the comparison results of Food3D-G with baselines.

Comparison between Food3D-C and image-to-3D baseline methods.

Detailed information within the red boxes has been magnified.
