Dongjian Yu1, Weiqing Min2,3, Qian Jiang1, Xing Lin1, Xin Jin1*, Shuqiang Jiang2,3
1 Yunnan University; 2 State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; 3 University of Chinese Academy of Sciences; * Corresponding author
Accepted to CVPR 2026 (Highlight)
Accurate estimation of food nutrition plays a vital role in promoting healthy dietary habits and personalized diet management. However, most existing food datasets primarily focus on Western cuisines and often rely on depth sensors, limiting their applicability in real-world scenarios.
To address these challenges, we introduce OmniFood8K, a large-scale multimodal dataset containing 8,036 real-world food scenes with detailed nutritional annotations and multi-view images. Our dataset significantly expands coverage of Chinese cuisine and supports practical RGB-only nutrition estimation.
Overview of the OmniFood8K dataset: data collection process and category distribution.
Additionally, we construct NutritionSynth-115K, a large-scale synthetic dataset introducing compositional variations while preserving accurate nutritional annotations, enabling robust model training.
We propose an end-to-end framework for predicting nutritional information from a single RGB image.
Extensive experiments on the OmniFood8K and Nutrition5K datasets demonstrate that our method outperforms existing approaches, providing a practical solution for daily dietary assessment.
For inquiries or collaboration opportunities, please contact: