OmniFood8K: Single-Image Nutrition Estimation via Hierarchical Frequency-Aligned Fusion

Dongjian Yu1, Weiqing Min2,3, Qian Jiang1, Xing Lin1, Xin Jin1*, Shuqiang Jiang2,3

1 Yunnan University; 2 State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences; 3 University of Chinese Academy of Sciences; * Corresponding author

Accepted to CVPR 2026 (Highlight) 🎉🎉🎉


Overview

Accurate estimation of food nutrition plays a vital role in promoting healthy dietary habits and personalized diet management. However, most existing food datasets focus on Western cuisines and often rely on depth sensors, which limits their applicability in real-world scenarios.

To address these challenges, we introduce OmniFood8K, a large-scale multimodal dataset containing 8,036 real-world food scenes with detailed nutritional annotations and multi-view images. Our dataset significantly expands coverage of Chinese cuisine and supports practical RGB-only nutrition estimation.

Dataset Overview

Overview of the OmniFood8K dataset: data collection process and category distribution.


Dataset Highlights

Additionally, we construct NutritionSynth-115K, a large-scale synthetic dataset introducing compositional variations while preserving accurate nutritional annotations, enabling robust model training.

Method

OmniFood8K Network Overview

We propose an end-to-end framework for predicting nutritional information from a single RGB image.

Experiments

Extensive experiments on the OmniFood8K and Nutrition5K datasets show that our method outperforms existing approaches, providing a practical solution for daily dietary assessment.
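
This page does not state which metrics the comparison uses. As a minimal sketch, assuming the per-nutrient mean absolute error (MAE) and percentage MAE (PMAE, the MAE divided by the mean ground-truth value) that are commonly reported on Nutrition5K, an evaluation could look like the following. The function name mae_and_pmae and the column layout are hypothetical and are not taken from the released code.

```python
import numpy as np


def mae_and_pmae(pred, target):
    """Per-nutrient MAE and PMAE (in percent).

    Assumed layout (hypothetical): rows are dishes, columns are nutrient
    fields such as calories, mass, fat, carbohydrates, and protein.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    # Mean absolute error for each nutrient column.
    mae = np.abs(pred - target).mean(axis=0)
    # MAE expressed as a percentage of the mean ground-truth value.
    pmae = 100.0 * mae / target.mean(axis=0)
    return mae, pmae


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    predictions = [[110.0, 10.0], [190.0, 30.0]]
    ground_truth = [[100.0, 20.0], [200.0, 20.0]]
    mae, pmae = mae_and_pmae(predictions, ground_truth)
    print("MAE per nutrient:", mae)
    print("PMAE per nutrient (%):", pmae)
```

Normalizing by the mean ground-truth value makes errors comparable across nutrients with very different scales, for example calories versus grams of protein.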


📧 Contact Us

For inquiries or collaboration opportunities, please contact:

yudongjian@stu.ynu.edu.cn