ABSTRACT: This study develops an advanced recommendation system for personalized nutritional solutions using machine learning techniques to optimize dietary planning. Based on a dataset that includes individual characteristics such as age, weight, height, body mass index, medical conditions, and nutritional goals, predictive models were built using LightGBM, Random Forest, XGBoost, SVC, and Multi-Layer Perceptron algorithms. Among these, LightGBM demonstrated the best performance, achieving a weighted F1-Score of 0.972 and an AUC of 0.997, showing high discrimination and accuracy. The system was complemented by an interactive interface designed to facilitate adoption by non-technical users, offering real-time recommendations. The results highlight the potential of artificial intelligence to transform personalized nutrition, improving health and well-being through data-driven interventions.
Keywords: LightGBM, machine learning, nutrition plan, personalized diet, random forest, recommendation system, SVC, XGBoost.
RESUMEN: El presente trabajo desarrolla un sistema de recomendación avanzado para soluciones nutricionales personalizadas, utilizando técnicas de aprendizaje automático para optimizar la planificación dietética. A partir de un conjunto de datos que incluye características individuales como edad, peso, altura, índice de masa corporal, condiciones médicas y objetivos nutricionales, se construyeron modelos predictivos basados en algoritmos como LightGBM, Random Forest, XGBoost, SVC y Multi-Layer Perceptrons. Entre estos, LightGBM mostró el mejor rendimiento, alcanzando un F1-Score ponderado de 0.972 y un AUC de 0.997, demostrando alta capacidad de discriminación y precisión. El sistema fue complementado con una interfaz interactiva diseñada para facilitar la adopción por parte de usuarios no técnicos, ofreciendo recomendaciones en tiempo real. Los resultados subrayan el potencial de la inteligencia artificial para transformar la nutrición personalizada, mejorando la salud y el bienestar a través de intervenciones basadas en datos.
Palabras clave: Aprendizaje automático, dieta personalizada, LightGBM, plan de nutrición, random forest, sistema de recomendación, SVC, XGBoost.
Article
RECOMMENDATION SYSTEM FOR PERSONALIZED NUTRITIONAL SOLUTIONS USING MACHINE LEARNING TECHNIQUES
Sistema de Recomendación para Soluciones Nutricionales Personalizadas usando Técnicas de Aprendizaje Automático
Received: 13 July 2024
Accepted: 29 December 2024
Healthy nutrition is essential for preventing overweight, obesity, and non-communicable diseases. This requires a balanced intake of foods and beverages that meet the body's nutritional needs. However, the increased consumption of processed products rich in sugars, saturated fats, and sodium has significantly changed dietary patterns, increasing the risk of health problems. Maintaining a varied and balanced diet not only helps prevent malnutrition but also promotes overall and sustained well-being [1,2].
The exact composition of a healthy diet varies according to individual characteristics such as age, gender, lifestyle, and physical activity level, as well as cultural context and eating habits [2,3]. The growing concern for health and well-being has driven the need for personalized nutritional solutions that adapt to each person's specific needs. In this context, recommendation systems emerge as a key tool, capable of analyzing data and providing specific nutritional advice based on preferences, dietary restrictions, and health goals.
More specifically, recommendation systems are fundamental tools for assisting users in decision-making by filtering and prioritizing information based on their preferences. These systems use algorithms that analyze user behavior and compare their preferences with those of other users [4,5]. Notable examples include YouTube, Netflix, LinkedIn, Amazon, MercadoLibre, Airbnb, and Springer.
The effectiveness of recommendation systems has been demonstrated across various fields, ranging from computing, marketing, and entertainment to areas like medicine and nutrition, where they personalize experiences and enhance user satisfaction.
Peña et al. [6] address the inconsistent adoption of best practices in software processes, a problem that affects decision-making. Their proposed system employs artificial intelligence techniques, such as case-based reasoning and association rules, to generate effective recommendations. Additionally, Galván Espinoza [7] explores the use of data analysis and machine learning techniques to optimize digital marketing strategies, increasing visibility and campaign reach.
Collaborative filtering techniques have proven highly effective in enhancing user experiences on digital platforms. In Huang [8], a movie recommendation system for company X was developed using a user-similarity-based model to offer personalized solutions and improve customer satisfaction.
Meanwhile, Valecillos Girand [9] focused on enhancing the shopping experience on the e-commerce platform Aprovecha.com, applying both user-based and item-based collaborative filtering, with the former being more effective at increasing sales. Another prominent example is found in the fashion industry [10], where recommendation systems facilitate outfit combinations using a database of ratings, segmentation algorithms such as GrabCut and FloodFill, and models such as Random Forest, SVM, and Neural Networks, resulting in a more personalized and effective user experience.
These applications have parallels in the medical field, where innovative recommendation systems aim to improve care for patients and caregivers. Meng et al. [11] proposed a hybrid online medical service recommendation method that combines Local Differential Privacy (LDP) and Locality-Sensitive Hashing (LSH) to protect user privacy. Similarly, Li et al. [12] develop the ADCareOnto ontology, offering personalized care for caregivers of Alzheimer's patients by integrating user profiles and clinical guidelines, supported by a validated virtual assistant. Furthermore, Hendri et al. [13] present a web application using C4.5 and K-Nearest Neighbor (KNN) algorithms to recommend medical specialists based on reported symptoms. Likewise, Guevara and Coral [14] describe a recommendation system for home appliances based on KNN, optimizing data preprocessing and distance metrics to provide personalized recommendations.
The impact of recommendation systems extends to the agri-food sector, revolutionizing personalized nutrition approaches. A notable example is a biotechnology company offering personalized nutrition solutions by analyzing the gut microbiome using artificial intelligence and machine learning to assess bacterial types and quantities and generate tailored dietary recommendations [15]. Similarly, in Zeevi et al. [16], glucose monitoring in 800 individuals revealed variability in responses to identical meals, indicating the need for personalized approaches. A machine learning algorithm integrating personal data (biomarkers, diet, microbiota) was developed and validated to predict individual glycemic responses accurately and reduce postprandial glucose levels.
Moreover, a diet recommendation system was developed to address nutrition and health issues, tailored to individual needs such as weight loss, weight gain, or health maintenance. Likewise, machine learning algorithms such as Random Forest and K-Means were used to classify foods into personalized categories. Based on user data (age, height, weight, dietary preferences, and goals), BMI is calculated and recommendations are generated [17,18,19]. In Choudhari and Thakur [20], a machine learning method (KNN) considers physical data such as user symptoms, while Toledo et al. [21] employ deep learning and knowledge of suitable diets based on body prakriti according to different seasons to recommend the best possible diet. In Kadam et al. [22], AHPSort is proposed as a method to filter foods according to user characteristics, and an optimization model suggests preferred meals. In contrast, Eliyas and Ranjana [23] developed a dietary assessment solution using a hybrid approach, combining content-based filtering to recommend foods based on user interests and collaborative filtering to suggest potentially attractive options.
Beyond building highly accurate models, interpretability is crucial for understanding the factors influencing the generated recommendations. To address this, SHAP (SHapley Additive exPlanations) values were used, ensuring that the proposed personalized diets are not only precise but also understandable. SHAP values assign fair contributions to each feature, correcting inconsistencies in traditional methods [24].
In this work, we present the development of a machine learning (ML)-based recommendation system designed to propose personalized nutritional plans for individuals interested in adopting a fitness lifestyle. The ML models were built using a dataset containing user demographic data, body mass index (BMI), nutritional status, medical conditions, and specific goals. Based on this information, the model generates diets tailored to individual needs and characteristics, promoting a highly personalized and effective approach to achieving health and wellness goals.
To build the models, a dataset was obtained from a public online repository (GitHub repository https://github.com/RishaRane/Workout_Recommendation_System). The dataset consists of 14,589 records containing detailed tabular information about individual characteristics and dietary recommendations. Table 1 describes the input variables relevant to the recommendation system.

As part of preprocessing, categorical variables such as "Diabetes," "Hypertension," and "Sex" were encoded using one-hot encoding. This technique converts categorical values into binary variables, creating new columns named Sex_Male, Hypertension_Yes, and Diabetes_Yes to numerically represent category presence (0 or 1). For multi-class categorical variables such as "Level," "Fitness Goal," "Fitness Type," and the target variable "Diet," a label encoder was applied, assigning a unique numeric value to each category.
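The encoding step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the sample rows and category values are invented, and the use of pandas `get_dummies` with `drop_first=True` and Scikit-learn's `LabelEncoder` is an assumption about how the single 0/1 columns and integer labels could be produced.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample rows; column names follow the text, values are invented.
df = pd.DataFrame({
    "Sex": ["Male", "Female", "Male"],
    "Hypertension": ["No", "Yes", "No"],
    "Diabetes": ["Yes", "No", "No"],
    "Level": ["Normal", "Obese", "Overweight"],
    "Diet": ["Diet 1", "Diet 4", "Diet 2"],
})

# One-hot encode the binary variables. drop_first=True keeps one 0/1 column
# per variable: Sex_Male, Hypertension_Yes, Diabetes_Yes.
binary_cols = ["Sex", "Hypertension", "Diabetes"]
encoded = pd.get_dummies(df, columns=binary_cols, drop_first=True, dtype=int)

# Label-encode the multi-class variables: each category gets a unique integer
# (assigned in sorted order of the category names).
encoders = {}
for col in ["Level", "Diet"]:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(encoded[col])

print(encoded)
```

Keeping the fitted encoders in a dictionary makes it possible to map a predicted integer back to its diet name with `encoders["Diet"].inverse_transform`.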
Figure 1 presents the histogram showing that the majority class (Diet 4) represents 34.7% of the total data, while the minority class (Diet 2) accounts for only 2.9%, reflecting a 12:1 imbalance, which will be addressed in the following sections.

This section presents the methods used to develop predictive models aimed at classifying each input into one of the possible diet categories based on the provided characteristics. Content-based filtering is a machine learning approach used in recommendation systems that makes decisions based on the similarity of item characteristics and user preferences [25]. In the context of a diet recommendation system, this method uses each user's characteristics and available diets to find the most suitable match.
The algorithms listed in Table 2 were implemented in Python. As observed in the previous section, the class distribution is markedly imbalanced. To address this, we used LightGBM, Random Forest, and XGBoost, decision-tree-based supervised learning models for multi-class classification that are suited to learning from imbalanced data and are available through Scikit-learn and Scikit-learn-compatible libraries. Two further algorithms were also tested: SVC (Support Vector Classifier) and MLP (Multi-Layer Perceptron). Although these are not specifically designed for imbalanced data, they can be adjusted to handle it effectively.
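One common way to adjust a classifier for imbalanced classes is class weighting. The sketch below is illustrative only: the text does not state which adjustment was applied, the toy labels are invented (with a 12:1 majority-to-minority ratio), and `class_weight="balanced"` is an assumed choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Invented toy data: 120 / 50 / 10 samples per class (12:1 imbalance).
y = np.array([0] * 120 + [1] * 50 + [2] * 10)
rng = np.random.default_rng(0)
X = rng.normal(size=(len(y), 4)) + y[:, None]

# "balanced" weights are n_samples / (n_classes * samples_in_class),
# so rarer classes receive proportionally larger weights.
weights = compute_class_weight("balanced", classes=np.unique(y), y=y)
print(dict(zip(np.unique(y).tolist(), weights.round(2).tolist())))

# Both estimators accept the same option and reweight errors per class.
rf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X, y)
svc = SVC(class_weight="balanced").fit(X, y)
```

Scikit-learn's `MLPClassifier` has no `class_weight` parameter, so for that model an alternative such as resampling the training data would be needed.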
The Light Gradient Boosting Machine (LightGBM) framework employs gradient boosting techniques and is widely used for classification and regression problems [26]. Random Forest combines many decision trees through ensemble voting (for classification) or averaging (for regression) to make robust and precise predictions [26]. Similarly, the XGBoost (Extreme Gradient Boosting) method is applied to classification, regression, and ranking tasks. Its key feature is the sequential training of multiple decision trees, where each new tree focuses on correcting the errors made by previous ones [27]. On the other hand, SVC (Support Vector Classifier) works by creating a hyperplane or decision boundary that separates data into different classes with the maximum possible margin. Its main objective is to maximize the separation margin using data points called support vectors [28]. Lastly, MLP (Multi-Layer Perceptron) is a neural network-based model that learns through weight adjustments via backpropagation, using derivatives and optimization techniques such as gradient descent [29].

To optimize the model hyperparameters, grid search was used together with five-fold cross-validation. This approach evaluates every hyperparameter combination in a predefined grid, dividing the training data into five subsets: in each iteration, four are used for training and one for validation, and the held-out subset rotates until each has served as the validation set once.
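A grid search with five-fold cross-validation can be sketched with Scikit-learn as shown below. The synthetic data, the Random Forest estimator, the grid values, the stratified folds, and the weighted-F1 scoring are all assumptions chosen to keep the example small; the actual grids used in the study are not given in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the training data (invented, 3 classes).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Hypothetical grid: 2 x 2 = 4 combinations, each evaluated on 5 folds.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

# Five folds: every split trains on four folds and validates on the fifth.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    scoring="f1_weighted",
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the model refit on the full training data with the best combination found.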
The dataset was split into two subsets: 80% for training and 20% for testing. Since the dataset is imbalanced, evaluation metrics were chosen to ensure a fair comparison with the state of the art. These include F1-score, precision, recall, and AUC, as well as metrics better suited for imbalanced data, such as the geometric mean. Precision indicates how many of the recommendations made are truly useful. Recall measures the proportion of relevant recommendations identified by the system out of all possible relevant ones. The F1-score is the harmonic mean of precision and recall, balancing both metrics to evaluate overall model performance. AUC (Area Under the ROC Curve) measures the model's ability to distinguish between relevant and irrelevant recommendations, making it one of the most widely used classification metrics. Geometric mean captures global performance by considering both sensitivity and specificity, making it particularly suitable for imbalanced data scenarios.
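The split and the metrics described above can be computed along the following lines. This is a hedged sketch on synthetic data: the logistic regression model, the stratified split, the one-vs-rest AUC, and the multi-class geometric mean taken as the geometric mean of per-class recalls are assumptions, since the text does not specify these details.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    precision_recall_fscore_support,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Invented imbalanced 3-class data standing in for the diet dataset.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=0,
)

# 80% training / 20% testing; stratification is an assumption.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = model.predict(X_te)
y_prob = model.predict_proba(X_te)

# Weighted averages: per-class scores weighted by class support.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_te, y_pred, average="weighted", zero_division=0
)

# Multi-class AUC via one-vs-rest, also weighted by support.
auc = roc_auc_score(y_te, y_prob, multi_class="ovr", average="weighted")

# Geometric mean of per-class recalls (one multi-class definition).
per_class_recall = recall_score(y_te, y_pred, average=None)
g_mean = float(np.prod(per_class_recall) ** (1.0 / len(per_class_recall)))

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f} AUC={auc:.3f} G={g_mean:.3f}")
```

Because the geometric mean multiplies the per-class recalls, a single class with very low recall pulls the score toward zero, which is why it is sensitive to poor performance on minority classes.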
Accuracy was not used because the dataset is imbalanced, and this metric does not differentiate the performance of the classifier for the category of interest. The weighted average approach was chosen, giving importance to the majority classes without ignoring the weight of the minority classes, given that a poor recommendation in minority cases, such as diets for people suffering from diabetes and/or hypertension, could have a significant impact on their health. In addition, confusion matrices were generated to illustrate how the model classifies dietary recommendations in the different categories.
To ensure model transparency and interpretability, SHAP (SHapley Additive exPlanations) [30] was used. This game theory-based technique assigns a contribution value to each input feature, allowing an analysis of how each variable influences dietary recommendations. Two SHAP summaries were employed:
Mean absolute SHAP values, indicating the average importance of each feature.
SHAP beeswarm plots, showing the distribution of these contributions across predictions.
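To make the mean absolute SHAP summary concrete, the sketch below computes SHAP values by hand for a hypothetical linear model, where they have a closed form when features are treated as independent: the contribution of feature j for a sample is the weight w_j times the deviation of x_j from its mean. The data, weights, and model are invented for illustration; this does not use the SHAP library, and it is not the explainer applied to the LightGBM model in this work.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # invented feature matrix
w = np.array([2.0, -1.0, 0.0])       # hypothetical linear model weights
b = 0.5                              # hypothetical intercept


def predict(data: np.ndarray) -> np.ndarray:
    """Linear model f(x) = w . x + b."""
    return data @ w + b


# Closed-form SHAP values for a linear model with independent features:
# phi[i, j] = w[j] * (X[i, j] - mean of feature j).
shap_values = w * (X - X.mean(axis=0))

# Additivity: base value plus the contributions recovers each prediction.
base_value = predict(X).mean()
reconstructed = base_value + shap_values.sum(axis=1)

# Mean absolute SHAP value per feature = average importance.
mean_abs = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(mean_abs)[::-1]  # most to least influential

print(mean_abs.round(3), ranking)
```

A beeswarm plot displays the individual entries of `shap_values` per feature, colored by feature value, whereas `mean_abs` collapses each column into a single importance score.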
A Python-based application was developed using the Tkinter library to allow users to intuitively enter their demographic data and specific health conditions. This system provides real-time personalized diet recommendations. The program interface consists of a table organized into three main columns:
The performance of the models in terms of the defined evaluation metrics (Precision, Recall, F1-Score, AUC, and Geometric Mean) is presented in Table 3, and the Python code is available online (GitHub repository https://github.com/IA-UNAL/Sistema-de-recomedacion-de-dietas-basado-en-machine-learning).

Table 3 shows that the LightGBM model achieved the best performance across these metrics, attaining a weighted average F1-Score of 0.972 and an AUC of 0.997. Figure 2 presents the ROC curves and AUC values for all models, giving a visual comparison of their discrimination capability. The AUC indicates that the model has a high ability to correctly identify relevant dietary patterns for different user dietary goals. In addition, the precision (0.972) and recall (0.973) reflect the system's ability to generate accurate dietary recommendations and avoid incorrect suggestions.

The confusion matrix presented in Figure 3 illustrates how the best model, LightGBM, classifies dietary recommendations into the different categories. For each diet class, the matrix yields the number of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), which together indicate model performance.
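In a multi-class setting, the TP, FP, FN, and TN counts are obtained per class from the confusion matrix by treating that class as positive and all others as negative. The following sketch shows the derivation on a small set of invented labels; it is not the data behind Figure 3.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Invented true and predicted labels for three diet classes.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 0, 2, 2])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

tp = np.diag(cm)                 # correctly predicted as the class
fp = cm.sum(axis=0) - tp         # predicted as the class, but another class
fn = cm.sum(axis=1) - tp         # belong to the class, predicted otherwise
tn = cm.sum() - (tp + fp + fn)   # everything else

for k in range(cm.shape[0]):
    print(f"class {k}: TP={tp[k]} FP={fp[k]} FN={fn[k]} TN={tn[k]}")
```

From these counts, per-class sensitivity is TP / (TP + FN) and per-class specificity is TN / (TN + FP).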

In the SHAP value analysis, Figure 4 shows that age is the most influential variable in the model, with an average absolute impact of 2.17. BMI and Level closely follow, with mean values of 1.78 and 1.76, respectively. In contrast, gender and training type have minimal contributions to the model's predictions, making them the least relevant variables.

Additionally, Figure 5 illustrates the individual impact of each variable on predictions. Age again stands out as the most influential variable, with a wide dispersion in SHAP values, which can be positive or negative depending on the specific feature value. High age values (in red) tend to increase predictions, while low values (in blue) reduce them.

Finally, Figure 6 presents the interactive interface that integrates the LightGBM model. Through this interface, users can dynamically interact with the system, adjusting individual characteristics according to their goals and receiving personalized dietary recommendations in real time. Thanks to the implementation of the LightGBM model, the system efficiently adapts suggestions, providing suitable meal plans tailored to the specific nutritional needs of each user, improving both the experience and recommendation accuracy.

The development of a diet recommendation system based on artificial intelligence proves to be an effective tool for addressing nutritional issues in a personalized manner. Through machine learning models and appropriate evaluation metrics, the system generates accurate recommendations tailored to each user's individual characteristics while remaining interpretable for users. This approach not only promotes health and well-being but also highlights the potential of artificial intelligence to transform how people manage their diet. The integration of an interactive interface enhances accessibility and user experience, making the system practical and scalable for implementation in different contexts.
Future studies may explore the incorporation of additional data, such as genetic or microbiome information, to further increase recommendation accuracy. Additionally, evaluating the long-term impact of this system on adherence to dietary plans and improvements in health indicators is suggested.








