Abstract: The aim of the present study was to develop a predictive model of academic achievement (school success or failure) by applying decision tree analysis. A cross-sectional study was carried out to design a system for the early detection of academic failure. A total of 219 adolescents (aged 14 to 16) participated, and information on their socioeconomic status, body mass index (BMI) percentile, physical activity, leisure time spent in front of screens, enjoyment, hope, anger, anxiety, boredom, behavioral engagement, emotional engagement, cognitive engagement, self-perceived school performance and intention to go to university was collected and used as input variables in the decision tree analysis. Six failure groups and three success groups that predicted academic performance were identified. The decision tree achieved good accuracy in both the training (80.11 %) and validation (81.40 %) datasets. Academic failure or success can be predicted by assessing weight status, physical activity, anger and hope during school attendance, intention to go to university and self-perceived school performance.
Keywords: high schools, academic achievement, prediction, physical activity level, decision tree.
Academic achievement prediction in secondary education by decision tree analysis

Received: 8 March 2022
Accepted: 17 April 2023
Published: 2 January 2024
School failure is a polysemic term often associated with not achieving an academic goal, which usually means not passing certain subjects or not obtaining a minimum qualification. Being able to prevent such situations would reduce the frustration of students and their families and would be a great advance for society, helping all students to receive a good education (Alexander et al., 1997, 2001; Cairns et al., 1989). To address academic failure, several authors have proposed different forms of early detection based on factors such as emotions, physical fitness, sedentary lifestyle or academic engagement, which are detailed below (Alzina & Escoda, 2012; D’Mello et al., 2008; Pekrun et al., 2002; Weiner, 1982). These early detection systems are intended to identify school failure early enough for professionals in the education system to intervene and improve the student’s situation.
A factor that has been shown to be highly relevant in predicting both academic performance and school failure is the socioeconomic status of students’ families (Parr & Bonitz, 2015; Trujillo-Torres et al., 2020). Specifically, school failure in Spain is not distributed equally across socioeconomic strata: students’ social class can affect their school performance, and the percentage of school failure is higher among working-class than among middle-class children (Martínez-García, 2011).
Pekrun et al. (2002) defined student emotions as the students’ personal experience when performing academic activities and identified them as a very important part of their personal motivation for achieving academic success and avoiding school failure. Along the same lines, D’Mello et al. (2008) highlighted that knowing students’ emotions is important for good teaching, because of the links between cognition and emotion. These links have been described by explaining that students experience confusion when they face obstacles to their objectives or detect contradictions, incongruities or anomalies in the teaching process (Festinger, 1962; Graesser & Olde, 2003). If confusion is not resolved, it can lead to irritation, frustration, anger and sometimes even rage. It is therefore understandable that multiple studies have concluded that negative emotions such as anger, anxiety and boredom are negatively correlated with academic performance (Pekrun, 2006; Pekrun et al., 2011). However, very low levels of some of these emotions, such as anger, have not been found to be beneficial either (Lane et al., 2005; Pekrun et al., 2011). On the other hand, a learner can experience a range of positive emotions (such as enjoyment) when facing challenges, uncovering knowledge and mastering concepts. Students who are actively engaged in the learning process can have a flow-like experience, in which they are so engrossed in the material that time and fatigue disappear (Csikszentmihalyi, 2014). In fact, a positive emotion such as hope is considered to have sufficient potential to redirect underachieving students (Dixson, 2019).
Another factor that has been linked to academic performance is the practice of physical activity (PA) and related concepts such as sedentary behavior and physical fitness. In a longitudinal study, Pellicer-Chenoll et al. (2015) concluded that the cluster of students with higher PA and fitness had a lower body mass index (BMI) and higher academic performance than classmates who performed less PA. In turn, the cluster of students with the lowest PA showed lower physical fitness, higher BMI and lower academic performance than the other student profiles. Several studies have found this kind of relationship between PA (Marques et al., 2017; Morales, Pellicer-Chenoll, et al., 2011; Rasberry et al., 2011; A. Singh et al., 2012; Sullivan et al., 2017) or physical fitness (Coe et al., 2013; Van Dusen et al., 2011; Wittberg et al., 2009) and academic performance. However, some studies argue that there is no conclusive evidence of beneficial effects of PA on students’ overall cognitive and academic performance (Rasberry et al., 2011). These discrepancies may be due to different conceptions of academic performance and different ways of measuring and considering PA. In general, the relationship between the two variables is considered to be either positive or non-existent (Singh et al., 2019).
Researchers have also explored whether sedentary habits have a negative influence on academic performance, independently of any positive influence of PA. Peiró-Velert et al. (2014) examined the influence of time spent in sedentary screen use (e.g., video games, mobile phones, television) on academic performance and found an inverse relationship between academic performance and screen use.
Student engagement has also been explored as a possible factor influencing academic failure. Carini et al. (2006) reported that student engagement is positively linked to desirable learning outcomes such as critical thinking and grades. Dogan (2015) analyzed engagement in three dimensions: cognitive, behavioral and emotional. His results showed that cognitive engagement predicted academic performance, whereas emotional and behavioral engagement did not. Other studies have indicated that behavioral engagement is important for achieving positive academic results and preventing dropout (Connell & Wellborn, 1991; Finn, 1989).
As explained above, each of these factors, considered in isolation, is known to affect academic performance. However, few studies have aimed to develop an early detection system of academic failure that combines most of them.
Casillas et al. (2012) examined the combined effects of predictor variables on estimating academic failure. Their findings highlight the importance of using several kinds of predictors (e.g., psychosocial and behavioral) to accurately identify students at risk of dropping out.
Davis et al. (2014) assessed the extent to which several social-emotional learning skills (academic self-efficacy motivation, social connections, importance of school and school management, management of psychological and emotional distress, and academic stress) could be used as predictors of academic outcomes. Their results indicated that the combination of social-emotional learning subscales effectively discriminated between students who made positive progress toward high school graduation and those identified as having dropped out.
Zhang et al. (2018) focused on identifying the type of predictive model that achieved the best accuracy. They compared classification models such as naive Bayes, support vector machines, decision trees and multilayer perceptrons, and obtained the best results with the last two. Other studies have also used decision trees to predict school dropout and have obtained relatively high accuracy (Quadri & Kalyankar, 2010; Veitch, 2004). Decision tree analysis is particularly well suited to this type of study compared with other predictive analyses because of its effectiveness and several practical benefits: for example, decision trees can handle different kinds of input data (i.e., nominal, numeric and text), are easy to understand, and can cope with erroneous values in the dataset, among others (Rokach & Maimon, 2014).
As a summary of the factors most frequently used to predict academic failure or success, we refer to one of the most recent reviews, by Alyahyan and Düştegör (2020), which describes the variables most commonly used in this type of study based on the definition of academic success by York et al. (2015). According to this review, student demographics and psychological factors are two of the most widely used predictors, along with prior academic achievement and students’ environment.
As can be seen, although these prediction studies rest on a solid theoretical foundation regarding the most influential factors, other relevant factors such as PA, BMI and sedentary habits have not been addressed, even though ample literature has demonstrated their influence on academic achievement. This may be because studies with these types of variables have focused on academic performance rather than on academic failure or passing. Consequently, these factors may not have emerged as relevant predictors simply because they have not been analyzed.
As explained above, some studies have used relatively novel analysis methods (such as decision trees) to predict academic failure from combinations of psychological and demographic variables. However, no studies have been published on early detection systems of academic failure that combine not only psychological and demographic characteristics but also lifestyle variables, such as the practice of PA and sedentary activities. We therefore consider our work to be novel: it includes the factors identified so far as most influential for predicting students’ academic success, according to Alyahyan and Düştegör (2020), plus other factors that had not previously been considered jointly, such as BMI, PA and sedentary habits. Furthermore, a multifactorial, non-linear analysis such as the decision tree avoids limitations of linear analyses (e.g., loss of statistical power when many factors are added, or multicollinearity), and it provides classification and prediction results that are easy to interpret visually. Lastly, if a combination of factors that identifies school failure is found, actions can be promoted to prevent it and thus help students succeed.
The aim of this study was therefore to develop a predictive model of academic achievement (school success or failure) by means of decision tree analysis, using emotions related to class attendance, school engagement, PA, leisure time spent in sedentary screen-based activities, socio-demographic characteristics and school adjustment variables.
A cross-sectional study was carried out to design a system for the early detection of academic failure in third- and fourth-year students of compulsory secondary education in Spain. Students completed a set of questionnaires at the beginning of a school quarter to measure PA, hours spent on sedentary screen activities, socioeconomic status, emotions related to class attendance, school engagement and school adjustment. These variables, together with BMI, were used as input variables to build a classification tree predicting academic results (success or failure) at the end of the quarter (output variable).
The sample was composed of 219 adolescents (aged 14 to 16) recruited from compulsory secondary education schools in Valencia (Spain). The inclusion criteria were: i) being between 14 and 16 years old (inclusive), ii) being neurologically and intellectually able to understand and complete the questionnaires, and iii) not having parents who refused participation in the study. The participants’ characteristics are reported in Table 1.

The procedures applied in this study were previously approved by the Institutional Review Board of the University of Valencia (Code: 1503291) and met the requirements of the Declaration of Helsinki (1975, revised in 2008). The participants’ parents provided written informed consent before their children took part in the study.
All measurements were taken at the participants’ high school, in their usual classroom. The researchers explained how to complete the questionnaires and answered the students’ questions. Completing all the questionnaires took between 50 and 60 minutes.
The questionnaires were administered at the beginning of the quarter, so that a predictive relationship could be established between the input variables and academic failure at the end of the quarter. Each student’s grades at the end of the quarter were anonymized. Academic performance was coded as “suspended” if the student failed at least one subject (academic failure) or “approved” if the student passed all subjects in the quarter (academic success).
Family Affluence Scale (FAS) II
This questionnaire determines the socioeconomic status (SES) of adolescents’ families in European and North American countries (Currie et al., 2008). Four objective questions estimate family wealth by quantifying family vehicles, holiday trips, having one’s own bedroom, and computers. Answers were coded from 0 (minimum number of vehicles, trips, etc.) up to a maximum of 3, depending on the number of possible responses to each item. The final score was computed as the mean of the item scores, with 0 indicating the lowest SES and 2.25 the highest.
Weight and height were self-reported by the participants, and BMI (kg/m²) was calculated. Sherry et al. (2007) reported good validity of self-reported weight and height in adolescents. Growth charts (Kuczmarski et al., 2000) were used to calculate the BMI percentile adjusted for age and sex.
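The BMI computation and an age- and sex-specific percentile can be sketched as follows, assuming the LMS method that underlies the CDC growth charts: z = ((BMI/M)^L − 1)/(L·S), or ln(BMI/M)/S when L = 0, converted to a percentile with the standard normal distribution. The L, M and S values in the example are placeholders, not values from the reference tables, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def lms_percentile(value, L, M, S):
    """Percentile of `value` given the LMS parameters for one age and sex."""
    if L == 0:
        z = math.log(value / M) / S
    else:
        z = ((value / M) ** L - 1) / (L * S)
    return 100 * NormalDist().cdf(z)


# Placeholder LMS values for illustration only; real values must be looked up
# in the reference tables for the adolescent's sex and age in months.
L_, M_, S_ = -2.0, 20.0, 0.13
value = bmi(58.0, 1.65)
print(round(value, 1), round(lms_percentile(value, L_, M_, S_), 1))
```

A BMI equal to the median M always maps to the 50th percentile, whatever L and S are.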
Physical Activity Questionnaire for Adolescents (PAQ-A)
PAQ-A was first validated by Kowalski et al. (1997) as a modified version of the PA questionnaire for older children, showing good convergent validity for measuring the general PA level of high school students. Later, Martínez-Gómez et al. (2009) validated the Spanish version of the PAQ-A, obtaining moderate correlations with accelerometer data (rho = 0.34–0.39). The questionnaire measures PA levels, from very low to very high, over the last 7 days and is appropriate for adolescents aged 13 to 18. It consists of eight questions assessing different aspects of the PA performed by the adolescent at different times of the day. PAQ-A is simple, easy to complete and easy to administer in the school environment. The overall result is a score from 1 (lowest) to 5 (highest) indicating each adolescent’s PA level.
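A minimal sketch of the PAQ-A composite, assuming each of the eight scored items has already been converted to a 1-5 score and that the overall score is their mean (the usual PAQ-A rule; the text above only states the 1-5 range). The function name is hypothetical.

```python
def paqa_score(item_scores):
    """Composite PAQ-A score: mean of the eight scored items, each on 1-5."""
    if len(item_scores) != 8:
        raise ValueError("PAQ-A composite expects 8 item scores")
    if any(not 1 <= s <= 5 for s in item_scores):
        raise ValueError("each item score must lie between 1 and 5")
    return sum(item_scores) / len(item_scores)


print(paqa_score([2, 3, 2, 1, 3, 2, 2, 3]))  # 2.25
```

Because it is a mean of 1-5 items, the composite stays on the same 1-5 scale as the individual questions.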
Adolescent Sedentary Activity Questionnaire (ASAQ)
ASAQ measures time spent in a range of sedentary behaviors outside school hours during a normal week (Hardy et al., 2007). In brief, participants answer questions about fifteen sedentary habits, reporting how long (hours and minutes) they spend on each one every week. In this study, only the seven items referring to leisure sedentary activities involving screen technology were used. The final score for leisure time spent in sedentary activities in front of screens was computed as the total time, in minutes, reported across these seven items.
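The screen-time score can be sketched as a sum over the seven screen-based items, assuming each one is recorded as an (hours, minutes) pair for a normal week. The function name and input layout are hypothetical.

```python
def weekly_screen_minutes(items):
    """Total weekly minutes across the seven screen-based ASAQ items.

    Each item is an (hours, minutes) pair reported for a normal week.
    """
    if len(items) != 7:
        raise ValueError("expected the seven screen-based ASAQ items")
    return sum(60 * hours + minutes for hours, minutes in items)


print(weekly_screen_minutes([(5, 0), (2, 30), (0, 45), (3, 15), (0, 0), (1, 0), (0, 30)]))  # 780
```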
Emotional Scales Questionnaire Related to Class Attendance (AEQ)
The AEQ was designed by Pekrun et al. (2011) to measure the emotions students experience in academic settings such as class attendance. The complete questionnaire consists of 24 scales measuring several emotions, organized in three sections that assess class-related, learning-related and test-related emotions. Items are answered on a 5-point Likert scale, from 1 (complete disagreement with the statement) to 5 (complete agreement). For this study, we selected the 42 items of the scales for enjoyment (8 items), anger (7 items), anxiety (11 items), boredom (9 items) and hope (7 items) during class attendance (class-related emotions) in the Spanish version. This version was validated by Rosas (2015), who found good reliability as well as good structural and construct validity. The final score for each subscale is the mean of its items.
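The subscale scoring can be sketched as follows. The item counts come from the text; the dictionary layout and function name are hypothetical, and answers are assumed to be coded 1-5.

```python
# Class-related AEQ subscales used in this study and their item counts.
AEQ_CLASS_SCALES = {"enjoyment": 8, "anger": 7, "anxiety": 11, "boredom": 9, "hope": 7}
assert sum(AEQ_CLASS_SCALES.values()) == 42  # the 42 items selected


def aeq_scores(responses):
    """Mean score per subscale; `responses` maps subscale name -> list of answers."""
    scores = {}
    for scale, n_items in AEQ_CLASS_SCALES.items():
        items = responses[scale]
        if len(items) != n_items:
            raise ValueError(f"{scale}: expected {n_items} items, got {len(items)}")
        if any(not 1 <= r <= 5 for r in items):
            raise ValueError(f"{scale}: answers must be on the 1-5 Likert scale")
        scores[scale] = sum(items) / n_items
    return scores


example = {scale: [3] * n for scale, n in AEQ_CLASS_SCALES.items()}
example["hope"] = [4, 4, 5, 3, 4, 4, 4]
print(aeq_scores(example)["hope"])  # 4.0
```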
School Engagement Measure Questionnaire (SEM)
The School Engagement Measure questionnaire consists of 19 items rated on a 5-point Likert scale (Fredricks & McColskey, 2012). The Spanish version was validated by Díaz et al. (2016), who found that 16 items clustered into three engagement subscales: behavioral (4 items; e.g., “I pay attention in class”), emotional (5 items; e.g., “I am interested in the work at school”) and cognitive (7 items; e.g., “When I read a book, I ask myself questions to make sure I understand what it is about”). Cognitive engagement refers to the degree of involvement in school life and the development of complex reasoning skills (Doğan, 2014). Emotional engagement refers to the student’s interest in school, understood as the student’s affective reactions in the classroom, including levels of interest, boredom, unhappiness, happiness and anxiety (Skinner et al., 1990). Finally, behavioral engagement is linked to participation in academic, social or extracurricular activities. The score for each dimension is the mean of the items assigned to that subscale.
The Brief Multidimensional School Adjustment Scale assesses the degree to which the adolescent is integrated into the school environment (Rubia et al., 2010). It consists of 10 items rated on a 6-point Likert scale and divided into 3 dimensions: i) problems of adaptation to the school environment (items 6, 7, 8, 9 and 10), ii) self-perception of school performance (items 1, 2 and 5), and iii) intention to go to university (items 3 and 4). Only the self-perception of school performance and intention to go to university dimensions were used, each computed as the mean of its items. Together, these two dimensions capture a positive, integrated self-concept as a student and the expectation of continuing into higher education (Rubia et al., 2010).
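A sketch of how the two dimensions used here could be scored, assuming answers are coded 1 to 6 and stored by item number (a hypothetical layout). One consequence worth noting: with only two items, the intention to go to university score can take values only in steps of 0.5, so a cut-off such as the 5.25 reported in the results separates scores of 5.0 or lower from scores of 5.5 or higher.

```python
# Item numbers of the two dimensions used in this study (from the text above).
ADJUSTMENT_DIMENSIONS = {
    "self_perceived_performance": (1, 2, 5),
    "intention_university": (3, 4),
}


def adjustment_scores(answers):
    """Mean item score per dimension; `answers` maps item number -> answer 1..6."""
    scores = {}
    for dim, items in ADJUSTMENT_DIMENSIONS.items():
        values = [answers[i] for i in items]
        if any(not 1 <= v <= 6 for v in values):
            raise ValueError(f"{dim}: answers must be on the 1-6 scale")
        scores[dim] = sum(values) / len(values)
    return scores


print(adjustment_scores({1: 3, 2: 4, 3: 6, 4: 5, 5: 5}))

# With two items on a 1-6 scale, the intention score moves in steps of 0.5.
possible_intention = sorted({(a + b) / 2 for a in range(1, 7) for b in range(1, 7)})
print(possible_intention)
```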
Data analysis was performed using MATLAB R2018a (MathWorks Inc., Natick, USA). A classification tree was applied to obtain a prediction model of academic performance (i.e., whether students failed at least one subject), using as input variables SES, BMI percentile, PA, leisure time spent in front of screens, enjoyment, hope, anger, anxiety, boredom, behavioral engagement, emotional engagement, cognitive engagement, self-perceived school performance and intention to go to university.
The classification tree was validated using a hold-out subsample. The 219 available cases were divided into two datasets: training (80 % of cases; n = 176) and validation (20 % of cases; n = 43). No significant differences were found between the training and validation datasets in the study variables. The training dataset was used to build the decision tree, and the validation dataset was used to verify it. A decision (classification) tree divides the sample into two subgroups using an explanatory variable: a cut-off point on that variable is established, so that cases above the threshold form one subgroup and cases below it form the other. This process is repeated within each subgroup until the nodes are pure or a stopping rule prevents further splitting.
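The cut-off search described above can be illustrated with a small, dependency-free sketch for a single explanatory variable. It assumes the deviance criterion is the cross-entropy impurity −Σ p log p (the log base only rescales the criterion and does not change which split wins), tries midpoints between consecutive distinct values, and keeps the threshold with the lowest size-weighted child impurity. `min_parent_size` is a hypothetical parameter standing in for the 10-case rule; this is an illustration, not the MATLAB implementation.

```python
import math
from collections import Counter


def node_deviance(labels):
    """Cross-entropy impurity -sum(p * log(p)) of the class proportions in a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    proportions = [c / n for c in Counter(labels).values()]
    return sum(-p * math.log(p) for p in proportions)


def best_split(values, labels, min_parent_size=10):
    """Best binary cut-off on one variable: cases with x < t go left, x >= t right.

    Returns (threshold, weighted child deviance), or None if the node has fewer
    than `min_parent_size` cases or no valid cut-off exists.
    """
    n = len(values)
    if n < min_parent_size:
        return None
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut-off can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint cut-off
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * node_deviance(left) + len(right) * node_deviance(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


values = [1.0, 1.5, 2.0, 2.2, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
labels = ["suspended"] * 4 + ["approved"] * 6
print(best_split(values, labels))  # cut-off near 2.6 with zero child deviance
```

A full tree applies `best_split` to every input variable, keeps the best variable and cut-off, and recurses into the two child nodes.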
In this study, the CART algorithm, a binary algorithm that divides each group into two subgroups, was used with deviance as the split criterion. In addition, to avoid overfitting (which would reduce external validity), a condition was imposed during training that each node should contain at least 10 cases, limiting the final number of nodes and splits. The classification tree thus divides adolescents according to discriminating variables to classify all participants by whether they passed all subjects (i.e., approved) or not (i.e., suspended).
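A roughly comparable configuration in Python is sketched below with scikit-learn's `DecisionTreeClassifier`, which also builds CART-style binary trees; `criterion="entropy"` corresponds to the deviance (cross-entropy) criterion. The text does not say whether the 10-case rule applies to nodes being split or to terminal nodes (MATLAB's `fitctree` exposes both `MinParentSize` and `MinLeafSize`), so `min_samples_split=10` is an assumption and `min_samples_leaf=10` would be the alternative reading. The data are synthetic; only the 176/43 split mirrors the study, and a plain random split is assumed.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Synthetic stand-in data (NOT the study data): 219 cases x 14 input variables.
X = rng.normal(size=(219, 14))
y = np.where(X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=219) > 0,
             "approved", "suspended")

# 176 training / 43 validation cases, as in the study (random split assumed).
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=43, random_state=0)

tree = DecisionTreeClassifier(
    criterion="entropy",   # cross-entropy impurity, i.e. the "deviance" criterion
    min_samples_split=10,  # assumption: a node needs >= 10 cases to be split
    random_state=0,
)
tree.fit(X_train, y_train)

print(export_text(tree, feature_names=[f"x{i}" for i in range(X.shape[1])]))
print("training accuracy:", tree.score(X_train, y_train))
print("validation accuracy:", tree.score(X_val, y_val))
```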
Once the model was obtained, it was applied to the validation dataset to obtain classification performance measures: the classification accuracy and the predictive values for the suspended and approved classes, computed as described in Eqs. (1), (2) and (3).
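Equations (1) to (3) are not reproduced in this text. Assuming they follow the usual definitions, with “suspended” as the positive class (accuracy = (TP + TN)/N; suspended predictive value = TP/(TP + FP); approved predictive value = TN/(TN + FN)), they can be computed as in this sketch; the function and key names are hypothetical. As a check on the reported figures, the training and validation accuracies correspond to 141 of 176 and 35 of 43 cases being classified correctly.

```python
def classification_metrics(y_true, y_pred, failure="suspended", success="approved"):
    """Accuracy and class-wise predictive values from paired true/predicted labels.

    Assumed definitions, with `failure` as the positive class:
      accuracy                   = (TP + TN) / N
      suspended predictive value = TP / (TP + FP)
      approved predictive value  = TN / (TN + FN)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == failure and p == failure for t, p in pairs)
    tn = sum(t == success and p == success for t, p in pairs)
    fp = sum(t == success and p == failure for t, p in pairs)
    fn = sum(t == failure and p == success for t, p in pairs)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "suspended_pv": tp / (tp + fp) if tp + fp else nan,
        "approved_pv": tn / (tn + fn) if tn + fn else nan,
    }


# Reported accuracies are consistent with 141/176 (training) and 35/43 (validation).
print(round(100 * 141 / 176, 2), round(100 * 35 / 43, 2))  # 80.11 81.4
```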

This section reports the descriptive data of the sample and the results of the decision tree. Descriptive data are shown in Table 2.

The classification tree obtained with the training dataset is shown in Figure 1. Participants in this dataset are divided into subgroups using the input variables until the terminal nodes are reached. There were 9 terminal nodes, each representing a group of participants from the training dataset. These terminal nodes, labeled G1, G2, […], G9 in Figure 1, were used to classify students by academic failure (i.e., suspended) or success (i.e., approved).
As an example, the first terminal node (G1) is explained below to clarify Figure 1. G1 consisted of 21 students, of whom 20 had failed at least one subject and 1 had passed all subjects. It is therefore considered an academic failure node, with a grouping accuracy in the training dataset of about 95 % (i.e., 20 divided by 21). Since the 21 students in this node had an intention to go to university score lower than 5.25 points and a self-perceived school performance score lower than 2.5 points, future students with these characteristics would be at high risk of school failure and could be targeted for preventive action.
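The node-level figure is simple arithmetic; a sketch with a hypothetical helper (assigning ties to “suspended” is an assumption):

```python
def node_summary(n_suspended, n_approved):
    """Majority class of a terminal node and the share of its cases in that class."""
    total = n_suspended + n_approved
    label = "suspended" if n_suspended >= n_approved else "approved"  # ties -> suspended
    return label, max(n_suspended, n_approved) / total


label, share = node_summary(20, 1)  # G1 in the training data
print(label, f"{share:.1%}")  # suspended 95.2%
```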

Table 3 shows the characteristics of each terminal node. The first column indicates whether each terminal node classifies students as academic successes or failures, based on the proportion of training-set participants in that node who passed or failed. The remaining columns show the values of the variables that define each terminal node.

Table 4 reports the performance of the decision tree in classifying students from the training and validation datasets. The decision tree performed well in both the training (80.11 % accuracy) and validation (81.40 % accuracy) datasets.

The purpose of this study was to develop an early detection system of academic failure by means of decision tree analysis, using as potential input variables students’ emotions during class attendance, school engagement, intention to go to university, self-perceived school performance, socioeconomic status, BMI percentile, PA and leisure time spent in sedentary screen-based activities (e.g., watching TV). The decision tree showed good classification accuracy, which suggests that the system could be implemented in schools as an easy way of detecting adolescents at risk of failing a subject. The system only requires students to report their weight and height and to complete the PAQ-A (8 items), the AEQ items on anger (7 items) and hope (7 items) during class attendance, and the items on intention to go to university (2 items) and self-perceived school performance (3 items) from the Brief Multidimensional School Adjustment Scale. Therefore, although all the input variables used in this study are important factors in academic performance, not all of them are necessary to adequately predict school failure.
School failure is determined by the difficulties students encounter in reaching educational goals (Eisenberg et al., 2006; Enguita et al., 2010). Because it is influenced by multiple factors, it is essential to know which factors contribute more or less to academic failure (Yu et al., 2018) in order to develop early detection systems. To the authors’ knowledge, several studies have focused on predicting academic failure (Casillas et al., 2012; Respondek et al., 2017; Yu et al., 2018). Casillas et al. (2012) used variables such as academic achievement, psychosocial characteristics and behavioral indicators, and Respondek et al. (2017) used perceived academic control and emotions (e.g., enjoyment, boredom and anxiety). Nevertheless, neither PA nor screen use during sedentary leisure activities has been taken into account, despite the relationships found between these variables and academic performance (Morales, Pellicer-Chenoll, et al., 2011; Peiró-Velert et al., 2014).
By including all these variables in the analysis, our study obtained a decision tree with high accuracy (80.11 % in the training data and 81.40 % in the validation data). These results support previous studies such as that of Zhang et al. (2018), who observed that the best methods for predicting academic performance (five categories) were the decision tree and the multilayer perceptron (accuracy of 57.41 % and 62.04 % in the validation dataset, respectively), compared with naive Bayes and support vector machines (35.65 % and 48.61 %, respectively). Vairachilai (2020) concluded that naive Bayes and the decision tree had higher accuracy (77 % and 71 %, respectively) than the support vector machine (38 %). Finally, Ashraf et al. (2018) found the decision tree to be the most accurate predictive analysis (97.3 % accuracy). Note, however, that Ashraf et al. did not apply a cross-validation procedure, so their very high accuracy could indicate an overtrained model. Overall, the decision tree appears to be a suitable tool for predicting academic performance, and the performance parameters of our study are similar to or higher than those reported in other studies with a similar classification purpose.
The decision tree obtained in this study describes the characteristics of the groups of students who passed all subjects and of those who failed at least one. The results yielded nine groups (terminal nodes of the decision tree): six represent students who failed at least one subject, and the other three represent students who passed all subjects.
Thus, there are six combinations of characteristics that lead to academic failure in adolescents. For example, students allocated to G1 (Figure 1) did not have a high intention to go to university and showed low self-perceived school performance, resulting in a high probability (around 95 %) of failing some of the subjects of the course. More interestingly, on the left branch of the tree, students with an intention to go to university lower than 5.25 need a relatively high self-perceived school performance score (≥ 2.5), at least a minimum level of anger (≥ 1.88) during class attendance, and a moderate level of PA (from 1.88 to 2.42 points) to reach academic success (i.e., pass all subjects). Students with an intention to go to university lower than 5.25 who present any other combination of these variables have a high probability of failing a subject (i.e., G1, G2, G3 and G5 in Figure 1). This finding is highly relevant for designing strategies to help these students, who have only a moderate desire to go to university, pass their high school assessments. For example, if a student allocated to G3 is detected, it could be useful to promote an active lifestyle with the aim of moderately increasing the student’s PA, in the hope of moving him or her from G3 (failure) to G4 (success).
As this decision tree shows, the intention to go to university is a variable of great importance: it defines the first split of the tree, so its value determines which of the remaining factors must be considered for each student. In contrast, a previous study did not find it to be an effective predictor (Fernandez-Lasarte et al., 2019). This may be because that study did not use non-linear analyses, which modulate the weight of each predictor by relating it to the other variables.
These results are in line with previous studies. On the one hand, it may seem contradictory that a very low level of anger is a key characteristic of school failure in some students (i.e., G2), since multiple studies report that a high level of anger is detrimental to academic performance (Pekrun, 2006; Pekrun et al., 2011). Nevertheless, studies such as those by Pekrun et al. (2011) and Lane et al. (2005) concluded that a certain amount of anger can have positive effects on academic performance, whereas both very high and very low levels of anger are detrimental to students’ academic achievement. This could be explained by students’ emotional intelligence: Parker et al. (2004) suggested that individuals with high emotional intelligence are aware of the positive effects of anger on academic performance and are able to regulate their mood to reach appropriate states and achieve academic success. Therefore, if a student with a very low level of anger (i.e., allocated to G2) is detected, it would be useful to assess his or her emotional intelligence skills, so that he or she can learn to control emotions such as boredom (Pekrun et al., 2010), hopelessness (Titz, 2001) and anger, which, without proper control, can destabilize the student’s emotional state; this could help prevent school failure.
On the other hand, the results for Groups 3, 4 and 5 corroborate that students need a moderately active life in order to be academically successful. G3 (failure) reaffirms that lack of PA is associated with school failure (Pellicer-Chenoll et al., 2015) and, conversely, G4 (success) corroborates that moderate PA is positively related to academic success (Morales, González, et al., 2011; Morales, Pellicer-Chenoll, et al., 2011; Pellicer-Chenoll et al., 2015). These results can be explained by neurophysiological mechanisms (Hillman et al., 2005; Tomporowski et al., 2008; van Praag, 2009), since physical exercise improves blood flow to the brain and thereby cognitive function, or by psychosocial mechanisms (Sallis et al., 1999; Sigfúsdóttir et al., 2007), since PA is positively associated with mental health, self-esteem, emotional well-being and self-concept, which may positively influence academic performance (Pellicer-Chenoll et al., 2015). Even so, the relationship between PA and academic performance is not entirely linear: Morales, Pellicer-Chenoll, et al. (2011) already suggested that high levels of PA do not lead to improvements in academic performance, which would support the results found in G5 (failure). In other words, the relationship between PA and school performance could be non-linear, with moderate levels being those most closely linked to school achievement (i.e., a quadratic function).
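One generic way to write the quadratic relationship suggested in the preceding paragraph is shown below, with x as the PA score and ŷ as predicted academic performance. The coefficients are purely symbolic: no model of this form was fitted in the present study, and it is included only to show why a negative quadratic term places the optimum at a moderate level of PA.

```latex
\[
\hat{y} = \beta_0 + \beta_1 x + \beta_2 x^2, \qquad \beta_2 < 0
\]
\[
\frac{d\hat{y}}{dx} = \beta_1 + 2\beta_2 x = 0
\;\Longrightarrow\;
x^{*} = -\frac{\beta_1}{2\beta_2}
\]
```

Because the second derivative 2β₂ is negative, x* is a maximum; with β₁ > 0 it lies at a positive PA value, and predicted performance declines on either side of it, which is the pattern of G3 (failure), G4 (success) and G5 (failure).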
Considering the right branch of the tree, there are two combinations of characteristics that lead to academic success: i) an intention to go to university equal to or higher than 5.25 and a level of hope during class attendance of at least 3.79 points; and ii) an intention to go to university equal to or higher than 5.25, low-to-moderate levels of hope during class (i.e., < 3.79), a BMI percentile lower than 82.3 and high self-perceived school performance (i.e., ≥ 3.5). A good way of helping students who are motivated to go to university is to ensure that they experience hope during class attendance (i.e., help these students to be allocated in G9), which could substantially improve their chances of academic success. These results are in line with other studies, which observed a positive correlation between hope and academic performance and proposed applying interventions to increase academic hope because of its benefits (Feldman & Kubota, 2015).
However, if hope cannot be raised to high levels in some students, a healthy lifestyle should be promoted to control the BMI percentile and prevent students from becoming overweight (i.e., 85th to < 95th percentile) or obese (≥ 95th percentile) (Kuczmarski et al., 2000), so that they do not fail academically (i.e., G8). These results are in line with previous studies reporting that a higher BMI percentile can lead to disengagement or lack of commitment to academic work (Finn et al., 2018; Peterson et al., 2012). Students who are overweight or obese may have lower productivity due to health problems, which may be related to social, psychological and affective issues (Shaw et al., 2015). Since childhood overweight and obesity are related to PA, diet and sedentary lifestyle, students with healthy lifestyles could have a lower BMI and probably achieve better academic performance (Pellicer-Chenoll et al., 2015). The results of previous studies are therefore corroborated, supporting the idea that obesity is an important factor in determining academic performance and, consequently, that healthy lifestyles are also important. However, although healthy habits are important for academic success, the results of this study indicate that the educational system should also help students develop a moderate-to-high self-perceived school performance in order to achieve academic success (i.e., G7) and avoid failure (i.e., G6), as shown in both branches of the tree. Therefore, when a student does not show both a strong intention to go to university and high levels of hope, teachers should place greater emphasis on promoting moderate-to-high self-perceived performance in order to avoid academic failure. One way to improve self-perceived school performance could be to adjust the difficulty of classroom tasks to the students’ level, looking for tasks that are challenging but at the same time accessible and achievable.
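Similarly, the right-branch rules (intention to go to university of 5.25 or higher) can be sketched as a decision function. Again, this is a hypothetical illustration under stated assumptions: the function name is invented, and the split order (hope, then BMI percentile, then self-perceived performance) and the assignment of G8 to BMI percentiles of 82.3 or higher regardless of self-perceived performance are inferred from the text rather than taken from the published tree.

```python
def right_branch_group(hope: float, bmi_percentile: float,
                       self_perceived: float) -> str:
    """Hypothetical terminal-group lookup for students whose intention to go
    to university is 5.25 or higher (right branch). Split order is assumed."""
    if hope >= 3.79:
        return "G9"  # high hope during class attendance -> success
    if bmi_percentile >= 82.3:
        return "G8"  # assumed: higher BMI percentile -> failure
    if self_perceived >= 3.5:
        return "G7"  # high self-perceived performance -> success
    return "G6"      # lower self-perceived performance -> failure


# Example: low-to-moderate hope, BMI percentile below 82.3, low self-perception.
print(right_branch_group(hope=3.0, bmi_percentile=60.0, self_perceived=3.0))  # G6
```

Raising `self_perceived` to 3.5 in this example returns "G7", which corresponds to the strategy of promoting self-perceived school performance discussed in the text.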
This would reduce stress and help maintain psychological and emotional health (Solberg et al., 1998). Furthermore, this factor is even more relevant in light of previous studies, which found that a decrease in self-perceived school performance leads to a decrease in academic expectations (García-Escalera et al., 2020). This suggests, for example, that if the self-perceived school performance of a student in G6 is not promoted, he or she could move to G1, increasing the probability of academic failure (see the ratio of failures and passes in these groups in Figure 1). In fact, Godoy et al. (2013) concluded that a negative academic self-perception can be a determining factor in academic performance and highlighted the need to work on students’ self-perception at school, because this could improve academic performance (Moreira et al., 2016).
As shown in the preceding paragraphs, knowing the characteristics of these groups is useful not only for understanding the nature of academic failure, but also for implementing strategies to help students succeed in school. The decision tree thus has two main practical applications: detecting students at risk of academic failure and providing individualized guidance for designing strategies to avoid failure.
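The two applications can be illustrated with a minimal lookup that flags the failure groups and attaches the strategies discussed in this section. This is a hypothetical sketch: the dictionary, the function name and the strategy wording are illustrative paraphrases, not a tool provided by the authors. The G1 entry is extrapolated from the statement that self-perceived performance matters in both branches, and G5 is left without a strategy because none is proposed in the text.

```python
from typing import Optional, Tuple

SUCCESS_GROUPS = frozenset({"G4", "G7", "G9"})

# Hypothetical paraphrases of the strategies discussed for each failure group.
FAILURE_GROUP_STRATEGIES = {
    "G1": "Promote self-perceived school performance (challenging but achievable tasks).",
    "G2": "Assess emotional intelligence and support regulation of emotions such as anger.",
    "G3": "Promote an active lifestyle to moderately increase physical activity.",
    "G5": None,  # no specific strategy is proposed for this group in the text
    "G6": "Promote self-perceived school performance (challenging but achievable tasks).",
    "G8": "Promote a healthy lifestyle to control the BMI percentile.",
}


def screen(group: str) -> Tuple[bool, Optional[str]]:
    """Return (at_risk, suggested_strategy) for a terminal group G1 to G9."""
    if group in SUCCESS_GROUPS:
        return False, None
    if group in FAILURE_GROUP_STRATEGIES:
        return True, FAILURE_GROUP_STRATEGIES[group]
    raise ValueError(f"unknown group: {group!r}")


print(screen("G3"))
```

Keeping the risk flag separate from the strategy text means a group such as G5 is still reported as at risk even though no specific intervention is attached to it.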
A strength of this study was its use of a decision tree approach to test multifactorial combinations including variables related to active lifestyles. However, it was carried out in only one city, so its findings should be corroborated and extended to other locations with larger samples to increase the generalizability of the results. Another limitation is the small number of input variables, since other relevant variables such as dietary habits or school social relationships could improve the performance of the decision tree in classifying students according to their academic results.
The predictive model of academic achievement presented in this study has practical applications for the education system, as it could be used as a tool to detect school failure. This tool would help education professionals (i.e., teachers and specialists) guide their interventions to improve students’ situation. To implement it properly, school management should make these questionnaires available, together with an explanation of their use. In addition, to further improve usability, policy-makers could create a training course or a web application that explains the procedure step by step and provides feedback to address each detected case of possible academic failure. In any case, a detailed explanation of how to apply this system with the tools currently available is presented below. These steps should only be followed for students whose grades are low or whom the teacher or professional considers may need them:
How to reference this article: Villarrasa-Sapiña, I., García-Massó, X., Liébana, E., & Monfort Torres, G. (2024). Academic achievement prediction in secondary school by decision tree analysis. Educación XX1, 27(1), 253-279. https://doi.org/10.5944/educxx1.33351




