Classification of Toddler Nutrition Using C4.5 Decision Tree Method

Nutrition is essential for the growth of toddlers. It is very important to give babies a balanced nutritional intake at the right stage so that they grow healthy and become accustomed to a healthy lifestyle in the future. Children under five years of age are a group that is vulnerable to health and nutrition problems. Nutritional status can be determined systematically using the C4.5 decision tree classification method with several input variables or attributes. The dataset consists of records of 853 toddlers. Classification is carried out to determine nutritional status in the weight-for-age (BB/U), height-for-age (TB/U) and weight-for-height (BB/TB) categories. The attributes used for BB/U are gender, weight and age; for TB/U, gender, body length or height, and age; and for BB/TB, gender, weight, body length or height, and age. The average accuracy is 90.16% for the BB/U category, 76.64% for the TB/U category, and 83.83% for the BB/TB category.


Introduction
Nutrients are organic substances required for the normal functioning of the body's systems, growth and health maintenance. It is very important to give babies a balanced nutritional intake at the right stage so that they grow healthy and become accustomed to a healthy lifestyle in the future. Children under five years of age are vulnerable to health and nutrition problems, so the toddler years are an important period of growth that needs serious attention [1]. According to the 2018 Basic Health Research of the Ministry of Health, 17.7% of children under five (toddlers) still experience nutritional problems, consisting of 3.9% with severe malnutrition and 13.8% with undernutrition [2]. The nutritional status of toddlers can be measured anthropometrically. The anthropometric indices most often used are weight for age (BB/U), height for age (TB/U) and weight for height (BB/TB). The weight-for-age index (BB/U) is the most commonly used indicator because it is easy and quick for the general public to understand. The reference standard for determining nutritional status by anthropometry is based on the Decree of the Minister of Health No. 920/Menkes/SK/VIII/2002, which prescribes the "World Health Organization-National Center for Health Statistics" (WHO-NCHS) reference and assessment by Z-score.
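The Z-score comparison can be sketched in Python as below. This is a minimal illustration, not the procedure from the Decree: it assumes the commonly used simplified formula in which the observed value is compared with the reference median and divided by the distance from the median to the reference -1 SD or +1 SD value (depending on which side the observation falls), and it assumes BB/U cut-offs of +2, -2 and -3 SD mapped onto the Best, Good, Bad and Worst labels. The reference values in the example are hypothetical placeholders, not WHO-NCHS table entries.

```python
def z_score(observed: float, median: float, minus_1sd: float, plus_1sd: float) -> float:
    """Simplified anthropometric Z-score against one row of a reference table.

    The spread is the distance from the median to the -1 SD value when the
    observation lies below the median, and to the +1 SD value otherwise.
    """
    spread = plus_1sd - median if observed >= median else median - minus_1sd
    return (observed - median) / spread


def bbu_label(z: float) -> str:
    """Map a weight-for-age Z-score to a label (ASSUMED cut-offs, for illustration)."""
    if z > 2:
        return "Best"
    if z >= -2:
        return "Good"
    if z >= -3:
        return "Bad"
    return "Worst"


# Hypothetical reference row: median 10 kg, -1 SD at 9 kg, +1 SD at 11 kg.
for weight in (13.0, 10.0, 7.0, 6.5):
    z = z_score(weight, median=10.0, minus_1sd=9.0, plus_1sd=11.0)
    print(weight, z, bbu_label(z))
```

Using a separate spread on each side of the median reflects the fact that reference distributions of weight are usually not symmetric.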
Nutritional status has so far been determined manually at community health centers, so patients have to come to the community health center in person. This is inconvenient, especially under the current pandemic conditions. Nutritional status can instead be determined automatically using a classification approach, for example the C4.5 decision tree method. C4.5 is an algorithm that builds a decision tree, a predictive model with a tree (hierarchical) structure. The idea of a decision tree is to transform the data into a tree whose paths form decision rules.
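To make the idea of "a tree whose paths form decision rules" concrete, the Python sketch below represents a small binary tree, classifies a record by following the attribute tests from the root to a leaf, and turns every root-to-leaf path into an IF ... THEN rule. The tree, its thresholds and the attribute names are hypothetical; for simplicity the sketch only uses binary tests on numeric attributes, whereas C4.5 can also split a categorical attribute into one branch per value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A node of a binary decision tree over numeric attributes."""
    label: Optional[str] = None      # class label; set only on leaf nodes
    attribute: str = ""              # attribute tested at an internal node
    threshold: float = 0.0           # go left if value <= threshold, else right
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(node: Node, record: dict) -> str:
    """Follow the attribute tests from the root down to a leaf."""
    while node.label is None:
        node = node.left if record[node.attribute] <= node.threshold else node.right
    return node.label


def rules(node: Node, conditions: tuple = ()) -> list:
    """Turn every root-to-leaf path into one IF ... THEN ... rule."""
    if node.label is not None:
        return [f"IF {' AND '.join(conditions) or 'TRUE'} THEN {node.label}"]
    a, t = node.attribute, node.threshold
    return (rules(node.left, conditions + (f"{a} <= {t}",))
            + rules(node.right, conditions + (f"{a} > {t}",)))


# Hypothetical tree, not the tree obtained in this study.
tree = Node(attribute="weight", threshold=8.0,
            left=Node(attribute="age", threshold=12,
                      left=Node(label="Good"), right=Node(label="Bad")),
            right=Node(label="Good"))

print(predict(tree, {"weight": 7.5, "age": 20}))
for rule in rules(tree):
    print(rule)
```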
In previous research comparing the performance of the C4.5 and Naïve Bayes algorithms, Harry Budi Santoso stated that C4.5 performs better than Naïve Bayes, with an accuracy of 96.4% for C4.5 compared with 95.11% for Naïve Bayes [3].
Research [4] on the classification of typhoid fever (TF) and dengue hemorrhagic fever (DHF) using the C4.5 decision tree algorithm concluded that, with k-fold cross-validation, the highest average accuracy was 91.875%, obtained with 32 testing data and 128 training data.
Based on the description above, this study applies the C4.5 decision tree method to determine the nutritional status of children under five. Applying the C4.5 decision tree method is expected to help classify the nutritional status of toddlers and thus support monitoring of their growth.

Methodology
The methodology used in this study is shown in Figure 1. The research began by preparing the dataset, which then went through data cleaning followed by data selection. Next, the data are divided into training data and testing data: the training data are used to build the decision tree, while the testing data are used to evaluate the resulting system. The following subsections explain each stage in detail.

Dataset
The dataset used in this study is the monitoring data on the nutritional status of toddlers obtained from the Kebong Health Center, Kelam Permai District, Sintang. The BB/U category has 4 classification labels, namely Best, Good, Bad and Worst. The TB/U category has 4 classification labels, namely High, Normal, Short and Very Short, while the BB/TB category has 4 classification labels, namely Fat, Normal, Thin and Very Thin (Table 1).

International Journal of Applied Sciences and Smart Technologies

Data Cleaning
Data cleaning is the process of removing unusable data [5]. In this study, some records were deleted because they were incomplete, for example records with no BB/TB label, no PB/TB value, or no TB/PB conversion value.
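A minimal sketch of this cleaning step in Python, assuming each record is a dictionary: a record is kept only if every required field is present and non-empty. The field names and values below are hypothetical stand-ins for the BB/TB label, the PB/TB value and the TB/PB conversion value mentioned above.

```python
def clean(records: list, required: list) -> list:
    """Keep only records in which every required field is present and non-empty."""
    return [r for r in records
            if all(r.get(field) not in (None, "") for field in required)]


# Hypothetical field names for the BB/TB label, PB/TB value and TB/PB conversion value.
REQUIRED = ["bb_tb_label", "pb_tb", "tb_pb_conversion"]

records = [
    {"bb_tb_label": "Normal", "pb_tb": 75.0, "tb_pb_conversion": 74.3},
    {"bb_tb_label": "", "pb_tb": 80.0, "tb_pb_conversion": 79.3},      # no BB/TB label
    {"bb_tb_label": "Thin", "pb_tb": None, "tb_pb_conversion": None},  # no PB/TB value
]
print(len(clean(records, REQUIRED)))
```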

Data Selection
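In the data selection stage, the attributes used for each category are selected: gender, weight and age for BB/U; gender, body length or height, and age for TB/U; and gender, weight, body length or height, and age for BB/TB (Table 2).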
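The attribute selection per category can be sketched as a simple projection in Python. The attribute lists follow those given in the abstract; the dictionary keys (with "height" standing for body length or height) and the sample record are hypothetical.

```python
# Attributes per category, as listed in the abstract; "height" stands for
# body length or height (hypothetical key names).
CATEGORY_ATTRIBUTES = {
    "BB/U": ["gender", "weight", "age"],
    "TB/U": ["gender", "height", "age"],
    "BB/TB": ["gender", "weight", "height", "age"],
}


def select(records: list, category: str):
    """Return (features, labels) restricted to the attributes of one category."""
    attributes = CATEGORY_ATTRIBUTES[category]
    features = [{a: r[a] for a in attributes} for r in records]
    labels = [r[category] for r in records]
    return features, labels


# Hypothetical record carrying all attributes and all three category labels.
record = {"gender": "F", "weight": 9.1, "height": 76.0, "age": 14,
          "BB/U": "Good", "TB/U": "Normal", "BB/TB": "Normal"}
X, y = select([record], "TB/U")
print(X, y)
```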

Dividing the Dataset
The dataset is divided into training data and testing data using k-fold cross-validation. The value of k is chosen by the user; the values used are 3, 5, 7 and 9. For k = 3, the data are divided into 3 parts, of which 2 are used as training data and 1 as testing data, with each part serving as the testing data once; the division for k = 5, 7 and 9 works the same way.
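The k-fold division can be sketched in Python as follows. The text does not say whether the records are shuffled before splitting, so the shuffle (with a fixed, hypothetical seed for reproducibility) is an assumption; no stratification by label is applied.

```python
import random


def k_fold_indices(n: int, k: int, seed: int = 0):
    """Yield (train, test) index lists for k-fold cross-validation.

    The n indices are shuffled once and dealt into k nearly equal parts;
    each part serves as the testing data exactly once and the remaining
    k - 1 parts form the training data.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


for train, test in k_fold_indices(9, 3):
    print(len(train), len(test))
```

With 9 records and k = 3, every fold has 6 training and 3 testing records, matching the 3-fold example used later in this section.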

Modeling C4.5 Decision Tree
Each fold is modeled using the C4.5 decision tree method, so that there are k models, one for each fold. The C4.5 decision tree method classifies the data by computing the Entropy, Information Gain, Split Info and Gain Ratio. Tree formation begins by selecting the attribute with the highest Gain Ratio as the root node; the remaining nodes are then formed recursively in the same way until the decision tree is complete [6].
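The recursive tree formation can be sketched in Python as below. This is a simplified C4.5-style builder, not a full C4.5 implementation: it handles only categorical attributes (one branch per value), picks the attribute with the highest gain ratio at each node, and stops when a node is pure, no attributes remain, or no attribute has a positive gain ratio. Pruning, missing-value handling and numeric thresholds are omitted. The toy records, with weight already banded into categories, are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of a multiway split on a categorical attribute."""
    n = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def build(rows, labels, attrs):
    """Recursively grow a tree: a leaf is a label, a node is (attr, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    scores = {a: gain_ratio(rows, labels, a) for a in attrs}
    best = max(scores, key=scores.get)
    if scores[best] <= 0:
        return majority                      # no attribute improves the split
    children = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return best, children


def classify(tree, row, default=None):
    """Walk from the root to a leaf; return default for unseen attribute values."""
    while isinstance(tree, tuple):
        attr, children = tree
        if row[attr] not in children:
            return default
        tree = children[row[attr]]
    return tree


# Hypothetical toy records with banded (categorical) weight and age.
rows = [
    {"gender": "M", "weight": "low", "age": "young"},
    {"gender": "F", "weight": "low", "age": "old"},
    {"gender": "M", "weight": "normal", "age": "old"},
    {"gender": "F", "weight": "normal", "age": "young"},
]
labels = ["Bad", "Bad", "Good", "Good"]
tree = build(rows, labels, ["gender", "weight", "age"])
print(tree)
```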
The following is an example of the tree formation steps:
1. Prepare the data that will be used to form the C4.5 decision tree model. In this example, 9 records of children under five are used for the classification of the BB/U category, with the attributes listed in Table 3.
2. Separate the data into training data (Table 4) and testing data (Table 5) using 3 folds.
3. Calculate the entropy using formula (1), the information gain using formula (2), the split info using formula (3), and the gain ratio using formula (4) for each attribute.

The entropy is formulated as

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{1}$$

where $S$ is the set of cases, $n$ is the number of partitions (classes) of $S$, and $p_i$ is the proportion of $S_i$ to $S$. The gain is formulated as

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \tag{2}$$

where $S$ is the sample, $A$ is the attribute, $n$ is the number of partitions of attribute $A$, $|S_i|$ is the number of samples in partition $i$, and $|S|$ is the number of samples in $S$. The split info is formulated as

$$\mathrm{SplitInfo}(S, A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \frac{|S_i|}{|S|} \tag{3}$$

where $S_1, \dots, S_c$ are the $c$ subsets obtained by splitting $S$ on attribute $A$, which has $c$ values. Finally, the gain ratio is

$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)} \tag{4}$$
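Formulas (1) to (4) translate directly into Python, as sketched below with one function per formula. The attribute and the 8 labelled records are hypothetical and only serve to show the arithmetic; they are not the data of Tables 3 to 5.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Formula (1): -sum_i p_i log2 p_i over the classes in S."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def partitions(values, labels):
    """Split the labels of S into subsets S_i, one per attribute value."""
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    return list(parts.values())


def gain(values, labels):
    """Formula (2): Entropy(S) - sum_i |S_i|/|S| * Entropy(S_i)."""
    n = len(labels)
    return entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions(values, labels))


def split_info(values, labels):
    """Formula (3): -sum_i |S_i|/|S| * log2(|S_i|/|S|)."""
    n = len(labels)
    return -sum(len(p) / n * log2(len(p) / n) for p in partitions(values, labels))


def gain_ratio(values, labels):
    """Formula (4): Gain / SplitInfo (0 when the attribute has a single value)."""
    si = split_info(values, labels)
    return gain(values, labels) / si if si > 0 else 0.0


# Hypothetical attribute with two values and 8 labelled records.
gender = ["M", "M", "M", "M", "F", "F", "F", "F"]
status = ["Good", "Good", "Good", "Bad", "Bad", "Bad", "Bad", "Good"]
print(entropy(status), gain(gender, status), split_info(gender, status), gain_ratio(gender, status))
```

Here the class entropy is 1 (four Good, four Bad), each gender subset has entropy of about 0.811, so the gain is about 0.189; because the split info is exactly 1, the gain ratio equals the gain.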
Next, the root node candidates are found by taking, for each attribute, the split with the highest information gain. The root node is then the candidate with the highest gain ratio. The highest gain ratio is obtained for the weight attribute with a split value of 4.6, so weight becomes the root node of the tree, split at 4.6. The decision tree formed from this calculation is shown in Figure 2 (Table 6).
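The two-stage selection described above (best split per attribute by information gain, then the root by gain ratio) can be sketched in Python for numeric attributes such as weight. This sketch tries each observed value as a threshold for the binary test value <= threshold; it is an illustration under that assumption, and the records below are hypothetical, so the resulting split value does not reproduce the 4.6 found in this study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def split_scores(values, labels, threshold):
    """Information gain and gain ratio of the binary test value <= threshold."""
    n = len(labels)
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    parts = [p for p in (left, right) if p]
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts)
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in parts)
    return gain, (gain / split_info if split_info > 0 else 0.0)


def candidate(values, labels):
    """Best threshold of one numeric attribute by information gain, with its gain ratio."""
    thresholds = sorted(set(values))[:-1]    # the largest value would leave one side empty
    if not thresholds:
        return None, 0.0
    best = max(thresholds, key=lambda t: split_scores(values, labels, t)[0])
    return best, split_scores(values, labels, best)[1]


def choose_root(data, labels):
    """Among the per-attribute candidates, pick the one with the highest gain ratio."""
    candidates = {attr: candidate(values, labels) for attr, values in data.items()}
    attr = max(candidates, key=lambda a: candidates[a][1])
    return attr, candidates[attr][0]


# Hypothetical records (not the data of Table 3).
data = {
    "weight": [6.1, 6.8, 7.4, 8.0, 8.5, 9.2],
    "age": [1, 2, 6, 3, 5, 4],
}
labels = ["Bad", "Bad", "Bad", "Good", "Good", "Good"]
print(choose_root(data, labels))
```

In this toy data, splitting weight at 7.4 separates the classes perfectly (gain ratio 1), while the best age split only reaches a gain ratio of about 0.5, so weight is chosen as the root.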

Evaluation
Several experiments were carried out to evaluate the system. In each experiment the data were divided into 3, 5, 7 and 9 folds, and each experiment was carried out for each category, namely BB/U, TB/U and BB/TB. The experiments are shown in Table 7. The BB/U category trial showed an average accuracy of 90.16% (Table 8), which shows that the system classifies the BB/U category well. The TB/U category trial showed an average accuracy of 76.64%, with the highest accuracy at 7 folds (Table 9). The BB/TB category trial showed an average accuracy of 83.83%. Based on these results, the C4.5 decision tree classifies the BB/U, TB/U and BB/TB categories reasonably well using the selected attributes, although a minority of cases are not classified correctly.
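One plausible way to compute the reported averages is sketched below in Python: the accuracy of each fold is the fraction of testing records whose predicted label matches the true label, and the average accuracy is the mean over folds. Whether the averages in Tables 8 and 9 are taken over folds or over the 3-, 5-, 7- and 9-fold experiments is an assumption here; the same mean can be applied to either list. The predictions are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of test records whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_accuracy(fold_results):
    """Mean of the per-fold accuracies; fold_results holds one (y_true, y_pred) pair per fold."""
    scores = [accuracy(t, p) for t, p in fold_results]
    return sum(scores) / len(scores), scores


# Hypothetical predictions for a 3-fold run.
folds = [
    (["Good", "Bad"], ["Good", "Bad"]),                                 # 2/2 correct
    (["Good", "Worst"], ["Good", "Bad"]),                               # 1/2 correct
    (["Best", "Good", "Good", "Bad"], ["Best", "Good", "Bad", "Bad"]),  # 3/4 correct
]
mean, per_fold = average_accuracy(folds)
print(per_fold, mean)
```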

Conclusion
Based on the results of the nutritional classification of children under five using the C4.5 decision tree method, the following conclusions can be drawn:
1. The C4.5 decision tree classification method can be used to classify the nutrition of toddlers quite well.
2. The average accuracy for each category is as follows:
a. The BB/U category classification has an average accuracy of 90.16%.
b. The TB/U category classification has an average accuracy of 76.64%.
c. The BB/TB category classification has an average accuracy of 83.83%.