Classification of Lung and Colon Cancer Histopathological Images Using Convolutional Neural Network (CNN) Method on a Pre-Trained Models

Cancer is a severe illness that can affect both young and older people. In Indonesia, lung cancer is the leading cause of cancer-related death, whereas colon cancer, with more than 1.8 million cases worldwide in 2018, is the third most common cancer. This study aims to build a model that classifies histopathological images of lung and colon cancer into five labels to support medical professionals in the classification task. The study applies the Convolutional Neural Network (CNN) method with a pre-trained model, VGG19. The dataset consists of 25,000 histopathological images, split into 80% training data and 20% testing data. The classification system for lung and colon cancer contains five categories: lung benign tissue, lung adenocarcinoma, lung squamous cell carcinoma, colon adenocarcinoma, and colon benign tissue. Training yielded an accuracy of 99.96% and a loss of 1.5%. Based on these results, the model can be rated as excellent.


Introduction
Cancer is a dangerous condition that can affect both young and older people. Cancer cells have abnormal characteristics that enable them to attack other cells or organs without the affected person noticing. Estimates of cancer incidence and mortality by sex and for 18 age groups have been produced for 2020 for the 185 countries or regions with a population of more than 150,000 in that year. Lung cancer results when the cells lining the lung airways divide improperly and uncontrollably to generate abnormal tissue. Lung cancer is the most common cause of cancer death [1].

International Journal of Applied Sciences and Smart Technologies
Volume 5, Issue 1, pages 133-142, p-ISSN 2655-8564, e-ISSN 2685-9432

Lung cancer is the leading cause of mortality from cancer in Indonesia [2]. In contrast, colorectal cancer, commonly referred to as colon cancer, develops in the colon or rectum. The colon (large intestine) and rectum are components of the digestive system that contribute to the production of energy and the elimination of waste. According to statistics from the American Institute for Cancer Research, colon cancer is the third most prevalent cancer worldwide. There were almost 1.8 million cases in 2018 [3]. In addition to diet, lack of fibre, smoking, and alcohol use, age is the most significant risk factor for colon cancer. Symptoms of colorectal cancer include changes in bowel habits, stomach pain, blood in the stool, anaemia, fatigue, loss of appetite, and weight loss.
Artificial intelligence (AI) technology is used in the medical industry as a decision-support tool for identifying diseases and helps speed up image analysis. Computer-aided diagnostics can analyze medical images [3]. Several earlier studies have used the Convolutional Neural Network (CNN) method to demonstrate that cancer can be classified using AI technology, and the resulting model accuracy is good. Compared to manual evaluation by medical experts, AI technology classifies lung cancer images more quickly. Modelling the lung cancer classification system takes two hours of computation [4]. In comparison, an examination by medical staff takes 10-14 days to identify lung cancer.
Several pre-trained CNN architectures are available. A few examples of these architectures are LeNet, AlexNet, GoogLeNet, ConvNet, and ResNet. In classifying biomedical images, the AlexNet architecture has been reported to reach an accuracy of 90% [5]. By comparison, research using a biomedical dataset (diagnosis of colonic adenocarcinoma) attained an accuracy of 93% with the ResNet architecture [6].
A pre-trained model is used in this study because such models perform well for classification.

Methods
The convolutional neural network (CNN) approach for classification in this study combines three convolutional layers and two fully connected layers.
Additionally, it takes advantage of the pre-trained VGG architecture through transfer learning.
Transfer learning is an approach that makes use of existing network architectures. There is no need to start from scratch because the CNN architecture used for transfer learning has already learned from prior data. The choice of this architecture affects the classification outcomes.

Convolutional Neural Network
Typical CNN architectures stack a few convolutional layers (+ReLU), then a pooling layer, then a few more convolutional layers (+ReLU), then another pooling layer, and so on. The image gets smaller and smaller as it moves through the network, but it also usually gets deeper and deeper (i.e., with more feature maps) because of the convolutional layers. At the top of the stack, a standard feedforward neural network made up of a few fully connected layers and ReLUs is added, and the final layer (for example, a softmax layer that outputs estimated class probabilities) outputs the prediction [7].
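The shrinking image size and growing depth described above can be traced with a short sketch. It is only an illustration: the 224 x 224 RGB input, the filter counts (32, 64, 128), and the helper names `conv_output` and `trace_shapes` are assumptions chosen for the example, not values taken from this study.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Spatial size after sliding a square window (convolution or pooling)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(input_size, input_channels, layers):
    """Return (side length, feature maps) after each layer of a square-image CNN.

    Each layer is a (kind, arg) pair:
      ("conv", n_filters) -> 3x3 convolution, stride 1, padding 1 (assumed)
      ("pool", window)    -> max pooling with stride equal to the window
    """
    size, channels = input_size, input_channels
    shapes = [(size, channels)]
    for kind, arg in layers:
        if kind == "conv":
            size = conv_output(size, kernel=3, stride=1, padding=1)
            channels = arg  # depth becomes the number of filters
        elif kind == "pool":
            size = conv_output(size, kernel=arg, stride=arg)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
        shapes.append((size, channels))
    return shapes


# Hypothetical stack: three conv layers, each followed by 2x2 pooling.
layers = [("conv", 32), ("pool", 2), ("conv", 64), ("pool", 2), ("conv", 128), ("pool", 2)]
shapes = trace_shapes(224, 3, layers)
for side, depth in shapes:
    print(f"{side} x {side} x {depth}")
```

Each pooling step halves the side length while each convolution raises the number of feature maps, which is the pattern the paragraph describes.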

VGGNet
When developing an image classifier without enough training data, it is frequently helpful to reuse the lower layers of a pre-trained model. VGGNet [8], created by K. Simonyan and A. Zisserman, was the runner-up in the ILSVRC 2014 challenge. It featured a relatively straightforward and traditional design, consisting of 2 or 3 convolutional layers, a pooling layer, 2 or 3 more convolutional layers, a pooling layer, and so on (for a total of just 16 convolutional layers), plus a final dense network with two hidden layers and the output layer. It used many filters, but only of size 3 × 3 [7].
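As a rough check on the layer count mentioned above, the sketch below walks through the convolutional part of VGG19 and counts its layers and weights. The block layout is the standard VGG19 configuration as published by Simonyan and Zisserman, stated here from general knowledge rather than from this study's Table 1; the names `VGG19_BLOCKS` and `summarize` are hypothetical.

```python
# Integers are the number of 3x3 filters in a convolutional layer;
# "M" marks a 2x2 max-pooling layer. Standard VGG19 layout (assumed).
VGG19_BLOCKS = [
    64, 64, "M",
    128, 128, "M",
    256, 256, 256, 256, "M",
    512, 512, 512, 512, "M",
    512, 512, 512, 512, "M",
]


def summarize(config, in_channels=3):
    """Count conv layers, pooling layers, and conv parameters (weights + biases)."""
    conv_layers = 0
    pool_layers = 0
    params = 0
    channels = in_channels
    for item in config:
        if item == "M":
            pool_layers += 1
            continue
        conv_layers += 1
        params += 3 * 3 * channels * item + item  # 3x3 kernels plus one bias per filter
        channels = item
    return conv_layers, pool_layers, params


conv_layers, pool_layers, conv_params = summarize(VGG19_BLOCKS)
print(conv_layers, pool_layers, conv_params)
```

The count covers only the convolutional base; the dense layers on top are excluded because they are typically the part that is replaced in transfer learning.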
According to research [9]

Research Workflow
The research process is shown in Fig. 1.

Preprocessing
Data preprocessing is a step that must be completed before data in a dataset are edited or extended. Because not all incoming data share the same format, the objective is to make the data easier to understand and to reduce confusion during data entry. Preprocessing reduces the possibility of inaccurate or unnecessary data influencing the statistics. During the preprocessing stage, 80% of the data are assigned to training and 20% to testing. The dataset is divided into 20,000 images for training (training and validation), 4,500 images for testing, and 500 dummy images. The data are then stored in Google Drive to simplify the image classification process. Finally, the image settings are configured, and a data generator is created to produce the training and test data.
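The split sizes above can be reproduced with a simple shuffled partition. This is a minimal sketch, not the authors' pipeline: the file names, the assumption of 5,000 images per class, the fixed seed, and the helper `split_dataset` are all hypothetical.

```python
import random

CLASSES = [
    "lung_benign_tissue",
    "lung_adenocarcinoma",
    "lung_squamous_cell_carcinoma",
    "colon_adenocarcinoma",
    "colon_benign_tissue",
]


def split_dataset(items, n_train, n_test, seed=42):
    """Shuffle once, then cut into training, testing, and leftover (dummy) parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    dummy = shuffled[n_train + n_test:]
    return train, test, dummy


# Hypothetical (file name, label) pairs, assuming 5,000 images per class.
items = [(f"{label}_{i:04d}.jpeg", label) for label in CLASSES for i in range(5000)]

train, test, dummy = split_dataset(items, n_train=20_000, n_test=4_500)
print(len(train), len(test), len(dummy))
```

Shuffling before cutting keeps the three parts disjoint and mixes the five classes across them; a stratified split would be a reasonable alternative if exact class balance in each part matters.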

Building CNN Architecture Model
Following preprocessing, the next step is the creation of the CNN model. Because the current study uses pre-trained models, it does not create a model from scratch. The principle behind using a pre-trained model is to replace selected layers of the existing network with the desired layers, a process often known as fine-tuning. The model can be seen in Table 1.

Transfer Learning Process
After a pre-trained model has been chosen, it is applied to carry out the transfer learning process. The transfer learning process starts with freezing layers of the pre-trained model. In the context of CNNs, freezing a layer is a way to control which weights are updated: the weights of a frozen layer cannot be modified during training. This method can decrease training computation time while maintaining accuracy.
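The effect of freezing can be illustrated with a toy gradient-descent step that skips frozen layers. This is a sketch under simplifying assumptions: the layer names, weights, gradients, and the helper `sgd_step` are made up for the example and do not come from the study's model.

```python
def sgd_step(layers, grads, lr=0.01):
    """Apply one gradient-descent update in place, skipping frozen layers."""
    for name, layer in layers.items():
        if not layer["trainable"]:
            continue  # frozen: weights stay exactly as they were
        layer["weights"] = [w - lr * g for w, g in zip(layer["weights"], grads[name])]


# Hypothetical two-layer model: a frozen pre-trained layer and a new trainable one.
layers = {
    "pretrained_conv": {"weights": [0.5, -0.2], "trainable": False},
    "new_dense": {"weights": [0.1, 0.3], "trainable": True},
}
grads = {
    "pretrained_conv": [1.0, 1.0],
    "new_dense": [1.0, -1.0],
}

sgd_step(layers, grads, lr=0.1)
print(layers["pretrained_conv"]["weights"])
print(layers["new_dense"]["weights"])
```

Because no update is computed for the frozen layer, only the newly added layer changes, which is where the saving in training time comes from. In Keras, the comparable mechanism is setting a layer's `trainable` attribute to `False` before compiling the model.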

Results of Training and Testing
Based on several prior studies, the average accuracy of the AlexNet and ResNet models is above 90%. This work aims to develop a classification system for lung and colon cancer from histological images using various pre-trained models and to assess which model performs best on the histopathological images used.
According to research [9], the CNN architecture produces good results in case studies of age estimation. The estimation method with a classification strategy produces satisfactory results. The researchers' challenge with the CNN architecture is to develop the optimum loss function with the most Gaussian distribution. Based on the study's review results, the CNN architecture that provides the best prediction is VGG-16.
The flowchart in Fig. 1 begins with collecting the dataset, followed by preprocessing (gathering the lung and colon cancer image datasets, scaling images, and dividing the datasets), building CNN models with sequential and pre-trained models, training and testing, storing the models, implementing the models in the Flask framework, designing the application GUI, and predicting images.

Collecting Data
The 25,000-image "Lung and Colon Cancer Histopathological Images" dataset from Kaggle served as the source for the image collection. The image collection has five class labels: lung benign tissue, lung adenocarcinoma, lung squamous cell carcinoma, colon adenocarcinoma, and colon benign tissue.

Forward and backward propagation are used during the training phase of the CNN algorithm. Fig. 2 displays the outcomes of training and testing. The model performs well, with a loss on training data of 1.5% and an accuracy of 99.96%.
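To make the two reported metrics concrete, the sketch below computes accuracy and average categorical cross-entropy loss for a five-class classifier. The logits and labels are toy values invented for the example, and `softmax` and `accuracy_and_loss` are hypothetical helpers; this shows how such metrics are commonly defined, not how the study's figures were produced.

```python
import math


def softmax(logits):
    """Convert raw scores into class probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy_and_loss(batch_logits, true_labels):
    """Return (fraction correct, mean cross-entropy) over a batch."""
    correct = 0
    total_loss = 0.0
    for logits, label in zip(batch_logits, true_labels):
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += int(predicted == label)
        total_loss += -math.log(probs[label])  # penalty grows as the true-class probability shrinks
    n = len(true_labels)
    return correct / n, total_loss / n


# Toy batch of four images over five classes; the last one is misclassified.
batch_logits = [
    [5.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0, 0.0],
]
true_labels = [0, 1, 2, 4]

accuracy, loss = accuracy_and_loss(batch_logits, true_labels)
print(f"accuracy={accuracy:.2%}, loss={loss:.4f}")
```

Accuracy only counts whether the top prediction is right, while the loss also reflects how confident the model was, so the two numbers are best read together.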

Figure 2. Accuracy Results for Training and Testing Data

Figure 4. Classification of Colon Adenocarcinoma Label with No Filter

Conclusions

Table 1. The CNN Model Architecture