Improving the Accuracy of Prediction of Dissolved Oxygen and Nitrate Level Using LSTM with K-Means Clustering and Spearman Analysis

Ika Arva Arshella(1*), I Wayan Mustika(2), Prapto Nugroho(3)

(1) Sanata Dharma University
(2) Gadjah Mada University
(3) Gadjah Mada University
(*) Corresponding Author

Abstract


This study examines how to prepare data before it enters the learning process for prediction with Deep Learning (DL). Long Short-Term Memory (LSTM) is a DL method widely used for prediction because it retains long-term information well. Although LSTM has proven effective, low-quality data can reduce its prediction accuracy. This matters because prediction tasks demand accuracy, while field conditions can degrade the quality of the data collected. To improve data quality, data are merged according to the relationships between collection locations, identified using Spearman correlation analysis and K-Means clustering. The results show that improving data quality by merging data with K-Means was applied successfully under various dataset conditions. The simulations used two river water quality datasets: Dissolved Oxygen (DO) concentration and nitrate levels. The first dataset produced DO predictions for eight locations with an average R² = 0.9998, MAE = 0.0007, and MSE = 1.13×10⁻⁶. The second dataset produced nitrate predictions for ten locations with an average R² = 0.7337, MAE = 0.0111, and MSE = 0.00029.
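The grouping step described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the site names and toy series are hypothetical, clustering on the rows of the Spearman correlation matrix is an assumed choice of features (the abstract does not state which features were clustered), and the deterministic farthest-first initialization is a simplification swapped in for the usual random K-Means initialization. The LSTM training stage is omitted.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20):
    """Lloyd's K-Means with farthest-first initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy measurements for four monitoring sites.
series = {
    "site_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "site_B": [1.1, 1.9, 3.2, 4.5, 4.9, 6.3],
    "site_C": [6.0, 5.0, 4.2, 3.1, 2.0, 1.0],
    "site_D": [5.8, 5.1, 3.9, 3.0, 2.2, 0.9],
}
names = list(series)

# Each site is described by its Spearman correlation with every site.
corr = [[spearman(series[a], series[b]) for b in names] for a in names]
labels = kmeans(corr, k=2)

# Sites sharing a label are candidates for merging before model training.
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)
```

In this toy case the two rising series fall into one group and the two falling series into the other. With real measurements, the number of clusters and the features passed to K-Means would need to be chosen and validated against the data.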


DOI: https://doi.org/10.24071/ijasst.v7i2.12361

Publisher : Faculty of Science and Technology

Society/Institution : Sanata Dharma University

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.