# Using Multivariate Linear Regression for Biochemical Oxygen Demand Prediction in Waste Water. (arXiv:2209.14297v1 [q-bio.OT])

Multivariate Linear Regression (MLR) offers opportunities for the
prediction of Biochemical Oxygen Demand (BOD) in waste water, using various
water quality parameters as the input variables. The goal of this work is to
examine the capability of MLR to predict BOD in waste water from four
input variables: Dissolved Oxygen (DO), Nitrogen, Fecal Coliform and Total
Coliform. These four variables showed the strongest correlation with BOD among
the seven parameters examined. The model was trained on 80% and on 90% of the
data, with the remaining 20% and 10% held out as the test set, respectively.
MLR performance was evaluated through the correlation coefficient (r), the Root
Mean Square Error (RMSE) and the percentage accuracy in prediction of BOD. The
performance indices for the
input variables of Dissolved Oxygen, Nitrogen, Fecal Coliform and Total
Coliform in prediction of BOD are: RMSE = 6.77 mg/L, r = 0.60 and accuracy of
70.3% for a training set of 80% of the dataset, and RMSE = 6.74 mg/L, r = 0.60
and accuracy of 87.5% for a training set of 90% of the dataset. Increasing the
training set above 80% of the dataset improved only the reported accuracy; RMSE
and r were essentially unchanged, so the prediction capacity of the model was
not significantly affected. The results show that an MLR model can be
successfully employed in the estimation of BOD in waste water using
appropriately selected input parameters.
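
The workflow described above (correlation screening of candidate inputs, an 80/20 or 90/10 train/test split, an MLR fit, and evaluation by RMSE and r) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function names and the three unnamed parameters (`param_5` to `param_7`) are hypothetical, and the "percentage accuracy" metric is omitted because the abstract does not define it.

```python
import numpy as np

def rank_by_correlation(X, y, names):
    """Sort input variables by absolute Pearson correlation with the target."""
    scores = [(name, abs(np.corrcoef(X[:, j], y)[0, 1]))
              for j, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Evaluate b0 + b1*x1 + ... + bk*xk for each row of X."""
    return coef[0] + X @ coef[1:]

def evaluate_split(X, y, train_fraction, seed=0):
    """Shuffle, split, fit on the training part, return (RMSE, r) on the test part."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    pred = predict_mlr(fit_mlr(X[train], y[train]), X[test])
    rmse = float(np.sqrt(np.mean((y[test] - pred) ** 2)))
    r = float(np.corrcoef(y[test], pred)[0, 1])
    return rmse, r

# Synthetic stand-in data (assumption): seven candidate inputs, of which
# only the first four actually drive the target.
names = ["DO", "Nitrogen", "Fecal Coliform", "Total Coliform",
         "param_5", "param_6", "param_7"]
rng = np.random.default_rng(42)
X_all = rng.normal(size=(200, 7))
bod = 3.0 + X_all[:, :4] @ np.array([2.0, -1.5, 1.5, 1.0]) \
      + rng.normal(scale=0.5, size=200)

# Keep the four inputs most strongly correlated with the target.
ranked = rank_by_correlation(X_all, bod, names)
keep = [names.index(name) for name, _ in ranked[:4]]

for fraction in (0.8, 0.9):
    rmse, r = evaluate_split(X_all[:, keep], bod, fraction)
    print(f"train={fraction:.0%}  RMSE={rmse:.2f}  r={r:.2f}")
```

Because RMSE and r are computed on the held-out portion only, changing the split from 80/20 to 90/10 also shrinks the test set, so differences between the two runs reflect both more training data and a smaller evaluation sample.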