# Imputation via Clusterwise Linear Regression

"Without mathematics, there's nothing you can do. Everything around you is mathematics. Everything around you is numbers."
- Shakuntala Devi

## IviaCLR Software

IviaCLR [1,2] is an imputation method for preprocessing incomplete data. The method is based on clusterwise linear regression (CLR) and combines two well-known approaches to missing value imputation: linear regression and clustering. The fundamental idea is to approximate missing values using only those data points that are similar to the incomplete data object. That is, we determine the value of a missing feature from the item's observed features and its similarity to other items in the data set.

IviaCLR consists of three parts: initial imputation, the CLR method, and prediction. The initial imputation and prediction methods can be selected as parameters of the software.

To solve the underlying CLR problem we use LMBM-CLR, the limited memory bundle method for solving large CLR problems [3]. LMBM-CLR in turn consists of two algorithms: an incremental algorithm solves the CLR problem globally, and at each iteration of this algorithm the LMBM algorithm [4] solves both the CLR problem and the auxiliary CLR problem locally from different starting points. Because of the incremental approach, LMBM-CLR solves not only the k-partition problem but also all intermediate l-CLR problems, where l=1,…,k-1. These intermediate solutions (written to the files "imputed_values.txt" and "predictions.txt") can be used for imputation instead of the final solution provided by the algorithm, but the software does not do this automatically.

To get started, type ./iviaclr for help.
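The three stages described above (initial imputation, CLR, prediction) can be illustrated with a small Python sketch. It is not the IviaCLR implementation and does not use LMBM-CLR: the CLR step is replaced by a simple alternating heuristic (fit one regression per cluster, then reassign each point to its best-fitting line). The initial imputation is assumed to be a column mean, and the prediction rule assumed here is "use the regression of the cluster whose centroid is nearest". The function names `fit_clr` and `impute_column` are hypothetical and exist only for this example.

```python
import numpy as np

def fit_clr(X, y, k, n_iter=50, seed=0):
    """Alternating heuristic for k-clusterwise linear regression.

    Illustrative stand-in only; IviaCLR solves this problem with LMBM-CLR.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.hstack([X, np.ones((n, 1))])        # append an intercept column
    labels = rng.integers(0, k, size=n)        # random initial partition
    coefs = np.zeros((k, A.shape[1]))
    for _ in range(n_iter):
        for j in range(k):
            mask = labels == j
            if mask.sum() >= A.shape[1]:       # refit only if enough points
                coefs[j], *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        # squared residual of every point under every cluster's regression
        residuals = (A @ coefs.T - y[:, None]) ** 2
        new_labels = residuals.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return coefs, labels

def impute_column(data, col, k=2):
    """Impute NaNs in column `col` of `data` (assumed workflow, see text)."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data[:, col])
    # 1) initial imputation: fill every NaN with its column mean (assumption)
    filled = np.where(np.isnan(data), np.nanmean(data, axis=0), data)
    others = [c for c in range(data.shape[1]) if c != col]
    # 2) CLR on the rows where the target column is observed
    X_obs = filled[~missing][:, others]
    y_obs = data[~missing, col]
    coefs, labels = fit_clr(X_obs, y_obs, k)
    # 3) prediction: regression of the nearest non-empty cluster (assumption)
    used = [j for j in range(k) if np.any(labels == j)]
    centroids = np.array([X_obs[labels == j].mean(axis=0) for j in used])
    for i in np.flatnonzero(missing):
        x = filled[i, others]
        j = used[int(np.argmin(((centroids - x) ** 2).sum(axis=1)))]
        filled[i, col] = np.append(x, 1.0) @ coefs[j]
    return filled

# Toy data on the line y = 2x + 1 with one missing target value.
demo = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, np.nan], [3.0, 7.0]])
print(impute_column(demo, col=1, k=1))
```

With k=1 the sketch reduces to ordinary least-squares imputation, so the missing entry in the toy data is filled with a value close to 5. For k>1 the alternating heuristic can stop in a local solution that depends on the random initial partition, which is one reason a dedicated solver such as LMBM-CLR is used in the actual software.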

The software is free for academic teaching and research purposes, but I ask you to cite the corresponding reference given below if you use it.

### Code

- iviaclr.f03: Main program for imputation via clusterwise linear regression.
- Initialization of parameters for imputation, CLR, and LMBM.
- Parameters.
- Initial imputations and final predictions.
- Subroutines for CLR software.
- Computation of function and subgradient values for CLR software.
- LMBM: limited memory bundle method.
- Subprograms for LMBM.
- makefile.
- All the above in compressed form.
- Instructions file.

## References

1. N. Karmitsa, S. Taheri, A. Bagirov, P. Mäkinen, "Missing Value Imputation via Clusterwise Linear Regression", IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 4, pp. 1889–1901, 2022.
2. N. Karmitsa, S. Taheri, A. Bagirov, P. Mäkinen, "Clusterwise Linear Regression based Missing Value Imputation for Data Preprocessing", TUCS Technical Report, No. 1193, Turku Centre for Computer Science, Turku, 2018.
3. N. Karmitsa, A. Bagirov, S. Taheri, "Limited Memory Bundle Method for Solving Large Clusterwise Linear Regression Problems", TUCS Technical Report, No. 1172, Turku Centre for Computer Science, Turku, 2016.
4. N. Haarala, K. Miettinen, M. M. Mäkelä, "Globally Convergent Limited Memory Bundle Method for Large-Scale Nonsmooth Optimization" (author version), Mathematical Programming, vol. 109, no. 1, pp. 181–205, 2007. DOI 10.1007/s10107-006-0728-2. The original publication is available online at www.springerlink.com.

## Acknowledgements

The work was financially supported by the Academy of Finland (Project Nos. 289500, 294002, and 313269) and the Australian Research Council's Discovery Projects funding scheme (Project No. DP140103213).