Predictive Model for Computing Health Insurance Premium Rates Using Machine Learning Algorithms


  • Angela D. Kafuria, African Centre of Excellence for Data Science, University of Rwanda, KK 737 St, Kigali, Rwanda


Health Insurance, Insurance premiums, Machine learning, Predictive models


Health care systems depend heavily on out-of-pocket payments, a mechanism that is a barrier to universal health coverage because it contributes to inefficiency, inequity and cost. To address this challenge, people are encouraged to enrol in health insurance schemes to reduce the burden of out-of-pocket payments. Insurance companies therefore need models that accurately predict medical expenses for the insured population. In this study, the variables age, sex, body mass index (BMI), number of children and region were used to formulate a predictive model of health insurance charges using different machine learning algorithms. The findings showed that the following variables were significant: age (p < 0.001), BMI (p = 0.001), smoking (p < 0.001) and region (p = 0.046). These attributes can therefore be regarded as determinants of health insurance charges. Five models were evaluated for predictive analysis: Multiple Linear Regression (MLR), K-Nearest Neighbors (KNN), Least Absolute Shrinkage and Selection Operator (LASSO), Extreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR). The performance evaluation indicated that XGBoost and RFR were the best predictive models, with R² = 85.5%, MAE = 2688.2, RMSE = 4748.7 and R² = 85.3%, MAE = 2726.4, RMSE = 4783.8, respectively. Insurance companies that seek to develop a model for predicting premiums are recommended to use XGBoost and RFR for a more accurate model.
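The three evaluation metrics reported in the abstract (R2, MAE and RMSE) can be illustrated with a minimal sketch. This is not the author's code: the function names and the sample charge values below are hypothetical, chosen only to show how each metric is computed from actual and predicted charges.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more than MAE."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true, y_pred):
    """Coefficient of determination: share of variance explained."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical insurance charges (illustrative only, not from the paper).
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3200.0, 3800.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
print("R2  :", r2(actual, predicted))
```

A lower MAE or RMSE and a higher R2 indicate a better fit, which is the basis on which the five models are ranked in the abstract.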






How to Cite

Angela D. Kafuria. (2022). Predictive Model for Computing Health Insurance Premium Rates Using Machine Learning Algorithms. International Journal of Computer (IJC), 44(1), 21–38. Retrieved from