The rise of data science in insurance: Can good governance catch up?

Pardeep Bassi and Kartina Tahir Thomson look at the risks associated with data science and what firms should be considering as part of their work on technology, data and models.

Data science techniques are developing and being adopted faster than insurers can build their own understanding of the risk governance and ethics those techniques require.

To make matters more challenging, most insurers have two distinct groups operating on the front line of data science, often in conflict rather than in harmony: data science teams using cutting-edge techniques without the necessary understanding of their organisation’s risk frameworks, and insurance leaders with limited experience of the latest advanced analytics. This internal disconnect leaves insurers, and the individuals who work for them, exposed to risk.

Insurers have set their sights on a magic middle ground: the right balance of governance and control that still allows them to advance the adoption of data science and capture the value it creates.

Bias

As increasingly complex models are used, a key risk for insurers to consider is bias, an issue not yet fully understood or appreciated by many firms, along with how best to address the problems it creates. When individuals or groups of individuals are differentiated from others on the basis of particular characteristics, insurers need to understand why. Is the bias due to collected data that does not represent the entire population? Is it caused by potentially flawed human decision-making that is reflected in the data? Was it introduced by the AI and machine learning models trained on the data? Or is the form of the model itself reinforcing existing bias, or even creating new biases?

Detecting hidden biases is a prerequisite for any strategy to measure, monitor and manage bias. Bias should be considered at every stage of the model-building process: when an insurer first explores its data, when it builds a model and when model outputs are used to inform a business decision. Instead, data scientists all too often treat these risks as an afterthought.
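No single bias metric is implied here, but a minimal sketch of what "measure and monitor" can look like in practice is to compare the rate of favourable outcomes across groups. The example below assumes a binary decision (1 = favourable, 0 = not) and uses a simple ratio of the lowest to the highest group rate, sometimes called a disparate impact ratio. It is only one of many possible fairness measures, and the data, group labels and function names are hypothetical.

```python
from collections import defaultdict


def group_rates(outcomes, groups):
    """Share of favourable outcomes (coded 1) within each group."""
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for outcome, group in zip(outcomes, groups):
        totals[group] += 1
        favourable[group] += outcome
    return {g: favourable[g] / totals[g] for g in totals}


def disparity_ratio(outcomes, groups):
    """Lowest group rate divided by highest group rate.

    1.0 means every group receives favourable outcomes at the same rate;
    values well below 1.0 flag a difference worth investigating.
    """
    rates = group_rates(outcomes, groups)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group receives a favourable outcome, so rates are equal
    return min(rates.values()) / highest


# Hypothetical example: 1 = application accepted, 0 = declined.
decisions = [1, 1, 0, 1, 0, 0, 1, 0]
segment = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(group_rates(decisions, segment))                # {'A': 0.75, 'B': 0.25}
print(round(disparity_ratio(decisions, segment), 3))  # 0.333
```

Because the calculation is cheap, the same check could be run on the raw data, on model predictions and on final business decisions, and tracked over time; a low ratio says that groups are treated differently, not why, so it is a prompt for the questions above rather than an answer to them.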

Another essential capability is choosing the right algorithm, one that helps an insurer find the optimum balance between interpretability, transparency and predictive power. A number of custom algorithms are currently being developed in the market. For example, layered gradient boosting machines achieve the same predictive accuracy as a gradient boosting machine, whilst providing a much greater level of transparency and interpretability.

Open source risk

In recent years, open source adoption has grown at an unprecedented rate. While open source allows great flexibility and innovation, it also exposes an insurer to more risk, particularly around governance and security. Besides the potential for malicious code hidden in open source packages, another risk is key person dependency, which arises when a single individual or a small team is responsible for building and maintaining the code.
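One common control against tampered packages is to record a cryptographic hash of each open source artifact when it is reviewed and approved, and to refuse anything whose content no longer matches. The sketch below assumes SHA-256 and Python's standard `hashlib`; the helper names, file name and pinned value are hypothetical placeholders. Note that it only detects a package changing after review; it cannot tell whether the reviewed version itself was safe.

```python
import hashlib


def sha256_hex(chunks):
    """SHA-256 hex digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path, size=65536):
    """Read a file in fixed-size binary chunks (uses := so needs Python 3.8+)."""
    with open(path, "rb") as f:
        while chunk := f.read(size):
            yield chunk


def matches_pinned_hash(chunks, pinned_sha256):
    """True only if the content hashes to the value recorded at review time."""
    return sha256_hex(chunks) == pinned_sha256.strip().lower()


# Hypothetical usage; the file name and PINNED value are placeholders:
# if not matches_pinned_hash(file_chunks("some_package-1.0.tar.gz"), PINNED):
#     raise RuntimeError("Package content differs from the reviewed version")
```

For Python packages specifically, pip offers a built-in equivalent: hash-checking mode, enabled with `--require-hashes` and `--hash=sha256:...` entries in a requirements file, which is usually preferable to a hand-rolled check.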

Large language models (LLMs), such as ChatGPT, are an example of technology that is evolving, and being adopted, in a hurry. Governance, risk and control frameworks have not kept pace, creating significant risks relating to data privacy and intellectual property.

Through the use of LLMs, for example by including sensitive or proprietary data in prompts, an insurer could lose control of that data. It may have little or no say over how the data is used now or in the future, including by competitors at a later date.
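One partial mitigation is to strip obvious identifiers from text before it is sent to an external LLM. The sketch below is a minimal, pattern-based redaction step, assuming the sensitive items can be described by regular expressions; the "POL-" policy-number format and the pattern names are invented for illustration. Pattern matching will miss names, free-text medical details and anything else the patterns do not describe, so it complements, rather than replaces, controls such as approved tools and contractual terms with the provider.

```python
import re

# Hypothetical patterns for illustration only. "POL-" followed by six or more
# digits is an invented policy-number format, not a real standard.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "POLICY_NO": re.compile(r"\bPOL-\d{6,}\b"),
}


def redact(text):
    """Replace each pattern match with a placeholder label."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


prompt = "Summarise the claim on POL-1234567 raised by jane.doe@example.com"
print(redact(prompt))
# Summarise the claim on [POLICY_NO] raised by [EMAIL]
```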

Another risk is hallucination: the tendency of LLMs to produce text that appears to be correct but is actually false. This can be driven by poor prompts or simply by an underlying weakness in the model, and the wrong results are often delivered with a great deal of apparent certainty. The reputational risk for an insurer is high if the data or model is used improperly.

Taking control

Ultimately, the stability of open source code is in insurers’ own hands. They alone are responsible for making sure it meets the needs of their business-critical operations. It is therefore important that an insurer clearly delineates roles and responsibilities to avoid confusion. Defining who makes which decision puts better accountability, visibility and opportunity to challenge decisions in place at every level.

Open source offers real potential to contribute to a more efficient and innovative insurance market. However, insurers must first make two critical decisions: what to use open source for in order to gain an advantage, and how best to integrate it so that good governance and control are in place, striking an optimal balance.

Data science is spreading quickly. If insurers want to compete in this new AI-driven world, they need not only to adopt data science but also to do so in the right way. This means gradually evolving governance to achieve the right oversight, alignment with internal values and regulatory compliance, combined with an evolving risk management framework that anticipates and mitigates future risks.

Pardeep Bassi is global proposition lead, data science, at WTW, and Kartina Tahir Thomson is senior director at WTW and Institute and Faculty of Actuaries president-elect.
