Insurance fraud has been a challenge for the sector for as long as insurance has existed. Still, the problem has recently become more urgent due to the growth in global cybercrime and more sophisticated fraudulent schemes. The Covid-19 outbreak has also created a perfect storm for corporate fraud, placing extra financial strain on businesses and forcing companies to move online.
Meanwhile, old-school fraud detection methods such as rule-based systems are no longer sufficient on their own in insurance. Companies need something more intelligent, and machine learning can be a great solution in this case. This article describes how insurers can use machine learning for insurance fraud detection and turn it into a powerful technological weapon to fight back against fraud more efficiently.
Is your company prepared for increasing fraud risks?
The Insurance Information Institute reports $38 to $83 billion in yearly losses due to insurance fraud. And this excludes health insurance fraud, which costs the US an additional $68 billion according to the National Health Care Anti-Fraud Association and is the most costly type of fraud in the insurance industry.
That’s a lot of money at stake. Yet a heavy financial burden is not the only toll on the insurance business: there is also poor customer experience, reduced loyalty, a damaged company reputation, and operational failures.
Traditional fraud detection methods
In the past, fraud detection was left to insurance fraud investigators, who had to go through new claims manually (btw, don’t forget to check our recent article on automated claims management). Their only weapons were a few facts and lots of intuition. This approach could not guarantee consistent, high-quality checks, and manual fraud detection was also expensive and time-consuming.
The situation improved when rule-based systems appeared. This approach operates on a set of “rules”, i.e. conditions that raise a warning when a claim matches a known sign of fraud. The rules could relate to unusual transaction types, suspicious timestamps, or account numbers. In other words, the system looks for red flags to recognize fraud and automatically block it.
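To make the idea of red-flag rules concrete, here is a minimal sketch of a rule-based check. The `Claim` fields, the rule names, and every threshold (the amount limit, the 30-day window, the `REGION_X` placeholder) are hypothetical and chosen only for illustration; they are not taken from any real insurer's rule set.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Claim:
    claim_id: str
    amount: float
    policy_start: date
    claim_date: date
    region: str


# Hypothetical rules: each one returns True when it sees a red flag.
RULES = {
    "large_amount": lambda c: c.amount > 50_000,
    "early_claim": lambda c: (c.claim_date - c.policy_start).days < 30,
    "risky_region": lambda c: c.region in {"REGION_X"},
}


def red_flags(claim: Claim) -> list[str]:
    """Return the names of all rules that the claim triggers."""
    return [name for name, rule in RULES.items() if rule(claim)]


claim = Claim("C-1", 72_000.0, date(2024, 1, 1), date(2024, 1, 15), "REGION_Y")
print(red_flags(claim))  # ['large_amount', 'early_claim']
```

Each rule is a fixed yes/no condition, which is why such a system only catches what someone has already thought to write down.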
Rule-based systems are a great anti-fraud toolkit, but their “black and white” logic doesn’t always work well. Their most critical limitation is the inability to detect new fraud schemes and patterns. But there are other drawbacks too:
- Blind spots: As fraudsters become smarter and new schemes evolve, more blind spots appear in the insurer’s fraud detection system, i.e. areas that the rules don’t cover yet. This gradually makes the system less effective and places an extra burden on the insurer’s fraud analysts, who have to keep expanding the rules.
- False positives: The more rules the company adds, the higher the risk of false positives, which means blocking genuine customers and valid claims. For instance, if the insurer limits claims from a risky region, it will lose at least some genuine customers from that location.
- Only simple cases: Rule-based systems rarely catch more complex fraud cases because they are limited to the patterns that humans can anticipate and write down.
Machine learning to the rescue in fraud detection
Machine learning (ML) has been the next big step forward in fraud detection. The idea is to use algorithms that analyze large, complex datasets, find patterns, and improve from this experience. Here are the most common reasons why insurers are opting for ML-based fraud detection:
- Speed: Think of ML-based fraud detection in insurance as having several teams of fraud investigators at your disposal, working through thousands of claims in real time and with high precision. ML can reduce the time spent on fraud detection by 70%. This isn’t surprising if we consider that an ML-based fraud detection solution can work 24/7 and analyze large amounts of info in the blink of an eye.
- Accuracy: Unlike rule-based systems, which are broad and catch only high-probability fraud claims, ML solutions can spot non-intuitive behavior. According to Capgemini, an ML-based fraud detection system can increase accuracy by 90% thanks to noticing the subtlest evidence of abnormal behavior.
- Efficiency: An ML model usually detects fraud at early stages. For example, a neural network can complete more complex analyses, like investigating how much time the customer spends filling in the claim forms, how many pages they browse, and whether they are copy-pasting the info.
- Scalability: While more data can become a problem for a rule-based approach, machine learning fraud detection thrives on large datasets. Additional data is one more opportunity for an ML model to learn and discern patterns of valid and fraudulent claims. Besides, feeding in more info allows ML models to keep up with the latest scams and fraud methods.
Here is one more important thing to mention. Although machine learning is a major step forward for insurance fraud detection, it doesn’t mean that a company should replace its rules entirely. ML can work well as a standalone solution, and just as well as a complementary tool for your legacy system.
Machine learning fraud detection use cases
How exactly can insurance companies apply machine learning to insurance fraud detection? Here are a few ideas to get you started.
Fake claims
This is probably the most common use case for fraud detection in insurance using machine learning. Here ML takes advantage of semantic analysis, which makes it possible to analyze almost any type of data:
- Structured
- Unstructured
- Table-type
Simply put, ML algorithms analyze claims-related files submitted by insurance agents, clients, police, and other stakeholders, looking for inconsistencies in the provided evidence. There is a good chance that ML will find these discrepancies, since textual data contains many hidden clues and ML systems are good at detecting them.
In their case study “Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud”, Wang and Xu tested ML for fraud detection in insurance claims. The researchers used three different ML models, and all of them achieved at least 75% accuracy in fraud detection.
Duplicate claims
Most insurers think of duplicate claims as “exact matches”. These are easy to spot, and you don’t even need a smart solution for fraud detection in this scenario. However, a duplicate request for payment is sometimes not so evident. Rather than deliberate fraud, it can simply be a mistake, like when a client resubmits a claim that wasn’t paid within the agreed time or when they want to add an extra modifier. In this scenario, smart ML-backed algorithms will be useful as they can notice subtle inconsistencies and inform fraud analysts.
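As a rough illustration of why non-exact duplicates need more than an equality check, the sketch below compares claim descriptions with a plain string-similarity ratio from Python's standard library. This is a simple heuristic rather than a trained ML model, and the claim texts and the 0.8 threshold are made up for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Similarity ratio between 0.0 and 1.0, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def near_duplicates(claims: dict[str, str], threshold: float = 0.8):
    """Return pairs of claim IDs whose descriptions are similar but not necessarily identical."""
    return [
        (id_a, id_b)
        for (id_a, text_a), (id_b, text_b) in combinations(claims.items(), 2)
        if similarity(text_a, text_b) >= threshold
    ]


# Hypothetical claim descriptions.
claims = {
    "C-1": "Rear bumper damage after parking lot collision on 2024-03-02",
    "C-2": "Rear bumper damage after parking-lot collision on 2024-03-02, modifier added",
    "C-3": "Water damage to kitchen ceiling from burst pipe",
}
print(near_duplicates(claims))  # [('C-1', 'C-2')]
```

Flagged pairs would go to a fraud analyst for review rather than being rejected automatically, since a resubmission may be an honest mistake.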
Upcoding in medical billing
This is a type of health insurance fraud where a healthcare provider adds extra costs to the medical bill in order to charge more to the patient and their insurance company. Digital analysis based on Benford’s Law, which describes how often each leading digit is expected to appear in naturally occurring amounts, can help even a simple rule-based fraud detection system reveal this type of fraud. However, ML can upgrade your rule-based system and, for example, add image recognition to digitize documents and classify them easily.
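Benford's Law states that in many naturally occurring sets of numbers the leading digit d appears with probability log10(1 + 1/d), so 1 leads about 30% of the time and 9 under 5%. The sketch below measures how far a set of billed amounts deviates from that distribution. Both sample datasets are synthetic and the function names are illustrative; a real check would also need enough records and a proper statistical test before drawing any conclusion.

```python
import math
from collections import Counter


def benford_expected(d: int) -> float:
    """Expected share of amounts whose leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def first_digit(amount: float) -> int:
    """Leading non-zero digit of a positive amount."""
    return int(f"{amount:.15e}"[0])  # scientific notation starts with that digit


def benford_deviation(amounts) -> float:
    """Mean absolute gap between observed and expected leading-digit shares."""
    digits = [first_digit(a) for a in amounts if a > 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts.get(d, 0) / n - benford_expected(d)) for d in range(1, 10)) / 9


# Synthetic data: amounts spread over several orders of magnitude...
natural_bills = [100 * 1.07 ** i for i in range(200)]
# ...versus bills that cluster suspiciously between 900 and 1099.
suspicious_bills = [900 + i for i in range(200)]

print(benford_deviation(natural_bills) < benford_deviation(suspicious_bills))  # True
```

A large deviation is only a signal that a provider's bills deserve a closer look, not proof of upcoding.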
Overstated repair costs
In auto insurance, fraud detection using machine learning can help to search for inconsistencies in car repair costs. This is a classification task: a model classifies the data in repair claims and can surface hidden correlations in claim records, or even in the decisions of insurance agents, clients, and repair service providers. For instance, it may reveal that an auto repair service charges an extra fee to the clients of a particular agent. Additionally, integrating car damage detection models can further streamline the claims process by automatically identifying the extent of damage from images, reducing human error, and speeding up assessments.
Others
There are a few other machine learning use cases in insurance fraud. How about adding image recognition to detect fraud at the personal identification stage? Or an insurance company can use an ML model to check medical receipts and bills to find links between a healthcare practitioner and a specific patient.
How an ML fraud detection system works
Before we move on to building an ML-based fraud detection model, let’s explore how it works. Imagine we want to check whether an insurance claim is fraudulent or not:
With ML-based claim fraud detection, an insurance company receives a risk score for each of its claims on a scale from 1 to 100. The higher the score, the higher the probability of fraud. The system then has to decide whether to block the claim, send it for further review, or allow it. This decision depends on the thresholds chosen in advance for each of these actions.
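The decision step can be sketched as a simple mapping from score to action. The two threshold values below are hypothetical; each insurer would pick its own based on how many false positives and missed fraud cases it can tolerate.

```python
# Hypothetical thresholds on the 1-100 risk score.
REVIEW_THRESHOLD = 40
BLOCK_THRESHOLD = 80


def route_claim(risk_score: int) -> str:
    """Map a risk score to one of three actions: allow, review, or block."""
    if not 1 <= risk_score <= 100:
        raise ValueError("risk score must be between 1 and 100")
    if risk_score >= BLOCK_THRESHOLD:
        return "block"
    if risk_score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"


for score in (12, 55, 93):
    print(score, route_claim(score))
# 12 allow
# 55 review
# 93 block
```

Lowering the review threshold sends more claims to human analysts, while raising the block threshold reduces the chance of rejecting a genuine customer automatically.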
If we want to improve these final results, we can take as much data as possible. For example, in healthcare insurance, we can use:
- Personal info: Age, gender, and location
- Claims data: Claims history, claims amount, minor vs. major claims
- Hospital-related info: Length of stay, admission reasons, hospital status
- Policy data: Plan type, direct vs. agency registration
The benefit of ML algorithms over rule-based systems is that machine learning can work with different types of data simultaneously. And more data, provided it is of good quality, generally improves the accuracy of the outcomes.
Building an insurance fraud detection machine learning model
Generally speaking, the ML process in fraud detection includes five big steps:
1. Input data
Data plays the most critical role in building an insurance claim fraud detection solution using machine learning. Most insurers would use historical datasets of their insurance claim info as a backbone of their data.
The quantity and quality of data dictate how accurate the outcomes of the model will be. Although the general rule is that the more data, the better, the insurer should still make sure that the data is of good quality.
Moreover, with supervised machine learning, a critical part of the data preparation process is dividing the data into valid and fraudulent claims and labeling them accordingly.
2. Create features
Features are the characteristics of claims that help separate fraudulent insurance activity from valid claims. To some extent, they are based on the same principles that fraud investigators rely on to make their decisions.
For example, the following features could be good indicators of insurance fraud:
- The date of the claim, e.g. when a claim is made shortly after the inception of the policy
- The claimant is or has become unemployed
- The documents provided by the claimant have inaccuracies, e.g. there are signs of alterations in dates, amounts, or descriptions
- The claimant left some questions in the claim unanswered, such as those about income or other insurance carried
- The claimant has made multiple insurance claims in the past
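The indicators above can be turned into numeric features that a model can consume. The following is a minimal sketch: the `RawClaim` fields and the feature names are assumptions made for this example, and the `documents_altered` flag is assumed to come from a separate document check.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RawClaim:
    policy_start: date
    claim_date: date
    employed: bool
    income_answer: Optional[str]           # None or "" means left unanswered
    other_insurance_answer: Optional[str]
    prior_claims: int
    documents_altered: bool                # assumed output of a document check


def build_features(c: RawClaim) -> dict[str, float]:
    """Convert one raw claim into a flat dictionary of numeric features."""
    answers = (c.income_answer, c.other_insurance_answer)
    unanswered = sum(a is None or not a.strip() for a in answers)
    return {
        "days_since_inception": float((c.claim_date - c.policy_start).days),
        "is_unemployed": float(not c.employed),
        "documents_altered": float(c.documents_altered),
        "unanswered_questions": float(unanswered),
        "prior_claims": float(c.prior_claims),
    }


claim = RawClaim(date(2024, 1, 1), date(2024, 1, 10), False, None, "none", 3, True)
features = build_features(claim)
print(features["days_since_inception"], features["unanswered_questions"])  # 9.0 1.0
```

Keeping every feature numeric makes the output usable by most of the common classification algorithms without further conversion.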
3. Choose the model
Different machine learning algorithms are used to build models in insurance fraud. In simple words, an ML algorithm is a set of rules to follow to solve complex problems, much like a mathematical equation or even a recipe. The idea is to use the insurer’s data, described by labels and features, and learn to draw conclusions, e.g. fraud vs. not fraud.
Here is a brief overview of the algorithms that are most popular for fraud detection in insurance claims:
- Logistic regression, which estimates the probability of a binary outcome (fraud vs. not fraud) from structured data. In fraud detection, it is typically applied to multiple variables and large datasets.
- Decision trees, which automate the creation of rules for classification and regression tasks. This algorithm has a tree structure and is, in essence, a set of rules learned from examples of insurance fraud.
- Random forest, which combines several decision trees to improve the performance of classification or regression. This technique smooths out the errors that could occur in a single decision tree and thus usually achieves better accuracy.
- Support Vector Machines (SVMs), which find a hyperplane that divides data into two categories with the widest possible margin. The algorithm is especially useful for working with complex multidimensional data.
- K-Nearest Neighbors (KNN), which classifies a record according to the labels of the most similar (nearest) data points.
- Neural networks and deep neural networks, which are suitable for modeling non-linear relations between records. They can learn and uncover patterns in a way that is, to some extent, similar to the human brain. The difference is that deep neural networks use more layers, which lets them capture more complex patterns but doesn’t by itself guarantee more accurate results.
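Of the algorithms listed above, KNN is the easiest to show in a few lines. The sketch below is a dependency-free illustration using only the standard library; the two features and all the data points are invented, and a production system would normally rely on an established ML library instead of hand-written code.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points."""
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy training set with two already scaled, hypothetical features per claim.
train_X = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.7), (0.9, 0.1), (0.8, 0.2), (0.85, 0.0)]
train_y = ["fraud", "fraud", "fraud", "valid", "valid", "valid"]

print(knn_predict(train_X, train_y, (0.2, 0.75)))  # fraud
print(knn_predict(train_X, train_y, (0.9, 0.15)))  # valid
```

Because KNN relies on distances, features measured on very different scales should be normalized first, otherwise the largest one dominates the vote.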
4. Train, evaluate, and fine-tune
When the algorithm is chosen, the learning part begins. First, an insurer can train the algorithm using historical data, a so-called training set. It’s important to have enough data to feed the model so it can learn the difference between fraudulent and valid claims and customer behavior better.
Patience and experimentation are required from ML engineers at this stage. At some point, the model needs to be tested on data it has not been trained on. The engineers show the model new insurance claims, and it has to classify them based on the valid and fraudulent claims it has seen before. Based on the results, the engineers tune parameters and improve the model.
This process should include as many iterations as needed so the fraud detection model provides the most accurate fraud score.
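One iteration of this train, evaluate, and tune loop might look like the sketch below. It reuses a small standard-library KNN classifier as a stand-in for whichever model is chosen, generates synthetic labeled claims (real ones would come from claim history), and tries several values of the hyperparameter `k` on a validation split. The data, the split sizes, and the candidate values are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training rows closest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, holdout, k):
    """Share of holdout rows that the model labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in holdout)
    return hits / len(holdout)


# Synthetic labeled claims: 1 = fraud, 0 = valid.
rng = random.Random(0)
data = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), 0) for _ in range(100)]
data += [((rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)), 1) for _ in range(100)]
rng.shuffle(data)

# Train on one part, tune on another part the model has not seen.
train, validation = data[:150], data[150:]

results = {k: accuracy(train, validation, k) for k in (1, 3, 5, 7)}
best_k = max(results, key=results.get)
print(best_k, results[best_k])
```

In a real project the same loop would run over many more parameters, and a third, untouched test set would be kept aside for the final check.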
5. Detect fraud
The final stage of building insurance fraud detection using ML is the actual prediction. This is when the insurer’s ML model is ready for practical application and can differentiate valid claims from fraudulent ones.
How can you tell that the model is working? Again, insurance companies have to feed the model new claims data that it hasn’t seen before but whose outcomes are already known, and compare its predictions with those outcomes. If the model works correctly, it’s ready for deployment in the insurer’s live environment.
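Comparing predictions with known outcomes usually goes beyond plain accuracy, because fraudulent claims tend to be a small minority and a model that flags nothing can still look accurate. A common choice is precision (how many flagged claims were really fraud) and recall (how much of the real fraud was flagged). The sketch below computes both; the two lists are invented sample data.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # fraud correctly flagged
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # valid claim wrongly flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # fraud that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical held-out claims with known outcomes and the model's predictions.
known_outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
model_output = [1, 1, 0, 0, 0, 1, 0, 0, 0, 1]

precision, recall = precision_recall(known_outcomes, model_output)
print(precision, recall)  # 0.75 0.75
```

Which of the two matters more depends on the business: low precision annoys genuine customers, while low recall lets fraud through.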
Final thoughts
To some extent, fraud detection resembles an arms race in which insurers keep finding new and more sophisticated ways to combat fraud. Meanwhile, their opponents, the fraudsters, build new scams and schemes as fast as they can to get past the insurer’s fraud detection system.
Machine learning is a game-changing technology that can bring fraud detection in your insurance company to a new level. Aside from automatic fraud detection, ML also delivers great speed, high accuracy, and deeper insight to insurers.
An ML-based fraud detection system also handles heavy workloads efficiently. Beyond fraud, by harnessing the power of machine learning, insurers can improve customer segmentation and predictive lead scoring, ultimately boosting marketing efficiency and business profitability.
Thinking about implementing fraud detection and analysis for insurance claims using machine learning? We at Intelliarts are ready to give your company a hand and build a fully-fledged ML solution for you.
FAQ
The answer here depends on your business model. In some cases, a rule-based system is enough for fraud detection, and an ML-powered model is only a complementary solution. In others, ML is a necessary upgrade to the insurer’s anti-fraud toolset to save costs and stay profitable.
A good way to find out whether you need ML is to start with the following questions:
- How do your fraud losses compare with the costs of advancing data analytics in your organization?
- Does fraud place a heavy burden on your current and future operations?
- Is fraud affecting your company’s reputation and/or customer experience?
Most insurance businesses, as well as insurtechs, have vast repositories of existing data. This could be historical claims, policy data, and more, all of which are well suited to building an ML model for fraud detection. Additionally, insurers usually have a steady stream of new claims and application info, which you can also use.
As for the amount of data, there is no exact quantity that is required to train an ML model. As mentioned, the more, the better. Still, you can follow a rule of thumb: you need at least ten times as many data instances as there are features. For example, a model with 20 features would call for at least 200 labeled claims.
Even if you don’t have enough data, no worries. Our experienced data scientists can help you with data collection. We understand that this is the most important step that will impact the final ML results, so we’re ready to assist wherever possible.