Data and insurance fraud

Alan Lee, August 4, 2015

Insurance is a big industry. In the U.S. alone, 7,000 companies collect over $1 trillion in premiums a year, which makes the market extremely attractive for fraud. The FBI and the NICB (National Insurance Crime Bureau) estimate that roughly 10 percent of all non-health insurance claims, totaling $40 billion a year, are very likely fraudulent. Even worse, health-related insurance fraud likely costs the country upwards of $100 billion a year. With numbers like these, it’s no wonder that any type of analytics that helps identify fraud can affect the bottom line for insurers in a big way.

Unsurprisingly, insurance is also an industry with a wealth of data, thanks to the workflow of claims processing. Far more data is collected on a claim than on most transactions, because adjusters must do at least a cursory investigation of costly claims, and because a multitude of vendors repair or provide services to the insured (e.g., car rentals, mechanics, doctors). Compare this to a typical financial ACH transaction, where the only real data is the transaction itself, and you can see why insurance fraud is rich in information. The downside, however, is that the data in an insurance company is spread out over a significantly larger and older infrastructure than is typical in financial or other institutions. Insurance companies have historically been slow to implement new technologies, due to the conservative and regulated nature of insurance, as well as the sheer risk of adopting new processes in an industry of that size. Even so, there is no reason why the less-regulated portions of insurance, like fraud detection, should not adopt the latest technologies to solve problems that have traditionally been difficult.

Insurance fraud is predominantly about relationships. Because of the claims vetting process, it is difficult for professional fraudsters to commit significant amounts of fraud without a well-built network. Fraudsters need access to trusted doctors who will give inflated liability estimates, lawyers who will apply pressure on the insurance companies, witnesses who will verify a staged accident, and so on. This network is not an easy thing to build from scratch, so the contacts are often recycled across multiple lines of business and carriers, and identifying these connections strains the traditional infrastructure that insurers use to store their data. Take, for example, a lawyer who engages in a specific type of fraud only for serious-injury car accidents, while keeping a homeowners business legitimate. This lawyer also works with a specific doctor who overstates injuries to increase the claim liability for a small portion of the most lucrative claims. The doctor may or may not be working with other unscrupulous lawyers who follow the same practice. To make matters worse, these two vendors often try to obfuscate their identities by using slightly different names or tax identification numbers, so that a cursory investigation turns up several different identities in the system that really belong to the same individuals and cannot easily be resolved into one. The key to detecting these cases is scrutinizing the relationships between each person and everyone else in the data. In my next post, I’ll discuss one way to use data to detect fraud in the property/casualty insurance industry.
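To make the identity problem described above a little more concrete, here is a minimal Python sketch. It is only an illustration and not necessarily the approach covered in the follow-up post. It assumes vendor records can be merged when they share a tax ID or have very similar normalized names, and it then counts how often the resolved vendors appear together on claims. Every record, name, tax ID, claim, and the 0.8 similarity threshold below is hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical vendor records: (record_id, name, tax_id). All values are invented.
VENDORS = [
    ("v1", "John A. Smith, Esq.", "12-3456789"),
    ("v2", "Jon Smith Esq", "12-3456798"),   # near-duplicate name, altered tax ID
    ("v3", "Dr. Maria Lopez", "98-7654321"),
    ("v4", "Maria Lopes MD", "98-7654321"),  # altered name, same tax ID
    ("v5", "Acme Auto Body", "55-0000001"),
]

# Hypothetical claims: claim_id -> vendor record ids that appear on the claim.
CLAIMS = {
    "c1": ["v1", "v3"],
    "c2": ["v2", "v4"],
    "c3": ["v1", "v4", "v5"],
    "c4": ["v5"],
}


def normalize(name):
    """Lowercase a name, drop punctuation and digits, collapse whitespace."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha() or ch.isspace())
    return " ".join(cleaned.split())


def similar(a, b, threshold=0.8):
    """True when two strings are close under difflib's ratio (assumed cutoff)."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def resolve(vendors, threshold=0.8):
    """Map each record id to a canonical id using a small union-find."""
    parent = {record_id: record_id for record_id, _, _ in vendors}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (id_a, name_a, tax_a), (id_b, name_b, tax_b) in combinations(vendors, 2):
        same_tax = tax_a == tax_b
        if same_tax or similar(normalize(name_a), normalize(name_b), threshold):
            parent[find(id_b)] = find(id_a)
    return {record_id: find(record_id) for record_id in parent}


def pair_counts(claims, canonical):
    """Count how many claims each pair of resolved vendors shares."""
    counts = Counter()
    for vendor_ids in claims.values():
        entities = sorted({canonical[v] for v in vendor_ids})
        counts.update(combinations(entities, 2))
    return counts


canonical = resolve(VENDORS)
for pair, n in pair_counts(CLAIMS, canonical).most_common():
    print(pair, n)
```

With this toy data, no pair of raw records shares more than one claim, but after resolution the lawyer and the doctor turn out to share three. A real system would need far more careful matching (addresses, phone numbers, license numbers) and manual review before treating any such pattern as evidence of fraud.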