Predictive Lead Scoring

Business Challenge: Our client, a midsize insurer specializing in medical insurance, had a conventional lead processing system that handled leads from diverse sources, such as telemarketing, websites, and partnerships. The existing workflow involved following up with leads through cold calls and offering insurance policies. However, managing such a wide variety of leads was challenging and required significant investment in resources, including hiring, training, and paying agents.

Therefore, the client approached Predera with a goal to optimize their resources and develop an efficient lead scoring system using machine learning. They wanted to enhance their lead processing system and identify leads that were more likely to convert into customers, thereby reducing the burden on their agents and minimizing expenses.

The client had observed that not all leads were of equal value. An agent could spend up to three hours cold calling potential leads and sell only one policy, yet on another day the same agent could sell two or three policies within just 30 minutes of calling. The client wanted a scoring process that would identify the leads most likely to convert into customers and reduce the time and effort spent on leads that were less likely to convert.


ML Powered Lead Scoring Solution

After consulting with the insurance company, we decided to develop a predictive lead scoring model that forecasts the likelihood of a lead converting into a customer and ultimately purchasing a policy. This enables insurance agents to prioritize leads with high scores, saving time and resources.

To build the solution we took the following steps:

  • We began by collecting data from various sources, including the lead source, lead behavior, historical quotes, demographics, policy information, and contact data.
  • Our data engineers opted for gradient boosting because the data is tabular, has numerous categorical features, and the target is binary (converted vs. not converted, or sold vs. not sold).
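
As a rough illustration of why gradient boosting suits this kind of problem, here is a minimal sketch of boosting decision stumps on the log-loss gradient for a binary target. It is not the model built for the client: the field names (`source`, `prior_quotes`), the toy data, and the hyperparameters are all assumptions made up for this example, and a production system would more likely rely on an established gradient boosting library than on hand-written code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_hot(rows, column):
    """Replace one categorical column with 0/1 indicator columns."""
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        out = {k: float(v) for k, v in row.items() if k != column}
        for level in levels:
            out[f"{column}={level}"] = 1.0 if row[column] == level else 0.0
        encoded.append(out)
    return encoded

def fit_stump(X, residuals):
    """Pick the one-feature split whose two leaf means best fit the residuals."""
    best = None
    for feature in X[0]:
        # Dropping the largest value keeps both sides of every split non-empty.
        for thr in sorted({row[feature] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[feature] <= thr]
            right = [r for row, r in zip(X, residuals) if row[feature] > thr]
            l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((r - l_mean) ** 2 for r in left)
                   + sum((r - r_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, feature, thr, l_mean, r_mean)
    return best[1:]

def stump_value(stump, row):
    feature, thr, l_mean, r_mean = stump
    return l_mean if row[feature] <= thr else r_mean

def fit_gbm(X, y, rounds=20, learning_rate=0.5):
    """Boost stumps on the log-loss gradient (y - p) for a binary target."""
    base_rate = sum(y) / len(y)
    f0 = math.log(base_rate / (1.0 - base_rate))  # prior log-odds
    scores = [f0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_value(stump, row)
                  for s, row in zip(scores, X)]
    return f0, learning_rate, stumps

def predict_proba(model, row):
    f0, learning_rate, stumps = model
    return sigmoid(f0 + learning_rate * sum(stump_value(s, row) for s in stumps))

# Toy leads with made-up values, for illustration only.
raw_leads = [
    {"source": "web", "prior_quotes": 3},
    {"source": "web", "prior_quotes": 2},
    {"source": "partner", "prior_quotes": 2},
    {"source": "web", "prior_quotes": 0},
    {"source": "telemarketing", "prior_quotes": 0},
    {"source": "telemarketing", "prior_quotes": 1},
    {"source": "partner", "prior_quotes": 0},
    {"source": "telemarketing", "prior_quotes": 3},
]
converted = [1, 1, 1, 0, 0, 0, 0, 1]

X = one_hot(raw_leads, "source")
model = fit_gbm(X, converted)
lead_scores = [predict_proba(model, row) for row in X]
```

Each value in `lead_scores` is a probability between 0 and 1, which is the form of output agents would use to rank leads. The one-hot step is only one way to handle categorical features and is used here to keep the sketch self-contained.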

During this stage, we encountered issues with:

  • Missing data.
  • Historical changes, such as modified business processes, discontinued partnerships, and the addition of new policies.

To address these problems, our engineers developed a Python module that contained various blocks, such as:

  • Feature extraction from text and time data
  • Grouping of similar records to avoid model uncertainty
  • Cleaning of data, including replacing missing values and removing anomalies and outliers

This approach enabled our data scientists to overcome data challenges and gain insights that improved the data collection logic and strengthened the ML lead scoring solution.
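
The cleaning blocks listed above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual module: the function names, the `age` column, the median imputation, and the 1.5 × IQR outlier rule are all choices made for this example, and the source does not say which techniques were used.

```python
from datetime import datetime
from statistics import median, quantiles

def time_features(timestamp):
    """Derive simple calendar features from an ISO-8601 timestamp string."""
    ts = datetime.fromisoformat(timestamp)
    return {"hour": ts.hour, "weekday": ts.weekday(), "is_weekend": ts.weekday() >= 5}

def impute_median(rows, column):
    """Fill missing (None) values in `column` with the median of the known values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def drop_outliers(rows, column, k=1.5):
    """Drop rows whose value lies more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles([r[column] for r in rows], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in rows if low <= r[column] <= high]

# Hypothetical demographic column with one missing value and one anomaly (230).
leads = [{"age": a} for a in (34, None, 41, 29, 38, 45, 52, 31, 230)]
cleaned = drop_outliers(impute_median(leads, "age"), "age")

# Hypothetical lead creation time turned into model features.
features = time_features("2023-03-04T14:30:00")
```

Imputing before removing outliers keeps every record in play until the final filter. Whether that order is right depends on the data, since an extreme value can shift the statistic used to fill the gaps.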

ML-Based Lead Efficiency Solution:

Our second solution aimed to improve lead efficiency using the same features as before but a different target variable. This time, we predicted a continuous score: the number of policies sold per unit of call time that insurance agents spent on the lead. However, the call duration data was messy, with answering machine records and poor aggregation.
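
To make the target concrete, the sketch below computes an efficiency score per lead as policies sold divided by hours of agent talk time, skipping answering machine calls and summing the remaining calls per lead. It assumes that "by the call time" means division, and the record layout (`lead_id`, `duration_s`, `answering_machine`) is hypothetical.

```python
from collections import defaultdict

def lead_efficiency(calls, policies_sold):
    """Return policies sold per hour of real talk time for each lead."""
    talk_seconds = defaultdict(float)
    for call in calls:
        if call["answering_machine"]:
            continue  # no agent conversation took place, so ignore the duration
        talk_seconds[call["lead_id"]] += call["duration_s"]

    scores = {}
    for lead_id, seconds in talk_seconds.items():
        if seconds <= 0:
            continue  # avoid dividing by zero on empty or corrupt durations
        scores[lead_id] = policies_sold.get(lead_id, 0) / (seconds / 3600)
    return scores

# Made-up call log for illustration only.
calls = [
    {"lead_id": "A", "duration_s": 900, "answering_machine": False},
    {"lead_id": "A", "duration_s": 900, "answering_machine": False},
    {"lead_id": "B", "duration_s": 3600, "answering_machine": False},
    {"lead_id": "B", "duration_s": 600, "answering_machine": True},
    {"lead_id": "C", "duration_s": 45, "answering_machine": True},
]
policies_sold = {"A": 1}

efficiency = lead_efficiency(calls, policies_sold)
```

Lead C never reached a person, so it gets no score at all rather than a misleading zero. Lead B has real talk time but no sale, so its score is zero, which is the situation that makes this target heavily imbalanced.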

To solve this problem, we:

  • Used the Python module for data cleaning.
  • Designed appropriate training, validation, and testing datasets for the lead efficiency model.
  • Trained only on converted leads, which addressed the imbalanced data in which most scores equaled zero.

We encountered another challenge in testing and validation. We couldn't restrict these datasets to converted leads, because at prediction time it is not yet known whether a lead will be sold. To address this, we used the lead scoring model as a filter and evaluated the lead efficiency model only on leads that it scored highly, which reduced the number of non-sold leads and produced better business metrics.
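
The gating idea can be illustrated with a small sketch. It assumes each held-out lead already carries a score from the lead scoring model plus predicted and actual efficiency values. The field names, the 0.8 cutoff, and the use of mean absolute error as the metric are assumptions for this example, not details from the project.

```python
def gate_by_lead_score(leads, threshold=0.8):
    """Keep only leads that the lead scoring model rates at or above `threshold`."""
    return [lead for lead in leads if lead["lead_score"] >= threshold]

def non_sold_share(leads):
    """Fraction of leads that never resulted in a sale (actual efficiency of zero)."""
    return sum(1 for lead in leads if lead["actual_efficiency"] == 0) / len(leads)

def mean_absolute_error(leads):
    """Average gap between predicted and actual efficiency."""
    return sum(abs(lead["predicted_efficiency"] - lead["actual_efficiency"])
               for lead in leads) / len(leads)

# Made-up held-out leads for illustration only.
holdout = [
    {"lead_score": 0.90, "predicted_efficiency": 2.0, "actual_efficiency": 1.5},
    {"lead_score": 0.85, "predicted_efficiency": 1.0, "actual_efficiency": 1.0},
    {"lead_score": 0.30, "predicted_efficiency": 0.5, "actual_efficiency": 0.0},
    {"lead_score": 0.10, "predicted_efficiency": 0.25, "actual_efficiency": 0.0},
]

gated = gate_by_lead_score(holdout)
share_before = non_sold_share(holdout)
share_after = non_sold_share(gated)
gated_error = mean_absolute_error(gated)
```

The filter uses only information available at prediction time, namely the lead score, so the evaluation stays honest while still resembling the converted-only population the efficiency model was trained on.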

Business Value:

  • We developed a highly accurate ML-based lead scoring model that allows the insurance company to group leads by their probability of being sold.
  • The top group with 80% or higher probability has an expected conversion rate 3.5 times higher than average, while the bottom group with 20% or lower probability has a conversion rate 5 times lower than average.
  • This scoring system eliminates guesswork and prevents wasting resources on leads that are unlikely to convert. The ML model also predicts lead quality, which is essential for insurance companies, as it affects future sales success.
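
Figures such as "3.5 times higher than average" are usually read as lift: the conversion rate within a score band divided by the overall conversion rate. The sketch below shows that calculation on made-up data. The band boundaries follow the 80% and 20% thresholds mentioned above, and the numbers it produces are purely illustrative and unrelated to the client's results.

```python
def band(score):
    """Assign a lead to a score band using the 80% and 20% thresholds."""
    if score >= 0.8:
        return "top"
    if score <= 0.2:
        return "bottom"
    return "middle"

def lift_by_band(scored_leads):
    """Return each band's conversion rate divided by the overall conversion rate."""
    overall = sum(converted for _, converted in scored_leads) / len(scored_leads)
    totals = {}
    for score, converted in scored_leads:
        count, wins = totals.get(band(score), (0, 0))
        totals[band(score)] = (count + 1, wins + converted)
    return {name: (wins / count) / overall for name, (count, wins) in totals.items()}

# (predicted probability, converted) pairs with made-up values.
scored_leads = [
    (0.95, 1), (0.90, 1), (0.85, 1),
    (0.60, 1), (0.50, 1), (0.40, 0),
    (0.15, 0), (0.10, 0), (0.05, 0), (0.02, 0),
]

lift = lift_by_band(scored_leads)
```

Under this reading, a lift of 3.5 in the top band would match "3.5 times higher than average", and a lift of 0.2 in the bottom band would match "5 times lower than average".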

By using our models, agents save time and focus on leads with high conversion potential, optimizing sales workflow and performance. We also deployed the models using AIQ, Predera's proprietary MLOps platform, and integrated them into the customer's system, sparing the client any integration challenges. We provide maintenance support and are available to fine-tune or retrain the models if needed.