Maturity Tracking Algorithm
The insured maturity tracking algorithm measures how closely an insured in ClariNet matches published obituary information supplied by a third party. The matching algorithm is described below.
Significant ClariNet Fields
- Insured Name(s)
  - Insured details tab
  - Name from the contact set on the Insured Contact Details field
  - Insured name variations
- Insured Date of Birth
- Insured Date of Death
- Insured Addresses
  - All addresses from the contact set on the Insured Contact Details field
Fields in external feed
- Name components (e.g. first, middle, last)
- Recent address information (e.g. state, city)
- Date of Birth
- Date of Death
- Obituary/Obituaries text
  - Information only, not used for matching
Matching algorithm
Fields are considered in groups:
- Address
- Name
- Date of Birth
- Date of Death
The best (most similar) match in each group is taken when considering the strength of the match. For example, an Insured in ClariNet may have many name variations defined; the most similar name is the one used when calculating confidence.
Missing data points are skipped. For example, if the Insured in ClariNet only has a last name, the first/middle names in the external feed will not be considered. If the record in the external feed has no last name or maiden name, it is skipped entirely.
Fields are weighted so that more specific fields contribute a greater amount to the overall confidence score. That is, name is more specific than city, so it carries more weight in the confidence calculation.
The overall confidence score measures how close the data is to a perfect match. For example, if we have names and DoB, then a perfect match is the sum of the weights of those two fields. If the fields don’t match perfectly, the confidence is the sum of the weighted field matches divided by the “perfect match” value. This is the total similarity. Using the two-field example:
total similarity = (name weight × name similarity + DoB weight × DoB similarity) / (name weight + DoB weight)
That is to say:
total similarity = actual match score / perfect match score
Field weightings are controlled by ClearLife and have been calculated to slightly favour false positives over missing a potential match. In practice, this means that names are weighted much more strongly than fields such as address, which may be out of date.
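The confidence calculation described above can be sketched as follows. This is a minimal illustration, not the ClariNet implementation: the `WEIGHTS` values and the `total_similarity` function are hypothetical, since the real weightings are set by ClearLife and are not published here.

```python
from typing import Dict, Optional

# Hypothetical weights, for illustration only; the real values are set by ClearLife.
WEIGHTS: Dict[str, float] = {"name": 5.0, "dob": 3.0, "dod": 3.0, "address": 1.0}


def total_similarity(field_scores: Dict[str, Optional[float]]) -> float:
    """Return the weighted match score divided by the perfect-match score.

    field_scores maps a field group to its best similarity in [0, 1],
    or to None when the data point is missing and must be skipped.
    """
    actual = 0.0
    perfect = 0.0
    for field, score in field_scores.items():
        if score is None:
            continue  # missing data points count neither for nor against
        weight = WEIGHTS[field]
        actual += weight * score
        perfect += weight  # a perfect match scores 1.0 on every field present
    return actual / perfect if perfect else 0.0


# Two-field example: exact name match, half-similar DoB, nothing else available.
print(total_similarity({"name": 1.0, "dob": 0.5, "dod": None, "address": None}))
# (5.0 * 1.0 + 3.0 * 0.5) / (5.0 + 3.0) = 0.8125
```

Because skipped fields are left out of both the numerator and the denominator, a record with little data can still reach a total similarity of 1.0 on the fields it does have.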
Address match
Address components are matched individually. For each component, a similarity score between the ClariNet value and the external feed value is calculated. This tolerates data entry mistakes (for example, “Columbs” instead of “Columbus”) without producing too many false positive matches.
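The document does not say which similarity measure is used. As a sketch only, the example below assumes a generic string-similarity ratio from Python's standard `difflib` module; `component_similarity` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def component_similarity(ours: str, theirs: str) -> float:
    """Similarity in [0, 1] between two address components.

    Case and surrounding whitespace are ignored before comparing.
    """
    return SequenceMatcher(None, ours.strip().lower(), theirs.strip().lower()).ratio()


# A one-letter typo stays close to a perfect match...
typo = component_similarity("Columbus", "Columbs")
# ...while a different city scores far lower.
other = component_similarity("Columbus", "Cleveland")
print(round(typo, 3), round(other, 3))
```

A graded score like this lets a near miss contribute most of the field's weight, whereas an exact-equality check would treat “Columbs” as a complete mismatch.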
Name match
Name matching follows a similar rule to addresses. However, common patterns are also tried; for example, the middle name is also tried as the first name.
The individual components of the name are not compared separately. Instead, a full name is generated from the components and the full names are compared.
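A minimal sketch of this approach is shown below. It assumes the same `difflib` similarity ratio as a stand-in for the real measure, and it implements only the one pattern named above (middle name tried as first name). All function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Optional


def full_name(first: Optional[str], middle: Optional[str], last: str) -> str:
    """Join the available name components into one normalised string."""
    parts = (first, middle, last)
    return " ".join(p.strip().lower() for p in parts if p and p.strip())


def feed_name_candidates(first: Optional[str], middle: Optional[str], last: str) -> List[str]:
    """Full names to try for a feed record: as given, plus middle name as first name."""
    candidates = [full_name(first, middle, last)]
    if middle:
        candidates.append(full_name(middle, None, last))
    return candidates


def best_name_similarity(insured_names: Iterable[str], feed_candidates: Iterable[str]) -> float:
    """Best similarity over every pairing of ClariNet name variation and feed candidate."""
    feed_list = list(feed_candidates)
    return max(
        (
            SequenceMatcher(None, ours.strip().lower(), theirs).ratio()
            for ours in insured_names
            for theirs in feed_list
        ),
        default=0.0,  # no names to compare
    )


# The insured is recorded as "Robert Smith" with the variation "Bob Smith";
# the feed lists first name "James", middle name "Robert", last name "Smith".
candidates = feed_name_candidates("James", "Robert", "Smith")
print(best_name_similarity(["Robert Smith", "Bob Smith"], candidates))  # 1.0
```

Taking the maximum over all pairings mirrors the rule that the best match in each group is the one used for the confidence score.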
Date of Birth/Death match
Dates in the external feed may have limited information:
- Year only
- Year and Month only
- Full date
Dates with limited information are considered with half the weight of a full date match.
The number of days between the two dates is considered to score the similarity. Zero days difference constitutes a perfect match.
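The sketch below illustrates one way the date rules could work. Several details are assumptions rather than documented behaviour: a partial feed date is treated as the range of days it could represent, the similarity falls off linearly with the number of days apart, and `MAX_DAYS` is a hypothetical cut-off. Only the half weight for limited dates and the zero-day perfect match come from the description above.

```python
import calendar
from datetime import date
from typing import Optional, Tuple

MAX_DAYS = 365  # hypothetical cut-off: differences this large or larger score 0


def feed_date_range(year: int, month: Optional[int] = None,
                    day: Optional[int] = None) -> Tuple[date, date, bool]:
    """Return (earliest, latest, is_full) for a possibly partial feed date."""
    if month is None:  # year only
        return date(year, 1, 1), date(year, 12, 31), False
    if day is None:  # year and month only
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day), False
    full = date(year, month, day)
    return full, full, True


def date_match(ours: date, year: int, month: Optional[int] = None,
               day: Optional[int] = None) -> Tuple[float, float]:
    """Return (similarity, weight_factor) for a ClariNet date against a feed date."""
    earliest, latest, is_full = feed_date_range(year, month, day)
    if ours < earliest:
        days_apart = (earliest - ours).days
    elif ours > latest:
        days_apart = (ours - latest).days
    else:
        days_apart = 0  # zero days difference is a perfect match
    similarity = max(0.0, 1.0 - days_apart / MAX_DAYS)
    # Dates with limited information carry half the weight of a full date.
    return similarity, (1.0 if is_full else 0.5)


print(date_match(date(1940, 3, 15), 1940, 3, 15))  # (1.0, 1.0)
print(date_match(date(1940, 3, 15), 1940))         # (1.0, 0.5)
```

The returned weight factor would be multiplied by the field's configured weight before the total similarity is calculated.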
Match rank/confidence
Once the total similarity is calculated (the ratio of actual match score to perfect match score), any score below a configured minimum value is rejected (rank 0). Anything meeting the minimum similarity is assigned a rank of 1 to 4, labelled in order as follows:
- Low
- Medium
- Medium
- High
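The ranking step can be sketched as a simple threshold lookup. The threshold values below are hypothetical, as is the "Rejected" label for rank 0; the real minimum and band boundaries are configured in the product. The labels for ranks 1 to 4 are assumed to follow the order of the list above.

```python
from bisect import bisect_right

# Hypothetical thresholds, for illustration only; the real values are configured.
MIN_SIMILARITY = 0.60
RANK_FLOORS = [0.60, 0.70, 0.80, 0.90]  # lowest similarity for ranks 1, 2, 3, 4
RANK_LABELS = {0: "Rejected", 1: "Low", 2: "Medium", 3: "Medium", 4: "High"}


def rank_for(similarity: float) -> int:
    """Map a total similarity in [0, 1] to a rank from 0 (rejected) to 4."""
    if similarity < MIN_SIMILARITY:
        return 0
    # Count how many rank floors the similarity has reached.
    return bisect_right(RANK_FLOORS, similarity)


for score in (0.50, 0.65, 0.85, 1.00):
    rank = rank_for(score)
    print(score, rank, RANK_LABELS[rank])
```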
Visually, that looks like this:
