Recognizing and Resolving Bad Data

August 2024

Pension system longevity, while great actuarially, poses unique technological challenges. For most of the last century, technology changed slowly enough that good procedures and gradual tech adjustments kept most systems current. Today, pension systems need to adapt to the rapid pace of changing technology. For data, this includes the volume of data stored, storage methods and especially data security methods. Because pension systems are designed to last scores of years, clean, well-managed data is paramount to smoothing the multiple upgrades and transitions that systems will experience.

This article will help you look at your data differently and think about how you can maintain its quality, completeness and usefulness over the long life of your system.

As you look at your data, assess its compatibility with your business rules. Often the rules were enforced procedurally rather than through input validation, leaving behind a trail of inconsistent data that the system originally accepted. Running that data through validation-enforced business rules produces error reports that need review and correction. Imagine an error report listing every account that names a spouse beneficiary without a birth date (making benefit estimates impossible). Depending on the size of your membership and the age of your system, these reports could contain thousands of errors for each business rule.
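To make the spouse beneficiary example concrete, here is a minimal sketch of that business rule written as a validation that produces an error report. The Beneficiary record, its field names and the sample accounts are assumptions made for illustration and do not come from any particular pension solution.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Beneficiary:
    # Hypothetical record layout; a real system will have its own field names.
    account_id: str
    name: str
    relationship: str              # e.g. "spouse", "child"
    birth_date: Optional[date] = None


def spouse_missing_birth_date(beneficiaries):
    """Return beneficiaries that break the rule: a spouse must have a birth date."""
    return [
        b for b in beneficiaries
        if b.relationship.strip().lower() == "spouse" and b.birth_date is None
    ]


# Made-up sample data for the sketch.
records = [
    Beneficiary("A-100", "Pat Rivera", "Spouse", date(1961, 4, 2)),
    Beneficiary("A-101", "Sam Chen", "spouse"),
    Beneficiary("A-102", "Lee Chen", "child"),
]

error_report = spouse_missing_birth_date(records)
for b in error_report:
    print(f"{b.account_id}: spouse beneficiary {b.name!r} has no birth date")
```

Run against legacy data, a rule like this shows how large the cleanup effort is before the same check is enforced at data entry.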

In our pension solution implementations, we see three primary types of bad data.

  1. Incomplete data
  2. Inaccurate data, including faked data and unnecessary data
  3. Missing data, including rogue or fugitive data

Incomplete Data

A common example of incomplete data happens during a technology transition when the pension system chooses not to convert detailed service and contribution data. Instead, only the totals effective on the date of the transition are converted. Members with these summary records will reach retirement age with essentially unprovable amounts of service and contributions. This creates potential liability risks and poses a challenge for how to treat these records during your next technology transition.

Resolution: Today’s pension administration system (PAS) solutions can store and maintain detailed data records more simply and less expensively than ever before. As you begin a technology transition, rely on an experienced data architect to map all your data into a logical, accessible solution. The effort you put in to maintain detailed records will pay off in the long run.

Inaccurate Data

Deferred and inactive members are probably the best example of inaccurate data. Members leave service, then move, change their names or pass away. According to the US Census Bureau (USCB), 12.8% of Americans move annually and the national marriage rate is 16.7 per 1,000 women age 15 and older (2022; USCB American Community Survey). Those statistics make data maintenance for this group a common challenge.

Resolution: A robust campaign to stay connected with inactive members is necessary, especially when a benefit or refund is due. Continue to include these members in your annual member statement messages and reach out regularly by email to make certain you have current contact information. One emerging contact method for pension systems is texting, because people increasingly keep their cell phone numbers regardless of where they live or work.

A subset of inaccurate data is faked data. This data often gets created when information isn’t immediately available and the (older) solution allowed blank fields or creative entries. Faked data almost never complies with business rules and can interfere with implementing automated logical validations.

Here are some common examples.

  • Multiple individuals with the same fake SSN (e.g., 111-11-1111)
  • Member and spouse have the same SSN
  • Forced dates of birth (1/1/1900)
  • ZIP codes as 00000
  • Placeholders for names (e.g., John Doe)

Resolution: Queries that identify faked data are reasonably easy to write. Once you’ve run the reports, prioritize them and plan resources to research and correct the accounts.
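As a rough sketch of what such queries might look like, the Python below flags each of the patterns listed above. The record layout (member_id, name, ssn, spouse_ssn, birth_date, zip), the placeholder lists and the sample members are all assumptions for illustration; a real system would run equivalent checks against its own tables.

```python
from collections import Counter
from datetime import date

# Assumed placeholder values; extend these with whatever your own data shows.
PLACEHOLDER_NAMES = {"john doe", "jane doe"}
FORCED_BIRTH_DATES = {date(1900, 1, 1)}


def is_repeated_digit_ssn(ssn):
    """True for SSNs made of one repeated digit, such as 111-11-1111."""
    digits = ssn.replace("-", "")
    return len(digits) == 9 and len(set(digits)) == 1


def find_faked_data(members):
    """Return {member_id: [reasons]} for records that look faked."""
    ssn_counts = Counter(m["ssn"] for m in members if m.get("ssn"))
    findings = {}
    for m in members:
        reasons = []
        ssn = m.get("ssn") or ""
        if ssn and is_repeated_digit_ssn(ssn):
            reasons.append("placeholder SSN")
        if ssn and ssn_counts[ssn] > 1:
            reasons.append("SSN shared by multiple members")
        if ssn and ssn == m.get("spouse_ssn"):
            reasons.append("member and spouse share an SSN")
        if m.get("birth_date") in FORCED_BIRTH_DATES:
            reasons.append("forced date of birth")
        if m.get("zip") == "00000":
            reasons.append("placeholder ZIP code")
        if (m.get("name") or "").strip().lower() in PLACEHOLDER_NAMES:
            reasons.append("placeholder name")
        if reasons:
            findings[m["member_id"]] = reasons
    return findings


# Made-up sample records for the sketch.
members = [
    {"member_id": 1, "name": "Maria Lopez", "ssn": "123-45-6789",
     "spouse_ssn": "987-65-4321", "birth_date": date(1958, 3, 14), "zip": "48933"},
    {"member_id": 2, "name": "John Doe", "ssn": "111-11-1111",
     "spouse_ssn": None, "birth_date": date(1900, 1, 1), "zip": "00000"},
    {"member_id": 3, "name": "Ann Baker", "ssn": "111-11-1111",
     "spouse_ssn": "111-11-1111", "birth_date": date(1970, 7, 9), "zip": "48823"},
]

findings = find_faked_data(members)
for member_id, reasons in findings.items():
    print(member_id, "; ".join(reasons))
```

Counting the findings by reason is one simple way to prioritize the reports and estimate the research effort for each.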

Another type of inaccurate data is older, unnecessary data. This can often be status fields (e.g., refunded, vested, returned-to-work, etc.) that were used to trigger workflows or other activities. Unnecessary data can also be total fields, like service credit or contribution totals. In newer PAS solutions, this information is usually calculated using the most current member data, so pulling the old status fields or total fields forward is unnecessary. Maintaining these fields results in duplicate sources for the same information, creating both confusion and risk.
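The risk of duplicate sources can be illustrated with a small sketch that recalculates a service credit total from detail records and compares it with a stored total field. The member IDs, field names and credit values are hypothetical, and the comparison is only one possible way to surface disagreements before a conversion.

```python
from decimal import Decimal


def calculated_service_credit(service_records):
    """Sum service credit from the detailed records for one member."""
    return sum((r["credit"] for r in service_records), Decimal("0"))


def find_total_mismatches(stored_totals, detail_by_member):
    """Return {member_id: (stored, calculated)} where the two sources disagree."""
    mismatches = {}
    for member_id, stored in stored_totals.items():
        calculated = calculated_service_credit(detail_by_member.get(member_id, []))
        if stored != calculated:
            mismatches[member_id] = (stored, calculated)
    return mismatches


# Made-up detail records and legacy total fields for the sketch.
detail_by_member = {
    "M-1": [{"year": 2021, "credit": Decimal("1.00")},
            {"year": 2022, "credit": Decimal("0.75")}],
    "M-2": [{"year": 2022, "credit": Decimal("1.00")}],
}
stored_totals = {"M-1": Decimal("1.75"), "M-2": Decimal("1.50")}

mismatches = find_total_mismatches(stored_totals, detail_by_member)
for member_id, (stored, calculated) in mismatches.items():
    print(f"{member_id}: stored total {stored} vs calculated {calculated}")
```

Decimal is used here instead of floating point so that fractional service credit compares exactly.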

Resolution: The temptation to keep data fields that you have relied upon for a very long time is great. Your data architect will give good guidance on why some fields are unnecessary and how maintaining them can complicate your new PAS database.

Missing Data

This data simply doesn’t exist, and it is usually data you wish you had. For example, many systems need to contact named beneficiaries when a death is reported but the data has only a name and a relationship. There are myriad examples of missing data.

  • You want to start a texting campaign but don’t have identifiable mobile numbers or a field to store permission-to-text.
  • You want to email retirees but don’t have post-employment email addresses.
  • You want to connect retirees to senior services in their counties but don’t have county data.

Resolution: Expanding the dataset to include the data you wish you had is the only way to address this. It means working with your data architect to ensure your database is flexible enough to add fields as you identify new avenues for reaching your members and the data needed to support those avenues.
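Before expanding the dataset, it can help to measure how much of the wished-for data already exists. The sketch below reports, for each field, the share of records that hold a value; the field names (mobile_phone, personal_email, county, text_permission) and the sample retirees are assumptions for illustration only.

```python
def field_coverage(records, fields):
    """Return {field: fraction of records with a non-empty value}."""
    total = len(records)
    coverage = {}
    for field in fields:
        populated = sum(
            1 for r in records
            if r.get(field) not in (None, "")
        )
        coverage[field] = populated / total if total else 0.0
    return coverage


# Made-up retiree records; "text_permission" is deliberately absent to
# represent a field the current database has no place to store.
retirees = [
    {"id": 1, "mobile_phone": "517-555-0101", "personal_email": "", "county": "Ingham"},
    {"id": 2, "mobile_phone": None, "personal_email": "r2@example.com", "county": None},
    {"id": 3, "mobile_phone": None, "personal_email": None, "county": "Eaton"},
    {"id": 4, "mobile_phone": "517-555-0102", "personal_email": None, "county": "Ingham"},
]

coverage = field_coverage(
    retirees, ["mobile_phone", "personal_email", "county", "text_permission"]
)
for field, share in coverage.items():
    print(f"{field}: {share:.0%} populated")
```

A field with zero coverage, or one that does not exist at all, is a candidate for the new fields you discuss with your data architect.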

While technically not missing, almost every system has data that is hard to access, sometimes called rogue or fugitive data. Often the biggest problem with this type of data is just finding it all. It’s in spreadsheets, old Access databases, on microfiche, stored offsite and likely not digitized.

Resolution: Digitized data can be cleaned up and moved into the core PAS database with relative ease. For non-digitized data, it’s important to weigh the value and accuracy of the data to determine whether it is worth converting. Some systems convert this data manually only when it is needed to support a member’s account.

While serving at Michigan ORS, we made it a practice to run analyses to identify bad data. We searched for data that was incorrect, wasn’t useful because of old database limitations, or had bad formats because of practices that became outdated. This regular exercise helped us keep our data clean and identify validations that would keep the data clean going forward. This activity is important for all pension systems, especially before you face your next technology transition.

Written by Laurie Mitchell, Senior Business Consultant

Tegrit

© Tegrit Software Ventures, Inc. All Rights Reserved.
