With an aging system, slowed down by exponential growth in data and made worse by the pandemic, the California Healthcare Eligibility, Enrollment, and Retention System (CalHEERS) faced significant technical challenges related to data storage and processing, which ultimately impacted the program’s ability to report progress to the executives at Covered California. This summer, the system completed a major upgrade to its data warehouse (DWH) and created a health exchange analytics program to help Californians better understand their options when choosing health insurance.
Because CalHEERS supports a state-based health insurance marketplace, it must effectively store and manage nine-plus years of historical data, totaling roughly 150 terabytes.
As announced at the Data and AI Summit in July 2022, CalHEERS is pivoting to a modernized analytics platform consisting of a “cloud native data lake and DWH solution using Databricks Lakehouse Platform along with other key technologies.”
A data lake can be described as a more elastic, low-cost, and agile data management solution than a data warehouse, which typically features a more structured model.
“This lakehouse oriented architecture provides significantly higher performance and elastic scalability to better handle larger/varying data volumes with much lower cost of ownership compared to the existing solution,” reads the Data and AI Summit website.
According to the website, the shift to a serverless solution will improve performance, enhance scalability, and result in a lower cost of ownership.
At the summit, Perminder Bagri, enterprise infrastructure chief for the Office of Systems Integration, noted that the technical challenges facing CalHEERS include:
- High data volume and rapid data growth
- Multi-format data readability
- Late availability of data, less time for analysis, and lower confidence in results
- Lack of advanced analytics capability
- Up to 10x variability in data processing volumes (spurred by the COVID-19 pandemic)
Bagri also stated that elasticity is a major factor in choosing the right data management solution for CalHEERS.
“We need a solution which can scale for the demand,” Bagri said. “And when we don’t need it, we can just scale it down to save the cost and to save the infrastructure.”
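The cost argument behind that elasticity can be illustrated with simple arithmetic. The sketch below is only an illustration: the monthly demand figures, the unit price, and the function names are hypothetical and do not come from CalHEERS or the summit presentation. It compares paying for peak capacity all year with paying only for what is used when demand swings by 10x.

```python
def fixed_cost(demand, unit_price):
    """Cost when capacity is provisioned for the peak in every period."""
    return max(demand) * unit_price * len(demand)


def elastic_cost(demand, unit_price):
    """Cost when capacity tracks demand and is paid for only when used."""
    return sum(demand) * unit_price


# Hypothetical monthly demand in arbitrary compute units: a baseline of 1
# for ten months and two peak months at 10x. These are assumed values.
monthly_demand = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 10]
unit_price = 1.0  # assumed price per compute unit per month

fixed = fixed_cost(monthly_demand, unit_price)      # 10 units x 12 months
elastic = elastic_cost(monthly_demand, unit_price)  # 10 x 1 + 2 x 10 units
savings = 1 - elastic / fixed

print(f"fixed: {fixed}, elastic: {elastic}, savings: {savings:.0%}")
```

Under these assumed numbers, the elastic model costs a quarter of the fixed one. Real savings would depend on actual usage patterns and pricing, which the presentation did not detail.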
Deven Dharm, project lead for CalHEERS partner Deloitte, also joined the summit to speak about Deloitte’s role in the pivot to the new lake-style infrastructure.
The decision came about following interviews in 2020 with over 35 key stakeholders across numerous business divisions.
“We looked at various architectures,” Dharm said. “The paradigm that came forward was a modern, open data lake architecture where the state has control of the data and where you can bring your tools as tools continue to evolve in the market.”
Dharm’s statement regarding the decision to change architecture reflects two key differences between a lake and a warehouse approach: the state, rather than business professionals, should own the data, and the infrastructure should be equipped to work with ever-changing analytical tools.
Furthermore, CalHEERS and its partners moved to the lake-style infrastructure to reduce the need for constant maintenance and to adopt a pay-as-you-go model that brings down the total operational cost.
Dharm also felt that the data lake infrastructure would allow CalHEERS to foster innovation and answer business questions in a more timely manner. Collaborative notebooks, unified analytics, and high-performance self-service analytics let different users communicate with one another while also working in isolation without affecting others.
“We have been highly successful in delivering the vision,” Dharm said.