Synthetic Medical Claims Data

Background and Purpose

Medical claims data is a pivotal resource propelling advancements in healthcare. Datasets, compiled by health insurers from individual medical claims submitted by providers for reimbursement, serve a purpose broader than facilitating financial transactions. They are invaluable for understanding and forecasting healthcare utilization, enhancing risk assessment models, formulating cost-effective and quality-optimized treatment strategies, tracking diseases and illnesses across populations and subpopulations, assessing the feasibility of new healthcare innovations, training data models, and fostering other predictive analytics.

The significant value of medical claims datasets positions them as crucial assets. However, accessing these datasets is both challenging and costly due to patient privacy, proprietary constraints, and legal considerations. Synthetic medical claims data presents a promising solution to these challenges. It enables researchers and developers who rely on these datasets to continue their work without the costs, privacy concerns, or legal obstacles that come with authentic medical claims data.

Furthermore, as AI models trained on medical claims data become more specialized and widespread, there will be an increasing need for larger medical claims datasets for training purposes. Synthetic claims data offers a scalability that traditional datasets may struggle to match, thereby catering to the growing demands of AI model training.

Synthetic medical claims datasets are designed to replicate real patient data, preserving the inherent properties and structures of these datasets while ensuring privacy and confidentiality. Generative AI techniques, such as variational autoencoders (VAEs) and generative adversarial networks (GANs), are commonly used to create synthetic medical claims data. These models learn from existing data and generate new, realistic samples. A variety of open-source and commercial offerings that use divergent methodologies to generate the data are available, including Synthea[1], SyH-DR[2], and Syntegra[3]. Additionally, there are ongoing large-scale private efforts to leverage the same technology.[4]

Research Objective

The Actuarial Innovation & Technology Program Steering Committee (AITPSC) is seeking researchers to produce a report that discusses the uses of synthetic medical claims data for actuaries, its current and potential future functions in the industry, the methodologies used to generate the data, and the actuarial considerations required to evaluate the quality and use of the data. The main objective of this research is to explore those uses, along with their potential impacts on the actuarial profession and the industries that actuaries serve. The report will provide an overview of the opportunities and risks associated with the adoption of synthetic medical claims data in actuarial work, privacy considerations, and the potential for unintended bias. The research format will include a comprehensive literature review, research report consolidation, and interviews with professionals from the actuarial profession and relevant industries.

The report should include:

  • Uses of Synthetic Medical Claims Data for Actuaries – an exploration of the uses of synthetic medical claims data for actuaries. This section will explore how this data could be utilized in actuarial work, including risk assessment, pricing, reserving, and forecasting. The discussion will highlight how synthetic data, in comparison and contrast with traditional claims data assets, provides a valuable resource for actuaries working in healthcare.
  • Overview of Methodologies – an overview of the methodologies used to generate synthetic medical claims data. This section will cover the statistical methods and machine learning models used to create realistic but not real patient data. This section will also discuss the various open-source and commercial offerings available for synthetic data generation, highlighting their divergent methodologies.
  • Actuarial Considerations – a discussion of the potential actuarial considerations involved in evaluating the quality and use of synthetic medical claims data. This section will discuss the challenges and limitations of using synthetic data, such as the need for careful validation, documentation of data sources, and potential bias in the data. The section will also cover the ethical and legal considerations that actuaries must keep in mind when working with synthetic data.

Proposal

To facilitate the evaluation of proposals, the following information should be submitted:

  1. Resumes of the researcher(s), including any graduate student(s) expected to participate, indicating how their background, education, and experience bear on their qualifications to undertake the research. If more than one researcher is involved, a single individual should be designated as the lead researcher and primary contact. The person submitting the proposal must be authorized to speak on behalf of all the researchers as well as for the firm or institution on whose behalf the proposal is submitted.
  2. An outline of the approach to be used (e.g., literature search, model, survey, interviews, etc.), emphasizing issues that require special consideration. Details should be given regarding the techniques to be used, the collateral material to be consulted, and the reusability and limitations of the analysis.
  3. A description of the expected deliverables and any supporting data, tools, or other resources. Consideration should be given to the preference for externalized data that can be included in the AITPSC’s data repository.
  4. Cost estimates for the research, including computer time, salaries, report preparation, material costs, etc. Such estimates can be in the form of hourly rates, but in such cases, time estimates should also be included. Any guarantees as to total cost should be given and will be considered in the evaluation of the proposal. While cost will be a factor in the evaluation of the proposal, it will not necessarily be the decisive factor.
  5. A schedule for completion of the research, identifying key dates or time frames for research completion and report submissions. The AITPSC is interested in completing this project in a timely manner. Suggestions in the proposal for ensuring timely delivery, such as fee adjustments, are encouraged.
  6. Other related factors that give evidence of a proposer's capabilities to perform in a superior fashion should be detailed.

Selection Process

The AITPSC will appoint a Project Oversight Group (POG) to oversee the project. The AITPSC is responsible for recommending the proposal to be funded. Input from other knowledgeable individuals also may be sought, but the AITPSC will make the final recommendation, subject to Society of Actuaries (SOA) Research Institute leadership approval. An SOA Research Institute staff actuary will provide staff actuarial support.

Questions

Any questions regarding this RFP should be directed to research-ait@soa.org.

Notification of Intent to Submit Proposal

If you intend to submit a proposal, please e-mail written notification by August 9, 2024, to research-ait@soa.org.

Submission of Proposal

Final proposals for the project should be sent via e-mail by September 6, 2024, to research-ait@soa.org.

Note: Proposals are considered confidential and proprietary.

Conditions

The selection of a proposal is conditioned upon and not considered final until a Letter of Agreement is executed by both the Society of Actuaries and the researcher.

The SOA and AITPSC reserve the right to not award a contract for this research. Reasons for not awarding a contract could include, but are not limited to, a lack of acceptable proposals or a finding that insufficient funds are available. The SOA and AITPSC also reserve the right to redirect the project as is deemed advisable.

The SOA and AITPSC plan to hold the copyright to the research and to publish the results with appropriate credit given to the researcher(s).

The SOA and AITPSC may choose to seek public exposure or media attention for the research. By submitting a proposal, you agree to cooperate with the SOA and AITPSC in publicizing or promoting the research and responding to media requests.

The SOA and AITPSC may also choose to market and promote the research to members, candidates, and other interested parties. You agree to perform promotional communication requested by the SOA and AITPSC, which may include, but is not limited to, leading a webcast on the research, presenting the research at an SOA meeting, and/or writing an article on the research for an SOA newsletter.

Conflict of Interest

You agree to disclose any of your material business, financial, and organizational interests and affiliations that are or may be construed to be reasonably related to the interests, activities, and programs of the SOA Research Institute / AITPSC.


Endnotes

[1] Synthea (mitre.org)

[2] Synthetic Healthcare Database for Research (SyH-DR) | Agency for Healthcare Research and Quality (ahrq.gov) 

[3] Syntegra Pushes the Boundaries of Generative AI in Healthcare with Recent Tech Updates (prnewswire.com) 

[4] Anthem Looks to Fuel AI Efforts With Petabytes of Synthetic Data - WSJ