February 26, 2016

CCTS Inaugurates New Training Program on Using Big Data and Electronic Health Records to Advance Translational Research
By Editorial Staff

One of the new initiatives the Rockefeller University Center for Clinical and Translational Science (CCTS) is implementing is a training program on using data from electronic health records (EHRs) as a research tool to advance translational science across the entire spectrum, from basic science discoveries to epidemiology. The training program will enable bench-to-bedside (T1) researchers to use EHR databases to test basic and translational science hypotheses by analyzing large clinical data sets. Traditionally, querying such databases has been the exclusive province of epidemiologists and health services investigators (T3-T4), and the primary goal has been to identify key information for improving health care delivery systems. The CCTS leadership believes that one of the key new skills that will benefit the translational workforce throughout the entire T0-T4 spectrum is the ability to obtain valid information from the new large clinical data sets.

In 2010, Dr. Barry Coller, CTSA PI, developed an agreement with QUEST Diagnostics to collaborate on projects pairing an investigator at Rockefeller with investigators at QUEST. This agreement led to a collaboration between Dr. Manish Ponda, a Clinical Scholar graduate from Dr. Jan Breslow’s laboratory, and information technology experts at QUEST to assess the impact of oral vitamin D repletion on serum cholesterol levels. Using data cleverly extracted from the QUEST database, Dr. Ponda was able to confirm the previous finding by others of an inverse epidemiologic association between vitamin D levels and serum cholesterol levels. He was also able to confirm the paradoxical result from his small mechanistic study that oral vitamin D repletion in patients deficient in vitamin D does not lower cholesterol levels. To reconcile these findings, he conducted a subsequent study comparing the impact on cholesterol levels of oral vitamin D repletion versus controlled exposure to UV light.
Preliminary data indicate that UV light may have a disparate effect on cholesterol metabolism pathways, a finding that could not only reconcile the epidemiologic and clinical study findings but also point to new therapeutic measures. This experience and the emerging availability of large clinical data sets encouraged the CCTS to expand the initiative by developing a course for T0-T1 translational investigators. The course is intended to help them gain the skills needed to query large population-based databases and extract data with which to test their basic and translational hypotheses at the population level.

A draft curriculum was developed by Dr. Ponda, Dr. Coller, and Dr. Jonathan Tobin, a PhD epidemiologist with extensive experience querying large databases who is President of Clinical Directors Network (a practice-based research network) and Co-Director of the Community Engagement program. The draft was then reviewed by Dr. Richard Platt of Harvard, who leads the FDA Sentinel large data program and the PCORnet coordinating center and has a lead role in the NIH Collaboratory program that focuses on large pragmatic clinical trials, and by Dr. Leslie Curtis, the Co-Director of the Collaboratory coordinating center. After their suggestions were incorporated, the draft curriculum was distributed to the New York City CTSA leaders for their input and participation. Representatives from Columbia (Siqin Kye Ye, MD, MS), NYU (Yindalon Aphinyanaphongs, MD, PhD), and Cornell (Elizabeth Wood, MS) then joined the course leadership. An X02 application was submitted to the National Center for Advancing Translational Sciences (NCATS) proposing both a didactic EHR course and a “hands-on” module in which investigators can query EHR data in various formats.

These laboratory-based modules will be designed to teach discrete querying skills so that investigators gain familiarity with various data models that may be relevant to their research (e.g., structured, unstructured, or enterprise databases). To further facilitate utilization of big datasets, a third aim of the X02 is to form a working group to share experiences, standardize query development processes, and identify new opportunities to enhance cross-CTSA collaboration.
Faculty with appropriate expertise will be drawn from participating CTSA institutions to teach and facilitate discussions among course participants. In addition, external experts will be invited to contribute specialized knowledge from outside the CTSA network. Classes will rotate among participating institutions, and all classes will be broadcast live to remote sites and archived in an online repository.

Teaching faculty will determine the appropriate resource materials to prepare for each class, and participants will supplement their learning by accessing the wealth of information available from online resources, including the NIH Collaboratory Knowledge Repository. The topics to be covered include:

1. Understanding how and why electronic data are captured and what it means for research
2. Defining a query: translating a basic science discovery into a tractable clinical question
3. Utilizing and incorporating existing knowledge through effective literature searches
4. Understanding human subjects protection and HIPAA regulations
5. The NIH Collaboratory, PCORnet, FDA Mini-Sentinel and NYC-CDRN
6. The structure of EHR databases
7. Principles and practice of distributed research data networks
8. Identifying a database: parameters available; representative population; data quality
9. Constructing a query: working with clinicians and other stakeholders, epidemiologists and data specialists; inclusion/exclusion criteria
10. Choosing the appropriate descriptive and inferential statistical methods
11. Data Integrity: sources of error/threats to validity; importance of data refresh; internal measures of validity; longitudinal completeness
12. Teamwork and Leadership: collaborating with external data managers to design, develop and execute studies
13. Bioethical considerations in searching patient databases
14. Effective ways to communicate results and study output: tables; graphics; presentation formats
15. Case studies using topics proposed by the trainees

Readings were selected for each topic from a variety of sources, most notably the government’s HealthIT.gov site; the 2014 Institute of Medicine report on integrating research and practice; the NIH Collaboratory, PCORnet, and FDA Sentinel web sites; and the Collaboratory on-line Living Textbook.
The first tutorial was conducted with the Clinical Scholars on July 29 and the immediate feedback was extremely positive!