Platform Development & Data Harmonization
Data harmonization integrates and reorganizes heterogeneous data from different sources into a standard, consistent structure. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is one such standard format. Its benefits include consistent data across sources, easier data exchange and collaboration, streamlined analysis, support for large and complex datasets, and reproducible analyses and findings. Harmonizing data to the OMOP-CDM reorganizes it into a single schema through the following key steps:
Data Acquisition
Identifying relevant health data sources.
Data Cleaning & Preprocessing
Checking and addressing data completeness, accuracy, missing values, outliers, and inconsistencies.
Vocabulary Mapping
Mapping source vocabulary codes to standardized OMOP vocabularies, i.e., to standard Concept IDs.
Data Mapping
Defining, with mapping tools, how source tables and fields correspond to the tables and fields of the OMOP-CDM structure.
Transformation
Converting the cleaned and mapped data into the OMOP-CDM schema using various ETL tools.
Data Integration
Combining transformed data from different sources into a unified dataset.
Validation
Ensuring data integrity and consistency within the OMOP-CDM structure.
Data Analysis
Utilizing analytic tools and techniques to explore and analyze the harmonized data.
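The cleaning, vocabulary mapping, transformation, integration, and validation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production ETL: the source field names (patient_id, code, date, vocabulary), the helper functions, and the small lookup table are all hypothetical, and the concept IDs shown are illustrative values that should be checked against the OHDSI vocabularies before any real use. Only a subset of the condition_occurrence columns is populated.

```python
from datetime import date

# Hypothetical source-code-to-concept lookup. In a real project this would
# come from the OMOP vocabulary tables; the IDs here are illustrative only.
SOURCE_TO_CONCEPT = {
    ("ICD10", "E11.9"): 201826,
    ("ICD10", "I10"): 320128,
}
UNMAPPED_CONCEPT_ID = 0  # OMOP convention for a code with no standard match


def clean(record):
    """Cleaning: normalize one source record, or return None if unusable."""
    person_id = record.get("patient_id")
    code = (record.get("code") or "").strip().upper()
    try:
        start = date.fromisoformat(record.get("date"))
    except (TypeError, ValueError):
        return None  # missing or malformed date
    if person_id is None or not code:
        return None  # missing key fields
    vocabulary = record.get("vocabulary", "ICD10")  # assumed default
    return {"person_id": person_id, "vocabulary": vocabulary,
            "code": code, "start": start}


def to_condition_occurrence(cleaned, row_id):
    """Vocabulary mapping + transformation into a CDM-shaped row."""
    key = (cleaned["vocabulary"], cleaned["code"])
    return {
        "condition_occurrence_id": row_id,
        "person_id": cleaned["person_id"],
        "condition_concept_id": SOURCE_TO_CONCEPT.get(key, UNMAPPED_CONCEPT_ID),
        "condition_start_date": cleaned["start"],
        "condition_source_value": cleaned["code"],
    }


def harmonize(*sources):
    """Integration: merge several sources into one list of CDM-shaped rows."""
    rows = []
    for source in sources:
        for record in source:
            cleaned = clean(record)
            if cleaned is not None:
                rows.append(to_condition_occurrence(cleaned, len(rows) + 1))
    return rows


def validate(rows):
    """Validation: basic integrity summary of the harmonized rows."""
    ids = [r["condition_occurrence_id"] for r in rows]
    unmapped = sum(r["condition_concept_id"] == UNMAPPED_CONCEPT_ID for r in rows)
    return {"row_count": len(rows),
            "duplicate_ids": len(ids) - len(set(ids)),
            "unmapped": unmapped}
```

Calling harmonize(site_a_records, site_b_records) and then validate(...) on the result gives a quick count of rows, duplicate keys, and unmapped codes. Records that fail cleaning are dropped silently here for brevity; a real pipeline would more likely log them for review.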
The harmonization process relies on open-source tools and resources: programming languages (R, Python, SQL); relational database management systems such as PostgreSQL; and tools provided by OHDSI (Observational Health Data Sciences and Informatics), namely WhiteRabbit, Rabbit-in-a-Hat, Usagi, and ATLAS.
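To illustrate how Python and SQL fit together with a relational database, the sketch below loads a few rows into a simplified condition_occurrence table and runs a small analysis query. It makes several assumptions: Python's built-in sqlite3 module stands in for PostgreSQL so that the example runs without a database server, the table holds only a subset of the columns in the official CDM definition, and the function names and sample rows are hypothetical.

```python
import sqlite3


def build_demo_cdm(rows):
    """Create an in-memory database with a simplified condition_occurrence table.

    Each row is a tuple: (condition_occurrence_id, person_id,
    condition_concept_id, condition_start_date, condition_source_value).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE condition_occurrence (
            condition_occurrence_id INTEGER PRIMARY KEY,
            person_id               INTEGER NOT NULL,
            condition_concept_id    INTEGER NOT NULL,
            condition_start_date    TEXT    NOT NULL,
            condition_source_value  TEXT
        )
        """
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def persons_per_concept(conn):
    """Count distinct persons per condition concept, ordered by concept ID."""
    cursor = conn.execute(
        """
        SELECT condition_concept_id, COUNT(DISTINCT person_id) AS n_persons
        FROM condition_occurrence
        GROUP BY condition_concept_id
        ORDER BY condition_concept_id
        """
    )
    return cursor.fetchall()
```

The same SQL should carry over to PostgreSQL with little change, although a real deployment would create the tables from the official OHDSI DDL rather than from a hand-written statement like the one above.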