ATLAS Tutorial: Data Sources - Data Density

OHDSI
23 May 2019 · 04:28

Summary

TLDR: This video tutorial explores the data density report in Atlas, a tool within the OHDSI framework. It explains how to visualize how data accumulate over time across domains such as conditions, drugs, and procedures. The report shows the total number of records, records per person, and distinct concepts per person, using line graphs and box plots to illustrate data volume and distribution. This analysis helps users judge how the data in each domain grow and how stable they are over time.

Takeaways

  • 🗺️ The script introduces the data sources capability of Atlas, focusing on the data density report.
  • 🔍 Users can select a data source and explore various reports, starting with the data density report.
  • 📊 The first graph in the data density report shows the total number of data rows by time and data domain.
  • 📅 The x-axis of the graph represents calendar months, and the y-axis shows the number of records.
  • 🔑 Each data domain (conditions, drugs, observations, observation periods, procedures, and visits) is represented by a separate line series.
  • 👁️ Hovering over a line in the graph reveals the specific calendar month and the number of records for a domain.
  • 📈 The second graph normalizes the data to show records per person, helping to understand data growth relative to the number of persons.
  • 🧐 By hovering, one can see the average number of records per person in a specific domain for a given month, like 2.45 records in the condition occurrence domain for October 2015.
  • 📊 The third graph provides a box plot for concepts per person across selected data domains, showing the distribution of distinct information.
  • 📊 The box plot includes minimum, maximum, median, interquartile range, and the 10th and 90th percentiles for the distribution of distinct concepts.
  • 🔎 Hovering over the box plots allows for further exploration of the data distribution.
  • 🔗 For more information on Atlas and additional data sources reports, the video directs viewers to OHDSI.org.

Q & A

  • What is the main feature of the data sources capability in Atlas?

    -The main feature of the data sources capability in Atlas is the ability to select a data source and explore it through a series of reports, such as the data density report.

  • What does the data density report in Atlas show?

    -The data density report in Atlas shows a series of graphs that represent the total number of rows of data by time and by data domain, records per person, and concepts per person across selected data domains.

  • How is the x-axis represented in the first graph of the data density report?

    -In the first graph of the data density report, the x-axis represents calendar months.

  • What does the y-axis represent in the first graph of the data density report?

    -In the first graph, the y-axis represents the number of records in the data source.

  • What information can be obtained by hovering over a specific line in the first graph of the data density report?

    -By hovering over a specific line, one can see the series represented, the calendar month shown, and the number of records in the data source for that domain in that calendar month.

  • What does the second graph in the data density report show?

    -The second graph shows records per person: it normalizes the counts by dividing the total number of records by the number of persons.

  • How does the third graph in the data density report differ from the first two?

    -The third graph focuses on concepts per person, showing the number of distinct concepts per person across different data domains, represented by box plots that display the distribution of distinct concepts.

  • What does the minimum value in the condition occurrence domain's box plot represent in the third graph?

    -A minimum of one means that every person who appears in the condition occurrence table has at least one distinct concept.

  • What does the maximum value in the condition occurrence domain's box plot indicate?

    -The maximum value indicates that there is an individual with as many as 800 distinct concepts in the condition occurrence domain.

  • What additional resources can be found on OHDSI.org for more information about Atlas and its data sources reports?

    -OHDSI.org provides additional details about Atlas, its capabilities, and other data sources reports that can deepen understanding of the platform.

Outlines

00:00

📊 Data Density Report Overview

This paragraph introduces the Data Density Report feature within Atlas's data sources capability. Users can select a data source and explore various reports, with a focus on the data density report in this context. The report is visualized through several graphs that provide insights into data accumulation over time and across different data domains. The first graph displays the total number of data rows by time and data domain, with each domain represented by a line on the graph. Hovering over a line reveals specific data counts for a given month and domain. The domains covered include conditions, drugs, observations, observation periods, procedures, and visits. The second graph shifts the focus to records per person, normalizing the data to understand data density relative to the number of persons. The third graph presents concepts per person across selected data domains, using box plots to show the distribution of distinct concepts, including minimum, maximum, median, and interquartile ranges. This comprehensive approach allows users to gauge data growth and stability, and to understand the distinct information available per person.


Keywords

💡Data Sources

Data sources refer to the origins of data that are used within the Atlas system to generate reports. These sources provide the raw information that is analyzed and visualized in various reports, such as the data density report. In the video, selecting a data source is the first step to explore and analyze the available data.

💡Data Density Report

The data density report is a specific type of report within Atlas that visualizes the volume and distribution of data over time and across different data domains. This report includes several graphs that help users understand the accumulation and density of data. It is central to the video's demonstration of how to use the Atlas system.

💡OMOP Common Data Model

The OMOP (Observational Medical Outcomes Partnership) Common Data Model standardizes data from disparate sources to facilitate large-scale analytics. The video references this model to explain how data is organized and analyzed in the Atlas system, ensuring consistency and comparability across different datasets.

💡Graph

Graphs are visual representations of data used in the data density report to show trends and distributions. The video discusses various types of graphs, such as line graphs and box plots, which help in interpreting data related to different domains like conditions, drugs, and procedures.

💡Calendar Month

Calendar month refers to the time intervals used on the x-axis of the graphs in the data density report. It helps in tracking the accumulation and density of data over time. The video uses calendar months to show how data records grow and change from month to month.

💡Data Domain

Data domains are categories or types of data within the OMOP Common Data Model, such as conditions, drugs, observations, procedures, and visits. The video explains how each domain is represented in the data density report and how users can analyze data specific to each domain.
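A minimal sketch of the aggregation behind the first graph, shown for a single domain: count rows per calendar month. It uses Python's built-in `sqlite3` module and a toy `condition_occurrence` table whose column names (`person_id`, `condition_concept_id`, `condition_start_date`) follow the OMOP CDM; the rows and concept IDs are invented, and this is not the query Atlas itself runs.

```python
import sqlite3

# Toy rows: (person_id, condition_concept_id, condition_start_date).
# Column names follow the OMOP CDM; the values are invented for illustration.
TOY_ROWS = [
    (1, 1001, "2015-03-02"),
    (1, 1002, "2015-03-17"),
    (2, 1001, "2015-03-20"),
    (2, 1001, "2015-04-05"),
    (3, 1003, "2015-04-11"),
]


def monthly_record_counts(rows):
    """Return {"YYYY-MM": record count} for a toy condition_occurrence table."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE condition_occurrence ("
            "person_id INTEGER, condition_concept_id INTEGER, "
            "condition_start_date TEXT)"
        )
        con.executemany(
            "INSERT INTO condition_occurrence VALUES (?, ?, ?)", rows
        )
        # Bucket each record by the calendar month of its start date.
        result = con.execute(
            "SELECT strftime('%Y-%m', condition_start_date) AS month, COUNT(*) "
            "FROM condition_occurrence GROUP BY month ORDER BY month"
        ).fetchall()
    finally:
        con.close()
    return dict(result)


print(monthly_record_counts(TOY_ROWS))  # {'2015-03': 3, '2015-04': 2}
```

Repeating the same monthly count for each domain table (for example `drug_exposure` or `procedure_occurrence`, using that table's own date column) would give one line series per domain.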

💡Records per Person

Records per person is a measure used in the data density report to normalize data by dividing the total number of records by the number of persons. This metric helps in understanding the data density relative to the population size. The video demonstrates how to interpret this measure to assess data accumulation per individual.
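The normalization in the second graph can be sketched as a per-month division. This assumes the denominator for each month is a separately supplied person count; the video says only that records are divided by "the number of persons", so exactly how Atlas defines that count is not confirmed here. The helper name and numbers are hypothetical.

```python
def records_per_person(records_by_month, persons_by_month):
    """Divide each month's record count by that month's person count.

    Hypothetical helper: months with no persons are skipped rather than
    divided by zero.
    """
    ratios = {}
    for month, records in records_by_month.items():
        persons = persons_by_month.get(month, 0)
        if persons > 0:
            ratios[month] = records / persons
    return ratios


# Toy numbers, not taken from the video.
records = {"2015-10": 300, "2015-11": 300}
persons = {"2015-10": 120, "2015-11": 150}
print(records_per_person(records, persons))  # {'2015-10': 2.5, '2015-11': 2.0}
```

Because the denominator can change from month to month, a flat line here next to a rising line in the first graph suggests that the population is growing while per-person density stays stable.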

💡Distinct Concepts per Person

Distinct concepts per person refer to the unique pieces of information, such as distinct conditions or procedures, associated with each person in the dataset. The video explains how this metric is visualized using box plots to show the distribution of unique concepts across different data domains.
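Counting distinct concepts per person amounts to collecting each person's concept IDs into a set. A minimal sketch, assuming the input is a list of hypothetical `(person_id, concept_id)` pairs from a single domain:

```python
from collections import defaultdict


def distinct_concepts_per_person(pairs):
    """Map each person_id to the number of distinct concept IDs recorded for them."""
    concepts = defaultdict(set)
    for person_id, concept_id in pairs:
        concepts[person_id].add(concept_id)  # sets ignore repeated concepts
    return {person_id: len(ids) for person_id, ids in concepts.items()}


# Person 1 has concept 1001 twice, so it counts once.
pairs = [(1, 1001), (1, 1001), (1, 1002), (2, 1003)]
print(distinct_concepts_per_person(pairs))  # {1: 2, 2: 1}
```

Only persons with at least one record appear in the result, which is why the minimum for a domain is one, as in the condition occurrence example from the video.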

💡Box Plot

A box plot is a type of graph used in the data density report to show the distribution of distinct concepts per person. It illustrates the median, interquartile range, and outer ranges of the data. The video describes how to interpret box plots to understand the spread and central tendency of the data.
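The box plot statistics can then be computed from the per-person counts. This sketch uses the nearest-rank percentile definition, which is one of several common conventions; Atlas may compute percentiles differently, so treat the exact values as illustrative. The helper names are hypothetical.

```python
import math


def percentile(sorted_values, p):
    """Nearest-rank percentile (integer p from 0 to 100) of an ascending, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def box_plot_summary(values):
    """Min, P10, P25, median, P75, P90 and max of a non-empty list of counts."""
    v = sorted(values)
    return {
        "min": v[0],
        "p10": percentile(v, 10),
        "p25": percentile(v, 25),
        "median": percentile(v, 50),
        "p75": percentile(v, 75),
        "p90": percentile(v, 90),
        "max": v[-1],
    }


# Toy distinct-concept counts for ten persons: 1, 2, ..., 10.
print(box_plot_summary(range(1, 11)))
# {'min': 1, 'p10': 1, 'p25': 3, 'median': 5, 'p75': 8, 'p90': 9, 'max': 10}
```

P25 to P75 forms the box (the interquartile range), P10 to P90 is the outer range described in the video, and min and max mark the extremes.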

💡OHDSI.org

OHDSI.org is the website mentioned at the end of the video for more information about OHDSI (Observational Health Data Sciences and Informatics, pronounced "Odyssey"), the community behind Atlas and other data analysis tools. The video directs viewers to this site for additional details and resources related to the data sources and reports discussed.

Highlights

Introduction to Atlas' data sources reporting capability

Accessing data density report through the data sources menu

Visualization of total number of data rows by time and data domain

X-axis represents calendar months, Y-axis shows record counts

Line series for each data domain supported by the OMOP Common Data Model

Data domains include conditions, drugs, observations, observation periods, procedures, and visits

Hovering to view a specific domain's record count for a given month

Example of 33 million records in condition occurrence domain in March 2015

Understanding data accumulation and density across domains over time

Second graph shows records per person, normalizing data counts

Y-axis represents records per person, considering total number of persons

Example of average 2.45 records per person in October 2015 for condition occurrence

Analyzing data density growth in relation to total data or data per person

Third graph displays concepts per person across selected data domains

Box plot representation of the distribution of distinct concepts per person

Example of a maximum of 800 distinct concepts for an individual in the condition occurrence domain

Box plot details including minimum, maximum, median, interquartile range, and outer (10th to 90th percentile) range

Hovering over box plots to explore the full data distribution

Conclusion of the data density report summary and invitation to OHDSI.org for more information

Transcripts

00:01 [Music]
00:09 Today I'm going to walk you through some
00:13 of the reports within the data sources
00:15 capability of Atlas. If you click on data
00:19 sources, you can select the data source
00:21 that you are interested in exploring and
00:23 select one of a series of reports. Today
00:26 we'll look at the data density report.

00:29 When the data density report loads,
00:32 you'll see a series of graphs on the
00:34 screen. The first graph represents the
00:37 total number of rows of data by time and
00:41 by data domain in the OMOP Common
00:43 Data Model. So here you can see on your
00:45 x-axis calendar months, and on your
00:49 y-axis you see the number of records in
00:51 your data source, and there's a line
00:54 series for every data domain that we
00:57 support in the OMOP Common Data Model.
00:59 That includes conditions, drugs,
01:02 observations, observation periods,
01:04 procedures, and visits. If you hover over
01:08 any one particular line, you will see the
01:10 series that is represented, the
01:13 calendar month that's shown, and the
01:15 number of records in your data source
01:17 for that domain in that calendar month.

01:20 So here, for example, we can see that in
01:23 this data source, in the condition
01:26 occurrence domain, there are 33 million
01:28 records in March of 2015. This graph
01:33 allows you to see how data are
01:36 accumulating across your data domains
01:38 over time and to understand which data
01:41 domains have the greatest density of
01:43 information. The next graph shows you

01:46 records per person. Similar to the first
01:49 graph, this graph is showing you calendar
01:53 month on the x-axis and showing you a
01:55 line series for each of the data domains;
01:57 however, now the y-axis has normalized
02:01 the counts to be records per person,
02:04 taking the total number of records as
02:06 represented in the first graph but
02:08 dividing it by a common denominator
02:10 represented
02:11 by the number of persons. So here, if I
02:15 hover over any particular graph, I can
02:17 get an understanding of the number of
02:19 records per person in a given data
02:21 domain for a given calendar month.

02:24 So here I've highlighted that this
02:26 source in October of 2015, in the
02:29 condition occurrence domain, has an
02:31 average of two point four four or five
02:33 records per person. This graph allows you
02:37 to understand whether or not the data
02:39 density seems to be growing as a
02:40 function of the total data or as a
02:44 function of the amount of data per
02:46 person, and the relative stability that's
02:48 included there. The third graph that's

02:51 shown on the data density report is
02:53 concepts per person, and here this is
02:55 giving you a sense of how much distinct
02:58 information exists per person across
03:02 selected data domains. On the x-axis we
03:05 show the concept type represented by the
03:08 different data domains, including
03:10 conditions, drugs, observations, procedures,
03:13 and on the y-axis we're showing you the
03:16 number of distinct concepts per person,
03:18 and each graph represents a box plot
03:22 showing you the distribution of distinct
03:26 concepts. So here, on the condition

03:29 occurrence domain, we can see that the
03:31 distribution of distinct concepts per
03:33 person has a minimum of one for
03:36 everybody that's in the condition
03:37 occurrence table and a maximum of 800,
03:40 meaning that there is an individual who
03:42 has 800 distinct concepts in the
03:45 condition occurrence domain. This
03:47 provides you the full distribution,
03:49 including the median, here representing
03:51 that 50% of patients have 12 or
03:55 more distinct concepts; the interquartile
03:58 range, represented by P25 and P75;
04:01 as well as an outer range represented by
04:04 the 10th and the 90th percentile.

04:10 Each of these box plots can be explored
04:13 by hovering over them. This concludes our
04:16 summary of the data density report. For
04:19 more information about OHDSI, including
04:21 additional details of Atlas and
04:23 additional data sources reports, check
04:25 out OHDSI.org.
