ATLAS Tutorial: Data Sources - Data Density
Summary
TLDR: This video tutorial explores the data density report feature in Atlas, a tool within the OHDSI framework. It explains how to visualize data accumulation across different domains like conditions, drugs, and procedures over time. The report offers insights into the total number of records, records per person, and distinct concepts per person, using line graphs and box plots to illustrate data distribution and density. This analysis helps users understand the growth and stability of data within their domains.
Takeaways
- 🗺️ The script introduces the data sources capability of Atlas, focusing on the data density report.
- 🔍 Users can select a data source and explore various reports, starting with the data density report.
- 📊 The first graph in the data density report shows the total number of data rows by time and data domain.
- 📅 The x-axis of the graph represents calendar months, and the y-axis shows the number of records.
- 🔑 Each data domain (conditions, drugs, observations, observation periods, procedures, and visits) is represented by a separate line series.
- 👁️ Hovering over a line in the graph reveals the specific calendar month and the number of records for a domain.
- 📈 The second graph normalizes the data to show records per person, helping to understand data growth relative to the number of persons.
- 🧐 By hovering, one can see the average number of records per person in a specific domain for a given month, like 2.45 records in the condition occurrence domain for October 2015.
- 📊 The third graph provides a box plot for concepts per person across selected data domains, showing the distribution of distinct information.
- 📊 The box plot includes minimum, maximum, median, interquartile range, and the 10th and 90th percentiles for the distribution of distinct concepts.
- 🔎 Hovering over the box plots allows for further exploration of the data distribution.
- 🔗 For more information on Atlas and additional data sources reports, the script directs viewers to OHDSI.org.
Q & A
What is the main feature of the data sources capability in Atlas?
-The main feature of the data sources capability in Atlas is the ability to select a data source and explore it through a series of reports, such as the data density report.
What does the data density report in Atlas show?
-The data density report in Atlas shows a series of graphs that represent the total number of rows of data by time and by data domain, records per person, and concepts per person across selected data domains.
How is the x-axis represented in the first graph of the data density report?
-In the first graph of the data density report, the x-axis represents calendar months.
What does the y-axis represent in the first graph of the data density report?
-In the first graph, the y-axis represents the number of records in the data source.
What information can be obtained by hovering over a specific line in the first graph of the data density report?
-By hovering over a specific line, one can see the series represented, the calendar month shown, and the number of records in the data source for that domain in that calendar month.
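The counts behind the first graph can be pictured as a tally of records per data domain and calendar month. The following is a minimal sketch of that idea, not how Atlas itself computes the report; the sample rows, the `events` list, and the function name `count_by_domain_and_month` are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical sample rows: (data domain, event date). Not real data.
events = [
    ("condition", date(2015, 3, 2)),
    ("condition", date(2015, 3, 17)),
    ("drug", date(2015, 3, 9)),
    ("condition", date(2015, 4, 1)),
]


def count_by_domain_and_month(rows):
    """Return a Counter keyed by (domain, 'YYYY-MM') holding record counts."""
    counts = Counter()
    for domain, event_date in rows:
        month = f"{event_date.year:04d}-{event_date.month:02d}"
        counts[(domain, month)] += 1
    return counts


monthly_counts = count_by_domain_and_month(events)
```

Each key in `monthly_counts` corresponds to one point on one line series: the domain picks the line, the month picks the x-axis position, and the count is the y-axis value.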
What does the second graph in the data density report show?
-The second graph shows records per person, normalizing the counts to be records per person by dividing the total number of records by the number of persons.
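The normalization described here is a simple division. The sketch below assumes the person count for each month is already available as an input; the video does not say how Atlas determines that denominator, and the function name and numbers are made up for illustration.

```python
def records_per_person(record_counts, person_counts):
    """Divide each (domain, month) record count by that month's person count."""
    result = {}
    for (domain, month), n_records in record_counts.items():
        n_persons = person_counts.get(month, 0)
        if n_persons > 0:  # skip months without persons to avoid dividing by zero
            result[(domain, month)] = n_records / n_persons
    return result


# Hypothetical numbers chosen for illustration only.
record_counts = {("condition", "2015-10"): 490, ("drug", "2015-10"): 300}
person_counts = {"2015-10": 200}

rates = records_per_person(record_counts, person_counts)
```

With these made-up inputs, 490 condition records spread over 200 persons gives 2.45 records per person for that month.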
How does the third graph in the data density report differ from the first two?
-The third graph focuses on concepts per person, showing the number of distinct concepts per person across different data domains, represented by box plots that display the distribution of distinct concepts.
What does the minimum value in the condition occurrence domain's box plot represent in the third graph?
-The minimum value represents that every individual in the condition occurrence table has at least one distinct concept.
What does the maximum value in the condition occurrence domain's box plot indicate?
-The maximum value indicates that there is an individual with as many as 800 distinct concepts in the condition occurrence domain.
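The box plot values can be reproduced in principle by counting distinct concepts for each person and then summarizing that list of counts. This is a rough sketch under stated assumptions: the rows are hypothetical, and the nearest-rank percentile rule is one common convention chosen here for simplicity, not necessarily the one Atlas uses.

```python
from collections import defaultdict


def distinct_concepts_per_person(rows):
    """rows: iterable of (person_id, concept_id). Returns sorted distinct-concept counts."""
    concepts = defaultdict(set)
    for person_id, concept_id in rows:
        concepts[person_id].add(concept_id)  # a set ignores repeated concepts
    return sorted(len(ids) for ids in concepts.values())


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def box_summary(sorted_values):
    """Collect the seven values the box plot is described as showing."""
    return {
        "min": sorted_values[0],
        "p10": nearest_rank(sorted_values, 10),
        "p25": nearest_rank(sorted_values, 25),
        "median": nearest_rank(sorted_values, 50),
        "p75": nearest_rank(sorted_values, 75),
        "p90": nearest_rank(sorted_values, 90),
        "max": sorted_values[-1],
    }


# Hypothetical rows: five persons with 1, 2, 3, 4 and 10 distinct concepts.
rows = [(1, 1000)]  # deliberate duplicate of a row added below
for person_id, n_concepts in [(1, 1), (2, 2), (3, 3), (4, 4), (5, 10)]:
    for offset in range(n_concepts):
        rows.append((person_id, 1000 + offset))

summary = box_summary(distinct_concepts_per_person(rows))
```

For this toy input the summary has a minimum of 1 and a maximum of 10, which mirrors how the video reads the condition occurrence box plot: the minimum is the person with the fewest distinct concepts and the maximum is the single person with the most.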
What additional resources can be found on OHDSI.org for more information about Atlas and its data sources reports?
-OHDSI.org provides additional details about Atlas, its capabilities, and other data sources reports that can enhance the understanding of the platform.
Outlines
📊 Data Density Report Overview
This paragraph introduces the Data Density Report feature within Atlas's data sources capability. Users can select a data source and explore various reports, with a focus on the data density report in this context. The report is visualized through several graphs that provide insights into data accumulation over time and across different data domains. The first graph displays the total number of data rows by time and data domain, with each domain represented by a line on the graph. Hovering over a line reveals specific data counts for a given month and domain. The domains covered include conditions, drugs, observations, observation periods, procedures, and visits. The second graph shifts the focus to records per person, normalizing the data to understand data density relative to the number of persons. The third graph presents concepts per person across selected data domains, using box plots to show the distribution of distinct concepts, including minimum, maximum, median, and interquartile ranges. This comprehensive approach allows users to gauge data growth and stability, and to understand the distinct information available per person.
Keywords
💡Data Sources
💡Data Density Report
💡OMOP Common Data Model
💡Graph
💡Calendar Month
💡Data Domain
💡Records per Person
💡Distinct Concepts per Person
💡Box Plot
💡OHDSI.org
Highlights
Introduction to Atlas' data sources reporting capability
Accessing data density report through the data sources menu
Visualization of total number of data rows by time and data domain
X-axis represents calendar months, Y-axis shows record counts
Line series for each data domain supported by the OMOP Common Data Model
Data domains include conditions, drugs, observations, observation periods, procedures, and visits
Interactivity feature to view specific domain's data for a given month
Example of 33 million records in condition occurrence domain in March 2015
Understanding data accumulation and density across domains over time
Second graph shows records per person, normalizing data counts
Y-axis represents records per person, considering total number of persons
Example of average 2.45 records per person in October 2015 for condition occurrence
Analyzing data density growth in relation to total data or data per person
Third graph displays concepts per person across selected data domains
Box plot representation of the distribution of distinct concepts per person
Example of a maximum of 800 distinct concepts for an individual in the condition occurrence domain
Box plot details including minimum, maximum, median, interquartile, and outer ranges
Explorability of box plots for a comprehensive understanding of data distribution
Conclusion of the data density report summary and invitation to OHDSI.org for more information
Transcripts
[Music]
today I'm going to walk you through some
of the reports within the data sources
capability of Atlas if you click on data
sources you can select the data source
that you are interested in exploring and
select one of a series of reports today
we'll look at the data density report
when the data density report loads
you'll see a series of graphs on the
screen the first graph represents the
total number of rows of data by time and
by data domain in the OMOP Common
Data Model so here you can see on your
x-axis calendar months and on your
y-axis you see the number of records in
your data source and there's a line
series for every data domain that we
support in the OMOP Common Data Model
that includes conditions drugs
observations observation periods
procedures and visits if you hover over
any one particular line you will see the
series that is represented the
calendar month that's shown and the
number of records in your data source
for that domain in that calendar month
so here for example we can see that in
this data source in the condition
occurrence domain there are 33 million
records in March of 2015 this graph
allows you to see how data are
accumulating across your data domains
over time and to understand which data
domains have the greatest density of
information the next graph shows you
records per person similar to the first
graph this graph is showing you calendar
month on the x-axis and showing you a
line series for each of the data domains
however now the y-axis has normalized
the counts to be records per person
taking the total number of records as
represented in the first graph but
dividing it by a common denominator
represented
by the number of persons so here if I
hover over any particular graph I can
get an understanding of the number of
Records per person in a given data
domain for a given calendar month
so here I've highlighted that this
source in October of 2015 in the
condition occurrence domain has an
average of two point four four or five
records per person this graph allows you
to understand whether or not the data
density seems to be growing as a
function of the total data or as a
function of the amount of data per
person and the relative stability that's
included there in the third graph that's
shown on the data density report is
concepts per person and here this is
giving you a sense of how much distinct
information exists per person across
selected data domains on the x-axis we
show the concept type represented by the
different data domains including
conditions drugs observations procedures
and on the y-axis we're showing you the
number of distinct concepts per person
and each graph represents a box plot
showing you the distribution of distinct
concepts so here on the condition
occurrence domain we can see that the
distribution of distinct concepts per
person has a minimum of one for
everybody that's in the condition
occurrence table and a maximum of 800
meaning that there is an individual who
has 800 distinct concepts in the
condition occurrence domain this
provides you the full distribution
including the median here representing
that 50% of patients have 12 or
more distinct concepts the interquartile
range represented by a P25 and a P75
as well as an outer range represented by
the 10th and the 90th percentile
each of these box plots can be explored
by hovering over them this concludes our
summary of the data density report for
more information about OHDSI including
additional details of Atlas and
additional data sources reports check
out ohdsi.org