About
This interactive dashboard monitors the diversity of participants across all published Genome Wide Association Studies (GWAS), the primary technique used for genetic discovery. The objective of a GWAS is to identify statistical associations between a set of genetic variants (Single Nucleotide Polymorphisms, or 'SNPs') measured across different individuals and specific traits of interest. This monitor is an extension of our earlier project, 'The Scientometrics of Genome Wide Association Studies', published in Communications Biology in January 2019. As there, we leverage the magnificent dataset curated by the NHGRI-EBI Catalog (subject to their licensing information, which can be found here). The dashboard itself is a combination of Python (Flask) and JavaScript (D3), designed for presentation in modern web browsers. An earlier prototype appears here, with the full code base available for replication on GitHub. We actively encourage community-based suggestions and contributions. The backend code checks daily for updates to the NHGRI-EBI Catalog, writing to logs and refreshing the dataset that powers the dashboard as appropriate.
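The daily update check described above could be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual backend: the function name `refresh_if_updated`, the two injected callables, and the JSON state file are all assumptions made for the example. The idea is simply to compare a release stamp from the Catalog against the stamp of the last download, log the outcome, and refresh only when the two differ.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("catalog_refresh")


def refresh_if_updated(fetch_remote_stamp, download_dataset, state_path):
    """Refresh the local dataset only when the remote release stamp changes.

    fetch_remote_stamp: hypothetical callable returning a string that
        identifies the current Catalog release (for example, a release date).
    download_dataset: hypothetical callable that fetches and stores new data.
    state_path: file holding the stamp of the last downloaded release.
    Returns True if a refresh happened, False otherwise.
    """
    state_file = Path(state_path)
    last_stamp = None
    if state_file.exists():
        last_stamp = json.loads(state_file.read_text()).get("stamp")

    remote_stamp = fetch_remote_stamp()
    if remote_stamp == last_stamp:
        logger.info("No Catalog update found (stamp %s).", remote_stamp)
        return False

    # Download first, then record the stamp, so that a failed download
    # is retried on the next run instead of being marked as done.
    download_dataset()
    state_file.write_text(json.dumps({"stamp": remote_stamp}))
    logger.info("Dataset refreshed: %s -> %s.", last_stamp, remote_stamp)
    return True
```

A scheduler (cron, for instance) could call this function once a day. Passing the fetch and download steps in as callables keeps the comparison logic testable without network access.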
The dashboard is under perpetual development and review, but currently comprises two global widgets: METRIC (which toggles whether we evaluate by number of studies or by number of participants) and STAGE (which determines whether we consider the discovery or replication phase of research). Local widgets also toggle the 'EFO Parent Term', the 'Broader' ancestry category, and 'Year' (the year of study). Because the dataset is relatively small, not all widgets apply to each figure; we describe our design choices below:
- Summary Breakdown: Total GWAS participants diversity. Shown in the upper left panel, this displays summary statistics without any filtering (over time, ancestry, traits, or otherwise).
- Bubble Plot: Ancestry over time by parent term. This graphic in the upper middle panel provides a granular overview of all GWAS in the Catalog, mapped onto EFO Parent Terms. In addition to responding to the STAGE global widget, it allows a finer search based on individual EFO Traits (and combinations thereof). Clicking on an individual bubble provides detailed study information, including unique identifiers and links out to the relevant PubMed page.
- Time Series Plot: Participants across all parent terms. This figure in the upper right panel displays how 'Broader Ancestry' varies over time, responding to both global widgets. Note that we do not divide by the 'Parent Term' widget here, because the least-studied ancestry categories are largely absent at this level of disaggregation. The tickbox for 'Include not recorded' provides a robustness check on how we map our 'Broader Ancestry' field.
- Heatmap: Parent term by 'broader' ancestry. This figure in the bottom left panel maps studies across both STAGE and METRIC, broken down by each 'Broader Ancestry' category and 'Parent Term' for each year in the dataset. Hovering over the figure reveals the numbers for each category.
- Choropleth Map. Due to the sparsity of countries recruited from across all EFO Parent Terms, this figure draws on the two global widgets and is adjustable by Year only. Polygon data are available from here. Hovering over the figure shows the name of the country and the number of participants.
- Doughnut Chart. This figure displays the percentage of each 'Broader Ancestry' by 'Parent Term' across both global widgets. A 'Show'/'Hide' toggle reveals an inset graph breaking down the associations discovered across all EFO Parent Terms at the discovery stage alone.
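To make the role of the two global widgets concrete, the sketch below shows one way that METRIC, STAGE, and the 'Include not recorded' tickbox could turn study records into the percentage shares that several of the figures above display. It is an illustration under assumed names, not the dashboard's own code: the record layout, the sample values, the ancestry labels other than 'Not recorded', and the function `ancestry_shares` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: one row per study, stage, and broader ancestry group.
RECORDS = [
    {"study": "S1", "stage": "discovery", "ancestry": "European", "participants": 900},
    {"study": "S1", "stage": "replication", "ancestry": "European", "participants": 300},
    {"study": "S2", "stage": "discovery", "ancestry": "Asian", "participants": 100},
    {"study": "S3", "stage": "discovery", "ancestry": "Not recorded", "participants": 1000},
]


def ancestry_shares(records, metric, stage, include_not_recorded=True):
    """Return the percentage share of each ancestry for one widget setting.

    metric: "studies" counts distinct studies; "participants" sums sample sizes.
    stage: "discovery" or "replication".
    A study that reports several ancestries is counted once in each of them.
    """
    if metric not in ("studies", "participants"):
        raise ValueError(f"unknown metric: {metric!r}")

    studies = defaultdict(set)
    participants = defaultdict(int)
    for row in records:
        if row["stage"] != stage:
            continue
        if not include_not_recorded and row["ancestry"] == "Not recorded":
            continue
        studies[row["ancestry"]].add(row["study"])
        participants[row["ancestry"]] += row["participants"]

    if metric == "studies":
        totals = {ancestry: len(ids) for ancestry, ids in studies.items()}
    else:
        totals = dict(participants)

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {ancestry: 100.0 * value / grand_total for ancestry, value in totals.items()}
```

With these made-up records, switching METRIC from participants to studies changes the picture considerably, as does excluding the 'Not recorded' group. This is the kind of sensitivity the widgets are meant to expose.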