Detailed gene set information
A table summarizing significant gene-phenotype associations from multiple databases
A summary plot visualizing these associations for the gene sets
After clicking 'Run GeneSet Analysis', navigate to the specific section for gene set analysis.
You can find a web tutorial here: Web tutorial.
For large gene lists (>500 genes), consider running the Shiny app and R Markdown file locally (GitHub) instead of using this hosted Shiny app.
Alternatively, click 'Generate Report' to create an HTML file summarizing the gene/variant-phenotype association results from the gene set analysis (< 100 genes).
Note: This might take a few minutes. If the operation fails, please refresh the page and try again.
About
This web application integrates data from public databases such as the AstraZeneca PheWAS Portal, FinnGen, GWAS Catalog, Human Phenotype Ontology, gnomAD, and ClinVar to generate visual summaries of gene, genetic variant, phenotype, and association information (Human gene-phenotype correlations).
Data Sources
AstraZeneca PheWAS Portal:
Gene-phenotype associations drawn from rare-variant genetic association data, described as the largest and most comprehensive exome-wide genotype-phenotype dataset. The portal provides both gene-level and variant-level phenome-wide association statistics (PheWAS) based on the exome sequences of UK Biobank participants, covering ~17K binary and ~1.4K quantitative phenotypes.
FinnGen:
The FinnGen research project is an academic-industrial collaboration that aims to identify genotype-phenotype correlations in the Finnish founder population. It is designed to develop the potential of these resources to serve medicine and to initiate and enrich drug discovery programs.
GWAS Catalog:
The NHGRI-EBI Catalog of human genome-wide association studies.
Human Phenotype Ontology (HPO):
A standardized vocabulary of phenotypic abnormalities encountered in human disease.
gnomAD:
Provides information on human genetic variation in healthy individuals across a diverse range of genetic ancestry groups.
ClinVar:
Reports the relationships among human variations and phenotypes, with supporting evidence.
This was developed by Jiru Han with input from Melanie Bahlo and other Bahlo Lab members.
For any queries or suggestions, please contact Jiru Han: han.ji@wehi.edu.au
Gene Summary
This module enables the extraction of comprehensive gene set information, including Entrez, Ensembl, and Uniprot IDs, genomic locations, gene function summaries, gene sequences, transcript counts, gene biotypes, and additional details. Additionally, it summarizes significant gene-phenotype associations from databases such as AZPheWAS, GWAS Catalog, and FinnGen, identifying the number of associated genes and presenting the results in both summary tables and figures.
Significant Association P-value Threshold:
AZPheWAS (p ≤ 1e-8)
GWAS Catalog (p ≤ 5e-8)
FinnGen (p ≤ 5e-8)
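As a rough illustration of how these thresholds could be applied, the following Python sketch filters association records per database. The record layout, gene names, and phenotype names are hypothetical placeholders and do not reflect the application's internal data structures.

```python
# Per-database significance thresholds, as listed above.
THRESHOLDS = {"AZPheWAS": 1e-8, "GWAS Catalog": 5e-8, "FinnGen": 5e-8}


def is_significant(database, p_value):
    """Return True when p_value meets the threshold for the given database."""
    return p_value <= THRESHOLDS[database]


# Hypothetical records: (database, gene, phenotype, p-value).
records = [
    ("AZPheWAS", "GENE_A", "phenotype 1", 3e-9),
    ("AZPheWAS", "GENE_A", "phenotype 2", 2e-8),  # below 5e-8 but above 1e-8
    ("GWAS Catalog", "GENE_B", "phenotype 3", 2e-8),
    ("FinnGen", "GENE_C", "phenotype 4", 1e-6),
]

# Keep only the records that pass their own database's threshold.
significant = [r for r in records if is_significant(r[0], r[3])]
```

Note that the same p-value (2e-8 here) can be significant in GWAS Catalog or FinnGen while failing the stricter AZPheWAS cutoff.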
Detailed gene set information
A table summarizing significant gene-phenotype associations from multiple databases
A summary plot visualizing these associations for the gene sets
Gene-Phenotype Association
This module in GeneSetPheno provides a detailed summary of gene-phenotype associations. Significant associations from AZPheWAS, GWAS Catalog, and FinnGen databases are summarized for the gene set. This information is presented in an interactive table and figure, offering comprehensive details for each gene, including all associated phenotypes, phenotype categories, and variants.
A summary plot that displays significant gene–phenotype associations for the gene sets. Users can easily access detailed information about each gene’s phenotype associations across databases, such as phenotype categories, by hovering over a gene or database.
A cluster plot showing significant gene–phenotype associations across databases for the gene sets.
A table that displays significant gene–phenotype associations for the gene sets. Users can access all relevant phenotype categories, phenotypes, and variants across multiple databases for each gene, offering a clear and comprehensive overview of significant gene–phenotype associations.
Variant-Phenotype Association
This module has four key components: a summary table of variant-phenotype associations, plus separate sections for AZPheWAS, GWAS Catalog, and FinnGen. Together they display significant associations between genetic variants and phenotypes from the different databases.
The summary table provides a comprehensive overview of variant associations from various databases, including details such as the gene list group, gene symbol, variant, rsID, allele frequency in gnomAD, link-outs to the gnomAD browser, and clinical significance from ClinVar. It also consolidates significant phenotype data from AZPheWAS, GWAS Catalog, and FinnGen, along with additional information on phenotype categories.
Phenotypic profile clustering is performed by grouping genes according to their associations with upper-level phenotype categories. For each category, significant variant-phenotype associations for each gene are assessed, followed by hierarchical clustering. This analysis highlights genes with similar phenotype associations. The x-axis represents genes, and the y-axis represents phenotype categories. The color indicates whether there is a significant association of each gene across phenotype categories: red represents a significant association, while white represents no significant association.
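The clustering idea described above can be sketched in a few lines of Python. This is only an illustration: the distance measure (Hamming distance between binary profiles) and the average-linkage rule are assumptions, since the text does not state which ones the application uses, and the gene and category names are placeholders.

```python
from itertools import combinations


def binary_profiles(associations, categories):
    """Map each gene to a 0/1 tuple: 1 if it has any significant
    association in that phenotype category, else 0."""
    hits = {}
    for gene, category in associations:
        hits.setdefault(gene, set()).add(category)
    return {g: tuple(int(c in cats) for c in categories) for g, cats in hits.items()}


def hamming(a, b):
    """Number of categories in which two binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def agglomerate(profiles):
    """Average-linkage agglomerative clustering (assumed linkage).
    Returns the merges in order as (cluster_a, cluster_b, distance)."""

    def avg_dist(pair):
        a, b = pair
        total = sum(hamming(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    clusters = [(g,) for g in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=avg_dist)
        merges.append((a, b, avg_dist((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical significant (gene, phenotype category) pairs.
associations = [
    ("GENE_A", "Cardiovascular"), ("GENE_A", "Metabolic"),
    ("GENE_B", "Cardiovascular"), ("GENE_B", "Metabolic"),
    ("GENE_C", "Neurological"),
]
categories = ["Cardiovascular", "Metabolic", "Neurological"]

profiles = binary_profiles(associations, categories)
merges = agglomerate(profiles)
```

In this toy input, GENE_A and GENE_B share an identical profile and are merged first, which corresponds to neighbouring columns with the same red/white pattern in the heatmap.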
The phenotype distribution overview section visualizes phenotype association patterns across different gene set groups. It aggregates all phenotypes associated with each gene list within each phenotype category, either by counting unique phenotypes or calculating the proportions of genes.
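The two aggregation modes mentioned above can be illustrated with a short Python sketch. It assumes that the "proportion of genes" is the number of genes in a group with at least one associated phenotype in the category, divided by the size of that group's input gene list; the text does not spell out the denominator, so treat this as one plausible reading. All names in the example data are placeholders.

```python
from collections import defaultdict


def phenotype_distribution(records, group_sizes):
    """Aggregate per (group, category): the count of unique phenotypes and
    the proportion of the group's genes with at least one association."""
    phenotypes = defaultdict(set)
    genes = defaultdict(set)
    for group, gene, category, phenotype in records:
        phenotypes[(group, category)].add(phenotype)
        genes[(group, category)].add(gene)
    return {
        key: {
            "unique_phenotypes": len(phenotypes[key]),
            "gene_proportion": len(genes[key]) / group_sizes[key[0]],
        }
        for key in phenotypes
    }


# Hypothetical records: (group, gene, phenotype category, phenotype).
records = [
    ("Group1", "GENE_A", "Metabolic", "phenotype 1"),
    ("Group1", "GENE_A", "Metabolic", "phenotype 2"),
    ("Group1", "GENE_B", "Metabolic", "phenotype 1"),
    ("Group2", "GENE_C", "Metabolic", "phenotype 3"),
]
# Hypothetical sizes of the input gene lists.
group_sizes = {"Group1": 4, "Group2": 2}

distribution = phenotype_distribution(records, group_sizes)
```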
The mean effect of each gene across various phenotype categories is calculated by averaging the odds ratios or absolute effect sizes from all significant variant-phenotype associations within each gene, for both binary and continuous phenotypes. This reflects the estimated overall effect of each gene within each phenotype category.
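A minimal Python sketch of this averaging step follows. It assumes that odds ratios are averaged as-is for binary phenotypes, that absolute effect sizes are averaged for continuous phenotypes, and that the two phenotype types are kept separate; the record layout and names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_effects(records):
    """Average effects per (gene, category, phenotype type).

    records: (gene, category, kind, effect), where kind is "binary"
    (effect is an odds ratio) or "continuous" (effect is a signed effect size).
    """
    grouped = defaultdict(list)
    for gene, category, kind, effect in records:
        # Odds ratios are used directly; continuous effects use the absolute value.
        value = effect if kind == "binary" else abs(effect)
        grouped[(gene, category, kind)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical significant variant-phenotype associations for one gene.
records = [
    ("GENE_A", "Metabolic", "binary", 2.0),
    ("GENE_A", "Metabolic", "binary", 4.0),
    ("GENE_A", "Metabolic", "continuous", -0.5),
    ("GENE_A", "Metabolic", "continuous", 0.25),
]

effects = mean_effects(records)
```

Taking absolute values means that opposite-direction continuous effects do not cancel out, so the result reflects effect magnitude rather than direction.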
HPO Phenotype
This module enables visualization of phenotype enrichment results and comparative phenotype analysis of different gene sets using the HPO resource.
This module integrates functions from the R package PhenoExam. For more information, please refer to the Paper and GitHub.
Phenotype enrichment analysis of the gene set against the HPO database (19,248 genes, 7,861 phenotypes, 186,290 human gene-phenotype associations). The top enriched terms for the gene set are shown in a plot, which reports the phenotype term ID, term name, source of the term, Bonferroni-adjusted p-value for enrichment, the number of genes associated with the term in the database (genes_associated_in_db), the number of genes in the gene set linked to the term (gene_overlap), the overlap ratio (gene_overlap/genes_associated_in_db), the raw p-value, and the symbols of the gene-set genes linked to the term.
PhenoExamWeb pvalues (PhenoExam)
Phenotype enrichment analysis of the gene set against the HPO database. The top enriched terms for the gene set are shown in a summary table, which reports the phenotype term ID, term name, source of the term, Bonferroni-adjusted p-value for enrichment, the number of genes associated with the term in the database (genes_associated_in_db), the number of genes in the gene set linked to the term (gene_overlap), the overlap ratio (gene_overlap/genes_associated_in_db), the raw p-value, and the symbols of the gene-set genes linked to the term.
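To make the reported columns concrete, the Python sketch below computes them for a toy example. It does not call PhenoExam; it assumes a one-sided hypergeometric test and a Bonferroni correction over the number of terms tested, which may differ from what PhenoExam actually does. The background size, term names, and gene names are placeholders.

```python
from math import comb


def hypergeom_upper_tail(k, n_background, n_term, n_set):
    """P(overlap >= k) when n_set genes are drawn from n_background genes,
    n_term of which are annotated to the term."""
    total = comb(n_background, n_set)
    upper = min(n_term, n_set)
    hits = sum(
        comb(n_term, i) * comb(n_background - n_term, n_set - i)
        for i in range(k, upper + 1)
    )
    return hits / total


def enrich(gene_set, term_to_genes, n_background):
    """Return one row per term, sorted by raw p-value.
    Assumes every gene in gene_set belongs to the background."""
    gene_set = set(gene_set)
    rows = []
    for term, db_genes in term_to_genes.items():
        overlap = gene_set & db_genes
        p = hypergeom_upper_tail(len(overlap), n_background, len(db_genes), len(gene_set))
        rows.append({
            "term": term,
            "genes_associated_in_db": len(db_genes),
            "gene_overlap": len(overlap),
            "overlap_ratio": len(overlap) / len(db_genes),
            "p_value": p,
            # Bonferroni: multiply by the number of terms tested, cap at 1.
            "p_bonferroni": min(1.0, p * len(term_to_genes)),
            "genes": sorted(overlap),
        })
    return sorted(rows, key=lambda r: r["p_value"])


# Toy annotation with a hypothetical background of 10 genes.
term_to_genes = {
    "TERM_1": {"A", "B", "C"},
    "TERM_2": {"D", "E", "F", "G", "H"},
}
rows = enrich({"A", "B", "D"}, term_to_genes, n_background=10)
```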
This is a web tutorial that demonstrates the usage and analysis results of the GeneSetPheno application.
Download this file for a detailed overview of all results generated by GeneSetPheno.
Data Format:
The example dataset for GeneSetPheno can be accessed on the GeneSetPheno R Shiny homepage by clicking the 'Download Example Input File' button. The application requires only a single input file (CSV, TXT, or XLSX format) containing gene list information with two columns: “Group” (representing the gene list group, such as a specific disease or condition) and “Gene” (listing gene names as approved by the HUGO Gene Nomenclature Committee (HGNC)).
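For readers preparing their own input, the following Python sketch checks that a CSV has the two required columns and collects genes per group. It is only an illustration of the described format, not the application's own loader; the group labels are made up, and the gene symbols are ordinary HGNC symbols chosen as examples.

```python
import csv
import io


def read_gene_list(text):
    """Parse CSV text with "Group" and "Gene" columns into {group: set of genes}."""
    reader = csv.DictReader(io.StringIO(text))
    missing = {"Group", "Gene"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    groups = {}
    for row in reader:
        group = (row["Group"] or "").strip()
        gene = (row["Gene"] or "").strip()
        if group and gene:  # skip blank or incomplete rows
            groups.setdefault(group, set()).add(gene)
    return groups


# Hypothetical input: two groups, one gene symbol per row.
example = "Group,Gene\nDisease1,BRCA1\nDisease1,TP53\nDisease2,APOE\n"
gene_lists = read_gene_list(example)
```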
Upload & Run Analysis:
a. Navigate through your local files to select data
b. Click 'Run GeneSet Analysis', then navigate to the specific section for gene set analysis.
c. Download example input dataset
d. Instead of navigating to the specific section for gene set analysis, after uploading the data, click 'Generate Report' to create an HTML file summarizing the gene/variant-phenotype association results for gene sets (< 100 genes). This may take a few minutes.
Upon clicking 'Generate Gene Summary', the following outputs will be produced: a table with detailed gene set information, a Venn diagram displaying gene set counts and overlaps between groups, a table summarizing significant gene-phenotype associations from multiple databases, and a summary plot visualizing these associations for the gene sets.
Clicking 'Gene–Phenotype Associations' generates two summary plots and a table displaying significant gene–phenotype associations.
The interactive heatmap shows associations based on the input gene list, with detailed information accessible by hovering over genes or databases.
The summary table provides in-depth data on gene–phenotype associations across multiple databases.
The variant-phenotype associations module in GeneSetPheno highlights four key components: a summary table of significant variant-phenotype associations and detailed data from AZPheWAS, the GWAS Catalog, and FinnGen, reflecting their diverse phenotypic data and unique characteristics. This module serves as a comprehensive resource for showcasing significant associations between genetic variants and diverse phenotypes.
a. The summary table displays significant variant-phenotype associations by gene, integrating data from multiple databases. It includes gene symbols, variant details, rsID, allele frequencies from gnomAD, gnomAD browser links, clinical significance from ClinVar, and phenotypes from AZPheWAS, GWAS Catalog, and FinnGen. Users can search by variant or phenotype keyword and download the results.
b. Each database includes phenotypic profile clustering, phenotype distribution, and variant-phenotype gene effects, highlighting gene clusters with shared phenotype associations and identifying unique distribution patterns for each group.
This module identifies key phenotypic terms within gene sets.
a. Select Gene Group for Phenotype Enrichment Analysis. This will automatically update group information after uploading the gene list input data.
b. Run Phenotype Enrichment Analysis: this will display a plot and table illustrating the top enriched phenotype terms.