1.Overview


PsyMuKB (Psychiatric Mutation Knowledge Base) is a comprehensive knowledge base of de novo variants reported in patients with psychiatric disorders, including single nucleotide variations (SNVs), small insertions and deletions (indels), and copy number variations (CNVs) (details in Table 1).

PsyMuKB currently includes published de novo variants, both coding and non-coding, from 25 neuropsychiatric diseases, including Autism Spectrum Disorder (ASD), Bipolar Disorder (BP), Developmental Delay (DD), Intellectual Disability (ID), Obsessive Compulsive Disorder (OCD), Schizophrenia (SCZ), and Tourette Syndrome (TD), as well as from unaffected siblings used as controls (CTR) (details in Table 1).

In addition, PsyMuKB integrates large-scale genomics data, including mRNA brain expression, transcript and regulatory element annotation, and protein-protein interactions (PPI). Users can therefore explore each variant visually in terms of the brain expression pattern of the disrupted gene and/or isoform, its genomic location in transcripts, and the PPIs of the impacted protein.


Fig 1. Flowchart of the PsyMuKB database


2.Data summary


At present, the PsyMuKB database contains about 876,178 de novo variants of various types (Table 1).


Table 1. Statistics of de novo mutations in PsyMuKB


Table 2. The types of de novo variants


3.Browse

3.1 Browse by gene symbol

You can select a gene of interest by browsing the list of gene symbols (Fig 2).


Fig 2. The interface of browsing by gene symbol


3.2 Browse by gene sets

You can select a gene of interest by choosing symbols from different gene sets. PsyMuKB provides 16 gene sets curated from the literature that are associated with neuronal function, development, behavior, etc. (Fig 3).


Fig 3. The interface of browsing by gene sets



4.Search

There are several options for querying PsyMuKB.

4.1 Quick Search:

You can search by gene symbol, gene ID, or chromosome coordinates on the "Home" page.


Fig 4. Quick search interface
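Because the quick search box accepts three kinds of input, it can help to see how a single query string might be told apart. The following is a minimal sketch only: the function `classify_query`, the accepted coordinate format (`chr2:1000-2000`), and the rule that an all-digit string is a gene ID are assumptions made for illustration, not a description of how PsyMuKB parses queries.

```python
import re

# Hypothetical coordinate pattern; the formats PsyMuKB actually accepts may differ.
COORD_RE = re.compile(r"^(?:chr)?(\d{1,2}|X|Y|MT?):(\d+)(?:-(\d+))?$", re.IGNORECASE)


def classify_query(query: str) -> dict:
    """Guess whether a query is chromosome coordinates, a numeric gene ID, or a gene symbol."""
    q = query.strip().replace(",", "")  # tolerate thousands separators in coordinates
    if not q:
        raise ValueError("empty query")
    m = COORD_RE.match(q)
    if m:
        start = int(m.group(2))
        end = int(m.group(3)) if m.group(3) else start  # single position
        if end < start:
            raise ValueError("end coordinate precedes start")
        return {"kind": "coordinates", "chrom": m.group(1).upper(), "start": start, "end": end}
    if q.isdigit():
        # Assumption: an all-digit string is meant as a numeric gene ID.
        return {"kind": "gene_id", "value": int(q)}
    return {"kind": "gene_symbol", "value": q}


# Arbitrary example inputs
print(classify_query("SCN2A"))
print(classify_query("6326"))
print(classify_query("chr2:1,000-2,000"))
```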


4.2 Advanced search:

  • Search by gene symbol or Entrez gene ID (same as in Quick Search):


    Fig 5. Gene Search interface


  • Search by DNMs:
    You can narrow a search for de novo mutations by specifying the mutation type and disorder, and optionally the chromosome number and coordinates.


    Fig 6. DNM Search interface


  • Search by CNVs:
    You can narrow a search for de novo copy number variants by specifying the mutation type and disorder, and optionally the chromosome number and coordinates.


    Fig 7. CNV Search interface



5.Query results

The query results contain two major sections: "Gene information" and "Mutation information".

5.1 Gene information

There are five modules: Gene Information, Expressions, De novo variants, Transcript, and Protein-protein interactions.

5.1.1 Gene information

This module shows tables of gene details, including Basic information, Summary (UniProt summary and RefSeq summary), and an Assessment table (scores from several tools assessing the potential tolerance of the gene).

5.1.2 Expression

This module shows visualizations of gene expression in specific tissues or cell types.

The brain expression data include spatio-temporal human developmental brain expression (BrainSpan), Genotype-Tissue Expression (GTEx), human brain single cell expression (Zhang et al., PMID: 25186741), and adult mouse brain single cell expression (Saunders et al., PMID: 30096299).


Fig 8. Gene expression in specific tissues or cell types. (a) Gene-level expression in different tissues (data source: GTEx v7). (b) Spatio-temporal human developmental brain expression (data source: BrainSpan). (c) Human embryonic prefrontal cortex single cell expression. (d) Adult mouse brain single cell expression. (e) Protein expression (data source: Proteomics DB).


5.1.3 De novo variants

The "De novo variants" part of the gene information page shows, for the queried gene, the number of de novo mutations reported in each neuropsychiatric disease.

Fig 9. Gene DNMs-Neuropsychiatric diseases information table
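The per-disease counts in this table amount to a simple tally. As a rough sketch, assuming a list of records with hypothetical `gene` and `disorder` fields (placeholder names and made-up data, not PsyMuKB's schema), the same summary could be computed as follows:

```python
from collections import Counter


def count_by_disorder(records, gene):
    """Count de novo mutation records per disorder for one gene."""
    return Counter(r["disorder"] for r in records if r["gene"] == gene)


# Made-up records for illustration only
records = [
    {"gene": "GENE_A", "disorder": "ASD"},
    {"gene": "GENE_A", "disorder": "ASD"},
    {"gene": "GENE_A", "disorder": "ID"},
    {"gene": "GENE_B", "disorder": "SCZ"},
]

for disorder, n in count_by_disorder(records, "GENE_A").most_common():
    print(disorder, n)
```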


5.1.4 Transcript

The mutations are drawn on top of all the isoforms of an impacted gene, positioned according to their chromosome locations. The mutation size has been adjusted for visualization purposes.

The information includes three diagrams: Visualization of regulatory element DNMs, Visualization of de novo variants in transcripts, and Visualization of de novo CNVs. Because the three share a similar layout, the Visualization of regulatory element DNMs (Fig 10) is shown as an example.


Fig 10. Visualization of regulatory element DNMs


The tables of Gencode transcripts, Refseq transcripts, Enhancer information and Promoter information are also shown in this part.


Fig 11. Gencode transcripts and Refseq transcripts table

Fig 12. Enhancer information and promoter information table
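Placing a mutation on every isoform by chromosome location comes down to comparing one genomic position with each transcript's exon intervals. The sketch below illustrates that idea only. The transcript names, the exon coordinates, and the function `isoforms_hit` are hypothetical, and coordinates are assumed to be 1-based and inclusive.

```python
def isoforms_hit(position, transcripts):
    """For each transcript spanning the position, report whether it falls in an exon or an intron.

    `transcripts` maps a transcript name to a list of (start, end) exon intervals.
    Transcripts that do not span the position are left out of the result.
    """
    result = {}
    for name, exons in transcripts.items():
        exons = sorted(exons)
        tx_start, tx_end = exons[0][0], exons[-1][1]
        if not (tx_start <= position <= tx_end):
            continue  # position lies outside this transcript
        in_exon = any(start <= position <= end for start, end in exons)
        result[name] = "exon" if in_exon else "intron"
    return result


# Made-up isoforms of one gene: TX_2 skips the middle exon of TX_1
transcripts = {
    "TX_1": [(100, 200), (300, 400), (500, 600)],
    "TX_2": [(100, 200), (500, 600)],
    "TX_3": [(700, 800)],
}

print(isoforms_hit(350, transcripts))
```

In this toy example the same position is exonic in one isoform and intronic in another, which is why viewing a mutation against all isoforms at once can be informative.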


5.1.5 Protein-protein interactions

Protein-protein interaction (PPI) data are extracted from BioGRID v3.5.165. The red node indicates the queried gene, and the blue nodes are its interacting partners (Fig 13).


Fig 13. PPI network (data source: BioGrid)


Below the PPI diagram, a table lists all the genes that have PPIs in the diagram (Fig 14).


Fig 14. PPI statistical table
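The network and its table can be thought of as the direct neighbourhood of the queried gene in a list of interaction pairs. The following is an illustrative sketch under that assumption: the pair list is made up, `ppi_neighbourhood` is a hypothetical helper, and no BioGRID file format is parsed here.

```python
def ppi_neighbourhood(interactions, query):
    """Return a node -> colour mapping: the queried gene in red, its direct partners in blue."""
    partners = set()
    for a, b in interactions:
        # Interactions are treated as undirected; self-interactions add no partner.
        if a == query and b != query:
            partners.add(b)
        elif b == query and a != query:
            partners.add(a)
    colours = {query: "red"}
    colours.update({p: "blue" for p in sorted(partners)})
    return colours


# Made-up interaction pairs for illustration only
interactions = [("A", "B"), ("C", "A"), ("B", "D"), ("A", "A")]

print(ppi_neighbourhood(interactions, "A"))
```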

5.2 Mutation information


5.2.1 SNV mutation

We collected and curated both coding and non-coding de novo mutations (~876,178 in total) from all major WES, WGS, and targeted sequencing studies (~120) on 26 neuropsychiatric disorders, including autism spectrum disorder (ASD), intellectual disability (ID), epileptic encephalopathy (EE), schizophrenia (SCZ), bipolar disorder (BP), attention-deficit/hyperactivity disorder (ADHD), obsessive compulsive disorder (OCD), congenital heart disease (CHD), and Tourette disorder (TD), with healthy siblings as controls (Table 1). We also include annotation scores from tools such as SIFT, PolyPhen, CADD, and DeepSEA to assess the pathogenicity of variants. In addition, PsyMuKB visualizes the exact locations of these mutations on the isoforms of the impacted genes in the "Transcript information" section.

The mutation table is shown below:


Fig 15. The mutation table. (a) Coding de novo mutations. (b) Non-coding de novo variations.


5.2.2 CNV mutation

We collected and curated copy number variations from 19 studies on four psychiatric disorders (ASD, BD, ID/DD, and SCZ), together with controls (Table 1).

The mutation table is shown below:


Fig 16. De novo copy number variations


5.2.3 Mutation details

In the SNV mutation table, you can click on a row to open the details page for that mutation. The details page shows the details of the selected mutation, its mRNA-level impact, and its protein-level impact.

5.2.3.1 Selected Mutation

Fig 17. Selected mutation

5.2.3.2 mRNA-level impact

Isoform expression data for the human brain were downloaded from the GTEx portal (2016-01-15_v7_RNASeQCv1.1.8). The interface provides a graphical view of the brain expression of all transcripts impacted by the mutation, which helps the user understand the consequence of an isoform carrying a given mutation in a specific brain region.


Fig 18. mRNA-level impact


5.2.3.3 Protein-level impact

Protein isoform expression data across tissues were downloaded from Proteomics DB. As with the mRNA level, this interface allows users to explore the specific expression patterns of gene products by selecting impacted and/or brain-expressed proteins.


Fig 19. Protein-level impact


6.Download

6.1 Small data sets download

PsyMuKB provides a download option next to each data table and visualization. You can download small data sets and subsets directly from the website by following the download link on any search result page.

Fig 20. Download button samples

6.2 Batch download

You can download in bulk from the PsyMuKB download page.

Fig 21. Batch download