Data Source

Dr.Tom

Data source

Reference gene

mRNA: collected from the NCBI RefSeq annotation or other databases.

lncRNA: collected from the NCBI RefSeq annotation or other databases, and from the RNAcentral database.

LncRNAs from the RNAcentral database have an RNA ID, but no gene ID or positional information is provided. The positional information is determined by aligning the sequences to the genome using BLAST. The gene ID is then determined in one of two ways. The lncRNA is first compared with the known RNAs from NCBI. If it overlaps a known RNA, the gene ID corresponding to that region is used. If it overlaps no known RNA, it is treated as a new gene and assigned a gene ID starting with ‘BGIG’.
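The gene ID rule above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the `Interval` type, the `assign_gene_ids` function, the requirement that overlapping features share a strand, and the zero-padded numbering after the ‘BGIG’ prefix are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    # Assumption: two features overlap only if they share chromosome and strand.
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def assign_gene_ids(lncrnas, known_genes, prefix="BGIG"):
    """Map each RNA ID to a known gene ID, or to a new prefixed ID.

    lncrnas:     dict of RNA ID -> Interval (position from the genome alignment)
    known_genes: dict of gene ID -> Interval (known RNAs from NCBI)
    """
    counter = count(1)
    assigned = {}
    for rna_id, location in lncrnas.items():
        # Take the first known gene whose region overlaps the lncRNA.
        hit = next((gene_id for gene_id, region in known_genes.items()
                    if overlaps(location, region)), None)
        if hit is None:
            # No overlap: create a new gene ID (numbering scheme is hypothetical).
            hit = f"{prefix}{next(counter):06d}"
        assigned[rna_id] = hit
    return assigned
```

A linear scan over all known genes is enough for a sketch; a real annotation set would more likely use an interval index.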

miRNA: collected from miRBase 22; some miRNAs are predicted using BGI internal data. The prediction software is miRDeep for animals and miRDeep-P2 for plants. Each predicted miRNA is assigned a new gene ID starting with ‘novel’.

miRNA target gene prediction: several programs are used for prediction, and their results are filtered on criteria such as free energy and score. In general, RNAhybrid, miRanda, and TargetScan are used to predict animal target genes, and Tapir and TargetFinder are used to predict plant target genes. The default parameters of the target gene prediction software are as follows:

  • miRanda: -en -20 -strict
  • RNAhybrid: -b 100 -c -f 2,8 -m 100000 -v 3 -u 3 -e -20 -p 1 -s 3utr_human
  • TargetScan: Default
  • Tapir: --score 5 --mfe_ratio 0.6
  • TargetFinder: -c 4
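For scripting, the defaults listed above can be kept in a small lookup table. The sketch below only records the option strings from the list and splits them into argument lists; the names `DEFAULT_OPTIONS`, `TOOLS_BY_KINGDOM`, and `default_options` are hypothetical, and input files, executable names, and any other arguments each program needs are deliberately left out.

```python
import shlex

# Option strings copied from the list above; an empty string means
# "run with the program's own defaults".
DEFAULT_OPTIONS = {
    "miRanda": "-en -20 -strict",
    "RNAhybrid": "-b 100 -c -f 2,8 -m 100000 -v 3 -u 3 -e -20 -p 1 -s 3utr_human",
    "TargetScan": "",
    "Tapir": "--score 5 --mfe_ratio 0.6",
    "TargetFinder": "-c 4",
}

# Which programs are generally used for which kind of organism.
TOOLS_BY_KINGDOM = {
    "animal": ("RNAhybrid", "miRanda", "TargetScan"),
    "plant": ("Tapir", "TargetFinder"),
}


def default_options(kingdom):
    """Return {tool: [option, ...]} for 'animal' or 'plant'."""
    tools = TOOLS_BY_KINGDOM[kingdom.lower()]
    return {tool: shlex.split(DEFAULT_OPTIONS[tool]) for tool in tools}
```

For example, `default_options("plant")` gives the option lists for Tapir and TargetFinder, ready to be appended to a command line that the caller builds.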

circRNA: collected from circBase. The positional information is determined by aligning the sequences to the genome using BLAST.
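As an illustration of how a genomic position can be read off a BLAST alignment, the sketch below parses BLAST tabular output and keeps one hit per query. It rests on assumptions not stated here: that BLAST was run with the standard 12-column tabular format (`-outfmt 6`), and that the hit with the highest bit score is the one to keep. The function name `best_hits` is hypothetical.

```python
def best_hits(blast_tab):
    """Pick the highest-bitscore hit per query from BLAST tabular text.

    Expected columns (standard -outfmt 6): qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        bitscore = float(fields[11])
        if query not in best or bitscore > best[query]["bitscore"]:
            best[query] = {
                "chrom": subject,
                # On the minus strand BLAST reports sstart > send.
                "start": min(sstart, send),
                "end": max(sstart, send),
                "strand": "+" if sstart <= send else "-",
                "bitscore": bitscore,
            }
    return best
```

Taking a single best hit is a simplification; ties, multi-mapping sequences, and identity or coverage cutoffs would need an explicit policy.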

Annotation

KEGG: the newest version is v102.0

GO: from three databases:

Transcription factor annotation (TF Desc):
Animal: http://bioinfo.life.hust.edu.cn/AnimalTFDB/#!/
Plant: http://planttfdb.gao-lab.org/

Transcription cofactor annotation (TF Cofactors Desc):

Animal: AnimalTFDB v3.0

MSigDB annotation: v7.1

http://software.broadinstitute.org/gsea/msigdb/

GenBank (GeneBank Desc): collected from NCBI

InterPro (InterPro Desc), Pfam (Pfam Desc), EggNOG (EggNOG Desc) annotation:

idmapping from GeneOncology (downloaded in May 2020)

ftp://ftp.pir.georgetown.edu/databases/idmapping/idmapping.tb.gz

Reactome (Reactome Desc): extracted from the official mapping file NCBI2Reactome_PE_All_Levels.txt.

https://reactome.org/download-data (downloaded in June 2020)
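A gene-to-pathway lookup can be built from the Reactome mapping file along the following lines. This is only a sketch: the column layout (NCBI gene identifier, physical entity ID, physical entity name, pathway ID, URL, event name, evidence code, species) is an assumption about the tab-separated file, so the column indices are exposed as parameters, and the function name `load_gene_to_pathways` is hypothetical.

```python
from collections import defaultdict


def load_gene_to_pathways(lines, species=None,
                          gene_col=0, pathway_col=3,
                          name_col=5, species_col=7):
    """Build {gene ID: {(pathway ID, pathway name), ...}} from mapping lines.

    lines:   iterable of tab-separated text lines (e.g. an open file)
    species: if given, keep only rows whose species column matches exactly
    """
    needed = max(gene_col, pathway_col, name_col, species_col)
    mapping = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= needed:
            continue  # skip short or malformed rows
        if species is not None and fields[species_col] != species:
            continue
        mapping[fields[gene_col]].add((fields[pathway_col], fields[name_col]))
    return dict(mapping)
```

With the real file this would be called as `load_gene_to_pathways(open(path, encoding="utf-8"), species="Homo sapiens")`, where `path` points to the downloaded NCBI2Reactome_PE_All_Levels.txt.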

CR2Cancer (CR2Cancer Desc): http://cis.hku.hk/CR2Cancer/

CellMarker (CellMarker Desc): http://biocc.hrbmu.edu.cn/CellMarker/