agReg-SNPdb Plants






Georg-August-Universität Göttingen
CiBreed

Biological Background

Transcription factors (TFs) are regulatory proteins that control transcription, and thereby gene expression, by binding to DNA. TFs bind to specific sequence motifs,
known as transcription factor binding sites (TFBSs), which are usually between 5 and 15 bp long. TFBSs are enriched in promoter regions. Single nucleotide polymorphisms (SNPs), i.e. the exchange
of a single base at a specific position in the genome, can strongly influence gene expression levels by changing the binding affinity of a TF to its target sequence. A single nucleotide substitution within a
TFBS can be sufficient to disrupt an existing TFBS or even to create a new one. Such SNPs are referred to as regulatory SNPs (rSNPs). They have recently gained much attention in the life sciences because they can be
causal for specific traits or diseases.



Workflow

Our workflow to identify rSNPs and their consequences on TF binding consists of four steps:



1. Extraction of the promoter region of each gene, spanning -7.5 kb to +2.5 kb relative to the transcription start site (TSS).

2. Identification of the SNPs located within these promoter regions and extraction of their flanking sequences, defined as the 25 bp upstream and the 25 bp downstream of each SNP. These SNPs are referred to as rSNPs.

3. Prediction of putative TFBSs in these flanking sequences with the TFBS prediction tool MATCH™, for both the reference and the alternate allele of each SNP.

4. Comparison of the predicted TFBSs between the reference and the alternate allele to determine the consequences of each SNP.
We classify the effect of an rSNP on the binding of a TF into four categories:
  • Gain of TFBS: The TFBS is predicted only for the alternate (1) allele of the SNP.
  • Loss of TFBS: The TFBS is predicted only for the reference (0) allele of the SNP.
  • Score-Change: The TFBS is predicted for both alleles, but the TF binding affinity differs (measured by the Core_Similarity_Score and Matrix_Similarity_Score calculated by MATCH™).
  • No Change: The TFBS is predicted for both alleles with the same TF binding affinity (measured by the Core_Similarity_Score and Matrix_Similarity_Score calculated by MATCH™).
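Steps 1 and 2 above can be sketched in a few lines. This is a minimal illustration, not the pipeline used to build agReg-SNPdb Plants; it assumes 1-based inclusive genome coordinates, a hypothetical `Gene` record holding a single TSS and strand, and that "upstream" and "downstream" are taken relative to the gene's strand, so the window is mirrored for minus-strand genes. The window is clipped at the chromosome start but not at its end.

```python
from dataclasses import dataclass

UPSTREAM = 7500    # bp upstream of the TSS (step 1)
DOWNSTREAM = 2500  # bp downstream of the TSS (step 1)
FLANK = 25         # bp on each side of the SNP (step 2)


@dataclass
class Gene:
    """Hypothetical gene record; coordinates are 1-based (an assumption)."""
    chrom: str
    tss: int
    strand: str  # "+" or "-"


def promoter_window(gene: Gene) -> tuple[int, int]:
    """Return the 1-based inclusive promoter interval (-7.5 kb to +2.5 kb).

    Upstream/downstream are strand-relative (assumption), so the window
    is mirrored for genes on the minus strand.
    """
    if gene.strand == "+":
        start, end = gene.tss - UPSTREAM, gene.tss + DOWNSTREAM
    else:
        start, end = gene.tss - DOWNSTREAM, gene.tss + UPSTREAM
    return max(1, start), end  # clip at the chromosome start only


def snps_in_promoter(gene: Gene, snp_positions: list[int]) -> list[int]:
    """Keep the SNP positions that fall inside the gene's promoter window."""
    start, end = promoter_window(gene)
    return [p for p in snp_positions if start <= p <= end]


def flanking_alleles(chrom_seq: str, pos: int, ref: str, alt: str) -> tuple[str, str]:
    """Return the (reference, alternate) sequences: 25 bp + allele + 25 bp.

    chrom_seq is the chromosome sequence as a string, pos the 1-based SNP
    position. Flanks are shorter if the SNP lies near a chromosome end.
    """
    i = pos - 1  # convert to a 0-based string index
    if chrom_seq[i].upper() != ref.upper():
        raise ValueError("reference allele does not match the genome sequence")
    left = chrom_seq[max(0, i - FLANK):i]
    right = chrom_seq[i + 1:i + 1 + FLANK]
    return left + ref + right, left + alt + right
```

For a SNP away from the chromosome ends, both returned sequences are 51 bp long and differ only at the central position, which is what the TFBS prediction in step 3 then scans.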
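MATCH™ (step 3 above) is a proprietary tool and is not reproduced here. To give an intuition for what a Matrix_Similarity_Score expresses, the sketch below is a simplified, hypothetical reimplementation of an information-weighted matrix score of the kind MATCH™ reports: each matrix position is weighted by its information content I(i) = Σ_b f(i,b)·ln(4·f(i,b)), and the weighted match of a candidate site is rescaled so that the worst possible site scores 0 and the best scores 1. The count-matrix layout (a list of per-position dicts) is an assumption, thresholds for calling a site a TFBS are omitted, and the Core_Similarity_Score, which MATCH™ computes on the matrix core only, is not covered.

```python
import math

BASES = "ACGT"


def _frequencies(column):
    """Convert one matrix column of base counts into relative frequencies."""
    total = sum(column[b] for b in BASES)
    return {b: column[b] / total for b in BASES}


def information_vector(pfm):
    """I(i) = sum over bases b of f(i,b) * ln(4 * f(i,b)); 0 * ln(0) counts as 0."""
    info = []
    for column in pfm:
        freqs = _frequencies(column)
        info.append(sum(f * math.log(4 * f) for f in freqs.values() if f > 0))
    return info


def matrix_similarity(pfm, site):
    """Information-weighted similarity of `site` to the matrix, scaled to [0, 1].

    Illustrative only: not the MATCH(TM) implementation. Assumes `site` has
    the same length as the matrix and contains only A, C, G and T.
    """
    if len(site) != len(pfm):
        raise ValueError("site length must equal matrix length")
    info = information_vector(pfm)
    current = best = worst = 0.0
    for weight, column, base in zip(info, pfm, site.upper()):
        freqs = _frequencies(column)
        current += weight * freqs[base]          # observed base
        best += weight * max(freqs.values())     # best possible base
        worst += weight * min(freqs.values())    # worst possible base
    if best == worst:
        raise ValueError("matrix carries no information")
    return (current - worst) / (best - worst)
```

Scoring the same window for the reference and the alternate allele and comparing the two values is the idea behind the Score-Change category in step 4: a substitution at a highly informative matrix position lowers (or raises) the score much more than one at a weakly conserved position.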
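The four categories of step 4 above translate directly into a small decision function. The sketch below is a minimal illustration under stated assumptions: predictions for the two alleles are paired by a hypothetical key such as (matrix identifier, position, strand), each prediction is reduced to its two MATCH™ scores, and two predictions count as having "the same TF binding affinity" only if both scores are exactly equal as reported.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """Scores of one predicted TFBS (field names are illustrative)."""
    core_score: float    # Core_Similarity_Score
    matrix_score: float  # Matrix_Similarity_Score


def classify(ref_hit: Optional[Hit], alt_hit: Optional[Hit]) -> str:
    """Assign the rSNP effect on one TFBS to one of the four categories.

    ref_hit / alt_hit: prediction for the reference (0) / alternate (1)
    allele, or None if no TFBS was predicted for that allele.
    """
    if ref_hit is None and alt_hit is None:
        raise ValueError("no TFBS predicted for either allele")
    if ref_hit is None:
        return "Gain of TFBS"
    if alt_hit is None:
        return "Loss of TFBS"
    if ref_hit == alt_hit:  # both scores identical (exact equality is an assumption)
        return "No Change"
    return "Score-Change"


def classify_all(ref_hits: dict, alt_hits: dict) -> dict:
    """Classify every site predicted for at least one allele.

    Both dicts map a hypothetical key, e.g. (matrix_id, position, strand),
    to a Hit.
    """
    keys = ref_hits.keys() | alt_hits.keys()
    return {k: classify(ref_hits.get(k), alt_hits.get(k)) for k in keys}
```

Pairing by matrix, position and strand is one reasonable way to decide that two predictions describe "the same" TFBS; a looser matching (e.g. tolerating small positional shifts) would change how many sites end up as Gain or Loss rather than Score-Change.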




Data sources

    agReg-SNPdb Plants was constructed using the following genome assembly versions. For all plants except rapeseed, the input data were downloaded from Ensembl.

    Plant                                              Assembly version                  Download links                 Downloaded on
    Hordeum vulgare (barley)                           MorexV3_pseudomolecules_assembly  Reference genome, SNPs, Genes  22 December 2021
    Solanum lycopersicum (tomato)                      SL3.0                             Reference genome, SNPs, Genes  22 December 2021
    Triticum aestivum (wheat)                          IWGSC                             Reference genome, SNPs, Genes  08 November 2021
    Zea mays (corn)                                    Zm-B73-REFERENCE-NAM-5.0          Reference genome, SNPs, Genes  08 November 2021
    Vitis vinifera (wine grape)                        12X                               Reference genome, SNPs, Genes  08 November 2021
    Helianthus annuus (sunflower)                      HanXRQr1.0                        Reference genome, SNPs, Genes  08 November 2021
    Oryza glaberrima (African rice)                    Oryza_glaberrima_V1               Reference genome, SNPs, Genes  08 November 2021
    Oryza sativa Indica (Asian rice Indica Group)      ASM465v1                          Reference genome, SNPs, Genes  22 December 2021
    Oryza sativa Japonica (Asian rice Japonica Group)  IRGSP-1.0                         Reference genome, SNPs, Genes  08 November 2021
    Oryza glumaepatula (wild rice)                     Oryza_glumaepatula_v1.5           Reference genome, SNPs, Genes  08 November 2021
    Sorghum bicolor (sorghum)                          Sorghum_bicolor_NCBIv3            Reference genome, SNPs, Genes  22 December 2021
    Triticum turgidum (durum wheat)                    Svevo.v1                          Reference genome, SNPs, Genes  08 November 2021
    Brassica napus (rapeseed)                          Brassica_napus_v4.1               Reference genome, SNPs, Genes
                                                                                         (reference genome: also see Chalhoub et al.)