
Georg-August-Universität Göttingen

Biological Background

Transcription factors (TFs) are regulatory proteins that control transcription, and thereby gene expression, by binding to DNA. TFs bind to specific sequence motifs in the DNA,
known as transcription factor binding sites (TFBSs), which are usually between 5 and 15 bp long. These TFBSs are enriched in promoter regions. Single nucleotide polymorphisms (SNPs), the exchange
of a base at a specific position in the genome, can strongly influence the gene expression level by changing the binding affinity of TFs to the sequence. A nucleotide substitution at a single position of a
TFBS can be sufficient to disrupt an existing TFBS or to create a new one. Such SNPs are referred to as regulatory SNPs (rSNPs). They have recently gained much attention in the life sciences because they can be
causal for specific traits or diseases.


Our workflow to identify rSNPs and their consequences on TF binding consists of four steps:

1. Extraction of the promoter region for each gene, covering -7.5 kb to +2.5 kb relative to the transcription start site.

2. Identification of the SNPs occurring within these promoter regions and extraction of their flanking sequences, defined as the 25 bp upstream and downstream of each SNP. These SNPs are defined as rSNPs.

3. Prediction of putative TFBSs in these flanking sequences for both the reference and the alternate allele of each SNP, using the TFBS prediction tool MATCH™.

4. Comparison of the predicted TFBSs between the reference and the alternate allele to determine the consequences of the SNP.
We separate the effect of an rSNP on the binding of a TF into four categories:
  • Gain of TFBS: The TFBS exists only for the 1 (alternate) allele of the SNP
  • Loss of TFBS: The TFBS exists only for the 0 (reference) allele of the SNP
  • Score-Change: The TFBS is predicted for both alleles, but the TF binding affinity differs (measured by the Core_Similarity_Score and Matrix_Similarity_Score calculated by MATCH™)
  • No Change: The TFBS is predicted for both alleles with the same TF binding affinity (measured by the Core_Similarity_Score and Matrix_Similarity_Score calculated by MATCH™)
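
The workflow steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the pipeline used to build the database: the function names, the `Hit` type, the 1-based inclusive coordinates, the strand handling and the clipping at the chromosome start are all assumptions made for the example. MATCH™ itself is not called here. The sketch assumes its predictions have already been parsed and paired between the two alleles (same matrix, position and strand), which is also an assumption.

```python
from typing import NamedTuple, Optional, Tuple

UPSTREAM = 7500    # bp upstream of the TSS (step 1)
DOWNSTREAM = 2500  # bp downstream of the TSS (step 1)
FLANK = 25         # bp on each side of the SNP (step 2)


def promoter_window(tss: int, strand: str) -> Tuple[int, int]:
    """Return (start, end) of the promoter region in forward-strand coordinates.

    Assumption: 1-based inclusive coordinates, clipped at position 1.
    """
    if strand == "+":
        start, end = tss - UPSTREAM, tss + DOWNSTREAM
    elif strand == "-":
        # On the reverse strand, "upstream" means higher coordinates.
        start, end = tss - DOWNSTREAM, tss + UPSTREAM
    else:
        raise ValueError("strand must be '+' or '-'")
    return max(1, start), end


def allele_sequences(chrom_seq: str, pos: int, ref: str, alt: str,
                     flank: int = FLANK) -> Tuple[str, str]:
    """Return the flanking sequence carrying the reference and the alternate allele.

    pos is the 1-based SNP position; flanks are shortened at sequence ends.
    """
    i = pos - 1
    if chrom_seq[i].upper() != ref.upper():
        raise ValueError("reference allele does not match the sequence")
    left = chrom_seq[max(0, i - flank):i]
    right = chrom_seq[i + 1:i + 1 + flank]
    return left + ref + right, left + alt + right


class Hit(NamedTuple):
    """Hypothetical container for one predicted TFBS and its two scores."""
    core_score: float
    matrix_score: float


def classify(ref_hit: Optional[Hit], alt_hit: Optional[Hit]) -> Optional[str]:
    """Assign one of the four categories; None if no TFBS is predicted at all."""
    if ref_hit is None and alt_hit is None:
        return None
    if ref_hit is None:
        return "Gain of TFBS"
    if alt_hit is None:
        return "Loss of TFBS"
    if ref_hit == alt_hit:
        return "No Change"
    return "Score-Change"
```

For example, `classify(None, Hit(1.0, 0.93))` yields "Gain of TFBS", because the site is predicted only for the alternate allele. The scores are compared exactly here; if they are read from rounded text output, a small tolerance may be more appropriate.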

Data sources

    agReg-SNPdb was constructed using the following genome assembly versions downloaded from Ensembl (release 103):

    Animal    Assembly version        Download links
    Cattle    ARS-UCD1.2              Reference genome (downloaded on 1 March 2021)
    Pig       Sscrofa11.1             Reference genome (downloaded on 9 March 2021)
    Chicken   GRCg6a                  Reference genome (downloaded on 25 February 2021)
    Sheep     Oar_rambouillet_v1.0    Reference genome (downloaded on 1 March 2021)
    Horse     EquCab3.0               Reference genome (downloaded on 1 March 2021)
    Goat      ARS1                    Reference genome (downloaded on 1 March 2021)
    Dog       CanFam3.1               Reference genome (downloaded on 8 March 2021)