Extreme PCR with a Twist !
Comparison of high-fidelity DNA polymerases for PCR amplification and cloning of a genomic DNA region containing the ORF of the human TWIST1 gene, whose GC content varies wildly from 95% to 12% over 2.1 kilobases.
Experimental dates range : 2018-05 to 2018-10
Published : September 22nd, 2019 by Simon Roy, Ph.D
- THIS IS STILL A DRAFT VERSION -
Extremely variable GC composition found in a 2.1 kb genomic DNA sequence coding for exon 1, an intron and exon 2 of the human TWIST1 gene
DNA sequences of extreme composition or repetition, such as sequences of high or low GC content or those containing repeated bases or repeated motifs (for example, those coding for alanine tracts), have been problematic for DNA analysis since its earliest days. Even for simple techniques such as PCR amplification, we have previously shown that not all DNA polymerases can successfully amplify difficult DNA sequences such as part of the human ARX gene.
With modern DNA analysis techniques such as Next-Generation Sequencing, the analysis and sequencing of such DNA sequences also proves challenging. Extreme DNA sequences most often result in very poor coverage or lead to biased and non-representative results. Frequently, they are absent from the sequencing analyses and/or have to be re-evaluated using more adaptable techniques such as high-fidelity PCR and Sanger sequencing.
Herein, we perform high-fidelity PCR amplification from human genomic DNA of different regions of the human TWIST1 gene that together span 2.1 kb. This includes the open reading frame for the TWIST1 protein, an additional exon and an intron. The local GC content of the DNA sequence varies from 95% to 12%, and the sequence contains multiple stretches of identical bases.
As previously demonstrated, not all commercially available high-fidelity DNA polymerases are able to accomplish PCR amplification of high GC DNA sequences. Therefore, we compare FastPfu FLY from TransGen Biotech and Q5 High-Fidelity DNA Polymerases in diverse PCR conditions.
Finally, we clone the entire 2.1 kb genomic region of TWIST1 from a purified PCR amplicon and validate the accuracy of FastPfu FLY by Sanger sequencing of two independent clones.
Note : special thank you to Dr Vincent Coljee who gave me the idea to target TWIST1 as a difficult-to-amplify PCR target!
Exon 1 has an average of 73% GC composition and contains a few extreme features
- 56 bp stretch with 88% GC : CCGTCCCCTCCCCCTCCCGCCTCCCTCCCCGCCTCCCCCGCGCGCCCTCCCCGCGG
- 40 bp stretch with 95% GC : CCGCGGGCCGCATCGCCCGGGCCGGCGCCGCGCGCGGGGG
- Within the ORF for TWIST1, a 127 bp stretch with 87% GC : GCGCGGGGGACGCAAGCGGCGCAGCAGCAGGCGCAGCGCGGGCGGCGGCGCGGGGCCCGGCGGAGCCGCGGGTGGGGGCGTCGGAGGCGGCGACGAGCCGGGCAGCCCGGCCCAGGGCAAGCGCGGC
- The previous is followed by a 60 bp segment with 92% GC : GCGGGCTGTGGCGGCGGCGGCGGCGCGGGCGGCGGCGGCGGCAGCAGCAGCGGCGGCGGG
Exon 2 has an average GC content of 34% and also contains extreme features
- Exon 2 starts with a 136 bp sequence with 32% GC
- Two short AT-rich segments, separated by only 11 bp, follow each other : TTGTTTGTGTTTTTTTTTTTTTTTTTTTT (29 bp, 10% GC) and TTTTTATTTTTATTTTTTT (19 bp, 0% GC)
- Near the end of exon 2, a 50 bp AT-rich sequence with only 12% GC is found : AAAACAAAAAAAAACTTAAAATACAAAAAACAACATTCTATTTATTTATT
Other key features
- At the end of the intron that spans from exon 1 to exon 2, a short but highly difficult DNA sequence is found: CCCCCCCCCCCC, for which the antisense strand could potentially form G-quadruplex structures and inhibit DNA polymerization.
- CCGTCCCCTCCCCCTCCCGCCTCCCTCCCCGCCTCCCCCGCGCGCCCTCCCCGCGG : 88% GC
- CCGCGGGCCGCATCGCCCGGGCCGGCGCCGCGCGCGGGGG : 95% GC
- GCGCGGGGGACGCAAGCGGCGCAGCAGCAGGCGCAGCGCGGGCGGCGGCGCGGGGCCCGGCGGAGCCGCGGGTGGGGGCGTCGGAGGCGGCGACGAGCCGGGCAGCCCGGCCCAGGGCAAGCGCGGC : 88% GC
- GCGGGCTGTGGCGGCGGCGGCGGCGCGGGCGGCGGCGGCGGCAGCAGCAGCGGCGGCGGG : 92% GC
- TTGTTTGTGTTTTTTTTTTTTTTTTTTTTGACGAAGAATGTTTTTATTTTTATTTTTTT : 14% GC
- AAAAACAAAAAAAAACTTAAAATACAAAAAACAACATTCTATTTATTTATT : 12% GC, found within a 22% GC hairpin : TTAATTCTTTTTTTCATCCTTCCTCTGAGGGGAAAAACAAAAAAAAACTTAAAATACAAAAAACAACATTCTATTTATTTATT
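The GC percentages quoted for these stretches can be re-derived with a few lines of Python. This is only a minimal sketch: the function name `gc_percent` and the dictionary labels are illustrative choices, and the sequences are copied from the exon 1 and exon 2 feature lists.

```python
def gc_percent(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


# Stretches copied from the exon 1 and exon 2 feature lists.
stretches = {
    "exon 1, 88% GC stretch": "CCGTCCCCTCCCCCTCCCGCCTCCCTCCCCGCCTCCCCCGCGCGCCCTCCCCGCGG",
    "exon 1, 95% GC stretch": "CCGCGGGCCGCATCGCCCGGGCCGGCGCCGCGCGCGGGGG",
    "exon 2, first AT-rich segment": "TTGTTTGTGTTTTTTTTTTTTTTTTTTTT",
    "exon 2, second AT-rich segment": "TTTTTATTTTTATTTTTTT",
    "exon 2, 12% GC stretch": "AAAACAAAAAAAAACTTAAAATACAAAAAACAACATTCTATTTATTTATT",
}

for name, seq in stretches.items():
    print(f"{name}: {len(seq)} bp, {gc_percent(seq):.1f}% GC")
```

The same function can be applied to a sliding window over the full 2.1 kb sequence to locate the extreme regions, although the window size used for the map in this article is not stated.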
Function of the human TWIST1 protein
TWIST1 (Twist-related protein 1) is also known as Class A basic helix-loop-helix protein 38 (bHLHa38) or H-twist. TWIST1 acts as a transcriptional regulator. It inhibits myogenesis by sequestering E proteins, inhibiting trans-activation by MEF2, and inhibiting DNA binding by MYOD1 through physical interaction. This interaction probably involves the basic domains of both proteins. TWIST1 also represses expression of proinflammatory cytokines such as TNFA and IL1B, and regulates cranial suture patterning and fusion. As a heterodimer with E proteins, TWIST1 acts as a transcription activator, and it regulates gene expression differentially depending on dimer composition: homodimers induce expression of FGFR2 and POSTN, while heterodimers repress FGFR2 and POSTN expression and induce THBS1 expression. Heterodimerization is also required for osteoblast differentiation. In addition, TWIST1 represses the activity of the circadian transcriptional activator, the NPAS2-ARNTL/BMAL1 heterodimer. Source: UniProt Q15672 (TWST1_HUMAN)
High-Fidelity PCR comparison
Date: May 2018
Experiment: Extremely high GC-content PCR performed on genomic DNA. Combined high and low GC content PCR.
DNA Polymerases tested:
- FastPfu FLY
- M0491 from NEB (Q5® High-Fidelity)
- KD Plus
Map of Chr 7 TWIST1 gDNA sequence
Annotated genomic map location of human TWIST1 corresponding to NC_000007 REGION: 19115177..19117803.
The map features the most extreme high and low GC content sections and other DNA structures.
The red arrow indicates the ORF coding for the human TWIST1 protein. This sequence is encoded entirely within exon 1.
NC_000007 REGION: 19115319 (primer F1.1) to 19117568 (primer R1.2) has an overall GC content of 55%. It is composed of at least 2 distinct regions with strongly contrasting GC contents.
PCR Target #1
TWIST1 open-reading frame within exon 1
- 610 bp PCR amplicon
- Average GC-content: 71%
- Contains 87% and 92% GC segments from exon 1.
F1.2 primer : AGATGATGCAGGACGTGTCC (20-mer; 55% GC; Tm : 59 °C)
R1.1 primer : TAGTGGGACGCGGACATG (18-mer; 61% GC; Tm : 57 °C)
PCR Target #2
TWIST1 exon 1
- 949 bp PCR amplicon
- Average GC-content: 73% (76-71)
- In addition to Target #1, it contains two additional GC-rich segments : a 56 bp stretch with 88% GC and a 40 bp stretch with 95% GC.
F1.1 primer : AGCCTCCAAGTCTGCAGCTCTC (22-mer; 59% GC; Tm : 62 °C)
R1.1 primer : TAGTGGGACGCGGACATG (18-mer; 61% GC; Tm : 57 °C)
PCR Target #3
TWIST1 ORF+intron+exon 2
- 1769 bp PCR amplicon
- Average GC-content: 56% (71-58-34)
- In addition to Target #1, the last third of the sequence contains long A or T stretches (e.g. TTTTTATTTTTATTTTTTT)
F1.2 primer : AGATGATGCAGGACGTGTCC (20-mer; 55% GC; Tm : 59 °C)
R1.2 primer : ACCGGATCTATTTGCATTTTACCATG (26-mer; 38% GC; Tm : 58 °C)
PCR Target #4
The 2.1 kb, 95-12% GC PCR
- 2108 bp PCR amplicon
- Average GC-content: 59% (76-71-58-34)
- Contains all the difficulties from previous target amplicons 1, 2 and 3.
F1.1 primer : AGCCTCCAAGTCTGCAGCTCTC (22-mer; 59% GC; Tm : 62 °C)
R1.2 primer : ACCGGATCTATTTGCATTTTACCATG (26-mer; 38% GC; Tm : 58 °C)
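The length and GC content listed for each primer are easy to double-check. Here is a small sketch that does so for the four primers used in this article. The helper names are illustrative only. The Tm values are deliberately not recomputed, because the Tm method behind the numbers quoted here is not stated and different formulas give different values.

```python
# Primer sequences as listed for Targets #1 to #4.
primers = {
    "F1.1": "AGCCTCCAAGTCTGCAGCTCTC",
    "F1.2": "AGATGATGCAGGACGTGTCC",
    "R1.1": "TAGTGGGACGCGGACATG",
    "R1.2": "ACCGGATCTATTTGCATTTTACCATG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def primer_stats(seq: str) -> tuple[int, int]:
    """Return (length in nt, GC content rounded to the nearest percent)."""
    gc = seq.count("G") + seq.count("C")
    return len(seq), round(100 * gc / len(seq))


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the primer's site as read on the opposite strand."""
    return seq.translate(_COMPLEMENT)[::-1]


for name, seq in primers.items():
    length, gc = primer_stats(seq)
    print(f"{name}: {length}-mer, {gc}% GC")

# Reverse primers anneal to the top strand, so it is their reverse
# complement that appears when reading the reference sequence 5' to 3'.
print("R1.1 site on the reference strand:", reverse_complement(primers["R1.1"]))
```

Running this should reproduce the "22-mer; 59% GC" style annotations given next to each primer.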
Map of TWIST1 showing the expected PCR amplicons using different primer combinations
- Target #1 (Green) : F1.2 + R1.1 -> 610 bp
- Target #2 (Orange) : F1.1 + R1.1 -> 949 bp
- Target #3 (Red) : F1.2 + R1.2 -> 1769 bp
- Target #4 (Crimson) : F1.1 + R1.2 -> 2108 bp
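The four amplicon sizes are mutually consistent, which can be verified with simple arithmetic. The sketch below derives the F1.1-to-F1.2 and R1.1-to-R1.2 offsets from Targets #1 to #3 and uses them to predict Target #4. It also checks the 73% average GC quoted for Target #2, under the assumption (not stated explicitly in this article) that the "(76-71)" annotation means 76% GC for the extra upstream part and 71% GC for the Target #1 part.

```python
# Expected amplicon sizes (bp) for each primer pair, as listed above.
sizes = {
    ("F1.2", "R1.1"): 610,   # Target #1
    ("F1.1", "R1.1"): 949,   # Target #2
    ("F1.2", "R1.2"): 1769,  # Target #3
    ("F1.1", "R1.2"): 2108,  # Target #4
}

# Extra sequence gained by moving the forward primer from F1.2 to F1.1.
upstream_bp = sizes[("F1.1", "R1.1")] - sizes[("F1.2", "R1.1")]
# Extra sequence gained by moving the reverse primer from R1.1 to R1.2.
downstream_bp = sizes[("F1.2", "R1.2")] - sizes[("F1.2", "R1.1")]

predicted_target4 = sizes[("F1.2", "R1.1")] + upstream_bp + downstream_bp
print(f"upstream: {upstream_bp} bp, downstream: {downstream_bp} bp")
print(f"predicted Target #4: {predicted_target4} bp")

# Length-weighted GC average for Target #2.
# Assumption: "(76-71)" = 76% GC upstream part, 71% GC Target #1 part.
target2_gc = (upstream_bp * 76 + sizes[("F1.2", "R1.1")] * 71) / sizes[("F1.1", "R1.1")]
print(f"Target #2 weighted GC: {target2_gc:.1f}%")
```

The same length-weighted average could be applied to Targets #3 and #4, but the individual intron and exon 2 lengths within those amplicons are not given here.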
Agarose gel electrophoresis simulation of the expected PCR amplicons for TWIST1
- Target #1 (Green) : 610 bp
- Target #2 (Orange) : 949 bp
- Target #3 (Red) : 1769 bp
- Target #4 (Crimson) : 2108 bp
- Trans5K DNA Ladder
PCR Target #1 : TWIST1 open-reading frame within exon 1
- 610 bp PCR amplicon
- Average GC-content: 71%
- Contains 87% and 92% GC segments from exon 1.
F1.2 primer : AGATGATGCAGGACGTGTCC (20-mer; 55% GC; Tm : 59 °C)
R1.1 primer : TAGTGGGACGCGGACATG (18-mer; 61% GC; Tm : 57 °C)
TWIST1 target #1 - 610 bp | 71% GC
I have to ad………….
Here’s the recipe that worked best for the ‘twisted’ PCR
PCR Setup (on ice)
5x buffer: 5 ul
dNTPs (2.5 mM) : 1.2 ul
F primer (10 uM) : 0.3 ul (3 pmol)
R primer (10 uM) : 0.6 ul (6 pmol)
gDNA (5 ng/ul) : 1 ul
FastPfu FLY : 0.3 ul
PCR Cycling
Initial denaturation : 95°C for 120 s
35 cycles
95°C for 20 s
58°C for 20 s
70°C for 60 s
Final extension : 70°C for 120 s
PCR Target #2 : TWIST1 exon 1
- 949 bp PCR amplicon
- Average GC-content: 73% (76-71)
- In addition to Target #1, it contains two additional GC-rich segments : a 56 bp stretch with 88% GC and a 40 bp stretch with 95% GC.
F1.1 primer : AGCCTCCAAGTCTGCAGCTCTC (22-mer; 59% GC; Tm : 62 °C)
R1.1 primer : TAGTGGGACGCGGACATG (18-mer; 61% GC; Tm : 57 °C)
TWIST1 target #2 - 949 bp
I have to ad………….
Here’s the recipe that worked best for the ‘twisted’ PCR
PCR Setup (on ice)
5x buffer: 5 ul
dNTPs (2.5 mM) : 1.2 ul
F primer (10 uM) : 0.3 ul (3 pmol)
R primer (10 uM) : 0.6 ul (6 pmol)
gDNA (5 ng/ul) : 1 ul
FastPfu FLY : 0.3 ul
PCR Cycling
Initial denaturation : 95°C for 120 s
35 cycles
95°C for 20 s
58°C for 20 s
70°C for 60 s
Final extension : 70°C for 120 s
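For planning purposes, the programmed run time of this cycling profile can be estimated as follows. This is a rough sketch: it only sums the hold times listed in the recipe and ignores ramping between temperatures, which depends on the thermocycler. The last lines compute the minimum average extension rate the polymerase would need to finish the longest amplicon in this article (2108 bp) within the 60 s extension step.

```python
# Hold times (seconds) taken from the cycling programme above.
INITIAL_DENATURATION_S = 120
CYCLES = 35
CYCLE_STEPS_S = {
    "denaturation 95C": 20,
    "annealing 58C": 20,
    "extension 70C": 60,
}
FINAL_EXTENSION_S = 120


def programmed_time_s(initial: int, cycles: int, steps: dict, final: int) -> int:
    """Sum of all programmed holds, excluding ramp times."""
    return initial + cycles * sum(steps.values()) + final


total_s = programmed_time_s(
    INITIAL_DENATURATION_S, CYCLES, CYCLE_STEPS_S, FINAL_EXTENSION_S
)
print(f"{total_s} s = {total_s / 60:.1f} min of programmed holds")

# Minimum average extension rate needed for the 2108 bp amplicon (Target #4).
LONGEST_AMPLICON_BP = 2108
min_rate = LONGEST_AMPLICON_BP / CYCLE_STEPS_S["extension 70C"]
print(f"minimum average extension rate: {min_rate:.0f} bp/s")
```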
PCR Target #3 : TWIST1 ORF+intron+exon 2
- 1769 bp PCR amplicon
- Average GC-content: 56% (71-58-34)
- In addition to Target #1, the last third of the sequence contains long A or T stretches (e.g. TTTTTATTTTTATTTTTTT)
F1.2 primer : AGATGATGCAGGACGTGTCC (20-mer; 55% GC; Tm : 59 °C)
R1.2 primer : ACCGGATCTATTTGCATTTTACCATG (26-mer; 38% GC; Tm : 58 °C)
Human TWIST1 1769 bp PCR
to be completed
Here’s the recipe that worked best for the ‘twisted’ PCR
PCR Setup (on ice)
5x buffer: 5 ul
dNTPs (2.5 mM) : 1.2 ul
F primer (10 uM) : 0.3 ul (3 pmol)
R primer (10 uM) : 0.6 ul (6 pmol)
gDNA (5 ng/ul) : 1 ul
FastPfu FLY : 0.3 ul
PCR Cycling
Initial denaturation : 95°C for 120 s
35 cycles
95°C for 20 s
58°C for 20 s
70°C for 60 s
Final extension : 70°C for 120 s
PCR Target #4 : 2.1 kb 95-12% GC
- 2108 bp PCR amplicon
- Average GC-content: 59% (76-71-58-34)
- Contains all the difficulties from previous target amplicons 1, 2 and 3.
F1.1 primer : AGCCTCCAAGTCTGCAGCTCTC (22-mer; 59% GC; Tm : 62 °C)
R1.2 primer : ACCGGATCTATTTGCATTTTACCATG (26-mer; 38% GC; Tm : 58 °C)
Human TWIST1 2.1 kb PCR
I have to admit that this PCR is very (very very very) difficult to achieve. The difficulty lies in the extreme GC-content variation of the 2.1 kb human TWIST1 target, which includes exon 1, the intron and exon 2 as present in genomic DNA.
Here’s the recipe that worked best for the ‘twisted’ PCR
PCR Setup (on ice)
5x buffer: 5 ul
dNTPs (2.5 mM) : 1.2 ul
F primer (10 uM) : 0.3 ul (3 pmol)
R primer (10 uM) : 0.6 ul (6 pmol)
gDNA (5 ng/ul) : 1 ul
FastPfu FLY : 0.3 ul
PCR Cycling
Initial denaturation : 95°C for 120 s
35 cycles
95°C for 20 s
58°C for 20 s
70°C for 60 s
Final extension : 70°C for 120 s
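The recipe does not list the water volume or the final reaction volume. Since 5 ul of a 5x buffer is used, a 25 ul reaction is the natural assumption, and the sketch below works from that assumption to derive the water volume and the final concentrations. The function names and the 25 ul figure are not part of the original recipe.

```python
# Assumption: 5 ul of 5x buffer implies a 25 ul final reaction volume.
FINAL_VOLUME_UL = 25.0

# Volumes (ul) as listed in the PCR setup above.
recipe_ul = {
    "5x buffer": 5.0,
    "dNTPs (2.5 mM)": 1.2,
    "F primer (10 uM)": 0.3,
    "R primer (10 uM)": 0.6,
    "gDNA (5 ng/ul)": 1.0,
    "FastPfu FLY": 0.3,
}


def water_volume(recipe: dict, final_volume_ul: float) -> float:
    """Water needed to bring the listed components up to the final volume."""
    return final_volume_ul - sum(recipe.values())


def final_concentration(stock: float, volume_ul: float, final_volume_ul: float) -> float:
    """Dilute a stock concentration; the result keeps the stock's units."""
    return stock * volume_ul / final_volume_ul


print(f"water: {water_volume(recipe_ul, FINAL_VOLUME_UL):.1f} ul")
print(f"F primer: {final_concentration(10, 0.3, FINAL_VOLUME_UL):.2f} uM")
print(f"R primer: {final_concentration(10, 0.6, FINAL_VOLUME_UL):.2f} uM")
print(f"dNTPs: {final_concentration(2.5, 1.2, FINAL_VOLUME_UL):.2f} mM")
print(f"gDNA: {5 * 1.0:.0f} ng per reaction")
```

Note that the reverse primer is used at twice the amount of the forward primer (6 pmol versus 3 pmol), so the two final primer concentrations differ by design.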
Sequencing Results for human TWIST1 2.1 kb genomic region
Two individual clones were sequenced at the Plateforme de Génomique du CHUL in Québec City.
The results indicate minimal differences compared to the reference sequence NG_008114.2 (Chr 7).
Only 2 differences from the reference TWIST1 sequence are observed in either clone SR-1110 or SR-1115. These differences could be attributable to DNA polymerase slippage, since they are located in extremely repetitive regions. Alternatively, they may be due to single-nucleotide polymorphisms (SNPs) in the (SR) DNA sample.
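To judge whether a sequencing difference falls in a slippage-prone region, it helps to list the homopolymer runs of a sequence and compare their positions with the position of the difference. The sketch below is one possible way to do this. The function name and the default threshold of 8 identical bases are arbitrary illustrative choices, and the example inputs are the poly-C tract and the AT-rich exon 2 segment quoted earlier in this article, not the actual clone reads.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 8):
    """Yield (start, base, length) for each run of a single base >= min_len."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One base followed by at least (min_len - 1) repeats of that same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    for match in pattern.finditer(seq.upper()):
        yield match.start(), match.group(1), len(match.group(0))


examples = {
    "poly-C tract at the end of the intron": "CCCCCCCCCCCC",
    "AT-rich exon 2 segment": (
        "TTGTTTGTGTTTTTTTTTTTTTTTTTTTTGACGAAGAATGTTTTTATTTTTATTTTTTT"
    ),
}

for name, seq in examples.items():
    for start, base, length in homopolymer_runs(seq):
        print(f"{name}: {length} x {base} starting at position {start + 1}")
```

A difference located inside or at the edge of such a run is more likely to reflect polymerase slippage (or a sequencing artifact) than one in a complex region, but this alone cannot distinguish slippage from a genuine polymorphism.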
Altogether, the sequencing results of the cloned TWIST1 2.1 kb genomic region prove without a doubt that FastPfu FLY DNA Polymerase is very well suited for ultra high-fidelity PCR, even with extreme DNA targets such as this.