Experimental design

Design of the Ancestry project. Part I

One important benefit of having the sequence of the human genome is the amount of valuable information we can extract from it. We now know that a single DNA base at a specific location can affect the probability of developing a disease, or can help pin down the geographical origin of our ancestors. Our current project focuses on the second application.

The variability observed at these specific DNA bases (called SNPs, for Single Nucleotide Polymorphisms) is in some cases specific to particular human populations. For example, suppose a gene X has two different alleles, which correspond to two versions of the gene:

Allele A: …CCTGCGATCGATGCTAGCTAG…..TGATCTGATCGTAATGCTGA…

Allele B: …CCTGCGATCGATGCTACCTAG…..TGATCTGATCGTAATGCTGA…

 

For a region of gene X, we see that one base of the DNA (a SNP) differs between alleles A and B. In fact, several hundred SNPs have been linked to specific human populations and can therefore be used to infer the ancestry of a person. For example, allele A might be found only in European and Asian people, whereas allele B is found only in American and African people.
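As a minimal sketch, the single differing base can be located by comparing the two allele fragments position by position. Only the left-hand fragments shown above are used, since the middle of the sequence is elided; the function name `find_differences` is a hypothetical helper introduced for illustration.

```python
# Left-hand fragments of the two alleles, copied from the listing above.
# The elided middle of the gene is not modeled here.
ALLELE_A = "CCTGCGATCGATGCTAGCTAG"
ALLELE_B = "CCTGCGATCGATGCTACCTAG"


def find_differences(seq_a, seq_b):
    """Return (index, base_a, base_b) for every position where the sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length to be compared")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


snps = find_differences(ALLELE_A, ALLELE_B)
print(snps)  # [(16, 'G', 'C')]
```

The index is 0-based, so the SNP is the 17th base of the fragment: G in allele A and C in allele B.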

In our case, we use PCR to determine which DNA base is present at specific SNPs. The idea of PCR is to amplify one DNA sequence exclusively, producing enough molecules to be seen on a gel. To amplify a sequence specifically, we use DNA primers, which are short sequences of DNA that match a specific position in the genome.

Since every human has two sets of chromosomes, and therefore two copies of each gene, a person will be AA, BB, or AB. In this example, these genotypes correspond to two European/Asian parents, two American/African parents, or one parent of each.

To test this, we need to design three primers, which correspond to the underlined sequences. It has been shown that the last nucleotide of a primer (its 3’ end) must be complementary to the genomic DNA for the PCR to be productive.

Primer 1a: 5’-CTGCGATCGATGCTAG-3’

Primer 1b: 5’-gtattatctttgataataatcCTGCGATCGATGCTAC-3’

Primer 2: 5’-CAGCATTACGATCAG-3’
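The three primers can be checked against the allele fragments with a short script. This is a simplified sketch: it treats a primer as usable only when its uppercase part matches the template perfectly, which is an assumption standing in for the 3’-end rule and not a model of real annealing. The lowercase bases of primer 1b are treated as a non-annealing 5’ tail, and all helper names are hypothetical.

```python
# Fragments copied from the allele listing; the elided middle is not modeled.
ALLELE_LEFT = {
    "A": "CCTGCGATCGATGCTAGCTAG",
    "B": "CCTGCGATCGATGCTACCTAG",
}
SHARED_RIGHT = "TGATCTGATCGTAATGCTGA"  # identical in both alleles

PRIMERS = {
    "1a": "CTGCGATCGATGCTAG",
    "1b": "gtattatctttgataataatcCTGCGATCGATGCTAC",
    "2": "CAGCATTACGATCAG",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def annealing_part(primer):
    """Drop the lowercase 5' tail and keep only the template-matching bases."""
    return "".join(base for base in primer if base.isupper())


def matches_forward(primer, template):
    """True if the primer (without tail) occurs exactly in the template strand."""
    return annealing_part(primer) in template


def matches_reverse(primer, template):
    """True if the primer anneals to the opposite strand of the template."""
    return reverse_complement(annealing_part(primer)) in template


for name in ("1a", "1b"):
    for allele, fragment in ALLELE_LEFT.items():
        print(name, "on allele", allele, "->", matches_forward(PRIMERS[name], fragment))

print("2 on shared region ->", matches_reverse(PRIMERS["2"], SHARED_RIGHT))
```

Under this perfect-match assumption, primer 1a matches only allele A, primer 1b matches only allele B, and primer 2 matches the shared region on the opposite strand.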

 

Primer 2 is the same for both reactions since there is no mutation in this region. Primer 1a is specific to allele A, whereas primer 1b recognizes only allele B. For primer 1b we also add a tail at the 5’ end (in lowercase). The tail does not change the efficiency of the PCR, but it produces a larger final product that can be differentiated from the allele A product on the gel. This size difference is also what lets us determine whether someone is heterozygous (AB) rather than homozygous (AA or BB).
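The size shift and the reading of the gel can be sketched as follows. Since the middle of the amplified region is elided above, the absolute product sizes are unknown; only the difference contributed by the tail is computed. The function `call_genotype` and its two boolean inputs (whether the shorter and the longer band are visible) are hypothetical names used for illustration.

```python
PRIMER_1A = "CTGCGATCGATGCTAG"
PRIMER_1B = "gtattatctttgataataatcCTGCGATCGATGCTAC"


def tail_length(primer):
    """Number of lowercase (non-annealing 5' tail) bases in a primer."""
    return sum(1 for base in primer if base.islower())


# Both primers anneal at the same position, so the allele B product is
# longer than the allele A product by the length of the tail alone.
size_shift = tail_length(PRIMER_1B) - tail_length(PRIMER_1A)
print(size_shift)  # 21


def call_genotype(short_band, long_band):
    """Translate the presence of the two bands into a genotype.

    short_band: product from primer 1a (allele A) is visible.
    long_band:  product from tailed primer 1b (allele B) is visible.
    """
    if short_band and long_band:
        return "AB"
    if short_band:
        return "AA"
    if long_band:
        return "BB"
    return "no call"  # no product at all, e.g. a failed reaction


print(call_genotype(True, True))  # AB
```

A 21-base difference is what separates the two bands; whether that is resolvable in practice depends on the total product size and the gel used, which this sketch does not address.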

Figure: SNPs_PCR Example

Our idea is then to multiplex this method by checking four SNPs in a single reaction. If the SNPs are close to each other, we can design four primers (1a, 1b, 2a, and 2b), with tails of different sizes on three of them to make the gel easier to read. If the SNPs are too far apart, the approach will be similar to the one explained above.