


In DNA, information is encoded as a sequence of four types of building blocks called nucleotides. The most common technique for determining such sequences, the Sanger method, outputs a single consensus for a pool of DNA molecules in the analyzed sample. Yet samples from organisms with two sets of chromosomes generally contain two types of DNA molecules (alleles), each derived from one parent. When these are identical, each site in the output contains a single nucleotide call. If, due to insertion or deletion (indel) mutations, one allele contains extra nucleotides, most sites in the sequencing output beyond the mutation site will contain pairs of nucleotide calls. While signaling the presence of a potentially important mutation, such output cannot be read directly and often gets discarded. Here we describe an algorithmic method that accurately reconstructs the pair of allelic sequences from the observed complex pattern of calls. Unlike most existing computational approaches to the problem, our method does not require knowledge of one of the involved sequences to use as a reference, nor any other additional information. It is available as a free Web application, Indelligent, at.
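To make the origin of paired calls concrete, the following minimal sketch superimposes two allelic reads site by site and writes each site as a standard IUPAC ambiguity code. The function name `superimpose`, the example sequence, and the 3-base deletion are illustrative assumptions, not part of the published method or of Indelligent.

```python
# IUPAC one-letter codes for a single nucleotide or an unordered pair of nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def superimpose(allele1, allele2):
    """Return the per-site calls produced when two allelic reads overlap.

    Hypothetical helper: sites are compared only over the length of the
    shorter read, and peak heights and base-calling noise are ignored.
    """
    return "".join(IUPAC[frozenset(pair)] for pair in zip(allele1, allele2))


reference = "GATTACAGGCTTAACG"            # made-up allele, for illustration only
deleted = reference[:5] + reference[8:]   # same allele with a 3-base deletion

print(superimpose(reference, reference))  # identical alleles: GATTACAGGCTTAACG
print(superimpose(reference, deleted))    # heterozygous deletion: GATTASMKKMWYR
```

In the second output, every site from the deletion onward is a two-base code, which is the pattern that cannot be read directly without decoding.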

Direct Sanger sequencing of a diploid template containing a heterozygous insertion or deletion results in a difficult-to-interpret mixed trace formed by two allelic traces superimposed onto each other. Existing computational methods for deconvolution of such traces require knowledge of a reference sequence or the availability of both direct and reverse mixed sequences of the same template. We describe a simple yet accurate method, which uses dynamic programming optimization to predict superimposed allelic sequences solely from a string of letters representing peaks within an individual mixed trace. We used the method to decode 104 human traces (mean length 294 bp) containing heterozygous indels of 5 to 30 bp, with a mean of 99.1% of bases per allelic sequence reconstructed correctly and unambiguously. Simulations with artificial sequences have demonstrated that the method yields accurate reconstructions when (1) the allelic sequences forming the mixed trace are sufficiently similar, (2) the analyzed fragment is significantly longer than the indel, and (3) multiple indels, if present, are well-spaced. Because these conditions occur in most encountered DNA sequences, the method is widely applicable.
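The sketch below illustrates one constraint that a string of peak letters carries on its own: if the two alleles are offset by d bases, the base that one allele shows at site i reappears in the other allele at site i + d, so those two sites must share a nucleotide. This is a brute-force scan over candidate shifts under the assumption of a single indel. It is not the dynamic programming optimization described above, and the names `shift_consistency` and `guess_shift`, the `max_shift` parameter, and the example read are hypothetical.

```python
# Nucleotides represented by each IUPAC code for one or two superimposed bases.
BASES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def shift_consistency(mixed, start, shift):
    """Fraction of site pairs (i, i + shift), with i >= start, that share a base."""
    pairs = [(i, i + shift) for i in range(start, len(mixed) - shift)]
    if not pairs:
        return 0.0
    shared = sum(1 for i, j in pairs if BASES[mixed[i]] & BASES[mixed[j]])
    return shared / len(pairs)


def guess_shift(mixed, max_shift):
    """Guess the indel length as the most self-consistent shift (0 if no double peaks).

    Assumption: the shifted region starts at the first two-base site.
    """
    start = next((i for i, code in enumerate(mixed) if len(BASES[code]) == 2), None)
    if start is None:
        return 0
    scores = {d: shift_consistency(mixed, start, d) for d in range(1, max_shift + 1)}
    # max() returns the first best key, so ties resolve to the smallest shift.
    return max(scores, key=scores.get)


# Made-up mixed read whose alleles differ by a 3-base deletion starting at site 5.
print(guess_shift("GATTASMKKMWYR", max_shift=5))  # 3
```

On a read this short, a shift of 4 is also fully consistent and is rejected only by the tie-break. Short fragments leave the shift underdetermined in this naive scan, which is consistent with the condition that the analyzed fragment be significantly longer than the indel.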
