# Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences

@article{Hauser2012DistributionOA, title={Distribution of Aligned Letter Pairs in Optimal Alignments of Random Sequences}, author={R. Hauser and Heinrich Matzinger}, journal={arXiv: Probability}, year={2012} }

Considering the optimal alignment of two i.i.d. random sequences of length $n$, we show that when the scoring function is chosen randomly, almost surely the empirical distribution of aligned letter pairs in all optimal alignments converges to a unique limiting distribution as $n$ tends to infinity. This result is interesting because it helps understanding the microscopic path structure of a special type of last passage percolation problem with correlated weights, an area of long-standing open… Expand

#### One Citation

An Upper Bound on the Convergence Rate of a Second Functional in Optimal Sequence Alignment

- Mathematics
- 2014

Consider finite sequences $X_{[1,n]}=X_1\dots X_n$ and $Y_{[1,n]}=Y_1\dots Y_n$ of length $n$, consisting of i.i.d.\ samples of random letters from a finite alphabet, and let $S$ and $T$ be chosen… Expand

#### References

SHOWING 1-10 OF 22 REFERENCES

A Monte Carlo Approach to the Fluctuation Problem in Optimal Alignments of Random Strings

- Mathematics
- 2012

The problem of determining the correct order of fluctuation of the optimal alignment score of two random strings of length $n$ has been open for several decades. It is known that the biased expected… Expand

Estimating statistical significance of sequence alignments.

- Mathematics, Medicine
- Philosophical transactions of the Royal Society of London. Series B, Biological sciences
- 1994

The Azuma-Hoeffding inequality gives an upper bound on the probability of large deviations of the score from its mean in the linear case and Poisson approximation can be applied in the logarithmic case. Expand

Detecting the homology of DNA-sequences based on the variety of optimal alignments: a case study

- Mathematics, Biology
- 2012

The method is based on the observation that for many scor-ing schemes, especially for the longest common subsequence (LCS) scoring, the optimal alignments are the more diﬀerent the unrelated are the sequences, which gives the idea that the distance between extremal alignments could provide important information about the similarity of the sequences. Expand

Approximation to the mean curve in the LCS problem

- Mathematics
- 2008

The problem of sequence comparison via optimal alignments occurs naturally in many areas of applications. The simplest such technique is based on evaluating a score given by the length of a longest… Expand

Extensive simulations for longest common subsequences

- 1999

Abstract:Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Using… Expand

Longest common subsequences of two random sequences.

- Mathematics
- 1975

Given two random k-ary sequences of length n, what is f(n,k), the expected length of their longest common subsequence? This problem arises in the study of molecular evolution. We calculate f(n,k) for… Expand

The Rate of Convergence of the Mean Length of the Longest Common Subsequence

- Mathematics
- 1994

Given two i.i.d. sequences of n letters from a finite alphabet, one can consider the length Ln of the longest sequence which is a subsequence of both the given sequences. It is known that ELn grows… Expand

Extensive simulations for longest common subsequences . Finite size scaling, a cavity solution, and configuration space properties

- Mathematics, Physics
- 1998

Given two strings X and Y of N and M characters respectively, the Longest Common Subsequence (LCS) Problem asks for the longest sequence of (non-contiguous) matches between X and Y. Using extensive… Expand

Standard deviation of the longest common subsequence

- Mathematics
- 2009

Let L n be the length of the longest common subsequence of two independent i.i.d. sequences of Bernoulli variables of length n. We prove that the order of the standard deviation of L n is Vn,… Expand

Large deviation based upper bounds for the LCS-problem

- Mathematics
- 2003

We analyse and apply a large deviation and Montecarlo simulation based method for the computation of improved upper bounds on the Chvatal-Sankoff constant for i.i.d. random sequences over a finite… Expand