DNA Sequence Explorer

Align DNA sequences using global and local alignment algorithms, translate codons to amino acids, introduce mutations to study their effects, and build phylogenetic trees to visualize evolutionary relationships.

Alignment View

Seq 1  1  ATGCTAGCATCGATCG
          ||||×|||×|||||||
Seq 2  1  ATGCAAGCTTCGATCG
Legend: | = match, × = mismatch (this alignment contains no gaps)

Controls

Seq 1 length: 16 bp
Seq 2 length: 16 bp
Match Score: 1
Mismatch Score: -1
Gap Penalty: -2

Results

Score: 12
Percent Identity: 87.50%
Alignment Length: 16 bp
GC Content (Seq 1): 50.00%
GC Content (Seq 2): 50.00%
Jukes-Cantor Distance: 0.1367

Scoring Matrix Heatmap

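Rows: Seq 1. Columns: Seq 2. "-" marks the initial gap row and column.

     -   A   T   G   C   A   A   G   C   T   T   C   G   A   T   C   G
-    0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22 -24 -26 -28 -30 -32
A   -2   1  -1  -3  -5  -7  -9 -11 -13 -15 -17 -19 -21 -23 -25 -27 -29
T   -4  -1   2   0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22 -24 -26
G   -6  -3   0   3   1  -1  -3  -5  -7  -9 -11 -13 -15 -17 -19 -21 -23
C   -8  -5  -2   1   4   2   0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20
T  -10  -7  -4  -1   2   3   1  -1  -3  -3  -5  -7  -9 -11 -13 -15 -17
A  -12  -9  -6  -3   0   3   4   2   0  -2  -4  -6  -8  -8 -10 -12 -14
G  -14 -11  -8  -5  -2   1   2   5   3   1  -1  -3  -5  -7  -9 -11 -11
C  -16 -13 -10  -7  -4  -1   0   3   6   4   2   0  -2  -4  -6  -8 -10
A  -18 -15 -12  -9  -6  -3   0   1   4   5   3   1  -1  -1  -3  -5  -7
T  -20 -17 -14 -11  -8  -5  -2  -1   2   5   6   4   2   0   0  -2  -4
C  -22 -19 -16 -13 -10  -7  -4  -3   0   3   4   7   5   3   1   1  -1
G  -24 -21 -18 -15 -12  -9  -6  -3  -2   1   2   5   8   6   4   2   2
A  -26 -23 -20 -17 -14 -11  -8  -5  -4  -1   0   3   6   9   7   5   3
T  -28 -25 -22 -19 -16 -13 -10  -7  -6  -3   0   1   4   7  10   8   6
C  -30 -27 -24 -21 -18 -15 -12  -9  -6  -5  -2   1   2   5   8  11   9
G  -32 -29 -26 -23 -20 -17 -14 -11  -8  -7  -4  -1   2   3   6   9  12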

Reference Guide

Needleman-Wunsch Scoring

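Global alignment finds the best end-to-end alignment of two sequences using dynamic programming. The score at each cell is the best of three possibilities: aligning the two current bases (a match or mismatch), inserting a gap in the second sequence, or inserting a gap in the first.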

Recurrence relation
F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases}

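Where s(x_i, y_j) is the match or mismatch score and d is the gap penalty (a negative value, -2 in the controls above, so adding it lowers the score). The first row and column are initialized to multiples of d. Traceback from F(m,n), where m and n are the sequence lengths, recovers the optimal alignment.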
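The recurrence can be sketched in Python as follows. This is a minimal illustration, not the tool's own code: the function name `needleman_wunsch` and its parameter names are assumptions, the defaults mirror the scores shown in the Controls panel, and ties in the traceback are broken in favor of the diagonal move.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment sketch. Returns (score, aligned_x, aligned_y)."""
    m, n = len(x), len(y)
    # F[i][j] = best score aligning x[:i] with y[:j]
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = i * gap          # x[:i] aligned against gaps only
    for j in range(1, n + 1):
        F[0][j] = j * gap          # y[:j] aligned against gaps only

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # align x_i with y_j
                          F[i - 1][j] + gap,     # gap in y
                          F[i][j - 1] + gap)     # gap in x

    # Traceback from F[m][n], preferring the diagonal move on ties.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if x[i - 1] == y[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                ax.append(x[i - 1]); ay.append(y[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-")
            i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


score, a1, a2 = needleman_wunsch("ATGCTAGCATCGATCG", "ATGCAAGCTTCGATCG")
print(score)  # 12, the score shown in the Results panel
print(a1)
print(a2)
```

For the two 16 bp sequences in the Alignment View, the best path stays on the diagonal, so the alignment has 14 matches, 2 mismatches, and no gaps: 14 × 1 + 2 × (-1) = 12.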

The Genetic Code

The standard genetic code maps the 64 possible three-nucleotide codons to 20 amino acids and a stop signal: 61 codons encode amino acids and 3 are stop codons.

Key codons
ATG = Met (M) - Start codon
TAA, TAG, TGA = Stop (*)
TTT, TTC = Phe (F)
AAA, AAG = Lys (K)

The code is degenerate: most amino acids are encoded by 2-6 different codons. Third-position changes are often "silent" (synonymous).
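A compact way to illustrate the mapping is to build the codon table from the standard genetic code written in T, C, A, G base order. The sketch below is an illustration only: the names `CODON_TABLE` and `translate` are assumptions, translation starts at the first base (reading frame 1), trailing bases that do not fill a codon are ignored, and characters other than A, C, G, T raise a `KeyError`.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; incomplete trailing codons are dropped."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("ATGCTAGCATCGATCG"))  # MLASI
print(translate("ATGCAAGCTTCGATCG"))  # MQASI
```

Applied to the two sequences in the Alignment View, the mismatch at position 5 changes CTA (Leu) to CAA (Gln), while the mismatch at position 9 changes GCA to GCT, both of which encode Ala. The second change is an example of a silent third-position substitution.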

Jukes-Cantor Distance

The Jukes-Cantor model corrects for multiple substitutions at the same site. Raw percent difference underestimates true evolutionary distance because repeated or reversed substitutions at a site are observed as at most one difference.

Distance formula
d = -\frac{3}{4} \ln\left(1 - \frac{4p}{3}\right)

Where p is the observed proportion of different sites. The formula becomes undefined (saturated) when p reaches 0.75.
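As a worked example, the formula can be evaluated directly. The sketch below uses hypothetical helper names (`p_distance`, `jukes_cantor`) and assumes two aligned sequences of equal length without gaps; it raises an error at or beyond the saturation point rather than returning an undefined value.

```python
import math


def p_distance(a, b):
    """Observed proportion of differing sites between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 (saturation at 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ATGCTAGCATCGATCG", "ATGCAAGCTTCGATCG")
print(p)                           # 0.125 (2 differences over 16 sites)
print(round(jukes_cantor(p), 4))   # 0.1367
```

With 2 differing sites out of 16, p = 0.125 and the corrected distance is about 0.1367, slightly larger than the raw proportion, which agrees with the value in the Results panel.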

UPGMA Clustering

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) builds a rooted tree from a distance matrix using agglomerative clustering.

Algorithm steps
  1. Find the pair with smallest distance
  2. Merge them at height = distance / 2
  3. Update distances using average linkage
  4. Repeat until one cluster remains

UPGMA assumes a constant rate of evolution (molecular clock). Branch lengths are proportional to evolutionary time.
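The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: the function name `upgma`, the nested-tuple tree format `(left, right, height)`, and the three-taxon distance matrix are all hypothetical. Average linkage is computed as the cluster-size-weighted mean of the two merged clusters' distances, and ties for the smallest distance are broken by insertion order.

```python
def upgma(labels, matrix):
    """Build a rooted tree from a symmetric distance matrix.

    Returns a nested tuple (left, right, height); leaves are label strings.
    """
    n = len(labels)
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (labels[i], 1) for i in range(n)}
    # pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(clusters) > 1:
        # Step 1: find the closest pair of clusters.
        (a, b), d_ab = min(dist.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)

        # Step 3: average-linkage distances from the merged cluster to the rest.
        updated = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            updated[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)

        # Drop distances involving the two merged clusters, add the new ones.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(updated)

        # Step 2: the new node sits at height = distance / 2.
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1

    (tree, _size), = clusters.values()
    return tree


# Hypothetical distances: A and B are close, C is equally far from both.
labels = ["A", "B", "C"]
matrix = [[0, 2, 4],
          [2, 0, 4],
          [4, 4, 0]]
print(upgma(labels, matrix))  # ('C', ('A', 'B', 1.0), 2.0)
```

In this example A and B merge first at height 1.0, the averaged distance from that cluster to C is 4.0, and the root is placed at height 2.0. Every leaf ends up the same distance from the root, which reflects the molecular clock assumption.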