Diff · Algorithms · Git

How Text Diff Algorithms Work

April 10, 2026 · 6 min read

Every time you run git diff, open a pull request, or use a code comparison tool, a diff algorithm is computing the minimal set of changes between two texts. Understanding how this works helps you read diffs faster, write better commits, and build your own comparison tools.

The Problem: Longest Common Subsequence

At its core, a text diff answers: what is the longest sequence of lines (or characters) that appears in both texts in the same order — the Longest Common Subsequence (LCS)?

Given the LCS, lines that are in the original but not the LCS are deletions; lines in the new version but not the LCS are insertions. Everything in the LCS is unchanged.

Original:    New:
A            A
B
C            C
D            D

LCS: A, C, D
Diff: A unchanged, delete B, C unchanged, D unchanged (nothing inserted)
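The idea can be sketched in a few lines of Python. This is a minimal illustration using the textbook dynamic-programming table for LCS, not the algorithm production tools use; the function names lcs_table and line_diff are made up for this example.

```python
def lcs_table(a, b):
    """Build a table where table[i][j] is the LCS length of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def line_diff(a, b):
    """Return (op, line) pairs: ' ' unchanged, '-' deleted, '+' inserted."""
    table = lcs_table(a, b)
    ops = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Line is part of the LCS: keep it.
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] loses nothing from the LCS: it is a deletion.
            ops.append(("-", a[i]))
            i += 1
        else:
            ops.append(("+", b[j]))
            j += 1
    # Whatever is left on either side is pure deletion or insertion.
    ops.extend(("-", line) for line in a[i:])
    ops.extend(("+", line) for line in b[j:])
    return ops


for op, line in line_diff(["A", "B", "C", "D"], ["A", "C", "D"]):
    print(op, line)
```

The table costs time and memory proportional to the product of the two lengths, which is why real tools prefer something smarter for large files.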

The Myers Algorithm

Git's default diff algorithm, also used by GNU diff and many other tools, is the Myers diff algorithm, published by Eugene Myers in 1986. It finds a shortest edit script, the minimum number of insertions and deletions that turns one text into the other, in O(ND) time, where N is the combined length of the two texts and D is the length of that shortest edit script.

Myers works by exploring an "edit graph" with the original text along the horizontal axis and the new text along the vertical axis. Moving diagonally means a match (no cost), moving right means deleting a line of the original, and moving down means inserting a line from the new text. The algorithm finds the path from the top-left corner to the bottom-right corner with the fewest non-diagonal moves.

For most real-world text comparisons, D is small relative to N, making Myers very fast in practice.
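As a rough sketch of the greedy search, the function below computes only D, the number of non-diagonal moves on the best path, and does not reconstruct the edit script itself. It assumes the first sequence runs along the horizontal axis of the edit graph; the name myers_edit_distance is invented for this example.

```python
def myers_edit_distance(a, b):
    """Length of the shortest edit script (insertions + deletions) from a to b."""
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k, where k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                # Move down from diagonal k + 1 (consumes an element of b).
                x = furthest[k + 1]
            else:
                # Move right from diagonal k - 1 (consumes an element of a).
                x = furthest[k - 1] + 1
            y = x - k
            # Follow the diagonal for free while the elements match.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


print(myers_edit_distance(["A", "B", "C", "D"], ["A", "C", "D"]))
```

The outer loop stops as soon as some path reaches the bottom-right corner, so two nearly identical inputs finish after only a few rounds regardless of their length.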

Reading a Unified Diff

The unified diff format (output of git diff) is the standard way diffs are displayed:

--- a/config.json
+++ b/config.json
@@ -1,7 +1,8 @@
 {
   "name": "my-app",
-  "version": "1.0.0",
+  "version": "1.1.0",
   "port": 3000,
+  "debug": false,
   "database": {
     "host": "localhost"
   }

Reading the header:

Symbol             Meaning
--- a/file         Original file
+++ b/file         New file
@@ -1,7 +1,8 @@    Hunk header: the hunk starts at line 1 of the original and covers 7 lines; in the new file it starts at line 1 and covers 8 lines
 line              Context line (prefixed with a space): unchanged, shown for readability
-line              Removed from the original
+line              Added in the new version
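If you are building your own tooling, the hunk header is easy to parse with a regular expression. The sketch below is one possible approach; parse_hunk_header and its dictionary keys are illustrative names, and it relies on the unified format's rule that an omitted line count means 1.

```python
import re

# Matches headers such as "@@ -1,7 +1,8 @@"; the ",count" parts are optional.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Extract start lines and line counts from a unified diff hunk header."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return {
        "old_start": int(old_start),
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
    }


print(parse_hunk_header("@@ -1,7 +1,8 @@"))
```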

Line Diff vs Character Diff

Most tools compare whole lines, which keeps the comparison fast. Within a changed line, a character-level diff can then highlight exactly which characters changed:

// Line-level diff
- const greeting = "Hello, World!";
+ const greeting = "Hello, Developer!";

// Character-level diff within the line
- const greeting = "Hello, World!";
                           ^^^^^          ← deleted "World"
+ const greeting = "Hello, Developer!";
                           ^^^^^^^^^      ← inserted "Developer"

Git uses line diffs by default. The --word-diff flag switches to word-level diffs, which is often more readable for prose.
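To experiment with character-level comparison, Python's standard difflib module is a convenient starting point. Note that difflib.SequenceMatcher uses its own matching heuristic rather than Myers, so this sketch shows the idea, not what git does; char_diff is a name made up for this example.

```python
from difflib import SequenceMatcher


def char_diff(old, new):
    """Return (tag, old_text, new_text) spans covering both strings in order.

    tag is one of difflib's opcode names: 'equal', 'replace', 'delete', 'insert'.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


old_line = 'const greeting = "Hello, World!";'
new_line = 'const greeting = "Hello, Developer!";'
for tag, old_text, new_text in char_diff(old_line, new_line):
    if tag != "equal":
        print(f"{tag}: {old_text!r} -> {new_text!r}")
```

Raw character matching can split a changed word into several small spans when the old and new words happen to share letters, which is one reason word-level granularity often reads better.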

Context Lines

Diff tools show a few lines of unchanged context around each change (3 by default in git). Context shows where a change sits in the file without displaying the whole file. Use git diff -U5 to show 5 context lines instead.
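The effect of the context setting is easy to see with difflib.unified_diff from Python's standard library, whose n parameter plays the same role as -U. The helper name make_unified_diff and the file labels below are only for illustration.

```python
from difflib import unified_diff


def make_unified_diff(old_lines, new_lines, context=3):
    """Return unified diff output as a list of lines without trailing newlines."""
    return list(
        unified_diff(
            old_lines,
            new_lines,
            fromfile="a/example.txt",  # label only; no file is read
            tofile="b/example.txt",
            n=context,
            lineterm="",
        )
    )


old = [f"line{i}" for i in range(1, 11)]
new = list(old)
new[4] = "LINE5"  # change the fifth line only

for context in (1, 3):
    print(f"context={context}")
    print("\n".join(make_unified_diff(old, new, context)))
```

With one context line the hunk covers lines 4 to 6; with three it grows to lines 2 to 8.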

Semantic vs Syntactic Diffs

Standard diffs are purely textual: they don't understand code structure. A semantic diff would recognize that moving a function to a different file is a refactor, not a deletion plus an insertion. Structural diff tools such as difftastic go part of the way: they parse each file with a language-specific parser and compare the resulting syntax trees, so formatting-only changes don't show up as differences.
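A toy version of structure-aware comparison can be built with Python's own ast module. This only answers whether two Python snippets parse to the same tree; it is an illustration of the principle and makes no claim about how difftastic is implemented. The name same_structure is invented here.

```python
import ast


def same_structure(old_source, new_source):
    """True when both Python snippets parse to the same syntax tree.

    ast.dump omits line and column information by default, so whitespace,
    comments, and redundant parentheses do not affect the result.
    """
    return ast.dump(ast.parse(old_source)) == ast.dump(ast.parse(new_source))


compact = "total = price*(1+tax)"
spaced = "total = price * (1 + tax)  # add tax"
changed = "total = price * (1 - tax)"

print(same_structure(compact, spaced))
print(same_structure(compact, changed))
```

A textual diff would flag the first pair as a changed line even though the code means the same thing.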

JSON Diff

Plain text diffs on JSON can be noisy: re-indenting a file or reordering an object's keys changes many lines without changing the data. A JSON-aware diff compares the parsed data structure rather than the text, so it reports only logical changes (added or removed keys, changed values) regardless of whitespace or key order.
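A minimal sketch of the idea: parse both documents, then walk the two structures together. The function json_diff and its output shape are assumptions made for this example, and it deliberately simplifies by treating arrays as single values instead of diffing their elements.

```python
import json


def json_diff(old, new, path=""):
    """Return a list of (kind, path, old_value, new_value) logical changes."""
    changes = []
    if isinstance(old, dict) and isinstance(new, dict):
        # Compare by key, so key order in the source text is irrelevant.
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}/{key}"
            if key not in new:
                changes.append(("removed", child, old[key], None))
            elif key not in old:
                changes.append(("added", child, None, new[key]))
            else:
                changes.extend(json_diff(old[key], new[key], child))
    elif old != new:
        # Scalars and arrays are compared as whole values in this sketch.
        changes.append(("changed", path, old, new))
    return changes


old_doc = json.loads('{"name": "my-app", "version": "1.0.0", "port": 3000}')
new_doc = json.loads('{"port":3000,"debug":false,"version":"1.1.0","name":"my-app"}')

for change in json_diff(old_doc, new_doc):
    print(change)
```

Here the second document is minified and its keys are reordered, yet only the new debug key and the version bump are reported.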

Try It

Use the Compare tool on io9.me for side-by-side text diff and JSON diff — with line, word, and character granularity options.