Modern DVCSs factor apart networking, synchronisation, history, storage, and merging; older VCSs tangled them all together, which led to much implementation and conceptual complexity.
Example: "hello, world" versus "head cold"
var delta = Diff.diff_patch("hello, world", "head cold");
/* [{file1: {offset:2, length:4, chunk:["l", "l", "o", ","]},
file2: {offset:2, length:2, chunk:["a", "d"]}},
{file1: {offset:7, length:1, chunk:["w"]},
file2: {offset:5, length:1, chunk:["c"]}},
{file1: {offset:9, length:1, chunk:["r"]},
file2: {offset:7, length:0, chunk:[]}}] */
diff uses a Longest Common Subsequence algorithm to find a short description of the differences between two files. The notion of minimum edit distance is a related idea.
It so happens that the output of diff often makes sense to a human trying to figure out how a file has been changed. How lucky!
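To make the Longest Common Subsequence idea concrete, here is a minimal dynamic-programming sketch. The function name `lcs` is hypothetical and this is not the code inside the Diff library used in these notes; it is the textbook quadratic algorithm, and it accepts either strings or arrays.

```javascript
// Hypothetical helper (not part of the Diff library discussed here).
// Returns one longest common subsequence of a and b as an array.
function lcs(a, b) {
  var n = a.length, m = b.length;
  var i, j;

  // table[i][j] = length of the LCS of a[i..] and b[j..]
  var table = [];
  for (i = 0; i <= n; i++) {
    table.push([]);
    for (j = 0; j <= m; j++) table[i].push(0);
  }

  // Fill the table from the ends of both inputs back to the start.
  for (i = n - 1; i >= 0; i--) {
    for (j = m - 1; j >= 0; j--) {
      if (a[i] === b[j]) {
        table[i][j] = table[i + 1][j + 1] + 1;
      } else {
        table[i][j] = Math.max(table[i + 1][j], table[i][j + 1]);
      }
    }
  }

  // Walk forward through the table to recover one LCS.
  var result = [];
  i = 0;
  j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      result.push(a[i]);
      i++;
      j++;
    } else if (table[i + 1][j] >= table[i][j + 1]) {
      i++;
    } else {
      j++;
    }
  }
  return result;
}

var common = lcs("hello, world", "head cold").join("");
```

Everything outside the common subsequence is what a diff reports as changed: with 6 of the 12 characters of "hello, world" in common with "head cold", the remaining 6 are deletions, which matches the three file1 chunks in the delta shown earlier.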
Note that Diff.diff_patch can operate equally well on lists of strings and on lists of characters (strings). It doesn't work very well when given single strings, as in the example above, but it does work.
Sometimes called two-way merge: every difference is a conflict
Bram Cohen has invented a diff algorithm that works well for programming-language (or other line-oriented) text. It uses uniquely occurring lines to anchor the LCS.
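This algorithm is commonly known as patience diff. The sketch below covers only the anchoring step, under the assumption that inputs are arrays of lines: it finds lines that occur exactly once in each file, then keeps the longest run of them that appears in the same order in both. The name `uniqueAnchors` and the result shape are hypothetical, and a full implementation would go on to diff the gaps between anchors recursively.

```javascript
// Hypothetical sketch of the anchoring step only; not a complete diff.
function uniqueAnchors(a, b) {
  // Map each line to how often it occurs and where it first occurs.
  function countLines(lines) {
    var counts = Object.create(null);
    for (var i = 0; i < lines.length; i++) {
      var key = lines[i];
      if (!counts[key]) counts[key] = { count: 0, index: i };
      counts[key].count++;
    }
    return counts;
  }

  var inA = countLines(a), inB = countLines(b);

  // Candidate anchors: lines unique in a that are also unique in b.
  var pairs = [];
  for (var i = 0; i < a.length; i++) {
    var ea = inA[a[i]], eb = inB[a[i]];
    if (ea.count === 1 && eb && eb.count === 1) {
      pairs.push({ aIndex: i, bIndex: eb.index });
    }
  }

  // pairs is already sorted by aIndex; keep the longest subsequence
  // that is also increasing in bIndex (simple quadratic version).
  var best = [], prev = [], end = -1;
  for (var p = 0; p < pairs.length; p++) {
    best[p] = 1;
    prev[p] = -1;
    for (var q = 0; q < p; q++) {
      if (pairs[q].bIndex < pairs[p].bIndex && best[q] + 1 > best[p]) {
        best[p] = best[q] + 1;
        prev[p] = q;
      }
    }
    if (end === -1 || best[p] > best[end]) end = p;
  }

  var anchors = [];
  for (var k = end; k !== -1; k = prev[k]) anchors.unshift(pairs[k]);
  return anchors;
}

var anchors = uniqueAnchors(["a", "b", "b", "c"], ["a", "x", "c", "b", "b"]);
```

Repeated lines such as lone braces or blank lines never become anchors, which is why this approach tends to align whole functions rather than their closing brackets.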
js> uneval(Diff.patch("hello, world", delta).join(""))
"head cold"
js> Diff.invert_patch(delta);
js> uneval(Diff.patch("head cold", delta).join(""))
"hello, world"
In some revision-control systems, e.g. darcs, inverting a patch is a central operation. Darcs in particular has a full (and very useful!) "theory of patches", where patch inversion, commutation and merging are developed formally.
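As a rough illustration of why inversion is cheap, here is a sketch that applies and inverts patches in the hunk shape shown in the diff_patch output earlier. The functions `applyPatch` and `invertPatch` are hypothetical stand-ins written for this example; they are not the library's Diff.patch and Diff.invert_patch, and unlike the shell session above, `invertPatch` returns a new patch rather than modifying its argument.

```javascript
// Assumed hunk shape, taken from the diff_patch output shown earlier:
//   {file1: {offset, length, chunk}, file2: {offset, length, chunk}}

// Hypothetical helper: apply a patch to a string or array of elements.
function applyPatch(file, patch) {
  var result = [];
  var pos = 0; // next unread index in `file`
  patch.forEach(function (hunk) {
    // Copy the unchanged run that precedes this hunk.
    for (; pos < hunk.file1.offset; pos++) result.push(file[pos]);
    // Emit the replacement, then skip the elements it replaces.
    hunk.file2.chunk.forEach(function (x) { result.push(x); });
    pos += hunk.file1.length;
  });
  // Copy whatever follows the last hunk.
  for (; pos < file.length; pos++) result.push(file[pos]);
  return result;
}

// Hypothetical helper: inverting a patch just swaps the two sides of each hunk.
function invertPatch(patch) {
  return patch.map(function (hunk) {
    return { file1: hunk.file2, file2: hunk.file1 };
  });
}

var delta = [
  { file1: { offset: 2, length: 4, chunk: ["l", "l", "o", ","] },
    file2: { offset: 2, length: 2, chunk: ["a", "d"] } },
  { file1: { offset: 7, length: 1, chunk: ["w"] },
    file2: { offset: 5, length: 1, chunk: ["c"] } },
  { file1: { offset: 9, length: 1, chunk: ["r"] },
    file2: { offset: 7, length: 0, chunk: [] } }
];

var forward = applyPatch("hello, world", delta).join("");
var backward = applyPatch("head cold", invertPatch(delta)).join("");
```

Because each hunk records both sides of the change, no information is lost, and inverting twice gives back the original patch.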
| "this" | "base" | "other" | Result | Notes |
|---|---|---|---|---|
| A | A | A | A | no changes |
| A | A | B | B | "other" wins |
| B | A | A | B | "this" wins |
| B | A | B | B or conflict | accidental clean merge |
| B | A | C | conflict | "true" conflict |
var base = "the quick brown fox jumped over a dog".split(/\s+/);
var derived1 = "the quick fox jumps over some lazy dog".split(/\s+/);
var derived2 = "the quick brown fox jumps over some record dog".split(/\s+/);
var mergeResult = Diff.diff3_merge(derived1, base, derived2, true);
/* [{ok:["the", "quick", "fox", "jumps", "over"]},
{conflict:{a:["some", "lazy"], aIndex:5, o:["a"], oIndex:6, b:["some", "record"], bIndex:6}},
{ok:["dog"]}] */
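The three-way decision table above can be written down directly for a single region. This is only a sketch: `mergeRegion` and its `acceptAccidental` flag are hypothetical names, the `{ok: ...}` and `{conflict: {a, o, b}}` result shapes merely imitate the diff3_merge output shown above, and a real merge first has to split the files into aligned regions.

```javascript
// Hypothetical helper: decide the outcome for one aligned region.
// thisVal / baseVal / otherVal correspond to the "this", "base" and
// "other" columns of the table. acceptAccidental chooses between the
// two outcomes allowed for the "accidental clean merge" row.
function mergeRegion(thisVal, baseVal, otherVal, acceptAccidental) {
  var conflict = { conflict: { a: thisVal, o: baseVal, b: otherVal } };

  if (thisVal === otherVal) {
    // Either nobody changed anything, or both sides made the same change.
    if (thisVal === baseVal || acceptAccidental) return { ok: thisVal };
    return conflict;
  }
  if (thisVal === baseVal) return { ok: otherVal }; // only "other" changed
  if (otherVal === baseVal) return { ok: thisVal }; // only "this" changed
  return conflict;                                  // both changed differently
}

var rows = [
  mergeRegion("A", "A", "A", true),  // no changes
  mergeRegion("A", "A", "B", true),  // "other" wins
  mergeRegion("B", "A", "A", true),  // "this" wins
  mergeRegion("B", "A", "B", true),  // accidental clean merge, accepted
  mergeRegion("B", "A", "B", false), // accidental clean merge, rejected
  mergeRegion("B", "A", "C", true)   // "true" conflict
];
```

Without the base column, the second and third rows are indistinguishable from the last one, which is why a two-way merge has to treat every difference as a conflict.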

The lowest common ancestor (LCA) is defined for trees, and efficient algorithms for computing it are known. It has also been defined for DAGs, which is the case we have in a DVCS, but the definition leads to some problems in our case.
History is a DAG of changesets.
Each changeset should record
Many modern DVCSs use some function of the contents of an object to identify the object, e.g. a SHA-1 hash. This has a lot of nice properties, and is a good choice. JavaScript doesn't have particularly good support for binary data, which makes hashing (and canonical binary representations!) awkward, so I chose to use simple random UUIDs for identifiers, instead.
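For illustration, a random identifier of this kind can be produced in a few lines. This is a sketch of the widely used version-4 UUID layout built on Math.random; the name `makeRandomUUID` is hypothetical, it is not necessarily the generator used in this project, and Math.random is not a cryptographically strong source.

```javascript
// Hypothetical helper: build a version-4-style UUID string from Math.random.
function makeRandomUUID() {
  return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function (c) {
    var r = Math.floor(Math.random() * 16);
    // "x" is any hex digit; "y" is restricted to 8, 9, a or b (the variant bits).
    var v = (c === "x") ? r : ((r & 0x3) | 0x8);
    return v.toString(16);
  });
}

var id = makeRandomUUID();
```

The trade-off against content hashing is that a random identifier says nothing about the object it names, so two repositories holding identical content will not automatically agree on its identifier.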

The database used to store all information about current and past state in the repository, in every branch, for every commit.
Example user-interface-led design criteria:
Any questions?

Here's a problem case. The LCA of "e" and "i" is either "c" or "h". Both "c" and "h" are two steps away from the root.
Note that the path from "e" to "i" through "c" is three steps long, while the path through "h" is two steps long. This could mean that "h" is a more suitable ancestor for use in 3-way merging.
The algorithm I've implemented is very naive and inefficient. It also answers "c" or "h" depending on the order of arguments you give it.
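The ambiguity can be reproduced with a small sketch. The graph below is an assumption: it is one hypothetical DAG reconstructed to match the description above ("c" and "h" both two steps from the root, a three-step path from "e" to "i" through "c" and a two-step path through "h"), and the node names "a", "b", "d" and "g" are invented. The function is also a deliberately naive one written for this example, not the implementation referred to above. It returns every candidate instead of picking one, so the answer does not depend on argument order.

```javascript
// Hypothetical history: each node maps to its list of parents.
var parents = {
  a: [],          // root
  b: ["a"],
  c: ["b"],       // two steps from the root
  d: ["c"],
  g: ["a"],
  h: ["g"],       // two steps from the root
  e: ["d", "h"],  // merge of d and h
  i: ["c", "h"]   // merge of c and h
};

// Collect a node and all of its ancestors as a set (object keys).
function ancestors(graph, node) {
  var seen = {}, stack = [node];
  while (stack.length > 0) {
    var n = stack.pop();
    if (seen[n]) continue;
    seen[n] = true;
    graph[n].forEach(function (p) { stack.push(p); });
  }
  return seen;
}

// Naive LCA candidates: common ancestors that are not themselves an
// ancestor of another common ancestor.
function lowestCommonAncestors(graph, x, y) {
  var ax = ancestors(graph, x), ay = ancestors(graph, y);
  var common = Object.keys(ax).filter(function (n) { return ay[n]; });
  return common.filter(function (n) {
    return !common.some(function (m) {
      return m !== n && ancestors(graph, m)[n];
    });
  }).sort();
}

var candidates = lowestCommonAncestors(parents, "e", "i");
```

On this graph the candidates are "c" and "h", so a merge tool still needs a tie-breaking rule, for example preferring the candidate with the shorter path between the two heads.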

Can truncate; the missing piece may be held by other servers, or we can fall back to two-way merge.