Fossil

Theory

Benefits of version control

  1. Immutable file and version identification
    1. Simplified and unambiguous communication between developers
    2. Detect accidental or surreptitious changes
    3. Locate the origin of discovered files
  2. Parallel development
    1. Multiple developers on the same project
    2. Single developer with multiple subprojects
    3. Experimental features do not contaminate the main line
    4. Development/Testing/Release branches
    5. Incorporate external changes into the baseline
  3. Historical record
    1. Exactly reconstruct historical builds
    2. Locate when and by whom faults were injected
    3. Find when and why content was added or removed
    4. Team members see the big picture
    5. Research the history of project features or subsystems
    6. Copyright and patent documentation
    7. Regulatory compliance
  4. Automatic replication and backup
    1. Everyone always has the latest code
    2. Failed disk-drives cause no loss of work
    3. Avoid wasting time doing manual file copying
    4. Avoid human errors during manual backups

Definitions

  • Project → a conceptual collection of computer files that serve some common purpose. Often the project is a software application and the individual files are source code together with makefiles, scripts, and “README.txt” files. Other examples of projects include books or manuals in which each chapter or section is held in a separate file.
    • Projects change and evolve. The whole purpose of version control is to track and manage that evolution.
    • Most projects contain many files, but it is possible to have a project consisting of just a single file.
    • Fossil requires that all the files for a project must be collected into a single directory hierarchy - a single folder possibly with layers of subfolders. Fossil is not a good choice for managing a project that has files scattered hither and yon all over the disk. In other words, Fossil only works for projects where the files are laid out such that they can be archived into a ZIP file or tarball.
  • Repository → (also called “repo”) a single file that contains all historical versions of all files in a project. A repo is similar to a ZIP archive in that it is a single file that stores compressed versions of many other files. Files can be extracted from the repo and new files can be added to the repo, just as with a ZIP archive. But a repo has other capabilities above and beyond what a ZIP archive can do.
    • Fossil does not care what you name your repository files, though names ending with “.fossil” are recommended.
    • A single project typically has multiple, redundant repositories on separate machines.
    • All repositories stay synchronized with one another by exchanging information via HTTP or SSH.
    • All repos for a single project redundantly store all information about that project. So if any one repo is lost due to a disk crash, all content is preserved in the surviving repos.
    • The usual arrangement is one repository per user. And since most users these days have their own computer, that means one repository per computer. But this is not a requirement. It is ok to have multiple copies of the same repository on the same computer.
    • Fossil works fine with just a single copy of the repository. But in that case there is no redundancy. If that one repository file is lost due to a hardware malfunction, then there is no way to recover the project.
    • Best practice is to keep all repositories for a user in a single folder. Folders such as “~/Fossils” or “%USERPROFILE%\Fossils” are recommended. Fossil itself does not care where the repositories are stored. Nor does Fossil require repositories to be kept in the same folder. But it is easier to organize your work if all repositories are kept in the same place.
  • Check-out → a set of files that have been extracted from a repository and that represent a particular version or snapshot of the project.
    • Check-outs must be on the same computer as the repository from which they are extracted. This is just like with a ZIP archive: one must have the ZIP archive file on the local machine before extracting files from the ZIP archive.
    • There can be multiple check-outs (in different folders) from the same repository.
    • The repository must be on the same computer as the check-out, but the relative locations of the repo and the check-out are arbitrary. The repository may be located inside the folder holding the check-out, but it certainly does not have to be and usually is not.
    • A special file exists in every check-out that tells Fossil from which repository the check-out was extracted, and which version of the project the check-out represents. This is the “.fslckout” file on unix systems or the “_FOSSIL_” file on Windows.
  • Check-in → another name for a particular version of the project. A check-in is a collection of files inside of a repository that represent a snapshot of the project for an instant in time. Check-ins exist only inside of the repository. This contrasts with a check-out which is a collection of files outside of the repository.
    • Every check-out knows the check-in from which it was derived. But check-outs might have been edited and so might not exactly match their associated check-in.
    • Check-ins are immutable. They can never be changed. But check-outs are collections of ordinary files on disk. The files of a check-out can be edited just like any other file.
    • A check-in can be thought of as an historical snapshot of a check-out.
    • “Check-in”, “version”, “snapshot”, and “revision” are synonyms.
    • When used as a noun, the word “commit” is another synonym for “check-in”. When used as a verb, the word “commit” means to create a new check-in.

Basic Fossil commands

  • clone → Make a copy of a repository. The original repository is usually (but not always) on a remote machine and the copy is on the local machine. The copy remembers the network location from which it was copied and (by default) tries to keep itself synchronized with the original.
  • open → Create a new check-out from a repository on the local machine.
  • update → Modify an existing check-out so that it is derived from a different version of the same project.
  • commit → Create a new version (a new check-in) of the project that is a snapshot of the current check-out.
  • revert → Undo all local edits on a check-out. Make the check-out be an exact copy of its associated check-in.
  • push → Copy content found in a local repository over to a remote repository. (Fossil usually does this automatically in response to a “commit” and so this command is seldom used, but it is important to understand it.)
  • pull → Copy new content found in a remote repository into a local repository. A “pull” by itself does not modify any check-out. The “pull” command only moves content between repositories. However, the “update” command will (often) automatically do a “pull” before attempting to update the local check-out.
  • sync → Do both a “push” and a “pull” at the same time.
  • add → Add a new file to the local check-out. The file must already be on disk. This command tells Fossil to start tracking and managing the file. This command affects only the local check-out and does not modify any repository. The new file is inserted into the repository at the next “commit” command.
  • rm/mv → Short for ‘remove’ and ‘move’, these commands are like “add” in that they specify pending changes to the structure of the check-out. As with “add”, no changes are made to the repository until the next “commit”.

The history of a project is a Directed Acyclic Graph (DAG)

  • Fossil (and other distributed VCSes like Git and Mercurial, but not Subversion) represents the history of a project as a directed acyclic graph (DAG).
    • Each check-in is a node in the graph.
    • If check-in Y is derived from check-in X then there is an arc in the graph from node X to node Y.
    • The older check-in (X) is called the “parent” and the newer check-in (Y) is the “child”. The child is derived from the parent.
  • Two users (or the same user working in different check-outs) might commit different changes against the same check-in. This results in one parent node having two or more children.
  • Command: merge → combines the work of multiple check-ins into a single check-out. That check-out can then be committed to create a new check-in that has two (or more) parents.
    • Most check-ins have just one parent, and either zero or one child.
    • When a check-in has two or more parents, one of those parents is the “primary parent”. All the other parent nodes are “secondary”. Conceptually, the primary parent shows the main line of development. Content from the secondary parents is added into the main line.
    • The “direct children” of a check-in X are all children that have X as their primary parent.
    • A check-in node with no direct children is sometimes called a “leaf”.
    • The “merge” command changes only the check-out. The “commit” command must be run subsequently to make the merge a permanent part of the project.
  • Definition: branch → a sequence of check-ins that are all linked together in the DAG through the primary parent.
    • Branches are often given names which propagate to direct children.
    • It is possible to have multiple branches with the same name. Fossil has no problem with this, but it can be confusing to humans, so best practice is to give each branch a unique name.
    • The name of a branch can be changed by adding special tags to the first check-in of a branch. The name assigned by this special tag automatically propagates to all direct children.
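
The parent/child vocabulary above can be made concrete with a few lines of Python. This is only a sketch of the concepts: the CheckIn class and the helper functions are hypothetical names invented for illustration, and they say nothing about how Fossil actually stores its history.

```python
from dataclasses import dataclass, field


@dataclass
class CheckIn:
    """One node of the history graph (illustrative model only)."""
    name: str
    parents: list = field(default_factory=list)  # parents[0] is the primary parent

    @property
    def primary_parent(self):
        return self.parents[0] if self.parents else None


def direct_children(node, history):
    """All check-ins that have `node` as their primary parent."""
    return [c for c in history if c.primary_parent is node]


def leaves(history):
    """All check-ins with no direct children."""
    return [n for n in history if not direct_children(n, history)]


a = CheckIn("a")
b = CheckIn("b", [a])
c = CheckIn("c", [b])      # two different changes committed against b ...
d = CheckIn("d", [b, c])   # ... then merged: primary parent b, secondary parent c
history = [a, b, c, d]

print([n.name for n in direct_children(b, history)])  # ['c', 'd']
print([n.name for n in leaves(history)])               # ['c', 'd']
```

Note that c still counts as a leaf after the merge: d lists c only as a secondary parent, so c has no direct children.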

Why version control is important (reprise)

  1. Every check-in and every individual file has a unique name - its SHA1 or SHA3-256 hash. Team members can unambiguously identify any specific version of the overall project or any specific version of an individual file.
  2. Any historical version of the whole project or of any individual file can be easily recreated at any time and by any team member.
  3. Accidental changes to files can be detected by recomputing their cryptographic hash.
  4. Files of unknown origin can be identified using their hash.
  5. Developers are able to work in parallel, review each other's work, and easily merge their changes together. External revisions to the baseline can be easily incorporated into the latest changes.
  6. Developers can follow experimental lines of development, then revert back to an earlier stable version if the experiment does not work out. Creativity is enhanced by allowing crazy ideas to be investigated without destabilizing the project.
  7. Developers can work on several independent subprojects, flipping back and forth from one subproject to another at will, and merge patches together or back into the main line of development as they mature.
  8. Older changes can be easily backed out of recent revisions, for example if bugs are found long after the code was committed.
  9. Enhancements in a branch can be easily copied into other branches, or into the trunk.
  10. The complete history of all changes is plainly visible to all team members. Project leaders can easily keep track of what all team members are doing. Check-in comments help everyone to understand and/or remember the reason for each change.
  11. New team members can be brought up-to-date with all of the historical code, quickly and easily.
  12. New developers, interns, or inexperienced staff members who still do not understand all the details of the project or who are otherwise prone to making mistakes can be assigned significant subprojects to be carried out in branches without risking main line stability.
  13. Code is automatically synchronized across all machines. No human effort is wasted copying files from machine to machine. The risk of human errors during file transfer and backup is eliminated.
  14. A hardware failure results in minimal lost work because all previously committed changes will have been automatically replicated on other machines.
  15. The complete work history of the project is conveniently archived in a single file, simplifying long-term record keeping.
  16. A precise historical record is maintained which can be used to support copyright and patent claims or regulatory compliance.

(Nearly verbatim copy of [whyusefossil].)
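
Points 1, 3 and 4 of the list all rest on the same idea: a file is named by a cryptographic hash of its content, so any change to the content changes the name. Here is a minimal sketch using Python's standard hashlib module. The content_id helper is a made-up name, and the sketch assumes the identifier is simply the SHA3-256 digest of the raw file bytes.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return the SHA3-256 digest of `data` as lowercase hex."""
    return hashlib.sha3_256(data).hexdigest()


original = b"print('hello')\n"
tampered = b"print('hello!')\n"

recorded = content_id(original)          # the name recorded in the history
print(len(recorded))                     # 64 hex digits
print(content_id(original) == recorded)  # True: content is unchanged
print(content_id(tampered) == recorded)  # False: content was altered
```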

Theory: Branches and forks

[Figures: a sequence of thirteen diagrams illustrating branches and forks (_images/branch_00.svg through _images/branch_12.svg).]

Practice: a primetime story

Background

You are a math researcher and you are studying Goldbach’s conjecture, one of the oldest (1742) and best-known unsolved problems in number theory and all of mathematics.[wiki-goldbach] It states:

Every even integer greater than 2 can be expressed as the sum of two primes.

Your first project is to write a program checking the Goldbach conjecture.

Initializing

Let’s create a new repository (the “archive” containing all of your project versions and data). I strongly recommend you keep all your repositories in the same dedicated place on your computer for ease of backups. Create a new folder for that in File Explorer, open a shell there, then initialize a new empty repository called goldbach.fossil using the command:

fossil init goldbach.fossil

To begin working on the project inside this repository, we must extract (“check-out”) a specific version from the repository. The result is a check-out (the term “check-out” describes both the action and the result). Create a folder for the check-out (in File Explorer), then open a shell there and run the command:

fossil open ../path/to/goldbach.fossil

The fossil open command tells fossil that a directory is to be used as a check-out for the repository goldbach.fossil.

Note

You can’t use the fossil checkout command directly for the first check-out because fossil doesn’t know what repository you want to check-out from. fossil open is used only for the first check-out; after that, fossil will remember (for that directory).

(In case you’re wondering, the way that fossil “remembers” is through the use of the “_FOSSIL_” hidden file (“.fslckout” on Mac and GNU/Linux) inside your check-out, which tells fossil that that directory is a check-out.)

First check-in

You write your initial version of the code. Since this workshop is about version control and not about programming, you can download the initial version goldbach.py, and save it inside your check-out. You can have a look at the contents using your favourite text editor.

You now have to tell fossil to start tracking this file, so you use the fossil add command:

fossil add goldbach.py

Having written this initial version of the code, you commit it (thus creating a new check-in). This is as simple as running:

fossil commit

Fossil now prompts you for a commit message. This message should be a summary of the changes you’ve made from the version you’ve checked out initially. Since it’s the initial commit, you can set the message to simply “initial commit”. Congratulations, you’ve produced your first check-in!

You can examine the results using the fossil ui command, which opens the fossil repository interface in your web browser. Just run:

fossil ui

Try the “Timeline” page. When you’re done, hit Ctrl-C.

First bugfix

You can run the code using:

python3 goldbach.py

This program is supposed to produce, for each integer n, the list of pairs of primes (a, b) such that a + b = n.

As you run it, you notice that the first line of output is 4 [(1, 3)]. However, 1 is not a prime number. It turns out that the problem is the if n < 1: return False line in the is_prime(n) function. The condition should be n < 2 instead.

Modify your goldbach.py file to fix the error, then test by running python3 goldbach.py. The 1’s should be gone now.
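
For reference, here is a sketch of what the fixed function might look like. The real goldbach.py is not reproduced in this text, so the body below is a guess based only on the two lines quoted in this workshop (the n < 1 test and a range(2, n) loop).

```python
def is_prime(n):
    """Trial division (a guess at the shape of the workshop's function)."""
    if n < 2:          # was `n < 1`, which let 1 through as a prime
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


print([n for n in range(10) if is_prime(n)])  # [2, 3, 5, 7]
```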

Now is a good time to commit this change. To remind yourself of what changed, you can view the difference between the original check-out version and the current state of the files in the check-out (try them all out):

fossil diff

## OR ##

fossil diff --tk

## OR ##

fossil gdiff

To view a summary of the changes, use:

fossil changes

## OR ##

fossil status

Now that you’re happy with the changes you’ve made, you can commit by typing:

fossil commit

with a nice commit message summarizing the change, such as “fix bug in is_prime that made it think 1 is prime”. You can now run fossil ui to look at your new commit.

First feature branch

Your advisor now sends you an email and tells you that, as a check, you should make your code print out OEIS sequence A045917, i.e. the sequence where A(n) is the “number of decompositions of 2n into unordered sums of two primes”.

You implement this feature in a separate function:

def OEIS_A045917(start, end):
    '''Produce OEIS sequence A045917, i.e. the number of decompositions
    of 2n into unordered sums of two primes.

    https://oeis.org/A045917
    '''
    seq = []
    for n in range(start, end+1):
        seq.append(len(list(find_goldbach_pairs(2*n))))
    return seq

(which you can place right above the def main():) and then you modify the main() function so that it prints out the sequence, by adding the line:

print("OEIS:", OEIS_A045917(1, 100))

just below the '''main method''' line.

Since this is a separate feature, you create a feature branch dedicated to the development of this feature, instead of committing to the main trunk branch. You then commit using the --branch option:

fossil commit --branch OEIS-feature

Note

But what exactly is a branch? See here for a simple graphical explanation.

Note

Alternately, you can create the branch in a separate operation (this effectively creates an empty check-in):

fossil branch new OEIS-feature trunk
fossil update OEIS-feature
fossil commit

Bugfix on the feature branch

Your advisor checks out the code and tells you the output is wrong. You run:

python3 goldbach.py

and see that the sequence starts with “[0, 0, 0, 1]” whereas it should be “[0, 1, 1, 1]”. Indeed, you can write 4 as 2+2, but your program is missing that. The problem is an off-by-one error in find_goldbach_pairs(n); the range(1, n//2) should instead be range(1, n//2 + 1) (since the range is open at the upper end).

Having fixed that, you commit the change using fossil commit.
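
To see why the + 1 matters, here is a small self-contained sketch. Both function bodies are assumptions, since the real goldbach.py is not shown here; only the function names and the corrected range come from the text above.

```python
def is_prime(n):
    """Trial division, with the n < 2 fix from earlier."""
    if n < 2:
        return False
    return all(n % i for i in range(2, n))


def find_goldbach_pairs(n):
    """Yield pairs (a, b) of primes with a <= b and a + b == n."""
    for a in range(1, n // 2 + 1):   # `+ 1` so that a == n // 2 is included
        b = n - a
        if is_prime(a) and is_prime(b):
            yield (a, b)


print(list(find_goldbach_pairs(4)))  # [(2, 2)]
print([len(list(find_goldbach_pairs(2 * n))) for n in range(1, 5)])  # [0, 1, 1, 1]
```

Without the + 1, the pair (2, 2) for n = 4 and the pair (3, 3) for n = 6 are never tried, which is exactly what produced the two missing 1’s.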

The coworker problem

Meanwhile, your coworker has agreed to take a look at your code on trunk (the main branch). Your coworker notices that your is_prime(n) function could be significantly sped up.

Run the following command to update the check-out to the latest version on trunk:

fossil checkout trunk

Note

In reality you would sync your code with an online fossil repository, and give your coworker access to it. Your coworker would then grab a copy of your fossil repository (including all versions and history information) by running something like:

fossil clone https://example.com/f/goldbach/ goldbach.fossil

inside the folder where they keep their fossil repositories. Note that by default, fossil automatically synchronizes its contents with the last-used remote server on every commit. You can change this using the autosync setting in fossil settings.

Your coworker notices that you don’t need to check all possible divisors from 2 to n-1, but only those from 2 to sqrt(n). They therefore change the line for i in range(2, n): to for i in range(2, int(n**0.5)+1):, and commit:

fossil commit

(Make sure to summarize the commit in the message!)
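
The speedup is safe because if n = a × b with a ≤ b, then a can be at most sqrt(n), so any composite n has a divisor in that smaller range. If you’d like to convince yourself, here is a quick sketch comparing the two variants. The names is_prime_slow and is_prime_fast are made up for this comparison; the loop bounds are the ones quoted above.

```python
def is_prime_slow(n):
    """Original loop: try every candidate divisor from 2 to n-1."""
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


def is_prime_fast(n):
    """Coworker's loop: try candidate divisors only up to sqrt(n)."""
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True


# The two versions should agree everywhere; check a modest range.
print(all(is_prime_slow(n) == is_prime_fast(n) for n in range(2000)))  # True
```

For very large n, floating-point rounding in n ** 0.5 could become a concern; math.isqrt(n) (Python 3.8+) is an exact alternative.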

You now notice that your coworker committed this interesting speedup, and you would like to benefit from it on your feature branch. Open up fossil ui and examine the situation.

What you would like to do is to merge all the changes in trunk (in this case, the one check-in your coworker made) into the feature branch. You would do this as follows:

# check-out latest version on OEIS-feature branch
# (this is just because we were on trunk previously as we were
#  pretending to be our imaginary coworker)
fossil checkout OEIS-feature

# merge all the changes in trunk
fossil merge trunk

You verify that everything is fine:

fossil diff --tk

Happy with the result, you commit:

fossil commit

Cherrypicking merge

While developing on the OEIS-feature branch, you discovered an off-by-one error that affects the code on trunk. You would like to fix the issue on trunk without merging the entire OEIS-feature branch just yet. That is, you would like to cherrypick which changes from OEIS-feature you would like to merge onto trunk.

To do that, first check out the latest version on trunk (where you would like to merge):

fossil checkout trunk

Next, use the fossil timeline command (or simply look at the Timeline page in fossil ui) to find out the commit ID of the change you would like to merge (this is where commit comments are helpful), specifically the commit where you fixed the off-by-one error from before. The commit ID will be the hexadecimal gibberish in square brackets (let’s say it’s “[68a1a63e]”).

Next, merge, cherrypicking that commit only:

fossil merge --cherrypick 68a1a63e

Finally, commit:

fossil commit

Notice that by default, the commit message suggested to you will be the same as the commit message of the cherrypicked commit that was merged in.

If you use fossil ui, you’ll notice that fossil won’t draw an arrow from the cherrypicked commit to the new commit. (This may change in the future.)

Note

--cherrypick has a counterpart --backout which undoes the change in the specified commit. This is useful for backing out of a bad change earlier on the branch.

Integrating the feature branch

Your advisor tells you that the OEIS feature branch is ready for primetime, so you would like to merge it into trunk. Easy:

fossil checkout trunk
fossil merge --integrate OEIS-feature
fossil commit

The --integrate switch tells fossil that you’re done with that branch: you don’t expect to make any more commits on top of it, so it is marked as “closed”. As a result, the command fossil branch will no longer show that branch by default.

Practice: exploring the fossil source tree

Let’s grab a copy of the fossil source tree:

fossil clone https://fossil-scm.org/ fossil.fossil

Next, create a temporary directory “fossil” for the check-out, and check-out fossil inside it:

mkdir fossil
cd fossil
fossil open ../path/to/fossil.fossil

Comparing two versions

Let’s suppose we want to compare versions [c8b46764] and [1374d581] of fossil itself.

Using fossil’s diff-tk

Easy:

fossil diff --tk --from c8b46764 --to 1374d581

Using the fossil web UI

Launch a fossil ui, navigate to the “Timeline” page, then click on the bubble next to the first commit, then on the bubble next to the second commit. Fossil will take you to a page containing the full changes between these two commits, similar to this.

Using kdiff3

Check-out version “c8b46764” in the first check-out:

fossil checkout c8b46764

Create a second check-out “fossil2”, next to your initial check-out “fossil”, i.e.:

cd ..
mkdir fossil2
cd fossil2
fossil open ../path/to/fossil.fossil
fossil checkout 1374d581

Then use kdiff3 to compare the two check-outs:

cd ..
kdiff3 fossil/ fossil2/

Miscellaneous

“I’ve made a huge mistake”

Worry not, you can edit almost any of the check-in metadata in the Fossil UI, including:

  • The check-in comment/message
  • The background color
  • Whether the check-in is the start of a new branch. That’s right, you can decide that a check-in should be the start of a new branch after the fact!
  • Whether the branch should be hidden by default.
  • Whether the branch is “closed”.

Just click on the check-in you’d like to edit, then click on the “edit” link.

If you make a bad commit (as in, checked in the wrong files), the typical way to deal with it is to edit it so that it’s on a branch named “mistake”, and set it to be hidden.

If you’ve made other commits on top of the bad commit, use fossil merge --backout to undo the bad commit only.

Note

You are perhaps entertaining the thought of rewriting the entire history since the bad commit. Fossil does not provide any facility to automate that, whereas git allows users to do it via the git rebase command. The absence of this “feature” is intentional: Fossil’s philosophy is to record history as it happened, not as it “should” have happened.

Note that the rebase feature in git is fraught with peril: if a user rebases a branch in a collaborative project, the other users will find themselves working on commits that don’t belong to any branch, and may experience significant amounts of frustration (depending on how many commits they’ve made based on the branch that had been rebased).

Collaborative development

Synchronization

If you’ve hosted your fossil project somewhere (say, at https://example.com/), you can use fossil sync to sync all data between your local repository and the remote repository. Effectively, they will become copies of each other.

Note that by default, fossil will attempt to sync before and after every commit to prevent forks in the development.

Fossil update

Let’s say that you’re about to commit a change, and someone else just committed something else to the same branch. Fossil helpfully informs you of the situation and tells you:

$ fossil commit
would fork.  "update" first or use --allow-fork.

You can go ahead and force a fork using the --allow-fork option, then merge back manually as you normally would.

However, the easier and often faster option is to use the fossil update command. This command tries to take the current changes that you’ve made to the checkout and apply (merge) them on top of a different version (by default, the latest commit on the same branch). If the changes in your commit and your coworker’s commit don’t overlap, all you’ll need to do is:

fossil update

(If it turns out they do overlap and it would be a headache to sort out without an explicit merge, you can undo the fossil update command using fossil undo, then commit with --allow-fork and perform an explicit merge afterwards.)
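
To get a feel for what “don’t overlap” means, here is a deliberately simplified sketch of a three-way merge. It is not how Fossil implements merging: it assumes all three versions have the same number of lines, whereas a real merge first aligns the versions with a diff so that insertions and deletions work too. The merge_lines function is a hypothetical name.

```python
def merge_lines(base, mine, theirs):
    """Three-way merge of three equal-length lists of lines.

    Returns (merged, conflicts), where conflicts lists the indices at which
    both sides changed the same line in different ways.
    """
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "mine" changed this line
            merged.append(m)
        else:             # overlapping edits: a human has to decide
            merged.append(m)
            conflicts.append(i)
    return merged, conflicts


base   = ["a = 1", "b = 2", "c = 3"]    # the check-in you started from
mine   = ["a = 10", "b = 2", "c = 3"]   # your uncommitted edits
theirs = ["a = 1", "b = 2", "c = 30"]   # your coworker's new check-in

print(merge_lines(base, mine, theirs))
# (['a = 10', 'b = 2', 'c = 30'], [])
```

Had both of you edited the first line differently, index 0 would show up in the conflicts list, which is the situation where an explicit merge is the more comfortable route.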

Antisocial coding: Private branches

Sometimes a branch is intended to be so bad, so hacky, so experimental, that you just don’t want other people to see it. Thankfully, Fossil provides a way for you to avoid sharing that branch with other people: that is, private branches.

To create a private branch, use the --private switch on your fossil commit or fossil branch new command.

Note that if you do merge the changes from a private branch into a public (normal) branch, you will not see a merge arrow in the timeline. This is normal, because otherwise other people would become aware of the existence of the private branch.

Dealing with larger projects

Do not add derivative files

If a file is an executable or a PDF or something else entirely and automatically derived from source files that are also part of the project (and that are tracked by version control), I recommend against adding it to Fossil. Such derivative files are redundant, as they can be produced from your source files, and they will take up space in your repository for all eternity. Fossil tries very hard not to lose data, which makes removing files after the fact rather difficult (though by no means impossible).

If you want to keep a history of derivative files, you may consider tracking them in a separate repository.

Of course, I’m not the boss of you, and you may very well decide that the benefits of tracking the output/derivative files outweigh the costs.

Fossil addremove

When dealing with projects with lots and lots of files, you can accidentally forget to fossil add a file you created (to tell Fossil to track it). To avoid this, you can use the fossil extras command to see all the files in the check-out that fossil is not currently tracking, and you can use the fossil addremove command to tell fossil to automatically fossil add them.

To see the files that would be added (or marked as removed) by fossil addremove without actually adding/removing, use:

fossil addremove -n

To actually add/remove, run:

fossil addremove

Ignoring files

The problem is that fossil extras and fossil addremove -n will report all sorts of junk or derivative files you don’t care about.

For example, if you use LaTeX, it’ll list those annoying “foo.aux” and “foo.log” files. If you use C, it’ll list the “foo.o” temporary object files. If you use Python, it’ll list the “*.pyc” bytecode files. If your text editor uses a scheme like “foo.txt~” for backup files, those will be listed there too. This makes it quite hard to see files that you might actually care about.

To avoid that, you can use the “ignore-glob” feature. In the base directory of your project, create a directory “.fossil-settings” (yes, with a leading dot), and inside that, create a text file named “ignore-glob”, i.e.:

mkdir -p .fossil-settings/
touch .fossil-settings/ignore-glob

Inside this file you can put, one per line, a glob pattern matching files and directories you wish to ignore. For example, to match the annoying files from above, you could put in:

*.aux
*.log
*.o
*.pyc
*~

I recommend having a separate “output” or “build” or “work” directory containing the files that are produced as a result of running your code, which you can then wholesale ignore by adding a line like:

output/

inside your “.fossil-settings/ignore-glob” file.
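
If you’re unsure what a set of glob patterns will catch, you can try them out before committing the settings file. The sketch below uses Python’s fnmatch module as a stand-in. Fossil has its own glob matcher whose rules may differ in the details, so treat this as an approximation; is_ignored and the file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# The patterns from the example ignore-glob file above.
IGNORE = ["*.aux", "*.log", "*.o", "*.pyc", "*~"]


def is_ignored(path, patterns=IGNORE):
    """True if `path` matches at least one of the glob patterns."""
    return any(fnmatchcase(path, p) for p in patterns)


files = ["paper.tex", "paper.aux", "paper.log",
         "goldbach.py", "goldbach.pyc", "notes.txt~"]
print([f for f in files if not is_ignored(f)])  # ['paper.tex', 'goldbach.py']
```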

Fossil rm

To tell fossil to stop caring about a file foo, use:

fossil rm foo

To also delete the file at the same time, use:

fossil rm --hard foo

Fossil mv

To rename a file (and tell fossil about it so it can track changes across the rename), use:

fossil mv --hard oldname newname

To reduce confusion (for both yourself and other people inspecting the history later on), I strongly recommend performing the rename operations as part of a separate commit (not containing any other changes).

Fossil revert

Suppose you’d like to revert the changes you’ve made. Use:

# revert changes in file `foo`
fossil revert foo

# revert all changes made to current checkout
fossil revert

If the fossil revert itself was a mistake, run fossil undo to undo it.

Other useful commands

Use fossil help to find out more, or check out the built-in documentation here. (This is also available via fossil ui, then navigating to the “/help” page.)