Fossil
Theory
Benefits of version control
Immutable file and version identification
Simplified and unambiguous communication between developers
Detect accidental or surreptitious changes
Locate the origin of discovered files
Parallel development
Multiple developers on the same project
Single developer with multiple subprojects
Experimental features do not contaminate the main line
Development/Testing/Release branches
Incorporate external changes into the baseline
Historical record
Exactly reconstruct historical builds
Locate when and by whom faults were injected
Find when and why content was added or removed
Team members see the big picture
Research the history of project features or subsystems
Copyright and patent documentation
Regulatory compliance
Automatic replication and backup
Everyone always has the latest code
Failed disk-drives cause no loss of work
Avoid wasting time doing manual file copying
Avoid human errors during manual backups
Definitions
Project → a conceptual collection of computer files that serve some common purpose. Often the project is a software application and the individual files are source code together with makefiles, scripts, and “README.txt” files. Other examples of projects include books or manuals in which each chapter or section is held in a separate file.
Projects change and evolve. The whole purpose of version control is to track and manage that evolution.
Most projects contain many files, but it is possible to have a project consisting of just a single file.
Fossil requires that all the files for a project be collected into a single directory hierarchy: a single folder, possibly with layers of subfolders. Fossil is not a good choice for managing a project that has files scattered hither and yon all over the disk. In other words, Fossil only works for projects where the files are laid out such that they can be archived into a ZIP file or tarball.
Repository → (also called “repo”) a single file that contains all historical versions of all files in a project. A repo is similar to a ZIP archive in that it is a single file that stores compressed versions of many other files. Files can be extracted from the repo and new files can be added to the repo, just as with a ZIP archive. But a repo has other capabilities above and beyond what a ZIP archive can do.
Fossil does not care what you name your repository files, though names ending with “.fossil” are recommended.
A single project typically has multiple, redundant repositories on separate machines.
All repositories stay synchronized with one another by exchanging information via HTTP or SSH.
All repos for a single project redundantly store all information about that project. So if any one repo is lost due to a disk crash, all content is preserved in the surviving repos.
The usual arrangement is one repository per user. And since most users these days have their own computer, that means one repository per computer. But this is not a requirement. It is ok to have multiple copies of the same repository on the same computer.
Fossil works fine with just a single copy of the repository. But in that case there is no redundancy. If that one repository file is lost due to a hardware malfunction, then there is no way to recover the project.
Best practice is to keep all repositories for a user in a single folder. Folders such as “~/Fossils” or “%USERPROFILE%\Fossils” are recommended. Fossil itself does not care where the repositories are stored. Nor does Fossil require repositories to be kept in the same folder. But it is easier to organize your work if all repositories are kept in the same place.
Check-out → a set of files that have been extracted from a repository and that represent a particular version or snapshot of the project.
Check-outs must be on the same computer as the repository from which they are extracted. This is just like with a ZIP archive: one must have the ZIP archive file on the local machine before extracting files from it.
There can be multiple check-outs (in different folders) from the same repository.
The repository must be on the same computer as the check-out, but the relative locations of the repo and the check-out are arbitrary. The repository may be located inside the folder holding the check-out, but it certainly does not have to be and usually is not.
A special file exists in every check-out that tells Fossil from which repository the check-out was extracted, and which version of the project the check-out represents. This is the “.fslckout” file on Unix systems or the “_FOSSIL_” file on Windows.
Check-in → another name for a particular version of the project. A check-in is a collection of files inside of a repository that represent a snapshot of the project for an instant in time. Check-ins exist only inside of the repository. This contrasts with a check-out which is a collection of files outside of the repository.
Every check-out knows the check-in from which it was derived. But check-outs might have been edited and so might not exactly match their associated check-in.
Check-ins are immutable. They can never be changed. But check-outs are collections of ordinary files on disk. The files of a check-out can be edited just like any other file.
A check-in can be thought of as an historical snapshot of a check-out.
“Check-in”, “version”, “snapshot”, and “revision” are synonyms.
When used as a noun, the word “commit” is another synonym for “check-in”. When used as a verb, the word “commit” means to create a new check-in.
Basic Fossil commands
clone → Make a copy of a repository. The original repository is usually (but not always) on a remote machine and the copy is on the local machine. The copy remembers the network location from which it was copied and (by default) tries to keep itself synchronized with the original.
open → Create a new check-out from a repository on the local machine.
update → Modify an existing check-out so that it is derived from a different version of the same project.
commit → Create a new version (a new check-in) of the project that is a snapshot of the current check-out.
revert → Undo all local edits on a check-out. Make the check-out be an exact copy of its associated check-in.
push → Copy content found in a local repository over to a remote repository. (Fossil usually does this automatically in response to a “commit” and so this command is seldom used, but it is important to understand it.)
pull → Copy new content found in a remote repository into a local repository. A “pull” by itself does not modify any check-out. The “pull” command only moves content between repositories. However, the “update” command will (often) automatically do a “pull” before attempting to update the local check-out.
sync → Do both a “push” and a “pull” at the same time.
add → Add a new file to the local check-out. The file must already be on disk. This command tells Fossil to start tracking and managing the file. This command affects only the local check-out and does not modify any repository. The new file is inserted into the repository at the next “commit” command.
rm/mv → Short for ‘remove’ and ‘move’, these commands are like “add” in that they specify pending changes to the structure of the check-out. As with “add”, no changes are made to the repository until the next “commit”.
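A rough mental model may help keep push, pull, and sync apart from the commands that touch a check-out. The Python sketch below assumes, purely for illustration, that a repository is nothing more than a set of artifact IDs; `ToyRepo`, `push`, `pull`, and `sync` are hypothetical names, not part of Fossil.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical stand-in for a repository: only a set of artifact IDs."""
    artifacts: set[str] = field(default_factory=set)


def push(local: ToyRepo, remote: ToyRepo) -> None:
    # Copy everything the remote is missing; the local repo is not changed.
    # (Set union only adds what is not already there.)
    remote.artifacts |= local.artifacts


def pull(local: ToyRepo, remote: ToyRepo) -> None:
    # Copy everything the local repo is missing; no check-out is involved.
    local.artifacts |= remote.artifacts


def sync(local: ToyRepo, remote: ToyRepo) -> None:
    # A sync is a push plus a pull; afterwards both hold the same content.
    push(local, remote)
    pull(local, remote)


mine = ToyRepo({"a1", "b2"})
theirs = ToyRepo({"a1", "c3"})
pull(mine, theirs)
print(sorted(mine.artifacts), sorted(theirs.artifacts))  # ['a1', 'b2', 'c3'] ['a1', 'c3']
sync(mine, theirs)
print(mine.artifacts == theirs.artifacts)  # True
```

Note that `pull` changes only `mine.artifacts`; nothing in the model represents a check-out, which mirrors the point that a pull by itself never modifies one.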
The history of a project is a Directed Acyclic Graph (DAG)
Fossil (and other distributed VCSes like Git and Mercurial, but not Subversion) represents the history of a project as a directed acyclic graph (DAG).
Each check-in is a node in the graph.
If check-in Y is derived from check-in X, then there is an arc in the graph from node X to node Y.
The older check-in (X) is called the “parent” and the newer check-in (Y) is the “child”. The child is derived from the parent.
Two users (or the same user working in different check-outs) might commit different changes against the same check-in. This results in one parent node having two or more children.
Command: merge → combines the work of multiple check-ins into a single check-out. That check-out can then be committed to create a new check-in that has two (or more) parents.
Most check-ins have just one parent, and either zero or one child.
When a check-in has two or more parents, one of those parents is the “primary parent”. All the other parent nodes are “secondary”. Conceptually, the primary parent shows the main line of development. Content from the secondary parents is added into the main line.
The “direct children” of a check-in X are all children that have X as their primary parent.
A check-in node with no direct children is sometimes called a “leaf”.
The “merge” command changes only the check-out. The “commit” command must be run afterwards to make the merge a permanent part of the project.
Definition: branch → a sequence of check-ins that are all linked together in the DAG through the primary parent.
Branches are often given names which propagate to direct children.
It is possible to have multiple branches with the same name. Fossil has no problem with this, but it can be confusing to humans, so best practice is to give each branch a unique name.
The name of a branch can be changed by adding special tags to the first check-in of a branch. The name assigned by this special tag automatically propagates to all direct children.
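The definitions above can be made concrete with a small Python model. It is a sketch under simplifying assumptions: each check-in stores its parents as a list whose first entry is the primary parent, and a branch name is stored only on the check-in where it is assigned and is inherited by direct children. `CheckIn`, `direct_children`, `is_leaf`, and `branch_name` are hypothetical names; Fossil's real data model (artifacts and tags) is richer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CheckIn:
    """One node of the history DAG (hypothetical model, not Fossil's schema)."""
    cid: str
    parents: list[str] = field(default_factory=list)  # parents[0] is the primary parent
    branch: Optional[str] = None  # set only where a branch name is assigned


def direct_children(dag, cid):
    """Check-ins whose *primary* parent is `cid`."""
    return sorted(c.cid for c in dag.values() if c.parents and c.parents[0] == cid)


def is_leaf(dag, cid):
    """A check-in with no direct children."""
    return not direct_children(dag, cid)


def branch_name(dag, cid):
    """Follow primary parents back until a check-in carrying a branch name is found."""
    node = dag[cid]
    while node.branch is None:
        if not node.parents:
            return None
        node = dag[node.parents[0]]
    return node.branch


# A tiny history: C starts the "feature" branch off A; E merges D into trunk.
dag = {c.cid: c for c in [
    CheckIn("A", branch="trunk"),
    CheckIn("B", ["A"]),
    CheckIn("C", ["A"], branch="feature"),
    CheckIn("D", ["C"]),
    CheckIn("E", ["B", "D"]),  # primary parent B, secondary parent D
]}

print(direct_children(dag, "A"))                     # ['B', 'C']
print([x for x in sorted(dag) if is_leaf(dag, x)])   # ['D', 'E']
print(branch_name(dag, "E"), branch_name(dag, "D"))  # trunk feature
```

Because E lists B as its primary parent, E stays on trunk and D counts as a leaf: merging D's content into trunk does not make E a direct child of D.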
Why version control is important (reprise)
Every check-in and every individual file has a unique name - its SHA1 or SHA3-256 hash. Team members can unambiguously identify any specific version of the overall project or any specific version of an individual file.
Any historical version of the whole project or of any individual file can be easily recreated at any time and by any team member.
Accidental changes to files can be detected by recomputing their cryptographic hash.
Files of unknown origin can be identified using their hash.
Developers are able to work in parallel, review each other’s work, and easily merge their changes together. External revisions to the baseline can be easily incorporated into the latest changes.
Developers can follow experimental lines of development, then revert back to an earlier stable version if the experiment does not work out. Creativity is enhanced by allowing crazy ideas to be investigated without destabilizing the project.
Developers can work on several independent subprojects, flipping back and forth from one subproject to another at will, and merge patches together or back into the main line of development as they mature.
Older changes can be easily backed out of recent revisions, for example if bugs are found long after the code was committed.
Enhancements in a branch can be easily copied into other branches, or into the trunk.
The complete history of all changes is plainly visible to all team members. Project leaders can easily keep track of what all team members are doing. Check-in comments help everyone to understand and/or remember the reason for each change.
New team members can be brought up-to-date with all of the historical code, quickly and easily.
New developers, interns, or inexperienced staff members who still do not understand all the details of the project or who are otherwise prone to making mistakes can be assigned significant subprojects to be carried out in branches without risking main line stability.
Code is automatically synchronized across all machines. No human effort is wasted copying files from machine to machine. The risk of human errors during file transfer and backup is eliminated.
A hardware failure results in minimal lost work because all previously committed changes will have been automatically replicated on other machines.
The complete work history of the project is conveniently archived in a single file, simplifying long-term record keeping.
A precise historical record is maintained which can be used to support copyright and patent claims or regulatory compliance.
(Nearly verbatim copy of [whyusefossil].)
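The hash-based points above (unique names, detecting accidental changes, identifying files of unknown origin) can be tried out with Python's hashlib. This is a sketch that assumes a file's ID is the SHA3-256 hash of its raw bytes; `content_id` and `is_unchanged` are hypothetical helper names. The short IDs shown in the timeline, such as [68a1a63e], are prefixes of hex strings like these.

```python
import hashlib
import tempfile
from pathlib import Path


def content_id(data: bytes) -> str:
    """Hex SHA3-256 digest of raw content (assumed to match a file's ID)."""
    return hashlib.sha3_256(data).hexdigest()


def is_unchanged(path: Path, recorded_id: str) -> bool:
    """Recompute the file's hash and compare it with an ID recorded earlier."""
    return content_id(path.read_bytes()) == recorded_id


with tempfile.TemporaryDirectory() as tmp:
    f = Path(tmp) / "goldbach.py"
    f.write_bytes(b"print('hello')\n")
    recorded = content_id(f.read_bytes())
    print(is_unchanged(f, recorded))    # True
    f.write_bytes(b"print('hell0')\n")  # a one-character edit
    print(is_unchanged(f, recorded))    # False
```

Even a single changed byte produces a completely different digest, which is what makes the recomputation a reliable tamper check.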
Theory: Branches and forks
Click here to see a slideshow explaining branches and forks in Fossil.
Practice: a primetime story
Background
You are a math researcher and you are studying Goldbach’s conjecture, one of the oldest (1742) and best-known unsolved problems in number theory and all of mathematics.[wiki-goldbach] It states:
Every even integer greater than 2 can be expressed as the sum of two primes.
Your first project is to write a program checking the Goldbach conjecture.
Initializing
Let’s create a new repository (the “archive” containing all of your project versions and data). I strongly recommend you keep all your repositories in the same dedicated place on your computer for ease of backups. Create a new folder for that in File Explorer, open a shell there, then initialize a new empty repository called goldbach.fossil using the command:
fossil init goldbach.fossil
To begin working on the project inside this repository, we must extract (“check-out”) a specific version from the repository. The result is a check-out (the term “check-out” describes both the action and the result). Create a folder for the check-out (in File Explorer), then open a shell there and run the command:
fossil open ../path/to/goldbach.fossil
The fossil open command tells fossil that a directory is to be used as a check-out for the repository goldbach.fossil.
Note
You can’t use the fossil checkout command directly for the first check-out because fossil doesn’t know which repository you want to check out from. fossil open is used only for the first check-out; after that, fossil remembers which repository that directory belongs to.
(In case you’re wondering, the way that fossil “remembers” is through the use of the “_FOSSIL_” hidden file (“.fslckout” on Mac and GNU/Linux) inside your check-out, which tells fossil that that directory is a check-out.)
First check-in
You write your initial version of the code. Since this workshop is about version control and not about programming, you can download the initial version goldbach.py and save it inside your check-out. You can have a look at the contents using your favourite text editor.
You now have to tell fossil to start tracking this file, so you use the fossil add command:
fossil add goldbach.py
Having written this initial version of the code, you commit it (thus creating a new check-in). This is as simple as running:
fossil commit
Fossil now prompts you for a commit message. This message should be a summary of the changes you’ve made from the version you’ve checked out initially. Since it’s the initial commit, you can set the message to simply “initial commit”. Congratulations, you’ve produced your first check-in!
You can examine the results using the fossil ui command, which opens the fossil repository interface in your web browser. Just run:
fossil ui
Try the “Timeline” page. When you’re done, press Ctrl-C in the shell to stop the web server.
First bugfix
You can run the code using:
python3 goldbach.py
This program is supposed to produce, for each integer n, the list of pairs of primes (a, b) such that a + b = n.
As you run it, you notice that the first line of output is 4 [(1, 3)]. However, 1 is not a prime number. It turns out that the problem is the if n < 1: return False line in the is_prime(n) function. The condition should instead be n < 2.
Modify the goldbach.py file to fix the error, then test it by running python3 goldbach.py. The 1s should be gone now.
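The downloaded goldbach.py is not reproduced in this text, so here is a hypothetical reconstruction of the relevant function, assuming it uses plain trial division. It shows why the n < 1 guard lets 1 slip through: for n = 1 the loop range(2, 1) is empty, so the function falls through to return True. The name is_prime_buggy is made up for the comparison.

```python
def is_prime_buggy(n):
    """Trial division with the original guard."""
    if n < 1:  # bug: n == 1 gets past this check
        return False
    for i in range(2, n):  # for n == 1 this range is empty...
        if n % i == 0:
            return False
    return True  # ...so 1 falls through to here and is reported as prime


def is_prime(n):
    """Same function with the corrected guard."""
    if n < 2:  # 0 and 1 are not prime
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


print([n for n in range(10) if is_prime_buggy(n)])  # [1, 2, 3, 5, 7]
print([n for n in range(10) if is_prime(n)])        # [2, 3, 5, 7]
```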
Now is a good time to commit this change. To remind yourself of what changed, you can view the difference between the original check-out version and the current state of the files in the check-out (try them all out):
fossil diff
## OR ##
fossil diff --tk
## OR ##
fossil gdiff
To view a summary of the changes, use:
fossil changes
## OR ##
fossil status
Now that you’re happy with the changes you’ve made, you can commit by typing:
fossil commit
with a nice commit message summarizing the change, such as “fix bug in is_prime that made it think 1 is prime”. You can now run fossil ui to look at your new commit.
First feature branch
Your advisor now sends you an email and tells you that, as a check, you should make your code print out OEIS sequence A045917, i.e. the sequence where A(n) is the “number of decompositions of 2n into unordered sums of two primes”.
You implement this feature in a separate function:
def OEIS_A045917(start, end):
    ''' produce OEIS sequence A045917, i.e. number of decompositions
    of 2n into unordered sums of two primes.
    https://oeis.org/A045917 '''
    seq = []
    for n in range(start, end+1):
        seq.append(len(list(find_goldbach_pairs(2*n))))
    return seq
(which you can place right above the def main(): line), and then you modify the main() function so that it prints out the sequence, by adding the line:
print("OEIS:", OEIS_A045917(1, 100))
just below the '''main method''' line.
Since this is a separate feature, you create a feature branch dedicated to the development of this feature, instead of committing to the main trunk branch. You then commit using the --branch option:
fossil commit --branch OEIS-feature
Note
But what exactly is a branch? See here for a simple graphical explanation.
Note
Alternatively, you can create the branch in a separate operation (this effectively creates an empty check-in):
fossil branch new OEIS-feature trunk
fossil update OEIS-feature
fossil commit
Bugfix on the feature branch
Your advisor checks out the code and tells you the output is wrong. You run:
python3 goldbach.py
and see that the sequence starts with “[0, 0, 0, 1]” whereas it should be “[0, 1, 1, 1]”. Indeed, you can write 4 as 2+2 (and 6 as 3+3), but your program is missing that. The problem is an off-by-one error in find_goldbach_pairs(n); the range(1, n//2) should instead be range(1, n//2 + 1), since a range excludes its upper end and the candidate a = n/2 is therefore never tried.
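Again, the downloaded goldbach.py is not reproduced here, so the following is a hypothetical reconstruction, assuming find_goldbach_pairs yields pairs (a, n - a) with a running over range(1, upper). The extra `fixed` parameter is made up for the comparison and is not in the tutorial's file; it switches between the buggy and the corrected upper bound.

```python
def is_prime(n):
    """Trial division; assumes the n < 2 fix from the first bugfix is in place."""
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True


def find_goldbach_pairs(n, fixed=True):
    """Yield pairs (a, b) of primes with a + b == n and a <= b.

    fixed=False reproduces the buggy bound range(1, n//2), which never
    tries a == n // 2 and so misses pairs such as (2, 2) and (3, 3).
    """
    upper = n // 2 + 1 if fixed else n // 2
    for a in range(1, upper):
        if is_prime(a) and is_prime(n - a):
            yield (a, n - a)


def OEIS_A045917(start, end, fixed=True):
    """Number of unordered prime decompositions of 2n, for n in [start, end]."""
    return [len(list(find_goldbach_pairs(2 * n, fixed))) for n in range(start, end + 1)]


print(OEIS_A045917(1, 4, fixed=False))  # [0, 0, 0, 1]
print(OEIS_A045917(1, 4))               # [0, 1, 1, 1]
```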
Having fixed that, you commit the change using fossil commit.
The coworker problem
Meanwhile, your coworker has agreed to take a look at your code on trunk (the main branch). Your coworker notices that your is_prime(n) function could be significantly sped up.
Run the following command to update the check-out to the latest version on trunk:
fossil checkout trunk
Note
In reality you would sync your code with an online fossil repository, and give your coworker access to it. Your coworker would then grab a copy of your fossil repository (including all versions and history information) by running something like:
fossil clone https://example.com/f/goldbach/ goldbach.fossil
inside the folder where they keep their fossil repositories. Note that by default, fossil automatically synchronizes its contents with the last-used remote server on every commit. You can change this using the autosync setting in fossil settings.
Your coworker notices that you don’t need to check all possible divisors from 2 to n-1, but only those from 2 to sqrt(n): if n = a × b with a ≤ b, then a ≤ sqrt(n), so any composite n has a divisor no larger than sqrt(n). They therefore change the line for i in range(2, n): to for i in range(2, int(n**0.5)+1):, and commit:
fossil commit --user-override your-coworker-name
(Make sure to summarize the commit in the message!)
Note
The --user-override USER parameter is not necessary in the above command. This parameter allows you to override the author (username) for a single check-in.
Committing under a different username can be useful to give someone else proper credit (or blame). It is especially useful to track contributions from someone who doesn’t want to use version control, such as your advisor.
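You can convince yourself that the coworker's shortcut gives the same answers as before with a quick comparison. This sketch uses math.isqrt (Python 3.8+) in place of int(n**0.5): both give the integer square root for the small numbers in this workshop, but math.isqrt is exact for arbitrarily large n, while the floating-point version can, in principle, be off by one for very large n. The names is_prime_slow and is_prime_fast are hypothetical.

```python
import math


def is_prime_slow(n):
    """Trial division by every candidate from 2 to n - 1."""
    if n < 2:
        return False
    return all(n % i for i in range(2, n))


def is_prime_fast(n):
    """Trial division only up to floor(sqrt(n)).

    If n = a * b with a <= b, then a * a <= n, so a composite n always
    has a divisor no larger than sqrt(n).
    """
    if n < 2:
        return False
    return all(n % i for i in range(2, math.isqrt(n) + 1))


print(all(is_prime_slow(n) == is_prime_fast(n) for n in range(2000)))  # True
```

The + 1 matters: for a perfect square such as 25, the only small divisor is exactly sqrt(n), and range excludes its upper end.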
You now notice that your coworker committed this interesting speedup, and you would like to benefit from it on your feature branch. Open up fossil ui and examine the situation.
What you would like to do is to merge all the changes in trunk (in this case, the one check-in your coworker made) into the feature branch. You would do this as follows:
# check-out latest version on OEIS-feature branch
# (this is just because we were on trunk previously as we were
# pretending to be our imaginary coworker)
fossil checkout OEIS-feature
# merge all the changes in trunk
fossil merge trunk
You verify that everything is fine:
fossil diff --tk
Happy with the result, you commit:
fossil commit
Cherrypicking merge
While developing on the OEIS-feature branch, you discovered an off-by-one error that affects the code on trunk. You would like to fix the issue on trunk without merging the entire OEIS-feature branch just yet. That is, you would like to cherrypick which changes from OEIS-feature to merge onto trunk.
To do that, first check out the latest version on trunk (where you would like to merge):
fossil checkout trunk
Next, use the fossil timeline command (or simply look at the Timeline page in fossil ui) to find out the commit ID of the change you would like to merge (this is where commit comments are helpful), specifically the commit where you fixed the off-by-one error from before. The commit ID will be the hexadecimal gibberish in square brackets (let’s say it’s “[68a1a63e]”).
Next, merge, cherrypicking that commit only:
fossil merge --cherrypick 68a1a63e
Finally, commit:
fossil commit
Notice that, by default, the commit message suggested to you will be the same as the commit message of the cherrypicked commit that was merged in.
If you use fossil ui, you’ll notice that fossil won’t draw an arrow from the cherrypicked commit to the new commit. (This may change in the future.)
Note
--cherrypick has a counterpart --backout which undoes the change in the specified commit. This is useful for backing out of a bad change earlier on the branch.
Integrating the feature branch
Your advisor tells you that the OEIS feature branch is ready for primetime, so you would like to merge it into trunk. Easy:
fossil checkout trunk
fossil merge --integrate OEIS-feature
fossil commit
The --integrate switch tells fossil that you’re done with that branch: you don’t expect to make any more commits on top of it, i.e. it’s “closed”. As a result, the fossil branch command will no longer show that branch by default.
Practice: exploring the fossil source tree
Let’s grab a copy of the fossil source tree:
fossil clone https://fossil-scm.org/ fossil.fossil
Next, create a temporary directory “fossil” for the check-out, and check out fossil inside it:
mkdir fossil
cd fossil
fossil open ../path/to/fossil.fossil
Comparing two versions
Let’s suppose we want to compare versions [c8b46764] and [1374d581] of fossil itself.
Using the fossil web UI
Launch fossil ui, navigate to the “Timeline” page, then click on the bubble next to the first commit, then on the bubble next to the second commit. Fossil will take you to a page showing the full set of changes between these two commits.
Using kdiff3
Check out version “c8b46764” in the first check-out:
fossil checkout c8b46764
Create a second check-out “fossil2”, next to your initial check-out “fossil”, i.e.:
cd ..
mkdir fossil2
cd fossil2
fossil open ../path/to/fossil.fossil
fossil checkout 1374d581
Then use kdiff3 to compare the two check-outs:
cd ..
kdiff3 fossil/ fossil2/
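If kdiff3 is not available, a rough text-mode substitute can be sketched with Python's standard library. It only lists files that exist on one side or that differ, without line-level diffs. It assumes the two check-out folders are named fossil and fossil2 as above, and it skips the .fslckout/_FOSSIL_ files mentioned earlier, since those are per-check-out bookkeeping rather than project files. compare_trees and list_files are hypothetical names.

```python
import filecmp
import tempfile
from pathlib import Path

# Per-check-out bookkeeping files; they differ between two check-outs even
# when the project files are identical, so they are skipped.
SKIP = {".fslckout", "_FOSSIL_"}


def list_files(root):
    """Relative paths (with '/' separators) of all files under root, minus SKIP."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.name not in SKIP
    }


def compare_trees(left, right):
    """Return (only_in_left, only_in_right, differing), each a sorted list."""
    lf, rf = list_files(left), list_files(right)
    differing = [
        rel for rel in sorted(lf & rf)
        # shallow=False forces a byte-by-byte comparison instead of trusting
        # matching size and modification time.
        if not filecmp.cmp(Path(left) / rel, Path(right) / rel, shallow=False)
    ]
    return sorted(lf - rf), sorted(rf - lf), differing


if __name__ == "__main__":
    # Demo on two throwaway trees; for the real check-outs, run
    # compare_trees("fossil", "fossil2") from the folder holding both.
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp) / "fossil", Path(tmp) / "fossil2"
        for d in (a, b):
            (d / "src").mkdir(parents=True)
        (a / "src" / "main.c").write_text("int x;\n")
        (b / "src" / "main.c").write_text("int y;\n")
        (a / "README").write_text("same\n")
        (b / "README").write_text("same\n")
        (b / "NEWS").write_text("new file\n")
        print(compare_trees(a, b))  # ([], ['NEWS'], ['src/main.c'])
```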
Miscellaneous
“I’ve made a huge mistake”
Worry not, you can edit almost any of the check-in metadata in the Fossil UI, including:
The check-in comment/message
The background color
Whether the check-in is the start of a new branch. That’s right, you can decide that a check-in should be the start of a new branch after the fact!
Whether the branch should be hidden by default.
Whether the branch is “closed”.
Just click on the check-in you’d like to edit, then click on the “edit” link.
If you make a bad commit (as in, checked in the wrong files), the typical way to deal with it is to edit it so that it’s on a branch named “mistake”, and set it to be hidden.
If you’ve made other commits on top of the bad commit, use fossil merge --backout (followed by fossil commit) to undo the bad commit only.
Note
You are perhaps entertaining the thought of rewriting the entire history since the bad commit. Fossil does not provide any facility to automate that, whereas git allows users to do it via the git rebase command. The absence of this “feature” is intentional: Fossil’s philosophy is to record history as it happened, not as it “should” have happened.
Note that the rebase feature in git is fraught with peril: if a user rebases a branch in a collaborative project, the other users will find themselves working on commits that don’t belong to any branch, and may experience significant amounts of frustration (depending on how many commits they’ve made based on the branch that had been rebased).
Collaborative development
Synchronization
If you’ve hosted your fossil project somewhere (say, at https://example.com/), you can use fossil sync to sync all data between your local repository and the remote repository. Effectively, they will become copies of each other.
Note that, by default, fossil will attempt to sync before and after every commit to prevent forks in the development.
Fossil update
Let’s say that you’re about to commit a change, and someone else just committed something else to the same branch. Fossil helpfully informs you of the situation and tells you:
$ fossil commit
would fork. "update" first or use --allow-fork.
You can go ahead and force a fork using the --allow-fork option, then merge back manually as you normally would.
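The situation Fossil is warning about can be phrased in terms of the DAG from the theory section: the check-in your check-out is based on already has a child on the same branch, so a new commit would give it a second one. Here is a toy check, a sketch under the assumption that history is a dict mapping each check-in ID to its primary parent and branch name (a hypothetical representation). Fossil's own check runs inside fossil commit, typically after an autosync pull, and is not reproduced here.

```python
def would_fork(history, baseline):
    """Return True if committing on top of `baseline` would create a fork.

    `history` maps each check-in ID to (primary_parent_id, branch_name).
    A fork would result if some check-in on the same branch already has
    `baseline` as its primary parent, i.e. `baseline` is no longer a leaf.
    """
    branch = history[baseline][1]
    return any(parent == baseline and br == branch
               for parent, br in history.values())


# You checked out A; meanwhile your coworker committed B on top of A.
history = {"A": (None, "trunk"), "B": ("A", "trunk")}
print(would_fork(history, "A"))  # True: your commit would be a sibling of B
print(would_fork(history, "B"))  # False: after updating to B, no fork
```

In this model, fossil update amounts to moving your baseline from A to B (carrying your edits along), after which the check returns False.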
However, the easier and often faster option is to use the fossil update command. This command tries to take the current changes that you’ve made to the checkout and apply (merge) them on top of a different version (by default, the latest commit on the same branch). If the changes in your commit and your coworker’s commit don’t overlap, all you’ll need to do is:
fossil update
(If it turns out they do overlap and it would be a headache to sort out without an explicit merge, you can undo the fossil update command using fossil undo, then commit with --allow-fork and perform an explicit merge afterwards.)
Antisocial coding: Private branches
Sometimes a branch is intended to be so bad, so hacky, so experimental, that you just don’t want other people to see it. Thankfully, Fossil provides a way for you to avoid sharing that branch with other people: that is, private branches.
To create a private branch, use the --private switch on your fossil commit or fossil branch new command.
Note that if you do merge the changes from a private branch into a public (normal) branch, you will not see a merge arrow in the timeline. This is normal, because otherwise other people would become aware of the existence of the private branch.
Dealing with larger projects
Do not add derivative files
If a file is an executable, a PDF, or something else that is entirely and automatically derived from source files that are also part of the project (and that are tracked by version control), I recommend against adding it to Fossil. Such derivative files are redundant, as they can be produced from your source files, and they will take up space in your repository for all eternity. Fossil tries very hard not to lose your data, which makes removing files after the fact rather difficult (though by no means impossible).
If you want to keep a history of derivative files, you may consider tracking them in a separate repository.
If you don’t care about keeping a history of the derivative files, and you just want to (ab)use your fossil repository as your personal dropbox, you can use the unversioned files feature. For example, to store the latest copy of your paper draft PDF as an unversioned file and sync (copy) it to a remote server, you can use:
fossil uv add paper.pdf
fossil sync -u
You can then view the list of remote files at the /uvlist path in your browser. You can copy the link from there and send it to someone else.
Of course, I’m not the boss of you, and you may very well decide that the benefits of tracking the output/derivative files outweigh the costs.
Fossil addremove
When dealing with projects with lots and lots of files, you can accidentally forget to fossil add a file you created (to tell Fossil to track it). To avoid this, you can use the fossil extra command to see all the files in the checkout that fossil is not currently tracking, and you can use the fossil addremove command to tell fossil to automatically fossil add them.
To see the files that would be added (or marked as removed) by fossil addremove without actually adding/removing, use:
fossil addremove -n
To actually add/remove, run:
fossil addremove
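Conceptually, the two lists reported by fossil addremove -n are set differences. A minimal sketch, assuming you already have the set of files on disk and the set of tracked files; addremove_plan is a hypothetical name, and Fossil additionally applies ignore patterns and other rules that this sketch leaves out.

```python
def addremove_plan(on_disk, tracked):
    """Approximate what `fossil addremove -n` would report.

    on_disk: set of file paths present in the check-out.
    tracked: set of file paths Fossil is currently managing.
    Returns (to_add, to_remove) as sorted lists.
    """
    to_add = sorted(on_disk - tracked)     # "extra" files, as listed by fossil extra
    to_remove = sorted(tracked - on_disk)  # tracked files that have disappeared
    return to_add, to_remove


print(addremove_plan({"goldbach.py", "notes.txt"}, {"goldbach.py", "old.py"}))
# (['notes.txt'], ['old.py'])
```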
Ignoring files
The problem is that fossil extra and fossil addremove -n will report all sorts of junk or derivative files you don’t care about.
For example, if you use LaTeX, it’ll list those annoying “foo.aux” and “foo.log” files. If you use C, it’ll list the “foo.o” temporary object files. If you use Python, it’ll list the “*.pyc” bytecode files. If your text editor uses a scheme like “foo.txt~” for backup files, those will be listed there too. This makes it quite hard to see files that you might actually care about.
To avoid that, you can use the “ignore-glob” feature. In the base directory of your project, create a directory “.fossil-settings” (yes, with a leading dot), and inside that, create a text file named “ignore-glob”. For example:
mkdir -p .fossil-settings/
touch .fossil-settings/ignore-glob
Inside this file you can put, one per line, a glob pattern matching files and directories you wish to ignore. For example, to match the annoying files from above, you could put in:
*.aux
*.log
*.o
*.pyc
*~
I recommend having a separate “output” or “build” or “work” directory containing the files that are produced as a result of running your code, which you can then wholesale ignore by adding a line like:
output/
inside your “.fossil-settings/ignore-glob” file.
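To get a feel for which names the extension patterns above would catch, here is an approximation using Python's fnmatch.fnmatchcase, whose `*` also matches `/`. This is not Fossil's own glob matcher, which may differ in details, and the output/ directory pattern is left out because how a trailing slash is treated is specific to Fossil. is_ignored, IGNORE_GLOB, and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase

# The extension patterns from the ignore-glob example above.
IGNORE_GLOB = ["*.aux", "*.log", "*.o", "*.pyc", "*~"]


def is_ignored(path, patterns=IGNORE_GLOB):
    """True if any pattern matches the (case-sensitive) relative path."""
    return any(fnmatchcase(path, pat) for pat in patterns)


files = ["paper.tex", "paper.aux", "paper.log", "main.c", "main.o",
         "goldbach.py", "__pycache__/goldbach.cpython-312.pyc", "notes.txt~"]
print([f for f in files if not is_ignored(f)])  # ['paper.tex', 'main.c', 'goldbach.py']
```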
Fossil rm
To tell fossil to stop caring about a file foo, use:
fossil rm foo
To also delete the file at the same time, use:
fossil rm --hard foo
Fossil mv
To rename a file (and tell fossil about it so it can track changes across the rename), use:
fossil mv --hard oldname newname
To reduce confusion (for both yourself and other people inspecting the history later on), I strongly recommend performing the rename operations as part of a separate commit (not containing any other changes).
Fossil revert
Suppose you’d like to revert the changes you’ve made. Use:
# revert changes in file `foo`
fossil revert foo
# revert all changes made to current checkout
fossil revert
If the fossil revert itself was a mistake, run fossil undo to undo it.