I have just finished reading Martin Fowler's excellent paper on Continuous Integration (CI). Well worth a read and if you are not using CI then I urge to spend time implementing it. However there is one idea that I did want to pick up on:
In general you should store in source control everything you need to build anything, but nothing that you actually build. Some people do keep the build products in source control, but I consider that to be a smell - an indication of a deeper problem, usually an inability to reliably recreate builds.
I think in the context of CI or when using most build tools this approach is quite correct. We have limited control on the build environment and the products it creates. Better to start afresh each time. But there is a problem with this: this is a not in fact a reproducible build; it is a reproducible build process -- which is not at all the same thing. In my view a reproducible build is an activity that, when given a specific set of input artifacts (all at a given revision), will create a bit for bit identical output to the previous build. In practice this is hard to achieve for two reasons
  1. We usually have poor control or knowledge of the environment on which the build is performed. E.g. exactly which 3rd party libraries are used, are we certain that no old copies of artifacts are being used in the build.
  2. Things such as date and time differences can modify the built files in trivial ways.
Now insisting on exact bit for bit reproduction may seem a little harsh, but it does allow the use of checksum hashes to verify the integrity of our different environments. For some people this is absolutely critical as production environments are audited. For the rest of us it's also important, for example we can ensure that the correct runtime environment is being used in test to reproduce a production problem. A way to achieve this is to preserve the outputs from our builds in our version control system. Then instead of 'reproducing the build' we release our build artifacts as often as required. However their are two caveats to this: we need to tag the created output files showing
  1. Which source files (and versions) were used as input. This includes files such as build scripts
  2. Shows under which configuration the files were built (should relate back to the software tools, libraries etc. used). Note that this mean all system build environments should be subject to rigorous change control.
Now that we have this information we can also clearly document how our software was build and what file revisions were used to build it. This information is vital, even if the built outputs are not saved in our version control repository. A minor side affect is that we can now re-use built files in later builds, providing the tags are correct. There are still some systems where this potential saving is a significant benefit as complete builds can take days.

powered by performancing firefox