Weblog productivity

As the end of the month approaches, I often look at how many posts I have put on this website, just to make sure I’m not neglecting it and that I have actually thought and written about something constructive. Interestingly enough, this month hasn’t had very many actual posts, but the posts have had real content in them, rather than the normal drivel that flows from my keyboard. I still have a few more posts about Revision Control brewing, and I am planning on writing a bit about my adventures in Cold Fusion.

Are there any topics that you would like me to cover, or are you happy with my sporadic topics and regularity?

What’s wrong with Revision Control?

It seems like I’m suggesting that Revision Control is the cure for cancer, the saviour of the universe and the answer to life, the universe and everything. Well, I wish it was, because that would be so easy, but unfortunately it does have its drawbacks. Some of these have been addressed in more recent systems, while others are inherently part of the system. I think it is important to note that I don’t believe any of these are a reason not to use Revision Control; they are just things to take into consideration.

Workflow

As I have mentioned earlier, there is a plethora of tools that integrate revision control systems into most developers’ workflows. This may be a plugin for their IDE, a program that integrates with the shell of the operating system, or a command line program that is integrated into the build process. However, while these do smooth the path to versioning bliss, they are not the silver bullet. Developer training and streamlining the process are an important part, as changing anyone’s well-ingrained habits is extremely difficult. Usually this only involves teaching them how to check in and write comments; as they see the other benefits, further learning will follow. Software developers, whether they know it or not, love the ability to branch and fork their code, re-use components from other projects and basically reduce the amount of work they actually have to do.

Centralisation

This particular problem is being tackled at a rapid rate of knots at present, driven by the distributed nature of Open Source development. The essence of the problem is that the repository is often hosted and controlled in a central location, usually the head office or the main webserver. This causes an issue when someone wants to work “offline”: performing updates and checking in is not possible without a connection. This is relieved to some extent by good reporting of any conflicts when the code is finally checked in, though there can be a small amount of extra work at that time. If the project doesn’t have a great many people working on it, this will probably not be a problem.

Databases

Most development in this day and age is associated with a database, and Revision Control doesn’t really provide any solution for maintaining the versions of a database. Some solutions I have seen involve keeping a build script for the database somewhere in the source tree, so that a versioned copy of it is maintained within the repository. This really is not an ideal solution, but it will give an idea of what has changed over time.
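To make the build script idea a bit more concrete, here is a minimal sketch. It assumes SQLite (via Python’s built-in sqlite3 module) purely for convenience, and the table names and the build_database and table_names helpers are made up for illustration. In a real project the SQL would sit in its own file in the source tree rather than in a string.

```python
import sqlite3

# Hypothetical build script. In practice this text would live in a file
# inside the versioned source tree, so every change to it shows up in the
# repository history like any other code change.
SCHEMA_SQL = """
CREATE TABLE author (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id),
    title TEXT NOT NULL
);
"""


def build_database(schema_sql, path=":memory:"):
    """Create a fresh database from the build script and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(schema_sql)
    return conn


def table_names(conn):
    """List the tables that the build script created, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


conn = build_database(SCHEMA_SQL)
print(table_names(conn))
```

This only versions the structure of the database, not the data in it, which is part of why it isn’t an ideal solution.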

Backups

I think it is important to note that while Revision Control does provide some sort of code backup, it does not remove the need to keep backups of the repository itself. It may seem obvious, but the server the repository lives on is just as susceptible to data loss as any other server.

Why should you use Revision Control?

Well, now that you all know what revision control actually is, I’m going to discuss why it should be used. What is the benefit of using a specialised piece of software that I have to learn how to use, rather than just zipping up my files before I make the risky changes? There are a number of benefits, each of which will have an effect on how you perform your everyday operations. Now it is important to note that while this will add a little to your everyday work, it won’t reduce your productivity, as the software will normally integrate with your current development environment.

Backups

First and foremost is the benefit of having a working set of backups. Like any backup strategy, this relies on it being implemented (more on this later); however, the major purpose of revision control software is its capability to store a set of working backups that are created not on a regular schedule, but at the developer’s whim. This means that, hopefully, each revision in the repository is a working copy, or at least was intentionally stored.

Context

Equally as important as the existence of the backups is knowing where they came from, who made those changes, and probably what crack they (or you) were smoking when they committed those changes. This comes from an important feature: being able to leave a comment as you upload changes. What goes in this comment would normally be a policy decision for the team. Normally it includes the reason for the change (e.g. an issue number) and any important information that won’t be blatantly evident to the next person.

Along similar lines, being able to see the difference between 2 versions of your project provides extremely useful information and context, which helps to answer other regularly asked questions: “When was this broken?” and “How was it broken?”. Most version control systems will provide you with a diff file at the very least, and most development environments which support an RCS will give you a pretty view of this. This means that for any moment in the past, you can find out what a particular file looked like, and how it compares to what it is now.
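If you have never seen a diff, here is a small sketch of the idea using Python’s standard difflib module. The show_diff helper and the two sample “revisions” are my own inventions for illustration; a real version control system produces the same kind of output from the revisions it has stored.

```python
import difflib


def show_diff(old_text, new_text, old_label="revision 1", new_label="revision 2"):
    """Return a unified diff between two versions of a file's text."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile=old_label, tofile=new_label
    )
    return "".join(diff)


# Two made-up revisions of the same small file.
old = "def greet():\n    print('hello')\n"
new = "def greet():\n    print('hello, world')\n"

# Lines starting with "-" were removed, lines starting with "+" were added.
print(show_diff(old, new))
```

Two identical versions produce an empty diff, which is a handy way of confirming that nothing changed.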

Team work

Originally, CVS was designed so that a team could work more effectively on a project, and this is possibly its greatest benefit now. Revision control systems introduce a concept called a working copy (more on this later), which is essentially a copy of the whole project. This copy is not the official copy, but for all intents and purposes it is the project. This working copy usually stores (in hidden folders) information on its relation to the official copy. Each developer has their own working copy, so they don’t have to worry about saving over another developer’s changes, as the copies are completely independent of each other. Changes are then synchronised back to the central repository, and any conflicts are brought to the attention of whoever caused the conflict.
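The working copy idea can be sketched as a toy model in a few lines of Python. To be clear, this is not how CVS or any real system is implemented; the Repository, WorkingCopy and Conflict classes are hypothetical and everything lives in memory. It only shows the principle: each working copy remembers which revision it started from, and a commit based on a stale revision is refused.

```python
class Conflict(Exception):
    """Raised when a commit is based on an out-of-date revision."""


class Repository:
    """Toy central repository: every file keeps a list of its revisions."""

    def __init__(self):
        self.revisions = {}  # file name -> list of contents, oldest first

    def head(self, name):
        """Number of revisions stored for a file (0 if it is unknown)."""
        return len(self.revisions.get(name, []))

    def checkout(self):
        """Hand out an independent working copy of the latest revisions."""
        return WorkingCopy(self)


class WorkingCopy:
    def __init__(self, repo):
        self.repo = repo
        # Private copy of the files, plus the revision each one was based on
        # (the sort of bookkeeping a real tool keeps in its hidden folders).
        self.files = {name: revs[-1] for name, revs in repo.revisions.items()}
        self.base = {name: len(revs) for name, revs in repo.revisions.items()}

    def commit(self, name):
        """Send one file back, unless someone else got there first."""
        if self.base.get(name, 0) != self.repo.head(name):
            raise Conflict(f"{name} changed in the repository since checkout")
        self.repo.revisions.setdefault(name, []).append(self.files[name])
        self.base[name] = self.repo.head(name)


repo = Repository()
seed = repo.checkout()
seed.files["notes.txt"] = "first draft"
seed.commit("notes.txt")

alice, bob = repo.checkout(), repo.checkout()
alice.files["notes.txt"] = "alice's edit"
bob.files["notes.txt"] = "bob's edit"
alice.commit("notes.txt")  # accepted: based on the latest revision
try:
    bob.commit("notes.txt")  # rejected: bob's base revision is now stale
except Conflict as err:
    print("Conflict:", err)
```

A real tool would go on to help the second developer merge the two sets of changes rather than simply refusing the commit.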

This might all sound like crazy voodoo; however, in practice it is really quite simple with the correct tools. Most development environments either support RCS or have plug-ins to add support for it. For example, in my environment of choice, Eclipse, I can check in with a single right-click, write a comment and click Commit; checking out a working version or comparing to an existing version is just as simple.

Revision Control

I’ve decided to write a series of articles discussing version control, answering questions such as “Why should you use it?”, “How do you use it?” and “Who should use it?” I have very little doubt that this will just repeat what has been written elsewhere by much better writers than myself; however, I’m sure that the process of writing it will be educational for me and worthwhile for anybody who reads it.

What is Version Control?

As always, the best place to initially find out what something is, is Wikipedia. I thought about quoting something from there, but if I did, someone would probably change it and then I wouldn’t actually be quoting it. Anyway, revision control is any way of keeping track of the changes made to anything. Think about “track changes” in Microsoft Word, or zipping up the code you are working on and calling it mycode_01032007.zip; these are both rudimentary forms of revision control. Most people have done something like this, probably thinking of it as backing up, but realistically you may want to be able to go back and have a look at the changes you made a week ago, and you don’t necessarily want to unzip 3 different files to check which one the changes were made in.
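For the record, the zip-it-up approach looks something like this when scripted. This is only a sketch: the snapshot function is hypothetical, and I’m assuming the date in a name like mycode_01032007.zip is written day, month, year.

```python
import datetime
import pathlib
import zipfile


def snapshot(source_dir, dest_dir, today=None):
    """Zip every file under source_dir into dest_dir/<folder>_<DDMMYYYY>.zip."""
    source = pathlib.Path(source_dir)
    dest = pathlib.Path(dest_dir)
    today = today or datetime.date.today()
    dest.mkdir(parents=True, exist_ok=True)
    # Assumption: the date stamp is day, month, year, e.g. 01032007.
    stamp = today.strftime("%d%m%Y")
    archive = dest / f"{source.resolve().name}_{stamp}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            # Skip folders, and skip the archive itself in case the backup
            # folder happens to sit inside the source folder.
            if path.is_file() and path.resolve() != archive.resolve():
                zf.write(path, str(path.relative_to(source)))
    return archive


# Example call (not run here, since it writes to disk):
# snapshot("mycode", "backups")
```

Note what is missing: there is no comment saying why the snapshot was taken, no way to compare two snapshots without unzipping both, and running it twice on the same day overwrites the earlier archive.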

Now, software developers are the laziest people. That’s why they started developing software in the first place: because they couldn’t be bothered going through an entire file and pasting the same text in, or because they wanted to be able to go to a website to check their email since they didn’t want to waste 25 seconds of their life opening up Outlook. I could name a million contrived examples of why software developers are lazy, but I suppose what I’m trying to say is, I’m a lazy bastard, and I’m a software developer. I’m getting off the track, but it won’t come as a surprise that some smart cookie of a lazy software developer wrote some software a long time ago called RCS (Revision Control System). Then in 1984 a guy by the name of Dick Grune, who doesn’t seem to be a lazy guy at all, used RCS to develop CVS so he could work better with his students on a project.

CVS takes a folder of your files and stores it in a central location; then, whenever you make changes, you commit those changes to the repository. CVS stores a history of what you had and what you’ve got, so at any time you can ask it what you had. This seems like a pretty good idea, and it is. CVS is an amazingly successful piece of software, with some of the biggest open source software projects still using it (Mozilla and Apache, to name 2). 23 years is a long time for a piece of software not to have been superseded. Now, to say it hasn’t been superseded is a bit of a lie: there are a number of different programs that do what it does, and SVN, GNU Arch and Visual SourceSafe are a few of these. Essentially, though, they don’t do that much more than CVS did 23 years ago; they just do it differently/better/worse or with more integration into certain products.