Let me tell you a story.
The first day, Juliet, the new developer, starts from a clean slate. Because she is a test-first believer, she first writes a test related to the task. She runs it. It’s obviously red. Juliet then implements the necessary code. She runs the test again and it’s green. She is happy and commits. The continuous integration server kicks in, runs the one existing test, and reports green.

The second day, Juliet starts her day by looking at the continuous integration report. It’s all green. Her shoulders relax, and she goes on to write a new test. She runs it and, as expected, it is red. She implements the necessary code and runs the current test. It is green. She smiles and commits. Overnight, the continuous integration server runs the tests, and they are both green.
The third day, Juliet sips from her cup of coffee while glancing through the test report. Green. Everything is as expected. The bitter taste of the coffee gives her a nudge, and she writes a new test. She runs it, instinctively knowing it will be red. It is red. She then starts implementing the code. She finishes. She runs the test. It is green. A feeling of reward creeps up on her and she cannot repress a smile. She commits.
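Juliet’s rhythm is the classic red-green cycle of test-first development. A minimal sketch in Python of what one of her days might look like; the feature and all the names are invented for illustration:

```python
import unittest

# Step 1: the test, written before the code below existed.
# At that point, running it was red: parse_price was undefined.
class PriceParserTest(unittest.TestCase):
    def test_parses_price_with_currency_symbol(self):
        self.assertEqual(parse_price("$19.99"), 19.99)

# Step 2: implement just enough code to satisfy the test.
def parse_price(text):
    return float(text.lstrip("$"))

# Step 3: run again: green. Commit, and let the continuous
# integration server repeat the check overnight.
if __name__ == "__main__":
    unittest.main()
```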
To an untrained eye, Juliet writing tests is a waste of time. Of course, given that she cares about her job, the tests she wrote yesterday still hold today. After all, being smart and caring about her job, she does take everything into account. However ...
The fourth day, Juliet arrives early. She turns on the monitor, and as she pulls the keyboard towards her, a smile starts to make its way onto her lips. She brings up the report. It’s red. The test from yesterday is green as expected, but the one from the first day is the troublesome one. Interestingly enough, Juliet’s smile does not go away. She simply triggers the failing test and starts working her way through the error report.
Why did the test fail? Was it because Juliet was careless? Probably not. She is smart and she cares about her job. Then what?
The problem lies simply in the amount of detail a software system contains. Even a medium-sized system can contain literally millions of details that interact with each other in multiple ways. Our brains, however, are designed to handle at most a dozen or so concepts at a time. We deal with such situations by ignoring details or by reducing the scope of the problem. The result is that we might miss relevant information.
Building an automated test suite helps us keep track of what is important by delegating the tedious checking to the machine. This is the only viable way to deal with the huge amount of information we need to manage.
Over the past decade we have learnt that having developers invest in regression testing is both advisable and profitable. Even in the short term, the cost spent on building the test suite is balanced by the cost saved on checking. As a rule of thumb, as soon as you have to deal with information that contains more than a dozen elements, you are probably better off automating the check.
But what exactly is testing, and why can it not be fully automatic? Testing is checking assumptions regarding the functionality of a system. While running the tests is automatic, the process still requires a human to write the assertions, especially when they are meant to capture the high-level meaning of the system.
Testing is a semi-automatic approach because every system is special: it requires specific assertions to capture its specificity and to provide meaningful feedback. The automatic part is critical for obtaining continuous feedback by checking all the time.
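The split is visible in practice: the assertions are hand-crafted, one per rule of the system, while finding and executing them is delegated entirely to the machine. A minimal sketch of the automatic side in Python, assuming the hand-written tests live in a tests/ directory:

```python
import unittest

# The human part: system-specific assertions, written once.
# The automatic part: a runner that discovers and checks them all,
# which is what a continuous integration server triggers on every commit.
if __name__ == "__main__":
    suite = unittest.defaultTestLoader.discover("tests")
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    raise SystemExit(0 if result.wasSuccessful() else 1)
```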
Functionality is important, and thus it should be treated appropriately. It is for this reason that development processes treat testing as an explicit activity and give it an increasingly prominent share of the overall effort. Agile processes in particular, focused as they are on feedback, advocate integrating testing deep into the development process.
Testing is essential, but it is not at all the only aspect of development that can strongly influence how systems evolve. The architecture of a system can be equally important. The interplay of various technologies can be decisive as well. Not to mention security or performance. These are but some of the aspects that must be captured explicitly and allocated appropriate resources during the development process.
Given all this, why are we still, if at all, checking only the functionality of the system? It is time to go a step further and build a larger discipline of continuous assessment.
Comments
I agree with you. Adopting Hudson is probably the best thing we could have done to improve our software since using unit testing. However, I believe Hudson is just a small step on a long road.
Imagine this: why not check the coverage of the unit tests as often as we run them? Why not monitor the system performance at every single commit? Why not assess the coverage of the tutorial as often as we run the unit tests? Check the documentation, the website, the method comments, ... A Hudson-like continuous server could perfectly do the job. We need to rethink what continuous integration means.
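A toy sketch of that direction in Python: the same loop that triggers the unit tests could also check a performance budget and a documentation rule on every build. Every check, every threshold, and the json stand-in module below are purely illustrative:

```python
import inspect
import time


def check_performance(budget_seconds=0.5):
    """Assessment beyond functionality: time a critical operation
    against a budget, so that a slowdown turns the build red."""
    start = time.perf_counter()
    sum(i * i for i in range(10**6))  # stand-in for a real critical operation
    return time.perf_counter() - start <= budget_seconds


def check_documentation(module):
    """Another assessment: every public function must carry a docstring,
    checked as mechanically as any unit test."""
    public = [f for name, f in inspect.getmembers(module, inspect.isfunction)
              if not name.startswith("_")]
    return all(f.__doc__ for f in public)


if __name__ == "__main__":
    import json  # stands in for the system under assessment
    checks = {
        "performance": check_performance(),
        "documentation": check_documentation(json),
    }
    for name, ok in checks.items():
        print(f"{name}: {'green' if ok else 'red'}")
    raise SystemExit(0 if all(checks.values()) else 1)
```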