Software Engineering

A colleague of mine sent me an article by Tom DeMarco, one of the pioneers of structured analysis and also a strong believer in software engineering processes. Previously advocating metrics, metrics and more metrics in the past, he has come to realise that software engineering is rather a misnomer. In one sense it is engineering but in another sense, it is not. After decades of invaluable experience, he has come to one conclusion:

For the past 40 years, for example, we’ve tortured ourselves over our inability to finish a software project on time and on budget. But as I hinted earlier, this never should have been the supreme goal. The more important goal is transformation, creating software that changes the world or that transforms a company or how it does business.

Since I have also been writing code for a couple of decades since my humble beginnings with LOGO and BASIC, I have to say that I am of the opinion that it is very difficult to control the process of software creation – and it is a creative process, no doubt about it. However, as an engineer, I do believe that metrics is useful but only after the fact. What I mean to say is that metrics are only useful in documenting failures.

It is silly to try to control software creation during its inception and conception. You just need to hire the best people, give them the best tools and then hope for the best. Project managers who try to micro-manage the project will invariably fail because of the nature of the metrics used – they try to attribute success to certain values. Unfortunately, the success of a software rarely depends on the number of lines of code maintained, nor does it depend on the number of faults found.

After a project has come to a finish – and when I say finished, I mean that the people involved have come to a unanimous decision that they are happy with the state of the product at the time – that is when software metrics can be used to measure certain things. For example, it could be useful to measure individual contributions to the code base and identify good managers. It could also possibly measure the number of significant changes made to base code.

Anyway, I think that the video below is as good a metric as any for measuring software quality. I like the fact that the contributors seem to come in waves. I wonder if it correlates with real-world events in any way.

Trust and Verify

I had a chat with my boss the other day, about how good software should be developed. He mentioned that we should use unit-testing and I told him that it was rubbish. Then, he mentioned TDD where tests are written before the actual application and I told him that he risked lying to himself about the quality of the code developed.

Okay, while this is not an entry bashing software testing, I will highlight why testing fails. As one prominent computer security researcher points out:

One researcher with three computers shouldn’t be able to do beat the efforts of entire teams, Miller argued. “It doesn’t mean that they don’t do [fuzzing], but that they don’t do it very well.”

This is because we end up relying on tools to do the testing. Not that there are anything wrong with tools, but the tools are only as good as their operators and the problem lies with the tool operator. When tools are used, the operators generally are either less astute or become lazy. They start to rely on the tool to catch problems and if the tool fails to catch any, they rely on the tool to sign-off on the quality of the code. Tools are inanimate objects and should not be liable for the stuff that they test. The tool operator should always dig in manually to verify the results. Through my 20+ years of active programming, I have been caught by tools so many times and I have developed a distinct distrust towards any tool. I even verify my compiler outputs by disassembling the binaries and checking them by hand once, before running them through a simulator to observe the operations.

objdump -dSC

I argued with my boss that there is only one way to develop good code, that is to just frakin’ write good code, which is not as hard as some people imagine it to be. Writing good code is like writing good prose, there are certain styles that can be followed that will result in better code. Unfortunately, good writing habits are hard to develop and need to be hammered into our apprentices from the very get go. Our universities are failing in this by not driving this point home and trying to teach our programmers to be ever more lazy and rely on automated tools to generate code instead. Okay, code generation is a whole different can of worms and I won’t bother to go into.

I have seen several senior engineers do, what I call, haphazard coding that results in code that sort-of works but without anyone knowing why. A slight change in any part of the code, even in the compiler optimisation settings, can bork out the code. This is usually the result of cut-and-paste coding styles and ineffectual debugging skills. I had to teach my CODE8 apprentice that in debugging, you always follow the flow of the data. There was one instance where I asked her to show me the result that is obtained from the LDAP query and she modified her code to output the data onto the web-application. Unfortunately, this involved obtaining the result from the LDAP query, extracting the relevant fields, reformatting the text before sending it out to the web-app. I then explained to her that this was not the output of the LDAP query. This was the output of the query after several processing steps. Garbage-in-garbage-out. If we do not know that the LDAP query output is accurate, how can we be sure that the reformatting and extraction routines are not frakin’ things up.

In the realm of hardware design, good coding practices are generally easier to enforce because if your code sucks, the hardware tools are not smart enough to do what you want and will usually just bork out. However, if the code is badly written, but just good enough to be understood by the synthesisers the tools will just end up producing very bad hardware design, which will suck up power and slow down performance. As a result, hardware coders are told to only write code in one specific way, in order to produce the hardware that we desire. We get these coding practices drummed into us in a class usually called – design for synthesis.

As for me, I tend to step through code line-by-line in order to see if it is doing what it is supposed to do. Unfortunately, with a language like Java, that can be very difficult to do because of the added complexity of the virtual machine layer. You can compile Java code into bytecode and inspect the bytecode but you won’t know if the JVM will actually run your bytecode the way it is meant to be run because you will just have to trust it to do so.

I subscribe to the principle – “trust and verify”.

Understanding your Compiler

I just came across this set of slides on source code optimisation and I have to agree with the lesson to take away – that is to understand how your compiler works. I do this very often as I have to work with really low-level code at the microprocessor level. So, I need to understand how the compiler works in order to generate the correct sequence of codes.

I thought that I should share this piece of gem here.

The key thing to remember is that the compiler will always generate code depending on what it thinks you are trying to do. While early optimisation is the root of all evil, coders need to help the compiler along by writing code in such a way as to make it easy for the compiler to figure out what it is that we want it to do. If we left everything to be decided by the compiler, we would ultimately end up with code that works, but not optimally.

This is a particularly big problem in the embedded world.

The only way to learn how the compiler works is to decompile and disassemble code – not by reading the above slides only. I primarily work with GCC compilers and I always compile it with the -g flag turned on to generate debugging symbols. This makes it much easier for objdump -dSC to give me human read-able assembly output. Do this for different code paths and we will soon begin to understand how the compiler interprets our code.

Then, the only issue that matters is the purpose for optimisation – size, speed or power.