Software Layers

I had a talk with my apprentice today, on software application architecture. In our case, it had to do with an application that had to interface with a back-end database. I was reviewing her code when I realised that she was replicating a lot of code everywhere – a definite case for code factoring. Everyone likes to talk about factoring but I had to find an easy way to explain it to my apprentice.

I explained to her that it is a good practice to design software in three layers – primitives, middleware and application – and proceeded to describe the layers.

Primitives are low-level functions. This did not seem to make sense to her, so I elaborated that a primitive should do one thing and one thing only. That did not add any clarity either, so I decided to put it in context. For our application, the database operations could be made into primitives – insert, update, delete and select. Once I said that, things became obvious to her. This layer would be tied very closely to the low-level architecture, the back-end database in our case.

Middleware serves as glue between application and primitives. This makes sense in context. Our application only supports one database back-end today but we may need to support more database back-ends in the future. This is where middleware comes in. The middleware provides a standard application programming interface (API) that can be used by the application regardless of database back-end. We may have multiple primitives used to access different database back-ends but we can have a single middleware layer that abstracts it all away from the application.
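To make the split concrete, here is a minimal sketch of the three layers in Python. It is only an illustration and not our actual code: the users table, the SqlitePrimitives and UserStore classes and the register_user function are all hypothetical names, and SQLite merely stands in for whichever back-end is in use.

```python
import sqlite3


# --- Primitives: one function per database operation, tied to SQLite. ---
class SqlitePrimitives:
    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, email TEXT)"
        )

    def insert(self, name, email):
        self.conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
        )

    def update(self, name, email):
        self.conn.execute(
            "UPDATE users SET email = ? WHERE name = ?", (email, name)
        )

    def delete(self, name):
        self.conn.execute("DELETE FROM users WHERE name = ?", (name,))

    def select(self, name):
        row = self.conn.execute(
            "SELECT email FROM users WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


# --- Middleware: a back-end-neutral API built only from the primitives. ---
class UserStore:
    def __init__(self, primitives):
        # Any object offering insert/update/delete/select will do here.
        self.db = primitives

    def save_email(self, name, email):
        if self.db.select(name) is None:
            self.db.insert(name, email)
        else:
            self.db.update(name, email)

    def email_of(self, name):
        return self.db.select(name)

    def remove(self, name):
        self.db.delete(name)


# --- Application: business rules only, no SQL in sight. ---
def register_user(store, name, email):
    if "@" not in email:
        raise ValueError("invalid email address")
    store.save_email(name, email.lower())
```

Supporting a second back-end would then mean writing another primitives class with the same four methods, while UserStore and register_user stay untouched.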

In the case of embedded software, the primitives would be any architecture-specific code that needs to access the hardware registers and functionality directly. As with databases, these would usually include code that sets and gets values. A typical example would be code that sets, clears and toggles bits in a register. The middleware would be any code that defines processes and functionality by wrapping around the bare-metal primitive code. This could include code that resets, initialises and activates certain functionality.
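The same idea can be sketched for the embedded case. Real firmware would normally do this in C against memory-mapped registers; the Python below only simulates an 8-bit register, and the Register class, the bit positions and the reset/activate sequence are all made up for illustration.

```python
class Register:
    """Stand-in for a memory-mapped 8-bit hardware register."""

    def __init__(self, value=0):
        self.value = value & 0xFF


# --- Primitives: each one does a single thing to a single bit. ---
def set_bit(reg, bit):
    reg.value |= (1 << bit)


def clear_bit(reg, bit):
    reg.value &= ~(1 << bit) & 0xFF


def toggle_bit(reg, bit):
    reg.value ^= (1 << bit)


def get_bit(reg, bit):
    return (reg.value >> bit) & 1


# --- Middleware: device behaviour expressed through the primitives. ---
ENABLE_BIT = 0  # hypothetical bit assignment
RESET_BIT = 7   # hypothetical bit assignment


def reset_device(ctrl):
    # Pulse the reset bit: raise it, then lower it again.
    set_bit(ctrl, RESET_BIT)
    clear_bit(ctrl, RESET_BIT)


def activate_device(ctrl):
    reset_device(ctrl)
    set_bit(ctrl, ENABLE_BIT)
```

Porting to different hardware would then only touch the primitives; reset_device and activate_device describe what happens, not how the bits get flipped.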

Applications should be pretty obvious. The application contains all the business processes and logic that are specific to that application. Ideally, if everything was done properly, we would be able to use the same set of middleware and primitives across multiple different applications. This should be the ideal that we strive for when architecting software applications. I hope that she will always remember this simple rule of thumb in her future code.

Seeded Hash

For the sake of security, I had to implement a seeded-hash system (more commonly called a salted hash) to secure passwords in a database. While straightforward hashes are good for ensuring that passwords are not stored in clear text, they are still vulnerable to rainbow-table attacks. A seeded hash helps to reduce this risk.

However, the question arose of how to actually do a seeded hash. I got my apprentice to look around and we finally found a useful scheme. The seed and password are usually concatenated before being hashed.

hash = Hash(seed + password);

However, if the seed were fixed, it would not really help much, because an attacker could still build a rainbow table for it if the secret seed ever got out. So, we had to use a random seed for each password. However, a random seed would make the hash impossible to reproduce later, and therefore useless for checking a password, unless we could somehow store the seed together with the hash.

Since this was part of a password storage scheme, it would be perfectly all right to store the seed with the hash. Because the size of the hash result is fixed, any extra data stored after the hash result must be the seed. We could convert everything to Base64 to store it as plain text on the server. This was the scheme that we used in the end:

seededhash = Base64(Hash(seed + password) + seed);

This way, to do a password match, the application would need to decode the Base64, separate the seed from the hash and then perform the hash operation on the supplied password with the seed to see if it matched the hash.
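As a rough sketch, the scheme could look like this in Python. The choice of SHA-256, the 16-byte seed length and the function names are my assumptions for illustration; the scheme itself does not depend on them. For real password storage, a deliberately slow function such as hashlib.pbkdf2_hmac would generally be a better stand-in for Hash than a single SHA-256 pass.

```python
import base64
import hashlib
import hmac
import os

HASH_LEN = hashlib.sha256().digest_size  # fixed hash size: 32 bytes
SEED_LEN = 16                            # assumed seed length in bytes


def make_seeded_hash(password, seed=None):
    """Return Base64(Hash(seed + password) + seed) as text."""
    if seed is None:
        seed = os.urandom(SEED_LEN)  # fresh random seed per password
    digest = hashlib.sha256(seed + password.encode("utf-8")).digest()
    return base64.b64encode(digest + seed).decode("ascii")


def check_password(password, seeded_hash):
    """Split the stored value into hash and seed, then re-hash and compare."""
    raw = base64.b64decode(seeded_hash)
    digest, seed = raw[:HASH_LEN], raw[HASH_LEN:]
    candidate = hashlib.sha256(seed + password.encode("utf-8")).digest()
    # Constant-time comparison, so timing does not leak how many bytes match.
    return hmac.compare_digest(candidate, digest)
```

A quick usage example: stored = make_seeded_hash("hunter2") gives the string to keep in the database, and check_password("hunter2", stored) returns True at login time. Hashing the same password twice gives two different stored strings, because each call draws a new seed.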

Caveat: this solution only works in the situation of password matching – where we already know exactly which record to match against and merely need to verify that the information provided is accurate. This would not be useful for indexing purposes, like that used in Git, because a random seed gives the same content a different hash every time. In such a scenario, straightforward hashing would still need to be done.

Social Reader

I came up with this idea a while ago, while thinking about the whole e-reader craze. Since it will not be coming to fruition, I thought that I would just write a blog entry about it. Maybe someone might find more use for it than me. After all, I lack the wherewithal to work on this project on my own anyway.

The idea that I came up with was a social reader. Yes, most of you will argue that reading is a very personal activity and wonder why anyone would want to have a social reader. However, there is at least one scenario where reading becomes a more social activity – in a classroom setting. So, this applies more to reading textbooks than, say, Tom Clancy or Patricia Cornwell.

So, I had three ideas on how to use e-book readers in a more social way and I will talk about them here. Since this is a technical blog, I will mention some of the more technical aspects. Most of the modes described depend on the modes supported by the 802.11a/b/g/n wireless chipset.

In the first case, the reader works in broadcast mode. This mode is suitable for use in a classroom where we have a lecturer broadcasting information to a group of students. In such a scenario, the reader used by the lecturer could be set in master mode and the students’ readers set to connect to it in infrastructure mode. Wireless bandwidth is then effectively shared between all the readers, but since only one device is doing most of the talking, it should be fine. The reader application can then be programmed to transmit notes and synchronise meta-information to the students. This could be easily accomplished using rsync, or another system, in the background.

In the second case, the readers work in peer-to-peer mode. This mode is suited to discussion groups and small group teaching. In such a scenario, all readers are set to ad-hoc mode. This will allow each device to talk to every other device in the group. The reader can then be programmed to push or pull annotations between devices. In the background, a distributed management system such as Git, or any other system, could be used to easily share data in a structured and managed way. The ability to diff and patch your notes against those of your friends could prove invaluable in changing how group study works in the future.
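To give a feel for what diffing notes means, here is a tiny sketch using Python's standard difflib module rather than Git itself. The two sets of notes and the notes_diff helper are invented for the example.

```python
import difflib

# Two students' notes on the same chapter, one line per annotation.
mine = [
    "Chapter 3: Sorting",
    "Quicksort is O(n log n) on average",
    "Read pages 40-52",
]
theirs = [
    "Chapter 3: Sorting",
    "Quicksort is O(n log n) on average, O(n^2) worst case",
    "Read pages 40-52",
    "Exam covers merge sort too",
]


def notes_diff(a, b):
    """Return the unified diff between two sets of notes as a list of lines."""
    return list(
        difflib.unified_diff(a, b, fromfile="mine", tofile="theirs", lineterm="")
    )


for line in notes_diff(mine, theirs):
    print(line)
```

Lines starting with a minus sign exist only in my notes and lines starting with a plus sign only in my friend's, which is exactly the information a reader would need in order to offer a merge.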

The third mode is a local-reader mode. This mode is suited to reading in the local common room. In such a scenario, readers can connect to a local book store that holds books only accessible from that geographical location – the boundaries of which can be controlled via modulating the transmit power of the book store device. Readers can download books held at the store and even upload books to the store, allowing people to share books and to leave books behind for others to read.

Now for the bad news – battery power. All these modes require the use of Wi-Fi, which is pretty power hungry. However, this is where there is opportunity for innovation. The operating system software could be designed to handle power efficiently and to only activate the wireless when needed – such as during the beacon intervals. Additionally, the physical layer could be replaced with something low-power such as Bluetooth or ZigBee, or possibly even UWB, when it makes sense to do so.

In order for such a social reader device to succeed, it would need to answer the problem of power. Readers are supposed to be able to last days if not weeks. However, all the wireless communication will kill it quickly, even if low-power wireless technologies are employed.

Licensing Open Source Software

Disclaimer: I am not a lawyer.

I have recently been approached by several people with regard to the licensing of open source software (OSS). Proprietary companies have no problems stealing open source software for their use but when it comes to their turn to sell their solutions, they try to get away with hiding the OSS. This is not only wrong but is actually illegal.

I have been personally told to breach the GPL and I have pretty much told my boss – over my dead body. Just because it is an instruction from the top does not absolve me of the action. I will still be the one responsible for installing DRM onto our GPL-derived product.

I have come to realise that most people have many misconceptions on OSS because they tend to lump all OSS together in one homogeneous fold. Unfortunately, there are over 60 different OSI approved open source licenses at the time of writing and they are as heterogeneous as can be. However, they all have exactly one thing in common and one thing only.

I need to make this one thing very clear – OSS licenses are all distribution licenses – they only come into effect when you decide to give away your product whether paid or otherwise.

On one extreme, you have the extremely permissive OSS licenses such as BSD and gang. What these licenses essentially mean is that you are pretty much free to do whatever you want with the code, including using it in commercial products without giving away your source code. However, you will need to mention in your product and documentation that you are using the BSD code and that the original authors are exempt from all liabilities.

This is one of the easiest OSS licenses to use in proprietary products. Just take the code, do what you need with it and then sell it. All you need to do is to mention that you have used that code and promise that you will never sue the original authors. That’s it!

On the other extreme, you have the extremely viral OSS licenses such as GPL and gang. This is the license that Steve Ballmer calls a cancer because it is infectious. It is very clear in the language that any software that contains GPL code must be at least as free as the GPL and no distribution rights can be taken away from the customers.

This essentially means that if you use GPL code in your product, you cannot do anything that will prevent your customers from using that code in whatever way they like. GPLv3 has very specific language that nullifies patents and attacks digital rights management. In fact, you cannot even stop your customers from reselling your product verbatim if they want to.

One misconception that most people in Malaysia have is that OSS is a development license – meaning that if they do not make changes to the original code or develop it, there is no need to open source it. This is only true for LGPL and gang, which require all changes, modifications and additions to the original code to be given away as LGPL. However, it is untrue for either of the previous extremes.

The trouble with most proprietary businesses in Malaysia is that they do not comprehend OSS. OSS licenses do not prohibit you from charging money for your products. In fact, you are encouraged to charge money for them so that you have clear proof of distribution, which means that the OSS license can be clearly enforced. You will find open source companies worth billions of dollars amongst the Fortune 500.

Professionally, I always recommend that my clients stay away from GPL code unless they know exactly what they want to do with their products. There is nothing wrong with GPL except that staying away from it gives us flexibility to decide on whether or not to distribute our source code later – once there is a clear business case for it either way.

I was recently told by a colleague that he was informed by legal that we should give away our GPL-derived code but that, practically, we would not need to. That is the kind of legal doublespeak that makes me hate lawyers.