Seeded Hash

For the sake of security, I had to implement a seeded-hash system to secure passwords in a database. While straightforward hashes are good for ensuring that passwords are not stored in clear text, they are still vulnerable to rainbow-table attacks. A seeded hash helps to reduce this risk.

However, the question arose of how to actually do a seeded hash. I got my apprentice to look around and we finally found a useful scheme. The seed and password are usually concatenated before being hashed.

hash = Hash(seed + password);

However, if the seed were fixed, it would not really help much because the hashes would still be susceptible to rainbow-table attacks if the secret seed ever got out. So, we had to use a random seed. However, a random seed would generate all manner of rubbish unless we could somehow embed the seed with the hash, because the same seed is needed again to check a password later.

Since this was part of a password storage scheme, it would be perfectly alright to embed the seed with the hash because the size of the hash result is fixed. So, any extra data stored with the hash result would be the seed. We could convert everything to Base64 to store it in clear-text on the server. This was the scheme that we used in the end:

seededhash = Base64(Hash(seed + password) + seed);

This way, to do a password match, the application would need to decode the Base64, separate the seed from the hash and then perform the hash operation on the supplied password with the seed to see if it matched the hash.
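The scheme above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: SHA-256 stands in for the unspecified Hash function, the seed length of 16 bytes is an arbitrary choice, and the function names are made up for illustration.

```python
import base64
import hashlib
import hmac
import os
from typing import Optional

# Assumption: SHA-256 stands in for the unspecified Hash() of the scheme.
HASH_LEN = hashlib.sha256().digest_size  # fixed size, so the rest is the seed
SEED_LEN = 16  # assumed seed length in bytes


def make_seeded_hash(password: str, seed: Optional[bytes] = None) -> str:
    """Return Base64(Hash(seed + password) + seed) as text."""
    if seed is None:
        seed = os.urandom(SEED_LEN)  # fresh random seed per password
    digest = hashlib.sha256(seed + password.encode("utf-8")).digest()
    return base64.b64encode(digest + seed).decode("ascii")


def check_password(password: str, seeded_hash: str) -> bool:
    """Decode, split the seed from the hash, re-hash and compare."""
    raw = base64.b64decode(seeded_hash)
    digest, seed = raw[:HASH_LEN], raw[HASH_LEN:]
    candidate = hashlib.sha256(seed + password.encode("utf-8")).digest()
    # Constant-time comparison of the stored and recomputed hashes.
    return hmac.compare_digest(candidate, digest)
```

Because the hash length is fixed, everything after the first HASH_LEN bytes of the decoded value is the seed, which is what makes the split in check_password possible.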

Caveat: this solution only works for password matching – where we already know exactly which record to match against and merely need to verify that the information provided is accurate. It would not be useful for indexing purposes, like that used in Git, because the same input would produce a different hash every time. In such a scenario, straightforward hashing would still need to be done.

Trust and Verify

I had a chat with my boss the other day, about how good software should be developed. He mentioned that we should use unit-testing and I told him that it was rubbish. Then, he mentioned TDD where tests are written before the actual application and I told him that he risked lying to himself about the quality of the code developed.

Okay, while this is not an entry bashing software testing, I will highlight why testing fails. As one prominent computer security researcher points out:

One researcher with three computers shouldn’t be able to do beat the efforts of entire teams, Miller argued. “It doesn’t mean that they don’t do [fuzzing], but that they don’t do it very well.”

This is because we end up relying on tools to do the testing. Not that there is anything wrong with tools, but the tools are only as good as their operators and the problem lies with the tool operator. When tools are used, the operators generally are either less astute or become lazy. They start to rely on the tool to catch problems and if the tool fails to catch any, they rely on the tool to sign off on the quality of the code. Tools are inanimate objects and should not be liable for the stuff that they test. The tool operator should always dig in manually to verify the results. In my 20+ years of active programming, I have been caught out by tools so many times that I have developed a distinct distrust of any tool. I even verify my compiler outputs by disassembling the binaries and checking them by hand once, before running them through a simulator to observe the operations.

objdump -dSC

I argued with my boss that there is only one way to develop good code, and that is to just frakin’ write good code, which is not as hard as some people imagine it to be. Writing good code is like writing good prose: there are certain styles that can be followed that will result in better code. Unfortunately, good writing habits are hard to develop and need to be hammered into our apprentices from the very get-go. Our universities are failing in this by not driving this point home and by teaching our programmers to be ever more lazy and to rely on automated tools to generate code instead. Okay, code generation is a whole different can of worms that I won’t bother to go into.

I have seen several senior engineers do what I call haphazard coding, which results in code that sort-of works but without anyone knowing why. A slight change in any part of the code, even in the compiler optimisation settings, can bork out the code. This is usually the result of cut-and-paste coding styles and ineffectual debugging skills. I had to teach my CODE8 apprentice that in debugging, you always follow the flow of the data. There was one instance where I asked her to show me the result obtained from the LDAP query and she modified her code to output the data onto the web-application. Unfortunately, this involved obtaining the result from the LDAP query, extracting the relevant fields and reformatting the text before sending it out to the web-app. I then explained to her that this was not the output of the LDAP query. This was the output of the query after several processing steps. Garbage-in-garbage-out. If we do not know that the LDAP query output is accurate, how can we be sure that the reformatting and extraction routines are not frakin’ things up?
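To illustrate following the flow of the data, here is a minimal sketch. Everything in it is hypothetical: run_ldap_query is a canned stand-in for the real query, and the extraction and reformatting helpers are invented for illustration. The point is only that the raw query result is inspected first, before any later stage touches it.

```python
import logging
from pprint import pformat

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger("ldap-debug")


def run_ldap_query():
    """Hypothetical stand-in for the real LDAP query; returns canned entries."""
    return [("uid=alice,ou=people,dc=example,dc=com",
             {"cn": [b"Alice Tan"], "mail": [b"alice@example.com"]})]


def extract_fields(entries):
    """Pull out only the fields the web application needs."""
    return [{"name": attrs["cn"][0].decode(), "mail": attrs["mail"][0].decode()}
            for _dn, attrs in entries]


def reformat(records):
    """Turn the extracted records into display text."""
    return [f"{r['name']} <{r['mail']}>" for r in records]


def lookup():
    raw = run_ldap_query()
    # Stage 1: look at the query output itself, before any processing.
    log.debug("raw LDAP result:\n%s", pformat(raw))
    records = extract_fields(raw)
    # Stage 2 and 3: check each later stage separately.
    log.debug("extracted: %s", records)
    lines = reformat(records)
    log.debug("reformatted: %s", lines)
    return lines
```

If the first log line is already wrong, there is no point looking at the extraction or reformatting code yet.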

In the realm of hardware design, good coding practices are generally easier to enforce because if your code sucks, the hardware tools are not smart enough to do what you want and will usually just bork out. However, if the code is badly written but just good enough to be understood by the synthesisers, the tools will end up producing a very bad hardware design, which will suck up power and slow down performance. As a result, hardware coders are told to write code in only one specific way, in order to produce the hardware that we desire. We get these coding practices drummed into us in a class usually called – design for synthesis.

As for me, I tend to step through code line-by-line in order to see if it is doing what it is supposed to do. Unfortunately, with a language like Java, that can be very difficult to do because of the added complexity of the virtual machine layer. You can compile Java code into bytecode and inspect the bytecode but you won’t know if the JVM will actually run your bytecode the way it is meant to be run because you will just have to trust it to do so.

I subscribe to the principle – “trust and verify”.

Social Reader

I came up with this idea a while ago, while thinking about the whole e-reader craze. Since it will not be coming to fruition, I thought that I would just write a blog entry about it. Maybe someone might find more use for it than me. After all, I lack the wherewithal to work on this project on my own anyway.

The idea that I came up with was a social reader. Yes, most of you will argue that reading is a very personal activity and wonder why anyone would want to have a social reader. However, there is at least one scenario where reading becomes a more social activity – in a classroom setting. So, this applies more to reading text-books rather than say Tom Clancy or Patricia Cornwell.

So, I had three ideas on how to use e-book readers in a more social way and I will talk about them here. Since this is a technical blog, I will mention some of the more technical aspects. Most of the modes described depend on the modes supported by the 802.11a/b/g/n wireless chipset.

In the first case, the reader works in broadcast mode. This mode is suitable for use in a classroom where we have a lecturer broadcasting information to a bunch of students. In such a scenario, the reader used by the lecturer could be set in master mode and the students’ readers set to connect to it in infrastructure mode. In such a situation, wireless bandwidth is effectively shared between all the readers but since only one device is doing most of the talking, it should be fine. The reader application can then be programmed to transmit notes and synchronise meta-information to the students. This could be easily accomplished using rsync, or another system, in the background.

In the second case, the readers work in peer-to-peer mode. This mode is suited to discussion groups and small group teaching. In such a scenario, all readers are set to ad-hoc mode. This will allow each device to talk to every other device in the group. The reader can then be programmed to push or pull annotations between devices. In the background, a distributed management system such as git, or any other system, could be used to easily share data in a structured and managed way. The ability to diff and patch your notes against those of your friends could prove invaluable in changing how group study works in the future.
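As a rough illustration of diffing notes, the sketch below uses Python's standard difflib module as a stand-in for git, which is an assumption made purely to keep the example self-contained. The note contents and the diff_notes name are invented.

```python
import difflib


def diff_notes(mine: str, theirs: str) -> str:
    """Return a unified diff that would turn my notes into my friend's notes."""
    return "".join(difflib.unified_diff(
        mine.splitlines(keepends=True),
        theirs.splitlines(keepends=True),
        fromfile="my-notes",
        tofile="friend-notes",
    ))


# Invented sample annotations from two readers.
mine = "Chapter 1: cells\nMitochondria make ATP\n"
theirs = "Chapter 1: cells\nMitochondria make ATP\nRibosomes build proteins\n"

patch = diff_notes(mine, theirs)
print(patch)
```

The printed diff shows the friend's extra annotation as an added line, which is the kind of change a reader could offer to merge into the local notes.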

The third mode is a local-reader mode. This mode is suited to reading in the local common room. In such a scenario, readers can connect to a local book store that holds books only accessible from that geographical location – the boundaries of which can be controlled via modulating the transmit power of the book store device. Readers can download books held at the store and even upload books to the store, allowing people to share books and to leave books behind for others to read.

Now for the bad news – battery power. All these modes require the use of Wi-Fi, which is pretty power hungry. However, this is where there is opportunity for innovation. The operating system software could be designed to handle power efficiently and to only activate the wireless when needed – such as during the beacon intervals. Additionally, the physical layer could be replaced with something low-power such as Bluetooth or ZigBee, or possibly even UWB, when it makes sense to do so.

In order for such a social reader device to succeed, it would need to answer the problem of power. Readers are supposed to be able to last days if not weeks. However, all the wireless communication will kill it quickly, even if low-power wireless technologies are employed.

Polipo DNS Issues

I have been using polipo as my proxy server and recently, it has been developing some weird problems. For one thing, it sometimes refuses to connect to websites without any errors on the client side except for time-outs. I know that the sites are up because I can connect to them if I bypass polipo.

The reason that I use polipo is its resource requirements – much lower than, say, squid. As an upstream web proxy, it works really well. I have gotten better results out of it than squid but I simply put that down to my lack of squid config-fu. Polipo is much easier to configure as there are fewer options to play with. However, it is also a plain vanilla proxy and does not try to become the proxy for everything as squid does.

After some investigation, I got a clue from the polipo logs – that it was timing out as well with the following message:

Host ftp.osuosl.org lookup failed: Timeout (131072).

This confused me because it was obvious that the website was up and working. So, I dug around and found out that polipo uses its own DNS resolver by default and not the system resolver. The reason that it does this is in order to obtain the TTL information on the domain directly. However, the information still comes from the same DNS server either way.

Turns out that it was a networking problem. By default, polipo would return the AAAA record instead of the A one if both are present. It is designed to prefer IPv6 over IPv4. My home network does not yet have IPv6 connectivity to the Internet. This behaviour can be controlled with either of the following options:

dnsQueryIPv6 = no
dnsQueryIPv6 = reluctantly

Since I have not yet enabled IPv6 support through my gateway, I decided to just disable IPv6 for polipo entirely. I might turn it back on once I get IPv6 up at home. I really need to get down to activating my Hurricane gateway tunnel and writing a guide for it.

Buffalo Ships DD-WRT

Looks like Buffalo is doing the right thing – shipping dd-wrt with its routers. While I have been a fairly loyal Buffalo customer and own several of their routers, honestly, I have been running all of them on dd-wrt. The only reason why I have been buying Buffalo is because they are a cheaper way to run dd-wrt than, say, Linksys. So, by shipping dd-wrt on its routers, Buffalo will make life easier for me in the process.

DD-WRT is an Open Source project that provides firmware for running wireless routers. The firmware is based on a Linux kernel, which gives it a lot of power and functionality. For example, it can support VLANs, advanced firewall and routing capabilities and much more, on commodity hardware. Most of these features are otherwise only available on premium commercial hardware.

I will not buy any router unless it supports dd-wrt. In Malaysia, this essentially limits my choices to Linksys and Buffalo. Buffalo wins on price because the features will be the same as long as they both run the same firmware. But why this obsession with running Linux dd-wrt on routers? It is all a question of control. I like having control over my hardware – including my routers.

For example, I configured my router to dynamically route all web traffic from my network via a transparent proxy server. This benefits all the machines running in my home network as I do not need to configure each machine individually. Everything is automagically cached in my proxy server. So, when I watch a YouTube clip, I can keep watching it again and again without having to reload it from the Internet. I also like the built-in PPPoE and dynamic DNS clients. I use them to maintain the connection to this very blog. Without them, you would not be able to find my blog at all.

If I did not have dd-wrt running on my router, trying to get these features would be a chore, if it were possible at all.

The question then is why Buffalo did not do it from the start. I think that they have finally realised the advantages of using dd-wrt instead of building and maintaining their own firmware. You need a lot of resources to do that. It would be far easier for them to just customise a dd-wrt firmware with their logos and stuff while leaving all the hard work to the people who are actually passionate enough to work on it. In addition, they may want to consider hiring some permanent staff to contribute to the dd-wrt stack.

All in all, open source actually works better as long as people can change their mind-set on keeping everything to themselves.

The Plan

Alright, I have just created a bunch of repositories on Gitorious for the purpose of managing my multiple projects. My long-term plan is to create an entire stack – from the hardware system all the way to the operating system and application software. The ultimate goal is to be able to run Android on an entirely open stack.

At the moment, the AEMB is the world’s smallest and fastest multi-threaded 32-bit RISC embedded processor. I plan to make some changes to it – integrating some of the ideas that I have had previously such as:

  • Threads – increase the thread count to four in hardware and unblock the threads so that they are not interlocked.
  • Compiler – make an LLVM compiler back-end to divorce ourselves from the existing GCC compilers in order to integrate atomic operations.
  • Startup – integrate an in-cache execution environment during pre-boot stage to take off some of the hardware load.
  • Kernel – write a small nano-kernel that abstracts away much of the hardware stuff in order to allow higher level code to integrate better with it.

There is a lot of work involved and I am open to participation – particularly on the software side. If anyone is interested, that is.