Technical Intercourse

Solution, looking for Problem?

Sometimes, I wonder whether I am driven to solve problems or whether I simply get caught up in the beauty of the solution. Some of the technologies that I have built are definitely useful for something. However, I may be staring too closely at the solution to actually see the problem space. Therefore, I have decided to stop working on the solution for a while and to look outside for inspiration instead. I already know what I want to do. Now, I just need to know where to apply it.

For the next couple of months, I will embark on a serious journey of creation. I plan to document the entire process on this blog and hope that it may one day be useful to somebody. For now, I will go for a short run followed by dinner and some shopping.

FreeNAS with D410PT

I bought myself a D410PT and 1GB of DDR2-800 memory yesterday for only RM290! I decided to use it to build a better NAS for my home. However, there were some issues and I thought that I should note them down here.

Power Supply: My first problem was the ATX power supply. I decided to use an old 120W power supply that I had lying around from an old VIA machine. It turned out that it only had 20 pins, while the D410PT has the newer 24-pin power socket. However, with some research I found out that the extra 4 pins are not strictly necessary, as they are only there to supply extra current. Seeing that this is a low-power design, the extra current is not needed. According to the D410PT user manual, the board only draws 40W under full load with a bunch of devices connected. So, safe.
Network Chip: Another problem came up when I started FreeNAS. It could not bring up the built-in network interface. It turned out that the network chipset on the D410PT was not recognised by FreeNAS 0.7.1 (latest stable). However, with the latest nightly build (5266), the link came up and I was able to get the network working. Unfortunately, the D410PT only has a 100Mbps chip, but that is good enough for my home NAS.
Old Config: At first, I tried importing the old config backup from my previous 0.6.9 server but that caused more problems than it solved. It made it difficult to make configuration changes as the 0.7.2 server would complain about config errors. I guess that something must have changed in between. So, I ended up resetting the configuration and doing all of it manually. The trickiest part was to make sure that the old drives that I moved over from the old machine did not get reformatted or deleted by accident. Otherwise, it worked fine.

AEMB Benchmarked!

The AEMB has actually been benchmarked! I have claimed that the AEMB is the world’s fastest and smallest 32-bit multi-threaded RISC processor. Chapter 2 of this thesis put it in terms of real numbers!

Extracting the pertinent section of the results:
Name       Flip-flops   LUTs   MHz
AEMB              711    926   279
LEON3            1133   3448   183
OpenFire          207    752   198
OpenRisc         1577   3802   185
Plasma           1297   2457    73

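The AEMB achieves about 0.30 MHz/LUT (279 MHz over 926 LUTs), which is the highest of the five. OpenFire comes closest at about 0.26 MHz/LUT, while the other three are all below 0.06 MHz/LUT. The thesis goes on to run software Dhrystone and Fibonacci benchmarks and found the performance to be good too.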
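The MHz/LUT figures can be re-derived from the extracted results. The following is a minimal sketch in shell, with the numbers copied from the results above. The function name mhz_per_lut is just a name picked for illustration, and awk is used only because the shell itself has no floating-point division.

```shell
#!/bin/sh
# Figures copied from the extracted results: name, flip-flops, LUTs, MHz.
results="AEMB 711 926 279
LEON3 1133 3448 183
OpenFire 207 752 198
OpenRisc 1577 3802 185
Plasma 1297 2457 73"

# Print the name and MHz per LUT (column 4 divided by column 3),
# rounded to three decimal places. mhz_per_lut is a hypothetical helper name.
mhz_per_lut() {
    printf '%s\n' "$results" | awk '{ printf "%s %.3f\n", $1, $4 / $3 }'
}

mhz_per_lut
```

The first line of output is the AEMB at 0.301 MHz/LUT, which is where the 0.3 figure comes from.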

Unfortunately, the author ran into many issues while implementing the AEMB, which resulted in the author dropping the AEMB in favour of making a custom processor.

The main issue was that the author had difficulties targeting an ASIC platform because the AEMB was designed and optimised for the FPGA (which shows in the results). Many design trade-offs were made to make it very small and fast on an FPGA platform. Unfortunately, this is a show stopper, as porting it to an ASIC technology would essentially require a redesign of the entire architecture.

There were other issues as well, including poor documentation and bad sample software. While I will agree with the part on poor documentation, I think that the author probably mistook the old AEMB sample software for the new AEMB sample software. This is something that can be avoided with better documentation to make it clear. So, ditto – it’s bad documentation. This needs to be fixed for the next generation.

Regardless, I’m happy that we can now put some numbers to my claims!

Google Webfonts

In the course of working on a Web 2.0-like system, I was asked if it was possible to embed fonts into a webpage. As far as I knew, it was not possible. However, I was told that it is possible with newer technology that allows one to embed fonts using a specially formatted CSS file.

After looking around, I think that Google Webfonts is probably a very good way of doing this. It provides more than a dozen custom fonts to choose from and includes an API that is inserted into the page with a single line:

<link href='http://fonts.googleapis.com/css?family=Cantarell' rel='stylesheet' type='text/css'>
The only catch is that the CSS style-sheet needs to be included as the first file.

OpenSC and MyKAD

I recently bought myself a smart-card reader in order to fool around with the MyKAD (Malaysian National Identity Card). The main reason for this is to figure out how to read the information contained within the MyKAD. I went online to find myself an OpenSC compatible reader and got myself one for just under RM 23. Next, I found some information about MyKAD APDU on a Lowyat forum along with the associated data file offsets.

The result is the following shell script that dumps all the MyKAD information onto the screen. OpenSC automatically detects the MyKAD as an EMV compatible card.
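#!/bin/sh
# Select the JPN application by its AID (bytes 4A:50:4E spell "JPN").
AP_JPN="00:A4:04:00:0A:A0:00:00:00:74:4A:50:4E:00:10"

# Commands for page 1.
DF_PAGE1="C8:32:00:00:05:08:00:00:A0:00"
PR_PAGE1="CC:00:00:00:08:01:00:01:00:E9:00:A0:00"
SZ_PAGE1="CC:06:00:00:A0"

# Commands for page 4.
DF_PAGE4="C8:32:00:00:05:08:00:00:90:00"
PR_PAGE4="CC:00:00:00:08:04:00:01:00:03:00:90:00"
SZ_PAGE4="CC:06:00:00:90"

# Send the APDUs in order and dump the raw responses; error output is discarded.
opensc-tool -v -s "$AP_JPN" \
    -s "$DF_PAGE1" -s "$PR_PAGE1" -s "$SZ_PAGE1" \
    -s "$DF_PAGE4" -s "$PR_PAGE4" -s "$SZ_PAGE4" 2>/dev/null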

All it took was 8 lines of shell script to dump all the MyKAD information onto the screen, raw. All that is needed now is some extra processing magic to slice up the data into its constituent parts. Imagine, all this without a single line of C/C++ code and for under RM25!
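Looking at the byte strings in the script, the three commands for each page seem to follow a pattern: a length, a file number and an offset, each stored low byte first. This reading is only an assumption made from the patterns themselves, not something taken from any JPN documentation. The sketch below uses a hypothetical helper, page_apdus, to rebuild the three command strings from those numbers, as a way of checking the guess against the strings already used in the script.

```shell
#!/bin/sh
# Assumed layout, inferred only from the byte patterns in the script:
#   first : C8 32 00 00 05 08 00 00 <len lo> <len hi>
#   second: CC 00 00 00 08 <file lo> <file hi> 01 00 <off lo> <off hi> <len lo> <len hi>
#   third : CC 06 00 00 <len>

# Print one byte of a number as two hex digits.
# $1 = number, $2 = how many bits to shift right first (0 or 8).
hex_byte() {
    printf '%02X' $(( ($1 >> $2) & 255 ))
}

# page_apdus is a hypothetical helper.
# $1 = file number, $2 = offset, $3 = length (assumed to be below 256).
page_apdus() {
    printf 'C8:32:00:00:05:08:00:00:%s:%s\n' \
        "$(hex_byte "$3" 0)" "$(hex_byte "$3" 8)"
    printf 'CC:00:00:00:08:%s:%s:01:00:%s:%s:%s:%s\n' \
        "$(hex_byte "$1" 0)" "$(hex_byte "$1" 8)" \
        "$(hex_byte "$2" 0)" "$(hex_byte "$2" 8)" \
        "$(hex_byte "$3" 0)" "$(hex_byte "$3" 8)"
    printf 'CC:06:00:00:%s\n' "$(hex_byte "$3" 0)"
}

page_apdus 1 233 160   # file 1, offset 0xE9, length 0xA0
page_apdus 4 3 144     # file 4, offset 0x03, length 0x90
```

The two calls print the same strings as DF_PAGE1, PR_PAGE1, SZ_PAGE1 and DF_PAGE4, PR_PAGE4, SZ_PAGE4. That is consistent with the guess, but it does not prove what each field means.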

Physical Programming 103: Floating Point Unit

I had someone tell me that they do not need any specialised hardware for their application and all they needed was just simple hardware that was stable and easy to use. The main reason is because their value-add would be in the software algorithms and interface. I had a quick check on the algorithms and they are largely formulae with real numbers.

I almost fell off my chair.

Regardless of how great an uber algorithm is, it will finally get executed on real-world hardware. If the hardware was not designed to support the uber algorithm, it is going to choke. Take the example of a floating-point unit (FPU). On most desktops, we take floating-point calculations for granted, but this was not always the case in the past and is not always the case for embedded products today.

If a processor does not have an FPU and the uber algorithm works on single-precision floats, performance is going to get killed by software emulation of the floating-point operations. You can basically forget about using double-precision. The emulated code would produce the same results as code running on an FPU, but it would suffer from severe performance issues.

An alternative is to use a fixed-point algorithm instead. This allows the algorithm to be implemented entirely with integer operations, without using any floating-point numbers. However, this would require a redesign of the algorithm from the ground up to support fixed-point operations. This is not the kind of decision that one wants to make towards the end of a development cycle.
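To give a feel for what fixed-point means in practice, here is a minimal sketch in shell, which conveniently only has integer arithmetic, much like a processor without an FPU. It assumes a Q16.16 format (16 integer bits, 16 fractional bits), non-negative values and a shell with 64-bit arithmetic. The names q_mul, q_div and q_show are made up for this illustration.

```shell
#!/bin/sh
# Q16.16 fixed point: a real number x is stored as the integer x * 65536.
Q=16
ONE=$(( 1 << Q ))

# Multiply: the raw product carries 32 fractional bits,
# so shift right by 16 to get back to Q16.16.
q_mul() { echo $(( ($1 * $2) >> Q )); }

# Divide: pre-shift the dividend so the quotient keeps 16 fractional bits.
q_div() { echo $(( ($1 << Q) / $2 )); }

# Display helper: integer part, then the fraction truncated to three decimals.
q_show() {
    printf '%d.%03d\n' $(( $1 >> Q )) $(( (($1 & (ONE - 1)) * 1000) >> Q ))
}

a=$(( 3 * ONE + ONE / 2 ))   # 3.5 in Q16.16
b=$(( 2 * ONE + ONE / 4 ))   # 2.25 in Q16.16

q_show "$(q_mul "$a" "$b")"  # 3.5 * 2.25, prints 7.875
q_show "$(q_div "$a" "$b")"  # 3.5 / 2.25, truncated to three decimals
```

Every operation has to keep track of where the binary point is, and the range and precision are fixed up front. That is why an algorithm written around real numbers usually has to be reworked, not just recompiled.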

If, on the other hand, a system comes with a whole host of FPUs, these can be exploited to perform multiple floating-point operations at a time. If the software was not written to take advantage of this (for example, by using specific extensions such as SSE), then it would be a total waste of processing power, and we could save cost by buying a less capable processor instead.

Therefore, the design of an algorithm has to always follow what the hardware allows. It cannot be done independently of the hardware, unless performance is not a concern – as highlighted in an earlier quote:

Quicksort. Divide and conquer. Search trees. These and other algorithms form the basis for a classic undergraduate algorithms class, where the big ideas of algorithm design are laid bare for all to see, and the performance model is one instruction, one time unit. “One instruction, one time unit? How quaint!” proclaim the cache-oblivious algorithm researchers and real world engineers. They know that the traditional curriculum, while not wrong, is quite misleading. It’s simply not enough to look at some theoretical computing machine: the next-generation of high performance algorithms need to be in tune with the hardware they run on. They couldn’t be more right.

Update@2010-08-18: Anyone who works with floating-point numbers should read this paper.