End of File

I’ve found that many novice programmers are confused when it comes to reading files. I had to educate an apprentice on this subject today, and I thought I would put down a short summary of the lesson.

To most people, there are many different types of files out there – videos, images, music, text and whatnot.

All these do not mean anything to programmers. A programmer is only concerned with two types of files – text and binary. In a text file, each line is delimited by a line-feed on Unices and a carriage-return line-feed on Windows. A binary file has no delimiter. Therefore, a text file is read in line by line while a binary file is read in blob by blob. In addition, on DOS and Windows it is possible to terminate a text file with an end-of-file marker (^Z), but that would not be meaningful in a binary file.

So, when trying to read a text file, it is possible to read in line by line until an EOF is encountered. However, there are a few different ways to read a binary file.

It is possible to keep reading the file in blobs until the number of bytes actually read is less than the size of a blob, which indicates that the end of the file has been reached. This is the preferred method of reading binary files. Alternatively, it is possible to detect the size of the file up front and keep a counter, decrementing it as data is read until it reaches zero. This requires careful management of the size counter and limits the size of the file to the maximum value of the integer used.

Exascale Communications

I got the opportunity to attend a short training and workshop on computing on a massive scale conducted by SGI. The speaker was a highly technical guy, Michael A. Raymond, who delivered a very good presentation. Personally, I learned some things and got some of my queries answered.

He mentioned that the main issue in large-scale computation today is communication and not computation. While I can understand his logic in saying that, I would hazard that these two things tend to take turns being the problem. Once the communication problems are slowly chipped away, the computation problem will recur.

Actually, I would say that the computation problem is already recurring – with the introduction of heterogeneous computation into the mix. These days, a lot of computation is off-loaded onto computation engines, which may be made up of custom microprocessors, FPGAs or even DSPs and GPUs. Not only that, the number of cores available may not be a nice friendly number – like 6 cores in newer Intel/AMD chips and 7 cores in an IBM Cell processor.

These little things do tend to throw a wrench in the works a bit.

I wanted to attend this talk in order to learn a bit more about multi-processing, because my next generation of AEMB will definitely feature more parallelism and I wanted to get an overview of what things are like in this area. However, much of the talk focused on HPC applications, which may not be what I am interested in. That said, I still learned some interesting things about how they solved certain problems.

I intend to introduce multi-processing, in addition to the multi-threading already on the AEMB. It is the next logical step to improve the processor before moving it up to multi-core. I will probably need to build in some sort of fast inter-core interconnect to enable multi-core processing on multi-threaded and multi-processed AEMBs. My littlest processor may just end up being the world’s leader in on-chip parallelism.

Fun days ahead!

Vanishing Varnish

I was recently saddled with a bunch of cryptic Varnish errors. For some reason, the varnish daemon just kept dying on me. I have previously used Varnish in many places and I have never had to face such problems before. Varnish would actually start up and then die about 5 seconds later, as shown in the log messages below, which was very weird.


Jun 17 15:39:11 earth varnishd[2771]: child (2772) Started
Jun 17 15:39:16 earth varnishd[2771]: Pushing vcls failed: CLI communication error

The first thing I did was to start varnish on the command line with the -d -d parameter, which started it in debug mode. Everything worked normally in debug mode and nothing was amiss. So, it really vexed me for a while, as I was unable to figure out why my cache was dying mysteriously.

Then, after lots of digging, it turns out that on slower machines, the varnish management process can kill the child process when heavy load causes it to miss a startup timeout.

So, I had to add a startup parameter to the configuration file: cli_timeout.


DAEMON_OPTS="-a :8080 \
-T localhost:6082 \
-b localhost:80 \
-u varnish -g varnish \
-p cli_timeout=10 \
-s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"

That fixed it! It seems that the default timeout was 3 seconds, but it took about 7 seconds to start up Varnish, during which it would saturate my processor at 100%. So, increasing the timeout to 10 seconds did the trick. I think that I will probably need to increase the number of VCPUs tied to this VM as well to cope with the increased load.

Libvirt vs Virt-Manager on Lenny and Lucid

I ran into this random problem with virtualisation recently. For some reason, I just could not manage the LVM storage pools on my virtualisation server from my workstation. My workstation was running Kubuntu 10.04 and my server was running Debian 5.0.4, with virt-manager and libvirt on each.

This was a very weird problem because I could access the LVM storage pools as long as there were no allocated logical volumes in them. However, the moment there was anything in them, virt-manager would fail to start the storage pool. What made it weirder was that I did not have this problem on some of my other installations.

After spending days digging into it, I found out the cause of the problem.

It seems that the libvirt people changed the protocol in version 0.5.0 and swapped the colon delimiter for a comma delimiter. The workstation had a newer version of virt-manager while the server had the older version of libvirt. So, all I had to do was upgrade libvirt from lenny-backports and that fixed the problem entirely.

The reason why I had not seen this on some other machines is that the hardware was different. On this particular server, the hard disk was not seen as /dev/sdaX but parked under /dev/blocks/XXX:X instead. That is how the “:” (colon) came into the picture and confused the two different versions.

Stress.