KVM Entropy

I increased SSL performance further by increasing the entropy pool. I tried following instructions found elsewhere but ran into a problem: while they increased the entropy pool on the host machine, they did nothing for the VM.

So, in order to increase the entropy pool inside the VM, I had to enable an extra rngd setting:

RNGDOPTIONS="--fill-watermark=90% --feed-interval=1"

That quickly drove up the entropy pool. The net result is better SSL performance, with an improvement of about 30% in concurrent connections. I don’t know about the quality of the random numbers, but for the purposes of my application, they should suffice.
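
To verify that the pool is actually filling up, the kernel’s current entropy estimate can be polled through procfs; these are standard Linux paths, nothing specific to my setup:

# current kernel entropy estimate, in bits
cat /proc/sys/kernel/random/entropy_avail

# poll it every second to watch rngd refill the pool
watch -n1 cat /proc/sys/kernel/random/entropy_avail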

The only consumer of the random numbers is SSL, and it is only used for ephemeral data transfers. It does not matter if someone deciphers the information because by the time they do, it will no longer be very useful.

SSL CipherSuite

Just read a nice article on speeding up SSL operations on Apache. There were a number of different techniques proposed but the one that caught my eye – being immediately usable – was to replace the ciphers with faster ones.

Security has always been a game of trade-offs. In this case, it means trading ciphers with higher security and larger keys for smaller, faster ones. Not everyone needs AES256, as long as the replacement provides a sufficient level of security.
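
OpenSSL ships with a built-in benchmark that makes the raw throughput gap between the two ciphers easy to see on any given machine:

openssl speed rc4 aes-256-cbc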

So, after configuring Apache to use RC4 instead of AES256, I got about a 20% performance boost: the server could now process SSL connections faster, which in turn allowed for higher concurrency.
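
I won’t reproduce my exact configuration here, but a minimal mod_ssl setup along these lines captures the idea; the cipher string is illustrative rather than a recommendation:

# prefer the fast RC4-SHA suite, fall back to other strong suites
SSLCipherSuite RC4-SHA:HIGH:!aNULL:!MD5
SSLHonorCipherOrder on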

This got me thinking about embedded security. I think that for embedded servers, like those found on routers and such, something like RC4-SHA would be more than sufficient for protecting password logins from casual snooping. These small embedded devices are just not made for serving highly secure services – unless they come with hardware accelerators.

Something to think about further.

Vanishing Varnish

I was recently saddled with a bunch of cryptic Varnish errors. For some reason, the Varnish daemon just kept dying on me. I have used Varnish in many places before and never faced such problems. Varnish would actually start up and then die about 5 seconds later, as shown in the log messages below, which was very weird.


Jun 17 15:39:11 earth varnishd[2771]: child (2772) Started
Jun 17 15:39:16 earth varnishd[2771]: Pushing vcls failed: CLI communication error

The first thing I did was to start Varnish on the command line with the -d -d parameters, which started it in debug mode. Everything worked normally in debug mode and nothing was amiss. So, it really vexed me for a while, as I was unable to figure out why my cache was dying mysteriously.
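
For reference, the debug invocation was along these lines, mirroring the daemon options shown below:

varnishd -d -d -a :8080 -T localhost:6082 -b localhost:80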

Then, after lots of digging, it turns out that on slower machines, Varnish can kill its own child process when startup takes too long, whether due to heavy load or a timeout.

So, I had to add a startup parameter to the configuration file: cli_timeout.


DAEMON_OPTS="-a :8080 \
-T localhost:6082 \
-b localhost:80 \
-u varnish -g varnish \
-p cli_timeout=10 \
-s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"

That fixed it! It seems that the default timeout was 3 seconds, while Varnish took about 7 seconds to start up, saturating my processor at 100% in the process. So, increasing the timeout to 10 seconds did the trick. I will probably also need to increase the number of VCPUs assigned to this VM to cope with the increased load.
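
As a sanity check, the effective value can be queried at runtime through the management interface, using the -T address from the configuration above:

varnishadm -T localhost:6082 param.show cli_timeout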

Multi-core Parallelism

As I mentioned in a previous blog, I am running a six-core machine – actually a virtual machine. Regardless, I noticed one thing while experimenting with it. As I went from a single core to six cores, performance improved accordingly. But when I went up to eight cores, things deteriorated. The reason is probably the arrangement of the processors: I am running a dual-socket AMD system with six cores per socket. For an eight-core VM to work, it would have to span both sockets, which is not a good idea given the communication hierarchy between them.
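
The physical layout is easy to confirm from the host; either of these standard tools will show how cores map to sockets and NUMA nodes:

lscpu
numactl --hardware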

So, that’s why I ended up using a six-core VM.

Prefork vs Worker

Apache2 comes with several threading/process models (MPMs) to handle requests. The traditional method is maintained as prefork, while the newer method is worker. The main difference is that the former relies on forking processes while the latter uses multiple threads per process. The result is that worker is faster than prefork under load.
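
As a rough sketch, a worker setup runs a handful of processes with many threads each; the numbers below are placeholders to illustrate the shape, not tuned values:

<IfModule mpm_worker_module>
    ServerLimit          20
    StartServers          4
    MaxClients          500
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxRequestsPerChild   0
</IfModule>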

I tested this by setting up two servers, one with prefork and the other with worker, and used Apache Bench to hit both. At a low concurrency of around 200 connections, both performed equally well because, in both cases, the main bottleneck was the CPU. After I increased the VM from a single core to six cores, the load was handled equally well until I upped the ante to 500 concurrent connections.
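
The benchmark runs were of roughly this shape; the hostname is a placeholder, and ab needs to be built with SSL support to hit https URLs:

ab -n 10000 -c 200 https://server.example.com/
ab -n 10000 -c 500 https://server.example.com/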

At this point, the worker model would still run fine while the prefork model would start to fail periodically, as if it had trouble handling the sudden spike. Once it had been hit a few times, it could also handle the load, but the ramp-up was less graceful. This is pretty obvious once you think about it: forking whole processes carries a larger overhead than spawning threads.

The only advantage of prefork is when handling non-thread-safe code. However, since my Apache serves only as a reverse-proxy front-end, it does not need to run any non-thread-safe scripts; those are all handled by other servers at the back-end. So, there is really no need for the prefork model in my setup.