Build Stress Testing

Introduction

Now that we've sped up the build process, the next logical question is: how many builds can run simultaneously before the server is saturated? This page attempts to answer that question. A test was set up to saturate the build server, and basic performance statistics were gathered. The quick answer is 5: five simultaneous builds can run before the build server starts to show stress. Saturation occurs mainly in the CPU, followed by the network. Of course, 5 simultaneous builds do not each run as quickly as a single build; build times increase. A single build averages around 200 seconds (3 minutes 20 seconds), whereas with 5 builds running each build averages around 560 seconds (9 minutes 20 seconds). That is roughly a threefold increase in build time, but still slightly under 10 minutes.
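The arithmetic behind these figures can be checked with a short Python sketch. It uses only the two averages quoted above (200 and 560 seconds); the function names are made up for illustration, and the throughput figure assumes builds run back to back at those average times, which the test did not measure directly.

```python
def slowdown(single_s, concurrent_s):
    """How many times longer one build takes when run alongside others."""
    return concurrent_s / single_s


def builds_per_hour(concurrent_builds, build_time_s):
    """Completed builds per hour if `concurrent_builds` run back to back
    (an assumption; queueing and startup gaps are ignored)."""
    return concurrent_builds * 3600 / build_time_s


SINGLE_S = 200  # average time of a lone build, from this page
FIVE_S = 560    # average time of each build when 5 run together, from this page

print(f"slowdown at 5 builds: {slowdown(SINGLE_S, FIVE_S):.1f}x")
print(f"throughput, 1 at a time: {builds_per_hour(1, SINGLE_S):.1f} builds/hour")
print(f"throughput, 5 at a time: {builds_per_hour(5, FIVE_S):.1f} builds/hour")
```

Under that assumption, each build takes about 2.8 times as long, but the server as a whole completes roughly 32 builds per hour instead of 18, so running builds concurrently still pays off up to the saturation point.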

Test Environment

To stress the build server I created 10 views, then adapted a script to perform a series of 10 timed builds in each of the 10 views. The builds were started in a staggered fashion, offset by 5 minutes, so that eventually 10 builds were running simultaneously. The build server was also monitored with Perfmon, which captured 4 key measurements: processor, memory, disk and network performance. The results are shown below.
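The staggered start described above can be sketched in Python as follows. This is not the script that was actually used; it is a minimal illustration, and the names `start_offsets`, `run_command`, `timed_builds` and `stress_test`, as well as the build command shown in the usage comment, are assumptions.

```python
import subprocess
import threading
import time


def start_offsets(view_count, stagger_s=300):
    """Seconds after test start at which each view's build loop begins."""
    return [i * stagger_s for i in range(view_count)]


def run_command(view, command):
    """Run one build inside a view directory (the command is a placeholder)."""
    subprocess.run(command, cwd=view, check=False)


def timed_builds(view, build, repeats, results):
    """Run `repeats` builds back to back in one view, timing each one."""
    for run in range(repeats):
        began = time.monotonic()
        build(view)
        results.append((view, run, time.monotonic() - began))


def stress_test(views, build, repeats=10, stagger_s=300):
    """Start one build loop per view, each delayed by a further stagger_s."""
    results = []
    timers = []
    for view, offset in zip(views, start_offsets(len(views), stagger_s)):
        timer = threading.Timer(
            offset, timed_builds, args=(view, build, repeats, results)
        )
        timer.start()
        timers.append(timer)
    for timer in timers:
        timer.join()  # wait until every view has finished all its builds
    return results


# Hypothetical usage: 10 view directories and a bare "smake" invocation.
# views = [f"view{i}" for i in range(1, 11)]
# timings = stress_test(views, lambda v: run_command(v, ["smake"]))
```

Passing the build step in as a callable keeps the scheduling logic separate from the build command, so the same loop can be dry-run with a stub before pointing it at real views.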

Combined Performance

The first graph shows the 4 performance indicators plotted together. The figures were scaled to fall in a range from 1 to 100. Bars on the graph mark the points at which additional build jobs were started. By the end of the graph all 10 builds were running simultaneously.

Combined Performance

As can be seen, processor performance, shown in red, seems to be the limiting factor. You can see the load go up as 1, 2 and 3 simultaneous builds start on the server. At approximately 5 simultaneous builds the CPU is running at around 90%, and thereafter no appreciable gain in performance is obtained. Even at 5 builds, build time has roughly tripled.

Memory availability, shown in fuchsia, drops steadily; however, with 1 GB of memory there is still approximately 620 MB available at 10 builds. So memory doesn't appear to be the problem.

Disk activity, in pale blue, is erratic at times but fairly steady overall and does not seem to climb as the number of builds increases.

Strangely, network activity, in yellow, climbs with the number of builds. The only thing I can think of is that slightly more network communication is happening through smake, or perhaps Clearcase itself is making some unnecessary network calls.
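This page does not record exactly how the four counters were scaled for the combined graph. One plausible approach, offered here only as an assumption, is to multiply each series by a power-of-ten factor so that its peak lands at or below 100. A minimal Python sketch, with a made-up function name and partly made-up sample values:

```python
import math


def scale_factor(samples, ceiling=100.0):
    """Largest power of ten that keeps the series peak at or below `ceiling`."""
    peak = max(samples)
    if peak <= 0:
        return 1.0  # nothing to scale
    return 10.0 ** math.floor(math.log10(ceiling / peak))


# Available memory in MB: 620 is quoted on this page, the others are invented.
available_mb = [900, 780, 620]
factor = scale_factor(available_mb)
scaled = [value * factor for value in available_mb]
print(factor, scaled)
```

A series that already peaks between 10 and 100, such as CPU utilization in percent, gets a factor of 1 and is plotted unchanged, which is why counters with very different units can share one axis.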

Processor Performance

The factor most limiting the number of simultaneous builds appears to be processor performance. Sons-clearcase is a dual-CPU machine with two 1 GHz Pentium processors. The graph below shows the combined CPU load.

Processor Performance

As can be seen, at 4 simultaneous builds the CPUs are ~88% busy. At 5 builds utilization ranges from 88% to 92%. Above 5 builds the processors appear to become saturated. Also, at 5 simultaneous builds, build times increased by ~150% but builds still finished in under 9 minutes.

Memory Performance

Memory does not appear to be much of a concern. Sons-clearcase has 1 GB of memory, and even at 10 simultaneous builds there was still 620 MB available.

Memory Performance

Disk Performance

Disk activity was spiky but, on average, did not increase much even at 10 builds.

Disk Performance

Network Performance

Network performance was worrisome. I have investigated how best to optimize remote building on sons-clearcase, and it's clear that network reads and writes should be avoided as much as possible. I've documented my findings and implemented optimizations to smake (Smake Optimizations). Still, this graph shows that network performance degraded as processor utilization increased: the more builds ran simultaneously, the more the network slowed down. I do not know why this is so, as I have striven to make smake as local as possible. My only thought is that some network utilization arises simply because smake uses rsh(1), though that does not explain this graph. Perhaps Clearcase is still causing some network overhead.

Network Performance