1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3 <html xmlns="http://www.w3.org/1999/xhtml" id="sixapart-standard">
5 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
6 <meta name="generator" content="Movable Type 5.2.3" />
8 <link rel="stylesheet" href="http://defaria.com/blogs/Status/styles-site.css" type="text/css" />
9 <link rel="alternate" type="application/atom+xml" title="Atom" href="http://defaria.com/blogs/Status/atom.xml" />
<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://defaria.com/blogs/Status/index.xml" />
12 <title>Status for Andrew DeFaria: September 24, 2006 - September 30, 2006 Archives</title>
14 <link rel="start" href="http://defaria.com/blogs/Status/" title="Home" />
15 <link rel="prev" href="http://defaria.com/blogs/Status/archives/week_2006_09_17.html" title="September 17, 2006 - September 23, 2006" />
16 <link rel="next" href="http://defaria.com/blogs/Status/archives/week_2006_10_01.html" title="October 1, 2006 - October 7, 2006" />
18 <body class="layout-one-column">
20 <div id="container-inner" class="pkg">
23 <div id="banner-inner" class="pkg">
24 <h1 id="banner-header"><a href="http://defaria.com/blogs/Status/" accesskey="1">Status for Andrew DeFaria</a></h1>
25 <h2 id="banner-description">Searchable status reports and work log</h2>
30 <div id="pagebody-inner" class="pkg">
32 <div id="alpha-inner" class="pkg">
34 <p class="content-nav">
35 <a href="http://defaria.com/blogs/Status/archives/week_2006_09_17.html">« September 17, 2006 - September 23, 2006</a> |
36 <a href="http://defaria.com/blogs/Status/">Main</a>
37 | <a href="http://defaria.com/blogs/Status/archives/week_2006_10_01.html">October 1, 2006 - October 7, 2006 »</a>
43 <h2 class="date-header">September 29, 2006</h2>
45 <div class="entry" id="entry-575">
46 <h3 class="entry-header">Load Balancing Redirection</h3>
47 <div class="entry-content">
48 <div class="entry-body">
50 <li>Implemented a load balancing redirection scheme for cqweb</li>
53 <h2>Load Balancing CQ Web Servers based on Number of CQ Web Users</h2>
<p>The task at hand was to write a redirector that load balances across a number of CQ Web servers based on the number of CQ Web users currently on each server. Additionally, depending on how the user came into the CQ Web server farm, the redirector should send them to the proper schema.</p>
57 <h3>Determining Load</h3>
<p>The old IIS CQ Web Server used to allow you to query the number of active CQ Web users. The new Apache/Tomcat server only allows admins to do this. Additionally, the admin needs to be logged in and thus must have a valid token. IBM/Rational suggests using Apache's server-status URL to determine load. However, that only displays the number of Apache requests in progress, not the number of CQ Web users.</p>
61 <p>If ExtendedStatus is turned on then Apache lists each connection and the URL they are working on. By filtering "GET /cqweb" we can get a rough estimate of the number of CQ Web Users. There is a problem in that the redirector script cannot query the same web server that it's running on. Additionally this information can only be obtained if ExtendedStatus is turned on.</p>
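<p>The filtering idea above can be sketched as follows. This is a minimal illustration in Python, not the redirector's actual Perl code. The function name <tt>count_cq_users</tt> is hypothetical, and it assumes that the server-status page lists each connection's request line as plain text (for example "GET /cqweb/... HTTP/1.1") when ExtendedStatus is on. Fetching the page from the server is left out.</p>

```python
import re

# Matches a request line such as "GET /cqweb/login HTTP/1.1" anywhere in the
# page text and captures the method and the path.
REQUEST_RE = re.compile(r"\b(GET|POST|HEAD)\s+(/\S*)")


def count_cq_users(status_text):
    """Return a rough CQ Web user count from server-status output.

    Returns None when no request lines are present at all, which this
    sketch takes to mean that ExtendedStatus is off (an assumption).
    """
    requests = REQUEST_RE.findall(status_text)
    if not requests:
        return None
    # Count only "GET /cqweb..." requests, as described in the text.
    return sum(
        1 for method, path in requests
        if method == "GET" and path.startswith("/cqweb")
    )
```

<p>Returning <tt>None</tt> rather than 0 keeps "ExtendedStatus is off" distinct from "ExtendedStatus is on but nobody is using CQ Web", which matters for the server selection described next.</p>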
63 <h2>Algorithm for selecting a server</h2>
<p>The algorithm for selecting a lightly loaded server is as follows:</p>
<p>Pick a lightly loaded server out of the pool. Note that if a server is not running with ExtendedStatus on then $cq_users will be undef. This is different from the case where the server has ExtendedStatus on but there just aren't any CQ Web users (which would be denoted by $cq_users = 0). Thus we may have the condition where:</p>
69 <table cellspacing=0 cellpadding=2 border=1>
74 <th>ExtendedStatus</th>
<p>In such a case we wish to pick server3 since it has no current CQ Web users.</p>
<p>The algorithm used here is to remove from the pool all servers that are not running with ExtendedStatus on, since we cannot reliably tell how loaded such a server is from the standpoint of CQ Web users. If, however, no servers have ExtendedStatus on (thus all $cq_users return as undef) then we consider $nbr_apache_requests instead. In other words, $nbr_apache_requests is not equivalent to $cq_users, so the two cannot be compared with each other. But if no server is running with ExtendedStatus on we need to pick something!</p>
103 <p><b>Note:</b> If $random then a server is simply randomly chosen.</p>
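<p>The selection rules above can be sketched as follows. This is a Python illustration under stated assumptions rather than the actual Perl implementation. The function name <tt>pick_server</tt>, the <tt>use_random</tt> flag (standing in for $random) and the shape of the <tt>pool</tt> dictionary are hypothetical, and <tt>None</tt> plays the role of Perl's undef.</p>

```python
import random


def pick_server(pool, use_random=False):
    """Pick a server name from pool.

    pool maps a server name to a (cq_users, nbr_apache_requests) pair,
    where cq_users is None when ExtendedStatus is off on that server.
    """
    if not pool:
        return None
    if use_random:
        # Random mode ignores load entirely.
        return random.choice(sorted(pool))
    # Keep only servers that report a CQ Web user count.
    with_status = {
        name: load[0] for name, load in pool.items() if load[0] is not None
    }
    if with_status:
        return min(with_status, key=with_status.get)
    # No server has ExtendedStatus on: fall back to Apache request counts.
    return min(pool, key=lambda name: pool[name][1])
```

<p>With made-up loads such as <tt>{"server1": (None, 1), "server2": (5, 10), "server3": (0, 20)}</tt> this picks server3: server1 is dropped because its user count is unknown, even though it has the fewest Apache requests.</p>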
105 <p>Unfortunately, given this algorithm, if we had the following situation:</p>
107 <table cellspacing=0 cellpadding=2 border=1>
112 <th>ExtendedStatus</th>
137 <p>Then this algorithm will always return server4.</p>
<p><b><font color=red>Important Note:</font></b> The web server doing the redirection cannot be queried. Attempting to do so hangs! Therefore it cannot participate in the server pool. It is recommended that another web server be set up as the redirector and the DNS name cqweb assigned to it. This web server can, however, participate by being a ClearQuest Request Manager.</p>
141 <h3>Random Redirection</h3>
<p>The script can also redirect randomly instead of relying on the load of CQ Web users. Currently there are 3 servers in the pool. Only one of them has ExtendedStatus turned on. As such, redirecting by load will always resolve to the one server running with ExtendedStatus on. This is not good. So currently it just picks a server randomly from the pool. This behavior is controlled by the <tt>lb</tt> parameter (which currently defaults to off, meaning a server is picked randomly).</p>
145 <h3>Defining the Server Pool</h3>
<p>The server pool is defined by a small file, servers.cfg, which simply lists the servers participating in the pool. Servers can be added or removed dynamically.</p>
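<p>A minimal sketch of reading such a pool file, in Python for illustration. The exact layout of servers.cfg is not shown here, so this assumes one server name per line and, as a further assumption, treats "#" as starting a comment. The function names are hypothetical.</p>

```python
def parse_server_pool(text):
    """Return the list of server names found in servers.cfg-style text."""
    servers = []
    for line in text.splitlines():
        # Drop anything after '#' (assumed comment syntax) and trim blanks.
        name = line.split("#", 1)[0].strip()
        if name:
            servers.append(name)
    return servers


def load_server_pool(path="servers.cfg"):
    """Read the pool file from disk each time it is called."""
    with open(path, encoding="utf-8") as handle:
        return parse_server_pool(handle.read())
```

<p>Re-reading the file on every request is one simple way to let servers be added or removed without restarting anything.</p>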
149 <h3>Mapping Redirection</h3>
<p>In the past users went to http://cqweb/&lt;area&gt;. These &lt;area&gt; entries were HTML files in the DocumentRoot that redirected to a series of redirection scripts. It was hoped that HTTP_REFERER could be used to determine where to redirect the visitor. Unfortunately HTTP_REFERER is not guaranteed to be set, and indeed it is undefined on these web servers!</p>
<p>Instead one must specify the <tt>group</tt> parameter to the redirector script. The script then maintains a map from &lt;area&gt; to schema/context ID. If the group is not specified or is not in the map, the user is redirected to the main login page. This is not viewed as a hardship because we need redirecting &lt;area&gt; files anyway. The new form of the redirecting &lt;area&gt; file is:</p>
155 <div class=code><pre>
&lt;meta http-equiv="refresh" content="0; url=http://cqweb.itg.ti.com/cgi-bin/redirect.pl?group=&lt;<i>area</i>&gt;"&gt;
162 <h3>Redirect Map</h3>
164 <p>The redirect map, stored in redirect.map, is a file of key/value pairs. For example:</p>
166 <div class=code><pre>
167 CMDT: &schema=CMDT.2003.06.00&contextid=CMDT
168 CSSD: &schema=omap.2002.05.00&contextid=OMAPS
169 DLP-Play: &schema=DLP.2003.06.00&contextid=Play
170 DLP: &schema=DLP.2003.06.00&contextid=DLP
171 DMD-p: &schema=DLP.2003.06.00&contextid=DMD-p
172 DMD: &schema=DLP.2003.06.00&contextid=DMD
173 GCM: &schema=CMDT.2003.06.00&contextid=GCM
174 HPALP: &schema=HPA_MKT_LP&contextid=HPALP
175 LDM: &schema=CMDT.2003.06.00&contextid=LDM
176 NV: &schema=CMDT.2003.06.00&contextid=NV
177 SDO: &schema=SDS.2003.06.00&contextid=SDSCM
178 SDO_TEST: &schema=SDS_TST_DEV&contextid=SDSCM
179 WiMax: &schema=WiMax.SR5&contextid=WiMax
180 mDTV: &schema=mDTV.2003.06.00&contextid=MDTV
181 mDTV_play: &schema=mDTV.2003.06.00&contextid=PLAY
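<p>Reading a map in this format can be sketched as follows. This is a Python illustration, not the code in redirect.pl, and the function names are hypothetical. It assumes each line is a key, a colon, and a query-string fragment, as in the example above.</p>

```python
from urllib.parse import parse_qs


def parse_redirect_map(text):
    """Return a dict mapping each group key to its query-string fragment."""
    mapping = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)
        mapping[key.strip()] = value.strip()
    return mapping


def lookup_group(mapping, group):
    """Return the fragment for group, or None to mean the main login page."""
    if not group:
        return None
    return mapping.get(group)


def split_fragment(fragment):
    """Break '&schema=X&contextid=Y' into a dict of single values."""
    parsed = parse_qs(fragment.lstrip("&"))
    return {name: values[0] for name, values in parsed.items()}
```

<p>A missing or unknown group yields <tt>None</tt>, which the caller would treat as "send the user to the main login page".</p>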
184 <h3>Parameters for redirect.pl</h3>
186 <p>The following parameters, specified in the URL, are supported by redirect.pl:</p>
190 <dd>Specifies the key into the redirect.map for the schema/contextid. If not specified then defaults to main login page of the selected server</dd>
193 <dd>If set then load balancing is attempted based on ExtendedStatus and CQ Web Users as described above. Default: undefined (off)</dd>
<dd>If specified, the user is not redirected; instead, debugging information is output.</dd>
199 <p class="entry-footer">
<span class="post-footers">Posted at 2:35 PM</span> <span class="separator">|</span> <a class="permalink" href="http://defaria.com/blogs/Status/archives/000575.html">Permalink</a>
210 <h2 class="date-header">September 28, 2006</h2>
212 <div class="entry" id="entry-574">
213 <h3 class="entry-header">JVM Stack/Heap Sizes</h3>
214 <div class="entry-content">
215 <div class="entry-body">
217 <li>Looked into JVM stack and heap sizes on dfls83-85</li>
220 <p>As you know there have been service interruptions in CQWeb. I keep looking at the logs for clues. About the only consistent thing is an error similar to this:</p>
222 <div class=code><pre>
223 2006-09-28 00:10:20 Ajp13Processor[8009][15] process: invoke
224 java.net.SocketException: Connection reset by peer: socket write error
225 at java.net.SocketOutputStream.socketWrite0(Native Method)
226 at java.net.SocketOutputStream.socketWrite(Unknown Source)
227 at java.net.SocketOutputStream.write(Unknown Source)
228 at org.apache.ajp.Ajp13.send(Ajp13.java:525)
229 at org.apache.ajp.RequestHandler.finish(RequestHandler.java:495)
230 at org.apache.ajp.Ajp13.finish(Ajp13.java:395)
231 at org.apache.ajp.tomcat4.Ajp13Response.finishResponse(Ajp13Response.java:196)
232 at org.apache.ajp.tomcat4.Ajp13Processor.process(Ajp13Processor.java:464)
233 at org.apache.ajp.tomcat4.Ajp13Processor.run(Ajp13Processor.java:551)
234 at java.lang.Thread.run(Unknown Source)
<p>Now "Connection reset by peer" could be an error that the process gets because the service has stopped, so this could be more of a symptom than a cause. However searching for "Ajp13Processor socket write error" points me to <a href="http://mail-archives.apache.org/mod_mbox/tomcat-users/200203.mbox/%3C751828609.20020327090907@e-box.dk%3E">this
post</a> which suggests increasing the stack and heap sizes for the JVM. Intermittent problems are consistent with running out of stack or heap space.</p>
<p>According to the ClearQuest Web Administration Guide:</p>
243 <hr size="2" width="100%">
244 <h3>Controlling Java VM Memory Consumption </h3>
246 <p>You can configure the memory consumption of Java processes used by New ClearQuest Web by adjusting the parameters in property files under the various components.</p>
250 <p>This section describes the configuration changes for New ClearQuest Web Java VM memory consumption for processes running on Microsoft Windows. To specify the VM memory consumption:</p>
253 <li>Open the appropriate configuration file for the New ClearQuest Web component whose memory consumption you want to reconfigure. For the ClearQuest Web application:
255 <table border="1" cellpadding="2" cellspacing="0" width="100%">
258 <th align="left" valign="top">Component</th>
259 <th align="left" valign="top">Configuration file</th>
262 <td valign="top">Apache Tomcat Server</td>
263 <td valign="top">C:\Program
264 Files\Rational\Common\rwp\bin\jk_service2.in.properties</td>
267 <td valign="top">Rational Web Platform</td>
268 <td valign="top">C:\Program
269 Files\Rational\Common\rwp\bin\jk_service2.properties</td>
274 <p>For the ClearQuest server:</p>
275 <table border="1" cellpadding="2" cellspacing="0" width="100%">
278 <th align="left" valign="top">Component</th>
279 <th align="left" valign="top">Configuration file</th>
282 <td valign="top">ClearQuest Request Manager</td>
283 <td valign="top">C:\Program
284 Files\Rational\ClearQuest\cqweb\cqserver\requestmgr_service.properties</td>
287 <td valign="top">ClearQuest Registry Server</td>
288 <td valign="top">C:\Program
289 Files\Rational\ClearQuest\cqweb\cqregsvr\cqregsvr_service.properties</td>
294 <li>Modify the section shown below:<br>
296 <div class=code><pre>
299 # -Xms2m = Initial heap size, modify for desired size
300 # -Xmx256m = Maximum heap size, modify for desired size
301 # -Xrs = Available in Jdk1.3.1 to avoid JVM termination during logoff
303 wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
306 <hr size="2" width="100%">
309 <p>I looked at these config files on the three machines (dfls83-85) and they were pretty much set to the default:</p>
311 <div class=code><pre>
<font color=blue><b>Ltx0062320:</b></font><u>for server in 83 84 85; do grep \
wrapper.jvm.options= //dfls$server/Rational/Common/rwp/bin/jk_service2*properties \
//dfls$server/Rational/ClearQuest/cqweb/cqserver/requestmgr_service.properties \
//dfls$server/Rational/ClearQuest/cqweb/cqregsvr/cqregsvr_service.properties;
done</u>
318 //dfls83/Rational/Common/rwp/bin/jk_service2.default.properties:wrapper.jvm.options=-Xrs
320 //dfls83/Rational/Common/rwp/bin/jk_service2.in.properties:wrapper.jvm.options=-Xrs
322 //dfls83/Rational/Common/rwp/bin/jk_service2.properties:wrapper.jvm.options=-Xrs
324 //dfls83/Rational/ClearQuest/cqweb/cqserver/requestmgr_service.properties:wrapper.jvm.options=-Xrs<br>
325 //dfls83/Rational/ClearQuest/cqweb/cqregsvr/cqregsvr_service.properties:wrapper.jvm.options=-Xrs<br>
326 //dfls84/Rational/Common/rwp/bin/jk_service2.default.properties:wrapper.jvm.options=-Xrs
328 //dfls84/Rational/Common/rwp/bin/jk_service2.in.properties:wrapper.jvm.options=-Xrs
330 //dfls84/Rational/Common/rwp/bin/jk_service2.properties:wrapper.jvm.options=-Xrs
332 //dfls84/Rational/ClearQuest/cqweb/cqserver/requestmgr_service.properties:wrapper.jvm.options=-Xrs<br>
333 //dfls84/Rational/ClearQuest/cqweb/cqregsvr/cqregsvr_service.properties:wrapper.jvm.options=-Xrs<br>
334 //dfls85/Rational/Common/rwp/bin/jk_service2.default.properties:wrapper.jvm.options=-Xrs
336 //dfls85/Rational/Common/rwp/bin/jk_service2.in.properties:wrapper.jvm.options=-Xrs
338 //dfls85/Rational/Common/rwp/bin/jk_service2.properties:wrapper.jvm.options=-Xrs
340 //dfls85/Rational/ClearQuest/cqweb/cqserver/requestmgr_service.properties:wrapper.jvm.options=-Xrs<br>
341 //dfls85/Rational/ClearQuest/cqweb/cqregsvr/cqregsvr_service.properties:wrapper.jvm.options=-Xrs<br>
<p>All these machines have 2 GB of main memory and largely just serve CQWeb. Indeed the CQWeb service processes are consuming most of the memory:</p>
347 <div align="center"><img title="dfls83" src="/blogs/Status/images/dfls83.jpg" alt="" height="541" width="524">
349 <img title="dfls84" src="/blogs/Status/images/dfls84.jpg" alt="" height="545" width="525">
352 <img title="dfls85" src="/blogs/Status/images/dfls85.jpg" alt="" height="577" width="528">
356 <p>I think we should try setting at least the following:</p>
358 <div class=code><pre>
359 wrapper.jvm.options=-Xrs -Xms128m -Xmx512m
<p>A restart of all CQ Web services would probably be needed for the changes to take effect. The above settings start the JVM at 128 MB for each of the 4 processes, for a total initial footprint of 512 MB, and limit each process to 512 MB, for a total maximum footprint of 2 GB. We might want to bounce this idea off IBM/Rational support to see whether all 4 processes should have the same settings or whether we should vary them.</p>
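<p>The arithmetic behind those totals can be checked with a short sketch. This is a Python illustration. The function names are hypothetical, and it only handles sizes written with an "m" suffix as in the options line above.</p>

```python
import re


def parse_heap_options(options):
    """Return (initial_mb, max_mb) from a wrapper.jvm.options value.

    Only the -Xms<N>m and -Xmx<N>m forms are recognized; a missing
    option is reported as None.
    """
    initial = re.search(r"-Xms(\d+)m\b", options)
    maximum = re.search(r"-Xmx(\d+)m\b", options)
    return (
        int(initial.group(1)) if initial else None,
        int(maximum.group(1)) if maximum else None,
    )


def total_heap_mb(process_count, per_process_mb):
    """Total heap across identical processes, in megabytes."""
    return process_count * per_process_mb


initial_mb, max_mb = parse_heap_options("-Xrs -Xms128m -Xmx512m")
initial_total = total_heap_mb(4, initial_mb)  # 4 x 128 = 512 MB
max_total = total_heap_mb(4, max_mb)          # 4 x 512 = 2048 MB
```

<p>Note that the maximum total equals the machines' entire 2 GB of main memory, and a JVM process uses some memory beyond its heap, so the full limits would not all fit in RAM at once.</p>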
365 <p class="entry-footer">
<span class="post-footers">Posted at 11:53 AM</span> <span class="separator">|</span> <a class="permalink" href="http://defaria.com/blogs/Status/archives/000574.html">Permalink</a>
376 <h2 class="date-header">September 27, 2006</h2>
378 <div class="entry" id="entry-570">
379 <h3 class="entry-header">OMAPS.pm bug</h3>
380 <div class="entry-content">
381 <div class="entry-body">
383 <li>Tracked down and fixed minor bug in OMAPS</li>
386 <p>I found a minor bug in OMAPS.pm that is called from the CSSD ClearQuest Account Creation page (http://dfls85/cgi-bin/create.pl). The error appears in the log files as:</p>
388 <div class=code><pre>
389 [Wed Sep 27 10:52:39 2006] [error] [client 128.247.39.85] [Wed Sep 27 10:52:39 2006]
390 create.pl: Useless use of concatenation (.) or string in void context at OMAPS.pm line 318, <CNF> line 139.
393 <p>Line 318 of OMAPS.pm is:</p>
394 <div class=code><pre>
395 debug ("add user $data->{login_name} to team "<u><font color="#ff0000"><b>)</b></font></u> . $cgi->param("Team");
397 But it should read:<br>
398 <div class=code><pre>
399 debug ("add user $data->{login_name} to team " . $cgi->param("Team")<u><font color="#009900"><b>)</b></font></u>;
<p>As we are watching the log files carefully for signs of ClearQuest Web hangs and outages, it would be helpful if this superfluous error were eliminated.</p>
404 <p>I fixed this by hand on dfls[83-85] but it should be fixed in the original.</p>
406 <p class="entry-footer">
<span class="post-footers">Posted at 9:08 AM</span> <span class="separator">|</span> <a class="permalink" href="http://defaria.com/blogs/Status/archives/000570.html">Permalink</a>
417 <h2 class="date-header">September 26, 2006</h2>
419 <div class="entry" id="entry-569">
420 <h3 class="entry-header">CQ log files</h3>
421 <div class="entry-content">
422 <div class="entry-body">
424 <li>Looked into yet another hang up with CQ web servers</li>
429 <p>We get a lot of errors in the logs of the form:</p>
431 <div class=code><pre>
432 [Tue Sep 26 11:08:51 2006] [error] [client 128.247.39.85] File does not exist:
433 C:/Program Files/Rational/Common/rwp/webapps/cqweb/dct/html/images, referer:
434 http://dfls85.itg.ti.com/cqweb/dct/html/download_en.html
435 [Tue Sep 26 11:08:51 2006] [error] [client 128.247.39.85] File does not exist:
436 C:/Program Files/Rational/Common/rwp/webapps/cqweb/dct/html/images, referer:
437 http://dfls85.itg.ti.com/cqweb/dct/html/download_en.html
<p>These errors are not a big deal except that they clutter the log files with noise that you need to skip over all the time. I decided to look into this and see where they were coming from. In the file .../rwp/webapps/wre/common/script/common.js there appeared the following code:</p>
442 <div class=code><pre>
443 var arrowOff=new Image();
444 arrowOff.src="images/shim.gif";
445 var arrowOn=new Image();
446 arrowOn.src="images/arrow_red.gif" ;
449 <p>This appears to be causing the problem so I updated that JavaScript to:</p>
451 <div class=code><pre>
452 var arrowOff=new Image();
453 arrowOff.src="/wre/common/images/shim.gif";
454 var arrowOn=new Image();
455 arrowOn.src="/wre/common/images/arrow_red.gif" ;
<p>I'm not sure if this is a Rational problem or something that TI has done, but with the above fix the error seems to go away. Well, at least for me. I suspect others are still generating the error because JavaScript is cached by the browser. Hopefully as people restart their browsers this will go away.</p>
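<p>One plausible explanation for why the relative paths fail is that a browser resolves them against the URL of the page that loads the script, not against the location of common.js. The sketch below uses Python's standard <tt>urljoin</tt> to show that resolution. It takes the page URL from the referer in the log entries above and is only an illustration of URL resolution, not a confirmed diagnosis.</p>

```python
from urllib.parse import urljoin

# The page named as the referer in the error log entries.
PAGE_URL = "http://dfls85.itg.ti.com/cqweb/dct/html/download_en.html"

# A relative path is resolved against the directory of the page.
relative_result = urljoin(PAGE_URL, "images/shim.gif")

# A path starting with "/" is resolved against the server root instead.
absolute_result = urljoin(PAGE_URL, "/wre/common/images/shim.gif")

print(relative_result)
print(absolute_result)
```

<p>The relative form lands under /cqweb/dct/html/images/, which matches the directory the log reports as missing, while the root-relative form points at /wre/common/images/ regardless of which page includes the script.</p>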
460 <p>Additionally the following error is still appearing in the logs:</p>
462 <div class=code><pre>
463 [Mon Sep 25 20:31:06 2006] [error] [client 172.24.80.20] File does not exist:
464 C:/Program Files/Rational/Common/rwp/htdocs/favicon.ico
<p>I've put a favicon.ico in the proper area on dfls85. The error seems to have diminished; however, I don't see a favicon in the browser, so I'm not sure if this is working.</p>
469 <p>The two "fixes" above will need to be replicated to the other servers (dfls83 and 84) at some time.</p>
471 <p>Finally another error shows up:</p>
473 <div class=code><pre>
474 [Tue Sep 26 18:31:18 2006] [error] [client 128.247.39.85] [Tue Sep 26 18:31:18 2006] create.pl:
475 Useless use of concatenation (.) or string in void context at OMAPS.pm line 318, <CNF> line 139.
<p>This seems to be an error in CSSD's code, which is located under .../Rational/cgi-bin. Going to <a href="http://dfls85/cgi-bin/create.pl">http://dfls85/cgi-bin/create.pl</a> first redirects me to TI's authentication web page but then back to, in my case, <a href="http://dfls85/cgi-bin/create.pl?stoken=nJAROWWcvXf2s4lwHoGETsko1vphx%2fLqHxNu7IU4L6axrg6DrEJiVoIi8ACLnoycJTlPsGJlCDLr3MhbcRx4DXgvut4ea6d%2bU%2bWzQR3oBpa6NsxSH7EPctkp96%2b9UkgIHfwh%2fWYSI0pO1wu6uHVcGgpT5tAuSpF0a3vBMgRZN8hoU0TPYJEQ9z6nNI99fKhjRh6SRNDmDdM%3d">here</a>. Going to that page generates this error in the error log every time.</p>
480 <p class="entry-footer">
<span class="post-footers">Posted at 4:32 PM</span> <span class="separator">|</span> <a class="permalink" href="http://defaria.com/blogs/Status/archives/000569.html">Permalink</a>