Load Balancing Redirection
- Implemented a load balancing redirection scheme for cqweb
Load Balancing CQ Web Servers based on Number of CQ Web Users
The task at hand was to write a redirector that load balances among a number of CQ Web servers based on the number of CQ Web users currently on each server. Additionally, the redirector should send each user to the proper schema based on how they came into the CQ Web server farm.
Determining Load
The old IIS CQ Web Server allowed you to query the number of active CQ Web users. The new Apache/Tomcat server only allows admins to do this, and the admin needs to be logged in and therefore have a valid token. IBM/Rational suggests using Apache's server-status URL to determine load. However, that only displays the number of Apache requests in progress, not the number of CQ Web users.
If ExtendedStatus is turned on, then Apache lists each connection and the URL it is working on. By filtering for "GET /cqweb" we can get a rough estimate of the number of CQ Web users. There are two problems with this: the redirector script cannot query the same web server that it's running on, and the information is only available on servers that have ExtendedStatus turned on.
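The filtering step can be sketched in Python. This is not the actual redirector. It assumes that mod_status prints a hint mentioning "ExtendedStatus On" only when ExtendedStatus is off, which you should verify against your Apache version. It returns None, the equivalent of Perl's undef, when no per-request data is available, and it returns 0 when ExtendedStatus is on but nobody is using CQ Web:

```python
import re
from typing import Optional
from urllib.request import urlopen

# Request lines for the CQ Web application, as listed per worker
# in the server-status table when ExtendedStatus is On.
CQWEB_REQUEST = re.compile(r"GET /cqweb")

# Assumption: mod_status prints a hint mentioning "ExtendedStatus On"
# only when ExtendedStatus is off. Check this against your Apache version.
EXTENDED_STATUS_HINT = "ExtendedStatus On"


def count_cq_users(status_html: str) -> Optional[int]:
    """Rough CQ Web user count from an Apache server-status page.

    Returns None (Perl's undef) when ExtendedStatus is off, which is
    different from 0 (ExtendedStatus on, but no CQ Web requests)."""
    if EXTENDED_STATUS_HINT in status_html:
        return None
    return len(CQWEB_REQUEST.findall(status_html))


def fetch_status(host: str, timeout: float = 5.0) -> Optional[str]:
    """Fetch http://<host>/server-status, or None if it can't be reached.

    Don't call this for the host the redirector itself runs on."""
    try:
        with urlopen(f"http://{host}/server-status", timeout=timeout) as resp:
            return resp.read().decode("latin-1")
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return None
```

A caller would first run `page = fetch_status("dfls84")` and then call `count_cq_users(page)` only when `page` is not None. An unreachable server therefore also ends up as None.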
Algorithm for selecting a server
The algorithm for selecting a server that is not busy is:
Pick a lightly loaded server out of the pool. Note that if a server is not running with ExtendedStatus on, then $cq_users will be undef. This is different from the case where the server has ExtendedStatus on but there just aren't any CQ Web users (which would be denoted by $cq_users = 0). Thus we may have the condition where:
Server | cq_users | ExtendedStatus |
---|---|---|
server1 | undef | off |
server2 | 20 | on |
server3 | 0 | on |
server4 | 10 | on |
Or the condition where:
Server | cq_users | ExtendedStatus |
---|---|---|
server1 | undef | on |
server2 | undef | off |
server3 | undef | off |
server4 | 10 | on |
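One way to make the selection, sketched in Python, assumes the rule is "fewest known CQ Web users wins, ties broken randomly". Under that rule, servers with an unknown (undef/None) count are chosen at random only when no server has a known count. This is an assumed rule for illustration, not necessarily the one the redirector uses:

```python
import random
from typing import Dict, Optional


def pick_server(cq_users: Dict[str, Optional[int]],
                rng: Optional[random.Random] = None) -> str:
    """Pick a lightly loaded server.

    cq_users maps server name -> CQ Web user count, or None when the count
    is unknown (ExtendedStatus off, or the server couldn't be queried)."""
    if not cq_users:
        raise ValueError("server pool is empty")
    rng = rng or random.Random()
    known = {server: n for server, n in cq_users.items() if n is not None}
    if known:
        fewest = min(known.values())
        # Break ties randomly so equally idle servers share new users.
        return rng.choice(sorted(s for s, n in known.items() if n == fewest))
    # No counts at all: any server is as good a guess as another.
    return rng.choice(sorted(cq_users))
```

With the first table this picks server3 (0 users). With the second, server4 is the only server with a known count, so it wins. The trade-off is that a server with an unknown count is never preferred over a known one, even though it might be idle.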
For the web server:
Component | Configuration file |
---|---|
Apache Tomcat Server | C:\Program Files\Rational\Common\rwp\bin\jk_service2.in.properties |
Rational Web Platform | C:\Program Files\Rational\Common\rwp\bin\jk_service2.properties |
For the ClearQuest server:
Component | Configuration file |
---|---|
ClearQuest Request Manager | C:\Program Files\Rational\ClearQuest\cqweb\cqserver\requestmgr_service.properties |
ClearQuest Registry Server | C:\Program Files\Rational\ClearQuest\cqweb\cqregsvr\cqregsvr_service.properties |
#
# JVM Options
#
# Useful Options:
#   -Xms2m = Initial heap size, modify for desired size
#   -Xmx256m = Maximum heap size, modify for desired size
#   -Xrs = Available in Jdk1.3.1 to avoid JVM termination during logoff
#
wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
I looked at these config files on the three machines (dfls83-85) and they were pretty much set to the default:
Ltx0062320:for server in 83 84 85; do grep wrapper.jvm.options= //dfls$server/Rational/Common/rwp/bin/jk_service2*properties //dfls$server/Rational/ClearQuest/cqweb/cqserver/requestmgr_service.properties //dfls$server/Rational/ClearQuest/cqweb/cqregsvr/cqregsvr_service.properties; done
//dfls83/Rational/Common/rwp/bin/jk_service2.default.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls83/Rational/Common/rwp/bin/jk_service2.in.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls83/Rational/Common/rwp/bin/jk_service2.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls83/Rational/ClearQuest/cqweb/cqserver/requestmgr_service.properties:wrapper.jvm.options=-Xrs
//dfls83/Rational/ClearQuest/cqweb/cqregsvr/cqregsvr_service.properties:wrapper.jvm.options=-Xrs
//dfls84/Rational/Common/rwp/bin/jk_service2.default.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls84/Rational/Common/rwp/bin/jk_service2.in.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls84/Rational/Common/rwp/bin/jk_service2.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls84/Rational/ClearQuest/cqweb/cqserver/requestmgr_service.properties:wrapper.jvm.options=-Xrs
//dfls84/Rational/ClearQuest/cqweb/cqregsvr/cqregsvr_service.properties:wrapper.jvm.options=-Xrs
//dfls85/Rational/Common/rwp/bin/jk_service2.default.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls85/Rational/Common/rwp/bin/jk_service2.in.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls85/Rational/Common/rwp/bin/jk_service2.properties:wrapper.jvm.options=-Xrs -Xms2m -Xmx256m
//dfls85/Rational/ClearQuest/cqweb/cqserver/requestmgr_service.properties:wrapper.jvm.options=-Xrs
//dfls85/Rational/ClearQuest/cqweb/cqregsvr/cqregsvr_service.properties:wrapper.jvm.options=-Xrs
All these machines have 2 GB of main memory and largely just serve CQ Web. Indeed, the CQ Web service processes are consuming most of the memory:
![dfls83](/blogs/Status/images/dfls83.jpg)
![dfls84](/blogs/Status/images/dfls84.jpg)
![dfls85](/blogs/Status/images/dfls85.jpg)
I think we should try setting at least the following:
wrapper.jvm.options=-Xrs -Xms128m -Xmx512m
A restart of all CQ Web services would probably be needed for the changes to take effect. The above settings start each JVM at 128 MB, so the 4 processes start with a total footprint of 512 MB. They also limit each process to a maximum of 512 MB, for a total footprint of 2 GB when all are full. We might want to bounce this idea off IBM/Rational support to see whether all 4 processes should have the same settings or whether we should vary them.
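The arithmetic above can be checked with a small Python helper that reads the `-Xms`/`-Xmx` values out of a `wrapper.jvm.options` line. This is a sketch that only understands sizes with a k/m/g suffix; a bare number, which the JVM treats as bytes, is ignored here. It counts heap only. Each JVM also uses non-heap memory, so the real per-process footprint is somewhat higher:

```python
import re
from typing import Optional, Tuple

# Unit suffixes accepted by -Xms/-Xmx, converted to megabytes.
_UNIT_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def heap_mb(options: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (initial, maximum) heap in MB from a wrapper.jvm.options value.

    None means the flag is absent, i.e. the JVM's own default applies."""
    sizes = {}
    for flag, amount, unit in re.findall(r"-Xm([sx])(\d+)([kKmMgG])", options):
        sizes[flag] = int(amount) * _UNIT_MB[unit.lower()]
    return sizes.get("s"), sizes.get("x")


def footprint_mb(options: str, processes: int) -> Tuple[float, float]:
    """Total (initial, maximum) heap for `processes` JVMs with the same options."""
    initial, maximum = heap_mb(options)
    if initial is None or maximum is None:
        raise ValueError("both -Xms and -Xmx must be set")
    return processes * initial, processes * maximum
```

`footprint_mb("-Xrs -Xms128m -Xmx512m", 4)` gives (512, 2048), matching the figures above. Applied to the request manager and registry server lines, which only have `-Xrs`, `heap_mb` returns (None, None), meaning those JVMs run with default heap sizes.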
September 27, 2006
OMAPS.pm bug
- Tracked down and fixed minor bug in OMAPS
I found a minor bug in OMAPS.pm that is called from the CSSD ClearQuest Account Creation page (http://dfls85/cgi-bin/create.pl). The error appears in the log files as:
[Wed Sep 27 10:52:39 2006] [error] [client 128.247.39.85] [Wed Sep 27 10:52:39 2006] create.pl: Useless use of concatenation (.) or string in void context at OMAPS.pm line 318, <CNF> line 139.
Line 318 of OMAPS.pm is:
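debug ("add user $data->{login_name} to team ") . $cgi->param("Team");
Because the parentheses immediately follow debug, Perl takes them as the entire argument list. The concatenation is therefore applied to debug's return value and then thrown away, and the team name never makes it into the debug message. It should be:
debug ("add user $data->{login_name} to team " . $cgi->param("Team"));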
As we are watching the log files carefully for signs of Clearquest web hangs and outages it would be helpful if this superfluous error were eliminated.
I fixed this by hand on dfls[83-85] but it should be fixed in the original.
September 26, 2006
CQ log files
- Looked into yet another hang up with CQ web servers
CQ Web logs
We get a lot of errors in the logs of the form:
[Tue Sep 26 11:08:51 2006] [error] [client 128.247.39.85] File does not exist: C:/Program Files/Rational/Common/rwp/webapps/cqweb/dct/html/images, referer: http://dfls85.itg.ti.com/cqweb/dct/html/download_en.html
[Tue Sep 26 11:08:51 2006] [error] [client 128.247.39.85] File does not exist: C:/Program Files/Rational/Common/rwp/webapps/cqweb/dct/html/images, referer: http://dfls85.itg.ti.com/cqweb/dct/html/download_en.html
The page loads its images with this JavaScript:
var arrowOff = new Image();
arrowOff.src = "images/shim.gif";
var arrowOn = new Image();
arrowOn.src = "images/arrow_red.gif";
This appears to be causing the problem, so I updated that JavaScript to:
var arrowOff = new Image();
arrowOff.src = "/wre/common/images/shim.gif";
var arrowOn = new Image();
arrowOn.src = "/wre/common/images/arrow_red.gif";
I'm not sure if this is a Rational problem or something that TI has done, but with the above fix the error seems to go away, at least for me. I suspect others are still generating the error because the JavaScript is cached by their browsers. Hopefully this will go away as people restart their browsers.
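The relative paths fail because a browser resolves a relative URL against the URL of the page that uses it. On download_en.html, the old "images/shim.gif" therefore points into /cqweb/dct/html/images/, which matches the "File does not exist" path in the log. Python's `urljoin` resolves relative URLs the same way for cases like this, so it offers a quick check. The page URL is taken from the referer in the log entries:

```python
from urllib.parse import urljoin

page = "http://dfls85.itg.ti.com/cqweb/dct/html/download_en.html"

# Relative path: resolved against the page's directory.
old = urljoin(page, "images/shim.gif")
# Absolute path: resolved against the server root.
new = urljoin(page, "/wre/common/images/shim.gif")

print(old)  # http://dfls85.itg.ti.com/cqweb/dct/html/images/shim.gif
print(new)  # http://dfls85.itg.ti.com/wre/common/images/shim.gif
```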
Additionally the following error is still appearing in the logs:
[Mon Sep 25 20:31:06 2006] [error] [client 172.24.80.20] File does not exist: C:/Program Files/Rational/Common/rwp/htdocs/favicon.ico
I've put a favicon.ico in the proper area on dfls85. The error seems to have diminished; however, I don't see a favicon in the browser, so I'm not sure if this is working.
The two "fixes" above will need to be replicated to the other servers (dfls83 and 84) at some time.
Finally, another error shows up:
[Tue Sep 26 18:31:18 2006] [error] [client 128.247.39.85] [Tue Sep 26 18:31:18 2006] create.pl: Useless use of concatenation (.) or string in void context at OMAPS.pm line 318, <CNF> line 139.
This seems to be an error in CSSD's code, which is located under .../Rational/cgi-bin. Going to http://dfls85/cgi-bin/create.pl first redirects me to TI's authentication web page and then back to, in my case, here. Going to that page generates this error in the error log every time.
September 21, 2006
CQ: DMD Date changes
- Looked at DMD requests
- Fixed problem where Needed_Date and Target_Date could not be set to Submit_Date
September 18, 2006
enable_ldaptk
- Started coding a Perl/Tk version of enable_ldap
- Solved the problem of not being able to write to a Samba-mounted home drive. It seems one should not use smbntsec in $CYGWIN when the Samba server is not in the domain
September 15, 2006
enable_ldap
- Added LDAP calls to enable_ldap to check the parms as we go
Integrating LDAP to enable_ldap
I decided it would be good if, as enable_ldap gathers parameters, it checks whether they are correct. It does this by actually making LDAP calls to validate things like the server, port, etc. The goal is for enable_ldap to ensure that the parameters are indeed correct. Unfortunately, this makes enable_ldap dependent on the Net::LDAP module, but I think it's worth it to allow enable_ldap to check the parameters and the mapping the user is describing.
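Before any LDAP call, a cheap first check on the server and port is whether anything accepts TCP connections there at all. The Python sketch below does only that. It cannot tell whether the listener actually speaks LDAP, which still requires a real bind through Net::LDAP or an equivalent library. The host name in the usage line is hypothetical; 389 is the standard LDAP port:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection handles name resolution and IPv4/IPv6 for us.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, unknown host, or timed out
        return False
```

For example, `port_is_open("ldap.example.com", 389)` lets enable_ldap reject a mistyped server or port immediately, with a clearer message than a failed bind would give.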
I still need to tighten up the code where it queries LDAP and attempts to prove to the user that the mapping is correct. As I understand it, you are basically mapping a Clearquest field to an LDAP field so that Clearquest can find the correct record. Once that linkage is established, Clearquest can "pull" the password from LDAP and thus authenticate the user's password against the LDAP password.
That is effectively what enable_ldap does. However, it's not very informative to simply say "The user ID 'foo' was found in the LDAP directory". Rather, I want to say "The user id 'foo' corresponds with '<fullname>'". But does "fullname" always appear exactly as that in LDAP?
Additionally, I need to handle the cases where there's no match or where, say, multiple entries are returned. I'm not sure how the latter can happen unless the user specifies an attribute that can have duplicates or enters a wildcard, e.g. "defaria*".
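Both concerns can be sketched in Python. Escaping the user's input per RFC 4515 makes "defaria*" match literally instead of acting as a wildcard. The search results are then classified as no match, exactly one match (reported with a full name), or ambiguous. The attribute holding the full name is an assumption: `cn` is used as a hypothetical default, since directories differ (cn, displayName, etc.). The search itself would still be done with Net::LDAP:

```python
from typing import Dict, List

# Characters that must be escaped in an LDAP filter value (RFC 4515).
_SPECIAL = {"*", "(", ")", "\\", "\0"}


def escape_filter_value(value: str) -> str:
    """Escape user input so e.g. "defaria*" is matched literally."""
    return "".join("\\%02x" % ord(ch) if ch in _SPECIAL else ch for ch in value)


def build_filter(attribute: str, user_id: str) -> str:
    """Equality filter mapping a Clearquest login to an LDAP attribute."""
    return "(%s=%s)" % (attribute, escape_filter_value(user_id))


def describe_match(user_id: str, entries: List[Dict[str, str]],
                   name_attr: str = "cn") -> str:
    """Turn search results into a message for the user.

    name_attr is a hypothetical choice; the directory may keep the
    full name under a different attribute."""
    if not entries:
        return "No LDAP entry matches user id '%s'" % user_id
    if len(entries) > 1:
        return "User id '%s' is ambiguous: %d LDAP entries match" % (user_id, len(entries))
    full_name = entries[0].get(name_attr, "<no %s attribute>" % name_attr)
    return "The user id '%s' corresponds with '%s'" % (user_id, full_name)
```

For example, `build_filter("uid", "defaria*")` produces `(uid=defaria\2a)`. This filter only matches an entry whose uid really is "defaria*", so a stray wildcard cannot produce multiple entries.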
September 11, 2006
Clearquest License Server
- Investigated Clearquest License server
Time Spent: 3 Hours
Dylan Ko wrote:
I have already turned on the sons-clearcase. As we are busy integrating on several projects now, we can not afford to have sons-clearcase down and thus cripple the ClearQuest and the sync between SC and SH office.
We’ll have to find some other appropriate time to turn off sons-clearcase and look into these issues further. Preferably that time that both sites are off – between 3AM to 9AM PST.
OK, here's what I found out so far. Using adefaria as a test machine, I first checked which FlexLM license server that machine was using by selecting Start: All Programs: Rational Software: Rational License Key Administrator. It was using just sons-clearcase. Next I attempted to talk to Clearquest with both the Clearquest GUI and cqc. Then I stopped the FlexLM service on sons-clearcase. When I started the Clearquest GUI, it complained about no license server. Interestingly, cqc continued to work. This may be because cqc/cqd opens the Clearquest database in read-only mode.
Next I added sons-sc-cc as a FlexLM license server and retested. Both the Clearquest GUI and cqc were able to obtain a license from sons-sc-cc with no problems. I even shut down Clearcase on sons-clearcase and was still able to use the Clearquest GUI and cqc from adefaria with no problems.
I then restarted both FlexLM and Clearcase on sons-clearcase.
Dylan, perhaps you want to test this on your workstation. Try adding sons-sc-cc as a FlexLM license server for your desktop. You can toggle off sons-clearcase as a license server and attempt to access Clearquest. You can then stop the FlexLM service on sons-clearcase and test Clearquest access from your machine again. Finally try shutting down Clearcase on sons-clearcase and retest Clearquest access from your machine.
Adding a FlexLM License Server to your Desktop
- Select Start: All Programs: Rational Software: Rational License Key Administrator
- Select License Keys: License Key Wizard
- Select Next
- Select Advanced Server Options
- Select Add Server
- Click on Values under Settings on the right, type sons-sc-cc in the Server Name column, and press Enter. The server name should change from "New Server" to "sons-sc-cc".
At this point you can toggle either sons-clearcase or sons-sc-cc on or off as a license server provider.
- Select OK to close this dialog box. You should see something like:
Note that there are 3 lines: two serviced by sons-clearcase (Rational ClearCase LT and Rational ClearQuest) and one serviced by sons-sc-cc (Rational ClearQuest). Also note that if you stop FlexLM on sons-clearcase and then refresh or return to the Rational License Key Administrator, licenses served by sons-clearcase will not be listed (since the server cannot be contacted).
All desktops (or Clearquest GUI clients) will have to adjust their FlexLM license key server in the same manner. Also note that, instead of adding a server, you can simply click on the sons-clearcase server, click on Values under Settings on the right, and replace sons-clearcase with sons-sc-cc in the Server Name column.
If the test is successful, then I believe that sons-clearcase can be powered off. Again, I think we should still run this way for about a week, and if things are OK then I can rmreplica the US replicas, leaving only the China (sons-cc) ones and the Santa Clara (sons-sc-cc) ones.
September 10, 2006
Clearquest Install
- Looked into Clearquest install area
Silent Install and Multiple License Servers
I had hoped to be able to use the /g parameter to setup.exe so that the install would be silent and automatic. Clearquest now installs without needing a reboot, but the silent install, while silent, then reboots! Ugh!
I also wanted to be able to specify multiple license servers, as TI uses 3 of them. I hoped that merely updating sitedefs.dat to list all three, comma separated, would work, but it doesn't. Perhaps they need to be space separated? I also need to check siteprep to see if the area can be reprepped with 3 license servers.
September 1, 2006
Lost Packet
- Fixed Multisite problems
Time spent: 2 hours
I looked on sons-sc-cc first for multisite errors. In /apps/Rational/Clearcase/var/logs there are logs regarding multisite. Some of the log files pointed me to look in the Event Viewer. In the Event Viewer I saw things like:
Event Type: Error
Event Source: ClearCase
Event Category: Shipping_server
Event ID: 1024
Date: 7/15/2006
Time: 10:59:43 AM
User: SALIRA\ccadmin
Computer: SONS-SC-CC
Description: shipping_server.exe(4448): Error: unable to contact the albd on host 'sons-cc': timed out trying to communicate with ClearCase remote server
Data: 0000: 60 11 00 00 `...
I remember problems with sons-cc occasionally having its albd_server go wacky and take up 50% of the CPU. That doesn't seem to be the case this time. Then again, the RDP session you have on adefaria -> sons-cc died. Perhaps they rebooted sons-cc; it's been up for about a day now.
I RDPed to sons-cc and CC Doctor complained about a version incompatibility between CC and CQ. Looked around on sons-cc - there's no CQ installed there! Why was it removed?
Hmmm... I RDPed to sons-clearcase. Seems it's only been up 3 hours! Must have been recently rebooted....
It seems that a packet was lost somewhere. The fix is the complicated-to-do and complicated-to-explain procedure of setting epoch numbers back in time so as to replay the transactions and get everyone in the replica family on the same page...
I think I got it all straightened out now...
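The idea behind the epoch fix can be illustrated with a toy model in Python. It is not how MultiSite's tools work internally, and the replica numbers are made up. For each origin replica, the sender keeps an estimate of how many oplogs the receiver has. If a packet is lost, that estimate runs ahead of what the receiver actually has, so the missing oplogs are never sent again. Rolling the estimate back to the receiver's actual values makes the next sync resend them:

```python
from typing import Dict, Tuple


def missing_oplogs(sender_estimate: Dict[str, int],
                   receiver_actual: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Oplog ranges (first, last) the receiver never got, per origin replica."""
    gaps = {}
    for origin, believed in sender_estimate.items():
        actual = receiver_actual.get(origin, 0)
        if believed > actual:
            gaps[origin] = (actual + 1, believed)
    return gaps


def roll_back(sender_estimate: Dict[str, int],
              receiver_actual: Dict[str, int]) -> Dict[str, int]:
    """Set the sender's estimate back to what the receiver really has,
    so the next sync packet includes the missing oplogs again."""
    return {origin: min(believed, receiver_actual.get(origin, 0))
            for origin, believed in sender_estimate.items()}
```

With a made-up estimate of {"sons-sc-cc": 120, "sons-cc": 80} against actual values of {"sons-sc-cc": 100, "sons-cc": 80}, oplogs 101 through 120 from sons-sc-cc are the ones lost. After `roll_back`, there is no remaining gap between estimate and reality, and those oplogs go out with the next sync.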