Lost Packet
- Fixed Multisite problems
Time spent: 2 hours
I looked on sons-sc-cc first for multisite errors. In /apps/Rational/Clearcase/var/logs there are logs regarding multisite. Some of the log files pointed me to look in the Event Viewer. In the Event Viewer I saw things like:
Event Type: Error
Event Source: ClearCase
Event Category: Shipping_server
Event ID: 1024
Date: 7/15/2006
Time: 10:59:43 AM
User: SALIRA\ccadmin
Computer: SONS-SC-CC
Description: shipping_server.exe(4448): Error: unable to contact the albd on host 'sons-cc': timed out trying to communicate with ClearCase remote server
Data: 0000: 60 11 00 00 `...
I remember problems with sons-cc occasionally having its albd_server go wacky and take up 50% of the CPU. That doesn't seem to be the case this time. Then again, the RDP session I had from adefaria -> sons-cc died. Perhaps they rebooted sons-cc. It's been up for about a day now.
I RDPed to sons-cc and CC Doctor complained about a version incompatibility between CC and CQ. Looked around on sons-cc - there's no CQ installed there! Why was it removed?
Hmmm... I RDPed to sons-clearcase. Seems it's only been up 3 hours! Must have been recently rebooted....
It seems that a packet was lost somewhere. Recovering from that means the complicated-to-do and complicated-to-explain procedure of setting epoch numbers back in time so as to replay the transactions and get everyone in the replica family on the same page...
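Since the epoch business is hard to explain in words, here is a toy model of the idea in Python. It is only a sketch and not how MultiSite is actually implemented: the `Replica` class, its `estimates` table, and the `demo` function are all hypothetical names made up for illustration. The assumption is that each replica tracks how many operations it believes a sibling already has, bumps that estimate optimistically when it sends a packet, and that the receiver refuses any packet that would leave a gap in its history. In real MultiSite the corresponding inspection and reset are done with the multitool lsepoch and chepoch commands.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: an ordered operation log plus epoch estimates for siblings."""
    name: str
    oplog: list = field(default_factory=list)      # operations held, in order
    estimates: dict = field(default_factory=dict)  # sibling name -> ops we think it has

    def record(self, op):
        """Record a local operation."""
        self.oplog.append(op)

    def make_packet(self, sibling):
        """Build a packet of everything the sibling is believed to be missing."""
        start = self.estimates.get(sibling, 0)
        packet = (start, self.oplog[start:])
        # Optimistic update: assume the packet will arrive and be imported.
        self.estimates[sibling] = len(self.oplog)
        return packet

    def import_packet(self, packet):
        """Apply a packet; refuse it if it would leave a gap in the log."""
        start, ops = packet
        if start > len(self.oplog):
            return False  # an earlier packet never arrived
        # Skip any operations we already hold, then append the rest.
        self.oplog.extend(ops[len(self.oplog) - start:])
        return True


def demo():
    a, b = Replica("site_a"), Replica("site_b")
    a.record("op1")
    a.record("op2")
    a.make_packet("site_b")  # this packet is lost in transit and never imported
    a.record("op3")

    # site_a now believes site_b has op1 and op2, so it sends only op3.
    first_try = b.import_packet(a.make_packet("site_b"))

    # The fix: set the epoch estimate back to what site_b really has,
    # so the next packet replays the missing operations.
    a.estimates["site_b"] = len(b.oplog)
    second_try = b.import_packet(a.make_packet("site_b"))
    return first_try, second_try, b.oplog


if __name__ == "__main__":
    print(demo())
```

In this model the first import is refused because of the gap, and after the estimate is set back the replayed packet brings site_b up to date. The real procedure has to be done for each affected replica in the family, which is where it gets complicated.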
I think I got it all straightened out now...