Lost Packet

  • Fixed Multisite problems

Time spent: 2 hours

I looked on sons-sc-cc first for MultiSite errors. There are MultiSite logs in /apps/Rational/Clearcase/var/logs, and some of them pointed me to the Event Viewer, where I saw entries like:

Event Type:        Error
Event Source:      ClearCase
Event Category:    Shipping_server
Event ID:          1024
Date:              7/15/2006
Time:              10:59:43 AM
User:              SALIRA\ccadmin
Computer:          SONS-SC-CC
Description:
    shipping_server.exe(4448): Error: unable to contact the
    albd on host 'sons-cc': timed out trying  to communicate
    with ClearCase remote server
Data:
    0000: 60 11 00 00               `...   

I remember sons-cc occasionally having problems with its albd_server going wacky and taking up 50% of the CPU. That doesn't seem to be the case this time. Then again, the RDP session you had open from adefaria to sons-cc died, so perhaps they rebooted sons-cc. It has been up for about a day now.

I RDPed to sons-cc, and CC Doctor complained about a version incompatibility between CC and CQ. Looking around on sons-cc, I found there's no CQ installed there at all! Why was it removed?

Hmmm... I RDPed to sons-clearcase, and it seems it's only been up 3 hours! It must have been rebooted recently...

It seems that a packet was lost somewhere. Fixing that calls for a procedure that is complicated both to do and to explain: setting epoch numbers back in time so that the transactions are replayed and every replica in the family gets back on the same page...
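The epoch-number trick is easier to see in a toy model than to explain in prose. Below is a minimal Python sketch under a simplified model, not MultiSite's actual implementation: each replica numbers the operations it originates, a sender keeps an estimate of the highest op number each peer has (its epoch number for that peer) and bumps that estimate as soon as it exports a packet, and in this model the receiver refuses a packet that skips ahead. The names `Replica`, `reset_epoch`, `site_a` and `site_b` are made up for illustration. On a real system the epoch numbers are listed and changed with `multitool lsepoch` and `multitool chepoch`; check the MultiSite docs for exact usage.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical): illustrates epoch numbers, not MultiSite internals."""
    name: str
    oplog: list = field(default_factory=list)     # ops created at this replica, numbered 1..N
    applied: dict = field(default_factory=dict)   # origin -> last op number imported from it
    estimate: dict = field(default_factory=dict)  # peer -> op number we *think* peer has

    def make_op(self, desc):
        self.oplog.append(desc)

    def export_packet(self, peer):
        start = self.estimate.get(peer, 0)
        ops = [(n, self.oplog[n - 1]) for n in range(start + 1, len(self.oplog) + 1)]
        # The sender assumes delivery and moves its estimate forward right away.
        self.estimate[peer] = len(self.oplog)
        return {"origin": self.name, "ops": ops}

    def import_packet(self, packet):
        origin = packet["origin"]
        have = self.applied.get(origin, 0)
        for n, _desc in packet["ops"]:
            if n <= have:
                continue  # already imported: replayed ops are skipped
            if n != have + 1:
                raise ValueError(f"gap: expected op {have + 1} from {origin}, got {n}")
            have = n
        self.applied[origin] = have  # only recorded if the whole packet applied


def reset_epoch(sender, receiver):
    """Roll the sender's estimate back to what the receiver really has."""
    sender.estimate[receiver.name] = receiver.applied.get(sender.name, 0)


def replay_after_lost_packet():
    a, b = Replica("site_a"), Replica("site_b")  # hypothetical replica names
    for i in range(3):
        a.make_op(f"op{i + 1}")
    b.import_packet(a.export_packet("site_b"))   # ops 1-3 arrive

    a.make_op("op4")
    a.make_op("op5")
    a.export_packet("site_b")                     # ops 4-5: this packet is lost

    a.make_op("op6")
    error = None
    try:
        b.import_packet(a.export_packet("site_b"))  # only op 6 is sent
    except ValueError as exc:
        error = str(exc)

    reset_epoch(a, b)                             # estimate goes back to 3
    b.import_packet(a.export_packet("site_b"))   # ops 4-6 are replayed
    return error, b.applied["site_a"]
```

Calling `replay_after_lost_packet()` returns the gap error from the first attempt and 6 as the last op site_b holds after the replay. In this model, rolling an epoch back too far is harmless because the receiver skips ops it already has; leaving it too far forward is what strands the lost ops.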

I think I got it all straightened out now...