Vob corruption/Email/Chmaster

  • Fixed problem with email from Multisite jobs
  • Investigated vob corruption
  • Looked into chmaster

Total time: 5 hours

Email server for Multisite messages

Multisite needs to send email if there is a problem with synchronization. The setting for which SMTP server to use is in the Clearcase Control Panel under Advanced. Somehow that got set to sons-exch02 which is no longer a valid SMTP server. Changed this to sons-exch01.

Database identifier not found when doing syncreplica import

Dear IBM/Rational Tech Support: My name is Andrew DeFaria and I perform Clearcase/Clearquest consulting services. One of my clients, Salira Optical Network Systems (a former employer of mine), has been experiencing a problem, described below, and has asked me to look into it for them. They are also in the process of migrating to newer server hardware and upgrading to the latest version of Clearcase/Clearquest. I've been performing this migration. So far we have the new server up and have Multisite replicating things between 3 "sites": a remote one in Shanghai (sons-cc) and two in Santa Clara, the old server (sons-clearcase) and the new server (sons-sc-cc).

I had been working on this problem for a while last night. I was seeing what you guys were seeing: the db_server process runs wild and takes up 50% of the CPU. This seems to happen whenever the scheduled syncreplica -import runs on sons-clearcase. As a result sons-clearcase is not being synced. As far as I can tell this syncreplica -import never finishes and the db_server process consumes 50% of the CPU until killed.

I've also seen several instances of the following error in the db_server log:

Database identifier <x> not found in "..db__obj.c" line 731

While the line number remains the same, the IDs I've seen are 427883, 427919, and 427922.
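To see whether the same few identifiers keep recurring, the log can be tallied with a short script. This is only a sketch: the regular expression assumes each message has the shape quoted above, and the sample lines are illustrative rather than copied from the real log, which may carry extra prefixes such as timestamps.

```python
import re
from collections import Counter

# Assumed message shape, based on the error quoted above. Anything before
# or after the message on the same line is ignored.
PATTERN = re.compile(
    r'Database identif\w+ (\d+) not found in "([^"]+)" line (\d+)'
)


def tally_missing_ids(lines):
    """Count how often each (identifier, source file, line number) appears."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            ident, source, lineno = match.groups()
            counts[(int(ident), source, int(lineno))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample lines, not a real log excerpt.
    sample = [
        'Database identifier 427883 not found in "..db__obj.c" line 731',
        'Database identifier 427919 not found in "..db__obj.c" line 731',
        'Database identifier 427883 not found in "..db__obj.c" line 731',
    ]
    for (ident, source, lineno), count in sorted(tally_missing_ids(sample).items()):
        print(f"{ident}  {source}:{lineno}  seen {count} time(s)")
```

A small, stable set of identifiers would suggest a handful of damaged database records rather than something that grows with every import attempt.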

Thinking that this was some sort of vob database corruption, I ran checkvob, which reported some minor missing references to source containers. I then ran it in fix mode, which cleared up the missing source containers, but the missing database identifiers remain.

Next I tried running recoverpacket hoping to set the epoch numbers back a few days and thinking maybe the syncreplica would repair itself. On sons-sc-cc (SantaClara replica) I issued the following command:

[ccadmin] sons-sc-cc:mt recoverpacket -since 20-Jun-05 SantaClara@\salira
Using epoch information from Monday, June 19, 2006 11:00:03 Pm
Epoch row for replica "US" successfully reset

Then back on sons-clearcase (US replica) I issued:

[ccadmin] sons-clearcase:mt syncreplica -export -fship SantaClara@\salira

This went on to create a huge packet (it grew to over 1 GB!) before the scheduled syncreplica -import started and tied up the db_server process.

Searching IBM/Rational support, the closest thing I see is: multitool syncreplica -export fails with Database identifier 0 not found in "../db__ver.c" line 505. I know this speaks of syncreplica -export and references db__ver.c rather than db__obj.c, but it is the closest problem report that I could find. And it carries this ominous note:

Note: This defect may also occur in ClearCase MultiSite 2002.05 (5.0), however, the fix will not be back patched, you either need to back out of the patch that introduced this, or upgrade to a later version of ClearCase MultiSite to recover.

The clearcase version on sons-clearcase is:

[ccadmin] sons-clearcase:ct -ver
ClearCase version 2002.05.00 (Tue Oct 30 08:27:59 2001)
clearcase patch p2002.05.00 NT-8 (Mon Jun 10 14:44:04 2002)
clearcase patch p2002.05.00 NT-12 (Thu Sep 12 11:15:10 2002)
@(#) MVFS version 2002.05.00+ (May 25 2002 03:14:49)
cleartool                         2002.05.00 (Fri Oct 26 20:24:09  2001)
db_server                         2002.05.00+ (Fri Aug 30 11:48:28 2002)

As I said, we are in the process of migrating to 2003.06 and we are already halfway there - however, at this point people have not yet fully migrated their views over to the new server and the old server still serves Clearcase licenses.

Finally, as I am only a part time consultant at Salira you may wish to contact Jeff Stribling (408-845-5200) directly to gather more info and possibly try some solutions. My contact information is at http://defaria.com/contact.php but realize that during the day I'm at another client.

Chmaster

There are a few Clearcase objects that can have mastership. These are:

  1. Label types
  2. Branch types
  3. Trigger types
  4. Hyperlink types
  5. Attribute types
  6. Element types

#1 and 2 above are the ones that concern me and that eventually need to get transferred. #3 has already been done by my mktriggers script, which added the triggers to the vobs over on sons-sc-cc long ago. Salira doesn't really use #4, 5 and 6 (there are only the predefined types for those).

It might be good for you, or perhaps Vijay, to experiment a bit on changing the mastership of a branch type, a branch that is not heavily used. I would:

  • Set up a new view on sons-sc-cc oriented to working on this test branch that is mastered by sons-clearcase
  • Verify that there is a problem, and how it manifests itself, when attempting to use this view on sons-sc-cc. IOW, while working in this new view on sons-sc-cc, verify that you cannot check out to this test branch because it's mastered at sons-clearcase
  • Transfer mastership of this test branch over to sons-sc-cc
  • Test that the problem has gone away

Of course I realize that the overriding problem is the current vob database problem on sons-clearcase described in my earlier email. Assuming that that's fixed and multisiting is working...

You can see these types by right-clicking on the vob in the Clearcase Explorer on sons-clearcase and selecting Explore Types. Double-click on branch type and find your test branch. Right-click on it and select Properties. Go to the Mastership tab and click on Change. Select SantaClara (that's sons-sc-cc). You might also need to perform a syncreplica (run the scheduled job for sync export on sons-clearcase and the sync import on sons-sc-cc).

There are about 30 branch types that are mastered on sons-clearcase. There are far more label types that are mastered on sons-clearcase. IOW while I might change mastership by hand for the 30 or so branch types, I wouldn't want to change mastership for all those labels.
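For the label types, something scripted seems more realistic than clicking through each one. Below is a rough sketch that turns a listing of type names and their masters into one chmaster command per type. It assumes the listing comes from something like cleartool lstype -kind lbtype -invob \salira -fmt "%n %[master]p\n", producing lines of the form "name master-replica", and that the commands are run at the replica that currently masters the types (US on sons-clearcase). Both the listing format and the multitool chmaster command form should be checked against the reference pages for the installed version before anything is run. The type names in the demo are made up.

```python
def chmaster_commands(lstype_lines, kind, vob, old_master, new_master):
    """Build one multitool chmaster command per type mastered by old_master.

    Each input line is expected to look like '<type-name> <master-replica>',
    where the master may carry an '@<vob-tag>' suffix. Lines that do not
    have exactly two fields are skipped.
    """
    commands = []
    for line in lstype_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or unexpected line
        name, master = parts
        if master.split("@")[0] != old_master:
            continue  # already mastered elsewhere, leave it alone
        commands.append(
            f"multitool chmaster -nc replica:{new_master}@{vob} {kind}:{name}@{vob}"
        )
    return commands


if __name__ == "__main__":
    # Hypothetical label type names, for illustration only.
    listing = [
        r"REL_1_0 US@\salira",
        r"REL_1_1 SantaClara@\salira",
        r"BUG_1234 US@\salira",
    ]
    for command in chmaster_commands(listing, "lbtype", r"\salira", "US", "SantaClara"):
        print(command)
```

The script only prints the commands, so they can be reviewed (or tried on a single test type first) before any mastership actually changes.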

What I have not verified is what happens if someone working on sons-sc-cc checks in a bug fix and the trigger attempts to move a pre-existing label (say this was an update to an old bug) to point to the new version about to be checked in. Since that label is mastered by sons-clearcase, will that be a problem?

If your test above regarding changing the mastership of a test branch type is successful, you may wish to move the mastership of a more heavily used branch type in the same manner, then instruct the engineers involved in that branch to move to the new server. Then the next branch type, etc.