Status for Andrew DeFaria: December 2006 Archives

Mkcc.pm and mkvob

Pulled the common code that mkvob will also need out of mkview and into the Mkcc.pm Perl module
Re-engineered mkview to use Mkcc.pm
Started recoding mkvob to use GPDB and the new Mkcc.pm
Code complete on mkvob - need to test
Issue: It seems that we have multiple versions of mkview_linked. The UK has continued development of their version while we've been developing ours here in Dallas. IOW, there are two versions in separate vobs!

The old mkvob_db

There are some oddities in the old mkvob_db code. For example, it insists on setting umask to 0 before creating directories, which are created with specific permissions. The umask only affects the permissions of newly created files and directories, so one could instead simply make the directory and then chmod it to whatever permissions one really wants. The new subroutine MakeDir in Mkcc.pm does just that. It also chowns the directory if an owner and group are passed in, and it only makes the directory if it does not already exist. This simplifies the code a lot and removes the need to save, set and restore umask.
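A minimal sketch of a MakeDir-style helper follows. The signature (directory, mode, optional owner, optional group) is an assumption for illustration and may not match the real Mkcc.pm subroutine. It only calls mkdir when the directory is missing, then applies the mode with chmod, so the umask never has to be touched.

```perl
use strict;
use warnings;

# Hypothetical MakeDir sketch (signature assumed, not the actual Mkcc.pm code).
# Creates $dir only if needed, then forces the permissions with chmod,
# so whatever umask the process has is irrelevant.
sub MakeDir {
  my ($dir, $mode, $owner, $group) = @_;

  # Only create the directory if it does not already exist
  unless (-d $dir) {
    mkdir $dir or die "Unable to mkdir $dir: $!\n";
  }

  # Set exactly the permissions we want, regardless of umask
  chmod $mode, $dir or die "Unable to chmod $dir: $!\n";

  # chown only when both owner and group are passed in
  if (defined $owner and defined $group) {
    my $uid = getpwnam $owner;
    my $gid = getgrnam $group;

    die "Unknown owner $owner\n" unless defined $uid;
    die "Unknown group $group\n" unless defined $gid;

    chown $uid, $gid, $dir or die "Unable to chown $dir: $!\n";
  }

  return;
}
```

Calling it again on an existing directory simply re-applies the mode (and ownership), which is what lets callers drop the save/set/restore umask dance.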

Another oddity of the old code can be seen in the following:

foreach $rgn (@regionList) {
  $debug && print "DEBUG: Checking tag in $rgn\n";

  $cmd = 'lsvob -s -region $rgn';

  @output = ctcmd($cmd);

  foreach $item (@output)
  {
    if ($item =~ /^$tag$/)
      {
	print "VOB tag $tag already exists in $rgn\n\n";
	exit (1);
      }
  }
}

Looks innocuous enough: run through a list of regions, looking to see whether this newly requested vob tag already exists in any of them. However, there are bugs and inefficiencies in the above code. For starters, the single quotes in the assignment to $cmd tell Perl not to interpolate $rgn. The command is therefore built as "cleartool lsvob -s -region $rgn", with the literal text $rgn instead of a region name, which, of course, returns an error, which, of course, is not checked for here. Since no line of that error output matches the tag, the exit is never reached. So the code, while trying to verify that this vob tag does not exist in any of the regions, never actually checks!
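The quoting difference is easy to see in isolation. This small sketch uses a made-up region name and builds the command string both ways:

```perl
use strict;
use warnings;

my $rgn = 'dallas_unix';    # hypothetical region name

# Single quotes: no interpolation, the literal text "$rgn" is kept
my $bad  = 'lsvob -s -region $rgn';

# Double quotes: $rgn is replaced by its value
my $good = "lsvob -s -region $rgn";

print "$bad\n";     # lsvob -s -region $rgn
print "$good\n";    # lsvob -s -region dallas_unix
```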

However, if we just correct that by using double quotes, we now incur a performance penalty: the registry server will be required to return all the vob tags for each and every region, which can take some time. Is there any way we can make this faster? Why yes, there is! Specify the actual vob tag (as in cleartool lsvob -s -region <region name> <vob tag>)! You know, people sometimes say that Clearcase always returns all of the registry data, which is then parsed locally. Then again, sometimes it's the programmer who asks for it!

So the new code looks like:

foreach (@regionList) {
  debug "Checking for $tag in region $_";

  @output = ctcmd "lsvob -s -region $_ $tag";

  error "VOB tag $tag already exists in region $_", 1 if !grep /Error/, @output;
} # foreach

Note the simplifications: debug (now a subroutine from Mkcc.pm), use of the default variable ($_), and error reporting (also a subroutine from Mkcc.pm). This version performs the same task correctly, and far more quickly, because each region is asked about a single tag rather than for its entire list of tags.
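The loop above is hard to exercise without a ClearCase registry. As a testable sketch, the version below takes the command runner as a code ref (a hypothetical stand-in for ctcmd; find_tag_region is likewise a made-up name) and treats the tag as registered only when lsvob -s echoes it back, instead of inferring that from the absence of "Error" in the output. That positive match is a design choice made here, not what the code above does.

```perl
use strict;
use warnings;

# Sketch of the per-region VOB tag check. $ctcmd is a code ref that runs a
# cleartool subcommand and returns its output lines; it stands in for the
# ctcmd helper so the logic can be exercised without ClearCase (assumption).
# Returns the first region in which $tag is registered, or undef if none.
sub find_tag_region {
  my ($ctcmd, $tag, @regions) = @_;

  foreach my $region (@regions) {
    # Ask the registry about this one tag rather than listing every tag
    my @output = $ctcmd->("lsvob -s -region $region $tag");

    chomp @output;

    # lsvob -s prints just the tag when it is registered in the region
    return $region if grep { $_ eq $tag } @output;
  }

  return;
}
```

Either way, a registry failure still looks the same as "tag not found", so a production version would want to tell those two cases apart.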

Posted at 4:50 PM

Reworking GPDB tables, mkview

Finished gpdb_add_project.pl with the new table layout
With gpdb_getProjectsAtSite done I can now return to mkview. Implemented new paged usage output listing the projects at the site
Finished coding mkview; it now works with GPDB and can make views

Posted at 4:46 PM

GPDB Database performance

Moved convertdb and gpdb_add_vob into Clearcase
Attempting to standardize which Perl to use, which Oracle.pm to pick up, and how to ensure that other sites have the proper prerequisites for GPDB
Discovered that Oracle is not supported on Linux here at TI. This will be a problem for GPDB
Still working on issues of the new GPDB design and attempting to get gpdb_add_project.pl to work with it
Got a definition of the performance problem Donna is experiencing. She is attempting to populate a pull-down with just the project names for a site. Doing so transfers a lot of data, because the current GPDB API, gpdb_getProject, returns all kinds of project information when Donna needs only the project names.
Developed a new API, gpdb_getProjectsAtSite, that returns only the project names, more efficiently

gpdb_getProjectsAtSite

Donna may be right and we may need to enlist the help of Ajay here.

I coded up a gpdb_getProjectsAtSite function:

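sub gpdb_getProjectsAtSite ($$) {
  my ($site_name, $resource) = @_;

  # Clear any error left over from a previous call
  resetErr ();

  # Only these two resources are valid. $resource is also used as a
  # column name in the SQL condition below, so validate it first.
  unless (lc $resource eq "clearcase" or
          lc $resource eq "designsync") {
    setError (-1, "gpdb_getProjectsAtSite: Resource must be one of 'clearcase' or 'designsync'");
    return ();
  } # unless

  # Translate the site name into the site's ID
  my $siteID    = siteID $site_name;

  # The "where" clause: projects at this site with the resource turned on
  my $condition = "site_id = $siteID and $resource = 'Y'";

  # searchData returns a reference to an array of hashes, one per record
  my @projects = @{GPDB::primitive::searchData ("projects", $condition)};
  my @project_names;

  foreach (@projects) {
    # projectName returns undef for retired projects, which are skipped
    my $name = projectName $_->{PARENT_PROJ_ID};

    next if !$name;

    push @project_names, $name;
  } # foreach

  return @project_names;
} # gpdb_getProjectsAtSite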

Basically you call it with a site name and a resource (either clearcase or designsync). It does some housekeeping: resetting the error variables and checking that the resource is one of clearcase or designsync. Next it translates the site name to an ID. We need an ID, and we shouldn't burden users with having to supply it. The siteID function is a new internal function in gpdb.pm, because I often find the need to translate a site name to an ID. Next we compose a condition, which is the part of the SQL statement after "where": "site_id = $siteID and $resource = 'Y'". Remember that resource is either "clearcase" or "designsync", and we wish to find project records where the site matches and the resource is toggled on (i.e. = 'Y').

There's a new primitive, searchData, because getData only finds a single record by "ID", and findData returns multiple records based on a single fieldname = value condition. Here we want two different fieldname/value pairs joined by "and". Therefore the searchData primitive takes two parameters, the table name and the condition, composes a "select * from $tableName where $condition", and returns an array of hashes (by reference) like findData does.
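A primitive like this maps naturally onto DBI. The sketch below is an assumption about how it could be built, not the actual GPDB::primitive code: it takes a database handle explicitly and an optional list of bind values, then uses DBI's selectall_arrayref with the Slice => {} attribute so each row comes back as a hash reference.

```perl
use strict;
use warnings;

# Sketch of a searchData-style primitive using DBI (assumption: the real
# GPDB::primitive::searchData may be built differently). Runs
# "select * from $table where $condition" and returns a reference to an
# array of hash references, one per row, keyed by column name.
sub searchData {
  my ($dbh, $table, $condition, @bind) = @_;

  my $sql = "select * from $table where $condition";

  # Slice => {} makes DBI return each row as a hash reference;
  # @bind fills any ? placeholders in $condition
  return $dbh->selectall_arrayref($sql, { Slice => {} }, @bind);
}
```

With placeholders the call could look like searchData($dbh, "projects", "site_id = ? and clearcase = 'Y'", $siteID). Table and column names cannot be bound, which is why validating $resource against a fixed list before it becomes part of the condition matters.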

At this point we have an array of projects whose site IDs match the passed-in site name and whose $resource is toggled on ('Y'). But we want to return project names to be nice to the user, and that's what the foreach loop does. Note that it calls another new internal routine, projectName, which returns the project's name for the project ID (the parent project ID, that is). Note also that projectName returns undef if the project is retired; that's what the "next if !$name" statement is for. All non-retired project names are therefore pushed onto @project_names, which is returned from the subroutine.

I do not see how I could make this any faster.

Well, how did it perform? Selecting on Dallas and Clearcase projects (because gpdb_add_project.pl, which does DesignSync additions, is still not working well), there are 194 Clearcase projects at the Dallas site. Running a small test script here and at Manchester yields:

Dallas:time testproj_names.pl
real    0m7.156s
user    0m1.260s
sys     0m0.310s
 
Manchester:time testproj_names.pl
real    0m23.708s
user    0m0.490s
sys     0m0.170s

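That's almost 24 seconds from Manchester, or roughly 3.3 times as slow. User and sys times are small at both sites, so most of the elapsed time is spent waiting rather than computing.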

Switching over to selecting designsync records, there are 9 of them at Dallas. Timings for this are:

Dallas:time testproj_names.pl
real    0m1.402s
user    0m0.810s
sys     0m0.120s

Manchester:time testproj_names.pl
real    0m3.646s
user    0m0.300s
sys     0m0.080s

Again slower by a similar factor, roughly 2.6 times as slow.
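For reference, the Manchester/Dallas ratios of the real (elapsed) times above are:

```latex
\frac{23.708\ \mathrm{s}}{7.156\ \mathrm{s}} \approx 3.31 \quad (\text{194 Clearcase projects}),
\qquad
\frac{3.646\ \mathrm{s}}{1.402\ \mathrm{s}} \approx 2.60 \quad (\text{9 DesignSync projects})
```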

Posted at 5:32 PM

Users/Sites and Projects

Implemented schema changes for GPDB based on previous meetings
Dropped PDB_ from table names to make them shorter and more understandable
Created all new tables and mapping tables. Adjusted sequencing
Changed the users table to use AXID as its key, with associated changes to other tables
Added synonyms where required
Updated drop_all.sql to reflect all new tables, views and sequences
Reorienting gpdb.pm for new database layout
Initially dropped the projects_by_site view, as the new data structures handle this in a different way. Later added the view back, this time gathering information from 3 different tables.
Got convertdb able to add users, sites and projects.
Got gpdb_add_vobs.pl working again in new structure
Started getting web pages to reveal data in new structures
Working on gpdb_add_project.pl to accommodate new schema layout
Changed gpdb_add_project.pl to use the sites table instead of a sites file
Changed the -s parameter of gpdb_add_project.pl to instead specify a single site to process (the default is all sites)
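A minimal sketch of the new -s handling, assuming Getopt::Long and a list of site names already read from the sites table. sites_to_process is a hypothetical helper for illustration, not code from gpdb_add_project.pl.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: given the command line arguments and the full list of
# known sites (in the real script these would come from the sites table),
# return the sites to process. Without -s, all sites are processed.
sub sites_to_process {
  my ($args, @all_sites) = @_;
  my $site;

  GetOptionsFromArray($args, 's=s' => \$site)
    or die "Usage: gpdb_add_project.pl [-s <site>]\n";

  # Default: every site
  return @all_sites unless defined $site;

  # Reject sites that are not in the sites table
  die "Unknown site: $site\n" unless grep { $_ eq $site } @all_sites;

  return ($site);
}
```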

Posted at 3:48 PM

GPDB Bug Fixed

Fixed bug in domain_ranges.sql script
Finished Create site
Added replica_name field
Fixed naming error with snapto_dir and owning_group
Expanded size of snap_notify to 1024 bytes

Posted at 11:21 AM

Status for Andrew DeFaria

Searchable status reports and work log

December 28, 2006

Mkcc.pm and mkvob

The old mkvob_db

December 27, 2006

Reworking GPDB tables, mkview

December 19, 2006

GPDB Database performance

gpdb_getProjectsAtSite

December 14, 2006

Users/Sites and Projects

December 1, 2006

GPDB Bug Fixed