GPDB Database performance

  • Moved convertdb and gpdb_add_vob into Clearcase
  • Attempting to standardize which Perl to use and which Oracle.pm to pick up, and how to ensure that other sites have the proper prerequisites for GPDB
  • Discovered that Oracle is not supported on Linux here at TI. This will be a problem for GPDB
  • Still working on issues of the new GPDB design and attempting to get gpdb_add_project.pl to work with it
  • Got a definition of the performance problem that Donna is experiencing. She is attempting to populate a pull-down with just the project names for a site. Doing so transfers a lot of data, because the current GPDB API gpdb_getProject returns all kinds of project information when Donna needs only the project names.
  • Developed a new API, gpdb_getProjectsAtSite, that returns only the project names in a more efficient manner

gpdb_getProjectsAtSite

Donna may be right and we may need to enlist the help of Ajay here.

I coded up a gpdb_getProjectsAtSite function:

sub gpdb_getProjectsAtSite ($$) {
  my ($site_name, $resource) = @_;

  resetErr ();
 
  unless (lc $resource eq "clearcase" or
          lc $resource eq "designsync") {
    setError (-1, "gpdb_getProjectsAtSite: Resource must be one of 'clearcase' or 'designsync'");
    return ();
  } # unless

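  # Translate the site name to its ID so callers need not supply the ID.
  my $siteID    = siteID $site_name;

  # The part after "where": the site matches and the resource is toggled on.
  my $condition = "site_id = $siteID and $resource = 'Y'";

  my @projects = @{GPDB::primitive::searchData ("projects", $condition)};
  my @project_names;

  foreach (@projects) {
    my %project = %{$_};
    my $name    = projectName $project {PARENT_PROJ_ID};

    # projectName returns undef for retired projects; skip those.
    next if !$name;

    push @project_names, $name;
  } # foreach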

  return @project_names;
} # gpdb_getProjectsAtSite

Basically you call it with a site name and a resource (either clearcase or designsync). It does some housekeeping (resetting the error variables and checking that the resource is one of clearcase or designsync). Next it translates the site name to an ID. We need an ID, and we shouldn't burden the users with having to supply it. The siteID function is a new internal function for gpdb.pm because I often find the need to translate a site name to an ID. Next we compose a condition, which is the part after "where": "site_id = $siteID and $resource = 'Y'". Remember that resource is either "clearcase" or "designsync", and we wish to find project records where the site matches and the resource is toggled on (i.e. = 'Y').

There's a new primitive, searchData, because getData finds only single records by "ID" and findData returns multiple records based on a single fieldname = value condition. Here we want two different fieldname/value pairs joined by an "and" condition. Therefore the searchData primitive takes two parameters, the table name and the condition, composes a "select * from $tableName where $condition" and returns an array of hashes like findData does.
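To make the condition and the composed statement concrete, here is a minimal sketch. The helpers build_condition and compose_select are hypothetical names used only for illustration (they are not part of gpdb.pm), the site ID 3 is made up, and the sketch assumes site IDs are plain integers. It only builds the SQL text; it does not talk to the database.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gpdb.pm): validate the inputs and
# build the text that follows "where".
sub build_condition {
  my ($site_id, $resource) = @_;

  $resource = lc $resource;

  # Only the two known column names are ever interpolated into the SQL.
  unless ($resource eq 'clearcase' or $resource eq 'designsync') {
    die "Resource must be one of 'clearcase' or 'designsync'\n";
  }

  # Assumption: site IDs are plain integers.
  unless (defined $site_id and $site_id =~ /^\d+$/) {
    die "Site ID must be numeric\n";
  }

  return "site_id = $site_id and $resource = 'Y'";
}

# Hypothetical stand-in for the statement that searchData composes.
sub compose_select {
  my ($table_name, $condition) = @_;

  return "select * from $table_name where $condition";
}

my $sql = compose_select('projects', build_condition(3, 'Clearcase'));

print "$sql\n";  # select * from projects where site_id = 3 and clearcase = 'Y'
```

Restricting the resource to a fixed list before interpolating it is what keeps an arbitrary string from ending up in the where clause.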

At this point we have an array of projects whose site IDs match our passed-in site name and whose $resource is toggled on as 'Y'. But we want to return project names to be nice for the user, and that's what the foreach loop does. Note that it calls another new internal routine called projectName, which returns the project's name for the project ID (the parent project ID, that is). Note also that projectName will return undef if the project is retired. That's what the "next if !$name" statement is for. All non-retired project names are therefore pushed onto @project_names, which is returned from the subroutine.
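The filtering step can be tried in isolation. The sketch below uses made-up in-memory data and a hypothetical project_name_for routine standing in for projectName; it expresses the same look-up-and-skip logic with map and grep instead of foreach/next/push.

```perl
use strict;
use warnings;

# Made-up data standing in for the projects table; the retired project
# (102) has no name to return.
my %names = (101 => 'alpha', 102 => undef, 103 => 'gamma');

# Hypothetical stand-in for the internal projectName routine.
sub project_name_for {
  my ($parent_proj_id) = @_;

  return $names{$parent_proj_id};
}

# Array of hashes, shaped like the result described for searchData.
my @projects = (
  {PARENT_PROJ_ID => 101},
  {PARENT_PROJ_ID => 102},
  {PARENT_PROJ_ID => 103},
);

# Look up each name, then drop the false ones (undef for retired projects).
my @project_names =
  grep { $_ }
  map  { project_name_for($_->{PARENT_PROJ_ID}) } @projects;

print join(', ', @project_names), "\n";  # alpha, gamma
```

Like "next if !$name", the grep drops any false value, so an empty name would be skipped as well as undef.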

I do not see how I could make this any faster.

Well, how did it perform? I selected on Dallas and Clearcase projects (because gpdb_add_project.pl, which does the DesignSync additions, is still not working well); there are 194 Clearcase projects at the Dallas site. Running a small test script here and at Manchester yields:

Dallas:time testproj_names.pl
real    0m7.156s
user    0m1.260s
sys     0m0.310s
 
Manchester:time testproj_names.pl
real    0m23.708s
user    0m0.490s
sys     0m0.170s

That's about 24 seconds from Manchester, or roughly 3.3 times as slow as Dallas.

Switching over to selecting designsync records, there are 9 of them at Dallas. Timings for this are:

Dallas:time testproj_names.pl
real    0m1.402s
user    0m0.810s
sys     0m0.120s

Manchester:time testproj_names.pl
real    0m3.646s
user    0m0.300s
sys     0m0.080s

Again Manchester is slower, this time by roughly 2.6 times.
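As a quick check of the slowdown factors, the following sketch divides the Manchester "real" time by the Dallas "real" time for each resource. The numbers are copied from the runs above; the variable names are just for illustration.

```perl
use strict;
use warnings;

# "real" times in seconds, copied from the timing runs.
my %real = (
  clearcase  => {dallas => 7.156, manchester => 23.708},
  designsync => {dallas => 1.402, manchester => 3.646},
);

my %ratio;

foreach my $resource (sort keys %real) {
  my $run = $real{$resource};

  # How many times longer the Manchester run took than the Dallas run.
  $ratio{$resource} = $run->{manchester} / $run->{dallas};

  printf "%-10s Manchester/Dallas = %.2f\n", $resource, $ratio{$resource};
}
```

This prints a factor of 3.31 for clearcase and 2.60 for designsync.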