
Code Page/create_dev_snapview.pl

  • Researched further code page issues
  • Added Everest (Israel) team members to PQA
  • Added -reuse_stream parm to create_dev_snapview.pl
  • Handed a preliminary copy of bin_merge to Jennifer
  • Submitted defect to IBM/Rational regarding CharacterSetValidation and Code Pages

CharacterSetValidation Package is not that good

We have a remaining issue where a user attempts to update a defect and receives an error from the CharacterSetValidation package. I re-engineered CheckCodePage.pl to check through the Controller database, but it didn't find anything. So I investigated the CharacterSetValidation package in a little more depth: how does it validate characters, and how does it do it differently than I do?

Turns out that it merely does:

  if ($line =~ /[^\t\n\r -\177]/) {
    # report an "unsupported character(s)" error
  }

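This odd regex merely checks whether every character is a tab, a linefeed, a carriage return, or in the range space through \177 (ord 32-127), which is roughly a printable-character test. Because the character class is negated, a non-US-ASCII character like é (ord 233 in Latin-1) falls outside it and is flagged, so that part works. But nothing in this test looks at the database's code page, and it rejects most control characters as well.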
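One way to see exactly what this regex accepts is to run it against every ordinal from 0 to 255. The following is a minimal sketch, not code from the package; the helper name is_flagged is made up for illustration, and only the character class itself comes from the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the same character class as the snippet above:
# true when the string contains anything other than tab, LF, CR, or ord 32..127.
sub is_flagged {
    my ($line) = @_;
    return $line =~ /[^\t\n\r -\177]/ ? 1 : 0;
}

# Collect every ordinal in 0..255 that the regex rejects.
my @flagged = grep { is_flagged(chr $_) } 0 .. 255;

# Split the result at the US-ASCII boundary.
my @ascii_flagged = grep { $_ <= 127 } @flagged;
my @high_flagged  = grep { $_ >  127 } @flagged;

printf "flagged within 0-127:   %d ordinals (%s)\n",
    scalar @ascii_flagged, join(',', @ascii_flagged);
printf "flagged within 128-255: %d ordinals\n", scalar @high_flagged;
```

Run this way, the class rejects 29 ordinals inside the US-ASCII range (0-8, 11, 12 and 14-31) and all 128 ordinals above 127, while DEL (ord 127) is accepted.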

Narrowing it down a bit with my CheckCodePage.pl (which merely checks that characters are in the range of ordinal 0-127, AKA the US-ASCII character set), I figured out that the above regex was rejecting characters such as ordinal 7 (BEL) and others. In other words, characters that are technically in the US-ASCII character set but are not printable. Not sure how to resolve this problem yet...
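The difference between the two checks can be sketched side by side. This is an illustration only: CheckCodePage.pl itself is not shown here, so is_us_ascii is an assumed stand-in for its 0-127 range test, and passes_package_check is a made-up wrapper around the regex quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed stand-in for the CheckCodePage.pl test:
# true when every character has an ordinal in 0..127.
sub is_us_ascii {
    my ($text) = @_;
    return $text !~ /[^\x00-\x7F]/ ? 1 : 0;
}

# Hypothetical wrapper: true when the package-style regex accepts the text.
sub passes_package_check {
    my ($text) = @_;
    return $text !~ /[^\t\n\r -\177]/ ? 1 : 0;
}

# Made-up samples: plain text, an embedded BEL, and a Latin-1 e-acute.
for my $sample ("plain text", "bell\x07here", "caf\xE9") {
    # Show non-printable characters as \xNN escapes in the report.
    (my $shown = $sample) =~ s/([^ -~])/sprintf('\\x%02X', ord $1)/ge;
    printf "%-14s us_ascii=%d package=%d\n",
        $shown, is_us_ascii($sample), passes_package_check($sample);
}
```

The sample with the embedded BEL is the interesting case: it passes the 0-127 range test but fails the package-style regex, which matches the behavior described above.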

Submitted defect to IBM/Rational regarding CharacterSetValidation and Code Pages

Recently we upgraded our ClearQuest database using version 2003.06.15. With this version of ClearQuest comes the use of Code Pages. We chose to set our Code Page to US-ASCII. In upgrading our database, we checked to ensure that all character data was within the US-ASCII character set (characters in the range of 0-127). Additionally, we installed the CharacterSetValidation package because, as the ClearQuest Administration Guide says:

The CharacterSetValidation package prevents clients running earlier versions of ClearQuest from entering data in a user record from a code page other than the data code page value of that database. If you do not apply the CharacterSetValidation package to your schemas, it is possible for users to enter unsupported data from the client and for data to be corrupted when modified on certain clients.

However, it doesn't appear that the CharacterSetValidation package checks data against the database's code page at all. Additionally, it is actually causing us problems because characters that are valid US-ASCII are being flagged by the CharacterSetValidation package as unsupported.

US-ASCII is, according to http://en.wikipedia.org/wiki/ASCII:

ASCII is, strictly, a seven-bit code, meaning that it uses the bit patterns representable with seven binary digits (a range of 0 to 127 decimal) to represent character information.

Ergo, US-ASCII consists of the characters whose ordinal values lie in the range of 0-127.

The CharacterSetValidation package installs a few Perl subroutines into the CQ schema that are supposed to check that the character data matches the code page of the database. However, there doesn't seem to be any Perl code that distinguishes between, say, a code page of US-ASCII and one of Latin-1.

What it all comes down to eventually, as far as I can see, is a call to check_for_control_chars_in_line. The operative Perl code is:

if ($line =~ /[^\t\n\r -\177]/) {
  return (" contains unsupported character(s):");
}

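The above regex says that if the string $line contains any character other than a tab (\t), a linefeed (\n), a carriage return (\r), or a character in the range of space through \177 (ord 32-127), then an unsupported character has occurred. This does not cover the entire US-ASCII range of 0-127: every control character below ord 32 other than those three is rejected. A character above 127, such as the yen character (¥, ord 165), is also outside the class and so is flagged, which is correct for US-ASCII. But the test is hard-coded, so it would appear to behave the same whatever the code page of the database is.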

Finally, since we only screened for US-ASCII in the 0-127 range, we have data that contains things like the US-ASCII BEL (ord 7) and the like, which CharacterSetValidation is flagging as invalid.
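To track down such data, one could scan each field for exactly the characters that are valid US-ASCII yet rejected by the regex above. The following is a rough sketch under stated assumptions: find_rejected_ascii is a hypothetical helper, the sample string is made up, and nothing here touches a real ClearQuest database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [offset, ordinal] pairs for characters that are
# within US-ASCII (0..127) but outside the class the package accepts, that is
# the control characters other than tab (9), LF (10), and CR (13).
sub find_rejected_ascii {
    my ($text) = @_;
    my @hits;
    while ($text =~ /([\x00-\x08\x0B\x0C\x0E-\x1F])/g) {
        # pos() points just past the match, so step back one character.
        push @hits, [ pos($text) - 1, ord $1 ];
    }
    return @hits;
}

my $field = "Build failed\x07 on host\x0Bone";   # made-up sample data
for my $hit (find_rejected_ascii($field)) {
    printf "offset %d: ord %d\n", @$hit;
}
```

For the sample string this reports ord 7 at offset 12 and ord 11 at offset 21. Whether such characters should then be stripped or replaced is a separate decision.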