Status for Andrew DeFaria: March 2005 Archives

March 31, 2005

Finalized HybridOS

  • Finished up CR 1 for HybridOS checkin - assigned to Thu for review

March 30, 2005

HybridOS Binary Comparison problems

  • Managed to perform binary comparison of HybridOS

HybridOS Binary Comparison

HybridOS has been checked in and built. As you know the binary comparison procedure described in the GS: LOS178 Impact Summary discovered more differences. Specifically 227 .o files had differences. Further investigation revealed that the action of committing the sources to CVS caused $Header/ident strings to change. The following describes the changes to the $Header strings due to cvs commit:

tomcat:strings -a orig.uipc_usrreq.o | grep Header
$Header: /cvs/los178-cvs/los178/sys/networking/tcpip/general/uipc_usrreq.c,v  1.1.1.1 2004/03/03 00:59:24 emooring Exp $
tomcat:strings -a new.uipc_usrreq.o | grep Header
$Header: /cvs/hybrid-os-cvs/los178/sys/networking/tcpip/general/uipc_usrreq.c,v 1.1 2005/03/30 00:39:03 adefaria Exp $

The changes are as follows:

  1. CVS repository name changed from los178-cvs to hybrid-os-cvs
  2. Revision changed from whatever it was (1.1.1.1 in this example) to 1.1. All revisions for HybridOS are now 1.1
  3. Date changed to reflect the time of the cvs commit to the new CVS repository
  4. User changed from whatever it was (emooring in this example) to adefaria, since I was the user who performed the commit

Using objdump once again to disassemble these .o files and comparing the output left us with the following .o files that were still different:

  1. /sys/lib/libcsp_970.a (context_asm.o)
  2. /sys/lib/libcsp_970.a (csp_cpu_asm.o)
  3. /sys/lib/libcsp_970.a (flih.o)
  4. /sys/lib/libcsp_970.a (fpu_asm.o)
  5. /sys/lib/libcsp_970.a (launch_asm.o)
  6. /sys/lib/libcsp_970.a (tlbmiss.o)

Closer examination of these .o files reveals that they also contained ident strings in the text segment that had the same differences as the $Header differences described above. In other words, the code was the same but the version strings and dates changed, as is expected.
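
The keyword-only differences can be filtered out before comparing. The following is a minimal sketch and not part of the procedure in the Impact Summary: normalize_keywords and same_strings are hypothetical helper names, and the check looks only at printable strings, so it is a triage step rather than proof that the code matches.

```shell
# Collapse expanded CVS keywords ("$Header: ... $") to their bare form
# ("$Header$") so repository path, revision, date and user are ignored.
normalize_keywords() {
    sed -e 's/\$Header:[^$]*\$/$Header$/g' \
        -e 's/\$Id:[^$]*\$/$Id$/g' \
        -e 's/\$Revision:[^$]*\$/$Revision$/g' \
        -e 's/\$Date:[^$]*\$/$Date$/g' \
        -e 's/\$Author:[^$]*\$/$Author$/g'
}

# Succeed when the printable strings of two object files are identical
# once keyword expansions are normalized. Requires strings(1).
same_strings() {
    a=$(strings -a "$1" | normalize_keywords)
    b=$(strings -a "$2" | normalize_keywords)
    [ "$a" = "$b" ]
}
```

For example, same_strings orig.uipc_usrreq.o new.uipc_usrreq.o would be expected to succeed if the $Header string is the only string that changed.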

March 29, 2005

Hybrid OS

  • Checked HybridOS into CVS
  • Rebuilt HybridOS
  • Attempted binary comparison - fails due to $Header$ strings

Binary Differences

Well, the build finished but the binary comparison as per the Impact Summary failed. For a while I thought I had done something wrong, so I went back, re-extracted from the SCL and rebuilt the old LOS178 that I had stored on the side, etc. Still it kept failing! Not just the 25 files that were different and needed to be disassembled and compared, but more like 228 files! What's going on?!?

So I dug deeper... Seems that $Header is embedded in some .o files and the $Headers differ (picking at random a .o that didn't compare):

tomcat:strings -a new.uipc_usrreq.o | grep Header
$Header: /cvs/hybrid-os-cvs/los178/sys/networking/tcpip/general/uipc_usrreq.c,v 1.1 2005/03/30 00:39:03 adefaria Exp $
tomcat:strings -a orig.uipc_usrreq.o | grep Header
$Header: /cvs/los178-cvs/los178/sys/networking/tcpip/general/uipc_usrreq.c,v 1.1.1.1 2004/03/03 00:59:24 emooring Exp $
tomcat:

So as you can see, we have differences. I don't know why not all of the 2543 .o files extracted from the .a files differ.

March 28, 2005

HybridOS built

  • Built Hybrid OS for GD and performed binary comparison
  • Adding files to CVS on Tomcat
  • Completed GD LOS178 Impact Summary
  • Resolved long standing issue regarding gnuaout vs. gnu

GD LOS178 Impact Summary

Export sources from LOS178

Export sources from LOS178 CVS tree using the CVS tag REL_LOS178_2p0p0_ppc_FCS. The export will come from the machine named Rock using CVSROOT=:pserver:anoncvs@rock:/cvs/los178-cvs:

tomcat:export CVSROOT=:pserver:anoncvs@rock:/cvs/los178-cvs
tomcat:cvs login
Logging in to :pserver:anoncvs@rock:2401/cvs/los178-cvs
CVS password:
tomcat:cvs export -r REL_LOS178_2p0p0_ppc_FCS los178

Extract prebuilt CDK

Extract prebuilt CDK (sunos-xcoff-ppc) binary also using the tag of REL_LOS178_2p0p0_ppc_FCS from Rock. Note that this prebuilt CDK comes from the bin-image section of the CVS repository and that we are only using the ppc.cdksol.tar.gz image:

tomcat:cvs export -r REL_LOS178_2p0p0_ppc_FCS bin-image/ppc.cdksol.tar.gz

Extract other tools

The package.sh script from the toolbox area is used to package up the images so we need to extract that too:

tomcat:cvs export -r REL_LOS178_2p0p0_ppc_FCS toolbox/package.sh

Test Build

Perform test build.

Note: Test build requires a symlink from /usr/lynx/3.1.0/ppc/cdk/sunos-xcoff-ppc/bin/bison.simple -> $ENV_PREFIX/cdk/sunos-xcoff-ppc-/bin/bison-simple due to a hard coded path dependency. This path was not modified due to LOS-178 RSC restrictions

Steps performed are:

  1. Create ppc_dev area to perform the build in:
    tomcat:mkdir ppc_dev
    
  2. Copy in sources:
    tomcat:rsync -a los178 ppc_dev
    
  3. Unpack CDK into build area:
    tomcat:cd ppc_dev
    tomcat:gnutar -zxpf ../bin-image/ppc.cdksol.tar.gz
    
  4. Perform build:
    tomcat:make DEVELOPMENT=yes install > install.log
    
  5. Check install.log for errors
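
Step 5 can be partly automated. This is a rough sketch, assuming make reports failures on lines containing "*** ... Error" and that compilers and configure scripts use ": error:"; build_errors is a hypothetical name. Note that the redirection in step 4 captures only standard output, so messages written to standard error show up in install.log only if the build is run with 2>&1 appended.

```shell
# Print the lines of a build log (read from standard input) that look
# like failures. The two patterns are assumptions about the tool output.
build_errors() {
    grep -e '\*\*\* .*Error' -e ': error:'
}

# Usage:
#   build_errors < install.log
```

An empty result is not a guarantee of a clean build; it only means that neither pattern appeared in the log.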

Perform binary comparison test

This binary comparison test is different from the normal binary comparison tests. Basically we are simply extracting all .o's from all .a's in the packaged versions of the product. A little utility script was written to find all .a libraries and copy them to an area (complibs) broken out by the path to the library, then extract all .o's from the .a's. This script is called unpack_libs. It is not intended that such a comparison be performed on a regular basis so this script is more of a one shot script.
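
The unpack_libs script itself is not reproduced here. The following is a hypothetical reconstruction from the description above, assuming it is run from the top of an unpacked image, that complibs already exists there, and that each library gets its own directory named after the library with a ".d" suffix (the suffix is my assumption).

```shell
# Map a library path such as ./sys/lib/libcsp_970.a to the directory
# its members are extracted into, keeping the library's path.
lib_dest() {
    printf 'complibs/%s.d\n' "${1#./}"
}

# Find every .a below the current directory (skipping complibs itself),
# copy it into its own directory and extract its .o members with ar(1).
unpack_libs() {
    find . -path ./complibs -prune -o -type f -name '*.a' -print |
    while IFS= read -r lib; do
        dest=$(lib_dest "$lib")
        mkdir -p "$dest" &&
        cp "$lib" "$dest/" &&
        ( cd "$dest" && ar x "$(basename "$lib")" )
    done
}
```

Giving each library its own directory keeps identically named members of different libraries apart. Members with the same name inside one archive would still overwrite each other on extraction.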

Further, a build will create a lot of libraries but not all libraries created will be packaged and shipped. Since we are comparing against a previously built and packaged release we must package up and unpack the build we just performed. This is done using the toolbox/package.sh script as follows:

  1. Package up the image just built:
    tomcat:toolbox/package.sh ppc_dev dev
    
  2. Unpack images to new area:
    tomcat:mkdir new
    tomcat:cd new
    tomcat:for tarfile in ../media/*.tar.gz; do
    > gnutar -zxpf $tarfile
    > done
    
  3. Gather all libraries and extract their .o's:
    tomcat:mkdir complibs
    tomcat:../unpack_libs
    
  4. Unpack old images (from t3:/export/scl/los178/2p0p0/FCS/) to old area:
    tomcat:cd ..
    tomcat:mkdir old
    tomcat:cd old
    tomcat:# copy old tar images here
    tomcat:for tarfile in *.tar.gz; do
    > gnutar -zxpf $tarfile
    > done
    
  5. Gather all libraries and extract their .o's:
    tomcat:mkdir complibs
    tomcat:../unpack_libs
    
  6. Perform diff:
    tomcat:cd ..
    tomcat:diff -r old/complibs new/complibs
    
  7. The above will result in 25 .o files being different. Use objdump -D to disassemble these files and compare the disassembled output. No differences detected in disassembled output.
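
The disassembly comparison in step 7 can be sketched as follows. This assumes the objdump from the CDK behaves like GNU objdump and prints a "file format" line that names the input file; that line is dropped because the old and new paths always differ. The function names are hypothetical.

```shell
# Drop the objdump header line that names the input file, e.g.
# "old/complibs/.../flih.o:     file format ...".
strip_file_header() {
    grep -v 'file format'
}

# Succeed when two object files disassemble to identical output.
disasm_equal() {
    old_dis=$(objdump -D "$1" | strip_file_header)
    new_dis=$(objdump -D "$2" | strip_file_header)
    [ "$old_dis" = "$new_dis" ]
}

# Usage:
#   disasm_equal old/complibs/path/to/file.o new/complibs/path/to/file.o
```

Because -D disassembles every section, strings embedded in a section still show up as differences.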

Import sources into new CVS repository

Sources will be imported into the CVS repository using the following command:

tomcat:cd los178
tomcat:export CVSROOT=:pserver:adefaria@tomcat:/cvs/hybrid-os-cvs
tomcat:cvs login
Logging in to :pserver:adefaria@tomcat:2401/cvs/hybrid-os-cvs
CVS password:
tomcat:# First add all directories
tomcat:find . ! -name CVS -type d -exec cvs add -m "HybridOS import from LOS178" {} \;
tomcat:# Now add all files
tomcat:find . -type f -exec cvs add -m "HybridOS import from LOS178" {} \;
tomcat:cvs commit

Additionally the binary CDK image was checked into bin-image:

tomcat:cd ../bin-image
tomcat:cvs add -m "HybridOS import from LOS178" ppc.cdksol.tar.gz
tomcat:cvs commit

Finally the toolbox/package.sh script was checked into toolbox:

tomcat:cd ../toolbox
tomcat:cvs add -m "HybridOS import from LOS178" package.sh
tomcat:cvs commit

Tag initial sources with the tag REL_HYBRIDOS_1p0_ppc_20050328

All sources, bin packages and toolbox scripts are then tagged:

tomcat:cvs tag REL_HYBRIDOS_1p0_ppc_20050328 los178 bin-image toolbox

Check out all sources and prebuilt CDK and perform build procedure again

Next we check out all sources, bin-image and toolbox scripts into new fresh areas and then perform the build procedure as described above.

Following successful build perform binary comparison test again

Perform the binary comparison described above again.

Package build to archive area

Use the package script to package up the images and place in the archive area at tomcat:/export/dev_archive/hybridos/1p0/20050328/solaris/media/ppc

Long standing issue regarding gnuaout vs. gnu

This has been bugging me for a while and I finally tracked it down. Often I'd build a toolchain then attempt to build LynxOS and it would fail when attempting to get the compiler. It seems that the toolchain build was packing up the compiler tar image with one name and the build scripts were using another name to try to find it. This resulted in errors. Now I had gotten around this via a symlink but I've been wanting to make the two build procedures agree on the names of things...

As Adam writes here the preferred name for the toolchain tar image is derived from config.guess:

Andrew DeFaria writes:

toolchain-i686-pc-linux-gnu-i386.tar.gz

This. But I think we get this from config.guess so try to see how this nice level of abstraction fails before you hard-code something.

The "toolchain-" portion is standard for the toolchain. The "i686-pc-linux-gnu" portion comes out of config.guess:

[int@dopey 20050207]$ /export/build1/LYNXOS_500/work_area/toolchain/3.2.2/toolchain/src/config.guess
    i686-pc-linux-gnu

However the int_tools uses the following code to determine the name of the toolchain tar image:

proc Unload_com { platform dir comp_release format host } {

  switch "$host" {
    "linux"  { set host_platform "i686-pc-linux-gnuaout" }
    "win32"  { set host_platform "i686-pc-cygwin" }
    "sunos"  { set host_platform "sparc-sun-solaris2.7" }
    "lynxos" { if { "$platform" == "x86" } {
                   set host_platform "i386-lynx-lynxos"
               }
               if { "$platform" == "ppc" } {
                     set host_platform "powerpc-lynx-lynxos"
               }
             }
  }
  if { "$platform" == "x86" } {
        set target_platform "i386"
  } else {
        set target_platform "$platform"
  }
  set COMPILER_TAR_GZ "toolchain-$host_platform-$target_platform.tar.gz"

The line in error is the "linux" case above, which sets host_platform to "i686-pc-linux-gnuaout"; the "gnuaout" suffix should change to simply "gnu". The int_tools do not have the benefit of being able to call config.guess, so this could break again in the future.

I will perform this change, along with other int_tool changes required for the new tag labeling, under an ECR.
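
For comparison, here is how the same name could be derived where config.guess is available. This is only a sketch of the naming rule in shell, not the int_tools code (which is Tcl); toolchain_tar_name is a hypothetical helper.

```shell
# Build the toolchain tar image name from a host triplet and a platform.
# As in the excerpt above, x86 maps to the target name "i386" and any
# other platform name is used unchanged.
toolchain_tar_name() {
    host_triplet=$1
    platform=$2
    case $platform in
        x86) target=i386 ;;
        *)   target=$platform ;;
    esac
    printf 'toolchain-%s-%s.tar.gz\n' "$host_triplet" "$target"
}

# Usage, with the path to config.guess supplied by the caller:
#   toolchain_tar_name "$(/path/to/config.guess)" x86
```

On the Linux build host shown above this would yield toolchain-i686-pc-linux-gnu-i386.tar.gz, matching the preferred name.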

March 25, 2005

LOS178 compares

  • LOS178 does finally compare. Apparently .o files produced by assembly must be compared at the source level by using objdump to dump assembly
  • Attempting build of LOS178 on Tomcat before putting GD LOS178 into CVS.
  • Hit problem with hard coded paths - need Jeff to resolve these

Comparing Assembled .o files

25 files did not compare. Turns out these were probably generated by the assembler. Instead of comparing them byte for byte, we disassemble them with objdump, which comes in the CDK, and compare the output. This eliminates differences that may be due to date/timestamps.

March 23, 2005

LOS178 build

  • Building LOS178 Development version to compare to previously released images

March 22, 2005

Bluecat build still failing

  • Bluecat build still fails with same problem. Emailed Sasha
  • Native PPC Toolchain still failing - same problems
  • Recreated cvsr.php - file was previously deleted

Bluecat build failing

Alexander Sanochkin wrote:

Andrew,

It seems you tried to use a build tag which was not ready to rebuild BlueCat at that time. Also please note that the main BC build script has changed due to updating the BC cross compiler to version 3.4.3. The script is called do_it-bc5.0-gcc_3.4.3. You can get it from the BlueCat CVS. (/cm/CVS/BlueCat/eng/int/scripts).

Regarding the glib build problem we can not provide intelligent comments at this time as it seems that the 20050314 environment is not available for us on the jaguar machine.

What build tag are we supposed to use? I found R_5_2_1_ppc_20050319 and I assume that is what I should use. However the difference between do_it-bc5.0-gcc_3.4.3 and do_it-bc5.0 is merely:

[int@jaguar loc_archive]$ diff do_it-bc5.0.orig do_it-bc5.0-gcc_3.4.3.orig
108c108
<   export BC_TARGET=$BLUECAT_TARGET_CPU-lynx-linux-bluecat
---
>   export BC_TARGET=$BLUECAT_TARGET_CPU-lynx-linux-gnubc

And, if I might ask, why all the version numbers in the file? Why isn't it just named do_it and depending on which CVS (RCS?) tag you use you get a 5.0 or a 5.0-gcc_3.4.3 version?

Also, with do_it-bc5.0-gcc_3.4.3 I suspect that the changes I made for the patch-spec (changing do_step to perform patches between steps 1 and 2 when run in stepwise fashion) have not been incorporated. Seems to me that there are two steps being done in automated mode (i.e. doing all steps at one time) that are not performed when doing things in a stepwise fashion. The first is the building of the new GNU Tools, which is effectively step 0, and the second is this patching thing, which is normally done between steps 1 and 2. Might I suggest that we make these regular steps in their proper order and renumber the rest?

Build is still failing (on Jaguar). I get to step4 and it fails with:

Building glib package step 4.3 at 22:51:22
parse_file: build_package failed for glib_trg.spec
---- Step 4 finished successfully at Tue Mar 22 14:51:39 PST 2005 ----

Looking at step4/build_glib.log I see:

[int@jaguar step4]$ tail -f build_glib.log
+ ac_cv_func_getpwuid_r=yes
+ ac_cv_func_mutex_trylock=yes
+ ac_cv_func_cond_timedwait=yes
+ glib_cv_sizeof_gmutex=24
+ glib_cv_byte_contents_gmutex=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+ ./configure --build=i386-linux-gnu --host=ppc-bluecat-linux
configure: warning: Could not determine POSIX flag. (-posix didn't work.)
configure: error: can not run test program while cross compiling
error: Bad exit status from /usr/lynx/loc_archive/build/20050319/var/tmp/rpm-tmp.40621 (%build)
    Bad exit status from /usr/lynx/loc_archive/build/20050319/var/tmp/rpm-tmp.40621 (%build)

Attempting to execute rpm-tmp.40621 reveals:

gcc -g -O2 -Wall -D_REENTRANT -o testglib testglib.o .libs/libglib.a
.libs/libglib.a(gmessages.o): In function `g_logv':
/usr/lynx/loc_archive/build/20050319/cdt/src/bluecat/BUILD/glib-1.2.10/gmessages.c:343: undefined reference to `va_copy'
.libs/libglib.a(gstrfuncs.o): In function `g_strdup_vprintf':
    /usr/lynx/loc_archive/build/20050319/cdt/src/bluecat/BUILD/glib-1.2.10/gstrfuncs.c:154: undefined reference to `va_copy'
collect2: ld returned 1 exit status
make[2]: *** [testglib] Error 1
make[2]: Leaving directory `/usr/lynx/loc_archive/build/20050319/cdt/src/bluecat/BUILD/glib-1.2.10'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/lynx/loc_archive/build/20050319/cdt/src/bluecat/BUILD/glib-1.2.10'
make: *** [all-recursive-am] Error 2
+ exit 0

Any ideas?
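
Since the step banner reports success even when a package build fails, the step logs have to be scanned for failures directly. A small sketch, assuming the message format shown in the log excerpt above; failed_specs is a hypothetical helper.

```shell
# Print the spec file named in each "build_package failed" line of a
# step log read from standard input.
failed_specs() {
    sed -n 's/^parse_file: build_package failed for \(.*\)$/\1/p'
}

# Usage, with the log file name supplied by the caller:
#   failed_specs < step4.log
```

Any output means the step should be treated as failed, whatever the "finished successfully" line says.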

Native PPC Toolchain failure

Build of the Native PPC Toolchain keeps failing for me but not for Oleg. Oleg has been suggesting that I lower the ulimits to -s 100000 and -d 200000, which I did, but the build still fails for me and not for Oleg. Oleg writes:

Perhaps the time of the day has some effect on the file system behaviour (or an increased local network activity during the business time has some adverse effect on the system stability). Please try to start the toolchain build at your end-of-business time.

Meanwhile, we will ponder on what else can be wrong.

March 18, 2005

NetSNMP/Native x86 toolchain problems/ECR Linkify

  • Changed NetSNMP package to automatically handle installation of snmpd-sample.conf file
  • Native toolchain build still failing despite assistance from Moscow
  • Made ECR Linkify better

NetSNMP

Changed install_snmp.sh to install snmpd-sample.conf into /usr/local/share/snmp/snmpd.conf for both the bin and src packages. In the bin package we put a copy of snmpd-sample.conf into tmp.

Still need to figure out how/where to check this into the RCS tree.

Build scripts do not do this automatically but the build scripts have other issues.

March 17, 2005

Bluecat Build

  • Bluecat build's step 4 failed again
  • Attempting to reproduce build with old tag

March 16, 2005

Bluecat Build

  • Building Bluecat from start including rebuilding the GNU Tools
  • Proceeded through steps 1-4 but I still have problems at step 4

Bluecat GNU Tools

There are two "steps" that are not really steps and that are performed only in automated mode. Moscow had recommended that we insert an exit statement in a certain location and run in automated mode (as root, I can only assume) to get the Bluecat GNU Tools. I had originally subsetted this out and built them as int. However, since I was having so many odd problems, I decided to go strictly as Moscow instructs.

March 15, 2005

BC Step 3 build failure

  • Attempting to rebuild BC with new tag.
  • Installed RH 8.0 on new machine

Bluecat build failure in step 3

I was asked to rebuild Bluecat using R_5_2_1_ppc_20050314 as a tag. I tried doing this but it is failing. I made it to step 4 when it was failing with something about unable to find /arch/ppc/Makefile or something like that. I decided to instead go back to the beginning and make the GNU Tools (step 0) just to be sure. Now I get stuck at step 3. It fails in an odd way too. In /usr/lynx/loc_archive under LOGS I have the following at the tail of the step3 log:

Building glibc package step 3.3 at 14:14:07
Done
Installing glibc package
Done
Building glib package step 3.4 at 17:06:12
parse_file: build_package failed for glib_cdt.spec
---- Step 3 finished successfully at Tue Mar 15 17:07:47 PST 2005 ----
+++ Exit .... +++

However in archive/20050314/ppc/logs/step3 for build_glib.log I have:

testglib.c:915: warning: const qualifier ignored on asm
.libs/libglib.so: undefined reference to `__ctype_b'
.libs/libglib.so: undefined reference to `__ctype_toupper'
.libs/libglib.so: undefined reference to `__ctype_tolower'
collect2: ld returned 1 exit status
make[2]: *** [testglib] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive-am] Error 2
error: Bad exit status from /usr/lynx/loc_archive/build/20050314/var/tmp/rpm-tmp.95118 (%build)
   Bad exit status from /usr/lynx/loc_archive/build/20050314/var/tmp/rpm-tmp.95118 (%build)

Do you know what is going wrong?


Later I wrote:

Well rebuilding didn't help (I didn't think it would). Things seem to be failing in the rpm-tmp. script. This script seems to be dynamically created. It sets some variables and basically does a configure and make in build/20050314/cdt/src/bluecat/BUILD/glib-1.2.10. This eventually boils down to the following gcc command:

[int@jaguar glib-1.2.10]$ gcc -O2 -g -march=i386 -Wall -D_REENTRANT -o .libs/testglib testglib.o .libs/libglib.so -Wl,--rpath -Wl,/cdt/lib
.libs/libglib.so: undefined reference to `va_copy'
collect2: ld returned 1 exit status

Oh, BTW, the old machine (penguin) has been renamed to jaguar but is otherwise the same.

I'm attempting to run step 3 again but I fear it'll just have the same error.

Note I will be monitoring email and this build from home so if you have any ideas send them right away.

March 14, 2005

NetSNMP/Bluecat build/Build Machine

  • Rebuilt NetSNMP 5.1.1. Need to package src before build
  • Restarted Bluecat build for new label
  • Installing RH 8.0 on new machine

NetSNMP 5.1.1

We have a chicken and egg situation here. Apparently the src package needs to be built before the build happens, yet it also needs kern_mib.o and kern_mib.d.o, which you get after the build! The scripts do not handle this gracefully so I rebuilt the packages by hand. Julia says the build is OK now but repeatability will be a problem.

Installation of RH 8.0

After some initial trouble installing from CDROM (needed to set IDE to Legacy in the BIOS) I managed to install RH 8.0. However I sized the root drive to only 2048 Meg. Turns out the RH 8.0 installation took 2020 Meg so I want to redo this. Upon reinstallation I'm again having problems with the CD. Seems the first CD has etchings on the outermost ring of the CD itself. Not sure if I caused this or if it was there before. May need to get another CD for RH 8.0.

March 11, 2005

NetSNMP

  • Rebuilt NetSNMP.

March 10, 2005

Native x86 toolchain/Bluecat Installation/ECRD timeouts

  • Discovered problems with Native x86 toolchain.
  • Attempting Bluecat installation from CDs - taking a long time!
  • Implemented timeouts correctly in ecrd, which was having client deadlocks every night

Native X86 Toolchain Build Problem

When building the toolchain on x86 natively I kept getting different failures. It was frustrating to say the least. Conferred with Adam and he remembered a problem with the kernel changing timestamps on files due to an nmap problem (ECR 22905) which still seems to be a problem. Emailed Vlad about this...

Bluecat installation from CDs

Bluecat installation was taking a long time due to timeouts in NFS for a stale file handle for some remotely mounted CDROM. Still have an issue with one RPM that has int:staff ownership:

March 9, 2005

PPC Toolchain finally builds!

  • Finally managed to build Native PPC Toolchain after many interruptions due to machine moves
  • Revisiting build of Native X86 Toolchain which is failing.

March 8, 2005

netsnmp/t3 mount/PPC Toolchain

  • Built NetSNMP 5.1.1
  • Had Jeff export t3:/usr/lynx/archive/ecr so that ECRDig can gain access to auxiliary files that people store there. Currently cat.php doesn't work with all the file types it can hit there.
  • Figured out that the tar images were not being recreated due to int_tools getting stuck in the demos. Recreated PPC tar image and unpacked it onto target machine.

March 7, 2005

Bluecat CDs

  • Burned 4 CDs for Bluecat
  • Continuing to hunt down problem with PPC Toolchain build. Missed a file. After including it libc.a does have atoi but it doesn't seem to be being packaged into the product image!

March 4, 2005

Yet more building

  • Finished Bluecat build process!
  • Pulling ECR 23001 into the LynxOS build for eventual Native PPC Toolchain build

Regarding ECR 23001:

I was trying to build the native PPC toolchain again. The ECR I submitted was 23184 but that was dupped to 22979 which is in Pending Review state. So I pulled 22979 and attempted to build. But 22979 depends on 23001 (also in Pending Review). Is it OK to pull 23001 and continue onward? How are dependencies between ECRs normally handled?

Note: I say "depends on" but I'm not sure that that is the right terminology. I don't really believe that 23001 depends on 22979 or vice versa; rather, there doesn't seem to be a clear process here. 22979 pulls in src/lib/libc/Makefile revision 10.25 but 10.24 has a change for 23001 in it. So 23001 is needed because of this. The other files involved in 23001 do not have the Lion_lynxos_012405 tag so they are not picked up in the normal build process, hence the build fails. This is not a classic dependency; rather it's an overrunning of checkins. Of course, maybe my analysis of this situation is flawed.

March 2, 2005

Building, building...

  • Pulled ECR 22979 and rebuilt lynxos
  • Placed new LynxOS on T3 and updated t-mcpn765-1
  • Rebuild of Toolchain still fails with unresolved references to atoi!
  • Found some corruption in LynxOS CVS. Working to get a list of files and to have this fixed
  • Bluecat build, step 4 still failing...
  • Attempting to build TOB Toolchain on x86 still failing

March 1, 2005

More BC building

  • Moscow's BC build failed in step 4 due to lack of disk space. Moved guest account over to new partition
  • Step 4 is not building for me either. Emailed Moscow
  • Returning to other builds. PPC Native Toolchain build was failing - attempting to resolve that. The X86 Native Toolchain build is also failing.
  • Several enhancements to ECR Dig:
    • ECRs now include status line with Status, State and Severity
    • HTTP and FTP references are made into links
    • ECR numbers are now made into links
    • Added footer with "Back to ECR Dig" link
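
The linking enhancements amount to a few text substitutions. The sketch below illustrates the idea with sed and is not the ECR Dig code, which is not shown here; linkify is a hypothetical name and ECR_URL is a made-up link target, since the real URL scheme is not given.

```shell
# Hypothetical link target for an ECR number.
ECR_URL='ecr.php?ecr='

# Wrap http:// and ftp:// references and "ECR <number>" mentions in
# anchor tags. Input is read from standard input and is not HTML-escaped.
linkify() {
    sed -e 's|\(http://[^ <>"]*\)|<a href="\1">\1</a>|g' \
        -e 's|\(ftp://[^ <>"]*\)|<a href="\1">\1</a>|g' \
        -e "s|ECR \([0-9][0-9]*\)|<a href=\"${ECR_URL}\1\">ECR \1</a>|g"
}

# Usage:
#   echo 'Dupped to ECR 22979' | linkify
```

A URL that itself contains the text "ECR 123" would be rewritten twice, so a real implementation would need to tokenize more carefully.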