Finalized HybridOS
- Finished up CR 1 for HybridOS checkin - assigned to Thu for review
HybridOS has been checked in and built. As you know, the binary comparison procedure described in the GS: LOS178 Impact Summary discovered more differences. Specifically, 227 .o files had differences. Further investigation revealed that committing the sources to CVS caused the $Header/ident strings to change. The following shows how the $Header strings changed due to cvs commit:
tomcat:strings -a orig.uipc_usrreq.o | grep Header
$Header: /cvs/los178-cvs/los178/sys/networking/tcpip/general/uipc_usrreq.c,v 1.1.1.1 2004/03/03 00:59:24 emooring Exp $
tomcat:strings -a new.uipc_usrreq.o | grep Header
$Header: /cvs/hybrid-os-cvs/los178/sys/networking/tcpip/general/uipc_usrreq.c,v 1.1 2005/03/30 00:39:03 adefaria Exp $
The changes are as follows:
Using objdump once again to disassemble these .o files and comparing the output left us with the following .o files that were still different:
Closer examination of these .o files reveals that they also contained ident strings in the text segment, with the same differences as the $Header differences described above. In other words, the code was the same but the version strings and dates changed, as expected.
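The disassembly comparison can be sketched roughly as follows. This is a minimal sketch, not the procedure from the Impact Summary: the function names are made up for illustration, it assumes an objdump that accepts -d (point OBJDUMP at the one from the CDK), and it drops the "file format" banner line because that line carries the file name.

```shell
# Hypothetical helpers; the names are illustrative only.

# Drop the objdump banner line ("foo.o:     file format ..."), which
# contains the file name and would otherwise always differ.
normalize_disasm() {
    sed '/file format/d'
}

# Exit 0 when the disassembly of two .o files matches, 1 when it
# differs, 2 when objdump itself fails.
same_code() {
    old_dis=$("${OBJDUMP:-objdump}" -d "$1") || return 2
    new_dis=$("${OBJDUMP:-objdump}" -d "$2") || return 2
    [ "$(printf '%s\n' "$old_dis" | normalize_disasm)" = \
      "$(printf '%s\n' "$new_dis" | normalize_disasm)" ]
}
```

A call such as same_code orig.uipc_usrreq.o new.uipc_usrreq.o would then report whether the code matches. Files with ident strings in the text segment will still show up as different and need a closer look by hand.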
Well, the build finished but the binary comparison as per the Impact Summary failed. For a while I thought I did something wrong, so I went back, re-extracted from the SCL, rebuilt the old LOS178 that I had stored on the side, etc. Still it kept failing! Instead of just 25 files that were different and needed to be disassembled and compared, there were more like 228 files! What's going on?!?
So I dug deeper... Seems that $Header is embedded in some .o files and the $Headers differ (picking at random a .o that didn't compare):
tomcat:strings -a new.uipc_usrreq.o | grep Header
$Header: /cvs/hybrid-os-cvs/los178/sys/networking/tcpip/general/uipc_usrreq.c,v 1.1 2005/03/30 00:39:03 adefaria Exp $
tomcat:strings -a orig.uipc_usrreq.o | grep Header
$Header: /cvs/los178-cvs/los178/sys/networking/tcpip/general/uipc_usrreq.c,v 1.1.1.1 2004/03/03 00:59:24 emooring Exp $
tomcat:
So as you can see, we have differences. What I don't know is why only some of the 2543 .o files extracted from the .a files differ rather than all of them.
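One way to check whether only the expanded keywords differ is to collapse each $Header string before comparing. A minimal sketch, assuming the same strings command as used above; the function names are hypothetical, and this only compares printable strings, not the code itself.

```shell
# Collapse an expanded CVS keyword such as
#   $Header: /cvs/.../file.c,v 1.1 2005/03/30 00:39:03 user Exp $
# back to the bare keyword $Header$ (reads stdin, writes stdout).
mask_header() {
    sed 's/\$Header:[^$]*\$/$Header$/g'
}

# Exit 0 when the printable strings of two .o files are identical once
# the $Header expansions are masked, 2 when strings itself fails.
same_strings_modulo_header() {
    old_s=$(strings -a "$1") || return 2
    new_s=$(strings -a "$2") || return 2
    [ "$(printf '%s\n' "$old_s" | mask_header)" = \
      "$(printf '%s\n' "$new_s" | mask_header)" ]
}
```

With this, the two $Header lines shown above mask to the same text, so a pair of files that differ only there would compare equal.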
Export sources from LOS178 CVS tree using the CVS tag REL_LOS178_2p0p0_ppc_FCS. The export will come from the machine named Rock using CVSROOT=:pserver:anoncvs@rock:/cvs/los178-cvs:
tomcat:export CVSROOT=:pserver:anoncvs@rock:/cvs/los178-cvs
tomcat:cvs login
Logging in to :pserver:anoncvs@rock:2401/cvs/los178-cvs
CVS password:
tomcat:cvs export -r REL_LOS178_2p0p0_ppc_FCS los178
Extract prebuilt CDK (sunos-xcoff-ppc) binary also using the tag of REL_LOS178_2p0p0_ppc_FCS from Rock. Note that this prebuilt CDK comes from the bin-image section of the CVS repository and that we are only using the ppc.cdksol.tar.gz image:
tomcat:cvs export -r REL_LOS178_2p0p0_ppc_FCS bin-image/ppc.cdksol.tar.gz
The package.sh script from the toolbox area is used to package up the images so we need to extract that too:
tomcat:cvs export -r REL_LOS178_2p0p0_ppc_FCS toolbox/package.sh
Perform test build.
Steps performed are:
tomcat:mkdir ppc_dev
tomcat:rsync -a los178 ppc_dev
tomcat:cd ppc_dev
tomcat:gnutar -zxpf ../bin-image/ppc.cdksol.tar.gz
tomcat:make DEVELOPMENT=yes install > install.log
This binary comparison test is different from the normal binary comparison tests. Basically we are simply extracting all .o's from all .a's in the packaged versions of the product. A little utility script was written to find all .a libraries and copy them to an area (complibs) broken out by the path to the library, then extract all .o's from the .a's. This script is called unpack_libs. It is not intended that such a comparison be performed on a regular basis so this script is more of a one shot script.
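The unpack_libs script itself is not reproduced here. The following is a minimal sketch of the idea only, assuming a standard find and ar and a POSIX shell; the function names and the exact layout under complibs are guesses, not the original script.

```shell
# Map a library path such as ./usr/lib/libc.a to its extraction
# directory under complibs (this layout is an assumption).
lib_dest() {
    printf 'complibs/%s\n' "${1#./}"
}

# Find every .a below the current directory, copy it into complibs
# under its original path, and extract its .o members next to the copy.
unpack_libs() {
    find . -name complibs -prune -o -type f -name '*.a' -print |
    while IFS= read -r lib; do
        dest=$(lib_dest "$lib")
        mkdir -p "$dest" || return 1
        cp "$lib" "$dest/" || return 1
        ( cd "$dest" && ar x "${lib##*/}" ) || return 1
    done
}
```

Giving each library its own directory keeps same-named .o files from different libraries from overwriting each other.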
Further, a build will create a lot of libraries but not all libraries created will be packaged and shipped. Since we are comparing against a previously built and packaged release we must package up and unpack the build we just performed. This is done using the toolbox/package.sh script as follows:
tomcat:toolbox/package.sh ppc_dev dev
tomcat:mkdir new
tomcat:cd new
tomcat:for tarfile in ../media/*.tar.gz; do
> gnutar -zxpf $tarfile
> done
tomcat:mkdir complibs
tomcat:../unpack_libs
tomcat:cd ..
tomcat:mkdir old
tomcat:cd old
tomcat:# copy old tar images here
tomcat:for tarfile in *.tar.gz; do
> gnutar -zxpf $tarfile
> done
tomcat:mkdir complibs
tomcat:../unpack_libs
tomcat:cd ..
tomcat:diff -r old/complibs new/complibs
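To get a list of the differing .o files out of the diff -r output, something like the following could be used. It is a sketch under one assumption: that diff reports differing files as "Binary files A and B differ" or "Files A and B differ", which is the GNU diff wording; other diff implementations may word this differently. The function name is hypothetical.

```shell
# Read `diff -r` output on stdin and print the first-tree path of each
# .o file reported as differing. "Only in ..." lines are ignored.
differing_objects() {
    sed -n \
        -e 's/^Binary files \(.*\.o\) and .* differ$/\1/p' \
        -e 's/^Files \(.*\.o\) and .* differ$/\1/p'
}
```

Piping diff -r old/complibs new/complibs through differing_objects and then wc -l gives the count of files that need a closer look.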
Sources will be imported into the CVS repository using the following command:
tomcat:cd los178
tomcat:export CVSROOT=:pserver:adefaria@tomcat:/cvs/hybrid-os-cvs
tomcat:cvs login
Logging in to :pserver:adefaria@tomcat:2401/cvs/hybrid-os-cvs
CVS password:
tomcat:# First add all directories
tomcat:find . ! -name CVS -type d -exec cvs add -m "HybridOS import from LOS178" {} \;
tomcat:# Now add all files
tomcat:find . -type f -exec cvs add -m "HybridOS import from LOS178" {} \;
tomcat:cvs commit
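One thing to watch with the file pass: once the directories have been added, each of them holds a CVS administrative directory, and a plain find -type f will also visit the files inside it. A variation that skips those directories might look like this. It is a sketch, not the command that was actually run, and the function name is hypothetical.

```shell
# List regular files under a directory (default: the current one),
# skipping CVS administrative directories entirely. -prune stops find
# from descending into any directory named CVS.
list_addable_files() {
    find "${1:-.}" -name CVS -prune -o -type f -print
}
```

The output can be fed to cvs add one file at a time, for example from a while read loop.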
Additionally, the binary CDK image was checked into bin-image:
tomcat:cd ../bin-image
tomcat:cvs add -m "HybridOS import from LOS178" ppc.cdksol.tar.gz
tomcat:cvs commit
Finally, the toolbox/package.sh script was checked into toolbox:
tomcat:cd ../toolbox
tomcat:cvs add -m "HybridOS import from LOS178" package.sh
tomcat:cvs commit
All sources, bin packages and toolbox scripts are then tagged:
tomcat:cvs tag REL_HYBRIDOS_1p0_ppc_20050328 los178 bin-image toolbox
Next we check out all sources, bin-image and toolbox scripts into new fresh areas and then perform the build procedure as described above.
Perform the binary comparison described above again.
Use the package script to package up the images and place in the archive area at tomcat:/export/dev_archive/hybridos/1p0/20050328/solaris/media/ppc
This has been bugging me for a while and I finally tracked it down. Often I'd build a toolchain then attempt to build LynxOS and it would fail when attempting to get the compiler. It seems that the toolchain build was packing up the compiler tar image with one name and the build scripts were using another name to try to find it. This resulted in errors. Now I had gotten around this via a symlink but I've been wanting to make the two build procedures agree on the names of things...
As Adam writes here, the preferred name for the toolchain tar image is derived from config.guess:
Andrew DeFaria writes:
toolchain-i686-pc-linux-gnu-i386.tar.gz
This. But I think we get this from config.guess so try to see how this nice level of abstraction fails before you hard-code something.
The "toolchain-" portion is standard for the toolchain. The "i686-pc-linux-gnu" portion comes out of config.guess:
[int@dopey 20050207]$ /export/build1/LYNXOS_500/work_area/toolchain/3.2.2/toolchain/src/config.guess
i686-pc-linux-gnu
However, int_tools uses the following code to determine the name of the toolchain tar image:
proc Unload_com { platform dir comp_release format host } {
    switch "$host" {
        "linux" {
            set host_platform "i686-pc-linux-gnuaout"
        }
        "win32" {
            set host_platform "i686-pc-cygwin"
        }
        "sunos" {
            set host_platform "sparc-sun-solaris2.7"
        }
        "lynxos" {
            if { "$platform" == "x86" } {
                set host_platform "i386-lynx-lynxos"
            }
            if { "$platform" == "ppc" } {
                set host_platform "powerpc-lynx-lynxos"
            }
        }
    }
    if { "$platform" == "x86" } {
        set target_platform "i386"
    } else {
        set target_platform "$platform"
    }
    set COMPILER_TAR_GZ "toolchain-$host_platform-$target_platform.tar.gz"
The line in error is the "linux" case, set host_platform "i686-pc-linux-gnuaout": the trailing "gnuaout" should change to simply "gnu". The int_tools do not have the benefit of being able to call config.guess, so this could well break again in the future.
I will perform this change, along with other int_tools changes required for the new tag labeling, under an ECR.
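For comparison, here is the same name mapping sketched in shell, with the linux entry ending in "gnu" as config.guess reports it. This is an illustration only: int_tools is Tcl, the function names below are made up, and unlike the Tcl code this version returns an error for an unknown host rather than leaving the variable unset.

```shell
# usage: host_platform HOST PLATFORM
host_platform() {
    case "$1" in
        linux)  echo "i686-pc-linux-gnu" ;;
        win32)  echo "i686-pc-cygwin" ;;
        sunos)  echo "sparc-sun-solaris2.7" ;;
        lynxos)
            case "$2" in
                x86) echo "i386-lynx-lynxos" ;;
                ppc) echo "powerpc-lynx-lynxos" ;;
                *)   return 1 ;;
            esac ;;
        *) return 1 ;;
    esac
}

# usage: compiler_tar_gz PLATFORM HOST
# Prints the toolchain tar image name for the given target and host.
compiler_tar_gz() {
    hp=$(host_platform "$2" "$1") || return 1
    case "$1" in
        x86) tp="i386" ;;
        *)   tp="$1" ;;
    esac
    echo "toolchain-$hp-$tp.tar.gz"
}
```

For an x86 target on a linux host this yields toolchain-i686-pc-linux-gnu-i386.tar.gz, the name the toolchain build produces.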
25 files did not compare. Turns out these differences were probably generated by the assembler. Instead we compare disassembly produced by objdump, which comes in the CDK. This eliminates differences that may be due to date/timestamps.
Alexander Sanochkin wrote:
Andrew,
It seems you tried to use a build tag which was not ready to rebuild BlueCat at that time. Also please note that the main BC build script has changed due to updating the BC cross compiler to version 3.4.3. The script is called do_it-bc5.0-gcc_3.4.3. You can get it from the BlueCat CVS. (/cm/CVS/BlueCat/eng/int/scripts).
Regarding the glib build problem we can not provide intelligent comments at this time as it seems that the 20050314 environment is not available for us on the jaguar machine.
What build tag are we supposed to use? I found R_5_2_1_ppc_20050319 and I assume that is what I should use. However the difference between do_it-bc5.0-gcc_3.4.3 and do_it-bc5.0 is merely:
[int@jaguar loc_archive]$ diff do_it-bc5.0.orig do_it-bc5.0-gcc_3.4.3.orig
108c108
< export BC_TARGET=$BLUECAT_TARGET_CPU-lynx-linux-bluecat
---
> export BC_TARGET=$BLUECAT_TARGET_CPU-lynx-linux-gnubc
And, if I might ask, why all the version numbers in the file? Why isn't it just named do_it and depending on which CVS (RCS?) tag you use you get a 5.0 or a 5.0-gcc_3.4.3 version?
Also, with do_it-bc5.0-gcc_3.4.3 I suspect that the changes I made for the patch-spec (changing do_step to perform patches between steps 1 and 2 when run in stepwise fashion) have not been incorporated. It seems to me that two steps are done in automated mode (i.e. doing all steps at one time) that are not performed when doing things in a stepwise fashion. The first is the building of the new GNU Tools, which is effectively step 0, and the second is this patching, which is normally done between steps 1 and 2. Might I suggest that we make these regular steps in their proper order and renumber the rest?
Build is still failing (on Jaguar). I get to step4 and it fails with:
Building glib package step 4.3 at 22:51:22
parse_file: build_package failed for glib_trg.spec
---- Step 4 finished successfully at Tue Mar 22 14:51:39 PST 2005 ----
Looking at step4/build_glib.log I see:
[int@jaguar step4]$ tail -f build_glib.log
+ ac_cv_func_getpwuid_r=yes
+ ac_cv_func_mutex_trylock=yes
+ ac_cv_func_cond_timedwait=yes
+ glib_cv_sizeof_gmutex=24
+ glib_cv_byte_contents_gmutex=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
+ ./configure --build=i386-linux-gnu --host=ppc-bluecat-linux
configure: warning: Could not determine POSIX flag. (-posix didn't work.)
configure: error: can not run test program while cross compiling
error: Bad exit status from /usr/lynx/loc_archive/build/20050319/var/tmp/rpm-tmp.40621 (%build)
Bad exit status from /usr/lynx/loc_archive/build/20050319/var/tmp/rpm-tmp.40621 (%build)
Attempting to execute rpm-tmp.40621 reveals:
gcc -g -O2 -Wall -D_REENTRANT -o testglib testglib.o .libs/libglib.a
.libs/libglib.a(gmessages.o): In function `g_logv':
/usr/lynx/loc_archive/build/20050319/cdt/src/bluecat/BUILD/glib-1.2.10/gmessages.c:343: undefined reference to `va_copy'
.libs/libglib.a(gstrfuncs.o): In function `g_strdup_vprintf':
/usr/lynx/loc_archive/build/20050319/cdt/src/bluecat/BUILD/glib-1.2.10/gstrfuncs.c:154: undefined reference to `va_copy'
collect2: ld returned 1 exit status
make[2]: *** [testglib] Error 1
make[2]: Leaving directory `/usr/lynx/loc_archive/build/20050319/cdt/src/bluecat/BUILD/glib-1.2.10'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/lynx/loc_archive/build/20050319/cdt/src/bluecat/BUILD/glib-1.2.10'
make: *** [all-recursive-am] Error 2
+ exit 0
Any ideas?
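Since the step log reports "finished successfully" even though a package build failed, a check along these lines could flag the failure without reading the whole log. It is a minimal sketch: the function names are hypothetical, and the matched text is taken only from the log excerpt quoted above.

```shell
# Read a step log on stdin and print the spec file of every package
# whose build is reported as failed.
failed_packages() {
    sed -n 's/.*build_package failed for \(.*\)$/\1/p'
}

# Read a step log on stdin; exit 1 if any package build is reported as
# failed, regardless of what the "finished successfully" line claims.
step_log_ok() {
    ! grep -q 'build_package failed'
}
```

Fed the step 4 log above, failed_packages prints glib_trg.spec and step_log_ok fails.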
The build of the native PPC toolchain keeps failing for me but not for Oleg. Oleg has been suggesting that I lower the ulimits to -s 100000 and -d 200000, which I did, but the build still fails for me and not for Oleg. Oleg writes:
Perhaps the time of the day has some effect on the file system behaviour (or an increased local network activity during the business time has some adverse effect on the system stability). Please try to start the toolchain build at your end-of-business time.
Meanwhile, we will ponder on what else can be wrong.
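For the record, the limits can be applied to just the build rather than to the whole login shell. A minimal sketch, assuming a shell whose ulimit supports -s and -d with values in kilobytes; the function name is hypothetical.

```shell
# usage: with_limits STACK_KB DATA_KB command [args...]
# Run a command with the given stack (-s) and data (-d) limits inside a
# subshell, so the caller's own limits are left untouched.
with_limits() {
    (
        ulimit -s "$1" || exit 1
        ulimit -d "$2" || exit 1
        shift 2
        "$@"
    )
}
```

The toolchain build would then be started as with_limits 100000 200000 followed by the usual build command. Setting a limit above the current hard limit will fail for a non-root user.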
Changed install_snmp.sh to install snmpd-sample.conf into /usr/local/share/snmp/snmpd.conf for both the bin and src packages. In the bin package we put a copy of snmpd-sample.conf into tmp.
Still need to figure out how/where to check this into the RCS tree.
Build scripts do not do this automatically but the build scripts have other issues.
There are two "steps" that are not really steps and that are performed only in automated mode. Moscow had recommended that we insert an exit statement in a certain location and run in automated mode (as root, I can only assume) to get the Bluecat GNU Tools. I had originally subsetted this out and built them as int. However, since I was having so many odd problems, I decided to go strictly as Moscow instructs.
I was asked to rebuild Bluecat using R_5_2_1_ppc_20050314 as a tag. I tried doing this but it is failing. I made it to step 4 when it was failing with something about unable to find /arch/ppc/Makefile or something like that. I decided to instead go back to the beginning and make the GNU Tools (step 0) just to be sure. Now I get stuck at step 3. It fails in an odd way too. In /usr/lynx/loc_archive under LOGS I have the following at the tail of the step3 log:
Building glibc package step 3.3 at 14:14:07
Done
Installing glibc package
Done
Building glib package step 3.4 at 17:06:12
parse_file: build_package failed for glib_cdt.spec
---- Step 3 finished successfully at Tue Mar 15 17:07:47 PST 2005 ----
+++ Exit .... +++
However in archive/20050314/ppc/logs/step3 for build_glib.log I have:
testglib.c:915: warning: const qualifier ignored on asm
.libs/libglib.so: undefined reference to `__ctype_b'
.libs/libglib.so: undefined reference to `__ctype_toupper'
.libs/libglib.so: undefined reference to `__ctype_tolower'
collect2: ld returned 1 exit status
make[2]: *** [testglib] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all-recursive-am] Error 2
error: Bad exit status from /usr/lynx/loc_archive/build/20050314/var/tmp/rpm-tmp.95118 (%build)
Bad exit status from /usr/lynx/loc_archive/build/20050314/var/tmp/rpm-tmp.95118 (%build)
Do you know what is going wrong?
Well rebuilding didn't help (I didn't think it would). Things seem to be failing in the rpm-tmp.
[int@jaguar glib-1.2.10]$ gcc -O2 -g -march=i386 -Wall -D_REENTRANT -o .libs/testglib testglib.o .libs/libglib.so -Wl,--rpath -Wl,/cdt/lib
.libs/libglib.so: undefined reference to `va_copy'
collect2: ld returned 1 exit status
Oh, BTW, the old machine (penguin) has been renamed to jaguar but is otherwise the same.
I'm attempting to run step 3 again but I fear it'll just have the same error.
Note I will be monitoring email and this build from home so if you have any ideas send them right away.
We have a chicken and egg situation here. Apparently the src package needs to be built before the build happens, yet it also needs kern_mib.o and kern_mib.d.o, which you only get after the build! The scripts do not handle this gracefully, so I rebuilt the packages by hand. Julia says the build is OK now but repeatability will be a problem.
After some initial trouble installing from CDROM (needed to set IDE to Legacy in the BIOS) I managed to install RH 8.0. However, I sized the root drive to only 2048 Meg. Turns out the RH 8.0 installation took 2020 Meg, so I want to redo this. Upon reinstallation I'm again having problems with the CD. Seems the first CD has etchings on the outermost ring of the CD itself. Not sure if I caused this or if it was there before. May need to get another CD for RH 8.0.
When building the toolchain on x86 natively I kept getting different failures. It was frustrating to say the least. Conferred with Adam and he remembered a problem with the kernel changing timestamps on files due to an nmap problem (ECR 22905) which still seems to be a problem. Emailed Vlad about this...
Bluecat installation was taking a long time due to timeouts in NFS for a stale file handle for some remotely mounted CDROM. Still have an issue with one RPM that has int:staff ownership:
Regarding ECR 23001:
I was trying to build the native PPC toolchain again. The ECR I submitted was 23184 but that was dupped to 22979 which is in Pending Review state. So I pulled 22979 and attempted to build. But 22979 depends on 23001 (also in Pending Review). Is it OK to pull 23001 and continue onward? How are dependencies between ECRs normally handled?
Note: I say "depends on" but I'm not sure that is the right terminology. I don't really believe that 23001 depends on 22979, nor vice versa; rather, there doesn't seem to be a clear process here. 22979 pulls in src/lib/libc/Makefile revision 10.25, but 10.24 has a change for 23001 in it, so 23001 is needed because of this. The other files involved in 23001 do not have the Lion_lynxos_012405 tag, so they are not picked up in the normal build process, hence the build fails. This is not a classic dependency; rather, it's an overrunning of checkins. Of course, maybe my analysis of this situation is flawed.