<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.61 [en] (Win98; U) [Netscape]">
<title>ClearSCM: Open Source Builds</title>
<link rel="stylesheet" type="text/css" media="screen" href="/css/Article.css">
<link rel="stylesheet" type="text/css" media="print" href="/css/Print.css">
<link rel="SHORTCUT ICON" href="http://clearscm.com/favicon.ico" type="image/png">
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
<script type="text/javascript">
_uacct = "UA-89317-1";
include "clearscm.php";
<?php start_box ("cs2");?>
<h2>Open Source Builds</h2>
<p>More and more organizations are using Open Source in their
product builds, but are Open Source build mechanisms efficient?
This article examines that question and shows how Open Source can
often be more trouble than it's worth.</p>
<h3>Open Source Model</h3>
<p>Much hype has been given to the Open Source movement, and
rightfully so. Developers can build on Open Source development
and modules. This article will not address Open Source in general,
nor will it go into the legalities of using Open Source in your
product. It will instead focus on common Open Source build
mechanisms and how efficient or inefficient they are when
included in your own build process.</p>
<h3>Problems with code sharing</h3>
<p>Unless you employ people who are active in the Open Source
community, people who not only use Open Source but also contribute
to it, you will inevitably come face to face with a real problem.
If you improve the Open Source code in any way, then unless you
donate your changes back to the community at large <b>and</b>
those changes are accepted, you will have porting work to do when
the next version of the Open Source package in question comes out.
You will need to merge your changes with the changes from the
whole community. In some cases the community may have made a
change in much the same way as you did. In such cases you can
abandon your change and take the community's solution, and then
there is one less conflict for you to worry about.</p>
<p>Other times the community's change is similar to your change
but differs enough that you still have to make some minor
adjustments. Sometimes you can come up with a more generic way of
doing something that will make everybody happy. In such cases you
should really consider donating your changes back under the "what
goes around comes around" principle. Then, at the next update,
your generic solution will not need to be merged again.</p>
<p>Still other times what you need to do is not like what anybody
else needs or wants to do. Or it may be that, while your solution
is brilliant for the limited set of architectures that you are
concerned about, the community needs to be concerned about a
larger or different set of architectures and thus cannot accept
your solution as a general solution that is good for all. In such
cases you are stuck with maintaining your solution for each
iteration of the module in question.</p>
<p>Most developers can relate to the above few paragraphs from an
"inside the code" level. But what is often overlooked is the part
above the "inside the code" level - at the build and release
<h3>Building Software Efficiently (AKA Build Avoidance)</h3>
<table border=0 width=50% align=right>
<?php start_box ("cs4");?>
<p><i>In the beginning there was make(1) and it was
<p>Early on, most software was built using the standard Unix
make(1) utility. Make seeks to build only that which needs to be
built. Make relies on a number of assumptions in order to perform
its magic. For example, make assumes that you are using
third-generation languages such as C, FORTRAN, etc. Further, make
assumes that all of the source is contained in files in the file
system and that the source code is transformed into object code of
some kind using some process (e.g. foo.o is derived from foo.c
using the C
<p>As more and more languages evolved, make was luckily able to
adapt: you could add new transformation rules that tell make how
to transform source files in these newer languages into their
respective derived object files and how to piece everything
together. Further, you could refine dependencies, and even have
them generated automatically, so that your build system remains
efficient and keeps pursuing that ever elusive goal of "rebuild
only that which requires rebuilding".</p>
<p>However, make is easily thwarted if you do not pay attention to
how it works and how to use it efficiently and effectively. For
example, since make uses files and their timestamps to determine
whether a target needs to be rebuilt, putting a bunch of functions
into one large file is not a good idea, since a change to any one
of those functions will result in the whole file being recompiled.
One file per function, however, is the other extreme. In most
software projects, grouping related functions into one file per
module is a good compromise between these two extremes.</p>
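<p>The timestamp comparison described above can be sketched in a
few lines of Python. This is only an illustration of the idea, not
make's actual implementation; the names <code>needs_rebuild</code>
and <code>demo</code> are hypothetical, and the file timestamps are
set by hand so that the result does not depend on clock
resolution.</p>

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def demo():
    """Create a throwaway foo.c/foo.o pair and record each decision."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "foo.c")
        obj = os.path.join(tmp, "foo.o")

        with open(src, "w") as f:
            f.write("int foo(void) { return 0; }\n")
        results["no object yet"] = needs_rebuild(obj, [src])

        # Stand-in for compiling: create the object file.
        with open(obj, "w") as f:
            f.write("object code placeholder\n")
        # Explicit timestamps: the object is newer than the source.
        os.utime(src, (1000, 1000))
        os.utime(obj, (2000, 2000))
        results["object newer"] = needs_rebuild(obj, [src])

        # Simulate an edit to the source after the last compile.
        os.utime(src, (3000, 3000))
        results["source edited"] = needs_rebuild(obj, [src])
    return results


if __name__ == "__main__":
    for case, stale in demo().items():
        print(case, "->", "rebuild" if stale else "up to date")
```

<p>Because the decision is made per file, every function that
shares a file with an edited function is recompiled along with
it.</p>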
<h3>Using Source RPMs</h3>
<p>One popular construct in the Open Source world is that of
source RPMs. RPM stands for Red Hat Package Manager and was
Red Hat's answer to the question of how to install software on a
Linux system. But rpm went further than that to include what it
calls Source RPMs. The concept is simple but also beautiful. While
an rpm is considered a binary install package, a source rpm (AKA
SRPM) contains all of the source and other related files such as
makefiles, installation scripts, etc. In short, everything is in
there for you to build the package from scratch. This is useful on
Linux systems, as there are many systems on different architectures
where a package needs to be compiled before it is installed on the
<p>Many companies take Red Hat Source RPMs and modify only those
packages that they wish to change. The other packages are rebuilt
from source untouched. This allows developers to essentially build
their own complete system with their changes incorporated. A
pretty ideal setup - but are RPM Source builds
<h3>RPM Source Builds</h3>
<p>It turns out that RPM source builds are not efficient at all.
In most cases everything gets recompiled every time. One reason
for this is that source rpms are distributed as one large file.
Another is that a source rpm is really the <b>derived file</b>,
not the set of source files before compilation. Because of this,
make's assumptions are violated and make is forced to recompile
everything.</p>
<p>The rpm -b or rpmbuild execution itself highlights the
problem. In the normal execution of rpm -b or rpmbuild the
following actions happen:</p>
<li>In the %prep section the standard %setup macro's first job
is to remove any old copies of the build tree</li>
<li>The next step of the standard %setup macro is to untar the
source from the embedded tarball</li>
<li>The final step is to cd to the build directory and set
permissions appropriately</li>
<p>So even before we get a chance to build anything we have a
"fresh" environment which is also an environment where make has no
chance of doing any build avoidance! Open Source source RPMs that
use the %setup macro will always build everything every time.</p>
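<p>The effect of those three steps on build avoidance can be
simulated with a short Python sketch. It does not call rpm or
reproduce the real %setup macro; <code>fresh_setup</code>,
<code>missing_objects</code>, and the two-file package
<code>pkg-1.0</code> are hypothetical stand-ins chosen only to show
that object files from a previous build never survive into the next
one.</p>

```python
import os
import shutil
import tarfile
import tempfile


def fresh_setup(tarball, build_root, tree_name):
    """Mimic the effect of %setup: drop the old tree, unpack a new one."""
    tree = os.path.join(build_root, tree_name)
    shutil.rmtree(tree, ignore_errors=True)  # remove any old build tree
    with tarfile.open(tarball) as tar:        # untar the embedded source
        tar.extractall(build_root)
    return tree                               # the build then runs in here


def missing_objects(tree):
    """List the .c files that have no matching .o file beside them."""
    stale = []
    for name in sorted(os.listdir(tree)):
        base, ext = os.path.splitext(name)
        if ext == ".c" and not os.path.exists(os.path.join(tree, base + ".o")):
            stale.append(name)
    return stale


def demo():
    """Run two simulated package builds and report what each must compile."""
    with tempfile.TemporaryDirectory() as tmp:
        # Create a hypothetical two-file package and tar it up.
        pkg = os.path.join(tmp, "pkg-1.0")
        os.mkdir(pkg)
        for name in ("a.c", "b.c"):
            with open(os.path.join(pkg, name), "w") as f:
                f.write("/* source */\n")
        tarball = os.path.join(tmp, "pkg-1.0.tar")
        with tarfile.open(tarball, "w") as tar:
            tar.add(pkg, arcname="pkg-1.0")

        build_root = os.path.join(tmp, "BUILD")
        os.mkdir(build_root)

        # First build: nothing has been compiled yet.
        tree = fresh_setup(tarball, build_root, "pkg-1.0")
        first = missing_objects(tree)

        # Stand-in for compiling: create an empty .o for each source.
        for name in first:
            open(os.path.join(tree, name[:-2] + ".o"), "w").close()
        after_build = missing_objects(tree)

        # Second build: the fresh tree has lost every object file.
        tree = fresh_setup(tarball, build_root, "pkg-1.0")
        second = missing_objects(tree)
        return first, after_build, second


if __name__ == "__main__":
    print(demo())
```

<p>In this sketch the second build has exactly as much to compile
as the first, even though no source file changed in between.</p>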
<h3>The configure redundancy</h3>
<p>Additionally, most Open Source packages first run configure to
interrogate the environment and configure the package so that it
can build successfully. In theory it's a good idea. In practice
it's slow. Also, each module performs this long configure step
again and again. Configure itself is smart enough to create a
cache of its findings, so running it a second time <b>in the same
directory or module</b> does not have to go through all that work
again. But remember: because of how source rpms work, we are
always going through configure for the first time. Plus, configure
does not create the cache for the system as a whole but only for
the module itself. Descend into another directory representing a
module and you'll be running configure again, and again...</p>
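<p>A small Python model makes the cost visible. It is a sketch
under loud assumptions: the real configure cache is a shell
fragment written by the configure script, whereas here a JSON file
named <code>config.cache</code> stands in for it, and
<code>Prober</code>, <code>configure</code>, and the module names
<code>a</code>, <code>b</code>, and <code>c</code> are
hypothetical.</p>

```python
import json
import os
import tempfile


class Prober:
    """Counts how often the expensive environment probe really runs."""

    def __init__(self):
        self.runs = 0

    def probe(self):
        self.runs += 1
        return {"have_unistd_h": True}  # placeholder finding


def configure(module_dir, prober):
    """Reuse the cache in module_dir if it exists, otherwise probe and save."""
    cache = os.path.join(module_dir, "config.cache")
    if os.path.exists(cache):
        with open(cache) as f:
            return json.load(f)
    findings = prober.probe()
    with open(cache, "w") as f:
        json.dump(findings, f)
    return findings


def demo():
    """Return the probe count after each phase of the scenario."""
    prober = Prober()
    with tempfile.TemporaryDirectory() as tmp:
        modules = [os.path.join(tmp, name) for name in ("a", "b", "c")]
        for module in modules:
            os.mkdir(module)

        # Each module has its own cache, so each one probes from scratch.
        for module in modules:
            configure(module, prober)
        after_three_modules = prober.runs

        # Same directory, second run: the cache is reused.
        configure(modules[0], prober)
        after_rerun = prober.runs

        # A freshly unpacked tree has no cache, so the probe runs again.
        os.remove(os.path.join(modules[0], "config.cache"))
        configure(modules[0], prober)
        after_fresh_tree = prober.runs

    return after_three_modules, after_rerun, after_fresh_tree


if __name__ == "__main__":
    print(demo())
```

<p>The cache only helps the second run in one directory, which is
the one case a source rpm build never reaches.</p>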
<?php copyright ();?>
<script language="JavaScript" src="/JavaScript/Menus.js" type="text/javascript"></script>