Improving fix_copyright.pl

  • Still unable to build 3.2.2/5.0.0 natively
  • Worked on improving algorithm for copyright replacement

I'm finding it very difficult to develop a heuristic algorithm to find what I call "LynuxWorks copyright blocks" in existing source files. The problem is that the input data is not necessarily consistent. Sure, we can go for the 80% fix and hand-fix the rest (in fact, that's what I did before); however, the remaining 20% often causes compiler errors during the build. What's potentially worse is that some files may be altered in a way that produces no compiler error but changes the behavior of the resulting code.

My current algorithm attempts to do the following:

  • Locate the start of a C comment (i.e. "/*"). The comment must start at the beginning of a line (for now I am not considering inline comments, as they are unlikely to contain a bona fide copyright statement).
  • Scan until the closing end-of-comment marker (i.e. "*/"). Again, this must appear at the end of a line ($).
  • Take those lines and examine them for the words "(C) Copyright" and "LynuxWorks" (additionally, I found some files that had "LynuxWork,").
  • If found, the comment block is thrown out (to be replaced by the new, more consistent copyright block).
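
The steps above can be sketched roughly as follows. This is an illustration in Python, not the code in fix_copyright.pl; the function name strip_copyright_blocks and the two marker constants are made up for the example, and it assumes both markers must be present for a block to be thrown out. A "/* ... */" comment that is closed mid-line and followed by code is treated as an ordinary line.

```python
# Markers from the steps above; "LynuxWork" also catches the "LynuxWork," variant.
COPYRIGHT_MARK = "(C) Copyright"
COMPANY_MARK = "LynuxWork"


def strip_copyright_blocks(lines):
    """Split source lines into (kept_lines, removed_comment_blocks)."""
    kept, removed = [], []
    block = None  # lines of the comment being collected, or None
    for line in lines:
        stripped = line.rstrip()
        if block is None:
            # A comment closed mid-line (code follows it) is not a block.
            closes_mid_line = "*/" in stripped and not stripped.endswith("*/")
            if not line.startswith("/*") or closes_mid_line:
                kept.append(line)
                continue
            block = [line]
        else:
            block.append(line)
        if stripped.endswith("*/"):
            text = "\n".join(block)
            if COPYRIGHT_MARK in text and COMPANY_MARK in text:
                removed.append(block)   # candidate for replacement
            else:
                kept.extend(block)      # ordinary comment, leave alone
            block = None
    if block is not None:
        kept.extend(block)  # unterminated comment: leave the file untouched
    return kept, removed
```

Returning the removed blocks alongside the kept lines makes it possible to log what was thrown out and review it by hand.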

Currently the "official" copyright block is of the form:

    /* vi: ts=4 sw=4
    *************************************************************
    (C) Copyright $copy
    LynuxWorks, Inc.
    San Jose, CA
    All rights reserved.

    $Date: $Date$
    $Revision: $Revision$
    $Source: $Source$
    ************************************************************/

The "$copy" is replaced by the copyright date found in the file. A date that lists two years is changed to a first-last range (unless the two years are equal), and a date that lists three comma-separated years is likewise changed to a first-last range. To make such copyright information easier to find in the future, I would suggest changing the above format to:

    /* Start Copyright ******************************************
    vi: ts=4 sw=4
    (C) Copyright $copy LynuxWorks, Inc.
    San Jose, CA
    All rights reserved.

    $Date: $Date$
    $Revision: $Revision$
    $Source: $Source$
    * End Copyright *********************************************/

This would 1) retain the "vi: ts=4 sw=4" annotation, which I assume is for vi users; 2) put the copyright string ("(C) Copyright") and the company name, LynuxWorks, on the same line, which makes it easier to grep for in the future considering some files have Rockwell copyrights; and 3) clearly delineate the start and end of the copyright block.
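
To show why explicit markers help, here is a small sketch (Python, for illustration only) of how a block in the suggested format could be located with a single pattern. The names MARKED_BLOCK and find_marked_block are hypothetical, and the sketch assumes the block starts in column 0 of the real file, with the "Start Copyright" and "End Copyright" lines spelled exactly as in the suggestion.

```python
import re

# From a line beginning "/* Start Copyright ***..." through the first line
# of the form "* End Copyright ***...*/". DOTALL lets "." cross newlines;
# MULTILINE anchors "^" and "$" at line boundaries.
MARKED_BLOCK = re.compile(
    r"^/\* Start Copyright \*+\n.*?^\* End Copyright \*+/[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def find_marked_block(source):
    """Return the (start, end) offsets of the marked block, or None."""
    match = MARKED_BLOCK.search(source)
    return match.span() if match else None
```

With markers like these there is no need to guess from the contents of a comment whether it is the copyright block.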

The problem I'm having is that I'm seeing copyright blocks of the following forms:

    /************************************************************
    (C) Copyright 1987-2000
    Lynx Real-Time Systems, Inc.
    San Jose, CA
    All rights reserved.

    $Date: 2003/11/14 22:44:44 $
    $Revision: 1.1 $
    ************************************************************/

(Does not contain "LynuxWorks" but rather "Lynx Real-Time Systems", so a search for "LynuxWorks" misses it.)

    /*
    .FP
    ***********************************************************************
     Revision History
     See ClearCase
     Version:
    ***********************************************************************
    *
    * EXPORT NOTICE:
    *
    *   INFORMATION SUBJECT TO EXPORT CONTROL LAWS
    *
    * Subject to local country rules and laws when applicable, you
    * must comply with the following:
    *
    * These commodities, technology, or software were exported from
    * the United States in accordance with the Export Administration
    * Regulations.  Diversion contrary to U. S. law and other relevant
    * export controls is prohibited.   They may not be re-exported to
    * any of the following destinations without authorization; Cuba,
    * Iran, Iraq, Libya, North Korea, Sudan or any other country to
    * which shipment is prohibited; nor to end-use(r)s involved in
    * chemical, biological, nuclear, or missile weapons activity.
    *
    * COPYRIGHT NOTICE:
    *   (C) Copyright 2001 Rockwell Collins, Inc.  All rights reserved.
    *
    * FILE NAME:
    *   df.c
    *
    * PURPOSE:
    *   utility to display disk usage
    *
    * NOTES:
    *
    * ABBREVIATIONS/ACRONYMS:
    *
    *****************************************************************
    .FP END
    */
    /************************************************************
    (C) Copyright 1987-1996
    Lynx Real-Time Systems, Inc.
    San Jose, CA
    All rights reserved.

    $Date: 2003/09/10 15:24:57 $
    $Revision: 1.1.1.1 $
    ************************************************************/

Contains multiple "(C) Copyright" strings, one being ours and the other being Rockwell's. Should both exist in the resultant file?

    /*
    .FP
     **********************************************************************
     *
     * FILE NAME:
     *   hm_load_header.c
     *
     * PURPOSE:
     *    Performs the integrity check of the program and data files
     *    in the CPR read only file system.
     *
     * ABBREVIATIONS/ACRONYMS: (optional)
     *
     * NOTES: none
     *
     * COPYRIGHT NOTICE:
     *   (C) Copyright 2001-2002 Rockwell Collins, Inc.  All rights reserved.
     *       Proprietary and confidential material.  Distribution,
     *       use, and disclosure restricted by Rockwell Collins, Inc.
     *    Copyright (c) 2003-2004, LynuxWorks, Inc. All Rights Reserved.
     *
     ***********************************************************************
    .FP END
    */

Contains multiple copyright strings, one being ours and the other being Rockwell's, in the same comment block! Also notice the inconsistent forms: one is "(C) Copyright" while the other is "Copyright (c)". How should this case be handled?
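
One possible way to keep these cases from being mangled, sketched in Python purely as an illustration: recognize both notice forms in any letter case, work out whose notices a comment block contains, and only replace blocks that are entirely ours. Everything here is an assumption rather than a decision: the names NOTICE, OWNERS, owners_in_block and classify are hypothetical, the owner is assumed to appear on the notice line or the line after it, and flagging mixed blocks for manual review is just one policy that answers the question above.

```python
import re

# "(C) Copyright" or "Copyright (C)", in any letter case.
NOTICE = re.compile(r"\(c\)\s*copyright|copyright\s*\(c\)", re.IGNORECASE)

# Company names seen in the sample blocks; "LynuxWork" covers "LynuxWork,".
OWNERS = {
    "ours": ("LynuxWork", "Lynx Real-Time Systems"),
    "rockwell": ("Rockwell",),
}


def owners_in_block(block):
    """Return the set of owners whose copyright notice appears in block."""
    found = set()
    lines = block.splitlines()
    for i, line in enumerate(lines):
        if not NOTICE.search(line):
            continue
        # The owner may share the notice line or sit on the next line.
        window = " ".join(lines[i:i + 2])
        for owner, names in OWNERS.items():
            if any(name in window for name in names):
                found.add(owner)
    return found


def classify(block):
    """Decide what to do with one comment block."""
    owners = owners_in_block(block)
    if owners == {"ours"}:
        return "replace"  # safe to swap in the new block
    if "ours" in owners:
        return "review"   # mixed ownership: leave for a human
    return "keep"         # someone else's notice, or none at all
```

Under this policy the Lynx Real-Time Systems block would be replaced, a Rockwell-only block would be kept, and the block above with both notices would be set aside for hand editing.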