CMP -- Compare Text Files

program and documentation by Stan Brown, Oak Road Systems
revised February 20, 1999
Copyright © 1994-1999 by Oak Road Systems, +1 216 371-0043

This program will compare files and report any differences. It improves on the DOS utilities COMP.COM and the newer FC.EXE in several respects.

Contents

License and warranty
System requirements
Installation
User instructions
Options
Environment variable
Return values
How spaces and tabs are handled
Revision highlights

License and warranty

CMP is shareware. If you use it past a 30-day evaluation period, you are morally and legally bound to register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.

System requirements

DOS 2.0 or higher is required for the 16-bit version. The 32-bit version will run in a DOS box under Windows 98, Windows 95, and Windows NT 4.0.

The two versions operate the same and have the same features, except that the 32-bit version supports long filenames.

Installation

There is no special installation procedure. Simply move CMP16.EXE, CMP32.EXE, or both to any convenient directory in your path.

You may wish to rename the version you use more often to the simpler CMP.EXE. All the following user instructions will assume you've done that. Otherwise, just substitute CMP16 or CMP32 wherever you see CMP in the examples.

User instructions

For a quick summary of operating instructions, type
        cmp
The full command form is one of
        cmp [options] file1 file2 [>reportfile]
        cmp [options] files directory [>reportfile]
In the second form, files may be any number of file specs, possibly containing wildcards, and directory may be a disk letter (with colon) or path (with or without trailing backslash). Please be aware that the 16-bit and 32-bit CMP programs expand wildcards slightly differently because the 32-bit version supports long filenames. Thus the 32-bit version would expand abc* to include all files, with any extension or none, whose names start with abc; with the 16-bit version you need abc*.* to get the same result.

Example:

        cmp -L5 zonk1 b:zonk2
will compare file zonk1 (in the current directory of the current drive) to file zonk2 in the current directory of drive b, limiting look-ahead to five lines (the -L5 option).

Another example:

        cmp a:*.doc xx.htm b:
will compare all the *.doc files in the current directory of drive a, plus xx.htm in the current directory of the current disk, to files of the same names in the current directory of drive b.

Options

CMP's operation can be modified by several options, either on the command line or in an environment variable (see below).

You have a lot of freedom about how you enter options. You can use a leading hyphen or slash mark; you can use upper- or lower-case letters. You can leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the W100 and B options:

        /w100 /b    /w100/b    /w100B    -W100-B    -W100 -b
This document will always use capital letters for the options, to make it easier to distinguish letter l and figure 1.
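The flexible option syntax described above can be illustrated with a short Python sketch. This is not CMP's own code: the function name parse_options and the token pattern are assumptions made for illustration, and the sketch only recognizes the option letters listed in this document.

```python
import re

# Hypothetical token pattern (an assumption, not taken from CMP itself):
# L and W carry a number; the other options stand alone.
TOKEN = re.compile(r"[/-]?(?:([LW])(\d+)|([BDEIQT?01]))", re.IGNORECASE)

def parse_options(text):
    """Return a list of (option, value) pairs; value is None for plain flags."""
    parsed = []
    for chunk in text.split():
        if chunk[0] not in "/-":
            raise ValueError(f"option group must start with / or -: {chunk!r}")
        pos = 0
        while pos < len(chunk):
            match = TOKEN.match(chunk, pos)
            if match is None:
                raise ValueError(f"bad option at {chunk[pos:]!r}")
            if match.group(1):
                # L or W followed by its numeric value
                parsed.append((match.group(1).upper(), int(match.group(2))))
            else:
                parsed.append((match.group(3).upper(), None))
            pos = match.end()
    return parsed

# The five spellings from the text all parse to the same two options.
for spelling in ["/w100 /b", "/w100/b", "/w100B", "-W100-B", "-W100 -b"]:
    print(spelling, "->", parse_options(spelling))
```

Each spelling yields the W option with value 100 followed by the B option.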
/?
displays a help message and exits with no further processing.
 
/0 or /1
These options let you control the values that CMP returns in the DOS error level. /0 returns 0 if there are differences or 1 if there are no differences; /1 returns 1 for differences or 0 for no differences. For more details, see Return values below.
 
/B
toggles between "compress any run of spaces and/or tabs into a single space for comparison purposes" and "don't compress whitespace within a line". The default is to compress, so that CMP normally considers "a    b" and "a b" and "a  {tab} b" identical.

Note that runs of spaces and/or tabs are compressed to a single space, not completely removed. Thus CMP will always consider "ab" (with no space between "a" and "b") different from "a b" (any spaces or tabs between "a" and "b").

Regardless of this option, CMP will always disregard spaces and tabs at the ends of lines. Some more esoteric details are given below in "How spaces and tabs are handled".
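As a rough illustration of the whitespace rules just described, here is a minimal Python sketch. The function name normalize and its compress parameter are hypothetical; the sketch only mirrors the behavior stated in this document and is not CMP's actual implementation.

```python
import re

def normalize(line, compress=True):
    """Drop trailing spaces and tabs; optionally squeeze inner runs to one space.

    compress=True models the default; compress=False models the effect
    of specifying /B (an assumption about how the toggle maps to code).
    """
    line = line.rstrip(" \t\r\n")
    if compress:
        line = re.sub(r"[ \t]+", " ", line)
    return line

# Runs of spaces and/or tabs compare equal to a single space...
print(normalize("a    b") == normalize("a b") == normalize("a \t b"))
# ...but a run is never removed entirely.
print(normalize("ab") == normalize("a b"))
# Trailing spaces and tabs are dropped even without compression.
print(normalize("a b  \t", compress=False) == "a b")
```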
 

/D
displays debugging information. Debugging information includes whether you're running the 16-bit or 32-bit version, the value of the environment variable, and the values of all options specified or implied. This information is normally suppressed, but you may find it helpful if CMP seems to behave in a way you don't expect.
 
/E
toggles between "ignore blank lines" (the default) and "treat blank lines like any others". Normally, CMP ignores any lines of length 0, and lines that contain only spaces and tabs. Specify the /E option to make CMP keep track of blank lines and report added or deleted blank lines as differences.
 
/I
toggles between "ignore case" and "consider A-Z different from a-z". The default is to treat upper and lower case as different.
 
/Llook-ahead
sets the look-ahead to look-ahead lines from each file. The default is 20 lines in the 16-bit version and 100 lines in the 32-bit version.

The significance of look-ahead is this. Suppose CMP finds, after lines 28-31 of file 1 match lines 38-41 of file 2, that line 32 of file 1 doesn't match line 42 of file 2. In this case, CMP has to look ahead at line 33 of file 1 and line 43 of file 2.

              file 1               file 2
        ==================   ==================
        (28) line a          (38) line a
        (29) line b          (39) line b
        (30) line c          (40) line c
        (31) line d          (41) line d
        (32) line e          (42) something different
        (33: look ahead)     (43: look ahead)
Maybe they match, or maybe line 43 of file 2 matches line 32 of file 1 (meaning that line 42 of file 2 is new in that file and doesn't exist in file 1). The /L option tells CMP how many lines to look ahead trying to find a match after lines that don't match. If CMP examines that number of lines from both files without finding a match, it will report that fact and stop processing. (If you wish, you can then re-run CMP with a higher /L value.)
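The look-ahead idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name resync, the search order (nearest candidate pair first), and the rule that a single matching line is enough to resynchronize are all hypothetical, since this document does not spell out CMP's exact matching strategy.

```python
def resync(file1, file2, i, j, look_ahead):
    """Find the nearest matching pair of lines after a mismatch at (i, j).

    Returns (skip1, skip2), the number of lines to skip in each file,
    or None if nothing matches within look_ahead lines of either file.
    """
    for total in range(1, 2 * look_ahead + 1):
        low = max(0, total - look_ahead)
        high = min(total, look_ahead)
        for skip1 in range(low, high + 1):
            skip2 = total - skip1
            if (i + skip1 < len(file1) and j + skip2 < len(file2)
                    and file1[i + skip1] == file2[j + skip2]):
                return skip1, skip2
    return None

# Modeled on the table above, with 0-based indexes: file 2 has one extra line.
file1 = ["line a", "line b", "line c", "line d", "line e", "line f"]
file2 = ["line a", "line b", "line c", "line d",
         "something different", "line e", "line f"]
print(resync(file1, file2, 4, 4, look_ahead=5))
```

Here the result (0, 1) means that skipping one line of file 2 brings the files back into step, so "something different" is new in file 2. A result of None corresponds to the case where CMP reports that it found no match within the look-ahead and stops.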

There's no specific limit for look-ahead by itself, but /L and /W (below) have a combined limit. In the 16-bit version, 64 K (65,536 bytes) is available for look-ahead, and look-ahead times (width+2) must not exceed that value. In the 32-bit version, the look-ahead and width are limited only by available memory (including virtual memory). In either version, if you exceed the available space with the combined /L and /W options, CMP will display a message inviting you to choose lower values.
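The 16-bit limit is simple arithmetic, shown here as a small Python sketch. The function names fits_16bit and max_look_ahead are made up for illustration; the formula itself, look-ahead times (width+2) not exceeding 65,536, comes from the paragraph above.

```python
BUFFER_BYTES = 65536  # 64 K available for look-ahead in the 16-bit version

def fits_16bit(look_ahead, width):
    """True if the /L and /W combination fits in the 16-bit buffer."""
    return look_ahead * (width + 2) <= BUFFER_BYTES

def max_look_ahead(width):
    """Largest /L value the 16-bit version could accept for a given /W."""
    return BUFFER_BYTES // (width + 2)

print(fits_16bit(20, 254))   # 16-bit defaults: 20 * 256 = 5120 bytes
print(max_look_ahead(254))   # 65536 // 256 = 256
print(fits_16bit(257, 254))  # 257 * 256 = 65792 bytes, too large
```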
 

/Q
suppresses the logo and the warning messages about individual truncated lines (see /W, below). If any lines were truncated, a single message will still appear at the end of processing.
 
/T
toggles between "during comparison, replace each tab with the number of spaces necessary to reach the next tab stop" and "treat a tab as an ordinary character". The default is to expand tabs, and the tab stops occur every 8 columns.

The /T option has no effect unless you also use /B to turn off the compression of runs of spaces.

Some more esoteric details are given below in "How spaces and tabs are handled".
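Tab expansion to stops every 8 columns can be sketched in Python like this. The function name expand_tabs is hypothetical and the code is an illustration of the rule, not CMP's own routine; for a single line, Python's built-in str.expandtabs(8) gives the same result.

```python
def expand_tabs(line, tab_stop=8):
    """Replace each tab with the spaces needed to reach the next tab stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            pad = tab_stop - column % tab_stop  # always 1 to tab_stop spaces
            out.append(" " * pad)
            column += pad
        else:
            out.append(ch)
            column += 1
    return "".join(out)

# A tab after one character needs 7 spaces to reach column 8.
print(repr(expand_tabs("a\tb")))
# A tab that starts exactly on a tab stop still advances a full 8 columns.
print(len(expand_tabs("12345678\tx")))
```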
 

/Wwidth
sets the significant line width to width characters; the default is 254. CMP will examine each line only up to this width, and will display an error message for any lines that exceed it. CMP will also tell you at the end that some lines were truncated, reporting the greatest line width in either file. That makes it easy for you to re-run CMP with a higher /W value if you suspect that some lines contain differences beyond the original width.

You can suppress the messages about truncation of individual lines by using the /Q option, but CMP will still display the message at the end so you'll know that some lines were not examined completely and what you can do about it.

The effective width of a line, which is measured against /Wwidth, may be different from that line's length in characters, depending on how spaces and tabs are handled (see below). If you want to know the actual maximum effective line width in a file, simply compare the file to itself with a small width value and the /Q option to suppress messages, like this:

        cmp /QW10 file1 file1
The maximum value for /W depends on the value given for /L (above).

Environment variable

If you use certain options frequently, you can put them in the ORS_CMP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.

CMP processes the environment variable before any command-line options, which means that an option on the command line will override the corresponding option in the environment variable.

The toggles /B, /E, /I, /Q, and /T reverse their state every time you specify them. So if you usually want case-blind comparisons, put /I in the environment variable. Then, if you want case-sensitive comparisons for a particular run, simply put /I on the command line and that will reverse the setting from the environment variable. If you have any question which options are in effect, simply use /D on the command line to display all option values.
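The toggle behavior can be modeled with a few lines of Python. This is a sketch under assumptions: the names DEFAULTS and apply_toggles are hypothetical, the options are passed as plain strings of letters rather than parsed from real input, and the default states are the ones this document gives for each option.

```python
# Default state of each toggle, as described in the Options section:
# B compress whitespace, E ignore blank lines, I ignore case,
# Q quiet, T expand tabs.
DEFAULTS = {"B": True, "E": True, "I": False, "Q": False, "T": True}

def apply_toggles(env_letters, cmdline_letters):
    """Flip each named toggle once per mention, environment variable first."""
    state = dict(DEFAULTS)
    for letter in (env_letters + cmdline_letters).upper():
        state[letter] = not state[letter]
    return state

# /I in ORS_CMP turns case-blind comparison on...
print(apply_toggles("I", "")["I"])
# ...and /I on the command line turns it back off for one run.
print(apply_toggles("I", "I")["I"])
```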

Return values

By default, CMP will return one of the following values to DOS, and you can test the return value with IF ERRORLEVEL in a batch file.
 
255   bad option, or other error on the command line
254   specified file not available
253   not enough memory for combination of /L and /W options
  2   help message displayed (/? option, or no files specified on the command line)
  0   program ran to completion (whether the files are the same or different)
 

You might want to use CMP in a batch file or a makefile and take different actions depending on whether two files are the same or different. To do this, use the /0 or /1 option. The /1 option emulates UNIX diff by returning an error level of 1 if the files are different or 0 if they're the same. /0 is the opposite: it returns 0 if the files are different or 1 if they're the same. In other words, the /0 or /1 option gives the value CMP should return if differences are found.
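A calling script could interpret the return value along the following lines. This Python sketch is illustrative only: the function name describe and its diff_value parameter are hypothetical, and it assumes that the error values 255, 254, 253, and 2 keep their meaning when /0 or /1 is in effect, which this document does not state explicitly.

```python
# Fixed return values from the table above.
FIXED = {
    255: "bad option, or other error on the command line",
    254: "specified file not available",
    253: "not enough memory for combination of /L and /W options",
    2: "help message displayed",
}

def describe(code, diff_value=None):
    """Explain a CMP return value.

    diff_value is 0 or 1 if the /0 or /1 option was used, else None.
    """
    if code in FIXED:
        return FIXED[code]
    if diff_value is None:
        return "program ran to completion" if code == 0 else "unknown return value"
    if code == diff_value:
        return "files are different"
    if code == 1 - diff_value:
        return "files are the same"
    return "unknown return value"

print(describe(1, diff_value=1))  # /1: 1 means differences were found
print(describe(0, diff_value=0))  # /0: 0 means differences were found
print(describe(0))                # no /0 or /1: 0 only means completion
```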

How spaces and tabs are handled

This section gives some more details about the effects of the /B and /T options, which control the treatment of spaces and tabs within a line.

CMP applies the /B and /T option settings while reading each line from the file. CMP makes the changes to its own in-RAM copy of each line, so when differences are found CMP displays the transformed line, not the line exactly as it appears in the file.

CMP always ignores any spaces and tabs at the end of a line, regardless of the options. CMP also ignores any difference between the UNIX line-ending convention (LF only) and the DOS convention (CR+LF).

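There can be some interaction between the /B and /T option settings and the /Wwidth setting. The /W option specifies the maximum effective line width, but the effective width of a line can be less than the actual length of that line in characters (when a run of spaces and tabs is compressed to a single space, or when trailing spaces and tabs are dropped) or greater (when tabs are expanded to spaces).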

For this reason, if any line's effective width exceeds the /W width, CMP will tell you the maximum effective width at the end of the run.
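To make the difference between actual length and effective width concrete, here is a small Python sketch. The function name effective_width and its parameters are hypothetical; the logic follows this document's statements that trailing spaces and tabs are always dropped, that compression is the default, and that tab expansion matters only when compression is turned off.

```python
import re

def effective_width(line, compress=True, expand=True, tab_stop=8):
    """Width of a line after the whitespace handling described above."""
    line = line.rstrip(" \t\r\n")          # trailing whitespace never counts
    if compress:
        line = re.sub(r"[ \t]+", " ", line)  # default: runs become one space
    elif expand:
        line = line.expandtabs(tab_stop)     # with /B only: tabs become spaces
    return len(line)

sample = "a\t\tb"                           # 4 characters in the file
print(effective_width(sample))                                 # 3
print(effective_width(sample, compress=False))                 # 17
print(effective_width(sample, compress=False, expand=False))   # 4
```

With the default settings the effective width (3) is less than the actual length (4); with compression turned off and tabs expanded it is greater (17).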

Since CMP normally disregards the above differences in spacing within a line, as well as completely blank lines, if the program finds no differences it will report that the files are "effectively identical". If you want to compare for character-by-character identity, including spaces, tabs, and blank lines, specify the /BET options. Then if the program finds no differences it will report that the files are identical.

Revision highlights

v4.1a, 1999-02-20
no changes to code or documentation, only updated contact information (new ISP)
v4.1, 1999-01-09
Add the /I and /D options. Split the confusing three-valued /Bn option into separate /B and /T toggle-type options. Change the 32-bit default to /L100. Improve diagnostics for a bad option in the environment variable. Convert documentation to HTML from Word for Windows.
v4.0, 1998-11-18
Package the existing version 4.0 for shareware release: revise documents without changing the software.
v4.0, 06/98
Allow multiple filespecs before a directory name, not just one filespec with wildcards. Support long filenames in the new 32-bit version.
v3.4, 10/97
Add the /0 and /1 options; systematize all return values. No longer require the trailing backslash on a directory argument. Instead of "effectively identical", report a more specific phrase when the files are not significantly different based on the /B and /E options.
v3.3, 07/97
Compress sequences of spaces and tabs to a single space; add the /B option to control that feature and tab expansion. Add the /Q option. Make the format of command-line options more flexible, and scan the ORS_CMP environment variable for options.
v3.0, 07/94
Allocate string arrays far, allowing larger combined values of /L and /W.
v2.4, 11/89
Default to /L20 (previously /L10).
v2.1, 03/85
Expand tabs in input lines to the appropriate number of spaces.
v1.1, 10/84
Allow wildcards in the first file argument.
v1.0, 08/84
initial version