\makeatletter
\def\idxexample#1{\nwix@id@uses#1}
\makeatother

\section{An example of {\tt noweb}}

The following short program illustrates the use of {\tt noweb}, a
low-tech tool for literate programming.
The purpose of the program is to provide a basis for comparing {\tt WEB}
and {\tt noweb}, so I have used a program that has been published
before; the text, code, and presentation are taken
from~\cite[Chapter~12]{knuth:literate:book}.
The notable differences are:
\begin{itemize}
\item
When displaying source code, {\tt noweb} uses different typography.
In particular, {\tt WEB} makes good use of multiple fonts and the
ability to typeset mathematics, and it may use mathematical symbols in
place of C~symbols (e.g., ``$\land$'' for ``{\tt \&\&}'').
{\tt noweb} uses a single fixed-width font for code.
\item
{\tt noweb} can work with {\LaTeX}, and I have used {\LaTeX} in this
example.
\item
{\tt noweb} has no numbered ``sections.''
\ifx\wbn\undefined
When numbers are needed for cross-referencing, {\tt noweb} uses page
numbers.
\else
When numbers are needed for cross-referencing, {\tt noweb} ordinarily
uses page numbers.
\fi
If two or more chunks appear on a page, for example, page~24, they are
distinguished by appending a letter to the page number, for example,
24a or 24b.
\ifx\wbn\undefined\else
{\LaTeX} computes these page numbers, but since {\LaTeX} is not used in
the production of {\it IEEE Software}, this example uses consecutive
``chunk numbers'' instead of page numbers.
\fi
\item
{\tt noweb} has no special support for macros.
In the sample program, I have used the chunk
``\LA{}Definitions~{\nwtagstyle{}\subpageref{NWwc.5-DefB-1}}\RA{}'' to
hold macro definitions.
\item
{\tt noweb} does not recognize C~identifier definitions automatically,
so I had to add a list of defined identifiers to each code chunk.
Because {\tt noweb} is language-independent, it must use a heuristic to
find uses of identifiers.
This heuristic can be fooled into finding false ``uses'' in comments or
string literals, such as the use of {\tt status} in
chunk~\subpageref{NWwc.5-DefB-1}.
\item
The {\tt CWEB} version of this program has semicolons following most
uses of \LA$\cdots$\RA{}.
{\tt WEB} needs the semicolon or its equivalent to make its
prettyprinting come out right.
Because it does not attempt prettyprinting, {\tt noweb} needs no
semicolons.
\item
Both {\tt WEB} and {\tt noweb} write chunk cross-reference information
in {\footnotesize footnote} font below each code chunk, for example,
``{\footnotesize\let\nwcodecomment=\relax\nwused{\\{NWwc.5-TheG-1}}}''
Unlike {\tt WEB}, {\tt noweb} also includes cross-reference information
for identifiers, for example,
``{\footnotesize Defines
{\let\nwcodecomment=\relax\idxexample{{file\_count}{file:uncount}}}}''
This information is generated using the {\tt @~\%def} markings in the
{\tt noweb} source.
\end{itemize}

\subsection{Counting words}

This example, based on a program by Klaus Guntermann and Joachim
Schrod~\cite{guntermann:cweb} and a program by Silvio Levy and
D. E. Knuth~\cite[Chapter~12]{knuth:literate:book}, presents the
``word count'' program from {\sc Unix}, rewritten here to demonstrate
literate programming using {\tt noweb}.
The level of detail in this document is intentionally high, for
didactic purposes; many of the things spelled out here don't need to be
explained in other programs.

The purpose of {\tt wc} is to count lines, words, and/or characters in
a list of files.
The number of lines in a file is the number of newline characters it
contains.
The number of characters is the file length in bytes.
A ``word'' is a maximal sequence of consecutive characters other than
newline, space, or tab, containing at least one visible ASCII code.
(We assume that the standard ASCII code is in use.)
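The test the program uses for a character that can start or continue a
word is just a range check on the ASCII code.
The little function below is only an illustration of that rule; its
name, [[is_word_char]], is not used anywhere in {\tt wc}, which applies
the same test inline:
\begin{verbatim}
/* Illustration only: a "word character" is any visible ASCII code,
   i.e., strictly above space (040) and strictly below DEL (0177). */
int is_word_char(int c)
{
    return c > ' ' && c < 0177;
}
\end{verbatim}
By this rule the line ``{\tt hello,\ world}'' contains two words, since
the comma counts as a visible character but the space does not.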
\bigskip

@
Most literate C programs share a common structure.
It's probably a good idea to state the overall structure explicitly at
the outset, even though the various parts could all be introduced in
chunks named \LA{*}\RA{} if we wanted to add them piecemeal.

Here, then, is an overview of the file {\tt wc.c} that is defined by
the {\tt noweb} program {\tt wc.nw}:
<<*>>=
<<Header files to include>>
<<Definitions>>
<<Global variables>>
<<Functions>>
<<The main program>>
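@
Extracting the code with {\tt notangle} expands \LA{*}\RA{} by
substituting each named chunk in place, concatenating chunks that are
defined in several pieces (such as \LA{}Definitions\RA{}) in their
order of appearance.
The outline below only sketches the shape of the resulting {\tt wc.c};
it is not the literal output:
\begin{verbatim}
/* Illustration only: rough shape of the tangled wc.c. */
#include <stdio.h>       /* from "Header files to include"     */
#define OK 0             /* from "Definitions", and so on      */
int status = OK;         /* from "Global variables", and so on */
/* ... wc_print() ... */ /* from "Functions"                   */
/* ... main() ...     */ /* from "The main program"            */
\end{verbatim}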
@
We must include the standard I/O definitions, since we want to send
formatted output to [[stdout]] and [[stderr]].
<<Header files to include>>=
#include <stdio.h>
@
The [[status]] variable will tell the operating system if the run was
successful or not, and [[prog_name]] is used in case there's an error
message to be printed.
<<Definitions>>=
#define OK 0                /* status code for successful run */
#define usage_error 1       /* status code for improper syntax */
#define cannot_open_file 2  /* status code for file access error */
<<Global variables>>=
int status = OK;    /* exit status of command, initially OK */
char *prog_name;    /* who we are */
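@
Because the error codes occupy distinct bits, OR-ing them into
[[status]] accumulates every kind of failure in a single exit status.
The fragment below is only an illustration, not a chunk of {\tt wc}:
\begin{verbatim}
/* Illustration only: a run that both misuses an option and fails
   to open a file reports both problems in its exit status. */
int status = 0;     /* OK               */
status |= 1;        /* usage_error      */
status |= 2;        /* cannot_open_file */
/* exit(status) now reports 3 = usage_error | cannot_open_file. */
\end{verbatim}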
@
Now we come to the general layout of the [[main]] function.
<<The main program>>=
main(`argc, `argv)
    int argc;       /* number of arguments on UNIX command line */
    char **argv;    /* the arguments, an array of strings */
{
    <<Variables local to [[main]]>>
    prog_name = argv[0];
    <<Set up option selection>>
    <<Process all the files>>
    <<Print the grand totals if there were multiple files>>
    exit(status);
}
@
If the first argument begins with a `{\tt-}', the user is choosing the
desired counts and specifying the order in which they should be
displayed.
Each selection is given by the initial character (lines, words, or
characters).
For example, `{\tt-cl}' would cause just the number of characters and
the number of lines to be printed, in that order.
We do not process this string now; we simply remember where it is.
It will be used to control the formatting at output time.
<<Variables local to [[main]]>>=
int file_count;     /* how many files there are */
char *which;        /* which counts to print */
<<Set up option selection>>=
which = "lwc";      /* if no option is given, print 3 values */
if (argc > 1 && *argv[1] == '-') {
    which = argv[1] + 1;
    argc--;
    argv++;
}
file_count = argc - 1;
@
Now we scan the remaining arguments and try to open a file, if
possible.
The file is processed and its statistics are given.
We use a [[do ... while]] loop because we should read from the standard
input if no file name is given.
<<Process all the files>>=
argc--;
do {
    <<If a file is given, try to open [[*(++argv)]]; [[continue]] if unsuccessful>>
    <<Initialize pointers and counters>>
    <<Scan file>>
    <<Write statistics for file>>
    <<Close file>>
    <<Update grand totals>>     /* even if there is only one file */
} while (--argc > 0);
@
Here's the code to open the file.
A special trick allows us to handle input from [[stdin]] when no name
is given.
Recall that the file descriptor to [[stdin]] is~0; that's what we use
as the default initial value.
<<Variables local to [[main]]>>=
int `fd = 0;        /* file descriptor, initialized to stdin */
<<Definitions>>=
#define READ_ONLY 0 /* read access code for system open */
<<If a file is given, try to open [[*(++argv)]]; [[continue]] if unsuccessful>>=
if (file_count > 0 && (fd = open(*(++argv), READ_ONLY)) < 0) {
    fprintf(stderr, "%s: cannot open file %s\n", prog_name, *argv);
    status |= cannot_open_file;
    file_count--;
    continue;
}
<<Close file>>=
close(fd);
@
We will do some homemade buffering in order to speed things up:
Characters will be read into the [[buffer]] array before we process
them.
To do this we set up appropriate pointers and counters.
<<Definitions>>=
#define buf_size BUFSIZ     /* stdio.h BUFSIZ chosen for efficiency */
<<Variables local to [[main]]>>=
char buffer[buf_size];      /* we read the input into this array */
register char *ptr;         /* first unprocessed character in buffer */
register char *buf_end;     /* the first unused position in buffer */
register int c;             /* current char, or # of chars just read */
int in_word;                /* are we within a word? */
long word_count, line_count, char_count;
                            /* # of words, lines, and chars so far */
<<Initialize pointers and counters>>=
ptr = buf_end = buffer;
line_count = word_count = char_count = 0;
in_word = 0;
@
The grand totals must be initialized to zero at the beginning of the
program.
If we made these variables local to [[main]], we would have to do this
initialization explicitly; however, C's globals are automatically
zeroed.  (Or rather, ``statically zeroed.'')  (Get it?)
<<Global variables>>=
long tot_word_count, tot_line_count, tot_char_count;
                /* total number of words, lines, chars */
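@
To see what ``statically zeroed'' buys us, compare a file-scope total
with a local one.
The fragment below is only an illustration, not a chunk of {\tt wc},
and the name [[tot_example_count]] is hypothetical; it stands in for
the three totals declared above:
\begin{verbatim}
/* Illustration only: objects with static storage duration, like the
   grand totals above, are zero-initialized before main() runs. */
long tot_example_count;      /* starts out as 0 automatically */

void f(void)
{
    long local_total;        /* indeterminate until assigned  */
}
\end{verbatim}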
@ \vskip0pt plus3in\penalty-500\vskip0pt plus-3in
The present chunk, which does the counting that is {\tt wc}'s {\em
raison d'\^etre}, was actually one of the simplest to write.
We look at each character and change state if it begins or ends a word.
<<Scan file>>=
while (1) {
    <<Fill buffer if it is empty; break at end of file>>
    c = *ptr++;
    if (c > ' ' && c < 0177) {
        /* visible ASCII codes */
        if (!in_word) {
            word_count++;
            in_word = 1;
        }
        continue;
    }
    if (c == '\n') line_count++;
    else if (c != ' ' && c != '\t') continue;
    in_word = 0;    /* c is newline, space, or tab */
}
@
Buffered I/O allows us to count the number of characters almost for
free.
<<Fill buffer if it is empty; break at end of file>>=
if (ptr >= buf_end) {
    ptr = buffer;
    c = read(fd, ptr, buf_size);
    if (c <= 0) break;
    char_count += c;
    buf_end = buffer + c;
}
@
It's convenient to output the statistics by defining a new function
[[wc_print]]; then the same function can be used for the totals.
Additionally we must decide here if we know the name of the file we
have processed or if it was just [[stdin]].{\hfuzz=11.1pt\par}
<<Write statistics for file>>=
wc_print(which, char_count, word_count, line_count);
if (file_count)
    printf(" %s\n", *argv);     /* not stdin */
else
    printf("\n");               /* stdin */
@
<<Update grand totals>>=
tot_line_count += line_count;
tot_word_count += word_count;
tot_char_count += char_count;
@
We might as well improve a bit on {\sc Unix}'s {\tt wc} by displaying
the number of files too.
<<Print the grand totals if there were multiple files>>=
if (file_count > 1) {
    wc_print(which, tot_char_count, tot_word_count, tot_line_count);
    printf(" total in %d files\n", file_count);
}
@
Here now is the function that prints the values according to the
specified options.
The calling routine is supposed to supply a newline.
If an invalid option character is found we inform the user about proper
usage of the command.
Counts are printed in 8-digit fields so that they will line up in
columns.
<<Definitions>>=
#define print_count(n) printf("%8ld", n)
<<Functions>>=
wc_print(which, char_count, word_count, line_count)
    char *which;    /* which counts to print */
    long char_count, word_count, line_count;    /* given totals */
{
    while (*which)
        switch (*which++) {
        case 'l': print_count(line_count);
            break;
        case 'w': print_count(word_count);
            break;
        case 'c': print_count(char_count);
            break;
        default:
            if ((status & usage_error) == 0) {
                fprintf(stderr, "\nUsage: %s [-lwc] [filename ...]\n",
                    prog_name);
                status |= usage_error;
            }
        }
}
@
Incidentally, a test of this program against the system {\tt wc}
command on a SPARCstation showed that the ``official'' {\tt wc} was
slightly slower.
Furthermore, although that {\tt wc} gave an appropriate error message
for the options `{\tt-abc}', it made no complaints about the options
`{\tt-labc}'!
Dare we suggest that the system routine might have been better if its
programmer had used a more literate approach?

@
\section*{List of code chunks}
This list is generated automatically.
The numeral is that of the first definition of the chunk.
\nowebchunks

%\begin{multicols}{2}[
\section*{Index}
Here is a list of the identifiers used, and where they appear.
Underlined entries indicate the place of definition.
This index is generated automatically.
%]
\nowebindex
%\end{multicols}