\section[Highlights]{Literate programming, Glasgow style}

Literate programming is a style that tries to maximise humans' understanding of programs by full-scale use of the tools of the description trade (typesetting, indexing, etc.) to say what a program does and how it works. The power of literate programming is in the synergy between writing code and documentation in a unified framework.

So, for example, in Knuth's original WEB system [ToDo: add reference(s)] ``code'' and ``text'' may be intermingled as much as you like, the pieces of ``code'' may appear in any order (even though it's only glorified Pascal), and quite substantial cross-referencing and indexing (not to mention \TeX{}-based typesetting) are built in. Programs are just documents for people to read and enjoy which happen to have machine-manipulable code ``buried'' in them.

% An essential feature of a literate program is that the
% compilable/executable program and the document describing that program
% come from {\em the same source}. A most desirable feature of a
% literate-programming system is that the painful-by-hand parts of
% documents, such as indexing and cross-referencing, are provided
% automatically.\footnote{This characterization of literate programming
% follows Van Wyk's column in CACM, May 1990.}
%
% This system for literate programming meets the criteria above. The
% first \sectiontype{Highlights} lists some of its features; the second
% \sectiontype{notation-tut} describes the \LaTeX{}-looking markup
% language and document structure; the third
% \sectiontype{Programs} says how to run these programs for literate
% programming.

\subsection[Glasgow-objectives]{Objectives of our Glasgow system}

With our Glasgow system, your ``program'' is a \LaTeX{}-like document, with the compilable/executable code marked off by a \tr{\begin{code}} ... \tr{\end{code}} pair (or equivalent shortcut notation).
From there, your ``program'' may be (a)~compiled/interpreted [by extracting the embedded code and feeding it to a ``normal'' compiler/interpreter]; (b)~turned into a beautiful typeset document to be smeared onto dead trees; or (c)~turned into an on-line viewable/jump-aroundable document.

Here are the specific (distinctive?) objectives of the Glasgow system.

\subsubsection[on-line-form]{Programs/documents in an on-line form}

The people-oriented documents produced by literate-programming systems are usually on paper (this system provides those, too). For working programmers, their good properties are largely overshadowed by one fact: paper documents go stale quickly. Also, large programs turn into not-so-navigable large piles of dead trees. Surely we can do better!

Note our emphasis on presenting work-in-progress, as opposed to a system that emphasises the presentation of the final perfect programming gem. Our goal is that these on-line documents should be the medium of choice for programmers' daily reference. (APRIL91: See the note about ``literate Para mode'' in \sectionref{literate-para-mode}.)

Our choice for an on-line format is the GNU Info format, a primitive ASCII-only sort-of hypertext system. The main reason for this choice is to let the GNU people do as much of our programming for us as possible! The program \tr{info} can display Info files on just about any kind of terminal; \tr{xinfo} works with the X~Window System; the ubiquitous GNU~Emacs has an ever-improving Info mode.
\subsubsection[medium-scale]{Programming on a medium scale}

We aren't really interested in programming-in-the-small (one person, a few hundred lines of code), and we don't know about programming in the large (hundreds of programmers, millions of lines of code); our target audience is five-person projects working on programs the size of, say, a \Haskell{} compiler :-)

\subsubsection[language-independence]{Programming language independence}

The system may be used for literate programming with any programming language. However, the quality of the indexing, typesetting, etc., support will vary, depending on whether that language is ``supported'' or not. Fortunately, the current implementation, while clunky (see \sectionref{clunky-implementation}), makes it easy to add better support for a given programming language. See \sectionref{reordering-unemphasis} for a related comment about code re-ordering.

\subsubsection[code-verissimilitude]{Program code written exactly as in illiterate programs}

You put executable/compilable code into your literate programs by inserting it in a \tr{\begin{code}/\end{code}} environment (or the ``Bird-track'' equivalent); for example:
\begin{verbatim}
\begin{code}
main _ = [ AppendChan stdout "Hello, world!\n" ]
\end{code}
\end{verbatim}
The principle is: you can write code inside a \tr{\begin{code}/\end{code}} environment {\em exactly} as you would in an illiterate program in the same language.

This brings up a dichotomy that pervades literate programs: everything in the program is either {\em code}\index{code (vs text)} (stuff in a code environment, or equivalent) or {\em text}\index{text (vs code)} (everything else).

\subsubsection[ASCII-declarative-markup]{Use of ASCII-based files, ``declarative'' markup}

ASCII-based files (as opposed to some weird binary ``internal format''): so you can read the files directly and e-mail them to your friends.

Insisting on ASCII files means some kind of markup commands buried in the text.
We want the markup to be ``declarative,'' that is, to say ``what you want'' rather than ``how to produce what you want.'' Section numbering, generation of a table of contents, much indexing work, cross referencing, creation of the Info ``node'' structure, etc., etc., are done automagically.

Our starting-point choice for a ``declarative'' markup notation was \LaTeX{}, because that's what we know (and so do a lot of other people). You could make the case that GNU's Texinfo format would've been better, esp.~given some of our other objectives. Or: you could argue that we should've chosen a completely different notation, because some people look at our literate files and think they {\em are} \LaTeX{}, which they are not.

The most important parts of our \LaTeX{}-like notation are introduced in \sectionref{notation-tut}, and the whole mess is exhaustively described in \sectionref{Command_reference}.

\subsubsection[hierarchical-structure]{Use of hierarchical structure}

That is: programs/documents divided into sections, subsections within sections, subsubsections... Hierarchy is a very powerful structuring tool, but its utility for wading around a large sea of code is not apparent. APRIL91: See the comments about a ``literate Para mode'' in \sectionref{literate-para-mode}.

See also \sectionref{sectioning-large-documents}.

\subsubsection[separate-compilation]{``Separate compilation'' for large programs/documents}

This really follows from the size of project we're trying to support; it's just {\em too slow} to have to slurp in the {\em whole} program text and do anything... (It's pretty slow as it is :-)

Separate compilation of the embedded code: Use the normal mechanisms of your programming language. For example, if you have a literate C file, \tr{foo.lc}, then you would extract the code into \tr{foo.c} (command: \tr{lit2pgm foo.lc}) and compile as normal (\tr{gcc -c foo.c}).
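
As a rough picture of the extraction step just described, here is a minimal Python sketch that pulls the contents of \tr{\begin{code}}/\tr{\end{code}} pairs out of a literate file. It is {\em not} how \tr{lit2pgm} is implemented: the function name \tr{extract_code} is hypothetical, and the sketch assumes that each delimiter sits alone on its own line, that the Bird-track shortcut is not used, and that \tr{verbatim} environments need no special treatment.
\begin{verbatim}
BEGIN = r"\begin{code}"
END = r"\end{code}"


def extract_code(literate_text):
    # Hypothetical helper: keep only the lines inside code
    # environments; everything else is "text" and is dropped.
    code_lines = []
    in_code = False
    for line in literate_text.splitlines():
        stripped = line.strip()
        if not in_code and stripped == BEGIN:
            in_code = True
        elif in_code and stripped == END:
            in_code = False
        elif in_code:
            code_lines.append(line)
    return "\n".join(code_lines)


# A tiny stand-in for a literate C file such as foo.lc.
sample = "\n".join([
    r"\section{Greeting}",
    "Some explanatory text.",
    r"\begin{code}",
    "int main(void) { return 0; }",
    r"\end{code}",
    "More text.",
])
print(extract_code(sample))
\end{verbatim}
The point of the sketch is only that the code passes through untouched; whatever comes out is handed to the ordinary compiler for the language.
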
``Separate compilation'' of the (\LaTeX{} and (Tex)info) document-generating tasks: this is harder and quite a lot of work {\em has} to be done at ``link time''. What happens: the initial ``separate compilations'' of all the individual files in a program/document (e.g., \tr{lit2texi -c foo.lc} or \tr{lit2latex -c foo.lc}) produce intermediate files (\tr{*.itxi} and \tr{*.itex} files, respectively). These are then consulted when a ``link'' is done (e.g., \tr{lit2latex root.lit}, where \tr{root.lit} \tr{\inputs} the other files). (It's complicated: use Makefiles!)

\subsubsection[one-file-many-purposes]{Using one file in several ways}

Say you're writing a \Haskell{} compiler :-), which includes the source code for a typechecker. On the one hand, you want this to be included in ``the book'' (document) that is the whole compiler. However, you may also want to have a ``typechecker document'' that is itself self-contained (if only because the ``book'' is so big). Going further: perhaps one module of your typechecker is so amazing that you want to publish it as a paper.

You could make copies of files, edit them, etc., but that's tacky, and against a most deeply-held principle of literate programming: that you are looking at {\em the} source code for the program being described.

As it stands, this system lets you put \tr{\begin{onlystandalone}} and \tr{\end{onlystandalone}} around pieces of your files that only apply in the ``do this as a standalone document'' case; similarly, \tr{\begin{onlypartofdoc}} and \tr{\end{onlypartofdoc}} for stuff that only applies in the ``the glorious whole'' case.

APRIL91: I am fairly convinced I got this {\em wrong} and believe that this stuff would be much better handled with a pseudo-``C preprocessor'' (one that does not look inside code blocks [see deeply-held principle above; \sectionref{code-verissimilitude}]); see \sectionref{pseudo-C-preprocessor}.
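
The effect of the two environments can be pictured with a small Python sketch. This is only an illustration of the selection rule described above, not the system's implementation: the functions \tr{marker} and \tr{select_view} and the \tr{standalone} flag are hypothetical, and the sketch assumes that each \tr{\begin{...}} or \tr{\end{...}} marker occupies a line of its own and that the environments are not nested.
\begin{verbatim}
def marker(kind, env):
    # Build a marker line such as \begin{onlystandalone}.
    return "\\" + kind + "{" + env + "}"


def select_view(literate_text, standalone):
    # Keep onlystandalone regions only in the standalone case and
    # onlypartofdoc regions only in the whole-document case.
    wanted = "onlystandalone" if standalone else "onlypartofdoc"
    unwanted = "onlypartofdoc" if standalone else "onlystandalone"
    kept = []
    skipping = False
    for line in literate_text.splitlines():
        stripped = line.strip()
        if stripped == marker("begin", unwanted):
            skipping = True
        elif stripped == marker("end", unwanted):
            skipping = False
        elif stripped in (marker("begin", wanted), marker("end", wanted)):
            continue  # drop the markers, keep what they enclose
        elif not skipping:
            kept.append(line)
    return "\n".join(kept)


sample = "\n".join([
    "Shared text.",
    marker("begin", "onlystandalone"),
    "Title page for the paper.",
    marker("end", "onlystandalone"),
    marker("begin", "onlypartofdoc"),
    "See the other chapters of the book.",
    marker("end", "onlypartofdoc"),
])
print(select_view(sample, standalone=True))
\end{verbatim}
Either way there is still exactly one source file; only the view taken of it changes.
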
\subsection[Glasgow-NON-objectives]{NON-Objectives of our system}

\subsubsection[not-reinvent-LaTeX]{Not to reinvent \LaTeX{} (or any other extensible notation)}

Just as with Texinfo, our markup notation has a {\em fixed} set of commands. We don't have a \tr{\define} command that lets you define new macros in terms of old ones. This is for simplicity, I suppose.

APRIL91: see the note about fake ``C pre-processing'' in \sectionref{pseudo-C-preprocessor}.

\subsubsection[reordering-unemphasis]{An un-emphasis on code-reordering}

Some programming languages have a narrow-minded idea about what order the pieces of a program must be presented in (e.g., COBOL: identification division, then environment division, then data division [how many of you knew this :-]), and some literate-programming systems provide lots of machinery to get around this orderly intransigence.

We write in \Haskell{}, which is relatively open-minded about the order in which functions, etc., are presented. Re-splicing together code from dispersed program fragments doesn't buy you much---and we think the same is true for most modern or semi-modern programming languages. Therefore, this system has only primitive support for reordering code.

\subsection[Other-Glasgow-features]{Other features of our Glasgow system}

\subsubsection[sectioning-large-documents]{Sectioning support for large documents}

The main thing we've done here is {\em change} the sectioning commands (vs \LaTeX{}) so it's easy to re-arrange chunks of the hierarchy. Also, a default Info ``node'' structure is worked out, based on the sectioning information. (Veteran Texinfo hackers will appreciate the joys of not typing error-prone \tr{@menu}s.)

Sectioning in an individual file should {\em always} begin at \tr{\section}; such files should then be glued together in a ``root file'' with \tr{\input} commands interspersed with \tr{\upsection} and \tr{\downsection} commands.
\tr{lit2latex}/\tr{lit2texi} combine your files, putting appropriate real-\LaTeX{}/Texinfo sectioning commands (\tr{\part}, \tr{\chapter}, etc.) in the right places.

A side-effect of diverging from \LaTeX{} sectioning is that we can support deeper nesting of sectioning commands. It's also neat to be able to change your mind about whether you want to start with (\LaTeX{}-speak...) parts, chapters, sections, or whatever...

Please see \sectionref{Sectioning} for further details about this whole mess.

\subsubsection[automatic-indexing]{Automatic indexing of your code}

An attempt is made to index all the ``interesting things'' in the code parts of your program. The success of this enterprise depends on the degree to which the language you are using is supported.

In a paper manifestation, you'll get nice neat indexes, like we're used to. In an on-line manifestation, you would like to use the information to ``point and jump'' (our Info-based implementation does this, perhaps clumsily).

APRIL91: See \sectionref{multiple-indexes} for latest thoughts about indexing. Also, I've put some comments about free-text retrieval in \sectionref{free-text-index}.

\subsubsection[manual-indexing]{Indexing by hand}

Indexing your text (vs your code) is not automatic. You do it by hand with \tr{\index{}} commands. The format of the \tr{\index} commands is a simplified (?) form of \tr{makeindex} commands. \Sectionref{Indexing} gives the details.

\subsubsection[automatic-cross-referencing]{Automatic cross-referencing of your code}

Your code may also be automatically cross-referenced; again, it's based on finding ``interesting things'' in your code; furthermore, it depends on distinguishing between ``definitions'' and ``uses'', because the main purpose of cross-referencing is to let you move quickly from a ``use'' (e.g., of a function) to its ``definition''. APRIL91: Sometimes you would really like to be able to go from a definition to all of its uses. Perhaps this should be an option.
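
To illustrate the definition/use distinction, here is a minimal Python sketch. It assumes that some language-specific scanner has already produced a list of (name, kind, location) records; that record format, the function name \tr{build_xref}, and the sample data are all hypothetical and are not taken from the system itself. The resulting tables serve both directions: from a use to its definition, and from a definition to all of its uses.
\begin{verbatim}
from collections import defaultdict


def build_xref(records):
    # records: iterable of (name, kind, location) tuples, where
    # kind is "def" or "use" (a made-up format for this sketch).
    definitions = {}
    uses = defaultdict(list)
    for name, kind, location in records:
        if kind == "def":
            definitions[name] = location
        else:
            uses[name].append(location)
    return definitions, dict(uses)


records = [
    ("typecheck", "def", "foo.lc:10"),
    ("typecheck", "use", "bar.lc:42"),
    ("typecheck", "use", "baz.lc:7"),
]
definitions, uses = build_xref(records)
print(definitions["typecheck"])  # where a "use" should jump to
print(uses["typecheck"])         # every place the name is used
\end{verbatim}
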
Cross-referencing is automatically OFF when producing \LaTeX{} documents (we found it mainly cluttered) and ON when producing Info files (which lets you jump to the definitions through a node's menu).

\subsubsection[shortcut-notations]{Shortcut notations}

Within non-code parts of documents, code snippets may be included between `at' signs; e.g., \tr{@f x y = y@}---appropriate formatting, indexing, etc., will be done. APRIL91 COMMENT: I need to clarify the proper use of these \tr{@}s. I also need to figure out what to do when \tr{@}s are not the right thing! (I think Texinfo has an over-proliferation of commands for marking off this and that in the text.) See \sectionref{code-in-text-formatting}.

For ordinary text to be shown in a typewriter font (much used in user's guides, for example), you may use the \tr{\tr{}} command; the only restriction on \tr{\tr{}} is that braces must be balanced. The same trick for ``plain'' (roman) font is \tr{\pl{}}. [APRIL91: See \sectionref{diff-fonts-in-text} for further thoughts about fonts in your text.]

APRIL91: See also: \sectionref{BNF-grammars}.

\subsubsection[TeX-to-ASCII]{The great ``\LaTeX{} to ASCII'' problem solved!}

(Well, not quite...) A literate program looks an awful lot like a \LaTeX{} document. If you convert the program to an Info file, then run the \tr{info2ascii} script [which I haven't written yet :-] over it, you'll have a quite-reasonable all-ASCII nicely-formatted version of your \LaTeX{}-ish input.

\subsection[Glasgow-NON-features]{Shortcomings of our system}

\subsubsection[texinfo-limits]{Limits because of the Texinfo intermediate representation}

We produce our on-line viewable GNU Info files by going through their Texinfo format. If something cannot be represented easily in Texinfo (which doesn't do things not easily representable in character-only ASCII), it ain't in our system.

This leads to the following {\em very notable} shortcomings:
\begin{enumerate}
\item
No graphics, pictures, or sound.
\item
No fancy mathematical stuff of the sort \TeX{}/\LaTeX{} is so good at. Note: we would like to provide enough {\em basic} mathematical stuff so you aren't hamstrung by this restriction.
\end{enumerate}

\subsubsection[clumsy-node-namespace]{Node namespace is not ``neat''}

For all practical purposes, it's {\em your problem} to make sure that you don't get nodename clashes. This is really uncool.

APRIL91: For some ideas on partially alleviating making-up-nodenames weariness, see \sectionref{revised-sectioning}. Barring that, a good discipline for making up unique nodenames would be helpful (I haven't found one yet).

\subsubsection[clunky-implementation]{A clunky implementation}

The principle has been: do as little work as possible. (What do you expect from the proponents of lazy functional programming?) The implementation is the Perl Script From Hell, with a supporting cast of the Flex Program From Hell, and a whole bunch of exceedingly useful programs written by other people, mainly the GNU Info/Texinfo-related programs and \tr{tgrind} (by Van Jacobson). [We can pass along the source to all of these programs.]