Investigation of Size Reduction in MakeIndex [25-Jul-1991] Joachim Schrod in his work on MakeIndex 3.0 (to be reported at EuroTeX'91 in Paris, September 1991) found that substantial storage economizations in MakeIndex are possible by dynamically allocating some arrays. Since that version is not yet available, it seemed worthwhile to try to apply such changes to version 2.10, making version 2.11. The major memory consumption in MakeIndex is by the arrays af[][] and sf[][] in #define FIELD_MAX 3 #define STRING_MAX 256 typedef struct KFIELD { char sf[FIELD_MAX][STRING_MAX]; /* sort key */ char af[FIELD_MAX][STRING_MAX]; /* actual key */ ... } FIELD, *FIELD_PTR; These arrays are referenced in genind.c, scanid.c, and sortid.c. They are defined in mkind.h, and initialized to empty strings in scanid.c. Each index entry has one such struct KFIELD slot, and with current dimension limits, sizeof(FIELD) == 1844. book.idx has 4241 index entries, so we need 1844*4241 = 7820404 bytes of dynamic storage. However, book.idx is only 142027 bytes long, and in entry, 16 characters ("\indexentry{}{}") need not be stored; the true storage requirement is therefore 142027 - 4241*16 = 74171 so 99.05% of the storage is being wasted, After sorting and removal of duplicate lines, book.idx reduces to 85935 bytes and 2531 entries, so the actual storage required is 1844*2531 = 4667164 bytes; the wastage is still high: 98.4%. If we change to typedef struct KFIELD { char *sf[FIELD_MAX]; /* sort key */ char *af[FIELD_MAX]; /* actual key */ ... } FIELD, *FIELD_PTR; then sizeof(FIELD) = 332, and the required storage is 332*2531 = 840292 bytes. This is still too large for an IBM PC. However, KFIELD also contains char encap[STRING_MAX]; /* encapsulator */ and that field is only rarely used, so by dynamically allocating it when needed, we can reduce the KFIELD node storage by 256 - 4 = 252 bytes, reducing sizeof(FIELD) to 80 bytes. For book.idx, we then need 80*2531 = 202480 bytes, and that should be small enough to run on the PC. encap is used in genind.c, scanid.c, and sortid.c, and defined in mkind.h. We can handle dynamic allocation of af[] and sf[] three ways: (1) initialize them both to (char*)NULL, then allocate space as needed, (2) initialize them both to calloc(1), and (3) initialize them both to a constant empty string, "". MakeIndex makes many references to the first character (*af[i] or af[i][0]), the first scheme would require many changes to avoid dereferencing NULL pointers. The second scheme would not have this problem, but would require a lot of overhead in malloc'ing 1-character blocks. The third scheme seems the best choice, and requires few changes; no free() or realloc() calls are made by MakeIndex, so we need take no precautions about attempting to free or reallocate constant string storage. Examination of the code in genind.c, scanid.c, and sortid.c which references af[] and sf[] shows that changes are needed only in scanid.c, where we introduce a new function, make_string(), to handle the dynamic allocation of strings when needed. Similarly, of the encap[] uses in genind.c, scanid.c, and sortid.c, only scanid.c needs changes. The definition of struct KFIELD in mkind.h must be updated so that af, sf, and encap become pointers to strings, rather than arrays of characters.