1 [article Boost.AutoIndex
3 [copyright 2008, 2011 John Maddock]
5 Distributed under the Boost Software License, Version 1.0.
6 (See accompanying file LICENSE_1_0.txt or copy at
7 [@http://www.boost.org/LICENSE_1_0.txt])
9 [authors [Maddock, John]]
10 [/last-revision $Date: 2008-11-04 17:11:53 +0000 (Tue, 04 Nov 2008) $]
13 [def __quickbook [@http://www.boost.org/doc/tools/quickbook/index.html Quickbook]]
14 [def __boostbook [@http://www.boost.org/doc/html/boostbook.html BoostBook]]
15 [def __boostbook_docs [@http://www.boost.org/doc/libs/1_41_0/doc/html/boostbook.html BoostBook documentation]]
16 [def __quickbook_syntax [@http://www.boost.org/doc/libs/1_41_0/doc/html/quickbook/ref.html Quickbook Syntax Compendium]]
17 [def __docbook [@http://www.docbook.org/ DocBook]]
18 [def __docbook_params [@http://docbook.sourceforge.net/release/xsl/current/doc/ Docbook xsl:param format options]]
19 [def __DocObjMod [@http://en.wikipedia.org/wiki/Document_Object_Model Document Object Model (DOM)]]
21 [def __doxygen [@http://www.doxygen.org/ Doxygen]]
22 [def __pdf [@http://www.adobe.com/products/acrobat/adobepdf.html PDF]]
24 [template deg[]'''°'''] [/ degree sign ]
27 [section:overview Overview]
29 AutoIndex is a tool for taking the grunt work out of indexing a
30 Boostbook\/Docbook document
31 (perhaps generated by your Quickbook file mylibrary.qbk,
32 and perhaps using also Doxygen autodoc)
33 that describes C\/C++ code.
Traditionally, in order to index a Docbook document you would
have to manually add a large amount of `<indexterm>` markup:
in fact one `<indexterm>` for each occurrence of each term to be
indexed.
40 Instead AutoIndex will automatically scan one or more C\/C++ header files
41 and extract all the ['function], ['class], ['macro] and ['typedef]
42 names that are defined by those headers, and then insert the
43 `<indexterm>`s into the Docbook XML document for you.
45 AutoIndex can also scan using a list of index terms
46 specified in a script file, for example index.idx.
These manually provided terms can optionally be regular expressions,
and allow the user to find references to terms
that may not occur in the C++ header files. Of course providing a manual
list of terms to index is a tedious task
(especially handling plurals and variants),
and requires enough knowledge of the library
to guess what users may be seeking to know,
but at least the real 'grunt work' of
finding the term and listing the page number is automated.
57 AutoIndex creates index entries as follows:
59 for each occurrence of each search term, it creates two index entries:
61 # The search term as the ['primary index key] and
62 the ['title of the section it appears in] as a subterm.
64 # The section title as the main index entry and the search term as the subentry.
66 Thus the user has two chances to find what they're
67 looking for, based upon either the section name
68 or the ['function], ['class], ['macro] or ['typedef] name.
70 [note This behaviour can be changed so that only one index entry is created
71 (using the search term as the key and
72 not using the section name except as a sub-entry of the search term).]
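
The two-entry scheme described above can be pictured with a small Python sketch. This is only an illustration of the resulting index structure, not AutoIndex's own code: the function `build_index`, its `use_section_names` flag, the section titles and the term `quantile` are all hypothetical names chosen for the example.

```python
from collections import defaultdict

def build_index(occurrences, use_section_names=True):
    """Map each primary index key to the sorted list of its subentries.

    `occurrences` is a list of (search_term, section_title) pairs.
    """
    index = defaultdict(set)
    for term, section in occurrences:
        # Entry 1: the search term is the primary key, the section title the subterm.
        index[term].add(section)
        if use_section_names:
            # Entry 2: the section title is the primary key, the search term the subentry.
            index[section].add(term)
    return {key: sorted(subs) for key, subs in index.items()}

# Hypothetical occurrences, for illustration only.
occurrences = [
    ("students_t_distribution", "Student's t Distribution"),
    ("students_t_distribution", "Distribution Algorithms"),
    ("quantile", "Student's t Distribution"),
]
index = build_index(occurrences)
```

Calling `build_index(occurrences, use_section_names=False)` keeps only the entries keyed on the search term, which corresponds to the single-entry behaviour.
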
74 So for example in Boost.Math the class name `students_t_distribution` has a primary
75 entry that lists all sections the class name appears in:
77 [$../students_t_eg_1.png]
Then those sections also have primary entries, which list all the search terms those
sections contain:
82 [$../students_t_eg_2.png]
84 Of course these automated index entries may not be quite
85 what you're looking for: often you'll get a few spurious entries, a few missing entries,
86 and a few entries where the section name used as an index entry is less than ideal.
87 So AutoIndex provides some powerful regular expression based rules that allow you
88 to add, remove, constrain, or rewrite entries. Normally just a few lines in
89 AutoIndex's script file are enough to tailor the output to match the author's
90 expectations (and thus hopefully the index user's expectations too!).
92 AutoIndex also supports multiple indexes (as does Docbook), and since it knows
93 which search terms are ['function], ['class], ['macro] or ['typedef] names, it
94 can add the necessary attributes to the XML so that you can have separate
indexes for each of these different types. These specialised indexes only contain
entries for the ['function], ['class], ['macro] or ['typedef] names: ['section
names] are never used as primary index terms here, unlike the main "include everything"
index.
Finally, while the Docbook XSL stylesheets create nice indexes complete with page
numbers for PDF output, the HTML indexes look poorer by comparison, because these use
section titles in place of page numbers. As AutoIndex already uses section titles
as index entries, this leads to a lot of repetition, so as an alternative AutoIndex
can be instructed to construct the index itself. This is faster than using
the XSL stylesheets, and each index entry is then a hyperlink to the
appropriate section:
108 [$../students_t_eg_3.png]
110 With internal index generation there is also a helpful navigation bar
111 at the start of each Index:
113 [$../students_t_eg_4.png]
You can also choose what kind of XML container wraps an internally generated index.
This defaults to `<section>...</section>`, but you can use either command line options
or Boost.Build Jamfile features to select an alternative wrapper: for example ['appendix]
or ['chapter] would be good choices, whatever fits best into the flow of the
document. You can even set the container wrapper to type ['index] provided you turn
off index generation by the XSL stylesheets, for example by setting the following
build requirements in the Jamfile:
124 <format>html:<auto-index-internal>on # Use internally generated indexes.
125 <auto-index-type>index # Use <index>...</index> as the XML wrapper.
126 <format>html:<xsl:param>generate.index=0 # Don't let the XSL stylesheets generate indexes.
129 [endsect] [/section:overview Overview]
131 [section:tut Getting Started and Tutorial]
133 [section:build Step 1: Build the AutoIndex tool]
135 [note This step is strictly optional, but very desirable to speed up build times.]
137 cd into `tools/auto_index/build` and invoke bjam as:
141 Optionally pass the name of the compiler toolset you want to use to bjam as well:
This will build the tool and place a copy in the current directory (which is to say `tools/auto_index/build`).
147 Now open up your `user-config.jam` file and at the end of the file add the line:
150 using auto-index : ['full-path-to-boost-tree]/tools/auto_index/build/auto-index.exe ;
[note This declaration must go towards the end of `user-config.jam`, or in any case after the Boostbook initialisation.

Also note that Windows users must use forward slashes in the paths in `user-config.jam`.]
158 [endsect] [/section:build Step 1: Build the AutoIndex tool]
160 [section:configure Step 2: Configure Boost.Build jamfile to use AutoIndex]
162 Assuming you have a Jamfile for building your documentation that looks
170 # build requirements go here:
176 [pre using auto-index ; ]
178 to the start of the Jamfile, and then add whatever auto-index options
179 you want to the ['build requirements section], for example:
186 # Build requirements go here:
# <auto-index>on (or off) turns indexing on (or off):
191 # Turns on (or off) auto-index-verbose for diagnostic info.
192 # This is highly recommended until you have got all the many details correct!
193 <auto-index-verbose>on
195 # Choose the indexing method (separately for html and PDF) - see manual.
196 # Choose indexing method for PDFs:
197 <format>pdf:<auto-index-internal>off
199 # Choose indexing method for html:
200 <format>html:<auto-index-internal>on
202 # Set the name of the script file to use (index.idx is popular):
203 <auto-index-script>index.idx
204 # Commands in the script file should all use RELATIVE PATHS
205 # otherwise the script will not be portable to other machines.
206 # Relative paths are normally taken as relative to the location
207 # of the script file, but we can add a prefix to all
208 # those relative paths using the <auto-index-prefix> feature.
209 # The path specified by <auto-index-prefix> may be either relative or
210 # absolute, for example the following will get us up to the boost root
211 # directory for most Boost libraries:
212 <auto-index-prefix>..\/..\/..
214 # Tell Quickbook that it should enable indexing.
215 <quickbook-define>enable_index ;
220 [section:options Available Indexing Options]
222 The available options are:
225 [[<auto-index>off/on][Turns indexing of the document on, defaults to
226 "off", so be sure to set this if you want AutoIndex invoked!]]
227 [[<auto-index-internal>off/on][Chooses whether AutoIndex creates the index
228 itself (feature on), or whether it simply inserts the necessary DocBook
229 markup so that the DocBook XSL stylesheets can create the index. Defaults to "off".]]
230 [[<auto-index-script>filename][Specifies the name of the script to load.]]
231 [[<auto-index-no-duplicates>off/on][When ['on] AutoIndex will only index a term
232 once in any given section, otherwise (the default) multiple index entries per
233 term may be created if the term occurs more than once in the section.]]
[[<auto-index-section-names>off/on][When ['on] AutoIndex will create two
index entries for each term found - one uses the term itself as the primary
index key, the other uses the enclosing section name. When ['off] the index
entry that uses the section title is not created. Defaults to "on".]]
238 [[<auto-index-verbose>off/on][Defaults to "off". When turned on AutoIndex
239 prints progress information - useful for debugging purposes during setup.]]
240 [[<auto-index-prefix>filename][Optionally specifies a directory to apply
241 as a prefix to all relative file paths in the script file.
243 You may wish to do this to reduce typing of pathnames, and\/or where the
244 paths can't be located relative to the script file location,
245 typically if the headers are in the Boost trunk,
246 but the script file is in Boost sandbox.
248 For Boost standard library layout,
249 [^<auto-index-prefix>..\/..\/..] will get you back up to the 'root' of the Boost tree,
250 so [^!scan-path boost\/mylibrary\/] is where your headers will be, and [^libs\/mylibrary] for other files.
251 Without a prefix all relative paths are relative to the location of the script file.
[[<auto-index-type>element-name][Specifies the name of the XML element in which to enclose an internally generated index:
defaults to ['section], but could equally be ['appendix] or ['chapter] or some other block level element that has a formal title.
The actual list of available options depends upon the Quickbook document type. The following table gives the available options,
assuming that the index is placed at the top level, and not in some sub-section or other container:]]
261 [[Document Type][Permitted Index Types]]
262 [[book][appendix index article chapter reference part]]
263 [[article][section appendix index sect1]]
264 [[chapter][section index sect1]]
265 [[library][The same as Chapter (section index sect1)]]
266 [[part][appendix index article chapter reference]]
267 [[appendix][section index sect1]]
268 [[preface][section index sect1]]
269 [[qandadiv][N/A: an index would have to be placed within a subsection of the document.]]
270 [[qandaset][N/A: an index would have to be placed within a subsection of the document.]]
271 [[reference][N/A: an index would have to be placed within a subsection of the document.]]
272 [[set][N/A: an index would have to be placed within a subsection of the document.]]
275 In large part then the choice of `<auto-index-type>element-name` depends on the
276 formatting you want to be applied to the index:
279 [[XML Container Used for the Index][Formatting Applied by the XSL Stylesheets]]
280 [[appendix][Starts a new page.]]
281 [[article][Starts a new page.]]
282 [[chapter][Starts a new page.]]
283 [[index][Starts a new page only if it's contained within an article or book.]]
284 [[part][Starts a new page.]]
285 [[reference][Starts a new page.]]
286 [[sect1][Starts a new page as long as it's not the first section (but is controlled by the XSL parameters chunk.section.depth and/or chunk.first.sections).]]
287 [[section][Starts a new page as long as it's not the first section or nested within another section (but is controlled by the XSL parameters chunk.section.depth and/or chunk.first.sections).]]
290 In almost all cases the default (section) is the correct choice - the exception is when the index is to be placed
291 directly inside a /book/ or /part/, in which case you should probably use the same XML container for the index as
292 you use for whatever subdivisions are in the /book/ or /part/. In any event placing a /section/ within a /book/ or
293 /part/ will result in invalid XML.
295 Finally, if you are using Quickbook to generate the documentation, then you may wish to add:
297 [pre <include>$boost-root/tools/auto_index/include]
to your project's requirements (replacing $boost-root with the path to the root of the Boost tree), so that
the file auto_index_helpers.qbk can be included in your Quickbook source with simply:
302 [pre \[include auto_index_helpers.qbk\]]
304 [endsect] [/section:options Available Indexing Options]
306 [section:optional Making AutoIndex optional]
308 It is considerate to make the [*use of auto-index optional] in Boost.Build,
309 to allow users who do not have AutoIndex installed to still be able to build your documentation.
This is also very convenient while you are refining your documentation,
as it lets you decide whether or not to build indexes:
building indexes can take a long time, and if you are just correcting typos
you won't want to wait while you keep rebuilding the index!
One method of setting up optional AutoIndex support is to place all
AutoIndex configuration in the body of a bjam `if` statement:
320 if --enable-index in \[ modules.peek : ARGV \]
322 ECHO "Building the docs with automatic index generation enabled." ;
325 project : requirements
327 <auto-index-script>index.idx
329 ... other AutoIndex options here...
331 # And tell Quickbook that it should enable indexing.
332 <quickbook-define>enable_index
337 ECHO "Building the my_library docs with automatic index generation disabled. To get an Index, try building with --enable-index." ;
341 You will also need to add a conditional statement at the end of your Quickbook file,
342 so that the index(es) is/are only added after the last section if indexing is enabled.
345 \[\? '''enable_index'''
353 To use this jamfile, you need to cd to your docs folder, for example:
355 cd \boost-sandbox\guild\mylibrary\libs\mylibrary\doc
357 and then run `bjam` to build the docs without index, for example:
359 bjam -a html > mylibrary_html.log
363 bjam -a html --enable-index > mylibrary_html_index.log
365 [endsect] [/section:optional Making AutoIndex optional]
[tip Always send the output to a log file.
It will contain a lot of output, but is invaluable for checking that all has gone right,
or else for diagnosing what has gone wrong.]
[tip A return code of 0 is not a reliable indication
that you have got what you really want -
inspecting the log file is the only certain way.]
[tip If you upgrade compiler version, for example MSVC from 9 to 10,
then you may need to rebuild AutoIndex
to avoid what Microsoft call a 'side-by-side' error.
Also make sure that the autoindex.exe version you are using is the new one.]
383 [endsect] [/section:configure Step 2: Configure Boost.Build to use AutoIndex]
385 [section:add_indexes Step 3: Add indexes to your documentation]
To add a single "include everything" index to a BoostBook\/Docbook document
(perhaps generated using Quickbook, and perhaps also using a Doxygen reference section),
add `<index/>` at the location where you want the index to appear.
390 The index will be rendered as a separate section called "Index"
391 when the documentation is built.
To add multiple indexes, give each one a title and set its
394 `type` attribute to specify which terms will be included, for example
395 to place the ['function], ['class], ['macro] or ['typedef] names
396 indexed by ['AutoIndex] in separate indexes along with a main
397 "include everything" index as well, one could add:
400 <index type\="class_name">
401 <title>Class Index<\/title>
404 <index type\="typedef_name">
405 <title>Typedef Index<\/title>
408 <index type\="function_name">
409 <title>Function Index<\/title>
412 <index type\="macro_name">
413 <title>Macro Index<\/title>
[note Multiple indexes like this only work correctly if you tell the XSL stylesheets
to honor the "type" attribute on each index, as by default [*they do not do this].
You can turn the feature on by adding `<xsl:param>index.on.type=1` to your project's
requirements in the Jamfile.]
424 In Quickbook, you add the same markup but enclose it between two triple-tick \'\'\' escapes,
427 [pre \'\'\'<index\/>\'\'\' ]
429 Or more easily via the helper file auto_index_helpers.qbk, so that given:
431 [pre \[include auto_index_helpers.qbk\]]
433 one can simply write:
436 \[named_index class_name Class Index\]
437 \[named_index function_name Function Index\]
438 \[named_index typedef_name Typedef Index\]
439 \[named_index macro_name Macro Index\]
443 [note AutoIndex knows nothing of the XML `xinclude` element, so if
444 you're writing raw Docbook XML then you may want to run this through an
445 XSL processor to flatten everything to one XML file before passing to
446 AutoIndex. If you're using Boostbook or quickbook though, this all
447 happens for you anyway, and AutoIndex will index the whole document
448 including any sections included with `xinclude`.]
450 If you are using AutoIndex's internal index generation on
453 <auto-index-internal>on
455 (usually recommended for HTML output, but ['not] the default)
456 then you can also decide what kind of XML wrapper the generated index is placed in.
457 By default this is a `<section>...</section>` XML block (this replaces the original
458 `<index>...</index>` block). However, depending upon the structure of the document
459 and whether or not you want the index on a separate page - or else on the front page after
460 the TOC - you may want to place the index inside a different type of XML block. For example
461 if your document uses `<chapter>` top level content rather than `<section>`s then
462 it may be preferable to place the index in a `<chapter>` or `<appendix>` block.
You can also place the index inside an `<index>` block if you prefer, in which case the index
does not appear on a page of its own, but after the TOC in the HTML output.
466 You control the type of XML block used by setting the =<auto-index-type>element-name=
467 attribute in the Jamfile, or via the `index-type=element-name` command line option to
468 AutoIndex itself. For example, to place the index in an appendix, your Jamfile might
xml mylibrary : mylibrary.qbk ;
480 # auto-indexing is on:
483 # PDFs rely on the XSL stylesheets to generate the index:
484 <format>pdf:<auto-index-internal>off
486 # HTML output uses auto-index to generate the index:
487 <format>html:<auto-index-internal>on
489 # Name of script file to use:
490 <auto-index-script>index.idx
# Set the XML wrapper for HTML indexes to "appendix":
493 <format>html:<auto-index-type>appendix
495 # Turn on multiple index support:
496 <xsl:param>index.on.type=1
500 [endsect] [/section:add_indexes Step 3: Add indexes to your documentation]
[section:script Step 4: Create the .idx script file - to control what terms to index]
504 AutoIndex works by reading a script file that tells it what terms to index.
If your document contains largely text, and only a small amount of simple C++,
and/or if you are using Doxygen to provide a C++ Reference section
(that lists the C++ elements),
and/or if you are relying on the indexing provided from a Standalone Doxygen Index,
you may decide that an index of the C++ elements is not needed
and that you only want the text part indexed.

But if you want C++ classes, functions, typedefs and/or macros AutoIndexed,
the script file can optionally also tell AutoIndex which C++ files to scan.
516 At its simplest, it will scan one or more headers for terms that
517 should be indexed in the documentation. So for example to scan
518 "myheader.hpp" the script file would just contain:
521 !scan mydetailsheader.hpp
Or, more likely in practice,
we can recursively scan through directories looking for all
the files whose [*name matches a particular regular expression]:
527 [pre !scan-path "boost\/mylibrary" ".*\.hpp" true ]
529 Each argument is whitespace separated and can be optionally
530 enclosed in "double quotes" (recommended).
The final ['true] argument indicates
that subdirectories in `boost/mylibrary` should be searched
recursively in addition to that directory.
536 [caution The second ['file-name-regex] argument is a regular expression and not a filename GLOB!]
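
The difference between a regular expression and a GLOB can be checked with a short Python sketch. Python's `re` module stands in here for the regex engine AutoIndex uses, and the sketch assumes that the whole file name has to match the expression; the file names are hypothetical.

```python
import re

# The file-name pattern from the !scan-path example: a regular expression.
header_regex = re.compile(r".*\.hpp")

# Hypothetical file names, for illustration only.
names = ["myheader.hpp", "detail.hpp", "example.cpp", "notes.hpp.bak"]

# Assumption: the whole file name must match the expression.
matched = [name for name in names if header_regex.fullmatch(name)]

def is_valid_regex(pattern):
    """Return True if Python's re module can compile the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The GLOB "*.hpp" is not a well-formed regular expression in Python:
# the leading "*" has nothing to repeat.
glob_is_valid = is_valid_regex("*.hpp")
```

Here `matched` holds only the two names ending in .hpp, and `glob_is_valid` is false, which is why a GLOB such as `*.hpp` should not be used for this argument.
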
[caution The scan-path is modified by any setting of <auto-index-prefix>.
The examples here assume that this is [^<auto-index-prefix>..\/..\/..]
so that `boost/mylibrary` will be your header files,
`libs/mylibrary/doc` will contain your documentation files and
`libs/mylibrary/example` will contain your examples.]
You could also scan any example (.cpp) files,
typically in folder `libs/mylibrary/example`.
549 # All example source files, assuming no sub-folders.
550 !scan-path "libs\/mylibrary\/example" ".*\.cpp"
553 Often the ['scan] or ['scan-path] rules will bring in too many terms
to search for, so we need to be able to exclude terms as well:

   !exclude type

which excludes the term "type" from being indexed.
We can also add terms manually:

   foobar

will index occurrences of "foobar" and:

   foobar \<\w*(foo|bar)\w*\>

will index any whole word containing either "foo" or "bar" within it.
This is useful when you want to index a lot of similar or related
words under one entry, for example:

   reflex

will only index occurrences of "reflex" as a whole word, but:

   reflex \<reflex\w*\>

will index occurrences of "reflex", "reflexing" and
"reflexed" all under the same entry ['reflex].
You will very often need to use this to deal with plurals and other variants.
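
The effect of the whole-word and variant-matching expressions can be tried out in Python. This is a sketch under one stated assumption: Python's `re` module does not understand the `\<` and `\>` word-boundary markers, so they are replaced here by `\b`, which is assumed to behave equivalently for these patterns. The sample text is invented for the example.

```python
import re

# Invented sample text, for illustration only.
text = ("the reflex was quick, reflexing is rare, "
        "the knee reflexed twice, unreflexive is skipped")

# Equivalent of the bare term "reflex": whole-word occurrences only.
whole_word = re.findall(r"\breflex\b", text)

# Equivalent of the searcher for "reflex" plus its variants,
# with \b standing in for the \< and \> markers.
with_variants = re.findall(r"\breflex\w*\b", text)
```

The first search finds only "reflex" itself, while the second also picks up "reflexing" and "reflexed"; "unreflexive" is skipped by both because the match has to start at a word boundary.
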
This inclusion rule can also restrict the term to
certain sections, and add an index category that
the term should belong to (so it only appears in certain
indexes).
587 Finally the script can add rewrite rules, that rename section names
588 that are automatically used as index entries. For example we might
589 want to remove leading "A" or "The" prefixes from section titles
590 when AutoIndex uses them as an index entry:
592 !rewrite-name "(?i)(?:A|The)\s+(.*)" "\1"
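
To see what this rewrite rule does to typical section titles, the same pattern and replacement can be applied in Python. This is a sketch only: it assumes that a rule is applied when its expression matches the whole section title, and the function name `rewrite_section_name` and the titles are hypothetical.

```python
import re

# Pattern and replacement taken from the !rewrite-name rule above.
pattern = re.compile(r"(?i)(?:A|The)\s+(.*)")
replacement = r"\1"

def rewrite_section_name(title):
    """Return the title as it would be used for the index entry."""
    # Assumption: the rule applies only when it matches the whole title.
    match = pattern.fullmatch(title)
    return match.expand(replacement) if match else title

# Hypothetical section titles, for illustration only.
titles = ["The Overview", "a Tutorial", "Atlas of Terms", "Getting Started"]
rewritten = [rewrite_section_name(title) for title in titles]
```

The leading article is dropped from the first two titles regardless of case, while "Atlas of Terms" is left alone because the expression requires whitespace after the article.
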
594 [endsect] [/section:script Step 4: Create the script file - to control what to terms to index]
596 [section:entries Step 5: Add Manual Index Entries to Docbook XML - Optional]
If you add manual `<indexterm>` markup to your Docbook XML then these will be
599 passed through unchanged. Please note however, that if you are using
600 AutoIndex's internal index generation then it only recognises
601 `<primary>`, `<secondary>` and `<tertiary>` elements within the `<indexterm>`.
602 `<see>` and `<seealso>` elements are not currently recognised
603 and AutoIndex will emit a warning if these are used.
Likewise none of the attributes which can be applied to these elements are used when
AutoIndex generates the index itself, with the exception of the `type` attribute.
608 For Quickbook users, there are some templates in auto_index_helpers.qbk that assist
609 in adding manual entries without having to escape to Docbook.
611 [endsect] [/section:entries Step 5: Add Manual Index Entries to Docbook XML - Optional]
613 [section:pis Step 6: Using XML processing instructions to control what gets indexed.]
Sometimes you need to exclude certain sections of text from indexing;
you can achieve this with the following XML processing instructions:
619 [[Instruction][Effect]]
620 [[`<?BoostAutoIndex IgnoreSection?>`]
621 [Causes the whole of the current section to be excluded from indexing.
622 By "section" we mean either a true "section" or any sibling XML element:
623 "dedication", "toc", "lot", "glossary", "bibliography", "preface", "chapter",
624 "reference", "part", "article", "appendix", "index", "setindex", "colophon",
625 "sect1", "refentry", "simplesect", "section" or "partintro".]]
626 [[`<?BoostAutoIndex IgnoreBlock?>`]
627 [Causes the whole of the current text block to be excluded from indexing.
628 A text block may be any of the section/chapter elements listed above, or a
629 paragraph, code listing, table etc. The complete list is:
630 "calloutlist", "glosslist", "bibliolist", "itemizedlist", "orderedlist",
631 "segmentedlist", "simplelist", "variablelist", "caution", "important", "note",
632 "tip", "warning", "literallayout", "programlisting", "programlistingco",
633 "screen", "screenco", "screenshot", "synopsis", "cmdsynopsis", "funcsynopsis",
634 "classsynopsis", "fieldsynopsis", "constructorsynopsis",
635 "destructorsynopsis", "methodsynopsis", "formalpara", "para", "simpara",
636 "address", "blockquote", "graphic", "graphicco", "mediaobject",
637 "mediaobjectco", "informalequation", "informalexample", "informalfigure",
638 "informaltable", "equation", "example", "figure", "table", "msgset", "procedure",
639 "sidebar", "qandaset", "task", "productionset", "constraintdef", "anchor",
640 "bridgehead", "remark", "highlights", "abstract", "authorblurb" or "epigraph".]]
643 For Quickbook users the file auto_index_helpers.qbk contains a helper template
644 that assists in inserting these processing instructions, for example:
646 [pre \[AutoIndex IgnoreSection\]]
648 Will cause that section to not be indexed.
650 [endsect] [/section:pis Step 6: Using XML processing instructions to control what gets indexed.]
652 [section:build_docs Step 7: Build the Docs]
654 Using Boost.Build you build the docs with either:
656 bjam release > mylibrary_html.log
658 To build the html docs or:
660 bjam pdf release > mylibrary_pdf.log
664 During the build process you should see AutoIndex emit a message in the log file
667 [pre Indexing 990 terms... ]
669 If you don't see that, or if it's indexing 0 terms then something is wrong!
671 Likewise when index generation is complete, AutoIndex will emit another message:
673 [pre 38 Index entries were created.]
675 Again, if you see that 0 entries were created then something is wrong!
677 Examine the log file, and if the cause is not obvious,
678 make sure that you have [^<auto-index-verbose>on] and that
680 [^!debug regular-expression] directives are in your script file.
682 [endsect] [/section:build_docs Step 7: Build the Docs]
684 [section:refine Step 8: Iterate - to refine your index]
Creating a good index is an iterative process. Often the first step is
687 just to add a header scanning rule to the script file and then generate
688 the documentation and see:
691 * What's been included that shouldn't be.
692 * What's been included under a poor name.
694 Further rules can then be added to the script to handle these cases
695 and the next iteration examined, and so on.
[tip If you don't understand why a particular term is (or is not) present in the index,
try adding a ['!debug regular-expression]
directive to the [link boost_autoindex.script_ref script file].]
702 [heading Restricting which Sections are indexed for a particular term]
You can restrict which sections are indexed for a particular term.
Assuming that the Docbook document has the usual hierarchical names for section IDs
(as Quickbook generates, for example),
you can easily place a constraint on which sections are examined for a particular term.
709 For example, if you want to index occurrences of Lord Kelvin's name,
710 but only in the introduction section, you might then add:
712 Kelvin "" ".*introduction.*"
715 assuming that the section ID of the intro is "some_library_or_chapter_name.introduction".
717 This would avoid an index entry every time 'Kelvin' is found,
718 something the user is unlikely to find helpful.
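
The way such a constraint selects sections can be sketched in Python. The sketch relies on two points stated in the script reference: the constraint has to match the section ID exactly, and an empty constraint leaves all sections eligible. The function name `section_is_searched` and the second section ID are hypothetical, and Python's `re` module stands in for the regex engine AutoIndex uses.

```python
import re

def section_is_searched(section_id, constraint=""):
    """Return True when a term with this constraint is searched for in the section."""
    # An omitted or empty constraint leaves every section eligible.
    if constraint == "":
        return True
    # The constraint must match the whole section ID, not just part of it.
    return re.fullmatch(constraint, section_id) is not None

intro_id = "some_library_or_chapter_name.introduction"
other_id = "some_library_or_chapter_name.reference"  # hypothetical section ID

kelvin_constraint = r".*introduction.*"
in_intro = section_is_searched(intro_id, kelvin_constraint)
in_other = section_is_searched(other_id, kelvin_constraint)
```

With this constraint the term is searched for in the introduction but not in the other section, so occurrences elsewhere produce no index entries.
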
720 [endsect] [/section:refine Step 8: Iterate - to refine your index]
722 [endsect] [/section:tut Getting Started and Tutorial]
725 [section:script_ref Script File (.idx) Reference]
727 The following elements can occur in a script:
729 [h4 Comments and blank lines]
731 Blank lines consisting of only whitespace are ignored, so are lines that [*start with a #].
733 [note You can't append \# comments onto the end of a line\!]
735 [h4 Inclusion of Index terms]
737 term [regular-expression1 [regular-expression2 [category]]]
743 The index term will form a primary entry in the Index
744 with the section title(s) containing the term as secondary entries, and
745 also will be used as a secondary entry beneath each of the section
746 titles that the index term occurs in.]
749 [[regular-expression1][
750 ['Index term Searcher.]
752 An optional regular expression: each occurrence
753 of the regular expression in the text of the document will result
754 in one index term being emitted.
If the regular expression is omitted (default) or is "", then the ['index term] itself
will be used as the search text - and only occurrences of whole words matching
['index term] will be indexed.
764 will index occurrences of "foobar" in any section, but
766 ``foobar \<\w*(foo|bar)\w*\>``
768 will index any whole word containing either "foo" or "bar" within it.
769 This is useful when you want to index a lot of similar or related words under one entry.
773 will only index occurrences of "reflex" as a whole word, but:
775 ``reflex \<reflex\w*\>``
777 will index occurrences of "reflex", "reflexes", "reflexing" and "reflexed" ...
778 all under the same entry reflex.
780 You will very often need to use this to deal with plurals and other variants.]
781 ] [/regular-expression1]
783 [[regular-expression2]
784 [['Section(s) Selector.]
786 A constraint that specifies which sections are
787 indexed for ['term]: only if the ID of the section matches
788 ['regular-expression2] exactly will that section be indexed
789 for occurrences of ['term].
791 For example, to limit indexing to just [*one specific section] (but not sub-sections below):
793 ``myclass "" "mylib\.examples"``
796 For example, to limit indexing to specific sections, [*and sub-sections below]:
798 ``myclass "" "mylib\.examples.*"``
800 will index occurrences of "myclass" as a whole word,
801 but only in sections whose section ID [*begins] "mylib.examples", while
803 ``myclass "\<myclass\w*\>" "mylib\.examples.*"``
805 will also index plurals myclass, myclasses, myclasss ...
809 ``myclass "" "(?!mylib\.introduction).*"``
811 will index occurrences of "myclass" in any section,
812 except those whose section IDs begin "mylib.introduction".
814 Finally, two (or more) sections can be excluded by OR'ing them together:
816 ``myclass "" "(?!mylib\.introduction|mylib\.reference).*"``
818 which excludes searching for this term in sections whose ID's start with either "mylib.introduction" or "mylib.reference".
820 If this third section selection field is omitted (the default)
821 or is "", then [*all sections] are indexed for this term.
823 ] [/regular-expression2]
826 ['Index Category Constraint.]
828 Optionally a category to place occurrences of ['index term] in.
If you have multiple indexes then this is the name
assigned to the index's "type" attribute.
834 myclass "" "" class_name
Will index occurrences of ['myclass] and place them in the class index if there is one.
842 You can have an index term appear more than once in the script file:
* If they have different /category/ names then they are treated quite separately.
* Otherwise they are combined, so that the logical OR of the regular expressions provided is taken.
   myterm search_expression1 constraint_expression2 foo
   myterm search_expression1 constraint_expression2 bar

Will be treated as different terms each with their own entries, while:

   myterm search_expression1 constraint_expression2 mycategory
   myterm search_expression1 constraint_expression2 mycategory

Will be combined into a single term equivalent to:

   myterm (?:search_expression1|search_expression1) (?:constraint_expression2|constraint_expression2) mycategory
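
The combining rule can be tried out with a small sketch. This is an illustration only, written in Python and assuming its `re` module as a stand-in for the regular expression engine used by AutoIndex. The helper names `combine` and `is_indexed`, the section IDs, and the use of `\b` in place of `\<` and `\>` are all assumptions made for the example.

```python
import re

def combine(expressions):
    """Join alternative regular expressions into one non-capturing group."""
    return "(?:" + "|".join(expressions) + ")"

def is_indexed(text, section_id, search, constraint):
    """True if the text contains a match and the section ID matches exactly."""
    return (re.search(search, text) is not None
            and re.fullmatch(constraint, section_id) is not None)

# Two hypothetical script lines for the same term and category,
# merged the way the equivalent single line above would be written.
search = combine([r"\bmyterm\b", r"\bmyterms\b"])
constraint = combine([r"mylib\.tutorial.*", r"mylib\.reference.*"])
```

With these definitions, `is_indexed("see myterms here", "mylib.reference.api", search, constraint)` is true, while the same text in a section whose ID is "mylib.introduction" is not indexed, because the section constraint has to match the whole ID.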
861 [h4 Source File Scanning]
863 !scan source-file-name
865 Scans the C\/C++ source file ['source-file-name] for definitions of
866 ['function]s, ['class]s, ['macro]s or ['typedef]s and makes each of
867 these a term to be indexed. Terms found are assigned to the index category
868 "function_name", "class_name", "macro_name" or "typedef_name" depending
869 on how they were seen in the source file. These may then be included
870 in a specialised index whose "type" attribute has the same category name.
873 When actually indexing a document, the scanner will not index just any old occurrence of the
874 terms found in the source files. Instead it searches for class definitions or function or
875 typedef declarations. This reduces the number of spurious matches placed in the index, but
876 may also miss some legitimate terms:
877 refer to the /define-scanner/ command for information on how to change this.
880 [h4 Directory and Source File Scanning]
882 !scan-path directory-name file-name-regex [recurse]
885 [[directory-name][The directory to scan: this should be a path relative
886 to the script file (or to the path specified with the prefix=path option on the command line)
887 and should use all forward slashes in its file name.]]
889 [[file-name-regex][A regular expression: any file in the directory whose name
890 matches the regular expression will be scanned for terms to index.]]
892 [[recurse][An optional boolean value - either "true" or "false" - that
893 indicates whether to recurse into subdirectories. This defaults to "false".]]
Excludes all the terms in the whitespace-separated ['term-list] from being indexed.
This should be placed /after/ any ['!scan] or ['!scan-path] rules which may
result in the terms becoming included. In other words this removes terms from
the scanner's internal list of things to index.
905 [h4 Rewriting Section Names]
907 [pre !rewrite-id regular-expression new-name]
[[regular-expression][A regular expression: all section IDs that match
911 the expression exactly will have index entries ['new-name] instead of
914 [[new-name][The name that the section will appear under in the index.]]
917 !rewrite-name regular-expression format-text
920 [[regular-expression][A regular expression: all sections whose titles
921 match the regular expression exactly, will have index entries composed
922 of the regular expression match combined with the regex format string
924 [[format-text][The Perl-style format string used to reformat the title.]]
930 !rewrite-name "(?:A|An|The)\s+(.*)" "\1"
933 Will remove any leading "A", "An" or "The" from all index entries - thus preventing lots of
934 entries under "The" etc!
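
The effect of this rule can be sketched in a few lines. This is a Python illustration, not AutoIndex's own code: it assumes that `re.fullmatch` models "matches exactly" and that Python's `\1` replacement behaves like the Perl-style format string for this simple case. The function name `rewrite_name` is hypothetical.

```python
import re

def rewrite_name(title, pattern, format_text):
    """Reformat a section title if, and only if, the whole title matches."""
    match = re.fullmatch(pattern, title)
    if match is None:
        return title  # titles that do not match are left alone
    return match.expand(format_text)

# The same expression and format string as in the rule above.
ARTICLE = r"(?:A|An|The)\s+(.*)"
```

Here `rewrite_name("The Parser Concept", ARTICLE, r"\1")` gives "Parser Concept", while a title such as "Another Section" is returned unchanged because "An" is not followed by whitespace.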
936 [h4 Defining or Changing the File Scanners]
938 !define-scanner type file-search-expression xml-regex-formatter term-formatter id-filter filename-filter
940 When a source file is scanned using the =!scan= or =!scan-path= rules, then the file is searched using
941 a series of regular expressions to look for classes, functions, macros or typedefs that should be indexed.
A set of default regular expressions is provided for this (see below), but sometimes you may want to replace
the defaults, or add new scanners. The arguments to this rule are:
[[type][The ['type] to which items found using this rule will be assigned. Index terms created from the
source file and then found in the XML will have the type attribute set to this value, and may then appear in a
specialized index with the same type attribute.]]
[[file-search-expression][A regular expression that is used to scan the source file for index terms. The result of
a match against this expression will be transformed by the next two arguments.]]
951 [[xml-regex-formatter][A regular expression format string that extracts the salient information from whatever
952 matched the ['file-search-expression] in the source file, and creates ['a new regular expression] that will
953 be used to search the document being indexed for occurrences of this index term.]]
954 [[term-formatter][A regular expression format string that extracts the salient information from whatever
955 matched the ['file-search-expression] in the source file, and creates the index term that will appear in
957 [[id-filter][Optional. A regular expression that restricts the section-id's that are searched in the document being indexed:
958 only sections whose ID attribute matches this expression exactly will be considered for indexing terms found by this scanner.]]
959 [[filename-filter][Optional. A regular expression that restricts which files are scanned by this scanner: only files whose file name
960 matches this expression exactly will be scanned for index terms to use. Note that the filename matched against this may
961 well be an absolute path, and contain either forward or backward slash path separators.]]
964 If, when the first file is scanned, there are no scanners whose ['type] is "class_name", "typedef_name", "macro_name" or
965 "function_name", then the defaults are installed. These are equivalent to:
967 !define-scanner class_name "^[[:space:]]*(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([[:blank:]]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>)?[[:space:]]*(\{|:[^;\{()]*\{)" "(?:class|struct)[^;{]+\<\5\>[^;{]+\{" \5
968 !define-scanner typedef_name "typedef[^;{}#]+?(\w+)\s*;" "typedef[^;]+\<\1\>\s*;" "\1"
969 !define-scanner "macro_name" "^\s*#\s*define\s+(\w+)" "\<\1\>" "\1"
970 !define-scanner "function_name" "\w++(?:\s*+<[^>]++>)?[\s&*]+?(\w+)\s*(?:BOOST_[[:upper:]_]+\s*)?\([^;{}]*\)\s*[;{]" "\\<\\w+\\>(?:\\s+<[^>]*>)*[\\s&*]+\\<\1\\>\\s*\\([^;{]*\\)" "\1"
972 Note that these defaults are not installed if you have provided your own versions with these ['type] names. In this case if
973 you want the default scanners to be in effect as well as your own, you should include the above in your script file.
974 It is also perfectly allowable to have multiple scanners with the same ['type], but with the other fields differing.
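
To see what the two simplest defaults pick up, the ['file-search-expression] fields of the "macro_name" and "typedef_name" scanners above can be applied to a small header. The sketch below does this with Python's `re` module, which is an assumption made for illustration (AutoIndex itself is not implemented this way). The sample header text and the names `SCANNERS` and `scan_source` are hypothetical.

```python
import re

# file-search-expression fields copied from the default scanners above.
SCANNERS = {
    "macro_name": re.compile(r"^\s*#\s*define\s+(\w+)", re.MULTILINE),
    "typedef_name": re.compile(r"typedef[^;{}#]+?(\w+)\s*;"),
}

def scan_source(text):
    """Return {category: [terms]} using the term-formatter "\\1" (first group)."""
    found = {}
    for category, scanner in SCANNERS.items():
        for match in scanner.finditer(text):
            found.setdefault(category, []).append(match.group(1))
    return found

# A made-up two-line header used only for this example.
HEADER = "#define MYLIB_VERSION 2\ntypedef unsigned long size_type;\n"
```

Calling `scan_source(HEADER)` assigns "MYLIB_VERSION" to the "macro_name" category and "size_type" to the "typedef_name" category, mirroring how scanned terms are placed in typed indexes.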
976 Finally you should note that the default scanners are quite strict
977 in what they will find, for example the class
978 scanner will only create index entries for classes that have class definitions of the form:
980 class my_class : public base_classes
984 In the documentation, so that simple mentions of the class name will ['not] get indexed,
985 only the class synopsis if there is one.
986 If this isn't how you want things, then include the ['class_name] scanner definition
987 above in your script file, and change
988 the ['xml-regex-formatter] field to something more permissive, for example:
990 !define-scanner class_name "^[[:space:]]*(template[[:space:]]*<[^;:{]+>[[:space:]]*)?(class|struct)[[:space:]]*(\<\w+\>([[:blank:]]*\([^)]*\))?[[:space:]]*)*(\<\w*\>)[[:space:]]*(<[^;:{]+>)?[[:space:]]*(\{|:[^;\{()]*\{)" "\<\5\>" \5
This will match ['any] occurrence in the documentation of the class names found by the scanner.
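
The difference between the strict and the permissive ['xml-regex-formatter] can be checked on two sample strings. The following is a rough Python sketch, assuming `\b` as a substitute for the `\<` and `\>` word boundaries and with the class name already substituted for `\5`. The function names and sample strings are hypothetical.

```python
import re

def strict_pattern(name):
    """Document regex produced by the default class_name scanner."""
    return r"(?:class|struct)[^;{]+\b" + re.escape(name) + r"\b[^;{]+\{"

def permissive_pattern(name):
    """Document regex produced by the relaxed scanner: any whole-word mention."""
    return r"\b" + re.escape(name) + r"\b"

def matches(pattern, text):
    """True if the pattern is found anywhere in the flattened text."""
    return re.search(pattern, text) is not None

synopsis = "class my_class : public base_classes { };"
mention = "Pass a my_class object to the function."
```

The strict pattern matches only `synopsis`, whereas the permissive pattern matches both strings, so the relaxed scanner would also index plain mentions of the class.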
994 [h4 Debugging scanning]
996 If you see a term in the index, and you don't understand why it's there, add a ['debug] directive:
999 !debug regular-expression
1002 Now, whenever ['regular-expression] matches either the found index term,
1003 or the section title it appears in, or the ['type] field of a scanner, then
1004 some diagnostic information will be printed that will look something like:
1007 Debug term found, in block with ID: spirit.qi.reference.parser_concepts.parser
1008 Current section title is: Notation
1009 The main index entry will be : Notation
1010 The indexed term is: parser
1011 The search regex is: \[P\|p\]arser
1012 The section constraint is: .*qi.reference.parser_concepts.*
1013 The index type for this entry is: qi_index
1016 This can produce a lot of output in your log file,
1017 but until you are satisfied with your file selection and scanning process,
1018 it is worth switching it on.
1020 [endsect] [/section:script_ref Script File Reference]
1022 [section:workflow Understanding The AutoIndex Workflow]
1024 # Load the script file (usually index.idx)
1025 and process it one line at a time,
producing one or more index terms per (non-comment) line.
1028 # Reading all lines builds a list of ['terms to index].
1029 Some of those may be terms defined (by you) directly in the script file,
1030 others may be terms found by scanning C++ header and source files
1031 that were specified by the ['!scan-path] directive.
# Once the list of ['terms to index] is complete,
AutoIndex loads the Docbook XML file.
1035 (If this comes from Quickbook\/Doxygen\/Boostbook\/Docbook then this is
1036 the complete documentation after conversion to Docbook format).
1038 # AutoIndex builds an internal __DocObjMod of the Docbook XML.
1039 This internal representation then gets scanned for occurrences of the ['terms to index].
1040 This scanning works at the XML paragraph level
1041 (or equivalent sibling such as a table or code block)
1042 - so all the XML encoding within a paragraph gets flattened to plain text.[br]
1043 This flattening means the regular expressions used to search for ['terms to index]
1044 can find anything that is completely contained within a paragraph
1045 (or code block etc).
# For each term found, an ['indexterm] Docbook element is inserted
into the __DocObjMod (provided internal index generation is off).
# AutoIndex's internal index representation is also updated.
1052 # Once the whole XML document has been indexed,
1053 then, if AutoIndex has been instructed to generate the index itself,
1054 it creates the necessary XML and inserts this into the __DocObjMod.
1056 # Finally the whole __DocObjMod is written out as a new Docbook XML file,
1057 and normal processing of this continues via the XSL stylesheets (with xsltproc)
1058 to actually build the final human-readable docs.
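
The paragraph-level scanning described in the steps above can be sketched as follows. This is a simplified Python illustration using the standard `xml.etree.ElementTree` module, not a description of how AutoIndex is implemented. It only records which sections contain each term and leaves out inserting the ['indexterm] elements. The sample document, the term list and the name `index_document` are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# A tiny made-up Docbook fragment with markup nested inside a paragraph.
DOC = """<section id="mylib.examples">
  <title>Examples</title>
  <para>Use <code>myclass</code> to hold state.</para>
  <para>Nothing relevant here.</para>
</section>"""

def index_document(xml_text, terms):
    """Return {term: [section IDs]} for paragraphs whose flattened text matches."""
    root = ET.fromstring(xml_text)
    index = {}
    for section in root.iter("section"):
        for para in section.findall("para"):
            # Flatten nested markup so the regex sees plain text.
            text = "".join(para.itertext())
            for term, pattern in terms.items():
                if re.search(pattern, text):
                    index.setdefault(term, []).append(section.get("id"))
    return index
```

For example, `index_document(DOC, {"myclass": r"\bmyclass\b"})` reports the term once, under the section ID "mylib.examples", even though the word sits inside a `<code>` element in the source XML.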
1060 [endsect] [/section:workflow AutoIndex Workflow]
1063 [section:xml XML Handling]
1065 AutoIndex is rather simplistic in its handling of XML:
1067 * When indexing a document, all block content at the paragraph level gets collapsed into a single
1068 string for matching against the regular expressions representing each index term. In other words,
1069 for the most part, you can assume that you're indexing plain text when writing regular expressions.
1070 * Named XML entities for &, ", ', < or > are converted to their corresponding characters before indexing
1071 a section of text. However, decimal or hex escape sequences are not currently converted.
* Index terms are assumed to be plain text (whether they originate from the script file
or from scanning source files) and the characters &, ", < and > will be escaped to
&amp;, &quot;, &lt; and &gt; respectively.
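
The entity handling in the last two points can be illustrated with Python's standard `xml.sax.saxutils` helpers. This is a sketch of the described behaviour under that assumption, not AutoIndex's own code, and the function names `flatten_entities` and `escape_term` are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

def flatten_entities(text):
    """Convert the five named XML entities to characters before matching.

    Numeric escapes such as &#65; are deliberately left untouched.
    """
    return unescape(text, {"&quot;": '"', "&apos;": "'"})

def escape_term(term):
    """Escape &, ", < and > in a plain-text index term for output as XML."""
    return escape(term, {'"': "&quot;"})
```

So `flatten_entities("a &lt; b &#65;")` gives `a < b &#65;`, and `escape_term('operator<<("x")')` gives `operator&lt;&lt;(&quot;x&quot;)`.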
1076 [endsect] [/section:xml XML Handling]
1078 [section:qbk Quickbook Support]
The file auto_index_helpers.qbk in ['boost-path]/tools/auto_index/include contains various Quickbook
templates to assist with AutoIndex support. You would normally add the above path to your include
search path via an `<include>path` statement in your Jamfile, and then make the templates available
to your Quickbook source via a:
1085 [pre \[include auto_index_helpers.qbk\]]
1087 statement at the start of your Quickbook file.
1089 The available templates are then:
1092 [[Template][Description]]
1093 [[`[index]`][Creates a main index, with no "type" category set, which will be titled simply "Index".]]
[[`[named_index type title]`][Creates an index with the type attribute set to "type" and the title set to "title".[br]
For example, to create an index containing only class names, you would typically add `[named_index class_name Class Index]`
to your Quickbook source.]]
[[`[AutoIndex Arg]`][Creates a Docbook processing instruction that will be handled by AutoIndex. Valid values for "Arg"
are either "IgnoreSection" or "IgnoreBlock".]]
1099 [[`[indexterm1 primary-key]`][Creates a manual index entry that will link to the current section, and have a single primary key "primary-key".
1100 Note that this index key will not have a "type" attribute set, and so will only appear in the main index.]]
[[`[indexterm2 primary-key secondary-key]`][Creates a manual index entry that will link to the current section, and has
"primary-key" and "secondary-key" as the primary and secondary keys respectively.
Note that this index key will not have a "type" attribute set, and so will only appear in the main index.]]
[[`[indexterm3 primary-key secondary-key tertiary-key]`][Creates a manual index entry that will link to the current section,
and has primary, secondary and tertiary keys: "primary-key", "secondary-key" and "tertiary-key".
Note that this index key will not have a "type" attribute set, and so will only appear in the main index.]]
1108 [[`[typed_indexterm1 type primary-key]`][Creates a manual index entry that will link to the current section, and have a single primary key "primary-key".
1109 Note that this index key will have the "type" attribute set to the "type" argument, and so may appear in named sub-indexes
1110 that also have their type attribute set.]]
[[`[typed_indexterm2 type primary-key secondary-key]`][Creates a manual index entry that will link to the current section, and has
"primary-key" and "secondary-key" as the primary and secondary keys respectively.
Note that this index key will have the "type" attribute set to the "type" argument, and so may appear in named sub-indexes
that also have their type attribute set.]]
[[`[typed_indexterm3 type primary-key secondary-key tertiary-key]`][Creates a manual index entry that will link to the current section,
and has primary, secondary and tertiary keys: "primary-key", "secondary-key" and "tertiary-key".
Note that this index key will have the "type" attribute set to the "type" argument, and so may appear in named sub-indexes
that also have their type attribute set.]]
1123 [section:comm_ref Command Line Reference]
1125 The following command line options are supported by AutoIndex:
1128 [[--in=infilename][Specifies the name of the XML input file to be indexed.]]
1129 [[--out=outfilename][Specifies the name of the new XML file to create.]]
1130 [[--scan=source-filename][Specifies that ['source-filename] should be scanned
1131 for terms to index.]]
1132 [[--script=script-filename][Specifies the name of the script file to process.]]
1133 [[--no-duplicates][If a term occurs more than once in the same section, then
1134 include only one index entry.]]
1135 [[--internal-index][Specifies that AutoIndex should generate the actual
1136 indexes rather than inserting `<indexterm>`s and leaving index generation
1137 to the XSL stylesheets.]]
1138 [[--no-section-names][Prevents AutoIndex from using section names as index entries.]]
1139 [[--prefix=pathname][Specifies a directory to apply as a prefix to all relative file paths in the script file.]]
1140 [[--index-type=element-name][Specifies the name of the XML element to enclose internally generated indexes in:
1141 defaults to ['section], but could equally be ['appendix] or ['chapter]
1142 or some other block level element that has a formal title.]]
1145 [endsect] [/section:comm_ref Command Line Reference]
1147 [include ../include/auto_index_helpers.qbk]