Extractor: a source code extractor
1. Overview
Extractor is a tool for extracting information from C++ or Java source files.
If you write technical documents such as program documentation, how-tos (like this one) or other mostly software-oriented documents, it is a very common requirement to show some source code. To limit the chance of documentation errors, these source-code examples should be real-world examples. This means that they should be extracted from real example code or production code; they should not be cut and pasted into the document or written originally in the document. Doing so leads to untested and unchecked examples with a high risk of being wrong (or outdated).
AsciiDoctor is capable of including source code files into the document.
But real-world examples are mostly too long to be included in full. The document is much more readable if you can focus on the important parts of the source code and leave the irrelevant parts out. This is what extractor is for:
-
extracting parts (snippets) from real world source code.
-
omitting lines of the source code
-
generating auto-callouts from comments of the source-code
-
highlighting parts of the source-code
2. Prerequisites
2.2. Libraries
The following system libraries are required:
-
from Boost: Boost-Regex (header-only library)
3. Installation
3.1. Download the source
Download the source from SourceForge.
4. Usage
The following example will show the basic idea behind extractor.
4.1. Simple Example
Suppose you want to document the following very simple C++ example:
test.cc
#include <iostream>
#include <iomanip>
int main() {
return EXIT_SUCCESS;
}
It could be that you want to describe the part of the source file containing the include directives
in a special way. It would therefore be nice to extract that part of the file like this:
test.cc [Snippet: include]
#include <iostream>
#include <iomanip>
In a later section of your documentation you want to emphasize the int main() function:
test.cc [Snippet: main]
int main() {
return EXIT_SUCCESS;
}
For this to work without manually copying anything from your sources, you have to mark these parts directly in the sources. The marked parts are called snippets.
//[<name> (1)
...
//] (2)
| 1 | Beginning of the snippet name |
| 2 | End of the snippet name |
| Source snippets resemble the AsciiDoctor feature of include tags, but they have to be strictly nested. They must not overlap! |
With this you can annotate your real source code with the necessary snippet definitions:
//[include
#include <iostream>
#include <iomanip>
//]
//[main
int main()
{
return EXIT_SUCCESS;
}
//]
Then you run the extractor for your file test.cc:
$ extractor test.cc
The outcome from this is the file test.extractor with the contents:
Snippet [ all [ ( 0 , 6 ) ] exclude [ ] ]
Snippet [ include [ ( 0 , 2 ) ] exclude [ ] ]
Snippet [ main [ ( 2 , 6 ) ] exclude [ ] ]
At the moment this file isn’t very useful (although it becomes very handy once you automate building
the whole documentation). But if you look carefully into the directory of the file test.cc,
you’ll find a newly created directory named .extractor:
total 12
-rw-r--r-- 1 lmeier lmeier 197 Apr 22 12:58 test.cc.all
-rw-r--r-- 1 lmeier lmeier 175 Apr 22 12:58 test.cc.include
-rw-r--r-- 1 lmeier lmeier 173 Apr 22 12:58 test.cc.main
These files contain the snippets, e.g. the file test.cc.main obviously contains the snippet main of
file test.cc. The file test.cc.all contains the full file test.cc but without the snippet definitions.
In your AsciiDoctor documentation files you can include these snippet files.
Definitions of some attributes like srcbase, srcdir and extractordir are especially useful:
include::{srcbase}/{srcdir}/{extractordir}/test.cc.main[]
The content of the snippet file is AsciiDoc:
.Zeilen aus der Datei link:{srcbase}/{srcdir}/test.cc.html[`test.cc`,window="_new"] [Snippet: main]
[source,cpp,indent=0]
----
int main() {
return EXIT_SUCCESS;
}
----
|
Please note: if the snippet contains auto-callouts, these are also collected into the snippet file (see also Auto-Callouts). |
If you use the above include-macro in your documentation you’ll get the following result:
test.cc [Snippet: main]int main() {
return EXIT_SUCCESS;
}
Ok, that’s the simple story.
4.2. Source Annotations
There are several source annotations which extractor understands: snippets (simple or compound),
omitted lines, auto callouts and highlighting (marking).
4.2.1. Source snippets
Source snippets are divided into simple and compound snippets.
4.2.1.1. Simple snippets
As stated above simple snippets are defined by the special comments in the source files. Snippets must not be overlapping, but they can (and usually should) be nested:
//[<snippet1> (1)
...
//[<snippet2> (2)
...
//] (3)
...
//] (4)
| 1 | Begin of snippet1 |
| 2 | Begin of snippet2 |
| 3 | End of snippet2 |
| 4 | End of snippet1 |
The following source code gives an example of two nested snippets with names pragma and Abc:
nested.cc
#include <cstdlib>
//[pragma
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused"
//[Abc
class Abc {
public:
Abc() {}
private:
int mX = 0;
};
//]
#pragma GCC diagnostic pop
//]
int main() {
return EXIT_SUCCESS;
}
To include them, use the following line in your adoc file for the outer snippet named pragma:
include::{srcbase}/{srcdir}/{extractordir}/nested.cc.pragma[]
Use the next line for the inner snippet named Abc:
include::{srcbase}/{srcdir}/{extractordir}/nested.cc.Abc[]
For the outer snippet pragma you get:
nested.cc [Snippet: pragma]
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused"
class Abc {
public:
Abc() {}
private:
int mX = 0;
};
#pragma GCC diagnostic pop
Please note that the definition lines of the inner snippet Abc are excluded.
|
And for the inner snippet Abc you get:
nested.cc [Snippet: Abc]
class Abc {
public:
Abc() {}
private:
int mX = 0;
};
4.2.1.2. Continuation of simple snippets
Simple snippets can be continued as in the following example:
simple.cc with simple snippet
#include <iostream>
int main()
{
int x = 0;
//[out
std::cout << __PRETTY_FUNCTION__ << std::endl;
//]
x = 42;
//[out
std::cout << __cplusplus << std::endl;
//]
return x;
}
This will produce a continued snippet:
simple.cc [Snippet: out]
std::cout << __PRETTY_FUNCTION__ << std::endl;
// ... lines omitted ...
std::cout << __cplusplus << std::endl;
|
Please note that the text between the parts of the continued snippet (here: // ... lines omitted ...) cannot be customized at the moment.
|
| Please see also Missing Features. |
4.2.1.3. Compound snippets
Sometimes it is useful to exclude one or more nested snippets from an outer snippet. This can be done by subtracting one or more of the nested snippets from the outer one. You can use the following compound snippet definition syntax:
//[outer -inner (1)
//[inner (2)
//] (3)
//] (4)
| 1 | Definition of the snippet outer without the nested snippet inner |
| 2 | Definition of the nested snippet inner |
| 3 | End of snippet inner |
| 4 | End of snippet outer |
As said above you can freely nest the snippets as in the following example:
test.cc with simple and compound source snippets
#include <iostream>
//[mainx -ret -out
//[mainout -ret
//[mainret -out
//[main
int main()
{
//[out
std::cout << __PRETTY_FUNCTION__ << std::endl;
std::cout << __cplusplus << std::endl;
//]
//[ret
return 0;
//]
}
//]
//]
//]
//]
|
Snippet called all
There is one implicitly defined snippet called all. It includes the whole source file but (normally) with
the snippet definitions themselves removed.
|
If you include the snippet all you get the whole file test.cc with the snippet definition lines removed:
test.cc
#include <iostream>
int main() {
std::cout << __PRETTY_FUNCTION__ << std::endl;
std::cout << __cplusplus << std::endl;
return 0;
}
The snippet main displays as follows:
test.cc [Snippet: main]
int main() {
std::cout << __PRETTY_FUNCTION__ << std::endl;
std::cout << __cplusplus << std::endl;
return 0;
}
The snippet mainret excludes the inner snippet out from the outer snippet main. So you get
the following result:
test.cc [Snippet: mainret]
int main() {
// ...
return 0;
}
The excluded snippet itself is displayed as // ... by default.
But you can customize that (see Using exclude Texts).
|
The same is true for snippet mainout subtracting ret:
test.cc [Snippet: mainout]
int main() {
std::cout << __PRETTY_FUNCTION__ << std::endl;
std::cout << __cplusplus << std::endl;
// ...
}
For maximum flexibility you can remove more than one inner snippet from an outer snippet. This
is shown with snippet mainx:
test.cc [Snippet: mainx]
int main() {
// ...
}
4.2.1.4. Using exclude Texts
By default, a text that looks like a C++ comment is shown where a snippet is excluded from an outer snippet. If you want to show some special text instead, use an exclude-text definition.
//[outer -inner (1)
//[inner : The alternative exclude text (2)
//]
//]
| 1 | Definition of the snippet outer without the nested snippet inner |
| 2 | The nested snippet inner with the alternative exclude text The alternative exclude text |
Below is an example defining exclude-texts:
test2.cc with exclude texts
#include <iostream>
//[mainout -ret
//[mainret -out
//[main
int main()
{
//[out : The output statements are not shown
std::cout << __PRETTY_FUNCTION__ << std::endl;
std::cout << __cplusplus << std::endl;
//]
//[ret : The return-statement omitted
return 0;
//]
}
//]
//]
//]
With these exclude texts the snippets are displayed as follows:
test2.cc [Snippet: mainout]
int main() {
std::cout << __PRETTY_FUNCTION__ << std::endl;
std::cout << __cplusplus << std::endl;
// The return-statement omitted
}
test2.cc [Snippet: mainret]
int main() {
// The output statements are not shown
return 0;
}
4.2.2. Omitted lines
Sometimes individual lines distract the reader’s attention, and these lines should be excluded from the source shown. For this purpose extractor can omit marked lines.
Individual lines can be omitted with a special marker //-:
omit.cc with omitted lines
#include <iostream>
//[main
int main()
{
int x = 0;
std::cout << __PRETTY_FUNCTION__ << std::endl; //-
x = 42;
std::cout << __cplusplus << std::endl; //-
return x;
}
//]
Below you see the snippet main with some lines omitted:
omit.cc [Snippet: main] (einige Zeilen nicht dargestellt)
int main() {
int x = 0;
x = 42;
return x;
}
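Filtering marked lines is simple enough to sketch in a few lines. The function name `dropOmitted` is hypothetical, and the assumption that the marker has to be the last thing on the line (apart from trailing blanks) is mine, not something the tool documents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the lines that do not end with the omit marker (default "//-").
std::vector<std::string> dropOmitted(const std::vector<std::string>& lines,
                                     const std::string& marker = "//-") {
    assert(!marker.empty());
    std::vector<std::string> kept;
    for (const std::string& line : lines) {
        // Index of the last character that is not a blank or tab.
        const std::size_t last = line.find_last_not_of(" \t");
        const bool marked =
            last != std::string::npos && last + 1 >= marker.size() &&
            line.compare(last + 1 - marker.size(), marker.size(), marker) == 0;
        if (!marked) kept.push_back(line);
    }
    return kept;
}
```

For the body of `main` in omit.cc this keeps the assignments and the return statement and drops both output statements.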
4.2.3. Auto-Callouts
A really nice feature are the auto-callouts.
Callouts in general are a really good way to annotate portions of some code. But there are problems using the normal way to define callouts:
-
If you use the include macro, you have to split the callout across two places: the source file and the documentation file.
-
The callout-text will only be visible in the documentation not in the source file.
-
Using snippets together with callouts, one gets problems with the numbering of the callouts.
These problems are avoided using the so-called auto-callouts:
-
These are fully specified in the source file, so the reader of the original source file still gets the callout text.
-
There is only one place to define the callout: the source file.
-
All auto-callouts are automatically numbered according to the snippets included.
The following example contains three auto-callouts. When writing one, you simply omit the callout number:
int x = 0; // <> Initialization of _variable_ `x` with value `0`
callout.cc with auto-callouts
#include <iostream>
//[main
int main()
{
//[a
int x = 0; // <> Initialization of _variable_ `x` with value `0`
//]
std::cout << __PRETTY_FUNCTION__ << std::endl;
//[b
x = 42; // <> Copy-assignment
//]
std::cout << __cplusplus << std::endl;
//[c
return x; // <> returning the value to the caller
//]
}
//]
With auto-callouts the following snippet file is generated. As you can see, the callouts are numbered and the text of the source comments is transferred to the callout definitions.
.Zeilen aus der Datei link:{srcbase}/{srcdir}/callout.cc.html[`callout.cc`,window="_new"] [Snippet: main]
[source,cpp,indent=0]
----
int main() {
int x = 0; (1)
std::cout << __PRETTY_FUNCTION__ << std::endl;
x = 42; (2)
std::cout << __cplusplus << std::endl;
return x; (3)
}
----
<1> Initialization of _variable_ `x` with value `0`
<2> Copy-assignment
<3> returning the value to the caller
| Using this feature you can leave the text for the callouts bundled with the source itself. Therefore the code author sees the very same callout-text in the comment as the documentation reader. |
callout.cc [Snippet: main]
int main() {
int x = 0; (1)
std::cout << __PRETTY_FUNCTION__ << std::endl;
x = 42; (2)
std::cout << __cplusplus << std::endl;
return x; (3)
}
| 1 | Initialization of variable x with value 0 |
| 2 | Copy-assignment |
| 3 | returning the value to the caller |
If you include different snippets the callout numbering will be adapted:
callout.cc [Snippet: a]
int x = 0; (1)
| 1 | Initialization of variable x with value 0 |
callout.cc [Snippet: b]
x = 42; (1)
| 1 | Copy-assignment |
callout.cc [Snippet: c]
return x; (1)
| 1 | returning the value to the caller |
| Please see also Missing Features. |
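The per-snippet numbering can be pictured with the following sketch. It is not taken from extractor: `Annotated` and `numberCallouts` are hypothetical names, and writing the numbered marker back as `// <n>` in the code line is an assumption about the generated AsciiDoc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Code lines with numbered callout markers plus the matching callout texts.
struct Annotated {
    std::vector<std::string> code;
    std::vector<std::string> callouts;
};

// Numbers every "// <> text" comment in order of appearance, starting at 1.
Annotated numberCallouts(const std::vector<std::string>& lines) {
    static const std::string marker = "// <>";
    Annotated result;
    std::size_t number = 0;
    for (const std::string& line : lines) {
        const std::size_t pos = line.find(marker);
        if (pos == std::string::npos) {  // ordinary line, copy unchanged
            result.code.push_back(line);
            continue;
        }
        ++number;
        const std::string tag = "<" + std::to_string(number) + ">";
        // The callout text is everything after the marker, without leading blanks.
        std::string text = line.substr(pos + marker.size());
        const std::size_t first = text.find_first_not_of(" \t");
        text = (first == std::string::npos) ? "" : text.substr(first);
        result.code.push_back(line.substr(0, pos) + "// " + tag);
        result.callouts.push_back(tag + " " + text);
    }
    assert(result.callouts.size() == number);
    return result;
}
```

Because counting starts at 1 for whatever lines are passed in, running the function on the lines of snippet b alone gives its single callout the number 1, while running it on the whole of main numbers the same callout 2.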
4.5. Command Line Options
extractor has the following command-line options:
extractor201406
CommandLineOption[ h , help ]
CommandLineOption[ v , verbose ]
CommandLineOption[ se , skipEmptyLines , skipemptylines ]
CommandLineOption[ sb , skipBlockComments , skipblockcomments ]
CommandLineOption[ ss , skipSnippetDefs , skipsnippetdefs ]
CommandLineOption[ sc , skipCallouts , skipcallouts ]
CommandLineOption[ sm , skipMultiSnippetDeliminter , skipdelimiter ]
CommandLineOption[ sx , skipExcludeMarker , skipexclude ]
CommandLineOption[ sh , skipHighlighting , skiphighlight ]
CommandLineOption[ ee , enableEmptyLines , enableemptylines ]
CommandLineOption[ eb , enableBlockComments , enableblockcomments ]
CommandLineOption[ es , enableSnippetDefs , enablesnippetdefs ]
CommandLineOption[ ec , enableCallouts , enablecallouts ]
CommandLineOption[ eh , enableHighlighting , enablehighlight ]
CommandLineOption[ io , includeOmitted , includeomitted ]
CommandLineOption[ in , indent , indentlevel ]
CommandLineOption[ l , lang , language ]
CommandLineOption[ a , astyle , astyleoptions ]
CommandLineOption[ o , output ]
CommandLineOption[ d , subdir ]
CommandLineOption[ x , nosnippets , filteronly ]
CommandLineOption[ n , linenums , linenumbers ]
| Option | Description |
| -h | help message |
| -v | be more verbose |
| --se | (default): don’t print empty lines in snippet output to be more condensed. |
| --ee | opposite of above |
| --sb | (default): don’t include block-comments (like |
| --eb | opposite of above |
| --ss | (default): don’t include the snippets definition lines themselves into the snippet files. |
| --es | opposite of above |
| --sc | (default): don’t include callouts (see Auto-Callouts) into the snippet files |
| --ec | opposite of above |
| --io | include omitted lines (see Omitted lines) |
| -l | set the language (see AsciiDoctor) |
| -a | set additional options for AStyle |
| -o | set the output filename for the snippet database file (default: |
| -d | set the directory for generated snippet files (default: |
| -x | don’t generate snippet files, just pass the source through |
5. Other useful tools
5.1. Automating the documentation process
The snippets are best generated when building the source itself: if the source changes, the snippets must change too. Automating this whole process is best.
5.1.1. Using make to generate the snippets
make is a content-agnostic build tool, so it can be used to automate the documentation generation process.
If the project is already built with GNU make, all the needed information can be generated very simply with some additional rules.
make (Makefile) to generate the source snippets and the snippet-database file
EXTRACTOR = extractor (1)
EXTRACTDIR = .extractor (2)
%.cc.extract: %.cc (3)
	$(EXTRACTOR) -lcpp -aA2 -o$@ -d$(EXTRACTDIR) $< (4)
%.h.extract: %.h (5)
	$(EXTRACTOR) -lcpp -aA2 -o$@ -d$(EXTRACTDIR) $<
%.java.extract: %.java (6)
	$(EXTRACTOR) -ljava -aA2 -o$@ -d$(EXTRACTDIR) $<
| 1 | The EXTRACTOR-variable contains the path to the extractor-executable |
| 2 | Use this directory to put the snippets in |
| 3 | Rule to generate the snippets-database file (e.g. test.cc.extract) from a source file (e.g. test.cc) |
| 4 | Use cpp as the language setting and A2 styling for astyle |
| 5 | Same rule for header files |
| 6 | Same rule for java-files |
With the following rules you can generate the needed html-files for the links in the snippet-files:
make (Makefile) to generate highlighted versions of the source files using GNU source-highlight
SRCHI = source-highlight (1)
%.cc.html: %.cc
	$(EXTRACTOR) -x --eb --io $< | $(SRCHI) -scpp > $@ (2)
%.h.html: %.h
	$(EXTRACTOR) -x --eb --io $< | $(SRCHI) -scpp > $@
%.java.html: %.java
	$(EXTRACTOR) -x --eb --io $< | $(SRCHI) -sjava > $@
| 1 | source-highlight is used to produce the html versions of the source files |
| 2 | extractor with special options is used to eliminate the snippet definitions but include otherwise omitted lines |
6. Missing Features
As you work with extractor you’ll surely notice that there are (many) missing features. The main reason is that these features aren’t relevant for me at the moment. If you need one of the following, it should be simple to add. Please let me know if you are working on a patch.
-
setting the text between continued snippets (// ... lines omitted ...).
-
customization of the texts shown in captions if snippets are used.
-
customization of the texts shown in captions if omitted lines are used.