
Status and Text Reference Coding

Implementation Notes

Note that this document is intended to be used along with the Programmer's Guide and Maintainer's Notes.

Directory structure

The main differences between the implementations for the different architectures will be in the header files. Since the actual header file name will be the same for all architectures, the different versions will need to be in different directories. Those directories are:

large (≥32 bit) integer reference: intlr
32 bit, 2's complement: int32c
Others: to be implemented as needed. Note that 32 bit, 2's complement is by far the most common architecture in use today; the others are mostly of historical significance. It is very unlikely that the other variants will ever be implemented, much less tested.

Style and large scale structure

The right margin is after column 80. It should be treated as a hard margin: nothing should extend beyond it unless the entire line is truly indivisible. (Lines in <PRE>...</PRE> blocks with many special characters are the main example of this exception.)

Tab stops are effectively every 4th column. (The 8 column tab stop is left over from punch card and Teletype® days, and the result looks ugly. While 8 column tabs do tend to limit complexity, they do so at the expense of readability. Readability is more useful than an artificial device to limit complexity: the complexity limit is a matter of discipline on the part of the coder, while readability affects everyone who looks at the code.)

Large blocks of comments are to be literally enclosed in an 'ASCII art' box with 'rounded' corners. The enclosed space is 76 columns wide. The horizontal elements of the box should indicate the mid-line and tab stops. If a horizontal section separator is used, it should be divided the same way.

The initial box must contain a title line, a copyright notice, the license and liability limitation text and other text as required. Per SourceForge requirements, the license must be an Open Source license. In this case, the LGPL version 2.1 license is to be used.

End of line comments should be right justified.

Header files

The idempotence sequence follows this model:

«Initial block comment»
`\--- ---.--- ---.--- ---.--- ---.--- ---:--- ---.--- ---.--- ---.--- ---.--- -*/`
`#if !defined(_`«upper cased file name»`)`
`#define _`«upper cased file name» «yyyymm»
«single instance header contents»
`#else`	`/* if !defined(_`«upper cased file name»`) */`
`#if _`«upper cased file name» `!=` «yyyymm»
`#error` «file name from #include» `version mismatch.`
`#endif`	`/* if _`«upper cased file name»` != yyyymm */`
`#endif`	`/* else !defined(_`«upper cased file name»`) */`

Where

«Initial block comment»: includes the file title, copyright notice, disclaimers and other text as required.
«upper cased file name»: is the name of this header file converted to upper case and with dots (.) replaced with underscores (_).
«file name from #include»: is the name of this header file as it would appear in a preprocessor #include line.
«yyyymm»: is the four digit year and two digit month of the last release of this header. Remember to replace this token in both places it occurs.
«single instance header contents»: is the active content of the header file.

This is in the boiler plate file [TBS] but requires a post processing step to activate. The idempotence sequence uses a more or less standard reserved name to prevent reentry. Unless there is another header with the same name, there should not be any conflict in the use of the name.

Header <limits.h> is included in the reference version to define CHAR_BIT. CHAR_BIT is used with sizeof in expressions where the number of bits in an instance of a particular type is needed.

Header <stdint.h> is included to define the types int_least16_t, int_least32_t and uint_least32_t.

Type

No surprises here – it's a uint_least32_t.

Constants

The constants have a prefix from the reserved name space (that is, _RSE_). This may have to be changed if there is a conflict.

The values are defined as macros. (It may be possible to define them in enums.)

The one bit fields have been treated differently from the broader fields in the reference implementation.

The order of the definitions is alphabetical within groups.

Macros

The macros provided for users start with an upper case prefix (that is, RSE_). This may have to be changed if there is a conflict. See the Programmer's Guide for a list of these and their descriptions.

The value construction function like macros use a lexical transformation to convert their first and sometimes their second arguments from keywords to values. Because this transformation is needed, they cannot be implemented as functions. Their use as static initializers also requires that they produce compile time constant expressions, a second requirement that only function like macros can meet. See the discussion in the Programmer's Guide for more information.

Three or four letter abbreviations have been defined for the longer keywords. One letter abbreviations, both upper and lower case, have been defined for all keywords.

Functions and function like macros

There are fifteen function declarations with corresponding function like macro definitions. Plauger's book The Standard C Library explains how to bypass the macro definitions to get to the function implementations without #undefing the macros: simply enclose the name of the function in parentheses. (This breaks the preprocessor's search for the token sequence '«macro name» (...', while the compiler proper converts the parenthesized name to a function address and generates the expected function call.) That technique is used to declare the function prototypes. See the Programmer's Guide for a list of these and their descriptions.

Multi-part parametrized boiler plate

Boiler plate is a name for a piece of text that is so frequently used that its form has been standardized and is to be included verbatim into a document, particularly into a legal document. The idea can be and has been adapted to programming. Header files are a kind of boiler plate. This project's documentation uses parametrized and multi-part boiler plate.

An example of parametrized boiler plate is the 'copyright.htm' fragment. The text it contains is entirely predefined with a few exceptions like the name and address of the copyright holder and the copyright date.

This text is inserted into the documents under the control of a master file by the 'C' preprocessor. By defining a few 'tokens', the preprocessor can be instructed to insert the correct variable information.

Merging different kinds of sources

When anybody writes a function for a program, several other pieces of code and documentation need to be written, and any change to any of these pieces may require changes in one or more of the other pieces. If these pieces are scattered in many files, it can be difficult to ensure that all the necessary changes have been made. With the problem stated in this fashion, it becomes apparent that putting all the different pieces in a single file might be useful.

The following are some of the different pieces that it might be useful to keep together:

the implementation of a function or procedure,
the prototype of the function or procedure,
a description of the interface for the internals manual or user's guide,
a code fragment that demonstrates how to use the function or procedure,
one or more code fragments that test the implementation, and
test data.

Putting various different kinds of material in the same file is not exactly a new idea. A database is an example of this kind of merge. Merging code and documentation is not really a new idea either; Knuth tried this with the 'literate programming' movement he started. Unfortunately that movement has not grown or been accepted by a large number of people. An important question is why not.

At least part of the answer is that literate programming requires a large commitment to using it 'up front'. Special 'tools' are required. It is not exactly easy to convert an existing body of code to literate programming form. Also, the emphasis of the technique is on the documentation rather than on the code. While this emphasis is not necessarily wrong, it does not match the exigencies of getting a project 'out the door'.

An alternate approach that allows pieces to be added to an existing project might be more acceptable. It would also help if simple or already existing tools could be used. One necessary and obvious such tool is the 'C' language preprocessor.

What would need to be done to merge the various pieces into a single file?

At least the following:

A discipline for designating the various pieces.
Preprocessor lines to select the particular piece needed for a specific purpose.
Simple post editing techniques to convert text that the preprocessor can not be made to produce.

The selection mechanism, for fairly obvious reasons, has to be based on the definition of preprocessor macros. For this project the macro WHICH_PART will be used.

There are generally three things that the preprocessor cannot be made to produce. The first is preprocessing directives for the produced file; it will process them rather than insert them into the target file. Similarly, it will remove comments. And finally, it will sometimes reduce a run of white space to a single space.

To avoid these issues, an extra character needs to be added at strategic points. It should be a fairly unusual character, preferably one that will not appear in normal code. The printable ASCII characters outside C's basic source character set are the obvious candidates: '@', '$' and '`'. I have chosen '@' for this project. ('$' and '`' have special meanings in shell scripts and are slightly harder to work with as a result.) The post processing transformations needed to solve the problems are:

'@@' => '  ' (two spaces) to take care of the white space problem,
'@' => '' (nothing) to take care of the comment and preprocessor problems, and
'&#064;' => '@' to allow '@' characters to be inserted if needed.

The following code fragment can be added to a Makefile so the transformations are applied automatically when .html files are built from .c sources:

CPP             = $(CC) -E
CPPH            = $(CPP) -P

%.html : %.c
	$(CPPH) -o $@ $< && sed -i 's/@@/  /g;s/@//g;s/&#064;/@/g' $@

.SUFFIXES : .html

This will have to be modified in fairly obvious ways for other kinds of files.

A typical (pseudo code) example would be:

/*  «Comments identifying the file, copyright, license and similar notices»   */
     «The following piece names should each have a different integer value»
#define HEADER_PIECE 1
«...»
#if !defined(WHICH_PART)
     «The default piece, usually the procedure or function implementation»
#else                                              /* if !defined(WHICH_PART) */
#if WHICH_PART == HEADER_PIECE
      «Function prototypes and other things that belong in a header file»
#elif WHICH_PART == «The name of another piece»     /* if WHICH_PART == ... */
                          «The source for another piece»
«Repeat the previous two lines/parts as needed»
#endif                                              /* elif WHICH_PART == ... */
#endif                                           /* else !defined(WHICH_PART) */
#undef  HEADER_PIECE
«...»
«Other cleanup as needed»

The following shows how to modify the input to achieve the appropriate output:

@#define EOF@@ @@-1 @@ @@ @@ @@ @@ @@ @@ @@ @@ @@ /@* set the EOF macro's value */

would insert the following in the resulting header file:

#define EOF     -1                               /* set the EOF macro's value */

The source for the header file would look something like this:

/@*-- ---.--- ---.--- ---.--- ---.--- ---:--- ---.--- ---.--- ---.--- ---.--- --\@
| «Header file name» -- «brief description»                                    |
|                                                                              |
| Copyright (c) «year» by «author», «city and state»                           |
#include "copyright.ppc"
\--- ---.--- ---.--- ---.--- ---.--- ---:--- ---.--- ---.--- ---.--- ---.--- -*/
@#if !defined(«Idempotence token for this header»)
@#define «Idempotence token for this header» «yyyymm»
#define WHICH_PART  HEADER_PIECE
#include "«a filename».c"
...
#undef WHICH_PART
       «Repeat the previous four lines/parts with variations as needed»
@#else @@ @@ @@ @@ @@ @@ @@ /@* if !defined(«Idempotence token for this header») */
@#if «Idempotence token for this header» != «yyyymm»
@#error Header file «Header file name» version mismatch;
@#endif
@#endif @@ @@ @@ @@ @@ @@ /@* else !defined(«Idempotence token for this header») */

where the "copyright.ppc" file contains the boiler plate header file comments. The '@' after the backslash that ends the first line is needed because that line is not a comment to the preprocessor: a backslash at the end of a line splices it to the next line, and GCC and Clang still splice (with a warning) when only trailing blanks follow the backslash.

Documentation

There are a number of different kinds of documentation. The most basic is the in-line comment. These can serve a number of purposes, such as providing text for code outline display programs and explanations of what a particularly unusual language construct or sequence is intended to do. They should not simply repeat information that is obvious from the code.

Block comments present a bit more of a formatting challenge. This is particularly true if the blocks are actually code for some code generator like an SQL preprocessor. Block comments can also serve as a code outline, particularly while a piece of code is being written. They can document maintenance problems and fixed coding errors so that those problems and errors are not reintroduced later. If you are using an embedded document extractor, block comments are often used to delineate the markup code it uses in much the same fashion as the preprocessors mentioned here.

More formal documents, like implementation notes, maintenance notes and user's guides, do not embed well in the code, but the parts of those documents that describe a particular piece of the implementation should be kept with that implementation so that they can be updated when it changes. The embedding technique described in the previous section is useful in this situation.

To provide a consistent structure to the documents, much of the HTML code that does not change from document to document has been put into separate files. Some of those files contain multiple constant sections; they should be included once for each piece.

Specifically, the "header.htm" file contains four pieces: the first part is the <HEAD> boiler plate up to the point where the copyright notice needs to be inserted, the second part is the rest of the <HEAD> boiler plate up to the point where the <STYLE> information needs to be inserted, the third part is the boiler plate from the end of the <STYLE> information up to the point where <BODY> starts, and the fourth part is the boiler plate for the end of the document.

The "layout.htm" file contains two pieces: the first part sets up the 'marginalia' column on the left side of the document and the second part provides the HTML needed to finish the layout.

The source for a document is a .c file that has the following general form:

«a block comment identifying the file, copyright and disclaimers»
«parameter #defines for parameters used by the boiler plate»
#include "header.htm"
«copyright notices for the resulting document»
#include "header.htm"
«style information, often in #include files»
#include "header.htm"
#include "layout.htm"
«the HTML for the text of the document»
#include "layout.htm"
#include "header.htm"

Testing

If it is possible to write a code fragment or data sequence that tests a particular function or procedure, it is helpful to have that code fragment or test data available along with the code that implements the function or procedure. It is also helpful to be able to add to those fragments when a particular problem or error is fixed so that an inadvertent reintroduction of the problem or error can be detected quickly. Again the embedding technique described above is useful for this.

Translating keywords into numeric values

One of the worst stumbling blocks to understanding a piece of code is the presence of 'magic numbers' (sometimes called manifest constants) in it. The presence of some 'magic numbers' is inevitable. Command codes that must be stuffed in a particular field of a device port are the ultimate example of this. The value of the code and the location of the field and the port number are all under the control of the engineers that designed the device; without this information you are very unlikely to be able to write a piece of code to control the device.

Aside from hardware interfaces, 'magic numbers' also appear in software interfaces and in formulas necessary to comply with external specifications. It helps a great deal if these numbers are given sensible names and the names, not the numbers themselves, appear in the code. A trivial example would be to define 'pi' as 3.14159265358979. A significant problem with this approach is that defining such names precludes the use of those names for other purposes; this is a particularly inconsiderate move when such words as 'bad', 'good', etc. are needed. This problem goes by the moniker 'name space pollution'.

To reduce this problem to a manageable level, a preprocessor trick has been used: the ## operator moves the indicator and condition name keyword values from the common usage name space to the reserved name space. In particular, the result descriptive words 'good' and 'bad' are translated to '_RSE_Rgood' and '_RSE_Rbad', the process descriptive words 'stop' and 'continue' are translated to '_RSE_Pstop' and '_RSE_Pcontinue', and the combined status descriptive words are translated to a series beginning with '_RSE_S'. See the Programmer's Guide section on keywords for specific information.

Work List:

Why are macros more commonly used to define constants than enums?
Function prototypes and implementations need to be documented.
A series of tests is needed for all of this.
