The Cloudy Style Page


Source file format

The source uses tabs for indentation. Please do not use spaces, as this will break the indentation for the rest of us.

Each routine must start with a call to DEBUG_ENTRY with the name of the routine in quotes. An example is DEBUG_ENTRY( "oi3Pcs()" );. If the compile time macro DEBUG_FUN is defined then a print will indicate when a routine is called. This is intended as a debugging aid.

There should be a space on either side of the equal signs in arithmetic operations. An example is variable = 0.123. This makes it possible to do a global search on "variable =" and find all occurrences.

Local variables should be declared at the spot where they are first used. This makes it easier to see the variable type. An example would be

double cs = 0.93;
double rate = cs * 59676.;

The code now has the majority of the declarations at the start of the routine. This is inherited from the C days and is not the preferred style.


Atomic data references

Codes such as Cloudy only exist because of the foundation of basic atomic and molecular data. It is important to the survival of this field that the original sources of the basic data be cited, since this in turn affects their ability to generate support. The code precedes all atomic data with a citation to the original paper in the following form:

     /* >>refer Si2 AS Berrington, K., AtData Nuc Data Tab 33, 195.

This information is extracted from the source with the Perl script doc_atomic_data.pl which lives in the source directory. It creates a file doc_atomic_data_refer.txt giving all atomic data references.

The reference begins with /* as a C style comment. The flag >>refer indicates that the line is an atomic data reference. The fields are tab delimited. The text after the first tab indicates the species (c4, he2, H2+). After the next tab there is a flag indicating the process (CS for collision strength, AS for transition probabilities, UTA for unresolved transition array, and CT for charge transfer).

If the reference cannot fit on a single line it may continue on the following line, starting with the flag >>refercon which is followed by a tab and the remainder of the reference. The following is an example

	/* O V 630, CS from 
	 * >>refer	o5	cs	Berrington, K.A., Burke, P.G., Dufton, P.L., Kingston, A.E. 1985,
	 * >>refercon	At. Data Nucl. Data Tables, 33, 195 */

Old data references, those no longer used by the code but still of historical value, are denoted by the flag >>referold. The Perl script places these references in the file doc_atomic_data_refer_old.txt.

This style should be followed consistently so that the Perl script can generate a list of atomic data references used.
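
The tab-delimited layout described above lends itself to mechanical extraction. The following is a minimal C++ sketch of the splitting step only, not a port of doc_atomic_data.pl; the names RefLine_sketch and parse_refer_sketch are hypothetical and exist purely for illustration.

```cpp
#include <string>

// Hypothetical holder for one parsed ">>refer" line (not part of Cloudy).
struct RefLine_sketch
{
	std::string species;   // e.g. "o5"
	std::string process;   // e.g. "cs"
	std::string citation;  // remainder of the line
};

// Sketch: find the ">>refer" flag followed by a tab, then split the rest on tabs.
// Returns false for lines without the flag or with fewer than three fields.
// Because the flag must be followed directly by a tab, ">>refercon" and
// ">>referold" lines are not matched.
inline bool parse_refer_sketch( const std::string& line, RefLine_sketch& out )
{
	const std::string flag = ">>refer\t";
	std::string::size_type p = line.find( flag );
	if( p == std::string::npos )
		return false;
	p += flag.size();
	std::string::size_type t1 = line.find( '\t', p );
	if( t1 == std::string::npos )
		return false;
	std::string::size_type t2 = line.find( '\t', t1+1 );
	if( t2 == std::string::npos )
		return false;
	out.species = line.substr( p, t1-p );
	out.process = line.substr( t1+1, t2-t1-1 );
	out.citation = line.substr( t2+1 );
	return true;
}
```

A line that uses spaces instead of tabs between the fields is rejected by this sketch, which is one reason the tab convention matters.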


Routines to do common tasks

The MultiArrPage describes the new multi_arr structure for allocating and accessing multi-dimensional arrays in Cloudy.

fp_equal( arg1, arg2, n ) - returns true if the two arguments are within a relative precision of n epsilon of one another. When unspecified, n defaults to 3.

fp_equal_tol( arg1, arg2, tol ) - returns true if the two arguments differ by no more than tol. It is fundamentally different from fp_equal in that it can return true when arg1 and arg2 have different sign, while fp_equal cannot.

fp_bound( lo, x, hi, n ) - returns true if x is within the bounds given by lo and hi, to within a relative precision of n epsilon. When unspecified, n defaults to 3.

fp_bound_tol( lo, x, hi, tol ) - returns true if x is within the bounds given by lo and hi, to within a relative precision of tol.
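
The difference between a relative comparison (as in fp_equal) and an absolute one (as in fp_equal_tol) can be illustrated as follows. This is a sketch of the idea only and is not Cloudy's implementation; the names rel_equal_sketch and abs_equal_sketch are hypothetical, and scaling by the smaller magnitude is an assumption made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Relative comparison: true if a and b agree to within n epsilon,
// scaled by the smaller of the two magnitudes (an assumption of this sketch).
inline bool rel_equal_sketch( double a, double b, int n = 3 )
{
	// numbers of opposite sign can never agree in a relative sense
	if( (a < 0. && b > 0.) || (a > 0. && b < 0.) )
		return false;
	const double scale = std::min( std::fabs(a), std::fabs(b) );
	return std::fabs( a - b ) <= double(n)*std::numeric_limits<double>::epsilon()*scale;
}

// Absolute comparison: true if a and b differ by no more than tol,
// regardless of their signs.
inline bool abs_equal_sketch( double a, double b, double tol )
{
	return std::fabs( a - b ) <= tol;
}
```

With these definitions, -1e-20 and 1e-20 compare unequal in the relative sense but equal with an absolute tolerance of 1e-10, which is the sign-related distinction described above.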

get_ptr( obj ) - returns a pointer to raw data contained by object obj. An attribute shim - it is presently implemented for const and non-const vector<T> and valarray<T> to avoid compiler incompatibilities.

sign( arg1, arg2 ) - the Fortran sign function, returns arg1 with the sign of arg2. If arg2 == 0, +arg1 is returned.

sign3( arg ) - the Pascal sign function, returns -1 when arg < 0, 0 when arg == 0, and +1 when arg > 0.
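
The two sign functions can be sketched as below. These are illustrative stand-ins with hypothetical names (sign_sketch, sign3_sketch), not the Cloudy routines; the sketch assumes the Fortran convention that the magnitude of the first argument is used.

```cpp
#include <cmath>

// Fortran-style SIGN: magnitude of x with the sign of y.
// A zero y is treated as positive, so +|x| is returned in that case.
inline double sign_sketch( double x, double y )
{
	return ( y < 0. ) ? -std::fabs(x) : std::fabs(x);
}

// Pascal-style sign: -1, 0, or +1 depending on the sign of x.
inline int sign3_sketch( double x )
{
	return ( x < 0. ) ? -1 : ( ( x > 0. ) ? 1 : 0 );
}
```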

set_NaN( var ) / set_NaN( arr, n ) - sets the argument (a simple variable "var" or an array "arr[n]") to signaling NaN (not a number) which will cause the code to crash if the variable is used in a floating point operation before it is properly initialized. Note that this will fool valgrind / purify, but this is OK as long as FP traps are enabled.

isnan( var ) - checks whether a variable is NaN.
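
The idea of poisoning variables with a signaling NaN can be sketched with the standard library alone. The names set_nan_sketch and is_unset_sketch are hypothetical, and the sketch does not enable FP traps, so it only shows that the value is NaN, not the crash described above.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Poison a single variable with a signaling NaN.
inline void set_nan_sketch( double& var )
{
	var = std::numeric_limits<double>::signaling_NaN();
}

// Poison the first n elements of an array.
inline void set_nan_sketch( double arr[], std::size_t n )
{
	for( std::size_t i=0; i < n; ++i )
		arr[i] = std::numeric_limits<double>::signaling_NaN();
}

// Hypothetical helper: a poisoned variable still tests as NaN.
inline bool is_unset_sketch( double var )
{
	return std::isnan( var );
}
```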

SDIV( arg ) - "safe division". BEWARE: this macro is neither safe, nor does it do any division. All it does is guard against division by zero, but it does not guard against overflow when dividing by a very small number, nor does it preserve the sign when dividing by a very small negative number. This macro should gradually be phased out due to its poorly defined behavior. For new code the routine safe_div described below should be used.

safe_div( x, y ) - carries out a safe division x/y in the sense that division by zero and overflow exceptions are avoided under all circumstances; the code will however still produce invalid FP exceptions where appropriate (this includes 0/0, see also the second version of safe_div below). If the results would otherwise have overflowed (this includes division of a non-zero number by zero), +/-DBL_MAX or +/-FLT_MAX is returned. Note that this routine carries quite a bit of overhead, so only use it where it is really needed.

safe_div( x, y, res_0by0 ) - this routine behaves the same as safe_div( x, y ), except that when 0/0 is encountered, res_0by0 is returned rather than raising an invalid FP exception. This version can never crash, except when x and/or y are NaN (which should never happen if the code is behaving properly!).
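
The guard logic behind such a division can be sketched as follows. This is an illustration of the three-argument behavior described above under simplifying assumptions (double precision only), not the Cloudy routine; the name safe_div_sketch is hypothetical.

```cpp
#include <cmath>
#include <limits>

// Sketch of a guarded division x/y for doubles.
inline double safe_div_sketch( double x, double y, double res_0by0 )
{
	const double big = std::numeric_limits<double>::max();
	// NaN operands are simply passed through the normal division
	if( std::isnan(x) || std::isnan(y) )
		return x/y;
	if( x == 0. && y == 0. )
		return res_0by0;
	if( x == 0. )
		return 0.;
	// non-zero divided by zero: clamp, using the sign of x
	if( y == 0. )
		return ( x < 0. ) ? -big : big;
	const double ay = std::fabs(y);
	// for |y| >= 1 the quotient cannot exceed |x|, so it cannot overflow
	if( ay >= 1. )
		return x/y;
	// for |y| < 1 the product ay*big cannot overflow, so this test is safe
	if( std::fabs(x) < ay*big )
		return x/y;
	// the quotient would overflow: clamp, keeping the sign of x/y
	return ( (x < 0.) != (y < 0.) ) ? -big : big;
}
```

The extra comparisons on every call are the overhead mentioned above, which is why a plain division remains preferable wherever the divisor is known to be well behaved.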

STATIC - this is used to declare that a routine is file static. This should never be used to declare the scope of a variable or to cause it to retain its value when going out of scope. Declaring the macro USE_GPROF during compilation will cause the STATIC qualifier to be redefined to an empty string and all routines will be globally visible. This influences gprof output as it disables inlining of certain routines. It is also needed in unit testing if you want to test the results of a static routine.

NORETURN - this indicates that a routine can never return because it terminates the program (an example is TotalInsanity()). This is useful information for the compiler, e.g. to prevent warnings about a variable not being initialized in a branch of an if-statement, or the routine not returning a result.

UNUSED - this indicates that a variable is never read (possibly after being initialized). It prevents warnings about such variables. It currently only works for GNU compatible compilers. Use like this:

long UNUSED bla;

dprintf - generates print output with DEBUG prepended. The script checkall.pl in the test suite directories checks for this string since it is easy to forget that print statements were enabled.

cdEXIT( arg ) - the exit handler. This must be called to exit to ensure that output is properly closed and the information in the SAVE GRID file is complete. The argument should have the type exit_type, which is defined in cddefines.h. There is a whole range of possible values, most of which should only be used in cdMain() by the code catching every possible exception. Elsewhere in the code, use either EXIT_FAILURE or EXIT_SUCCESS. These have been redefined to ES_FAILURE and ES_SUCCESS in cddefines.h to make sure they have the correct type and value (note that the value of EXIT_FAILURE would have been implementation defined if we had not done this, now EXIT_FAILURE is guaranteed to be 1).

DEBUG_ENTRY( "routine()" ) - This routine produces a call trace (recording entry and exit of each routine) when the macro DEBUG_FUN is defined during compilation. This produces lots of output and is only used as a last resort when all other debugging methods fail to uncover where the code crashes. To keep the bulk of the output down, trivial routines typically do not have the DEBUG_ENTRY call. This is especially the case for code that is intended to be inlined. Any routine that calls cdEXIT also must have a call to DEBUG_ENTRY at the start. The debugtrace class generated by the DEBUG_ENTRY call is used as a source of the routine name so that it can be printed in the exit message. Once the C++0x standard is in effect, this can be replaced with the __func__ string, but for now this is the only reliable source of the routine name.
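
The entry/exit tracing idea can be sketched with a small RAII class whose constructor records entry and whose destructor records exit. This is not the debugtrace class itself; trace_sketch, TRACE_ENTRY_SKETCH, trace_log_sketch, and twice_sketch are hypothetical names, and the sketch appends to an in-memory log where a real tracer would print.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory log; a real tracer would print instead.
std::vector<std::string> trace_log_sketch;

// The constructor records entry, the destructor records exit, so every
// return path of the routine is covered automatically.
class trace_sketch
{
	std::string name_;
public:
	explicit trace_sketch( const char* name ) : name_( name )
	{
		trace_log_sketch.push_back( "enter " + name_ );
	}
	~trace_sketch()
	{
		trace_log_sketch.push_back( "exit " + name_ );
	}
	// the stored name is what an exit handler could report
	const std::string& name() const { return name_; }
};

// The macro hides the declaration of the local guard object.
#define TRACE_ENTRY_SKETCH( name ) trace_sketch trace_guard_sketch( name )

// Example routine following the style rule: the trace call comes first.
double twice_sketch( double x )
{
	TRACE_ENTRY_SKETCH( "twice_sketch()" );
	return 2.*x;
}
```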

open_data() - All files need to be opened with open_data() as the use of fopen() is deprecated (the compiler will actually generate an error if you accidentally try to use fopen). When reading files, this routine will traverse the search path and open the file in the first location where it finds a match. It will produce a warning if multiple matches are found along the path and also produce an appropriate error message if the file is not found. The access_scheme flag will determine which parts of the search path will be searched. The default is AS_DEFAULT, which translates to AS_DATA_ONLY when reading files. This means that only the data directories will be searched, and not the current working directory. For historical reasons there currently is a bewildering variety of access schemes. This should be simplified in the near future. A description of the existing schemes can be found in cpu.h. There are three versions of open_data(), one for classic C-style I/O streams, one for C++ fstream based I/O, and one for MPI-IO based I/O. We have defined a bunch of flags in cpu.h to define the access mode to the fstream. These are chosen to be equivalent to their C-style counterparts, e.g. C-style access mode "r" becomes mode_r, and access mode "r+b" becomes mode_rpb. All existing C-style access modes are represented. C++ offers more access modes than C, and you can still use those by supplying the raw ios_base flags to open_data. These will be the most common ways of using open_data to read a normal core data file:

// C-style
FILE* ioDATA = open_data( "name", "r" );
// .... read data
fclose( ioDATA ); // don't forget, you may run out of file descriptors otherwise!
// write some file, reusing the variable (declaring it a second time would not compile)
ioDATA = open_data( "other", "w" );

// C++ style
fstream ioDATA;
open_data( ioDATA, "name", mode_r );
// .... read data
ioDATA.close(); // optional, the destructor will do this anyway...
// write some file
open_data( ioDATA, "other", mode_w );

When writing files, open_data() will always open the file in the current working directory, no matter what is in your search path. The search path is only used for reading files.

C++ based I/O has several advantages. I think the most important are that you get simple access to all the nifty string manipulation options that C++ has to offer, that you don't need to worry about buffer overflows any longer, and that the destructor of the fstream will close the file for you. The disadvantage is that it can be slower if you have very large files to read. If that becomes an issue, it can be worked around by using sscanf on the C-representation of the string, or other (lower level) routines. However, in my experience that is rarely needed and the advantages of C++ I/O vastly outweigh the disadvantages.

MALLOC - the Cloudy form of malloc(). DEPRECATED. Memory allocated with MALLOC() needs to be freed with a call to free(). In debug mode (more precisely, when the macros NDEBUG and NOINIT are not set) MALLOC() will trash the allocated memory to guard against uninitialized use. Be aware that this is not fool-proof though, uninitialized use may still go undetected! Use valgrind / purify for proper detection of uninitialized use of vars (NB - see also the compiler macro NOINIT below). CAVEAT: never use MALLOC() to allocate memory for non-POD types as the constructor will not be executed! Avoid using them for any kind of struct or class (even if it is POD) since the class may become non-POD in the future! For arrays the preferred method of allocating is to use a container class like vector, valarray, multi_arr, or flex_arr. These are very useful as they guarantee that the allocated memory will be freed again when the container is destroyed. You can also use new / delete instead if you want a single instance, or new [] / delete[] if you want an array. But it is better to avoid these as well since freeing the memory needs to be done manually, so there is always a risk of memory leaks. As a rule of thumb, prefer container classes over new / delete / delete[]. Don't use MALLOC() in new code as it is deprecated.

TorF( bool ) - returns the character T or F indicating the value of the bool argument. This allows values of logical variables to be printed in a way that makes sense to people.

Routines that set flags

fixit() / broken() - code that needs to be fixed or is broken. There is clearly some overlap between the two. Typical use for fixit() would be when you see code that needs to be improved, but you don't want to deal with that now. Prepend the code with a call to fixit() and a short comment explaining what needs to be done. On the other hand, broken() is typically used when you e.g. hack the code to disable some physics or deliberately feed wrong input during testing. In that case prepend the hack with a call to broken() to ensure it gets deleted after testing is complete. Also add a comment explaining what needs to be done to get back to normal behavior of the code.

TotalInsanity() - used to close clauses that cannot possibly happen.

TotalInsanityAsStub<class T>() - always calls TotalInsanity(), but in such a way that the compiler will not mark statements following it as unreachable. It pretends that it can return a value of type T, but in practice never does. This way you can use it as a stub to #define calls away that are not supported on all platforms or compilation modes (e.g. UNIX- or MPI-specific routines). Class T should match the return type of the routine you are #define-ing away. This then reduces the need for #ifdef statements and allows you to use C++ conditionals instead.
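
The stub technique can be sketched as follows. All names here (insanity_sketch, insanity_as_stub_sketch, get_rank_sketch, lgHaveRank_sketch, rank_or_zero_sketch) are hypothetical, and the stand-in for TotalInsanity() throws an exception rather than terminating so that the sketch can be exercised; the use of a volatile flag to hide the call from the compiler's reachability analysis is an assumption of this example, not necessarily how Cloudy does it.

```cpp
#include <stdexcept>

// Hypothetical stand-in for TotalInsanity(): never returns normally.
[[noreturn]] inline void insanity_sketch()
{
	throw std::logic_error( "insanity_sketch() reached" );
}

// Pretends to return a T, but in practice always ends in insanity_sketch().
template<class T>
inline T insanity_as_stub_sketch()
{
	// always true at run time, but being volatile the compiler cannot prove
	// that, so the return statement below is not flagged as unreachable
	volatile bool lgAlways = true;
	if( lgAlways )
		insanity_sketch();
	return T();
}

// On a platform without the feature, the real call is #define-d away.
// The template argument matches the return type of the replaced routine.
#define get_rank_sketch() insanity_as_stub_sketch<int>()
const bool lgHaveRank_sketch = false;

// A plain C++ conditional replaces what would otherwise be an #ifdef.
inline int rank_or_zero_sketch()
{
	if( lgHaveRank_sketch )
		return get_rank_sketch();
	else
		return 0;
}
```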

ShowMe() - generates a request to show the output to the group, along with a printout of the input stream that caused it.

Compiler macros

These macros are set with compiler options. For most compilers the macro OPTION should be set with -DOPTION.

The print macros command will print the name and current value of many of the available macros.

MPI

MPI_ENABLED - the generic compile-time macro that is set to enable MPI specific parts of the code. This will only work if you have MPI installed including the header files.

MPI_GRID_RUN - used to enable the MPI versions of the grid and phymir commands. It implicitly sets MPI_ENABLED.

Profiling

USE_GPROF - This removes the STATIC qualifier so that all routines are global. This means that all the CPU time is correctly ascribed to the chunk of code which used it. However, it also means that the performance benefits of inlining are lost, so it isn't clear whether this information is meaningful once you've done it. Note that this only affects inlined functions, rather than things with file scope.

Array bounds

BOUNDS_CHECK enables array bounds checking for those arrays that use the multi_arr or flex_arr method. This is described further on the MultiArrPage page. A failed bounds check will throw an exception that is caught in the main program. This is not useful for debugging. If you want to debug a failed bounds check, type the command catch throw in the debugger (this is for gdb, possibly other debuggers too) and then run Cloudy the usual way. The abort will then be caught by the debugger at the place where the overflow happens.

Memory checking

NOINIT - The fact that MALLOC trashes its memory will fool valgrind / purify into believing that the allocated memory was properly initialized. So in valgrind / purify runs, compile with NOINIT to disable this behavior in order to properly detect use of uninitialized vars.

The type of realnum

FLT_IS_DBL - Floating variables are either of type double or realnum. The latter is defined as float in cddefines.h. The macro FLT_IS_DBL will cause realnum to be defined as double.
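
The selection described above amounts to a conditional type definition. The following is a sketch of that logic, not a copy of cddefines.h; the helper to_realnum_sketch is hypothetical.

```cpp
// Sketch of the type selection: realnum is float unless FLT_IS_DBL is defined.
#ifdef FLT_IS_DBL
typedef double realnum;
#else
typedef float realnum;
#endif

// Hypothetical helper: an explicit conversion makes the possible loss of
// precision visible when a double is stored in a realnum.
inline realnum to_realnum_sketch( double x )
{
	return static_cast<realnum>( x );
}
```

Compiling the same file with -DFLT_IS_DBL switches every realnum to double without touching the source.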

The ASSERT macro

The C/C++ language assert macro makes it possible to confirm that nothing impossible has happened. As an example we might assert that the kinetic temperature is positive, or even that it is greater than the CMB temperature. In the C/C++ language this is done with a source line that reads

assert( phycon.te > 0. );

The code will terminate if the assertion fails. The assert will exist when lower levels of optimization are used. Most compilers will remove asserts for highly optimized code.

ASSERT - the Cloudy form of the assert macro. This should always be used. There are two different behaviors that are controlled by compiler or command-line options.

Throwing a C++ exception. The standard behavior is that ASSERT will throw a C++ exception that can be caught. Typically it will be caught by the main program, which is not useful for debugging since the stack will already have been unwound by then. If you want to debug a failed assert, the best course of action is to set your debugger to catch a C++ exception when it is thrown. All modern debuggers should have this option. Please check your documentation to find out how this is done. In gdb it is done by typing "catch throw" before you start the program.

Abort. You can insert the SET ASSERT ABORT command in the input script, or run Cloudy with the -a command line flag. The latter form is strictly needed when the assert fails before the input is even parsed. This will cause the code to abort rather than throw an exception on a failed assert. The abort will then be caught by the debugger. Catching the C++ exception really is the preferred course of action. In gdb you can do this by typing catch throw. Support for SET ASSERT ABORT and the -a flag may be removed in the future. Below follows a more detailed description of the behavior of the ASSERT macro.

1) Default behavior is that a failed ASSERT throws a bad_assert exception (which is custom-defined in cddefines.h). This will be caught in cdMain(), where MyAssert is called and the usual error message is printed.

2) Since an exception is thrown, you can catch it in a try-catch block. MyAssert() will not be called in that case.

3) By issuing the SET ASSERT ABORT command, the behavior is modified to raise SIGABRT which can be caught by a debugger. MyAssert() will not be called in that case either. All the necessary information should be available in your debugger.

4) You can issue the "-a" flag (for abort) on the command line, which is completely equivalent to the SET ASSERT ABORT command except that it is already in effect before the input is parsed. So this is equivalent:

   cloudy.exe -a
   crash assert

and

   cloudy.exe
   set assert abort
   crash assert

The C-style form of the assert macro is used if the macro OLD_ASSERT is set at compile time. Routine MyAssert is called when the assert is trapped, and you can set breakpoints within this routine.

Trapping asserts within debuggers is discussed on the AssertTrapping page.
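
The default throwing behavior, and the fact that the exception can be caught in a local try-catch block, can be sketched as follows. This is not the Cloudy ASSERT macro; bad_assert_sketch, ASSERT_SKETCH, and temperature_is_invalid_sketch are hypothetical names chosen for the example.

```cpp
// Hypothetical exception type standing in for bad_assert.
class bad_assert_sketch
{
	const char* file_;
	long line_;
	const char* expr_;
public:
	bad_assert_sketch( const char* file, long line, const char* expr ) :
		file_( file ), line_( line ), expr_( expr ) {}
	const char* file() const { return file_; }
	long line() const { return line_; }
	const char* expr() const { return expr_; }
};

// Throws when the expression is false, recording where it happened.
#define ASSERT_SKETCH( exp ) \
	do { \
		if( !(exp) ) \
			throw bad_assert_sketch( __FILE__, __LINE__, #exp ); \
	} while( 0 )

// Catches the failed assertion locally instead of letting it propagate
// to the main program; returns true when the assertion fired.
inline bool temperature_is_invalid_sketch( double te )
{
	try
	{
		ASSERT_SKETCH( te > 0. );
	}
	catch( const bad_assert_sketch& )
	{
		return true;
	}
	return false;
}
```

By the time a handler like this runs the stack has already been unwound, which is why stopping in the debugger at the throw itself is more useful for debugging.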


Variable names and strong typing

Cloudy uses a simple formulation of the Hungarian naming convention (Simonyi, C. 1977, Meta-Programming: A Software Production Method, Thesis, Stanford University). In this convention the first few characters of a variable name indicate the type and function of that variable.

The naming convention used in the code today looks back to an under-appreciated advantage in the FORTRAN II and FORTRAN 66 languages - the fully implicit designation of variable types by the first letter of its name. The naming convention forced by early versions of FORTRAN (integers begin with i-n, real numbers begin with other characters) is still useful since the type can be determined at a glance.

Integers

Integers begin with the characters i, j, k, l, m, or n.

Counters begin with n. Examples include nLevel or nLoop.

Loop indices are generally i, j, or k. Sometimes they are counters.

Double or float variables

These begin with letters between a through h, and o through z. Examples include PumpRate, DestRate, or CollisIoniz.

The naming convention does not distinguish between floats and doubles.

In some cases floating numbers naturally will have names beginning with one of the letters reserved for integers. In this case a lower case x is used as the first character. Examples include xJumpDown, xMoleDen.

Character strings

Strings begin with "ch". Examples are chName or chReadInput.

Logical variables

These begin with "lg". Examples are lgOK, lgDone. These are of intrinsic type bool.
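
The prefixes listed above can be seen together in a short example. The struct and routine below are hypothetical and carry no physics; only the variable names matter.

```cpp
#include <string>

// Hypothetical record; each member follows the naming convention.
struct LevelSummary_sketch
{
	long nLevel;         // counter: begins with n
	double DestRate;     // floating point: begins with a-h or o-z
	double xMoleDen;     // floating point that would otherwise begin with m: x prefix
	std::string chName;  // character string: begins with "ch"
	bool lgOK;           // logical: begins with "lg"
};

inline LevelSummary_sketch summarize_sketch( long nLevel, double xMoleDen )
{
	LevelSummary_sketch s;
	s.nLevel = nLevel;
	s.xMoleDen = xMoleDen;
	s.chName = "demo";
	s.DestRate = 0.;
	// loop index: i, j, or k
	for( long i=0; i < nLevel; ++i )
		s.DestRate += 0.5*xMoleDen;
	s.lgOK = ( s.DestRate >= 0. );
	return s;
}
```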

Note by PvH: Should we really be using Hungarian notation? This is what Bjarne Stroustrup has to say on the subject:

"No I don't recommend "Hungarian". I regard "Hungarian" (embedding an abbreviated version of a type in a variable name) a technique that can be useful in untyped languages, but is completely unsuitable for a language that supports generic programming and object-oriented programming - both of which emphasize selection of operations based on the type an argument [has] (known to the language or to the run-time support). In this case, "building the type of an object into names" simply complicates and minimizes abstraction. To various extent, I have similar problems with every scheme that embeds information about language-technical details (e.g., scope, storage class, syntactic category) into names. I agree that in some cases, building type hints into variable names can be helpful, but in general, and especially as software evolves, this becomes a maintenance hazard and a serious detriment to good code. Avoid it as the plague."

I agree with him. When you start using templates, the whole concept of rigid typing that underlies Hungarian notation breaks down and becomes useless and even a hindrance. So should we not use it in templates? I guess we have no other choice. But what would then be the point of using it elsewhere? If Hungarian is not used universally in the code, its usefulness is severely restricted. You have to check the type of a variable anyway.



Last modified on 2016-06-14T09:33:01Z