Monday, September 8, 2014

Improve GEOtop informatics!

I take the current structure of GEOtop's main as an occasion to discuss best practices in programming and to introduce the topic of design patterns. To browse the current code you can look here, while my more specific comments are available at the GEOtop Developers forum.

I argued that a much simpler form of the main should follow a scheme like the following (dots represent missing code; the code is assumed to be C++. Please be merciful with me: my knowledge of C++ details is very limited, but the arguments hold for any language that supports object orientation):

where three main phases are distinguished: an initialization phase (where variables are allocated and initialized), an execution phase (where the real work is done), and a closure phase (where all the memory is freed and whatever else is needed is performed). This best practice comes from many recent numerical modeling efforts, such as ESMF, OpenMI and OMS, and allows each process to be encapsulated in its own part of the project; in any case, I think everybody can appreciate its simplicity and cleanness.
A structure like this should not be very difficult to obtain with a few days of work by a proficient programmer. It does not alter the internals of the code: it rationalises what is present now and introduces some encapsulation of the procedures, but IT IS NOT object-oriented (OO) programming.
Object-oriented programming has to do with choosing the appropriate classes, instantiating the appropriate objects and, in general, doing OO design to implement the code.

Someone who is used to procedural programming, for instance, interprets algorithms as “procedure calls”. My view and understanding of the matter is, instead, that procedures should be instances of classes, a.k.a. “objects”. Therefore, a proper evolution of the above could be:

In this second case, the organisation of the code remains the same. However, the execution of each single procedure is written differently, for instance:

EnergyBalance.execute( ); 

This is intended to mean that EnergyBalance is an object and .execute() (please notice: without arguments) is a method called on this object. Making each procedure an object has several arguments in its favour. The first is that each object now contains its own inputs as fields, so encapsulation is certainly improved. This makes the whole code easier to maintain, and the input data of each procedure easier to control and parse (outside the time loop, obviously). For a simpler, but conceptually similar, example in Java, please see here.
Taking this step is not a matter of a day: it certainly requires more thinking, and a sequence of modifications that also involves the structure of the data containers currently present in GEOtop and propagates down into the inner routines.

In this scheme, the print( ) method is also part of a class, with the rationale that, with proper software engineering, it can be run in parallel with the core calculation, on different processors, and therefore interfere as little as possible with the ‘core’ computational time. I personally bet that, at the moment, a substantial part of GEOtop's computational time is wasted printing data to the screen. So we may spend months obtaining a fast algorithm for finding the solution, and then use the time saved to put thousands of numbers on the screen, or to write them to disk, subtracting power from the computation.

Actually, the one above is just the beginning of the story. Opening the way to classes opens the way to using some well-known Design Patterns that make the code more maintainable, flexible and evolvable. One case, obvious to all GEOtop aficionados, is that, in reality, both the water budget and the other procedures are a bundle of different alternatives. OO offers at least one neat scheme to treat this case, called the “Strategy pattern”, where the object actually chosen is one out of a group of options that implement a common interface (I am going to use the technical OO slang here). So the "main( )" can be programmed to accept a generic type of solution (algorithm) to that problem, as specified by the abstract interface, and this solution can even be chosen at run time among the subclasses that implement it. New solutions can then be added without modifying (almost) anything else in the code: it suffices to add a new subclass.

References (you will not find them easy to read... but take it slowly, cross-check with the web, and something will gradually sink in). Besides the basics, also addressed here for Java, try having a look at:

Gamma, E.; Helm, R.; Johnson, R.; Vlissides, J. (1995). Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley.

Fowler, M. (1999). UML Distilled: A Brief Guide to the Standard Object Modeling Language. Addison-Wesley.

Freeman, E. T.; Robson, E.; Bates, B.; Sierra, K. (2004). Head First Design Patterns. O'Reilly Media. ISBN 0-596-00712-4.

Eckel, B. Thinking in Patterns (notes).

Various Authors. Design Pattern. Indian Student Association, University of Nebraska at Omaha.