The next generation of supercomputers will probably need large amount of parallelism, both for generating the needed computing power and for masking memory latency. Furthermore, it is necessary to expand the application of parallelism to less regular programs than is usually found in the usual numerical applications. The main obstacle to be overcome is the presence of control dependences, i.e.\ of situations in which the result of an operation is used to decide if another operation is going to be executed or not. This forbids parallelization, unless speculative execution is used: an operation is initiated before being sure that it must be executed. A first type of problem is to insure that speculation is efficient, i.e.\ that the time lost executing useless operations is more than compensated by the increase in parallelism.
Our aim here is to explore the logical constraints which must be satisfied by parallel speculative programs. This is best done in the framework of scheduling. For instance, one obtains speculative execution simply by ignoring some control dependences when constructing the schedule. If this is done, one has to insert compensating dependences to restore the correctness of the object program. One also has to keep enough information for being able to undo the effects of speculative operations. Lastly, one has to insure that everything stays finite in the object program, including the size of of temporary memory and the amount of computation per logical time step. Preliminary solutions are given for some of these problems.
The simulation of complex scientific or technical processes often relies on mathematical models using partial differential equations. After discretization, they result in large, sparse systems of equations with up to several millions of unknowns. Fast iterative algorithms for the solution of these systems are typically based on the multilevel principle. An effective discretization often requires adaptively generated meshes such that nontrivial data structures must be used. Unfortunately, some of the commonly used programming techniques lead to a high overhead on many advanced computer architectures. The two main sources of this performance degradation are the effects of non-uniform data access combined with indirect addressing that prohibits the use of instruction-level parallelism. This, however, is essential to exploit many modern CPU designs. The second, and possibly more fundamental problem arises from hierarchical memory architectures with several layers of caches. Their effective use requires programs with data access locality. Unfortunately, iterative solvers are typically implemented by using global sweeps over the whole data set, and thus their performance is essentially limited by by the speed of memory system. These problems are addressed in the patch-adaptive multigrid method (PAM), a recently developed experimental software system implementing an adaptive multigrid method based on a locally uniform mesh data structure. The advantages of this approach are shown by performance comparisons with conventional unstructured mesh multigrid systems.
Irregular computations pose some of the most interesting and challenging problems in the area of automatic parallelization. Such irregularity appears in certain kinds of numerical problems and is pervasive in symbolic applications. In addition, irregular computations often require the use of dynamic run-time data structures which make heavy use of pointers. This complicates all the steps of a parallelizing compiler, from independence detection to task partitioning and placement. In the past decade there has been significant progress in the development of parallelizing compilers for logic programming and, more recently, constraint programming. The typical applications of these paradigms involve very frequently highly irregular tasks and make heavy use of dynamic data structures with pointers (note that logical variables represent in practice a well behaved form of pointers), all of which arguably makes the techniques used in these compilers potentially interesting. In this talk we will review in a tutorial way some of the progress made in the areas of compile-time independence detection, inter-procedural pointer aliasing analysis, management of nondeterministic and speculative computations, task granularity control, and dynamic task allocation in the context of parallelizing compilers for logic and constraint programs. We will also demonstrate representatives of several generations of these compilers.