3.3 Characteristics of Tasks and Interactions
The various decomposition techniques described in the previous section allow us to identify the
concurrency that is available in a problem and decompose it into tasks that can be executed in
parallel. The next step in the process of designing a parallel algorithm is to take these tasks and
assign (i.e., map) them onto the available processes. While devising a mapping scheme to
construct a good parallel algorithm, we often take a cue from the decomposition. The nature of
the tasks and the interactions among them has a bearing on the mapping. In this section, we
shall discuss the various properties of tasks and inter-task interactions that affect the choice of
a good mapping.
3.3.1 Characteristics of Tasks
The following four characteristics of the tasks have a large influence on the suitability of a
mapping scheme.
Task Generation The tasks that constitute a parallel algorithm may be generated either
statically or dynamically. Static task generation refers to the scenario where all the tasks are
known before the algorithm starts execution. Data decomposition usually leads to static task
generation. Examples of data-decomposition leading to a static task generation include matrixmultiplication
and LU factorization (Problem 3.5). Recursive decomposition can also lead to a
static task-dependency graph. Finding the minimum of a list of numbers (Figure 3.9) is an
example of a static recursive task-dependency graph.
Certain decompositions lead to a dynamic task generation during the execution of the
algorithm. In such decompositions, the actual tasks and the task-dependency graph are not
explicitly available a priori, although the high level rules or guidelines governing task generation
are known as a part of the algorithm. Recursive decomposition can lead to dynamic task
generation. For example, consider the recursive decomposition in quicksort (Figure 3.8). The
tasks are generated dynamically, and the size and shape of the task tree is determined by the
values in the input array to be sorted. An array of the same size can lead to task-dependency
graphs of different shapes and with a different total number of tasks.
Exploratory decomposition can be formulated to generate tasks either statically or dynamically.
For example, consider the 15-puzzle problem discussed in Section 3.2.3. One way to generate a
static task-dependency graph using exploratory decomposition is as follows. First, a
preprocessing task starts with the initial configuration and expands the search tree in a
breadth-first manner until a predefined number of configurations are generated. These
configuration now represent independent tasks, which can be mapped onto different processes
and run independently. A different decomposition that generates tasks dynamically would be
one in which a task takes a state as input, expands it through a predefined number of steps of
breadth-first search and spawns new tasks to perform the same computation on each of the
resulting states (unless it has found the solution, in which case the algorithm terminates).
Task Sizes The size of a task is the relative amount of time required to complete it. The
complexity of mapping schemes often depends on whether or not the tasks are uniform; i.e.,
whether or not they require roughly the same amount of time. If the amount of time required
by the tasks varies significantly, then they are said to be non-uniform. For example, the tasks
in the decompositions for matrix multiplication shown in Figures 3.10 and 3.11 would be
considered uniform. On the other hand, the tasks in quicksort in Figure 3.8 are non-uniform.
Knowledge of Task Sizes The third characteristic that influences the choice of mapping
scheme is knowledge of the task size. If the size of all the tasks is known, then this information
can often be used in mapping of tasks to processes. For example, in the various decompositions
for matrix multiplication discussed so far, the computation time for each task is known before
the parallel program starts. On the other hand, the size of a typical task in the 15-puzzle
problem is unknown. We do not know a priori how many moves will lead to the solution from a
given state.
Size of Data Associated with Tasks Another important characteristic of a task is the size of
data associated with it. The reason this is an important consideration for mapping is that the
data associated with a task must be available to the process performing that task, and the size
and the location of these data may determine the process that can perform the task without
incurring excessive data-movement overheads.
Different types of data associated with a task may have different sizes. For instance, the input
data may be small but the output may be large, or vice versa. For example, the input to a task
in the 15-puzzle problem may be just one state of the puzzle. This is a small input relative to
the amount of computation that may be required to find a sequence of moves from this state to
a solution state. In the problem of computing the minimum of a sequence, the size of the input
is proportional to the amount of computation, but the output is just one number. In the parallel
formulation of the quick sort, the size of both the input and the output data is of the same order
as the sequential time needed to solve the task.
3.3.2 Characteristics of Inter-Task Interactions
In any nontrivial parallel algorithm, tasks need to interact with each other to share data, work,
or synchronization information. Different parallel algorithms require different types of
interactions among concurrent tasks. The nature of these interactions makes them more
suitable for certain programming paradigms and mapping schemes, and less suitable for others.
The types of inter-task interactions can be described along different dimensions, each
corresponding to a distinct characteristic of the underlying computations.
Static versus Dynamic One way of classifying the type of interactions that take place among
concurrent tasks is to consider whether or not these interactions have a static or dynamic
pattern. An interaction pattern is static if for each task, the interactions happen at
predetermined times, and if the set of tasks to interact with at these times is known prior to the
execution of the algorithm. In other words, in a static interaction pattern, not only is the taskinteraction
graph known a priori, but the stage of the computation at which each interaction
occurs is also known. An interaction pattern is dynamic if the timing of interactions or the set of
tasks to interact with cannot be determined prior to the execution of the algorithm.
Static interactions can be programmed easily in the message-passing paradigm, but dynamic
interactions are harder to program. The reason is that interactions in message-passing require
active involvement of both interacting tasks – the sender and the receiver of information. The
unpredictable nature of dynamic iterations makes it hard for both the sender and the receiver to
participate in the interaction at the same time. Therefore, when implementing a parallel
algorithm with dynamic interactions in the message-passing paradigm, the tasks must be
assigned additional synchronization or polling responsibility. Shared-address space
programming can code both types of interactions equally easily.
The decompositions for parallel matrix multiplication presented earlier in this chapter exhibit
static inter-task interactions. For an example of dynamic interactions, consider solving the 15-
puzzle problem in which tasks are assigned different states to explore after an initial step that
generates the desirable number of states by applying breadth-first search on the initial state. It
is possible that a certain state leads to all dead ends and a task exhausts its search space
without reaching the goal state, while other tasks are still busy trying to find a solution. The
task that has exhausted its work can pick up an unexplored state from the queue of another
busy task and start exploring it. The interactions involved in such a transfer of work from one
task to another are dynamic.
Regular versus Irregular Another way of classifying the interactions is based upon their
spatial structure. An interaction pattern is considered to be regular if it has some structure that
can be exploited for efficient implementation. On the other hand, an interaction pattern is called
irregular if no such regular pattern exists. Irregular and dynamic communications are harder
to handle, particularly in the message-passing programming paradigm. An example of a
decomposition with a regular interaction pattern is the problem of image dithering.
Example 3.9 Image dithering
In image dithering, the color of each pixel in the image is determined as the weighted
average of its original value and the values of its neighboring pixels. We can easily
decompose this computation, by breaking the image into square regions and using a
different task to dither each one of these regions. Note that each task needs to access
the pixel values of the region assigned to it as well as the values of the image
surrounding its region. Thus, if we regard the tasks as nodes of a graph with an edge
linking a pair of interacting tasks, the resulting pattern is a two-dimensional mesh, as
shown in Figure 3.22.
Sparse matrix-vector multiplication discussed in Section 3.1.2 provides a good example of
irregular interaction, which is shown in Figure 3.6. In this decomposition, even though each
task, by virtue of the decomposition, knows a priori which rows of matrix A it needs to access,
without scanning the row(s) of A assigned to it, a task cannot know which entries of vector b it
requires. The reason is that the access pattern for b depends on the structure of the sparse
matrix A.
Read-only versus Read-Write We have already learned that sharing of data among tasks
leads to inter-task interaction. However, the type of sharing may impact the choice of the
mapping. Data sharing interactions can be categorized as either read-only or read-write
interactions. As the name suggests, in read-only interactions, tasks require only a read-access
to the data shared among many concurrent tasks. For example, in the decomposition for
parallel matrix multiplication shown in Figure 3.10, the tasks only need to read the shared input
matrices A and B. In read-write interactions, multiple tasks need to read and write on some
shared data. For example, consider the problem of solving the 15-puzzle. The parallel
formulation method proposed in Section 3.2.3 uses an exhaustive search to find a solution. In
this formulation, each state is considered an equally suitable candidate for further expansion.
The search can be made more efficient if the states that appear to be closer to the solution are
given a priority for further exploration. An alternative search technique known as heuristic
search implements such a strategy. In heuristic search, we use a heuristic to provide a relative
approximate indication of the distance of each state from the solution (i.e. the potential number
of moves required to reach the solution). In the case of the 15-puzzle, the number of tiles that
are out of place in a given state could serve as such a heuristic. The states that need to be
expanded further are stored in a priority queue based on the value of this heuristic. While
choosing the states to expand, we give preference to more promising states, i.e. the ones that
have fewer out-of-place tiles and hence, are more likely to lead to a quick solution. In this
situation, the priority queue constitutes shared data and tasks need both read and write access
to it; they need to put the states resulting from an expansion into the queue and they need to
pick up the next most promising state for the next expansion.
One-way versus Two-way In some interactions, the data or work needed by a task or a
subset of tasks is explicitly supplied by another task or subset of tasks. Such interactions are
called two-way interactions and usually involve predefined producer and consumer tasks. In
other interactions, only one of a pair of communicating tasks initiates the interaction and
completes it without interrupting the other one. Such an interaction is called a one-way
interaction. All read-only interactions can be formulated as one-way interactions. Read-write
interactions can be either one-way or two-way.
The shared-address-space programming paradigms can handle both one-way and two-way
interactions equally easily. However, one-way interactions cannot be directly programmed in
the message-passing paradigm because the source of the data to be transferred must explicitly
send the data to the recipient. In the message-passing paradigm, all one-way interactions must
be converted to two-way interactions via program restructuring. Static one-way interactions can
be easily converted to two-way communications. Since the time and the location in the program
of a static one-way interaction is known a priori, introducing a matching interaction operation in
the partner task is enough to convert a one-way static interaction to a two-way static
interaction. On the other hand, dynamic one-way interactions can require some nontrivial
program restructuring to be converted to two-way interactions. The most common such
restructuring involves polling. Each task checks for pending requests from other tasks after
regular intervals, and services such requests, if any.
With Thanks to Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar
Reference
Content is taken from
Introduction to Parallel Computing, Second Edition
By Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar
Publisher: Addison Wesley
Pub Date: January 16, 2003
ISBN: 0-201-64865-2
Buy this book