4.2 Decomposition Techniques
As mentioned earlier, one of the fundamental steps that we need to undertake to solve aproblem in parallel is to split the computations to be performed into a set of tasks for
concurrent execution defined by the task-dependency graph. In this section, we describe some
commonly used decomposition techniques for achieving concurrency. This is not an exhaustive
set of possible decomposition techniques. Also, a given decomposition is not always guaranteed
to lead to the best parallel algorithm for a given problem. Despite these shortcomings, the
decomposition techniques described in this section often provide a good starting point for many
problems and one or a combination of these techniques can be used to obtain effective
decompositions for a large variety of problems.
These techniques are broadly classified as recursive decomposition, data-decomposition,
exploratory decomposition, and speculative decomposition. The recursive- and datadecomposition
techniques are relatively general purpose as they can be used to decompose a
wide variety of problems. On the other hand, speculative- and exploratory-decomposition
techniques are more of a special purpose nature because they apply to specific classes of
problems.
4.2.1 Recursive Decomposition
Recursive decomposition is a method for inducing concurrency in problems that can be solvedusing the divide-and-conquer strategy. In this technique, a problem is solved by first dividing it
into a set of independent subproblems. Each one of these subproblems is solved by recursively
applying a similar division into smaller subproblems followed by a combination of their results.
The divide-and-conquer strategy results in natural concurrency, as different subproblems can be
solved concurrently.
Example 3.4 Quicksort
Consider the problem of sorting a sequence A of n elements using the commonly used
quicksort algorithm. Quicksort is a divide and conquer algorithm that starts by
selecting a pivot element x and then partitions the sequence A into two subsequences
A0 and A1 such that all the elements in A0 are smaller than x and all the elements in
A1 are greater than or equal to x. This partitioning step forms the divide step of the
algorithm. Each one of the subsequences A0 and A1 is sorted by recursively calling
quicksort. Each one of these recursive calls further partitions the sequences. This is
illustrated in Figure 3.8 for a sequence of 12 numbers. The recursion terminates when
each subsequence contains only a single element.
Figure 3.8. The quicksort task-dependency graph based on
recursive decomposition for sorting a sequence of 12 numbers.
In Figure 3.8, we define a task as the work of partitioning a given subsequence. Therefore,
Figure 3.8 also represents the task graph for the problem. Initially, there is only one sequence
(i.e., the root of the tree), and we can use only a single process to partition it. The completion
of the root task results in two subsequences (A0 and A1, corresponding to the two nodes at the
first level of the tree) and each one can be partitioned in parallel. Similarly, the concurrency
continues to increase as we move down the tree.
3.2.2 Data Decomposition
Data decomposition is a powerful and commonly used method for deriving concurrency in
algorithms that operate on large data structures. In this method, the decomposition of
computations is done in two steps. In the first step, the data on which the computations are
performed is partitioned, and in the second step, this data partitioning is used to induce a
partitioning of the computations into tasks. The operations that these tasks perform on different
data partitions are usually similar (e.g., matrix multiplication introduced in Example 3.5) or are
chosen from a small set of operations (e.g., LU factorization introduced in Example 3.10).
The partitioning of data can be performed in many possible ways as discussed next. In general,
one must explore and evaluate all possible ways of partitioning the data and determine which
one yields a natural and efficient computational decomposition.
Partitioning Output Data In many computations, each element of the output can be computed
independently of others as a function of the input. In such computations, a partitioning of the
output data automatically induces a decomposition of the problems into tasks, where each task
is assigned the work of computing a portion of the output. We introduce the problem of matrix multiplication
in Example 3.5 to illustrate a decomposition based on partitioning output data.
Example 3.5 Matrix multiplication
Consider the problem of multiplying two n x n matrices A and B to yield a matrix C.
Figure 3.10 shows a decomposition of this problem into four tasks. Each matrix is
considered to be composed of four blocks or submatrices defined by splitting each
dimension of the matrix into half. The four submatrices of C, roughly of size n/2 x n/2
each, are then independently computed by four tasks as the sums of the appropriate
products of submatrices of A and B.
Most matrix algorithms, including matrix-vector and matrix-matrix multiplication, can be
formulated in terms of block matrix operations. In such a formulation, the matrix is viewed as
composed of blocks or submatrices and the scalar arithmetic operations on its elements are
replaced by the equivalent matrix operations on the blocks. The results of the element and the
block versions of the algorithm are mathematically equivalent (Problem 3.10). Block versions of
matrix algorithms are often used to aid decomposition.
The decomposition shown in Figure 3.10 is based on partitioning the output matrix C into four
submatrices and each of the four tasks computes one of these submatrices. The reader must
note that data-decomposition is distinct from the decomposition of the computation into tasks.
Although the two are often related and the former often aids the latter, a given datadecomposition
does not result in a unique decomposition into tasks. For example, Figure 3.11
shows two other decompositions of matrix multiplication, each into eight tasks, corresponding
to the same data-decomposition as used in Figure 3.10(a).
Figure 3.11. Two examples of decomposition of matrix multiplication
into eight tasks.
With Thanks to Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar
Reference
Content is taken from
Introduction to Parallel Computing, Second Edition
By Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar
Publisher: Addison Wesley
Pub Date: January 16, 2003
ISBN: 0-201-64865-2
Buy this book
No comments:
Post a Comment