Thursday, March 26, 2020

HPC Chapter No 4 Principles of Parallel Algorithm Design

4.2 Decomposition Techniques

As mentioned earlier, one of the fundamental steps that we need to undertake to solve a
problem in parallel is to split the computations to be performed into a set of tasks for
concurrent execution defined by the task-dependency graph. In this section, we describe some
commonly used decomposition techniques for achieving concurrency. This is not an exhaustive
set of possible decomposition techniques. Also, a given decomposition is not always guaranteed
to lead to the best parallel algorithm for a given problem. Despite these shortcomings, the
decomposition techniques described in this section often provide a good starting point for many
problems and one or a combination of these techniques can be used to obtain effective
decompositions for a large variety of problems.
These techniques are broadly classified as recursive decomposition, data decomposition,
exploratory decomposition, and speculative decomposition. The recursive- and data-decomposition
techniques are relatively general purpose, as they can be used to decompose a
wide variety of problems. On the other hand, the speculative- and exploratory-decomposition
techniques are more special purpose in nature because they apply to specific classes of
problems.


4.2.1 Recursive Decomposition

Recursive decomposition is a method for inducing concurrency in problems that can be solved
using the divide-and-conquer strategy. In this technique, a problem is solved by first dividing it
into a set of independent subproblems. Each one of these subproblems is solved by recursively
applying a similar division into smaller subproblems followed by a combination of their results.
The divide-and-conquer strategy results in natural concurrency, as different subproblems can be
solved concurrently.

Example 3.4 Quicksort
Consider the problem of sorting a sequence A of n elements using the commonly used
quicksort algorithm. Quicksort is a divide-and-conquer algorithm that starts by
selecting a pivot element x and then partitions the sequence A into two subsequences
A0 and A1 such that all the elements in A0 are smaller than x and all the elements in
A1 are greater than or equal to x. This partitioning step forms the divide step of the
algorithm. Each one of the subsequences A0 and A1 is sorted by recursively calling
quicksort. Each one of these recursive calls further partitions the sequences. This is
illustrated in Figure 3.8 for a sequence of 12 numbers. The recursion terminates when
each subsequence contains only a single element.

Figure 3.8. The quicksort task-dependency graph based on
recursive decomposition for sorting a sequence of 12 numbers.

In Figure 3.8, we define a task as the work of partitioning a given subsequence. Therefore,
Figure 3.8 also represents the task graph for the problem. Initially, there is only one sequence
(i.e., the root of the tree), and we can use only a single process to partition it. The completion
of the root task results in two subsequences (A0 and A1, corresponding to the two nodes at the
first level of the tree) and each one can be partitioned in parallel. Similarly, the concurrency
continues to increase as we move down the tree.
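The recursive decomposition described above can be sketched in a few lines of Python (a hypothetical helper, not from the text). The two recursive calls are the independent subproblems; here they run sequentially, but they correspond to the concurrent tasks of the task-dependency graph:

```python
def quicksort(A):
    # Base case: a subsequence of zero or one elements is already sorted.
    if len(A) <= 1:
        return A
    # Divide step: partition around a pivot x (here, the first element).
    x = A[0]
    A0 = [e for e in A[1:] if e < x]    # elements smaller than the pivot
    A1 = [e for e in A[1:] if e >= x]   # elements greater than or equal to the pivot
    # The two recursive calls below are independent tasks and could be
    # executed concurrently, as in the task-dependency graph of Figure 3.8.
    return quicksort(A0) + [x] + quicksort(A1)
```

Each invocation corresponds to one node of the task tree; the degree of concurrency grows as the recursion splits the sequence further.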


4.2.2 Data Decomposition


Data decomposition is a powerful and commonly used method for deriving concurrency in
algorithms that operate on large data structures. In this method, the decomposition of
computations is done in two steps. In the first step, the data on which the computations are
performed is partitioned, and in the second step, this data partitioning is used to induce a
partitioning of the computations into tasks. The operations that these tasks perform on different
data partitions are usually similar (e.g., matrix multiplication introduced in Example 3.5) or are
chosen from a small set of operations (e.g., LU factorization introduced in Example 3.10).
The partitioning of data can be performed in many possible ways as discussed next. In general,
one must explore and evaluate all possible ways of partitioning the data and determine which
one yields a natural and efficient computational decomposition.
Partitioning Output Data In many computations, each element of the output can be computed
independently of others as a function of the input. In such computations, a partitioning of the
output data automatically induces a decomposition of the problem into tasks, where each task
is assigned the work of computing a portion of the output. We introduce the problem of matrix multiplication
in Example 3.5 to illustrate a decomposition based on partitioning output data.
Example 3.5 Matrix multiplication
Consider the problem of multiplying two n x n matrices A and B to yield a matrix C.
Figure 3.10 shows a decomposition of this problem into four tasks. Each matrix is
considered to be composed of four blocks or submatrices defined by splitting each
dimension of the matrix into half. The four submatrices of C, roughly of size n/2 x n/2
each, are then independently computed by four tasks as the sums of the appropriate
products of submatrices of A and B.
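A minimal sketch of the four-task decomposition of Example 3.5 in plain Python follows (the helper names mat_mul, mat_add, and split are our own, not from the text; n is assumed even). The four output blocks are computed sequentially here, but each is an independent task:

```python
def mat_mul(X, Y):
    """Standard product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M):
    """Split an n x n matrix (n even) into four n/2 x n/2 blocks."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def block_matmul(A, B):
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Four independent tasks, one per output block of C.
    C11 = mat_add(mat_mul(A11, B11), mat_mul(A12, B21))  # Task 1
    C12 = mat_add(mat_mul(A11, B12), mat_mul(A12, B22))  # Task 2
    C21 = mat_add(mat_mul(A21, B11), mat_mul(A22, B21))  # Task 3
    C22 = mat_add(mat_mul(A21, B12), mat_mul(A22, B22))  # Task 4
    # Reassemble C from its four blocks.
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

Because no task reads another task's output block, the four tasks can run on four processes with no interaction until the results are assembled.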


Most matrix algorithms, including matrix-vector and matrix-matrix multiplication, can be
formulated in terms of block matrix operations. In such a formulation, the matrix is viewed as
composed of blocks or submatrices and the scalar arithmetic operations on its elements are
replaced by the equivalent matrix operations on the blocks. The results of the element and the
block versions of the algorithm are mathematically equivalent (Problem 3.10). Block versions of
matrix algorithms are often used to aid decomposition.
The decomposition shown in Figure 3.10 is based on partitioning the output matrix C into four
submatrices and each of the four tasks computes one of these submatrices. The reader must
note that data-decomposition is distinct from the decomposition of the computation into tasks.
Although the two are often related and the former often aids the latter, a given data-decomposition
does not result in a unique decomposition into tasks. For example, Figure 3.11
shows two other decompositions of matrix multiplication, each into eight tasks, corresponding
to the same data-decomposition as used in Figure 3.10(a).

Figure 3.11. Two examples of decomposition of matrix multiplication
into eight tasks.
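One such eight-task decomposition can be sketched as follows, assuming Python's standard concurrent.futures module (the helper names are ours, not from the text). Each of the eight submatrix products is its own task, and a second stage of additions combines the partial results:

```python
from concurrent.futures import ThreadPoolExecutor

def mat_mul(X, Y):
    """Standard product of two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    """Elementwise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M):
    """Split an n x n matrix (n even) into four n/2 x n/2 blocks."""
    h = len(M) // 2
    return ([row[:h] for row in M[:h]], [row[h:] for row in M[:h]],
            [row[:h] for row in M[h:]], [row[h:] for row in M[h:]])

def matmul_8_tasks(A, B):
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Stage 1: eight independent product tasks, one per submatrix product.
    products = [(A11, B11), (A12, B21),   # contribute to C11
                (A11, B12), (A12, B22),   # contribute to C12
                (A21, B11), (A22, B21),   # contribute to C21
                (A21, B12), (A22, B22)]   # contribute to C22
    with ThreadPoolExecutor(max_workers=8) as pool:
        D = list(pool.map(lambda p: mat_mul(*p), products))
    # Stage 2: pairwise additions combine the partial products into C's blocks.
    C11, C12 = mat_add(D[0], D[1]), mat_add(D[2], D[3])
    C21, C22 = mat_add(D[4], D[5]), mat_add(D[6], D[7])
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    return top + [r1 + r2 for r1, r2 in zip(C21, C22)]
```

The data partitioning is identical to the four-task version; only the grouping of computations into tasks differs, which is precisely the point of Figure 3.11.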


With Thanks to Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar
Reference
Content is taken from:
Introduction to Parallel Computing, Second Edition, by Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar. Addison Wesley, January 16, 2003. ISBN 0-201-64865-2.
