Page 302 - The Combined Finite-Discrete Element Method
Figure 9.4 Time per time step using a distributed parallel architecture model and domain decomposition. Total number of particles: 0.6 billion. (Plot: normalised CPU time, 0 to 5, against number of processors, 1 to 9.)
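The domain decomposition referred to in the caption can be illustrated with a minimal sketch. The following is an assumed, simplified example (not the book's implementation): the spatial domain is cut into equal 1-D strips, one strip per processor, and each particle is assigned to the processor owning the strip that contains it.

```python
# Illustrative 1-D domain decomposition (an assumption for illustration,
# not the algorithm used in the book): the domain of length `domain_length`
# is sliced into `n_proc` equal strips, and each particle is assigned to
# the processor that owns the strip containing its x-coordinate.

def decompose_1d(particle_x, domain_length, n_proc):
    """Return one list of particle indices per processor."""
    strip = domain_length / n_proc
    owners = [[] for _ in range(n_proc)]
    for i, x in enumerate(particle_x):
        # Clamp so a particle sitting exactly on the far boundary
        # still lands on the last processor.
        p = min(int(x / strip), n_proc - 1)
        owners[p].append(i)
    return owners

# Example: 8 particles on a domain of length 4.0 split over 4 processors.
xs = [0.1, 0.9, 1.2, 1.8, 2.3, 2.9, 3.1, 4.0]
print(decompose_1d(xs, 4.0, 4))  # -> [[0, 1], [2, 3], [4, 5], [6, 7]]
```

In a real combined finite-discrete element run the decomposition would be two- or three-dimensional and would be rebalanced as particles move, but the ownership idea is the same.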
A typical performance of a distributed parallel architecture model is shown in Figure 9.4.
The speed increase is not linear with the number of processors. However, it is evident that a
much larger problem can be addressed by this model than by the shared memory approach.
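The sub-linear scaling visible in Figure 9.4 can be reproduced with a hedged toy model: compute time divides among the processors, while communication across subdomain interfaces adds an overhead that grows with the processor count. The constants below are assumptions chosen only to illustrate the trend, not measured values from the book.

```python
# Toy cost model (an illustrative assumption, not data from Figure 9.4):
# time per step = compute / n_proc + comm * (n_proc - 1),
# i.e. perfect division of work plus a communication term that grows
# with the number of subdomain interfaces.

def time_per_step(n_proc, compute=1.0, comm=0.02):
    """Normalised time per time step on n_proc processors."""
    return compute / n_proc + comm * (n_proc - 1)

# Speedup relative to one processor stays well below the ideal n_proc.
t1 = time_per_step(1)
print(round(t1 / time_per_step(4), 2))  # -> 3.23 (ideal would be 4)
print(round(t1 / time_per_step(8), 2))  # -> 3.77 (ideal would be 8)
```

With the communication term present, each added processor buys less than one processor's worth of speedup, which is the behaviour the figure shows.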
9.5.2 Distributed computing
The biggest problem associated with parallel hardware architectures is that such archi-
tectures are not required by the majority of applications in which computers are used. In
other words, applications which require parallel architectures are few and far between.
Thus a particular parallel architecture is unlikely to be manufactured in large quantities. This makes parallel architectures disproportionately more expensive than sequential architectures. For instance, a fixed number of floating-point operations executed on a parallel machine can be over 1000 times more expensive than the same number of floating-point operations executed on a sequential machine.
The situation is made even worse by the very large differences among applications that require parallel computing. For instance, parallelisation of a chess game would ideally
require a different parallel hardware architecture from the parallelisation of language trans-
lation or vision processing. Finite element applications would ideally require a different
parallel hardware configuration from discrete element applications. 2D simulations would
require a significantly different parallel architecture from 3D applications if optimum
performance is to be achieved.
There have been attempts in the past to design general-purpose parallel hardware. The problem with these attempts is that the resulting architectures proved optimal for none of the applications they were intended for. In addition, the parallelisation libraries end up being extremely hardware-dependent, not only in terms of performance, but very often in terms of detail and even syntax. This is not to say that parallelisation is impossible.
Parallelisation is certainly a viable solution for many CPU and RAM critical applications.
The problem is that very often it is not affordable. A parallel computer is almost a custom-designed machine, because no parallel architecture has ever been produced in quantities measured in millions of units.