by designs from the opposite camp. The entire industry moves to higher
performance by improving both frequency and IPC. Benchmarks are the
best way of measuring processor performance, but they are far from per-
fect. The biggest problems are the choice of benchmark programs, compiler optimizations, and system configuration. A different choice for any of these may suddenly change which computer scores as the fastest on a given benchmark.
Choosing realistic programs as the obstacle course is perhaps the most
difficult problem because which programs are really important varies so
much from one user to another. Someone who plays many 3D games will
find the performance of floating-point intensive programs relevant, whereas
someone who does not may care only about the performance of integer pro-
grams. An added complication is that the applications available change over
time. Software developers take advantage of improving processor performance and greater memory capacity by creating new applications that require more performance and memory. Benchmarks must regularly update the programs they use in order to reflect current software. The SPEC integer benchmark program list was first created in 1988 and has been updated in 1992, 1996, and 2000.
Processors with large caches tend to look better running older bench-
marks. Processor cache memory has increased quickly over time but not
nearly as dramatically as the size of typical applications. Usually a
processor will have nowhere near enough cache to hold all the code or data of a contemporary program. However, it might have a large
enough cache to hold all of a benchmark, especially if that benchmark
is a few years old. This creates a strong temptation for some companies
to quote older benchmarks long after they have ceased to be represen-
tative of current software.
In order to be run on processors of different architectures, benchmarks
must be written in high-level programming languages, but this means
that the compiler that translates the code for execution on a particular
processor can have a very large impact on the measured performance.
Some microarchitectures rely more upon the compiler to reorder instructions, while others are better able to compensate for a simpler compiler by reordering instructions at run time. Even a simple compiler typically has
many different optimizations that can be selected. Some will expose
more parallelism at the cost of increasing code size. This may improve
or hurt performance depending upon the details of the microarchitecture.
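As a sketch of this trade-off, consider the summation loop below and a hand-unrolled version of the kind an optimizing compiler might produce. The function names and the four-way unroll factor are illustrative choices for this example, not taken from any particular benchmark or compiler.

#include <stddef.h>   /* for size_t */

/* Straightforward version: every addition depends on the single
 * accumulator s, so the processor has little independent work to
 * overlap. */
float sum_simple(const float *a, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled version: four independent accumulators expose more
 * parallelism, but the code is several times larger. (Reassociating
 * the floating-point additions this way can change the result
 * slightly, which is why compilers typically apply it only when
 * explicitly allowed to.) */
float sum_unrolled(const float *a, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}

Whether the unrolled version actually runs faster depends on the microarchitecture: a processor with many execution units and registers can overlap the independent additions, while one with limited resources may lose more to the larger code footprint than it gains in parallelism.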
Processors with complex hardware reordering look relatively better when comparisons are made using simple compilers. Processors that need
sophisticated compilers have helped drive improvements in compiler
technology and these advances have been part of the steady progress in
computer performance. However, sometimes undue attention is given to
compiler optimizations that show dramatic improvements in performance