===========
Performance
===========

The SPMD programming model that ``ispc`` provides makes it easy to harness
the computational power available in SIMD vector units on modern CPUs, while
its basis in C makes it easy for programmers to adopt and use
productively. This page summarizes the performance of ``ispc`` with the
workloads in the ``examples/`` directory of the ``ispc`` distribution.

These results were measured on an Apple iMac with a 4-core 3.4GHz
Intel® Core-i7 processor using the Intel® AVX instruction set. The basis
for comparison is a reference C++ implementation compiled with gcc 4.2.1,
the version distributed with OS X 10.7.2. (The reference implementation is
also included in the ``examples/`` directory.)

.. list-table:: Performance of ``ispc`` with a variety of the workloads
   from the ``examples/`` directory of the ``ispc`` distribution, compared
   to a reference C++ implementation compiled with gcc 4.2.1.
   :header-rows: 1

   * - Workload
     - ``ispc``, 1 core
     - ``ispc``, 4 cores
   * - `AOBench`_ (512 x 512 resolution)
     - 6.19x
     - 28.06x
   * - `Binomial Options`_ (128k options)
     - 7.94x
     - 33.43x
   * - `Black-Scholes Options`_ (128k options)
     - 8.45x
     - 32.48x
   * - `Deferred Shading`_ (1280p)
     - 5.02x
     - 23.06x
   * - `Mandelbrot Set`_
     - 6.21x
     - 20.28x
   * - `Perlin Noise Function`_
     - 5.37x
     - n/a
   * - `Ray Tracer`_ (Sponza dataset)
     - 4.31x
     - 20.29x
   * - `3D Stencil`_
     - 4.05x
     - 15.53x
   * - `Volume Rendering`_
     - 3.60x
     - 17.53x


.. _AOBench: https://github.com/ispc/ispc/tree/master/examples/aobench
.. _Binomial Options: https://github.com/ispc/ispc/tree/master/examples/options
.. _Black-Scholes Options: https://github.com/ispc/ispc/tree/master/examples/options
.. _Deferred Shading: https://github.com/ispc/ispc/tree/master/examples/deferred
.. _Mandelbrot Set: https://github.com/ispc/ispc/tree/master/examples/mandelbrot_tasks
.. _Ray Tracer: https://github.com/ispc/ispc/tree/master/examples/rt
.. _Perlin Noise Function: https://github.com/ispc/ispc/tree/master/examples/noise
.. _3D Stencil: https://github.com/ispc/ispc/tree/master/examples/stencil
.. _Volume Rendering: https://github.com/ispc/ispc/tree/master/examples/volume_rendering


The following table shows speedups for a number of the examples on a
2.40GHz, 40-core Intel® Xeon E7-8870 system with the Intel® SSE4
instruction set, running Microsoft Windows Server 2008 Enterprise. Here,
the serial C/C++ baseline code was compiled with MSVC 2010.

.. list-table:: Performance of ``ispc`` with a variety of the workloads
   from the ``examples/`` directory of the ``ispc`` distribution, on a
   system with 40 CPU cores.
   :header-rows: 1

   * - Workload
     - ``ispc``, 40 cores
   * - AOBench (2048 x 2048 resolution)
     - 182.36x
   * - Binomial Options (2m options)
     - 63.85x
   * - Black-Scholes Options (2m options)
     - 83.97x
   * - Ray Tracer (Sponza dataset)
     - 195.67x
   * - Volume Rendering
     - 243.18x