Matt Pharr 73bf552cd6 Add support for coalescing memory accesses from gathers.
There are two related optimizations that happen now.  (These
currently only apply for gathers where the mask is known to be
all on, and to gathers that are accessing 32-bit sized elements,
but both of these may be generalized in the future.)

First, for any single gather, we are now more flexible in mapping it
to individual memory operations.  Previously, we would only either map
it to a general gather (one scalar load per SIMD lane), or an 
unaligned vector load (if the program instances could be determined
to be accessing a sequential set of locations in memory.)

Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads.  Further, we now generate code that shuffles
these loads around.  Doing fewer, larger loads in this manner, when
possible, can be more efficient.

Second, we can coalesce memory accesses across multiple gathers. If 
we have a series of gathers without any memory writes in the middle,
then we try to analyze their reads collectively and choose an efficient
set of loads for them.  Not only does this help if different gathers
reuse values from the same location in memory, but it's specifically
helpful when data with AOS layout is being accessed; in this case,
we're often able to generate wide vector loads and appropriate shuffles
automatically.
2012-02-10 13:10:39 -08:00
2012-02-07 08:37:13 -08:00
2012-01-31 14:10:07 -08:00
2012-01-29 06:36:07 -08:00
2012-02-10 12:29:57 -08:00
2011-06-21 12:48:50 -07:00
2012-01-22 07:05:47 -08:00
2012-02-06 14:44:54 -08:00
2012-02-06 15:34:47 -08:00
2012-02-06 15:34:47 -08:00

==============================
Intel(r) SPMD Program Compiler
==============================

``ispc`` is a compiler for a variant of the C programming language, with
extensions for `single program, multiple data
<http://en.wikipedia.org/wiki/SPMD>`_ programming.  Under the SPMD model,
the programmer writes a program that generally appears to be a regular
serial program, though the execution model is actually that a number of
*program instances* execute in parallel on the hardware.

Overview
--------

``ispc`` compiles a C-based SPMD programming language to run on the SIMD
units of CPUs; it frequently provides a 3x or more speedup on CPUs with
4-wide vector SSE units and 5x-6x on CPUs with 8-wide AVX vector units,
without any of the difficulty of writing intrinsics code.  Parallelization
across multiple cores is also supported by ``ispc``, making it
possible to write programs that achieve performance improvement that scales
by both number of cores and vector unit size.

There are a few key principles in the design of ``ispc``:

  * To build a small set of extensions to the C language that
    would deliver excellent performance to performance-oriented
    programmers who want to run SPMD programs on the CPU.

  * To provide a thin abstraction layer between the programmer
    and the hardware--in particular, to have an execution and
    data model where the programmer can cleanly reason about the
    mapping of their source program to compiled assembly language
    and the underlying hardware.

  * To make it possible to harness the computational power of SIMD
    vector units without the extremely low-programmer-productivity
    activity of directly writing intrinsics.

  * To explore opportunities from close coupling between C/C++
    application code and SPMD ``ispc`` code running on the
    same processor--to have lightweight function calls between
    the two languages and to share data directly via pointers without
    copying or reformatting.

``ispc`` is an open source compiler with the BSD license.  It uses the
remarkable `LLVM Compiler Infrastructure <http://llvm.org>`_ for back-end
code generation and optimization and is `hosted on
github <http://github.com/ispc/ispc/>`_. It supports Windows, Mac, and
Linux, with both x86 and x86-64 targets.  It currently supports the SSE2,
SSE4, AVX1, and AVX2 instruction sets.

Features
--------

``ispc`` provides a number of key features to developers:

  * Familiarity as an extension of the C programming
    language: ``ispc`` supports familiar C syntax and
    programming idioms, while adding the ability to write SPMD
    programs.

  * High-quality SIMD code generation: the performance
    of code generated by ``ispc`` is often close to that of
    hand-written intrinsics code.

  * Ease of adoption with existing software
    systems: functions written in ``ispc`` directly
    interoperate with application functions written in C/C++ and
    with application data structures.
            
  * Portability across over a decade of CPU
    generations: ``ispc`` has targets for SSE2, SSE4, AVX
    (and soon, AVX2).

  * Portability across operating systems: Microsoft
    Windows, Mac OS X, and Linux are all supported
    by ``ispc``.

  * Debugging with standard tools: ``ispc``
    programs can be debugged with standard debuggers (OS X and
    Linux only).

Additional Resources
--------------------

Prebuilt ``ispc`` binaries for Windows, OS X and Linux can be downloaded
from the `ispc downloads page <http://ispc.github.com/downloads.html>`_.
See also additional
`documentation <http://ispc.github.com/documentation.html>`_ and additional
`performance information <http://ispc.github.com/perf.html>`_.
Description
No description provided
Readme 34 MiB
Languages
C++ 63.5%
LLVM 19.1%
M4 11.6%
Python 4.5%
Makefile 0.5%
Other 0.6%