aaron/ispc - ispc - git.frat.tech


Matt Pharr 73bf552cd6 Add support for coalescing memory accesses from gathers.

This change adds two related optimizations.  (Currently, both apply
only to gathers whose mask is known to be all on and that access
32-bit elements; both restrictions may be relaxed in the future.)

First, for any single gather, we are now more flexible in mapping it
to individual memory operations.  Previously, we could only map it
either to a general gather (one scalar load per SIMD lane) or to an
unaligned vector load (if the program instances could be shown to
access a sequential set of memory locations).

Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads, and we generate code that shuffles the loaded
values into the lanes that need them.  When possible, doing fewer,
larger loads in this manner can be more efficient than issuing one
scalar load per lane.
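The load-splitting idea above can be sketched in C++ as follows. This is a hypothetical illustration, not ispc's implementation in opt.cpp: it assumes an all-on mask, 32-bit elements, and per-lane offsets already known relative to a common base pointer. The greedy rule (cover the lowest uncovered offset with the widest 8-, 4-, 2-, or 1-element load whose elements are all requested) and the names selectLoads() and buildShuffle() are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <set>
#include <vector>

// One memory operation: `width` consecutive 32-bit elements starting at
// element offset `start` from the base pointer shared by all lanes.
struct Load {
    int64_t start;
    int width;  // 1 (scalar), 2 (64-bit), 4, or 8 elements
};

// Where a lane's value comes from after the loads: element `element` of
// load number `load`. Together these entries describe the shuffle.
struct LaneSource {
    std::size_t load;
    int element;
};

// Hypothetical greedy planner: take the lowest requested offset not yet
// covered and cover it with the widest load whose elements are all
// requested by some lane, trying 8, 4, 2, then falling back to 1.
std::vector<Load> selectLoads(const std::vector<int64_t> &laneOffsets) {
    const std::set<int64_t> needed(laneOffsets.begin(), laneOffsets.end());
    std::vector<Load> loads;
    auto it = needed.begin();
    while (it != needed.end()) {
        const int64_t start = *it;
        int width = 1;
        for (int w : {8, 4, 2}) {
            bool allNeeded = true;
            for (int k = 1; k < w && allNeeded; ++k)
                allNeeded = needed.count(start + k) != 0;
            if (allNeeded) {
                width = w;
                break;
            }
        }
        loads.push_back({start, width});
        // Skip every requested offset that this load already covers.
        it = needed.lower_bound(start + width);
    }
    return loads;
}

// For each lane, find the load containing its offset and the element
// index within that load (the per-lane shuffle source).
std::vector<LaneSource> buildShuffle(const std::vector<int64_t> &laneOffsets,
                                     const std::vector<Load> &loads) {
    std::vector<LaneSource> sources;
    for (int64_t off : laneOffsets) {
        for (std::size_t i = 0; i < loads.size(); ++i) {
            if (off >= loads[i].start && off < loads[i].start + loads[i].width) {
                sources.push_back({i, static_cast<int>(off - loads[i].start)});
                break;
            }
        }
    }
    // Every requested offset is covered by construction.
    assert(sources.size() == laneOffsets.size());
    return sources;
}
```

With lane offsets {0, 1, 2, 3, 8, 9, 10, 11} on an 8-wide target, for example, the sketch picks two 4-wide loads (starting at offsets 0 and 8) instead of eight scalar loads, and lane 5 takes element 1 of the second load. Duplicate offsets across lanes are loaded once and simply shuffled to several lanes.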

Second, we can coalesce memory accesses across multiple gathers.  If
we have a series of gathers with no memory writes between them, we
analyze their reads collectively and try to choose an efficient set of
loads for them.  This helps when different gathers read values from
the same memory locations, and it is especially helpful when accessing
data in array-of-structures (AoS) layout; in that case, we are often
able to generate wide vector loads and the appropriate shuffles
automatically.
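The cross-gather coalescing described above can be illustrated with a C++ sketch under simplifying assumptions of its own. It is not the pass in ispc's opt.cpp: it assumes a run of gathers with all-on masks, no intervening writes, and non-negative 32-bit element offsets from one common base, and it covers the union of all offsets with fixed width-element chunks measured from that base (so the loads may be unaligned in memory). The names coalesce(), aosFieldGathers(), and CoalescedPlan are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Element offsets (in 32-bit units from a shared base pointer) read by a
// single gather, one entry per program instance.
using Gather = std::vector<int64_t>;

// Source of one lane's value after coalescing: element `element` of the
// vector load covering elements [chunk * width, chunk * width + width).
struct LaneSource {
    int64_t chunk;
    int element;
};

struct CoalescedPlan {
    std::vector<int64_t> chunks;                    // distinct vector loads to issue
    std::vector<std::vector<LaneSource>> shuffles;  // per gather, per lane
};

// Hypothetical coalescer: issue one `width`-element load per distinct
// chunk touched by any gather, so a location read by several gathers is
// loaded only once; each gather is then rebuilt with a shuffle.
CoalescedPlan coalesce(const std::vector<Gather> &gathers, int width) {
    assert(width > 0);
    std::set<int64_t> chunkSet;
    for (const Gather &g : gathers)
        for (int64_t off : g)
            chunkSet.insert(off / width);  // offsets assumed non-negative

    CoalescedPlan plan;
    plan.chunks.assign(chunkSet.begin(), chunkSet.end());
    for (const Gather &g : gathers) {
        std::vector<LaneSource> lanes;
        for (int64_t off : g)
            lanes.push_back({off / width, static_cast<int>(off % width)});
        plan.shuffles.push_back(lanes);
    }
    return plan;
}

// Offsets for an AoS array such as `struct { float x, y, z; } pts[]`,
// where program instance `lane` reads every field of pts[lane]: one
// gather per field.
std::vector<Gather> aosFieldGathers(int programCount, int numFields) {
    std::vector<Gather> gathers(numFields);
    for (int f = 0; f < numFields; ++f)
        for (int lane = 0; lane < programCount; ++lane)
            gathers[f].push_back(static_cast<int64_t>(lane) * numFields + f);
    return gathers;
}
```

For a 4-wide target reading x, y, and z from four structures (aosFieldGathers(4, 3)), the three gathers together touch offsets 0 through 11, so the plan issues three 4-wide loads instead of twelve scalar ones; lane 1 of the x gather, for example, takes element 3 of the first load.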

2012-02-10 13:10:39 -08:00

builtins

Fix build with LLVM 2.9.

2012-02-07 08:37:13 -08:00

contrib

vim syntax highlighting for ispc from <andreas.wendleder@googlemail.com>

2011-08-04 05:49:28 -07:00

docs

Add FAQ about how to cross-inline ispc and C/C++ code.

2012-02-10 12:26:19 -08:00

examples

Add perfbench to examples: a few small microbenchmarks.

2012-02-10 12:27:13 -08:00

tests

Add support for coalescing memory accesses from gathers.

2012-02-10 13:10:39 -08:00

tests_errors

Improve error handling and reporting in the parser.

2012-02-07 11:13:32 -08:00

winstuff

Merge branch 'master' of https://github.com/jduprat/ispc

2012-01-26 14:18:25 -08:00

.gitignore

Release notes, bump doxygen version # for next release.

2011-07-17 16:52:36 +02:00

ast.cpp

Short-circuit evaluation of && and || operators.

2012-01-30 05:58:41 -08:00

ast.h

Short-circuit evaluation of && and || operators.

2012-01-30 05:58:41 -08:00

bitcode2cpp.py

Fixed off by one error in array size generated by bitcode2cpp.py

2012-01-10 11:22:13 -08:00

buildall.bat

Fix various warnings / build issues on Windows

2011-12-15 12:06:38 -08:00

buildispc.bat

Add buildispc.bat script for just building the compiler on windows.

2012-01-04 11:44:19 -08:00

builtins.cpp

Add support for 1-wide scalar target.

2012-01-29 06:36:07 -08:00

builtins.h

Add support for compiling to multiple targets.

2011-10-04 16:01:55 -07:00

cbackend.cpp

Fix C++ backend to not assert with LLVM 3.1 svn builds.

2012-02-10 12:30:31 -08:00

ctx.cpp

Move assert so that an error is issued for "break" outside of loops.

2012-02-06 15:35:43 -08:00

ctx.h

Improve code for uniform switches with a 'break' under varying control flow.

2012-01-19 08:41:19 -07:00

decl.cpp

Issue an error if an array of references is declared.

2012-02-06 15:35:43 -08:00

decl.h

Support function declarations in the definitions of other functions.

2012-01-06 13:50:10 -08:00

doxygen.cfg

Release notes, bump doxygen release number for 1.1.4

2012-02-04 15:38:17 -08:00

expr.cpp

Constant fold more cases in SelectExpr::Optimize()

2012-02-10 12:28:54 -08:00

expr.h

Implement NullPointerExpr::GetConstant()

2012-01-31 09:37:39 -08:00

func.cpp

Fix bug with multiple EmitCode() calls due to missing braces.

2012-01-10 16:50:13 -08:00

func.h

Significantly reduce the tendrils of DeclSpecs/Declarator/Declaration code

2011-10-18 15:37:29 -07:00

ispc.cpp

Add support for coalescing memory accesses from gathers.

2012-02-10 13:10:39 -08:00

ispc.h

Add support for coalescing memory accesses from gathers.

2012-02-10 13:10:39 -08:00

ispc.sln

Update run_tests.py to work on Windows. Removed JIT-based testing path entirely.

2011-12-06 13:46:20 -08:00

ispc.vcxproj

Windows build support for scalar target.

2012-01-29 13:48:01 -08:00

lex.ll

Improve error handling and reporting in the parser.

2012-02-07 11:13:32 -08:00

LICENSE.txt

Add support for in-memory half float data. Fixes issue #10

2011-07-21 15:55:45 +01:00

llvmutil.cpp

Fix build with LLVM 3.1 TOT

2012-01-31 14:10:07 -08:00

llvmutil.h

For << and >> with C++, detect when all instances are shifting by the same amount.

2012-01-19 10:04:32 -07:00

main.cpp

Add support for coalescing memory accesses from gathers.

2012-02-10 13:10:39 -08:00

Makefile

Add support for 1-wide scalar target.

2012-01-29 06:36:07 -08:00

module.cpp

Fix placement of ParserInit() call

2012-02-10 12:29:57 -08:00

module.h

Add support for emitting ~generic vectorized C++ code.

2012-01-04 12:59:03 -08:00

opt.cpp

Add support for coalescing memory accesses from gathers.

2012-02-10 13:10:39 -08:00

opt.h

Initial commit.

2011-06-21 12:48:50 -07:00

parse.yy

Improve error handling and reporting in the parser.

2012-02-07 11:13:32 -08:00

README.rst

Include AVX2 in supported ISAs

2012-01-22 07:05:47 -08:00

run_tests.py

Update run_tests and examples makefile for scalar target.

2012-01-29 16:22:25 -08:00

simple.vcxproj

Windows: fix some compiler warnings during build

2011-10-09 07:40:17 -07:00

stdlib2cpp.py

Python build compatible on both python 2 and 3

2012-01-10 10:42:15 -08:00

stdlib.ispc

Add missing "varying/varying" atomic_compare_exchange_global() functions.

2012-02-03 13:19:15 -08:00

stmt.cpp

Fix typo in IfStmt::EstimateCost()

2012-02-06 14:44:54 -08:00

stmt.h

Add support for "new" and "delete" to the language.

2012-01-27 14:47:06 -08:00

sym.cpp

Add fuzz testing of input programs.

2012-02-06 15:34:47 -08:00

sym.h

Add fuzz testing of input programs.

2012-02-06 15:34:47 -08:00

test_static.cpp

Use Assert() rather than assert()

2012-01-08 14:06:44 -08:00

type.cpp

Issue error on "void" typed variable, function parameter, or struct member.

2012-02-06 14:44:48 -08:00

type.h

Add notion of "unbound" variability to the type system.

2012-01-06 11:52:58 -08:00

util.cpp

Don't indent *too* much on continued lines with warnings/errors.

2012-02-10 12:26:35 -08:00

util.h

Implement vasprintf and asprintf for platforms lacking them.

2012-01-09 09:44:58 +01:00

README.rst

==============================
Intel(r) SPMD Program Compiler
==============================

``ispc`` is a compiler for a variant of the C programming language, with
extensions for `single program, multiple data
<http://en.wikipedia.org/wiki/SPMD>`_ programming.  Under the SPMD model,
the programmer writes a program that generally appears to be a regular
serial program, though the execution model is actually that a number of
*program instances* execute in parallel on the hardware.
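As an illustration of that execution model, here is a hypothetical C++ sketch that emulates a gang of program instances with scalar loops. It is not how ``ispc`` generates code: ``programCount`` and ``programIndex`` mirror the names of ``ispc`` built-ins, the 4-wide gang size is an assumption (e.g. an SSE target), and ``absOrDouble()`` is an invented example. A varying ``if`` is emulated the way SIMD hardware typically runs it: both branches execute, each under its own mask.

```cpp
#include <array>
#include <cassert>

// Hypothetical scalar emulation of an ispc gang: programCount program
// instances execute the same code in lockstep, one per SIMD lane.
constexpr int programCount = 4;  // assumed gang size, e.g. a 4-wide SSE target
using Varying = std::array<float, programCount>;  // one value per instance
using Mask = std::array<bool, programCount>;      // per-instance on/off bits

// Emulates an SPMD loop whose body, written as if for one instance, is:
//     float v = in[i];
//     if (v < 0) v = -v; else v = v * 2;
//     out[i] = v;
void absOrDouble(const float *in, float *out, int count) {
    assert(count >= 0);
    for (int base = 0; base < count; base += programCount) {
        // Execution mask: instances past the end of the data are off.
        Mask active{};
        Varying v{};
        for (int programIndex = 0; programIndex < programCount; ++programIndex) {
            active[programIndex] = base + programIndex < count;
            if (active[programIndex]) v[programIndex] = in[base + programIndex];
        }
        // The condition differs per instance (it is "varying"), so both
        // branches run, each under its own mask.
        Mask cond{};
        for (int i = 0; i < programCount; ++i) cond[i] = active[i] && v[i] < 0;
        for (int i = 0; i < programCount; ++i)  // "then" branch
            if (cond[i]) v[i] = -v[i];
        for (int i = 0; i < programCount; ++i)  // "else" branch
            if (active[i] && !cond[i]) v[i] = v[i] * 2;
        // Only active instances store their results.
        for (int i = 0; i < programCount; ++i)
            if (active[i]) out[base + i] = v[i];
    }
}
```

Calling ``absOrDouble()`` on five inputs, for instance, runs two gang iterations; in the second, only the first instance is active, so nothing past ``out[4]`` is written.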

Overview
--------

``ispc`` compiles a C-based SPMD programming language to run on the SIMD
units of CPUs; it frequently provides a speedup of 3x or more on CPUs with
4-wide SSE vector units and 5x-6x on CPUs with 8-wide AVX vector units,
without any of the difficulty of writing intrinsics code.  ``ispc`` also
supports parallelization across multiple cores, making it possible to
write programs whose performance scales with both the number of cores and
the vector unit width.

There are a few key principles in the design of ``ispc``:

  * To build a small set of extensions to the C language that
    would deliver excellent performance to performance-oriented
    programmers who want to run SPMD programs on the CPU.

  * To provide a thin abstraction layer between the programmer
    and the hardware; in particular, to have an execution and
    data model where the programmer can cleanly reason about the
    mapping of their source program to compiled assembly language
    and the underlying hardware.

  * To make it possible to harness the computational power of SIMD
    vector units without the extremely low programmer productivity
    of writing intrinsics directly.

  * To explore opportunities from close coupling between C/C++
    application code and SPMD ``ispc`` code running on the
    same processor: to have lightweight function calls between
    the two languages and to share data directly via pointers without
    copying or reformatting.

``ispc`` is an open source compiler released under the BSD license.  It uses the
remarkable `LLVM Compiler Infrastructure <http://llvm.org>`_ for back-end
code generation and optimization and is `hosted on
GitHub <http://github.com/ispc/ispc/>`_. It supports Windows, Mac OS X, and
Linux, with both x86 and x86-64 targets.  It currently supports the SSE2,
SSE4, AVX1, and AVX2 instruction sets.

Features
--------

``ispc`` provides a number of key features to developers:

  * Familiarity as an extension of the C programming
    language: ``ispc`` supports familiar C syntax and
    programming idioms, while adding the ability to write SPMD
    programs.

  * High-quality SIMD code generation: the performance
    of code generated by ``ispc`` is often close to that of
    hand-written intrinsics code.

  * Ease of adoption with existing software
    systems: functions written in ``ispc`` directly
    interoperate with application functions written in C/C++ and
    with application data structures.

  * Portability across over a decade of CPU
    generations: ``ispc`` has targets for SSE2, SSE4, AVX1,
    and AVX2.

  * Portability across operating systems: Microsoft
    Windows, Mac OS X, and Linux are all supported
    by ``ispc``.

  * Debugging with standard tools: ``ispc``
    programs can be debugged with standard debuggers (OS X and
    Linux only).

Additional Resources
--------------------

Prebuilt ``ispc`` binaries for Windows, OS X, and Linux can be downloaded
from the `ispc downloads page <http://ispc.github.com/downloads.html>`_.
See also the additional
`documentation <http://ispc.github.com/documentation.html>`_ and
`performance information <http://ispc.github.com/perf.html>`_.
