started to work on documentation

This commit is contained in:
evghenii
2014-07-08 08:41:29 +02:00
parent 2ed65c8b16
commit 1fc75ed494


@@ -181,9 +181,9 @@ Contents:
* `Experimental support for PTX`_
+ `Overview`_
+ `Generation of PTX`_
+ `Execution of PTX`_
+ `Compiling For The NVIDIA Kepler GPU`_
+ `Hints`_
+ `Limitations & known issues`_
* `Disclaimer and Legal Information`_
@@ -4945,27 +4945,90 @@ program instances improves performance.
Experimental support for PTX
============================
One of the goals of ``ispc`` is to offer performance portability of ISPC
programs across various parallel processors, in particular CPUs and GPUs. This
section describes how to use ``ispc`` in combination with the CUDA Toolkit to
generate and execute PTX.

``ispc`` has limited support for PTX code generation, which currently targets
NVIDIA GPUs with compute capability 3.5 [Kepler GPUs with support for dynamic
parallelism]. Due to its experimental status in ``ispc``, the PTX backend
currently imposes several restrictions on the source code, which are detailed
below.
Overview
--------
The SPMD programming model maps naturally onto CUDA cores. SPMD programming in
``ispc`` with the PTX target in mind should be thought of as warp-synchronous
CUDA programming. In particular, every program instance is mapped to a CUDA
thread, and a gang is mapped to a CUDA warp. To run efficiently on the GPU, an
``ispc`` program must use the tasking functionality via the ``launch`` keyword.
``export`` functions are also equipped with a CUDA C wrapper that schedules a
single thread-block of 32 threads--a warp. In contrast to CPU programming, it
is expected that this exported function, either directly or otherwise, will
use the ``launch`` keyword to schedule work across the GPU; there is no other
way to efficiently utilize the GPU's rich compute resources.

At the PTX level, the ``launch`` keyword is mapped to a CUDA Dynamic
Parallelism call that schedules a grid of thread-blocks, each 128 threads--or
4 warps--wide [dim3(128,1,1)]. Therefore the tasking granularity of ``ispc``
with the PTX target is currently 4 tasks; this restriction will be eliminated
in the future.
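As a sketch of this pattern (the function and parameter names below are
illustrative assumptions, not part of the ``ispc`` distribution), an exported
function should do little more than launch tasks:

::

    // Illustrative sketch only: names are hypothetical.
    // Each task scales a contiguous chunk of the array; the exported
    // function merely launches the tasks, as recommended for the GPU.
    task void scale_task(uniform float a[], uniform float b,
                         uniform int count)
    {
        uniform int begin = taskIndex * count;
        foreach (i = begin ... begin + count)
            a[i] *= b;
    }

    export void scale(uniform float a[], uniform float b,
                      uniform int n, uniform int nTasks)
    {
        // assumes n is divisible by nTasks
        launch[nTasks] scale_task(a, b, n / nTasks);
    }

Each program instance inside a task corresponds to one CUDA thread of a warp.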
When passing pointers to an ``export`` function compiled for execution on the
GPU, it is important that these pointers remain valid when accessed from the
GPU. Prior to CUDA 6.0, these pointers had to hold addresses that are only
accessible from the GPU. With the release of CUDA 6.0, it is possible to pass
a pointer to unified memory. For this, ``ispc`` provides helper wrapper
functions that call the CUDA API for managed memory allocations, thereby
allowing the programmer to avoid explicit memory copies.
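The same effect can be obtained directly with the CUDA runtime API. In the
host-side sketch below, the exported function name ``scale`` is an assumption
for illustration (the names of the ``ispc``-provided wrapper functions are not
listed here); ``cudaMallocManaged`` is the CUDA 6.0 managed-memory allocator:

::

    // Host-side C++ sketch using CUDA 6.0 managed memory directly; the
    // ispc-provided wrappers achieve the same without explicit copies.
    #include <cuda_runtime.h>

    extern "C" void scale(float *a, float b, int n, int nTasks); // hypothetical

    void run(int n)
    {
        float *a;
        cudaMallocManaged(&a, n * sizeof(float)); // visible to CPU and GPU
        for (int i = 0; i < n; i++)
            a[i] = (float)i;                      // fill on the host
        scale(a, 2.0f, n, 4);                     // call the exported function
        cudaDeviceSynchronize();                  // wait before reading back
        cudaFree(a);
    }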
Generation of PTX
------------------
To generate PTX.
Execution of PTX
----------------
To execute PTX
Compiling For The NVIDIA Kepler GPU
-----------------------------------
Compilation for the NVIDIA Kepler GPU is currently a several-step procedure.
First, we need to generate LLVM bitcode from the ``ispc`` source file:

::

    $ISPC_HOME/ispc foo.ispc --emit-llvm --target=nvptx -o foo.bc
If ``ispc`` is compiled with LLVM 3.2, the resulting bitcode can immediately be
compiled to PTX with the help of the ``ptxgen`` tool, which uses ``libNVVM``
[this requires a CUDA Toolkit installation]:

::

    $ISPC_HOME/ptxtools/ptxgen --use_fast_math foo.bc -o foo.ptx
Otherwise, we need to disassemble the bitcode with the ``llvm-dis`` tool that
comes with the LLVM 3.2 distribution; this "trick" is required to generate IR
compatible with ``libNVVM``:

::

    $LLVM32/bin/llvm-dis foo.bc -o foo.ll
    $ISPC_HOME/ptxtools/ptxgen --use_fast_math foo.ll -o foo.ptx
At this point the resulting PTX code can be run on the GPU with the help of,
for example, the CUDA Driver API. Alternatively, we provide a ``ptxcc`` tool,
which compiles the PTX code into an object file:

::

    $ISPC_HOME/ptxtools/ptxcc foo.ptx -o foo_cu.o -Xnvcc="--maxrregcount=64 -Xptxas=-v"
Finally, this object file can be linked with the main program via ``nvcc``:

::

    nvcc foo_cu.o foo_main.o -o foo
Hints
-----
A few things to observe:
Limitations & known issues
--------------------------
Disclaimer and Legal Information