Within each function that launches tasks, we now can easily track which
tasks that function launched, so that the sync at the end of the function
can just sync on the tasks launched by that function (not all tasks
launched by all functions.)
Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API. (The updated API is also documented in
the ispc user's guide.)
As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.
This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
Go back to running both sides of 'if' statements with masking and without
branching if we can determine that the code is relatively simple (as per
the simple cost model), and is safe to run even if the mask is 'all off'.
This gives a bit of a performance improvement for some of the examples
(most notably, the ray tracer), and is the code that one wants generated
in this case anyhow.
This is currently only used to decide whether it's worth doing an
"are all lanes running" check at the start of functions--for small
functions, it's not worth the overhead.
The cost is estimated relatively early in compilation (e.g. before
we know if an array access is a scatter/gather or not, before
constant folding, etc.), so there are many known shortcomings.
If no CPU is specified, use the host CPU type, not just a default of "nehalem".
Provide better features strings to the LLVM target machinery.
-> Thus ensuring that LLVM doesn't generate SSE>2 instructions for the SSE2
target (Fixes issue #82).
-> Slight code improvements from using cmovs in generated code now
Use the llvm popcnt intrinsic for the SSE2 target now (it now generates code
that doesn't call the popcnt instruction now that we properly tell LLVM
which instructions are and aren't available for SSE2.)
Set the Module's target appropriately when it's first created.
Compile separate 32 and 64 bit versions of the builtins-c bitcocde
and load the appropriate one based on the target we're compiling
for.
Fixes issue #77. Previously, it dumped out the entire module every time
a new function was defined, which got to be quite a lot of output by
the time the stdlib functions were all added!
- In the ispc-generated header files, a #define now indicates which compilation target
was used.
- The examples use utility routines from the new file examples/cpuid.h to check the
system's CPU's capabilities to see if it supports the ISA that was used for
compiling the example code and print error messages if things aren't going to
work...
Fixes bug #55. A number of tests were crashing on Windows due to the task
launch code using alloca to allocate space for the tasks' parameters. On
Windows, the stack isn't generally big enough for this to be a good idea.
Also added an alignment parmaeter to ISPCMalloc() to pass the alignment
requirement along.
build to handle recent change to API. If building with LLVM tot, a
version starting with or after this change must be used:
commit 276365dd4bc0c2160f91fd8062ae1fc90c86c324
Author: Evan Cheng <evan.cheng@apple.com>
Date: Thu Jun 30 01:53:36 2011 +0000
Fix the ridiculous SubtargetFeatures API where it implicitly expects CPU name to
be the first encoded as the first feature. It then uses the CPU name to look up
features / scheduling itineray even though clients know full well the CPU name
being used to query these properties.
The fix is to just have the clients explictly pass the CPU name!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@134127 91177308-0d34-0410-b5e6-96231b3b80d8