Commit Graph

26 Commits

Author SHA1 Message Date
Matt Pharr
4ca90272ba Fixes to build with LLVM 3.1 top of tree 2011-11-28 20:25:33 -08:00
Matt Pharr
975db80ef6 Add support for pointers to the language.
Pointers can be either uniform or varying, and behave correspondingly.
e.g.: "uniform float * varying" is a varying pointer to uniform float
data in memory, and "float * uniform" is a uniform pointer to varying
data in memory.  Like other types, pointers are varying by default.

Pointer-based expressions, & and *, sizeof, ->, pointer arithmetic,
and the array/pointer duality all behave as in C.  Array arguments
to functions are converted to pointers, also like C.

There is a built-in NULL for a null pointer value; conversion from
compile-time constant 0 values to NULL still needs to be implemented.

Other changes:
- Syntax for references has been updated to be C++ style; a useful
  warning is now issued if the "reference" keyword is used.
- It is now illegal to pass a varying lvalue as a reference parameter
  to a function; references are essentially uniform pointers.
  This case had previously been handled via special-case call-by-value-return
  code.  That path has been removed, now that varying pointers
  are available to handle this use case (and much more).
- Some stdlib routines have been updated to take pointers as
  arguments where appropriate (e.g. prefetch and the atomics).
  A number of others still need attention.
- All of the examples have been updated
- Many new tests

TODO: documentation
2011-11-27 13:09:59 -08:00
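
The uniform/varying pointer distinction in commit 975db80ef6 can be modeled in plain C++. This is only a conceptual sketch, not ispc code or compiler output: the gang size `kLanes` and all type and function names are hypothetical, and a "varying" value is simply represented as one array slot per program instance.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical gang size, chosen for illustration

// A varying float: one value per program instance.
using VaryingFloat = std::array<float, kLanes>;

// Models "uniform float * varying": each program instance holds its own
// pointer, and each pointer refers to a single float in memory.
using VaryingPtrToUniform = std::array<float*, kLanes>;

// Models "float * uniform": one pointer shared by all program instances,
// referring to a varying float (one value per instance).
using UniformPtrToVarying = VaryingFloat*;

// Dereferencing a varying pointer reads through each instance's own pointer.
VaryingFloat derefVarying(const VaryingPtrToUniform& p) {
    VaryingFloat out{};
    for (std::size_t i = 0; i < kLanes; ++i) out[i] = *p[i];
    return out;
}

// Dereferencing a uniform pointer yields the whole varying value it points to.
VaryingFloat derefUniform(UniformPtrToVarying p) { return *p; }
```

In this model a varying pointer dereference behaves like a gather, while a uniform pointer dereference is a single contiguous load.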
Matt Pharr
cabe358c0a Workaround change to linker behavior in LLVM 3.1
Now, the Linker::LinkModules() call doesn't link in any functions
marked as 'internal', which is problematic, since we'd like to have
just about all of the builtins marked as internal so that they are
eliminated after they've been inlined when they are in fact used.

This change removes all of the internal qualifiers in the builtins
and adds a lSetInternalFunctions() routine to builtins.cpp that
sets this property on the functions that need it after they've
been linked in by LinkModules().
2011-11-05 16:57:26 -07:00
Matt Pharr
afcd42028f Add support for function pointers.
Both uniform and varying function pointers are supported; when a function
is called through a varying function pointer, each unique function pointer
value across the running program instances is called once for the set of
active program instances that want to call it.
2011-11-03 16:14:14 -07:00
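
The dispatch rule for varying function pointers in commit afcd42028f (each unique pointer value is called once, for the set of active program instances that want it) might look like the following in scalar C++. This is an illustrative sketch, not the code ispc generates: `kLanes`, `LaneFn`, `callVarying`, and the two sample functions are hypothetical, and every active lane is assumed to hold a non-null pointer.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical gang size
using Mask = std::array<bool, kLanes>;
using Values = std::array<int, kLanes>;
using LaneFn = void (*)(const Mask& active, Values& v);

// Sample callees: each one only touches the lanes enabled in its mask.
void addOne(const Mask& active, Values& v) {
    for (std::size_t i = 0; i < kLanes; ++i) if (active[i]) v[i] += 1;
}
void doubleIt(const Mask& active, Values& v) {
    for (std::size_t i = 0; i < kLanes; ++i) if (active[i]) v[i] *= 2;
}

// Calls each distinct function pointer once, with a mask covering the
// active lanes that hold that pointer. Returns the number of calls made.
int callVarying(const std::array<LaneFn, kLanes>& fns, const Mask& active, Values& v) {
    Mask pending = active;
    int calls = 0;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (!pending[i]) continue;
        Mask group{};  // lanes that share fns[i]
        for (std::size_t j = i; j < kLanes; ++j) {
            if (pending[j] && fns[j] == fns[i]) {
                group[j] = true;
                pending[j] = false;
            }
        }
        fns[i](group, v);
        ++calls;
    }
    return calls;
}
```

With four lanes holding two distinct pointers, the loop makes two calls rather than four; inactive lanes never cause a call.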
Matt Pharr
422b8268a9 Add assert() statement support. Issue #106. 2011-10-15 13:50:05 -07:00
Matt Pharr
49454bc207 Fix silly bug in 16-wide AOS-SOA 3-vector routine 2011-10-11 16:16:56 -07:00
Matt Pharr
286c23426e Add "double-wide" sse2-x2 target.
i.e. run 8 program instances together, along the lines of the double-pumped
sse4-x2 target.
2011-10-11 15:17:31 -07:00
Matt Pharr
3cb0115dce Add routines to standard library to do efficient AOS/SOA conversions.
Currently, we just support 3- and 4-wide variants (i.e. xyzxyz.. and xyzwxyzw..),
for int32 and float types.
2011-10-10 10:56:06 -07:00
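
For the 3-wide case in commit 3cb0115dce, the data movement an AOS/SOA conversion performs can be sketched in scalar C++. This shows only the layout change, not the vectorized routines the commit adds; `Soa3`, `aosToSoa3`, and `soaToAos3` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Structure-of-arrays layout for 3-component elements.
struct Soa3 {
    std::vector<float> x, y, z;
};

// Converts xyzxyz... into separate x, y, and z arrays.
// A trailing partial element (fewer than 3 floats) is ignored.
Soa3 aosToSoa3(const std::vector<float>& aos) {
    const std::size_t n = aos.size() / 3;
    Soa3 out;
    out.x.reserve(n);
    out.y.reserve(n);
    out.z.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.x.push_back(aos[3 * i + 0]);
        out.y.push_back(aos[3 * i + 1]);
        out.z.push_back(aos[3 * i + 2]);
    }
    return out;
}

// Converts separate x, y, and z arrays back into xyzxyz...
// Assumes the three arrays have equal length.
std::vector<float> soaToAos3(const Soa3& soa) {
    std::vector<float> aos;
    aos.reserve(soa.x.size() * 3);
    for (std::size_t i = 0; i < soa.x.size(); ++i) {
        aos.push_back(soa.x[i]);
        aos.push_back(soa.y[i]);
        aos.push_back(soa.z[i]);
    }
    return aos;
}
```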
Matt Pharr
cb7976bbf6 Added updated task launch implementation that now tracks task groups.
Within each function that launches tasks, we now can easily track which
tasks that function launched, so that the sync at the end of the function
can just sync on the tasks launched by that function (not all tasks
launched by all functions.)

Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API.  (The updated API is also documented in
the ispc user's guide.)

As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.

This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
2011-09-30 11:20:53 -07:00
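
The task-group idea in commit cb7976bbf6 (a function syncs only on the tasks it launched itself) can be sketched as below. This is not the task system API that ispc calls: `TaskGroup`, `launch`, `sync`, and `sumOfSquares` are hypothetical names, and tasks run serially inside `sync()` to keep the example short, whereas a real task system would run them concurrently.

```cpp
#include <functional>
#include <vector>

// Tracks the tasks launched by one function so that sync() waits only on them.
class TaskGroup {
public:
    using Task = std::function<void(int index, int count)>;

    // Analogous to a "launch[n]" statement: enqueue `count` tasks at once.
    void launch(int count, Task task) {
        for (int i = 0; i < count; ++i) {
            pending_.push_back([task, i, count] { task(i, count); });
        }
    }

    // Runs (here: serially) only the tasks that belong to this group.
    void sync() {
        std::vector<std::function<void()>> run;
        run.swap(pending_);
        for (auto& t : run) t();
    }

private:
    std::vector<std::function<void()>> pending_;
};

// Each call owns its group, so a task may itself launch and sync more tasks.
int sumOfSquares(int n) {
    std::vector<int> partial(n, 0);
    TaskGroup group;
    group.launch(n, [&partial](int i, int) { partial[i] = i * i; });
    group.sync();
    int sum = 0;
    for (int p : partial) sum += p;
    return sum;
}
```

Because each function owns its group, a task that calls `sumOfSquares` syncs only on its own inner tasks, which is the property recursive launches need.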
Matt Pharr
5ee4d7fce8 Add comment 2011-09-30 11:11:52 -07:00
Matt Pharr
32a0a30cf5 Only allow exact matches for function overload resolution for builtins.
The intent is that the code in stdlib.ispc that is calling out to the built-ins
  should match argument types exactly (using explicit casts as needed), just
  for maximal clarity/safety.
2011-09-28 17:20:31 -07:00
Matt Pharr
aad269fdf4 Added support for 'uniform' global atomics.
Issue #93.
2011-09-28 16:06:07 -07:00
Matt Pharr
0c20483853 Make all "if" statements "coherent" ifs. Workaround for issue #74.
Using blend to do masked stores is unsafe if all of the lanes are off:
it may read from or write to invalid memory.  For now, this workaround
transforms all 'if' statements into coherent 'if's, ensuring that an
instruction only runs if at least one program instance wants to be running
it.

One nice thing about this change is that a number of implementations of
various builtins can be simplified, since they no longer need to confirm
that at least one program instance is running.

It might be nice to re-enable regular if statements in a future checkin,
but we'd want to make sure they don't have any masked loads or blended
masked stores in their statement lists.  There isn't a performance
impact for any of the examples with this change, so it's unclear if
this is important.

Note that this only impacts 'if' statements with a varying condition.
2011-09-12 16:25:08 -07:00
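
The control flow of a "coherent" if, as described in commit 0c20483853, can be sketched in scalar C++: the body is skipped entirely unless at least one program instance wants to run it. The names `kLanes`, `Mask`, `anyActive`, and `coherentIf` are hypothetical, and this models only the control flow, not the generated code.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical gang size
using Mask = std::array<bool, kLanes>;

bool anyActive(const Mask& m) {
    for (bool b : m) if (b) return true;
    return false;
}

// Runs `body` with the combined mask only if at least one lane is both
// executing and has a true condition. Returns whether the body ran.
template <typename Body>
bool coherentIf(const Mask& execMask, const Mask& cond, Body body) {
    Mask m{};
    for (std::size_t i = 0; i < kLanes; ++i) m[i] = execMask[i] && cond[i];
    if (!anyActive(m)) return false;  // all lanes off: skip the body entirely
    body(m);
    return true;
}
```

Since the body never runs with an all-off mask, code inside it does not need its own check that at least one program instance is running.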
Matt Pharr
83f22f1939 Add experimental --fast-masked-vload flag for SSE. 2011-09-12 12:29:33 -07:00
Matt Pharr
c86128e8ee AVX: go back to using blend (vs. masked store) when possible.
All of the masked store calls were inhibiting putting values into
registers, which in turn led to a lot of unnecessary stack traffic.
This approach seems to give better code in the end.
2011-09-07 11:26:49 -07:00
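
The difference between a masked store and a blend, which commits c86128e8ee and 0c20483853 both turn on, can be sketched in scalar C++. The function names are hypothetical and this is not the AVX implementation: the point is only that a blend does a full-width load and store, touching memory for every lane regardless of the mask.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical gang size
using Mask = std::array<bool, kLanes>;
using Values = std::array<float, kLanes>;

// Masked store: memory is written only for lanes that are on.
void maskedStore(Values& dst, const Values& src, const Mask& m) {
    for (std::size_t i = 0; i < kLanes; ++i) if (m[i]) dst[i] = src[i];
}

// Blend: load the old contents, select per lane, then store everything back.
void blendStore(Values& dst, const Values& src, const Mask& m) {
    Values old = dst;  // full-width load, even for lanes that are off
    Values merged{};
    for (std::size_t i = 0; i < kLanes; ++i) merged[i] = m[i] ? src[i] : old[i];
    dst = merged;      // full-width store
}
```

Both produce the same final contents, but the blend reads and writes all lanes, so it is only safe when the whole destination is valid memory.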
Matt Pharr
e144724979 Improve performance of global atomics, taking advantage of associativity.
For associative atomic ops (add, and, or, xor), we can take advantage of
their associativity to do just a single hardware atomic instruction, 
rather than one for each of the running program instances (as the previous
implementation did.)

The basic approach is to locally compute a reduction across the active
program instances with the given op and to then issue a single HW atomic
with that reduced value as the operand.  We then take the old value that
was stored in the location that is returned from the HW atomic op and
use that to compute the values to return to each of the program instances
(conceptually representing the cumulative effect of each of the preceding
program instances having performed their atomic operation.)

Issue #56.
2011-08-31 05:35:01 -07:00
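
For the add case, the approach in commit e144724979 can be sketched in scalar C++: reduce across the active program instances, issue a single hardware atomic, then reconstruct the value each instance would have seen had the instances gone one after another. `gangAtomicAdd` and the lane types are hypothetical names, and the lane ordering is an assumption of this sketch.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical gang size
using Mask = std::array<bool, kLanes>;
using Values = std::array<int, kLanes>;

// Returns, for each active lane, the value it would have observed if the
// active lanes had performed their atomic adds in lane order.
// Inactive lanes get 0 in the result.
Values gangAtomicAdd(std::atomic<int>& target, const Values& v, const Mask& active) {
    Values offset{};  // sum of the preceding active lanes' operands
    int total = 0;
    bool any = false;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (!active[i]) continue;
        offset[i] = total;
        total += v[i];
        any = true;
    }
    Values result{};
    if (!any) return result;  // no lane is running: issue no atomic at all

    const int old = target.fetch_add(total);  // the single hardware atomic
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (active[i]) result[i] = old + offset[i];
    }
    return result;
}
```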
Matt Pharr
4ab982bc16 Various AVX fixes (found by inspection).
Emit calls to masked_store, not masked_store_blend, when handling
  masked stores emitted by the frontend.
Fix bug in binary8to16 macro in builtins.m4
Fix bug in 16-wide version of __reduce_add_float
Remove blend function implementations for masked_store_blend for
  AVX; just forward those on to the corresponding real masked store
  functions.
2011-08-26 12:58:02 -07:00
Matt Pharr
606cbab0d4 Performance improvements for global min/max atomics. Issue #57.
Compute a "local" min/max across the active program instances and 
  then do a single atomic memory op.
Added a few tests to exercise global min/max atomics (which were
  previously untested!)
2011-08-26 10:35:24 -07:00
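
The "local" min followed by a single atomic op, as in commit 606cbab0d4, might be sketched like this. The names are hypothetical, and the compare-exchange loop stands in for a hardware atomic min only because standard C++ before C++26 has no `fetch_min`.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

constexpr std::size_t kLanes = 4;  // hypothetical gang size
using Mask = std::array<bool, kLanes>;
using Values = std::array<int, kLanes>;

// Atomic min built from compare-exchange. Returns the last value observed
// in `target` before any update made by this call.
int atomicFetchMin(std::atomic<int>& target, int value) {
    int old = target.load();
    // On failure, compare_exchange_weak reloads `old` with the current value.
    while (value < old && !target.compare_exchange_weak(old, value)) {
    }
    return old;
}

// Reduces across the active lanes first, then issues one atomic op.
// Returns false (and touches nothing) if no lane is active.
bool gangAtomicMin(std::atomic<int>& target, const Values& v, const Mask& active) {
    bool any = false;
    int localMin = 0;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (!active[i]) continue;
        if (!any || v[i] < localMin) localMin = v[i];
        any = true;
    }
    if (!any) return false;
    atomicFetchMin(target, localMin);
    return true;
}
```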
Matt Pharr
7756265503 Add double-pumped AVX target (i.e., run 16-wide). Not yet tested. 2011-08-20 11:28:22 +01:00
Matt Pharr
87cf05e0d2 Improve performance of 64-bit reduce_equal implementations.
Just pulling out the elements and doing a set of scalar equality tests
is the best approach for those (nearly 2x better than the rotate and
vector equality check that we use for 32-bit stuff).
2011-08-14 07:39:05 +01:00
Matt Pharr
ff608eef71 Change reduce_equal to return false if no instances are executing 2011-08-14 07:11:45 +01:00
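
The behavior described in commits 87cf05e0d2 and ff608eef71 (scalar equality tests over the elements, and false when no instances are executing) can be sketched as follows. `reduceEqual64` is a hypothetical name; this is a scalar model, not the library's implementation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // hypothetical gang size
using Mask = std::array<bool, kLanes>;
using Values64 = std::array<std::int64_t, kLanes>;

// True if every active lane holds the same value.
// Returns false when no lane is active.
bool reduceEqual64(const Values64& v, const Mask& active) {
    bool found = false;
    std::int64_t first = 0;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (!active[i]) continue;
        if (!found) {
            first = v[i];  // first active lane sets the reference value
            found = true;
        } else if (v[i] != first) {
            return false;  // scalar equality test failed
        }
    }
    return found;
}
```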
Matt Pharr
f868a63064 Add support for scan operations across program instances (add, and, or). 2011-08-13 20:11:41 +01:00
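
A scan across program instances, as added in commit f868a63064, can be sketched in scalar C++. The commit message does not say whether the scans are inclusive or exclusive; this sketch assumes exclusive semantics, and `exclusiveScan` and the lane types are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <functional>

constexpr std::size_t kLanes = 4;  // hypothetical gang size
using Mask = std::array<bool, kLanes>;
using Values = std::array<int, kLanes>;

// Exclusive scan over the active lanes: each active lane receives the
// combination of the values of the active lanes before it, starting from
// `identity`. Inactive lanes are skipped and get 0 in the result.
template <typename Op>
Values exclusiveScan(const Values& v, const Mask& active, int identity, Op op) {
    Values out{};
    int running = identity;
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (!active[i]) continue;
        out[i] = running;
        running = op(running, v[i]);
    }
    return out;
}
```

The same template covers the three operations the commit names: `std::plus<int>` with identity 0 for add, `std::bit_and<int>` with identity ~0 for and, and `std::bit_or<int>` with identity 0 for or.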
Matt Pharr
8c534d4d74 Add reduce_equal() function to standard library. 2011-08-10 15:55:55 -07:00
Matt Pharr
0ac4f7b620 Add various prefetch functions to the standard library. 2011-08-03 13:31:45 -07:00
Matt Pharr
a4bb6b5520 Add new example with implementation of Perlin Noise
~4.2x speedup versus serial on OSX / gcc.
~2.9x speedup versus serial on Windows / MSVC.
2011-08-01 10:33:18 +01:00
Matt Pharr
a552927a6a Cleanup implementation of target builtins code.
- Renamed stdlib-sse.ll to builtins-sse.ll (etc.) in an attempt to better indicate
the fact that the stuff in those files has a role beyond implementing stuff for
the standard library.
- Moved declarations of the various __pseudo_* functions from being done with LLVM
API calls in builtins.cpp to just straight up declarations in LLVM assembly
language in builtins.m4.  (Much less code to do it this way, and more clear what's
going on.)
2011-08-01 05:58:43 +01:00