aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Ilia Filippov	a174a90f86	Supporting dumping, switching off and debug printing of optimization phases	2013-08-01 11:37:52 +04:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	922895de69	Changing ISPC version to 1.4.5dev	2013-07-19 18:47:43 -07:00
Dmitry Babokin	28f0bce9f2	Release 1.4.4	2013-07-19 16:22:10 -07:00
Dmitry Babokin	594485c38c	Release 1.4.3	2013-06-25 18:38:21 +04:00
Dmitry Babokin	cf9ceb6bf9	Release 1.4.2, 11 June 2013	2013-06-11 17:18:54 +04:00
Dmitry Babokin	29ceb42b7b	Bumping version to 1.4.1dev	2013-05-28 19:58:27 +04:00
Dmitry Babokin	6c392ee4a1	Changes for 1.4.1 release	2013-05-28 19:46:30 +04:00
Dmitry Babokin	481bcc732b	Changes for 1.4.0 release	2013-05-27 16:48:41 +04:00
Dmitry Babokin	1a7ac8b804	Enable memory alignment management via compiler options	2013-05-24 10:29:01 +04:00
Dmitry Babokin	b6b9daa3c5	Enabling llvm 3.4	2013-05-13 19:25:31 +04:00
Dmitry Babokin	a0462fe1ee	#469 : Fix for multi-target compilation	2013-04-12 14:06:12 +04:00
Dmitry Babokin	0af2a13349	DataLayout is changed to be managed from a single place. v4-128-128 is added to generic DataLayout	2013-03-23 14:38:51 +04:00
Dmitry Babokin	7f0c92eb4d	Fix for #431 : memory leak due to multiple TargetMachine creation	2013-03-23 14:33:45 +04:00
Dmitry Babokin	0f86255279	Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets).	2013-03-23 14:28:05 +04:00
Dmitry Babokin	3f8a678c5a	Editorial change: fixing trailing white spaces and tabs	2013-03-18 16:17:55 +04:00
Dmitry Babokin	524939dc5b	Fix for issue #430	2013-02-27 18:03:07 +04:00
Matt Pharr	0bf1320a32	Remove support for building with LLVM 3.0	2013-01-06 12:27:53 -08:00
Peng Tu	16b0806d40	Fix LLVM TOT build issue.	2012-11-21 19:09:10 -08:00
Matt Pharr	be2108260e	Add --opt=force-aligned-memory option. This forces all vector loads/stores to be done assuming that the given pointer is aligned to the vector size, thus allowing the use of sometimes more-efficient instructions. (If it isn't the case that the memory is aligned, the program will fail!).	2012-09-14 13:49:45 -07:00
Jean-Luc Duprat	09bb36f58c	Updated the task system in the example directory to support: Cilk (cilk_for), OpenMP (#pragma omp parallel for), TBB (tbb::task_group and tbb::parallel_for) as well as a new pthreads-based model that fully subscribes the machine (good for KNC). With major contributions from Ingo Wald and James Brodman.	2012-08-28 11:13:12 -07:00
Matt Pharr	19d8f2e258	Generate FMA instructions with AVX2 (when possible). Issue #320.	2012-08-03 10:43:41 -07:00
Matt Pharr	10b79fb41b	Add support for non-factored variants of gather/scatter functions. We now have two ways of approaching gather/scatters with a common base pointer and with offset vectors. For targets with native gather/scatter, we just turn those into base + {1/2/4/8} * offsets. For targets without, we turn those into base + {1/2/4/8} * varying_offsets + const_offsets, where const_offsets is a compile-time constant. Infrastructure for issue #325.	2012-07-11 14:29:42 -07:00
Matt Pharr	2d8026625b	Always check the execution mask after break/continue/return. When "break", "continue", or "return" is used under varying control flow, we now always check the execution mask to see if all of the program instances are executing it. (Previously, this was only done with "cbreak", "ccontinue", and "creturn", which are now deprecated.) An important effect of this change is that it fixes a family of cases where we could end up running with an "all off" execution mask, which isn't supposed to happen, as it leads to all sorts of invalid behavior. This change does cause the volume rendering example to run 9% slower, but doesn't affect the other examples. Issue #257.	2012-07-06 11:09:11 -07:00
Matt Pharr	6aad4c7a39	Bump version number to 1.3.1dev	2012-07-05 13:35:34 -07:00
Matt Pharr	b69d783e09	Bump version to 1.3.0	2012-06-28 15:35:52 -07:00
Matt Pharr	6c7df4cb6b	Add initial support for "avx1.1" targets for Ivy Bridge. So far, only the use of the float/half conversion instructions distinguishes this from the "avx1" target. Partial work on issue #263.	2012-06-08 15:55:00 -07:00
Matt Pharr	1397dbdabc	Don't generate colorized output escapes when stderr isn't a TTY. When piping to a file, more/less, etc, this is generally undesirable. This behavior can be overridden with the --colorized-output command-line flag.	2012-06-04 09:20:57 -07:00
Matt Pharr	90db01d038	Represent MOVMSK'ed masks with int64s rather than int32s. This allows us to scale up to 64-wide execution.	2012-05-25 11:57:23 -07:00
Matt Pharr	64807dfb3b	Add AssertPos() macro that provides rough source location in error. It can sometimes be useful to know the general place we were in the program when an assertion hit; when the position is available / applicable, this macro is now used. Issue #268.	2012-05-25 10:59:45 -07:00
Matt Pharr	fbed0ac56b	Remove allOffMaskIsSafe from Target. The intent of this was to indicate whether it was safe to run code with an 'all off' mask on the given target (and then sometimes be more flexible about e.g. running both true and false blocks of if statements, etc.) The problem is that even if the architecture has full native mask support, it's still not safe to run 'uniform' memory operations with the mask all off. Even more tricky, we sometimes transform masked varying memory operations to uniform ones during optimization (e.g. gather->load and broadcast). This fixes a number of the tests/switch-* tests that were failing on the generic targets due to this issue.	2012-05-09 14:18:47 -07:00
Matt Pharr	0c1b206185	Pass log/exp/pow transcendentals through to targets that support them. Currently, this is the generic targets.	2012-05-03 13:49:56 -07:00
Matt Pharr	d99bd279e8	Add generic-32 target.	2012-05-03 11:11:06 -07:00
Matt Pharr	ee1fe3aa9f	Update build to handle existence of LLVM 3.2 dev branch. We now compile with LLVM 3.0, 3.1, and 3.2svn.	2012-05-03 08:25:25 -07:00
Matt Pharr	03b2b8ae8f	Bump version number to 1.2.3dev	2012-04-20 14:31:46 -07:00
Matt Pharr	c5f6653564	Bump version number to 1.2.2	2012-04-20 11:54:12 -07:00
Matt Pharr	fefa86e0cf	Remove LLVM_TYPE_CONST #define / usage. Now with LLVM 3.0 and beyond, types aren't const.	2012-04-15 20:11:27 -07:00
Matt Pharr	098c4910de	Remove support for building with LLVM 2.9. A forthcoming change uses some features of LLVM 3.0's new type system, and it's not worth back-porting this to also all work with LLVM 2.9.	2012-04-15 20:08:51 -07:00
Matt Pharr	5ece6fec04	Substantial rewrite (again) of decl handling. The decl.* code now no longer interacts with Symbols, but just returns names, types, initializer expressions, etc., as needed. This makes the code a bit more understandable. Fixes issues #171 and #130.	2012-04-12 17:28:30 -07:00
Matt Pharr	8475dc082a	Bump version number to 1.2.2dev	2012-04-06 16:16:50 -07:00
Matt Pharr	c8feee238b	Bump release number to 1.2.1	2012-04-06 15:30:54 -07:00
Matt Pharr	b813452d33	Don't issue a slew of warnings if a bogus cpu type is specified. Issue #221.	2012-04-03 06:13:28 -07:00
Matt Pharr	349ab0b9c5	Bump version number to 1.2.1dev	2012-03-20 12:46:23 -07:00
Matt Pharr	cb7edf2725	Set version to 1.2.0 for release builds	2012-03-20 11:13:50 -07:00
Matt Pharr	777343331e	Print numeric version number with --version.	2012-03-19 14:41:25 -07:00
Matt Pharr	db5db5aefd	Add native support for (AO)SOA data layout. There's now a SOA variability class (in addition to uniform, varying, and unbound variability); the SOA factor must be a positive power of 2. When applied to a type, the leaf elements of the type (i.e. atomic types, pointer types, and enum types) are widened out into arrays of the given SOA factor. For example, given struct Point { float x, y, z; }; Then "soa<8> Point" has a memory layout of "float x[8], y[8], z[8]". Furthermore, array indexing syntax has been augmented so that when indexing into arrays of SOA-variability data, the two-stage indexing (first into the array of soa<> elements and then into the leaf arrays of SOA data) is performed automatically.	2012-03-05 09:58:10 -08:00
Matt Pharr	73bf552cd6	Add support for coalescing memory accesses from gathers. There are two related optimizations that happen now. (These currently only apply for gathers where the mask is known to be all on, and to gathers that are accessing 32-bit sized elements, but both of these may be generalized in the future.) First, for any single gather, we are now more flexible in mapping it to individual memory operations. Previously, we would only either map it to a general gather (one scalar load per SIMD lane), or an unaligned vector load (if the program instances could be determined to be accessing a sequential set of locations in memory.) Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit), 4-wide, or 8-wide loads. Further, we now generate code that shuffles these loads around. Doing fewer, larger loads in this manner, when possible, can be more efficient. Second, we can coalesce memory accesses across multiple gathers. If we have a series of gathers without any memory writes in the middle, then we try to analyze their reads collectively and choose an efficient set of loads for them. Not only does this help if different gathers reuse values from the same location in memory, but it's specifically helpful when data with AOS layout is being accessed; in this case, we're often able to generate wide vector loads and appropriate shuffles automatically.	2012-02-10 13:10:39 -08:00
Matt Pharr	bb8e13e3c9	Add support for -I command-line argument to specify #include search directories.	2012-02-07 08:39:01 -08:00
Matt Pharr	3efbc71a01	Add fuzz testing of input programs. When the --fuzz-test command-line option is given, the input program will be randomly perturbed by the lexer in an effort to trigger assertions or crashes in the compiler (neither of which should ever happen, even for malformed programs.)	2012-02-06 15:34:47 -08:00
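
The SOA layout described in commit db5db5aefd can be illustrated with a small index calculation. This is only a sketch in Python, not ispc code: the helper names are hypothetical, and it assumes the two-stage indexing splits an index i into i / 8 (which soa<8> element) and i % 8 (which slot in the leaf array), which the commit message implies but does not spell out.

```python
# Hypothetical helpers; only the layout itself comes from the commit message:
# "soa<8> Point" is laid out in memory as float x[8], y[8], z[8].
SOA_WIDTH = 8                 # SOA factor; must be a positive power of 2
MEMBERS = ("x", "y", "z")     # leaf members of struct Point, in declaration order


def soa_float_index(i, member, width=SOA_WIDTH, members=MEMBERS):
    """Flat float index of points[i].member in an array of soa<width> Point."""
    block = i // width        # stage 1 (assumed): which soa<width> element
    lane = i % width          # stage 2 (assumed): which slot in the leaf array
    floats_per_block = len(members) * width
    return block * floats_per_block + members.index(member) * width + lane


def aos_float_index(i, member, members=MEMBERS):
    """The same lookup for a plain array-of-structures layout, for comparison."""
    return i * len(members) + members.index(member)
```

Under these assumptions, consecutive points have their x values next to each other in memory (indices 0, 1, 2, ...), whereas in the plain layout they are three floats apart.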
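
The two gather/scatter addressing forms from commit 10b79fb41b can be written out per lane. The following is a rough sketch, assuming the scale is one of 1, 2, 4 or 8 and that const_offsets holds one compile-time constant per lane; the function names are made up for illustration and are not the ones used in ispc.

```python
VALID_SCALES = (1, 2, 4, 8)


def native_gather_addresses(base, scale, offsets):
    """Targets with native gather/scatter: base + scale * offsets."""
    if scale not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4 or 8")
    return [base + scale * o for o in offsets]


def factored_gather_addresses(base, scale, varying_offsets, const_offsets):
    """Targets without: base + scale * varying_offsets + const_offsets."""
    if scale not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4 or 8")
    if len(varying_offsets) != len(const_offsets):
        raise ValueError("one constant offset per lane is assumed")
    return [base + scale * v + c
            for v, c in zip(varying_offsets, const_offsets)]
```

For example, reading a float member at byte offset 8 of a 12-byte struct for elements 0 to 3 could be expressed as factored_gather_addresses(base, 4, [0, 3, 6, 9], [8, 8, 8, 8]).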
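
Commit 90db01d038 moves MOVMSK'ed masks from int32 to int64. A minimal sketch of what such a mask looks like, assuming one bit per program instance with lane 0 in the least significant bit (the helper names are hypothetical, and Python integers stand in for int64):

```python
MASK_BITS = 64  # an int64 mask has room for one bit per lane, up to 64-wide


def movmsk(lanes):
    """Pack per-lane on/off flags into one integer, lane 0 in bit 0."""
    if len(lanes) > MASK_BITS:
        raise ValueError("more lanes than mask bits")
    mask = 0
    for lane, active in enumerate(lanes):
        if active:
            mask |= 1 << lane
    return mask


def all_off(mask):
    """True when no program instance is executing."""
    return mask == 0


def all_on(mask, width):
    """True when all `width` program instances are executing."""
    return mask == (1 << width) - 1
```

The all_off test is the kind of check that commit 2d8026625b describes performing after break, continue, and return under varying control flow.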

85 Commits