aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Matt Pharr	dc120f3962	Fix regression in masked_store_blend for generic target. In `ee1fe3aa9f`, the LLVM_VERSION define was updated to never have the 'svn' suffix and the build was updated to handle LLVM 3.2. This file had a check for LLVM_3_1svn that was no longer hitting. This fixes some issues with unnecessary loads and stores in generated C++ code for the generic targets.	2012-05-09 14:18:47 -07:00
Matt Pharr	c6241581a0	Add an extra parameter to __smear functions to encode return type. Now, the __smear* functions in generated C++ code have an unused first parameter of the desired return type; this allows us to have headers that include variants of __smear for multiple target widths. (This approach is necessary since we can't overload by return type in C++.) Issue #256.	2012-05-08 09:54:23 -07:00
Matt Pharr	ee7e367981	Do global dead code elimination early in optimization. This gives a 15-20% speedup in compilation time for simple programs (but only ~2% for the big 21k monster program).	2012-05-05 15:47:19 -07:00
Matt Pharr	0c1b206185	Pass log/exp/pow transcendentals through to targets that support them. Currently, this is the generic targets.	2012-05-03 13:49:56 -07:00
Matt Pharr	d99bd279e8	Add generic-32 target.	2012-05-03 11:11:06 -07:00
Matt Pharr	fd846fbe77	Fix bug in __gather_base_offsets_32. In short, we weren't correctly zeroing the compile-time constant portion of the offsets for lanes that aren't executing. (!) Fixes issue #235.	2012-04-12 10:28:15 -07:00
Matt Pharr	3b95452481	Add memcpy(), memmove() and memset() to the standard library. Issue #183.	2012-03-05 16:09:00 -08:00
Matt Pharr	c152ae3c32	Add single-precision asin() and acos() to stdlib. Issue #184.	2012-03-05 13:32:13 -08:00
Matt Pharr	5b4673e8eb	Fix build with LLVM 2.9.	2012-02-07 08:37:13 -08:00
Matt Pharr	f2fbc168af	Scalar target builtins bugfixes. Typo in __max_varying_double. Add declarations for half functions. Use the gen_scatter macro to get the scatter functions.	2012-01-29 13:47:44 -08:00
Gabe Weisz	c67a286aa6	Add support for 1-wide scalar target. Issue #40.	2012-01-29 06:36:07 -08:00
Matt Pharr	664dc3bdda	Add support for "new" and "delete" to the language. Issue #139.	2012-01-27 14:47:06 -08:00
Matt Pharr	d9c0f9315a	Fix generic targets: half conversion functions weren't declared. (Broken by `1867b5b31`).	2012-01-27 14:44:43 -08:00
Matt Pharr	24f58fa16a	Update per_lane macro to not use ID for lane number in macro expansion. This was leading to unintended consequences if WIDTH was used in macro code, which was undesirable.	2012-01-27 09:12:13 -08:00
Matt Pharr	1867b5b317	Use native float/half conversion instructions with the AVX2 target.	2012-01-24 15:33:38 -08:00
Matt Pharr	a5b7fca7e0	Extract constant offsets from gather/scatter base+offsets offset vectors. When we're able to turn a general gather/scatter into the "base + offsets" form, we now try to extract out any constant components of the offsets and then pass them as a separate parameter to the gather/scatter function implementation. We then in turn carefully emit code for the addressing calculation so that these constant offsets match LLVM's patterns to detect this case, such that we get the constant offsets directly encoded in the instruction's addressing calculation in many cases, saving arithmetic instructions to do these calculations. Improves performance of stencil by ~15%. Other workloads unchanged.	2012-01-24 14:41:15 -08:00
Matt Pharr	d805e8b183	Add clock() function to standard library. Also corrected the declaration of num_cores() to return a uniform value.	2012-01-22 13:05:27 -08:00
Matt Pharr	1bba9d4307	Improve atomic_swap_global() to take advantage of associativity. We now do a single atomic hardware swap and then effectively do swaps between the running program instances such that the result is the same as if they had happened to run a particular ordering of hardware swaps themselves. Also cleaned up __atomic_swap_uniform_* built-in implementations to not take the mask, which they weren't using anyway. Finishes Issue #56.	2012-01-20 10:37:33 -08:00
Matt Pharr	d14a2de168	Fix generic code emission when building with LLVM3.0/2.9. Specifically, don't use vector select for masked store blend there, but emit a call to undefined __masked_store_blend_*() functions. Added implementations of these functions to the sse4.h and generic-16.h in examples/intrinsics. (Calls to these will never be generated with LLVM 3.1).	2012-01-17 23:42:22 -07:00
Matt Pharr	58a0b4a20d	Add separate set of builtins for AVX2. (i.e., stop just reusing the ones for AVX1). For now the only difference is that the int/uint min/max functions call the new intrinsic for that. Once gather is available from LLVM, that will go here as well.	2012-01-13 14:40:01 -08:00
Matt Pharr	652215861e	Update dynamic target dispatch code to support AVX2.	2012-01-12 08:37:18 -08:00
Pierre-Antoine Lacaze	d84cf781da	Mingw does not have sysconf, use the msc way of finding processors.	2012-01-09 09:45:40 +01:00
Matt Pharr	234e5cd3e1	Use vector select for masked store blend if building with LLVM3.1	2012-01-04 12:59:03 -08:00
Matt Pharr	f75c94a8f1	Have aos/soa and broadcast/shuffle/rotate functions provided by the target. The SSE/AVX targets use the old versions from util.m4, but these functions are now passed through to the generic targets.	2012-01-04 12:59:03 -08:00
Matt Pharr	848a432640	Fix various small things that were broken with single-bit-per-lane masks. Also small cleanups to declarations, "no captures" added, etc.	2012-01-04 12:59:03 -08:00
Matt Pharr	562d61caff	Added masked load optimization pass. This pass handles the "all on" and "all off" mask cases appropriately. Also renamed load_masked stuff in built-ins to masked_load for consistency with masked_store.	2012-01-04 11:51:26 -08:00
Matt Pharr	1d9201fe3d	Add "generic" 4, 8, and 16-wide targets. When used, these targets end up with calls to undefined functions for all of the various special vector stuff ispc needs to compile ispc programs (masked store, gather, min/max, sqrt, etc.). These targets are not yet useful for anything, but are a step toward having an option to emit C++ code with calls out to intrinsics. Reorganized the directory structure a bit and put the LLVM bitcode used to define target-specific stuff (as well as some generic built-ins stuff) into a builtins/ directory. Note that for building on Windows, it's now necessary to set an LLVM_VERSION environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)	2011-12-19 13:46:50 -08:00

27 Commits
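
Note on c6241581a0: the commit message explains that the __smear* functions take an unused first parameter of the desired return type, because C++ cannot overload on return type alone. A minimal sketch of that idea follows. The names Vec, vec4_f, vec8_f and smear_float are hypothetical stand-ins for illustration, not the identifiers used in ispc's generated code or headers.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the vector types a generic-target header might define.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> v;
};

using vec4_f = Vec<float, 4>;  // assumed 4-wide float vector
using vec8_f = Vec<float, 8>;  // assumed 8-wide float vector

// The first parameter is never read. Its type alone selects the overload,
// which lets one header carry variants for several target widths.
inline vec4_f smear_float(vec4_f /*unused return-type tag*/, float x) {
    vec4_f r;
    r.v.fill(x);  // broadcast the scalar to every lane
    return r;
}

inline vec8_f smear_float(vec8_f /*unused return-type tag*/, float x) {
    vec8_f r;
    r.v.fill(x);
    return r;
}

// Small self-check: each call resolves to the overload matching its tag type.
inline bool smear_demo() {
    vec4_f a = smear_float(vec4_f(), 1.5f);
    vec8_f b = smear_float(vec8_f(), 2.5f);
    return a.v.size() == 4 && b.v.size() == 8 &&
           a.v[3] == 1.5f && b.v[7] == 2.5f;
}
```

A caller writes smear_float(vec8_f(), x) to get the 8-wide variant. Without the tag argument the two overloads would differ only in return type, which C++ rejects.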