aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
james.brodman	28080b0c22	Fix build against 3.4	2013-08-27 16:56:00 -04:00
james.brodman	be3a40e70b	Fix for 3.4	2013-08-27 15:15:16 -04:00
Matt Pharr	0c5742b6f8	Implement new naming scheme for --target. Now targets are named like "<isa>-i<mask size>x<gang size>", e.g. "sse4-i8x16", or "avx2-i32x16". The old target names are still supported.	2013-08-08 19:23:44 -07:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm Conflicts: Makefile builtins.cpp ispc.cpp ispc.h ispc.vcxproj opt.cpp	2013-08-06 17:39:21 -07:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1" This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Ilia Filippov	a174a90f86	Supporting dumping, switching off and debug printing of optimization phases	2013-08-01 11:37:52 +04:00
Matt Pharr	4f48d3258a	Documentation updates for NEON	2013-07-31 20:06:04 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
Matt Pharr	780b0dfe47	Add SSE4-16 target. Along the lines of sse4-8, this is an 8-wide target for SSE4, using 16-bit elements for the mask. It's thus (in principle) the best target for SIMD computation with 16-bit datatypes.	2013-07-25 09:46:01 -07:00
Matt Pharr	53414f12e6	Add SSE4 target optimized for computation with 8-bit datatypes. This change adds a new 'sse4-8' target, where programCount is 16 and the mask element size is 8-bits. (i.e. the most appropriate sizing of the mask for SIMD computation with 8-bit datatypes.)	2013-07-23 17:30:32 -07:00
Matt Pharr	068fd8098c	Explicitly set armv7-eabi target triple on ARM. This lets the compiler generate FMA instructions, which seems desirable.	2013-07-20 11:19:10 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs: - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	2267f278d2	Fix for #503 - avoid omitting frame pointer on Win32	2013-06-04 14:51:36 +04:00
Dmitry Babokin	1a7ac8b804	Enable memory alignment management via compiler options	2013-05-24 10:29:01 +04:00
Dmitry Babokin	b6b9daa3c5	Enabling llvm 3.4	2013-05-13 19:25:31 +04:00
Dmitry Babokin	95950885cf	Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS.	2013-04-26 20:33:24 +04:00
Dmitry Babokin	a0462fe1ee	#469 : Fix for multi-target compilation	2013-04-12 14:06:12 +04:00
Dmitry Babokin	0af2a13349	DataLayout is changed to be managed from single place. v4-128-128 is added to generic DataLayout	2013-03-23 14:38:51 +04:00
Dmitry Babokin	7f0c92eb4d	Fix for #431 : memory leak due to multiple TargetMachine creation	2013-03-23 14:33:45 +04:00
Dmitry Babokin	0f86255279	Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets).	2013-03-23 14:28:05 +04:00
Dmitry Babokin	3f8a678c5a	Editorial change: fixing trailing white spaces and tabs	2013-03-18 16:17:55 +04:00
Dmitry Babokin	524939dc5b	Fix for issue #430	2013-02-27 18:03:07 +04:00
Matt Pharr	0bf1320a32	Remove support for building with LLVM 3.0	2013-01-06 12:27:53 -08:00
Matt Pharr	63dd7d9859	Fix build to work with LLVM top-of-tree again	2013-01-06 12:02:08 -08:00
Matt Pharr	8cbfde6092	Small fixes to build with LLVM top-of-tree (now numbered as version 3.3)	2012-12-02 14:29:24 -08:00
Matt Pharr	172a189c6f	Fix build with LLVM top-of-tree	2012-10-17 11:11:50 -07:00
Matt Pharr	be2108260e	Add --opt=force-aligned-memory option. This forces all vector loads/stores to be done assuming that the given pointer is aligned to the vector size, thus allowing the use of sometimes more-efficient instructions. (If it isn't the case that the memory is aligned, the program will fail!).	2012-09-14 13:49:45 -07:00
Matt Pharr	19d8f2e258	Generate FMA instructions with AVX2 (when possible). Issue #320.	2012-08-03 10:43:41 -07:00
Matt Pharr	0bb4d282e2	Add sys/types.h include for linux/osx.	2012-07-23 08:32:41 -07:00
Matt Pharr	e9fe9f5043	Add cpu strings for Ivy Bridge and HSW. Default to avx2 ISA for HSW CPUs.	2012-07-23 08:24:18 -07:00
Matt Pharr	51210a869b	Support core-avx-i and core-avx2 CPU types. (And map them to avx1.1 and avx2 targets, respectively.)	2012-07-19 10:15:59 -07:00
Matt Pharr	6a410fc30e	Emit gather instructions for the AVX2 targets. Issue #308.	2012-07-13 12:29:05 -07:00
Matt Pharr	98b2e0e426	Fixes for intrinsics unsupported in earlier LLVM versions. Specifically, don't use the half/float conversion routines with LLVM 3.0, and don't try to use RDRAND with anything before LLVM 3.2.	2012-07-13 12:14:10 -07:00
Matt Pharr	371d4be8ef	Fix bugs in detection of Ivy Bridge systems. We were incorrectly characterizing them as basic AVX1 without further extensions, due to a bug in the logic to check CPU features.	2012-07-12 14:11:15 -07:00
Matt Pharr	216ac4b1a4	Stop factoring out constant offsets for gather/scatter if instr is available. For KNC (gather/scatter), it's not helpful to factor base+offsets gathers and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets. Now, if a HW instruction is available for gather/scatter, we just factor into base + {1/2/4/8} * offsets (if possible). Not only is this simpler, but it's also what we need to pass a value along to the scale by 2/4/8 available directly in those instructions. Finishes issue #325.	2012-07-11 14:52:29 -07:00
Matt Pharr	10b79fb41b	Add support for non-factored variants of gather/scatter functions. We now have two ways of approaching gather/scatters with a common base pointer and with offset vectors. For targets with native gather/scatter, we just turn those into base + {1/2/4/8} * offsets. For targets without, we turn those into base + {1/2/4/8} * varying_offsets + const_offsets, where const_offsets is a compile-time constant. Infrastructure for issue #325.	2012-07-11 14:29:42 -07:00
Matt Pharr	aabbdba068	Switch a few remaining fprintf() calls to use Warning()/Error().	2012-07-06 12:56:45 -07:00
Matt Pharr	4186ef204d	Fix build with LLVM top of tree.	2012-07-05 13:35:01 -07:00
Nicolas Trangez	3a007f939a	Build: Include unistd.h where required. Some modules require an include of unistd.h (e.g. for getcwd and isatty definitions). These changes were required to build successfully on a Fedora 17 system, using GCC 4.7.0 & glibc-headers 2.15.	2012-07-04 14:49:00 +02:00
Matt Pharr	f38770bf2a	Fix build with LLVM ToT	2012-06-28 07:36:10 -07:00
Matt Pharr	40a295e951	Fix bug where "avx-x2" target would cause AVX1.1 to be used.	2012-06-12 13:37:38 -07:00
Matt Pharr	6c7df4cb6b	Add initial support for "avx1.1" targets for Ivy Bridge. So far, only the use of the float/half conversion instructions distinguishes this from the "avx1" target. Partial work on issue #263.	2012-06-08 15:55:00 -07:00
Matt Pharr	1397dbdabc	Don't generate colorized output escapes when stderr isn't a TTY. When piping to a file, more/less, etc., this is generally undesirable. This behavior can be overridden with the --colorized-output command-line flag.	2012-06-04 09:20:57 -07:00
Matt Pharr	449d956966	Add support for generic-64 target.	2012-05-25 11:57:28 -07:00
Matt Pharr	72c41f104e	Fix various malformed program crashes.	2012-05-18 10:44:45 -07:00
Matt Pharr	299ae186f1	Expect support for half and transcendentals from all generic targets	2012-05-18 06:13:45 -07:00
Matt Pharr	fbed0ac56b	Remove allOffMaskIsSafe from Target. The intent of this was to indicate whether it was safe to run code with an 'all off' mask on the given target (and then sometimes be more flexible about e.g. running both true and false blocks of if statements, etc.) The problem is that even if the architecture has full native mask support, it's still not safe to run 'uniform' memory operations with the mask all off. Even more tricky, we sometimes transform masked varying memory operations to uniform ones during optimization (e.g. gather->load and broadcast). This fixes a number of the tests/switch-* tests that were failing on the generic targets due to this issue.	2012-05-09 14:18:47 -07:00
Matt Pharr	0c1b206185	Pass log/exp/pow transcendentals through to targets that support them. Currently, this is the generic targets.	2012-05-03 13:49:56 -07:00
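
Commit 0c5742b6f8 in the log above describes the new --target naming scheme, "<isa>-i<mask size>x<gang size>", with "sse4-i8x16" and "avx2-i32x16" as examples. The snippet below is a minimal sketch of how such a name could be split into its three parts. It is not taken from the ispc sources: the names `TargetName` and `parse_target` and the regular expression are assumptions made for illustration, and the older target names (which the commit says are still supported) are deliberately not handled.

```python
import re
from typing import NamedTuple


class TargetName(NamedTuple):
    """Hypothetical container for the three parts of a new-style target name."""
    isa: str        # e.g. "sse4", "avx2"
    mask_bits: int  # size of each mask element, in bits
    gang_size: int  # number of program instances in a gang


# Assumed pattern for "<isa>-i<mask size>x<gang size>"; the set of characters
# allowed in <isa> is a guess based on names such as "avx1.1" in the log.
_TARGET_RE = re.compile(r"^(?P<isa>[a-z0-9.]+)-i(?P<mask>\d+)x(?P<gang>\d+)$")


def parse_target(name: str) -> TargetName:
    """Split a new-style target name; raise ValueError for anything else."""
    m = _TARGET_RE.match(name)
    if m is None:
        raise ValueError(f"not a new-style target name: {name!r}")
    return TargetName(m["isa"], int(m["mask"]), int(m["gang"]))
```

For instance, `parse_target("sse4-i8x16")` yields an ISA of "sse4", 8-bit mask elements, and a gang size of 16, while an old-style name such as "sse4-8" is rejected by this sketch rather than mapped to its new equivalent.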

106 Commits