Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
Along the lines of sse4-16, this is an 8-wide target for SSE4, using
16-bit elements for the mask. It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.
This change adds a new 'sse4-16' target, where programCount is 16 and
the mask element size is 8 bits (i.e. the most appropriate sizing of the
mask for SIMD computation with 8-bit datatypes).
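As a rough sketch (not part of these commits; the function name and data
are made up), this is the kind of 8-bit kernel such a target is aimed at:

    // Hypothetical ispc kernel over 8-bit data. On a 16-wide target with
    // 8-bit mask elements (sse4-16 above), each mask element matches the
    // width of the data, so masked operations need no widening.
    export void add_bytes(uniform int8 a[], uniform int8 b[],
                          uniform int8 result[], uniform int count) {
        foreach (i = 0 ... count) {
            result[i] = a[i] + b[i];
        }
    }

Compiled with something like "ispc --target=sse4-16 kernel.ispc", a single
128-bit register then holds all 16 of the 8-bit lanes.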
Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests
pass, and all examples compile and run correctly. Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.
Current open issues/TODOs:
- Code quality looks decent, but hasn't been carefully examined. Known
issues/opportunities for improvement include:
- fp32 vector divide is done as a series of scalar divides rather than
a vector divide (which I believe exists, but I may be mistaken.)
This is particularly harmful to examples/rt, which only runs ~1.5x
faster with ispc, likely due to long chains of scalar divides.
- The compiler isn't generating a vmin.f32 for e.g. the final scalar
min in reduce_min(); instead it's generating a compare and then a
select instruction (and similarly elsewhere). See the sketch after this list.
- There are some additional FIXMEs in builtins/target-neon.ll that
include both a few pieces of missing functionality (e.g. rounding
doubles) as well as places that deserve attention for possible
code quality improvements.
- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
supported; LLVM supports many other ARM CPUs and ispc should provide
access to all of the ones that have NEON support (and aren't too
obscure.)
- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
only when the compiler runs on an ARM host, though).
- The Windows build hasn't been tested (though I've tried to update
ispc.vcxproj appropriately). It may just work, but will more likely
have various small issues.
- Anything related to 64-bit ARM has seen no attention.
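As a point of reference for the reduce_min() item above (the kernel below
is hypothetical, not from this change), the per-lane min() calls vectorize
fine; it's the final cross-lane reduction in reduce_min() that currently
becomes a compare plus select instead of a vmin.f32:

    // Hypothetical reduction: min() across program instances vectorizes,
    // while the closing reduce_min() is where the compare+select sequence
    // (rather than a single vmin.f32) is currently generated on NEON.
    export uniform float array_min(uniform float vals[], uniform int count) {
        float localMin = 1e30;
        foreach (i = 0 ... count) {
            localMin = min(localMin, vals[i]);
        }
        return reduce_min(localMin);
    }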
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, thus allowing the use of sometimes
more-efficient instructions. (If the memory isn't actually aligned, the
program will fail!)
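A hypothetical illustration of what this implies for callers (the kernel
and names below are made up, not part of this change):

    // With forced aligned memory operations, the vector loads/stores
    // generated for this loop assume `data` is aligned to the target's
    // vector size (e.g. 16 bytes for a 4-wide SSE target), so the calling
    // C/C++ code must allocate it with an aligned allocator.
    export void scale_in_place(uniform float data[], uniform float s,
                               uniform int count) {
        foreach (i = 0 ... count) {
            data[i] *= s;
        }
    }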
For KNC (gather/scatter), it's not helpful to factor base+offsets gathers
and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets.
Now, if a HW instruction is available for gather/scatter, we just factor
into base + {1/2/4/8} * offsets (if possible). Not only is this simpler,
it also gives us the value to pass along for the scale-by-2/4/8 that is
available directly in those instructions.
Finishes issue #325.
We now have two ways of approaching gather/scatters with a common base
pointer and with offset vectors. For targets with native gather/scatter,
we just turn those into base + {1/2/4/8}*offsets. For targets without,
we turn those into base + {1/2/4/8}*varying_offsets + const_offsets,
where const_offsets is a compile-time constant.
Infrastructure for issue #325.
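To make the factoring concrete, here is a hypothetical gather (the Point
struct and gather_y function are made up for illustration):

    // The address of pts[j].y decomposes as base (pts) + 8 * varying j + 4,
    // i.e. base + {1/2/4/8}*varying_offsets + const_offsets, with a scale
    // of 8 (the struct size) and a constant field offset of 4. For targets
    // with native gather/scatter, the factoring is instead just
    // base + {1/2/4/8}*offsets (when possible), so the scale can be handed
    // directly to the hardware instruction.
    struct Point { float x, y; };

    export void gather_y(uniform Point pts[], uniform int idx[],
                         uniform float out[], uniform int count) {
        foreach (i = 0 ... count) {
            int j = idx[i];
            out[i] = pts[j].y;    // gather with a common base pointer
        }
    }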
Some modules require an include of unistd.h (e.g. for getcwd and isatty
definitions).
These changes were required to build successfully on a Fedora 17 system,
using GCC 4.7.0 & glibc-headers 2.15.