aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Ilia Filippov	a174a90f86	Supporting dumping, switching off and debug printing of optimization phases	2013-08-01 11:37:52 +04:00
Matt Pharr	068fd8098c	Explicitly set armv7-eabi target triple on ARM. This lets the compiler generate FMA instructions, which seems desirable.	2013-07-20 11:19:10 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs: - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	2267f278d2	Fix for #503 - avoid omitting frame pointer on Win32	2013-06-04 14:51:36 +04:00
Dmitry Babokin	1a7ac8b804	Enable memory alignment management via compiler options	2013-05-24 10:29:01 +04:00
Dmitry Babokin	b6b9daa3c5	Enabling llvm 3.4	2013-05-13 19:25:31 +04:00
Dmitry Babokin	95950885cf	Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS.	2013-04-26 20:33:24 +04:00
Dmitry Babokin	a0462fe1ee	#469 : Fix for multi-target compilation	2013-04-12 14:06:12 +04:00
Dmitry Babokin	0af2a13349	DataLayout is changed to be managed from a single place. v4-128-128 is added to generic DataLayout	2013-03-23 14:38:51 +04:00
Dmitry Babokin	7f0c92eb4d	Fix for #431 : memory leak due to multiple TargetMachine creation	2013-03-23 14:33:45 +04:00
Dmitry Babokin	0f86255279	Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets).	2013-03-23 14:28:05 +04:00
Dmitry Babokin	3f8a678c5a	Editorial change: fixing trailing white spaces and tabs	2013-03-18 16:17:55 +04:00
Dmitry Babokin	524939dc5b	Fix for issue #430	2013-02-27 18:03:07 +04:00
Matt Pharr	0bf1320a32	Remove support for building with LLVM 3.0	2013-01-06 12:27:53 -08:00
Matt Pharr	63dd7d9859	Fix build to work with LLVM top-of-tree again	2013-01-06 12:02:08 -08:00
Matt Pharr	8cbfde6092	Small fixes to build with LLVM top-of-tree (now numbered as version 3.3)	2012-12-02 14:29:24 -08:00
Matt Pharr	172a189c6f	Fix build with LLVM top-of-tree	2012-10-17 11:11:50 -07:00
Matt Pharr	be2108260e	Add --opt=force-aligned-memory option. This forces all vector loads/stores to be done assuming that the given pointer is aligned to the vector size, thus allowing the use of sometimes more-efficient instructions. (If it isn't the case that the memory is aligned, the program will fail!).	2012-09-14 13:49:45 -07:00
Matt Pharr	19d8f2e258	Generate FMA instructions with AVX2 (when possible). Issue #320.	2012-08-03 10:43:41 -07:00
Matt Pharr	0bb4d282e2	Add sys/types.h include for linux/osx.	2012-07-23 08:32:41 -07:00
Matt Pharr	e9fe9f5043	Add cpu strings for Ivy Bridge and HSW. Default to avx2 ISA for HSW CPUs.	2012-07-23 08:24:18 -07:00
Matt Pharr	51210a869b	Support core-avx-i and core-avx2 CPU types. (And map them to avx1.1 and avx2 targets, respectively.)	2012-07-19 10:15:59 -07:00
Matt Pharr	6a410fc30e	Emit gather instructions for the AVX2 targets. Issue #308.	2012-07-13 12:29:05 -07:00
Matt Pharr	98b2e0e426	Fixes for intrinsics unsupported in earlier LLVM versions. Specifically, don't use the half/float conversion routines with LLVM 3.0, and don't try to use RDRAND with anything before LLVM 3.2.	2012-07-13 12:14:10 -07:00
Matt Pharr	371d4be8ef	Fix bugs in detection of Ivy Bridge systems. We were incorrectly characterizing them as basic AVX1 without further extensions, due to a bug in the logic to check CPU features.	2012-07-12 14:11:15 -07:00
Matt Pharr	216ac4b1a4	Stop factoring out constant offsets for gather/scatter if instr is available. For KNC (gather/scatter), it's not helpful to factor base+offsets gathers and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets. Now, if a HW instruction is available for gather/scatter, we just factor into base + {1/2/4/8} * offsets (if possible). Not only is this simpler, but it's also what we need to pass a value along to the scale by 2/4/8 available directly in those instructions. Finishes issue #325.	2012-07-11 14:52:29 -07:00
Matt Pharr	10b79fb41b	Add support for non-factored variants of gather/scatter functions. We now have two ways of approaching gather/scatters with a common base pointer and with offset vectors. For targets with native gather/scatter, we just turn those into base + {1/2/4/8} * offsets. For targets without, we turn those into base + {1/2/4/8} * varying_offsets + const_offsets, where const_offsets is a compile-time constant. Infrastructure for issue #325.	2012-07-11 14:29:42 -07:00
Matt Pharr	aabbdba068	Switch a few remaining fprintf() calls to use Warning()/Error().	2012-07-06 12:56:45 -07:00
Matt Pharr	4186ef204d	Fix build with LLVM top of tree.	2012-07-05 13:35:01 -07:00
Nicolas Trangez	3a007f939a	Build: Include unistd.h where required. Some modules require an include of unistd.h (e.g. for getcwd and isatty definitions). These changes were required to build successfully on a Fedora 17 system, using GCC 4.7.0 & glibc-headers 2.15.	2012-07-04 14:49:00 +02:00
Matt Pharr	f38770bf2a	Fix build with LLVM ToT	2012-06-28 07:36:10 -07:00
Matt Pharr	40a295e951	Fix bug where "avx-x2" target would cause AVX1.1 to be used.	2012-06-12 13:37:38 -07:00
Matt Pharr	6c7df4cb6b	Add initial support for "avx1.1" targets for Ivy Bridge. So far, only the use of the float/half conversion instructions distinguishes this from the "avx1" target. Partial work on issue #263.	2012-06-08 15:55:00 -07:00
Matt Pharr	1397dbdabc	Don't generate colorized output escapes when stderr isn't a TTY. When piping to a file, more/less, etc, this is generally undesirable. This behavior can be overridden with the --colorized-output command-line flag.	2012-06-04 09:20:57 -07:00
Matt Pharr	449d956966	Add support for generic-64 target.	2012-05-25 11:57:28 -07:00
Matt Pharr	72c41f104e	Fix various malformed program crashes.	2012-05-18 10:44:45 -07:00
Matt Pharr	299ae186f1	Expect support for half and transcendentals from all generic targets	2012-05-18 06:13:45 -07:00
Matt Pharr	fbed0ac56b	Remove allOffMaskIsSafe from Target. The intent of this was to indicate whether it was safe to run code with an 'all off' mask on the given target (and then sometimes be more flexible about e.g. running both true and false blocks of if statements, etc.) The problem is that even if the architecture has full native mask support, it's still not safe to run 'uniform' memory operations with the mask all off. Even more tricky, we sometimes transform masked varying memory operations to uniform ones during optimization (e.g. gather->load and broadcast). This fixes a number of the tests/switch-* tests that were failing on the generic targets due to this issue.	2012-05-09 14:18:47 -07:00
Matt Pharr	0c1b206185	Pass log/exp/pow transcendentals through to targets that support them. Currently, this is the generic targets.	2012-05-03 13:49:56 -07:00
Matt Pharr	d99bd279e8	Add generic-32 target.	2012-05-03 11:11:06 -07:00
Matt Pharr	ee1fe3aa9f	Update build to handle existence of LLVM 3.2 dev branch. We now compile with LLVM 3.0, 3.1, and 3.2svn.	2012-05-03 08:25:25 -07:00
Matt Pharr	d5cc2ad643	Call Verify() methods of various debugging llvm::DI* types after creation.	2012-04-25 08:43:11 -10:00
Matt Pharr	fefa86e0cf	Remove LLVM_TYPE_CONST #define / usage. Now with LLVM 3.0 and beyond, types aren't const.	2012-04-15 20:11:27 -07:00
Matt Pharr	972043c146	Fix serious bug in handling constant-valued initializers. In InitSymbol(), we try to be smart and emit a memcpy when there are a number of values to store (e.g. for arrays, structs, etc.) Unfortunately, this wasn't working as desired for bools (i.e. i1 types), since the SizeOf() call that tried to figure out how many bytes to copy would return 0 bytes, due to dividing the number of bits to copy by 8. Fixes issue #234.	2012-04-09 14:23:08 -07:00
Matt Pharr	b813452d33	Don't issue a slew of warnings if a bogus cpu type is specified. Issue #221.	2012-04-03 06:13:28 -07:00
Matt Pharr	560bf5ca09	Updated logic for selecting target ISA when not specified. Now, if the user specified a CPU then we base the ISA choice on that--only if no CPU and no target is specified do we use the CPUID-based check to pick a vector ISA. Improves on the fix for #205.	2012-03-30 16:36:12 -07:00
Matt Pharr	b3c5043dcc	Don't enable llvm's UnsafeFPMath option when --opt=fast-math is supplied. This was causing functions like round() to fail on SSE2, since it has code that does: x += 0x1.0p23f; x -= 0x1.0p23f; which was in turn being undesirably optimized away. Fixes issue #211.	2012-03-28 10:26:39 -07:00
Matt Pharr	3270e2bf5a	Call CPUID to more reliably detect level of SSE/AVX that the host supports. Fixes, I hope, issue #205.	2012-03-28 09:20:06 -07:00
Matt Pharr	73bf552cd6	Add support for coalescing memory accesses from gathers. There are two related optimizations that happen now. (These currently only apply for gathers where the mask is known to be all on, and to gathers that are accessing 32-bit sized elements, but both of these may be generalized in the future.) First, for any single gather, we are now more flexible in mapping it to individual memory operations. Previously, we would only either map it to a general gather (one scalar load per SIMD lane), or an unaligned vector load (if the program instances could be determined to be accessing a sequential set of locations in memory.) Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit), 4-wide, or 8-wide loads. Further, we now generate code that shuffles these loads around. Doing fewer, larger loads in this manner, when possible, can be more efficient. Second, we can coalesce memory accesses across multiple gathers. If we have a series of gathers without any memory writes in the middle, then we try to analyze their reads collectively and choose an efficient set of loads for them. Not only does this help if different gathers reuse values from the same location in memory, but it's specifically helpful when data with AOS layout is being accessed; in this case, we're often able to generate wide vector loads and appropriate shuffles automatically.	2012-02-10 13:10:39 -08:00
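
The 73bf552cd6 entry above describes breaking gathers into scalar, 2-wide, 4-wide, or 8-wide loads and coalescing several gathers into one set of loads. The following is a minimal sketch of that idea only, not ispc's implementation: `LoadOp`, `PlanLoads`, and the "more than half of the loaded elements are used" rule are all hypothetical, chosen to keep the example short. It assumes every lane offset is a compile-time constant counted in 32-bit elements from a common base pointer, and that the mask is all on.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <iterator>
#include <set>
#include <vector>

// One planned memory operation: load `width` consecutive 32-bit elements
// starting at element offset `start` from the common base pointer.
// (Hypothetical type, for illustration only.)
struct LoadOp {
    int64_t start;
    int width;  // 1, 2, 4, or 8 elements
};

// Hypothetical greedy planner. Starting at the smallest offset not yet
// covered, pick the widest load (8, 4, then 2 elements) for which more
// than half of the loaded elements are actually needed; otherwise fall
// back to a scalar load. Offsets from several gathers can be passed in
// together, which is what lets them share loads.
std::vector<LoadOp> PlanLoads(const std::vector<int64_t> &laneOffsets) {
    // Sorted and de-duplicated: lanes reading the same element share it.
    std::set<int64_t> needed(laneOffsets.begin(), laneOffsets.end());
    std::vector<LoadOp> plan;
    while (!needed.empty()) {
        int64_t start = *needed.begin();
        int chosen = 1;
        for (int width : {8, 4, 2}) {
            // Number of needed elements inside [start, start + width).
            auto end = needed.lower_bound(start + width);
            int used = static_cast<int>(std::distance(needed.begin(), end));
            if (2 * used > width) {
                chosen = width;
                break;
            }
        }
        plan.push_back({start, chosen});
        // Everything inside the chosen load is now covered.
        needed.erase(needed.begin(), needed.lower_bound(start + chosen));
    }
    return plan;
}
```

Under this rule, three gathers reading the x, y, and z fields of a four-float AOS struct across four lanes (offsets 0,1,2, 4,5,6, 8,9,10, 12,13,14) are served by two 8-wide loads, while a single gather with stride 4 (offsets 0, 4, 8, 12) still falls back to four scalar loads. The shuffles that move loaded values into their lanes are left out of the sketch.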

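The 972043c146 entry above attributes the initializer bug to a size computation that divided a bit count by 8, which yields 0 bytes for an i1 (bool) value. The sketch below only illustrates that arithmetic. Both function names are hypothetical, and rounding up is one plausible remedy assumed here; the log does not say how the fix was actually made.

```cpp
#include <cassert>
#include <cstdint>

// Truncating division, as described in the commit message: any type
// narrower than 8 bits, such as a 1-bit bool, reports a size of 0 bytes.
uint64_t SizeInBytesTruncating(uint64_t bits) {
    return bits / 8;
}

// Assumed remedy (not confirmed by the log): round up to whole bytes so
// that a 1-bit value still occupies one byte in a memcpy.
uint64_t SizeInBytesRoundedUp(uint64_t bits) {
    return (bits + 7) / 8;
}
```

With the truncating version, a memcpy sized for a single bool copies nothing, which matches the symptom the commit message reports.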
96 Commits