aaron/ispc - ispc - git.frat.tech

aaron/ispc

Author	SHA1	Message	Date
Evghenii	546f9cb409	MAJOR CHANGE--- STOP WITH THIS BRANCH--	2014-01-06 13:51:02 +01:00
Evghenii	2d8da306a1	merged with master	2013-12-25 21:32:34 +01:00
Dmitry Babokin	799e476b48	Bumping ISPC version to 1.6.1dev	2013-12-19 22:29:02 +04:00
Dmitry Babokin	040605a83c	Bumping up ispc version to 1.6.0	2013-12-19 21:17:42 +04:00
Evghenii	ddfe782151	merged	2013-12-13 11:56:43 +01:00
Dmitry Babokin	2d2d14744b	Fixing --opt=force-aligned-memory for LLVM 3.3+	2013-12-04 19:00:02 +04:00
evghenii	bb46b561fd	Merged with upstream/master	2013-11-22 08:13:16 +01:00
Ilia Filippov	3fd9d5a025	support of LLVM 3.5	2013-11-21 19:09:43 +04:00
Dmitry Babokin	e100040f28	Fix bug with fail when --target=avx1.1-i32x8,avx2-i32x8 - avx11 is not a valid target anymore, need more complete string	2013-11-14 15:37:11 +04:00
Dmitry Babokin	ffc9a33933	avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag	2013-11-14 15:34:30 +04:00
Evghenii	8db3d25844	moved PtxString to Globals	2013-10-30 21:05:22 +01:00
Evghenii	b31fc6f66d	now can generate both targets for npvtx64. m_isPTX is set true, to distuish when to either skip or exlcusive euse export	2013-10-29 14:17:11 +01:00
Evghenii	ac700d4860	checkpoint	2013-10-29 13:36:31 +01:00
egaburov	5d56d29240	merged with master	2013-10-08 19:13:30 +02:00
Dmitry Babokin	3b4cc90800	Changing ISPC to 1.5.dev	2013-09-28 01:32:00 +04:00
Dmitry Babokin	8a39af8f72	Release 1.5.0	2013-09-27 23:27:05 +04:00
james.brodman	8db378b265	Revert "Remove support for using SVML for math lib routines." This reverts commit `d9c38b5c1f`.	2013-09-04 16:01:58 -04:00
Matt Pharr	0c5742b6f8	Implement new naming scheme for --target. Now targets are named like "<isa>-i<mask size>x<gang size>", e.g. "sse4-i8x16", or "avx2-i32x16". The old target names are still supported.	2013-08-08 19:23:44 -07:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm Conflicts: Makefile builtins.cpp ispc.cpp ispc.h ispc.vcxproj opt.cpp	2013-08-06 17:39:21 -07:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1" This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Ilia Filippov	a174a90f86	Supporting dumping, switching off and debug printing of optimization phases	2013-08-01 11:37:52 +04:00
Matt Pharr	d9c38b5c1f	Remove support for using SVML for math lib routines. This path was poorly maintained and wasn't actually available on most targets.	2013-07-31 06:56:48 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
egaburov	67b549a937	Added nvptx64 target. Things to do: 1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll 2. add __global__ & __device__ scope 2. make code work for a single cuda thread 3. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize 4. ... and more...	2013-07-28 14:31:43 +02:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues.) - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	922895de69	Changing ISPC version to 1.4.5dev	2013-07-19 18:47:43 -07:00
Dmitry Babokin	28f0bce9f2	Release 1.4.4	2013-07-19 16:22:10 -07:00
Dmitry Babokin	594485c38c	Release 1.4.3	2013-06-25 18:38:21 +04:00
Dmitry Babokin	cf9ceb6bf9	Release 1.4.2, 11 June 2013	2013-06-11 17:18:54 +04:00
Dmitry Babokin	29ceb42b7b	Bumping version to 1.4.1dev	2013-05-28 19:58:27 +04:00
Dmitry Babokin	6c392ee4a1	Changes for 1.4.1 release	2013-05-28 19:46:30 +04:00
Dmitry Babokin	481bcc732b	Changes for 1.4.0 release	2013-05-27 16:48:41 +04:00
Dmitry Babokin	1a7ac8b804	Enable memory alignment management via compiler options	2013-05-24 10:29:01 +04:00
Dmitry Babokin	b6b9daa3c5	Enabling llvm 3.4	2013-05-13 19:25:31 +04:00
Dmitry Babokin	a0462fe1ee	#469 : Fix for multi-target compilation	2013-04-12 14:06:12 +04:00
Dmitry Babokin	0af2a13349	DataLayout is changed to be managed from single place. v4-128-128 is added to generic DataLayout	2013-03-23 14:38:51 +04:00
Dmitry Babokin	7f0c92eb4d	Fix for #431 : memory leak due to multiple TargetMachine creation	2013-03-23 14:33:45 +04:00
Dmitry Babokin	0f86255279	Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets).	2013-03-23 14:28:05 +04:00
Dmitry Babokin	3f8a678c5a	Editorial change: fixing trailing white spaces and tabs	2013-03-18 16:17:55 +04:00
Dmitry Babokin	524939dc5b	Fix for issue #430	2013-02-27 18:03:07 +04:00
Matt Pharr	0bf1320a32	Remove support for building with LLVM 3.0	2013-01-06 12:27:53 -08:00
Peng Tu	16b0806d40	Fix LLVM TOT build issue.	2012-11-21 19:09:10 -08:00
Matt Pharr	be2108260e	Add --opt=force-aligned-memory option. This forces all vector loads/stores to be done assuming that the given pointer is aligned to the vector size, thus allowing the use of sometimes more-efficient instructions. (If it isn't the case that the memory is aligned, the program will fail!).	2012-09-14 13:49:45 -07:00
Jean-Luc Duprat	09bb36f58c	Updated the task system in the example directory to support: Cilk (cilk_for), OpenMP (#pragma omp parallel for), TBB(tbb::task_group and tbb::parallel_for) as well as a new pthreads-based model that fully subscribes the machine (good for KNC). With major contributions from Ingo Wald and James Brodman.	2012-08-28 11:13:12 -07:00
Matt Pharr	19d8f2e258	Generate FMA instructions with AVX2 (when possible). Issue #320.	2012-08-03 10:43:41 -07:00
Matt Pharr	10b79fb41b	Add support for non-factored variants of gather/scatter functions. We now have two ways of approaching gather/scatters with a common base pointer and with offset vectors. For targets with native gather/scatter, we just turn those into base + {1/2/4/8}offsets. For targets without, we turn those into base + {1/2/4/8}varying_offsets + const_offsets, where const_offsets is a compile-time constant. Infrastructure for issue #325.	2012-07-11 14:29:42 -07:00
Matt Pharr	2d8026625b	Always check the execution mask after break/continue/return. When "break", "continue", or "return" is used under varying control flow, we now always check the execution mask to see if all of the program instances are executing it. (Previously, this was only done with "cbreak", "ccontinue", and "creturn", which are now deprecated.) An important effect of this change is that it fixes a family of cases where we could end up running with an "all off" execution mask, which isn't supposed to happen, as it leads to all sorts of invalid behavior. This change does cause the volume rendering example to run 9% slower, but doesn't affect the other examples. Issue #257.	2012-07-06 11:09:11 -07:00
Matt Pharr	6aad4c7a39	Bump version number to 1.3.1dev	2012-07-05 13:35:34 -07:00

1 2 3

109 Commits