aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Evghenii	4196c723eb	merged with nvptx	2014-02-20 11:01:58 +01:00
Evghenii	d3a6693eef	adding __have_native_{rsqrtd,rcpd} to select between native support for double precision reciprocals and using slower but safe version in stdlib	2014-02-04 16:29:23 +01:00
Evghenii	546f9cb409	MAJOR CHANGE--- STOP WITH THIS BRANCH--	2014-01-06 13:51:02 +01:00
Evghenii	2d8da306a1	merged with master	2013-12-25 21:32:34 +01:00
Dmitry Babokin	799e476b48	Bumping ISPC version to 1.6.1dev	2013-12-19 22:29:02 +04:00
Dmitry Babokin	040605a83c	Bumping up ispc version to 1.6.0	2013-12-19 21:17:42 +04:00
Evghenii	ddfe782151	merged	2013-12-13 11:56:43 +01:00
Dmitry Babokin	2d2d14744b	Fixing --opt=force-aligned-memory for LLVM 3.3+	2013-12-04 19:00:02 +04:00
evghenii	bb46b561fd	Merged with upstream/master	2013-11-22 08:13:16 +01:00
Ilia Filippov	3fd9d5a025	support of LLVM 3.5	2013-11-21 19:09:43 +04:00
Dmitry Babokin	e100040f28	Fix bug with fail when --target=avx1.1-i32x8,avx2-i32x8 - avx11 is not a valid target anymore, need more complete string	2013-11-14 15:37:11 +04:00
Dmitry Babokin	ffc9a33933	avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag	2013-11-14 15:34:30 +04:00
Evghenii	8db3d25844	moved PtxString to Globals	2013-10-30 21:05:22 +01:00
Evghenii	b31fc6f66d	now can generate both targets for nvptx64. m_isPTX is set true, to distinguish when to either skip or exclusively use export	2013-10-29 14:17:11 +01:00
Evghenii	ac700d4860	checkpoint	2013-10-29 13:36:31 +01:00
egaburov	5d56d29240	merged with master	2013-10-08 19:13:30 +02:00
Dmitry Babokin	3b4cc90800	Changing ISPC to 1.5.dev	2013-09-28 01:32:00 +04:00
Dmitry Babokin	8a39af8f72	Release 1.5.0	2013-09-27 23:27:05 +04:00
james.brodman	8db378b265	Revert "Remove support for using SVML for math lib routines." This reverts commit `d9c38b5c1f`.	2013-09-04 16:01:58 -04:00
Matt Pharr	0c5742b6f8	Implement new naming scheme for --target. Now targets are named like "<isa>-i<mask size>x<gang size>", e.g. "sse4-i8x16", or "avx2-i32x16". The old target names are still supported.	2013-08-08 19:23:44 -07:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm Conflicts: Makefile builtins.cpp ispc.cpp ispc.h ispc.vcxproj opt.cpp	2013-08-06 17:39:21 -07:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1" This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Ilia Filippov	a174a90f86	Supporting dumping, switching off and debug printing of optimization phases	2013-08-01 11:37:52 +04:00
Matt Pharr	d9c38b5c1f	Remove support for using SVML for math lib routines. This path was poorly maintained and wasn't actually available on most targets.	2013-07-31 06:56:48 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
egaburov	67b549a937	Added nvptx64 target. Things to do: 1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll 2. add __global__ & __device__ scope 2. make code work for a single cuda thread 3. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize 4. ... and more...	2013-07-28 14:31:43 +02:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	922895de69	Changing ISPC version to 1.4.5dev	2013-07-19 18:47:43 -07:00
Dmitry Babokin	28f0bce9f2	Release 1.4.4	2013-07-19 16:22:10 -07:00
Dmitry Babokin	594485c38c	Release 1.4.3	2013-06-25 18:38:21 +04:00
Dmitry Babokin	cf9ceb6bf9	Release 1.4.2, 11 June 2013	2013-06-11 17:18:54 +04:00
Dmitry Babokin	29ceb42b7b	Bumping version to 1.4.1dev	2013-05-28 19:58:27 +04:00
Dmitry Babokin	6c392ee4a1	Changes for 1.4.1 release	2013-05-28 19:46:30 +04:00
Dmitry Babokin	481bcc732b	Changes for 1.4.0 release	2013-05-27 16:48:41 +04:00
Dmitry Babokin	1a7ac8b804	Enable memory alignment management via compiler options	2013-05-24 10:29:01 +04:00
Dmitry Babokin	b6b9daa3c5	Enabling llvm 3.4	2013-05-13 19:25:31 +04:00
Dmitry Babokin	a0462fe1ee	#469 : Fix for multi-target compilation	2013-04-12 14:06:12 +04:00
Dmitry Babokin	0af2a13349	DataLayout is changed to be managed from single place. v4-128-128 is added to generic DataLayout	2013-03-23 14:38:51 +04:00
Dmitry Babokin	7f0c92eb4d	Fix for #431 : memory leak due to multiple TargetMachine creation	2013-03-23 14:33:45 +04:00
Dmitry Babokin	0f86255279	Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets).	2013-03-23 14:28:05 +04:00
Dmitry Babokin	3f8a678c5a	Editorial change: fixing trailing white spaces and tabs	2013-03-18 16:17:55 +04:00
Dmitry Babokin	524939dc5b	Fix for issue #430	2013-02-27 18:03:07 +04:00
Matt Pharr	0bf1320a32	Remove support for building with LLVM 3.0	2013-01-06 12:27:53 -08:00
Peng Tu	16b0806d40	Fix LLVM TOT build issue.	2012-11-21 19:09:10 -08:00
Matt Pharr	be2108260e	Add --opt=force-aligned-memory option. This forces all vector loads/stores to be done assuming that the given pointer is aligned to the vector size, thus allowing the use of sometimes more-efficient instructions. (If it isn't the case that the memory is aligned, the program will fail!).	2012-09-14 13:49:45 -07:00
Jean-Luc Duprat	09bb36f58c	Updated the task system in the example directory to support: Cilk (cilk_for), OpenMP (#pragma omp parallel for), TBB(tbb::task_group and tbb::parallel_for) as well as a new pthreads-based model that fully subscribes the machine (good for KNC). With major contributions from Ingo Wald and James Brodman.	2012-08-28 11:13:12 -07:00
Matt Pharr	19d8f2e258	Generate FMA instructions with AVX2 (when possible). Issue #320.	2012-08-03 10:43:41 -07:00
Matt Pharr	10b79fb41b	Add support for non-factored variants of gather/scatter functions. We now have two ways of approaching gather/scatters with a common base pointer and with offset vectors. For targets with native gather/scatter, we just turn those into base + {1/2/4/8}offsets. For targets without, we turn those into base + {1/2/4/8}varying_offsets + const_offsets, where const_offsets is a compile-time constant. Infrastructure for issue #325.	2012-07-11 14:29:42 -07:00

111 Commits