aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Evghenii	6fae459847	a+1	2013-11-04 10:22:05 +01:00
Evghenii	f9ec1a0097	.. work in programm to embed PTX into host code ..	2013-10-30 16:47:30 +01:00
egaburov	60881499dc	Merge branch 'nvptx' of github.com:egaburov/ispc into nvptx	2013-10-29 15:25:14 +01:00
egaburov	f19cf9274e	Merge remote-tracking branch 'upstream/master' into nvptx	2013-10-29 15:24:40 +01:00
Evghenii	b2baa35c3d	added correct datalayout for nvptx64	2013-10-29 11:34:01 +01:00
Dmitry Babokin	6585a925be	Merge pull request #641 from jbrodman/stdlibshift Add a "shift" operator to the stdlib.	2013-10-28 14:18:31 -07:00
Evghenii	ff98271a43	using mask i1 for nvptx64	2013-10-28 17:03:00 +01:00
Evghenii	500ad7fb51	using mask i1 for nvptx64	2013-10-28 17:01:03 +01:00
Evghenii	4f486333ed	now nvptx allows extern "C" task void, which emits a kernel that should (?) be callable by driver API from external code	2013-10-28 16:47:40 +01:00
Evghenii	68ced6ce46	automatically adds -D__NVPTX__ when nvptx64 target is chosen	2013-10-28 14:08:32 +01:00
Evghenii	8391d05697	added blockIndex computations	2013-10-28 10:18:30 +01:00
Evghenii	ac095dbf3e	working on nvptx	2013-10-26 16:12:33 +02:00
james.brodman	4d289b16c2	Redesign after being hit with the KISS bat.	2013-10-23 14:25:43 -04:00
egaburov	f89bad1e94	launch now passes the right info into tasking	2013-10-23 12:51:06 +02:00
james.brodman	f97a2d68c8	Bugfix for non-const shift amt and unit tests.	2013-10-22 18:29:20 -04:00
james.brodman	899f85ce9c	Initial Support for new stdlib shift operator	2013-10-22 18:06:54 -04:00
egaburov	1710b9171f	removed LLVM_3_0 legacy part and changed copyright to 2013	2013-10-18 08:53:01 +02:00
egaburov	7e9b4c0924	added avx2-i64x4 and avx1.1-i64x4 targets	2013-10-15 10:02:10 +02:00
egaburov	8808a8cc9c	Merge remote-tracking branch 'upstream/master' into nvptx	2013-10-13 13:03:00 +02:00
Ilia Filippov	92773ada6d	fix for ISPC for compfails at sse4-i8 and sse4-i16	2013-10-11 15:23:40 +04:00
egaburov	5d56d29240	merged with master	2013-10-08 19:13:30 +02:00
Dmitry Babokin	43245bbc11	Adding check for OS AVX support to auto-dispatch code	2013-09-19 15:39:56 +04:00
Evghenii	9861375f0c	renamed avx-i64x4 -> avx1-i64x4	2013-09-13 15:07:14 +02:00
Evghenii	059d80cc11	included suggested changes, ./tests/launch-*.ispc still fails. something is mask64 related, not sure what. help...	2013-09-12 17:18:12 +02:00
egaburov	7364e06387	added mask64	2013-09-12 12:02:42 +02:00
egaburov	efc20c2110	added svml support to all sse/avx modes	2013-09-11 17:07:54 +02:00
egaburov	19379db3b6	svml cleanup	2013-09-11 16:48:56 +02:00
egaburov	7a32699573	added svml.m4	2013-09-11 15:18:03 +02:00
egaburov	320c41ffcf	added svml support. experimental. for some reason all symbols are visible..	2013-09-11 15:16:50 +02:00
egaburov	9c79d4d182	added avxh with vectorWidth=4 support, use --target=avxh to enable it	2013-09-11 12:58:02 +02:00
james.brodman	8db378b265	Revert "Remove support for using SVML for math lib routines." This reverts commit `d9c38b5c1f`.	2013-09-04 16:01:58 -04:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1" This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
Matt Pharr	5b20b06bd9	Add avg_{up,down}_int{8,16} routines to stdlib. These compute the average of two given values, rounding up and down, respectively, if the result isn't exact. When possible, these are mapped to target-specific intrinsics (PADD[BW] on IA and VH[R]ADD[US] on NEON.) A subsequent commit will add pattern-matching to generate calls to these intrinsics when the corresponding patterns are detected in the IR.	2013-08-06 08:41:12 -07:00
Matt Pharr	d9c38b5c1f	Remove support for using SVML for math lib routines. This path was poorly maintained and wasn't actually available on most targets.	2013-07-31 06:56:48 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	48ff03112f	Remove __pause from stdlib_core() in utils.m4. It wasn't ever being used, and was breaking compilation on ARM.	2013-07-30 08:44:22 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
egaburov	153fbc3d7d	some changes	2013-07-29 11:05:05 +02:00
egaburov	af61c9bae3	working on target-nvptx64... need to add nvptx64	2013-07-28 15:50:08 +02:00
egaburov	67b549a937	Added nvptx64 target. Things to do: 1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll 2. add __global__ & __device__ scope 3. make code work for a single cuda thread 4. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize 5. ... and more...	2013-07-28 14:31:43 +02:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
Matt Pharr	2d063925a1	Explicitly call the PBLENDVB intrinsic for i8 blending with sse4-8. This is slightly cleaner than trunc-ing the i8 mask to i1 and using a vector select. (And is probably more safe in terms of good code.)	2013-07-25 09:46:01 -07:00
Matt Pharr	780b0dfe47	Add SSE4-16 target. Along the lines of sse4-8, this is an 8-wide target for SSE4, using 16-bit elements for the mask. It's thus (in principle) the best target for SIMD computation with 16-bit datatypes.	2013-07-25 09:46:01 -07:00
Matt Pharr	53414f12e6	Add SSE4 target optimized for computation with 8-bit datatypes. This change adds a new 'sse4-8' target, where programCount is 16 and the mask element size is 8-bits. (i.e. the most appropriate sizing of the mask for SIMD computation with 8-bit datatypes.)	2013-07-23 17:30:32 -07:00
Matt Pharr	15a3ef370a	Use @llvm.readcyclecounter to implement stdlib clock() function. Also added a test for the clock builtin.	2013-07-23 17:24:57 -07:00
Matt Pharr	e7abf3f2ea	Add support for mask vectors of 8 and 16-bit element types. There were a number of places throughout the system that assumed that the execution mask would only have either 32-bit or 1-bit elements. This commit makes it possible to have a target with an 8- or 16-bit mask.	2013-07-23 16:50:11 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	7bedb4a081	Add memory alignment dependent on the platform (16/32/64/etc)	2013-05-24 10:29:01 +04:00
Dmitry Babokin	630215f56f	Defining memory routines completely separately for Windows/Unix 32/64 bit.	2013-05-24 10:29:01 +04:00
Dmitry Babokin	5362dade37	Fixing util.m4 to declare nothing unless some macro is instantiated	2013-05-24 10:29:00 +04:00

118 Commits