aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Evghenii	c23dd8a951	fixed __puts_nvptx	2014-02-05 17:48:04 +01:00
Evghenii	7b2ceba128	added "internal" for helper functions to avoid them being exported to PTX	2014-02-05 17:02:05 +01:00
Evghenii	686c1d676d	improvements	2014-02-05 12:04:36 +01:00
Evghenii	a3b00fdcd6	added support for global atomics	2014-01-26 14:23:26 +01:00
Evghenii	a7d4a3f922	fix for __any	2014-01-26 13:15:13 +01:00
Evghenii	ddb9b2fc47	added basic printing from ptx	2014-01-24 13:44:38 +01:00
Evghenii	be6ac0408a	added compile-time constant __is_nvptx_traget that can be used with stdlib.ispc	2014-01-24 09:02:12 +01:00
Evghenii	1cf1dab649	fixed foreach_unique and local_atomics	2014-01-23 21:57:20 +01:00
Evghenii	98fc43d859	Merge branch 'master' into nvptx	2014-01-21 20:05:27 +01:00
Ilia Filippov	aa31957d84	supporting LLVM trunk	2014-01-21 14:21:26 +04:00
Evghenii	84134678dc	ISPC can emit LLVM PTX now	2014-01-10 07:53:09 +01:00
evghenii	9053eed4b4	added basic optimization pass that promotes uniform into varying variables (not array) for nvptx target	2014-01-10 06:32:57 +01:00
Evghenii	9b74e60185	added conversion from addrspace(3)/__local/__shared__ to addrspace(0)/generic when PtrToInt is called	2014-01-07 14:29:55 +01:00
Evghenii	18a50aa679	further cleaning...	2014-01-06 14:34:28 +01:00
Evghenii	546f9cb409	MAJOR CHANGE--- STOP WITH THIS BRANCH--	2014-01-06 13:51:02 +01:00
Evghenii	d77789d8fe	+merged with master	2013-12-18 11:37:01 +01:00
Ilia Filippov	473f1cb4d2	packed_store_active2	2013-12-17 21:14:29 +04:00
evghenii	bb46b561fd	Merged with upstream/master	2013-11-22 08:13:16 +01:00
Evghenii	918ca339b6	now programIndex returns laneIdx = %tid.x & (%warpsize-1) & programCount returns 32	2013-11-14 19:27:52 +01:00
Dmitry Babokin	65ea6fd48a	Reasoning to use sse4 bitcode file	2013-11-14 15:34:30 +04:00
Dmitry Babokin	ffc9a33933	avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag	2013-11-14 15:34:30 +04:00
Evghenii	4cd7e10ad3	reverted to original changes. The plan is to use CDP and generate only device code with a host wrapper.	2013-11-12 12:51:56 +01:00
egaburov	60881499dc	Merge branch 'nvptx' of github.com:egaburov/ispc into nvptx	2013-10-29 15:25:14 +01:00
egaburov	f19cf9274e	Merge remote-tracking branch 'upstream/master' into nvptx	2013-10-29 15:24:40 +01:00
Evghenii	ed9bca0e12	add __soa_to_aos_float1 and __aos_to_soa_float1 builtins	2013-10-29 15:06:08 +01:00
Dmitry Babokin	6585a925be	Merge pull request #641 from jbrodman/stdlibshift Add a "shift" operator to the stdlib.	2013-10-28 14:18:31 -07:00
Evghenii	8391d05697	added blockIndex computations	2013-10-28 10:18:30 +01:00
Evghenii	ac095dbf3e	working on nvptx	2013-10-26 16:12:33 +02:00
james.brodman	899f85ce9c	Initial Support for new stdlib shift operator	2013-10-22 18:06:54 -04:00
egaburov	7e9b4c0924	added avx2-i64x4 and avx1.1-i64x4 targets	2013-10-15 10:02:10 +02:00
egaburov	5d56d29240	merged with master	2013-10-08 19:13:30 +02:00
Evghenii	9861375f0c	renamed avx-i64x4 -> avx1-i64x4	2013-09-13 15:07:14 +02:00
egaburov	7364e06387	added mask64	2013-09-12 12:02:42 +02:00
egaburov	9cf8e8cbf3	builtins fix for double precision svml and __stdlib_asin	2013-09-11 15:23:45 +02:00
egaburov	320c41ffcf	added svml support. experimental. for some reason all symbols are visible..	2013-09-11 15:16:50 +02:00
egaburov	9c79d4d182	added avxh with vectorWidth=4 support, use --target=avxh to enable it	2013-09-11 12:58:02 +02:00
james.brodman	8db378b265	Revert "Remove support for using SVML for math lib routines." This reverts commit `d9c38b5c1f`.	2013-09-04 16:01:58 -04:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm Conflicts: Makefile builtins.cpp ispc.cpp ispc.h ispc.vcxproj opt.cpp	2013-08-06 17:39:21 -07:00
Matt Pharr	1276ea9844	Revert "Remove support for building with LLVM 3.1" This reverts commit `d3c567503b`. Conflicts: opt.cpp	2013-08-06 17:00:35 -07:00
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Matt Pharr	d9c38b5c1f	Remove support for using SVML for math lib routines. This path was poorly maintained and wasn't actually available on most targets.	2013-07-31 06:56:48 -07:00
Matt Pharr	d3c567503b	Remove support for building with LLVM 3.1	2013-07-31 06:46:45 -07:00
Matt Pharr	48ff03112f	Remove __pause from stdlib_core() in utils.m4. It wasn't ever being used, and was breaking compilation on ARM.	2013-07-30 08:44:22 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
egaburov	67b549a937	Added nvptx64 target. Things to do: 1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll 2. add __global__ & __device__ scope 3. make code work for a single cuda thread 4. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize 5. ... and more...	2013-07-28 14:31:43 +02:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
Matt Pharr	780b0dfe47	Add SSE4-16 target. Along the lines of sse4-8, this is an 8-wide target for SSE4, using 16-bit elements for the mask. It's thus (in principle) the best target for SIMD computation with 16-bit datatypes.	2013-07-25 09:46:01 -07:00
Matt Pharr	53414f12e6	Add SSE4 target optimized for computation with 8-bit datatypes. This change adds a new 'sse4-8' target, where programCount is 16 and the mask element size is 8-bits. (i.e. the most appropriate sizing of the mask for SIMD computation with 8-bit datatypes.)	2013-07-23 17:30:32 -07:00
Matt Pharr	e7abf3f2ea	Add support for mask vectors of 8 and 16-bit element types. There were a number of places throughout the system that assumed that the execution mask would only have either 32-bit or 1-bit elements. This commit makes it possible to have a target with an 8- or 16-bit mask.	2013-07-23 16:50:11 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00

131 Commits