aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Evghenii	546f9cb409	MAJOR CHANGE--- STOP WITH THIS BRANCH--	2014-01-06 13:51:02 +01:00
Evghenii	15299571f9	use trunk and compile nvvm,llvm,cu files	2014-01-06 08:35:07 +01:00
Evghenii	b2368e243c	+1	2014-01-06 08:15:50 +01:00
Evghenii	0d944ac87e	use trunk llvm, use openmp in tasksys	2014-01-05 19:04:40 +01:00
evghenii	aa90a48872	+1	2014-01-01 13:41:52 +01:00
evghenii	1672ec80d0	+!	2013-12-25 20:12:36 +01:00
evghenii	bb46b561fd	Merged with upstream/master	2013-11-22 08:13:16 +01:00
Ilia Filippov	3fd9d5a025	support of LLVM 3.5	2013-11-21 19:09:43 +04:00
Evghenii	6f200d310f	fixed to work with LLVM 3.2	2013-11-21 11:03:03 +01:00
Evghenii	1445202e0e	identified bug due to llvm-3.4	2013-11-14 21:18:25 +01:00
Evghenii	e9bc2b7b54	added uniform_new/uniform_delete in util_ptx.m4 and __shfl intrinsics	2013-11-11 09:18:15 +01:00
Evghenii	b50d3944ea	allow easy switch between llvm	2013-10-29 10:22:07 +01:00
Evghenii	ac095dbf3e	working on nvptx	2013-10-26 16:12:33 +02:00
egaburov	7e9b4c0924	added avx2-i64x4 and avx1.1-i64x4 targets	2013-10-15 10:02:10 +02:00
egaburov	8808a8cc9c	Merge remote-tracking branch 'upstream/master' into nvptx	2013-10-13 13:03:00 +02:00
Dmitry Babokin	17b54cb0c8	Fix problem with building ISPC by clang 3.4	2013-10-11 16:29:17 +04:00
Dmitry Babokin	8297edd251	Switching default compiler on Unix from g++ to clang++	2013-10-11 16:29:16 +04:00
egaburov	5d56d29240	merged with master	2013-10-08 19:13:30 +02:00
Evghenii	9861375f0c	renamed avx-i64x4 -> avx1-i64x4	2013-09-13 15:07:14 +02:00
egaburov	7364e06387	added mask64	2013-09-12 12:02:42 +02:00
egaburov	320c41ffcf	added svml support. experimental. for some reason all symbols are visible..	2013-09-11 15:16:50 +02:00
egaburov	9c79d4d182	added avxh with vectorWidth=4 support, use --target=avxh to enable it	2013-09-11 12:58:02 +02:00
Ilia Filippov	320b1700ff	correction of adding -Werror option	2013-08-30 16:01:01 +04:00
Ilia Filippov	f620cdbaa1	Changes in perf.py functionality, unification of examples, correction build warnings	2013-08-26 14:04:59 +04:00
Matt Pharr	ea8591a85a	Fix build with LLVM top-of-tree (link libcurses)	2013-08-10 11:22:43 -07:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm Conflicts: Makefile builtins.cpp ispc.cpp ispc.h ispc.vcxproj opt.cpp	2013-08-06 17:39:21 -07:00
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Matt Pharr	ab3b633733	Add 8-bit and 16-bit specialized NEON targets. Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask elements, respectively, and thus should generate the best code when used for computation with datatypes of those sizes.	2013-07-30 08:44:16 -07:00
egaburov	67b549a937	Added nvptx64 target. Things to do: 1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll 2. add __global__ & __device__ scope 2. make code work for a single cuda thread 3. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize 4. ... and more...	2013-07-28 14:31:43 +02:00
Matt Pharr	780b0dfe47	Add SSE4-16 target. Along the lines of sse4-8, this is an 8-wide target for SSE4, using 16-bit elements for the mask. It's thus (in principle) the best target for SIMD computation with 16-bit datatypes.	2013-07-25 09:46:01 -07:00
Matt Pharr	53414f12e6	Add SSE4 target optimized for computation with 8-bit datatypes. This change adds a new 'sse4-8' target, where programCount is 16 and the mask element size is 8-bits. (i.e. the most appropriate sizing of the mask for SIMD computation with 8-bit datatypes.)	2013-07-23 17:30:32 -07:00
Matt Pharr	e7abf3f2ea	Add support for mask vectors of 8 and 16-bit element types. There were a number of places throughout the system that assumed that the execution mask would only have either 32-bit or 1-bit elements. This commit makes it possible to have a target with an 8- or 16-bit mask.	2013-07-23 16:50:11 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues.) - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	95fcdc36ee	Tracking ToT changes, which now require to link option library. This is Unix only. Windows will be fixed separately	2013-06-18 22:12:33 +04:00
Dmitry Babokin	4b388edca9	Splitting .ll files to be compiled in two versions - 32 and 64 bit. Unix only	2013-05-24 10:29:00 +04:00
Dmitry Babokin	e084f1c311	Adding missing copyright info in Makefile	2013-04-26 19:11:20 +02:00
Dmitry Babokin	95950885cf	Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS.	2013-04-26 20:33:24 +04:00
Dmitry Babokin	0f631ad49b	Add info about compiler used for ispc build to Makefile output	2013-03-18 12:30:06 +04:00
Dmitry Babokin	bee3029764	Adding debug and clang targets, changing asan target	2013-02-21 17:26:21 +04:00
Dmitry Babokin	150d6d1f56	Adding Address Sanitizer build	2013-02-15 06:50:26 -08:00
Dmitry Babokin	8d8d9c63fe	Fix for #349 : build issue when no git found	2013-02-11 11:01:46 -08:00
Dmitry Babokin	52147ce631	Fixing issue #428 : need to specify LLVM libs explicitly	2013-02-11 04:15:50 -08:00
james.brodman	3aaf2ef2d4	ToT Fixes / M4 macro fix	2013-01-14 14:55:10 -05:00
Peng Tu	16b0806d40	Fix LLVM TOT build issue.	2012-11-21 19:09:10 -08:00
Ingo Wald	d492af7bc0	64-bit gather/scatter, aligned load/store, i8 support	2012-09-17 03:39:02 +02:00
Matt Pharr	1a4434d314	Fix build with LLVM top-of-tree	2012-08-11 09:28:48 -07:00
Matt Pharr	38bcecd2f3	Print a useful error if llvm-config isn't found when building. Previously, there was a ton of unintelligible error spew. Issue #273.	2012-07-06 13:18:11 -07:00
Matt Pharr	6c7df4cb6b	Add initial support for "avx1.1" targets for Ivy Bridge. So far, only the use of the float/half conversion instructions distinguishes this from the "avx1" target. Partial work on issue #263.	2012-06-08 15:55:00 -07:00
Matt Pharr	449d956966	Add support for generic-64 target.	2012-05-25 11:57:28 -07:00
Matt Pharr	4f053e5b83	Pass OPT flags when linking	2012-05-08 13:25:09 -07:00

88 Commits