Commit Graph

90 Commits

Author SHA1 Message Date
jbrodman
3d35d4485d Change opt help message to be clearer. 2015-03-16 12:11:46 -07:00
jbrodman
a54c0db457 Added missing newline. 2015-03-06 10:36:07 -08:00
jbrodman
9baade2cb5 Change dll export feature to a switch. 2015-02-23 11:43:06 -08:00
Andrey Guskov
2f2af816e7 3.7-related copyright update 2015-01-20 14:56:58 +03:00
Andrey Guskov
ae8b724d92 Added LLVM 3.7 support 2015-01-19 17:30:59 +03:00
Dmitry Babokin
701bd9b029 Removing -debug-ir functionality for 3.6, as it was removed from LLVM. 2014-12-03 18:28:32 +03:00
evghenii
8745888ce9 merged with master 2014-08-11 10:04:54 +02:00
Anton Mitrokhin
60fa76ccc1 reversed macros LLVM_3_6 to LLVM_3_5+ in .cpp and .h files 2014-08-01 15:40:48 +04:00
Anton Mitrokhin
d0c9b7c9b5 wiped out all LLVM 3.1 support 2014-08-01 14:54:08 +04:00
Anton Mitrokhin
725be222ac added LLVM_3_6 var 2014-07-30 11:50:15 +04:00
evghenii
b3c5a9c4d6 added #ifdef ISPC_NVPTX_ENABLED ... #endif guards 2014-07-09 12:32:18 +02:00
Evghenii
84134678dc ISPC can emit LLVM PTX now 2014-01-10 07:53:09 +01:00
evghenii
bb46b561fd Merged with upstream/master 2013-11-22 08:13:16 +01:00
Ilia Filippov
3fd9d5a025 support of LLVM 3.5 2013-11-21 19:09:43 +04:00
egaburov
5d56d29240 merged with master 2013-10-08 19:13:30 +02:00
james.brodman
8db378b265 Revert "Remove support for using SVML for math lib routines."
This reverts commit d9c38b5c1f.
2013-09-04 16:01:58 -04:00
Ilia Filippov
f620cdbaa1 Changes in perf.py functionality, unification of examples, correction of build warnings 2013-08-26 14:04:59 +04:00
Dmitry Babokin
3f2217646e Merge pull request #562 from mmp/arm
New target naming scheme, new targets (SSE4-i8x16 and SSE4-i16x8), plus some cleanup and improvements.
2013-08-22 08:33:25 -07:00
james.brodman
6be3c24ee5 Separate -O and -g 2013-08-15 15:24:46 -04:00
Matt Pharr
0c5742b6f8 Implement new naming scheme for --target.
Now targets are named like "<isa>-i<mask size>x<gang size>", e.g.
"sse4-i8x16", or "avx2-i32x16".

The old target names are still supported.
2013-08-08 19:23:44 -07:00
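For illustration only: the pieces of the new scheme can be read straight out of the target string. The parseTarget() helper below is hypothetical, not part of ispc; it is a minimal sketch assuming targets always follow the "<isa>-i<mask size>x<gang size>" pattern.

    #include <cstdio>
    #include <string>

    // Hypothetical helper: split "sse4-i8x16" into its three components.
    static bool parseTarget(const std::string &t, std::string *isa,
                            int *maskBits, int *gangSize) {
        size_t dash = t.find("-i");
        size_t x = t.find('x', dash);
        if (dash == std::string::npos || x == std::string::npos)
            return false;
        *isa = t.substr(0, dash);                              // "sse4"
        *maskBits = std::stoi(t.substr(dash + 2, x - (dash + 2))); // 8
        *gangSize = std::stoi(t.substr(x + 1));                // 16
        return true;
    }

    int main() {
        std::string isa; int mask, gang;
        if (parseTarget("sse4-i8x16", &isa, &mask, &gang))
            printf("isa=%s mask=%d bits gang=%d\n", isa.c_str(), mask, gang);
        return 0;
    }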
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Ilia Filippov
a174a90f86 Support for dumping, switching off, and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
d9c38b5c1f Remove support for using SVML for math lib routines.
This path was poorly maintained and wasn't actually available on most
targets.
2013-07-31 06:56:48 -07:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
egaburov
67b549a937 Added nvptx64 target. Things to do:
1. write builtins/target-nvptx64.ll; for now it is just a copy of target-generic-1.ll
2. add __global__ & __device__ scope
3. make code work for a single CUDA thread
4. use tasks to work as a block grid, with programIndex as laneIdx and programCount as warpSize
5. ... and more...
2013-07-28 14:31:43 +02:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately
   only when the compiler runs on an ARM host, though).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Dmitry Babokin
1a7ac8b804 Enable memory alignment management via compiler options 2013-05-24 10:29:01 +04:00
Dmitry Babokin
b6b9daa3c5 Enabling llvm 3.4 2013-05-13 19:25:31 +04:00
Dmitry Babokin
7497e86902 Adding support for aligned memory allocation on Windows 2013-04-26 22:07:30 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
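A minimal sketch of the aligned-allocation approach described in this commit and the Windows one above; alignedAlloc()/alignedFree() are illustrative names, not ispc's actual functions.

    #include <cstddef>
    #ifdef _WIN32
    #include <malloc.h>    // _aligned_malloc / _aligned_free
    #else
    #include <stdlib.h>    // posix_memalign / free
    #endif

    // Allocate 'size' bytes aligned to 'alignment' (a power of two,
    // and a multiple of sizeof(void*) for posix_memalign).
    void *alignedAlloc(size_t size, size_t alignment) {
    #ifdef _WIN32
        return _aligned_malloc(size, alignment);
    #else
        void *ptr = nullptr;
        if (posix_memalign(&ptr, alignment, size) != 0)
            return nullptr;
        return ptr;
    #endif
    }

    void alignedFree(void *ptr) {
    #ifdef _WIN32
        _aligned_free(ptr);   // _aligned_malloc memory needs _aligned_free
    #else
        free(ptr);            // posix_memalign memory is freed with free()
    #endif
    }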
Dmitry Babokin
11528b0def Fix for #474: colon-separated paths in -I 2013-04-17 18:38:57 +04:00
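A minimal sketch of the colon-separated -I handling this fix describes; splitSearchPath() is a hypothetical helper, not ispc's code.

    #include <string>
    #include <vector>

    // Split a colon-separated search-path string such as
    // "/usr/include:/opt/ispc/include" into individual directories.
    std::vector<std::string> splitSearchPath(const std::string &path) {
        std::vector<std::string> dirs;
        size_t start = 0;
        while (start <= path.size()) {
            size_t colon = path.find(':', start);
            if (colon == std::string::npos)
                colon = path.size();
            if (colon > start)  // skip empty components like "a::b"
                dirs.push_back(path.substr(start, colon - start));
            start = colon + 1;
        }
        return dirs;
    }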
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing white spaces and tabs 2013-03-18 16:17:55 +04:00
Matt Pharr
0bf1320a32 Remove support for building with LLVM 3.0 2013-01-06 12:27:53 -08:00
Peng Tu
16b0806d40 Fix LLVM TOT build issue. 2012-11-21 19:09:10 -08:00
Matt Pharr
be2108260e Add --opt=force-aligned-memory option.
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, thus allowing the use of sometimes
more-efficient instructions.  (If the memory is not actually aligned,
the program will fail!).
2012-09-14 13:49:45 -07:00
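To illustrate the trade-off the option makes, here is the aligned-vs-unaligned distinction expressed with SSE intrinsics; a conceptual sketch, not ispc's code generation.

    #include <immintrin.h>

    // Unaligned load: safe for any pointer, but may be slower on some CPUs.
    __m128 loadAny(const float *p) { return _mm_loadu_ps(p); }

    // Aligned load: what --opt=force-aligned-memory lets the compiler assume.
    // If p is not 16-byte aligned, this faults at runtime.
    __m128 loadAligned(const float *p) { return _mm_load_ps(p); }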
Matt Pharr
19d8f2e258 Generate FMA instructions with AVX2 (when possible).
Issue #320.
2012-08-03 10:43:41 -07:00
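For illustration, the fused operation in question, written with the FMA intrinsic; a sketch, not ispc's actual code-generation path.

    #include <immintrin.h>

    // a*b + c in one fused instruction, as emitted for AVX2-class targets;
    // compile with FMA support enabled (e.g. -mfma with GCC/Clang).
    __m256 mulAdd(__m256 a, __m256 b, __m256 c) {
        return _mm256_fmadd_ps(a, b, c);
    }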
Nicolas Trangez
3a007f939a Build: Include unistd.h where required
Some modules require an include of unistd.h (e.g. for getcwd and isatty
definitions).

These changes were required to build successfully on a Fedora 17 system,
using GCC 4.7.0 & glibc-headers 2.15.
2012-07-04 14:49:00 +02:00
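A minimal example of the declarations the include provides (a standalone sketch, not one of the affected modules):

    #include <unistd.h>   // declares getcwd() and isatty() on POSIX systems
    #include <cstdio>

    int main() {
        char buf[1024];
        if (getcwd(buf, sizeof(buf)) != NULL)
            printf("cwd: %s\n", buf);
        printf("stdout is a TTY: %d\n", isatty(fileno(stdout)));
        return 0;
    }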
Ingo Wald
789e04ce90 Add support for host/device stub functions for offload. 2012-06-12 10:23:49 -07:00
Matt Pharr
1397dbdabc Don't generate colorized output escapes when stderr isn't a TTY.
When piping to a file, more/less, etc., this is generally undesirable.

This behavior can be overridden with the --colorized-output command-line
flag.
2012-06-04 09:20:57 -07:00
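A sketch of the TTY check this behavior implies; printError() is a hypothetical helper, and only the --colorized-output flag name comes from the commit.

    #include <cstdio>
    #include <unistd.h>

    // Emit ANSI color escapes only when stderr is an interactive terminal
    // (or when the user forces it, as --colorized-output does in ispc).
    void printError(const char *msg, bool forceColor) {
        bool useColor = forceColor || isatty(fileno(stderr));
        if (useColor)
            fprintf(stderr, "\033[31mError\033[0m: %s\n", msg);
        else
            fprintf(stderr, "Error: %s\n", msg);
    }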
Matt Pharr
ee1fe3aa9f Update build to handle existence of LLVM 3.2 dev branch.
We now compile with LLVM 3.0, 3.1, and 3.2svn.
2012-05-03 08:25:25 -07:00
Matt Pharr
098c4910de Remove support for building with LLVM 2.9.
A forthcoming change uses some features of LLVM 3.0's new type
system, and it's not worth back-porting this to also work
with LLVM 2.9.
2012-04-15 20:08:51 -07:00
Matt Pharr
581472564d Print "friendly" ispc message when abort/seg fault signal is thrown.
Make crashes that happen in LLVM less inscrutable.

Issue #222.
2012-04-05 15:51:44 -07:00
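A minimal sketch of such a handler; the message text and function names are illustrative, not ispc's actual ones. Only async-signal-safe calls (write, _exit) are used inside the handler.

    #include <csignal>
    #include <unistd.h>

    static void crashHandler(int) {
        // Only async-signal-safe calls are legal here, so use write().
        const char msg[] =
            "ispc: internal compiler error (please file a bug report)\n";
        (void)write(STDERR_FILENO, msg, sizeof(msg) - 1);
        _exit(1);
    }

    void installCrashHandlers() {
        signal(SIGSEGV, crashHandler);
        signal(SIGABRT, crashHandler);
    }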
Matt Pharr
b813452d33 Don't issue a slew of warnings if a bogus cpu type is specified.
Issue #221.
2012-04-03 06:13:28 -07:00
Lu Guanqun
da9dba80a0 fix --outfile option error 2012-03-20 09:44:49 +08:00
Matt Pharr
777343331e Print numeric version number with --version. 2012-03-19 14:41:25 -07:00
Matt Pharr
3082ea4765 Require Type::Equal() for all type equality comparisons.
Previously, we uniqued AtomicTypes, so that they could be compared
by pointer equality, but with forthcoming SOA variability changes,
this would become too unwieldy (lacking a more general / ubiquitous
type uniquing implementation.)
2012-03-05 09:58:09 -08:00
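A heavily simplified sketch of the pointer-equality vs. structural-equality distinction; ispc's real Type hierarchy is far richer than this.

    // Sketch only: ispc's actual AtomicType carries much more state.
    struct AtomicType {
        enum BasicType { TYPE_BOOL, TYPE_INT32, TYPE_FLOAT } basicType;
        bool isConst;

        // Structural comparison; safe even when types are not uniqued.
        static bool Equal(const AtomicType *a, const AtomicType *b) {
            return a->basicType == b->basicType && a->isConst == b->isConst;
        }
    };

    bool sameType(const AtomicType *a, const AtomicType *b) {
        // Pointer equality (a == b) only works if every type is uniqued;
        // with SOA variability that uniquing became impractical, so all
        // comparisons go through Equal() instead.
        return AtomicType::Equal(a, b);
    }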
Matt Pharr
73bf552cd6 Add support for coalescing memory accesses from gathers.
There are two related optimizations that happen now.  (These
currently only apply for gathers where the mask is known to be
all on, and to gathers that are accessing 32-bit sized elements,
but both of these may be generalized in the future.)

First, for any single gather, we are now more flexible in mapping it
to individual memory operations.  Previously, we would only either map
it to a general gather (one scalar load per SIMD lane), or an 
unaligned vector load (if the program instances could be determined
to be accessing a sequential set of locations in memory.)

Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads.  Further, we now generate code that shuffles
these loads around.  Doing fewer, larger loads in this manner, when
possible, can be more efficient.

Second, we can coalesce memory accesses across multiple gathers. If 
we have a series of gathers without any memory writes in the middle,
then we try to analyze their reads collectively and choose an efficient
set of loads for them.  Not only does this help if different gathers
reuse values from the same location in memory, but it's specifically
helpful when data with AOS layout is being accessed; in this case,
we're often able to generate wide vector loads and appropriate shuffles
automatically.
2012-02-10 13:10:39 -08:00
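To illustrate the simplest coalescing win with SSE intrinsics (a conceptual sketch, not ispc's generated code): a gather whose lanes happen to read four consecutive floats collapses from four scalar loads into one vector load.

    #include <immintrin.h>

    // General gather: one scalar load per SIMD lane.
    __m128 gatherScalar(const float *base, const int idx[4]) {
        return _mm_set_ps(base[idx[3]], base[idx[2]],
                          base[idx[1]], base[idx[0]]);
    }

    // Coalesced case: when the indices are known to be i, i+1, i+2, i+3,
    // the four scalar loads collapse into a single unaligned vector load.
    __m128 gatherSequential(const float *base, int i) {
        return _mm_loadu_ps(base + i);
    }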
Matt Pharr
bb8e13e3c9 Add support for -I command-line argument to specify #include search directories. 2012-02-07 08:39:01 -08:00