aaron/ispc - ispc - git.frat.tech


Author	SHA1	Message	Date
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs: - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken). This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure). - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	95fcdc36ee	Tracking ToT changes, which now require linking the option library. This is Unix only. Windows will be fixed separately.	2013-06-18 22:12:33 +04:00
Dmitry Babokin	4b388edca9	Splitting .ll files to be compiled in two versions - 32 and 64 bit. Unix only	2013-05-24 10:29:00 +04:00
Dmitry Babokin	e084f1c311	Adding missing copyright info in Makefile	2013-04-26 19:11:20 +02:00
Dmitry Babokin	95950885cf	Use posix_memalign to allocate 16-byte aligned memory on Linux/macOS.	2013-04-26 20:33:24 +04:00
Dmitry Babokin	0f631ad49b	Add info about the compiler used for the ispc build to Makefile output	2013-03-18 12:30:06 +04:00
Dmitry Babokin	bee3029764	Adding debug and clang targets, changing asan target	2013-02-21 17:26:21 +04:00
Dmitry Babokin	150d6d1f56	Adding Address Sanitizer build	2013-02-15 06:50:26 -08:00
Dmitry Babokin	8d8d9c63fe	Fix for #349: build issue when git is not found	2013-02-11 11:01:46 -08:00
Dmitry Babokin	52147ce631	Fixing issue #428: need to specify LLVM libs explicitly	2013-02-11 04:15:50 -08:00
james.brodman	3aaf2ef2d4	ToT Fixes / M4 macro fix	2013-01-14 14:55:10 -05:00
Peng Tu	16b0806d40	Fix LLVM TOT build issue.	2012-11-21 19:09:10 -08:00
Ingo Wald	d492af7bc0	64-bit gather/scatter, aligned load/store, i8 support	2012-09-17 03:39:02 +02:00
Matt Pharr	1a4434d314	Fix build with LLVM top-of-tree	2012-08-11 09:28:48 -07:00
Matt Pharr	38bcecd2f3	Print a useful error if llvm-config isn't found when building. Previously, there was a ton of unintelligible error spew. Issue #273.	2012-07-06 13:18:11 -07:00
Matt Pharr	6c7df4cb6b	Add initial support for "avx1.1" targets for Ivy Bridge. So far, only the use of the float/half conversion instructions distinguishes this from the "avx1" target. Partial work on issue #263.	2012-06-08 15:55:00 -07:00
Matt Pharr	449d956966	Add support for generic-64 target.	2012-05-25 11:57:28 -07:00
Matt Pharr	4f053e5b83	Pass OPT flags when linking	2012-05-08 13:25:09 -07:00
Matt Pharr	c756c855ea	Compile with -O2 by default on Linux/OSX.	2012-05-04 13:55:37 -07:00
Matt Pharr	d99bd279e8	Add generic-32 target.	2012-05-03 11:11:06 -07:00
Matt Pharr	ee1fe3aa9f	Update build to handle existence of LLVM 3.2 dev branch. We now compile with LLVM 3.0, 3.1, and 3.2svn.	2012-05-03 08:25:25 -07:00
Nipunn Koorapati	d0c7b5d35c	Merge remote-tracking branch 'upstream/master'	2012-04-06 17:58:21 -04:00
Nipunn Koorapati	802add1f97	Added to the Makefile the ability to point to a custom installation of llvm and clang.	2012-04-06 17:54:55 -04:00
Matt Pharr	1dac05960a	Fix build with LLVM 3.1 ToT	2012-04-05 08:17:56 -07:00
Matt Pharr	a69b7a5a01	Fix build with LLVM 3.1 TOT	2012-03-10 13:06:53 -08:00
Gabe Weisz	c67a286aa6	Add support for 1-wide scalar target. Issue #40.	2012-01-29 06:36:07 -08:00
Matt Pharr	58a0b4a20d	Add separate set of builtins for AVX2 (i.e., stop just reusing the ones for AVX1). For now the only difference is that the int/uint min/max functions call the new intrinsic for that. Once gather is available from LLVM, that will go here as well.	2012-01-13 14:40:01 -08:00
Matt Pharr	b60f8b4f70	Fix merge conflicts	2012-01-11 17:13:51 -08:00
Jean-Luc Duprat	0519eea951	Makefile does not hardcode link paths on Linux. Link statically for both x86 and x86-64.	2012-01-10 10:34:57 -08:00
Matt Pharr	2be1251c70	Fix Makefile on OSX (uname -o not supported)	2012-01-09 07:40:47 -08:00
Pierre-Antoine Lacaze	b683aa11b1	Fix linking under mingw, libdl is Linux only.	2012-01-09 10:52:46 +01:00
Pierre-Antoine Lacaze	2654bb0112	Handle python installations in non-standard locations.	2012-01-09 10:29:54 +01:00
Matt Pharr	8938e14442	Add support for emitting ~generic vectorized C++ code. The compiler now supports an --emit-c++ option, which generates generic vector C++ code. To actually compile this code, the user must provide C++ code that implements a variety of types and operations (e.g. adding two floating-point vector values together, comparing them, etc.). There are two examples of this required code in examples/intrinsics: generic-16.h is a "generic" 16-wide implementation that does everything required with scalar math; it's useful for demonstrating the requirements of the implementation. Then, sse4.h shows a simple implementation of an SSE4 target that maps the emitted function calls to SSE intrinsics. When using these example implementations with the ispc test suite, all but one or two tests pass with gcc and clang on Linux and OSX. There are currently ~10 failures with icc on Linux, and ~50 failures with MSVC 2010 (to be fixed in coming days). Performance varies: when running the examples through the sse4.h target, some (e.g. options) have the same performance as when compiled with --target=sse4 from ispc directly, while noise is 12% slower, rt is 26% slower, and aobench is 2.2x slower. The details of this haven't yet been carefully investigated, but will be in coming days as well. Issue #92.	2012-01-04 12:59:03 -08:00
Matt Pharr	1d9201fe3d	Add "generic" 4, 8, and 16-wide targets. When used, these targets end up with calls to undefined functions for all of the various special vector stuff ispc needs to compile ispc programs (masked store, gather, min/max, sqrt, etc.). These targets are not yet useful for anything, but are a step toward having an option to emit C++ code with calls out to intrinsics. Reorganized the directory structure a bit and put the LLVM bitcode used to define target-specific stuff (as well as some generic built-ins stuff) into a builtins/ directory. Note that for building on Windows, it's now necessary to set an LLVM_VERSION environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)	2011-12-19 13:46:50 -08:00
Matt Pharr	04df63d955	Update run_tests.py to work on Windows. Removed JIT-based testing path entirely.	2011-12-06 13:46:20 -08:00
Matt Pharr	e2b6ed3db8	Fix build for LLVM 2.9 and 3.1svn	2011-12-06 08:08:41 -08:00
Matt Pharr	455d963962	Don't ignore return value from getcwd()	2011-12-05 09:26:33 -08:00
Matt Pharr	286c23426e	Add "double-wide" sse2-x2 target, i.e. run 8 program instances together, along the lines of the double-pumped sse4-x2 target.	2011-10-11 15:17:31 -07:00
Matt Pharr	f9c67ff806	Explicit representation of ASTs for all the functions in a compile unit. Added AST and Function classes. Now, we parse the whole file and build up the AST for all of the functions in the Module before we emit IR for the functions (vs. before, when we generated IR along the way as we parsed the source file.)	2011-10-06 15:35:27 -07:00
Matt Pharr	06975bc7ab	Add support for compiling to multiple targets. If a flag along the lines of "--target=sse4,avx-x2" is provided on the command line, then the program will be compiled for each of the given targets, with a separate output file generated for each one. Further, an output file with dispatch functions that check the current system's CPU and then choose the best available variant is also created. Issue #11.	2011-10-04 16:01:55 -07:00
Matt Pharr	85063f493c	Revert attempt to be clever about which LLVM libraries to link in--just link all of them. (This was causing build problems for some folks.)	2011-09-01 05:02:44 -07:00
Matt Pharr	b67498766e	Big rewrite / improvement of target handling. If no CPU is specified, use the host CPU type, not just a default of "nehalem". Provide better feature strings to the LLVM target machinery. -> This ensures that LLVM doesn't generate SSE>2 instructions for the SSE2 target (fixes issue #82). -> Slight code improvements from using cmovs in generated code. Use the LLVM popcnt intrinsic for the SSE2 target (it now generates code that doesn't use the popcnt instruction, since we properly tell LLVM which instructions are and aren't available for SSE2).	2011-08-26 09:54:45 -07:00
Matt Pharr	7756265503	Add double-pumped AVX target (i.e., run 16-wide). Not yet tested.	2011-08-20 11:28:22 +01:00
Matt Pharr	04c93043d6	Target handling fixes. Set the Module's target appropriately when it's first created. Compile separate 32 and 64 bit versions of the builtins-c bitcode and load the appropriate one based on the target we're compiling for.	2011-08-15 16:03:50 +01:00
Matt Pharr	0ac4f7b620	Add various prefetch functions to the standard library.	2011-08-03 13:31:45 -07:00
Matt Pharr	a552927a6a	Cleanup implementation of target builtins code. - Renamed stdlib-sse.ll to builtins-sse.ll (etc.) in an attempt to better indicate the fact that the stuff in those files has a role beyond implementing stuff for the standard library. - Moved declarations of the various __pseudo_* functions from being done with LLVM API calls in builtins.cpp to just straight up declarations in LLVM assembly language in builtins.m4. (Much less code to do it this way, and more clear what's going on.)	2011-08-01 05:58:43 +01:00
Matt Pharr	bba7211654	Add support for int8/int16 types. Addresses issues #9 and #42.	2011-07-21 06:57:40 +01:00
Matt Pharr	5bcc611409	Implement global atomics and a memory barrier in the standard library. This checkin provides the standard set of atomic operations and a memory barrier in the ispc standard library. Both signed and unsigned 32- and 64-bit integer types are supported.	2011-07-04 17:20:42 +01:00
Matt Pharr	c14c3ceba6	Provide both signed and unsigned int variants of bitcode-based builtins. When creating function Symbols for functions that were defined in LLVM bitcode for the standard library, if any of the function parameters are integer types, create two ispc-side Symbols: one where the integer types are all signed and the other where they are all unsigned. This allows us to provide, for example, both store_to_int16(reference int a[], uniform int offset, int val) and store_to_int16(reference unsigned int a[], uniform int offset, unsigned int val). Added some additional tests to exercise the new variants of these. Also fixed some cases where the __{load,store}_int{8,16} builtins would read from/write to memory even if the mask was all off (which could cause crashes in some cases.)	2011-07-04 12:10:26 +01:00
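
Commit c14c3ceba6 also fixes builtins that read from or wrote to memory even when the execution mask was all off. The Python sketch below models the intended semantics in simplified form: only lanes whose mask bit is set touch memory, so an all-off mask performs no accesses at all, even at an invalid offset. `TrackedMemory` and `masked_load` are hypothetical names used only for illustration; ispc's real builtins are defined in LLVM bitcode, not Python.

```python
class TrackedMemory:
    """Hypothetical memory stand-in that records every address it reads."""

    def __init__(self, values):
        self.values = list(values)
        self.reads = []

    def read(self, addr):
        self.reads.append(addr)
        return self.values[addr]


def masked_load(memory, offset, mask, default=0):
    """Load len(mask) consecutive elements starting at `offset`.

    Lanes whose mask bit is False are never read and get `default`.
    """
    # Fast path: with every lane inactive, return without touching memory.
    if not any(mask):
        return [default] * len(mask)
    return [memory.read(offset + lane) if on else default
            for lane, on in enumerate(mask)]


if __name__ == "__main__":
    mem = TrackedMemory([10, 11, 12, 13, 14, 15])
    print(masked_load(mem, 2, [True, False, True, False]))  # [12, 0, 14, 0]
    print(mem.reads)                                        # [2, 4]
    # Offset 100 is out of range, but an all-off mask never dereferences it.
    print(masked_load(mem, 100, [False] * 4))               # [0, 0, 0, 0]
```

Recording the addresses makes the property checkable: after the all-off call, the read log is unchanged.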


57 Commits
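
Commit 06975bc7ab describes multi-target compilation: with a flag such as `--target=sse4,avx-x2`, ispc emits one output per target plus dispatch functions that check the host CPU and choose the best available variant. The Python sketch below models only that selection step. The feature names, the per-target requirements, the preference order, and the `pick_variant` helper are illustrative assumptions; ispc generates its own dispatch code and does not use anything like this.

```python
# Hypothetical model of runtime dispatch: each compiled variant lists the
# ISA features it needs (simplified, assumed values), and the first variant
# in preference order that the host supports is chosen.
VARIANT_REQUIREMENTS = {
    "avx-x2": {"sse2", "sse4.1", "avx"},
    "sse4": {"sse2", "sse4.1"},
    "sse2": {"sse2"},
}
PREFERENCE = ["avx-x2", "sse4", "sse2"]  # assumed order, best first


def pick_variant(compiled_targets, host_features):
    """Return the most preferred compiled target the host can run.

    compiled_targets: targets passed via --target=..., e.g. ["sse4", "avx-x2"].
    host_features: set of ISA feature names reported for the host CPU.
    Raises RuntimeError if no compiled variant is supported.
    """
    for target in PREFERENCE:
        if target in compiled_targets and VARIANT_REQUIREMENTS[target] <= host_features:
            return target
    raise RuntimeError("no compiled variant is supported on this CPU")


if __name__ == "__main__":
    targets = ["sse4", "avx-x2"]
    print(pick_variant(targets, {"sse2", "sse4.1", "avx"}))  # avx-x2
    print(pick_variant(targets, {"sse2", "sse4.1"}))         # sse4
```

Keeping the preference order explicit makes the rule easy to audit: the first variant the host can run wins, and a host that supports none of the compiled targets fails loudly rather than executing unsupported instructions.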