aaron/ispc - ispc - git.frat.tech

aaron/ispc

Author	SHA1	Message	Date
Matt Pharr	a69b7a5a01	Fix build with LLVM 3.1 TOT	2012-03-10 13:06:53 -08:00
Gabe Weisz	c67a286aa6	Add support for 1-wide scalar target. Issue #40.	2012-01-29 06:36:07 -08:00
Matt Pharr	58a0b4a20d	Add separate set of builtins for AVX2. (i.e., stop just reusing the ones for AVX1). For now the only difference is that the int/uint min/max functions call the new intrinsic for that. Once gather is available from LLVM, that will go here as well.	2012-01-13 14:40:01 -08:00
Matt Pharr	b60f8b4f70	Fix merge conflicts	2012-01-11 17:13:51 -08:00
Jean-Luc Duprat	0519eea951	Makefile does not hardcode link paths on Linux Link statically for both x86 and x86-64	2012-01-10 10:34:57 -08:00
Matt Pharr	2be1251c70	Fix Makefile on OSX (uname -o not supported)	2012-01-09 07:40:47 -08:00
Pierre-Antoine Lacaze	b683aa11b1	Fix linking under mingw, libdl is Linux only.	2012-01-09 10:52:46 +01:00
Pierre-Antoine Lacaze	2654bb0112	Handle python installations in non-standards locations.	2012-01-09 10:29:54 +01:00
Matt Pharr	8938e14442	Add support for emitting ~generic vectorized C++ code. The compiler now supports an --emit-c++ option, which generates generic vector C++ code. To actually compile this code, the user must provide C++ code that implements a variety of types and operations (e.g. adding two floating-point vector values together, comparing them, etc). There are two examples of this required code in examples/intrinsics: generic-16.h is a "generic" 16-wide implementation that does all required with scalar math; it's useful for demonstrating the requirements of the implementation. Then, sse4.h shows a simple implementation of a SSE4 target that maps the emitted function calls to SSE intrinsics. When using these example implementations with the ispc test suite, all but one or two tests pass with gcc and clang on Linux and OSX. There are currently ~10 failures with icc on Linux, and ~50 failures with MSVC 2010. (To be fixed in coming days.) Performance varies: when running the examples through the sse4.h target, some have the same performance as when compiled with --target=sse4 from ispc directly (options), while noise is 12% slower, rt is 26% slower, and aobench is 2.2x slower. The details of this haven't yet been carefully investigated, but will be in coming days as well. Issue #92.	2012-01-04 12:59:03 -08:00
Matt Pharr	1d9201fe3d	Add "generic" 4, 8, and 16-wide targets. When used, these targets end up with calls to undefined functions for all of the various special vector stuff ispc needs to compile ispc programs (masked store, gather, min/max, sqrt, etc.). These targets are not yet useful for anything, but are a step toward having an option to C++ code with calls out to intrinsics. Reorganized the directory structure a bit and put the LLVM bitcode used to define target-specific stuff (as well as some generic built-ins stuff) into a builtins/ directory. Note that for building on Windows, it's now necessary to set a LLVM_VERSION environment variable (with values like LLVM_2_9, LLVM_3_0, LLVM_3_1svn, etc.)	2011-12-19 13:46:50 -08:00
Matt Pharr	04df63d955	Update run_tests.py to work on Windows. Removed JIT-based testing path entirely.	2011-12-06 13:46:20 -08:00
Matt Pharr	e2b6ed3db8	Fix built for LLVM2.9 and 3.1svn	2011-12-06 08:08:41 -08:00
Matt Pharr	455d963962	Don't ignore return value from getcwd()	2011-12-05 09:26:33 -08:00
Matt Pharr	286c23426e	Add "double-wide" sse2-x2 target. i.e. run 8 program instances together, along the lines of the double-pumped sse4-x2 target.	2011-10-11 15:17:31 -07:00
Matt Pharr	f9c67ff806	Explicit representation of ASTs for all the functions in a compile unit. Added AST and Function classes. Now, we parse the whole file and build up the AST for all of the functions in the Module before we emit IR for the functions (vs. before, when we generated IR along the way as we parsed the source file.)	2011-10-06 15:35:27 -07:00
Matt Pharr	06975bc7ab	Add support for compiling to multiple targets. If a flag along the lines of "--target=sse4,avx-x2" is provided on the command-line, then the program will be compiled for each of the given targets, with a separate output file generated for each one. Further, an output file with dispatch functions that check the current system's CPU and then chooses the best available variant is also created. Issue #11.	2011-10-04 16:01:55 -07:00
Matt Pharr	85063f493c	Revert attempt to be clever about which LLVM libraries to link in--just link all of them. (This was causing build problems for some folks.)	2011-09-01 05:02:44 -07:00
Matt Pharr	b67498766e	Big rewrite / improvement of target handling. If no CPU is specified, use the host CPU type, not just a default of "nehalem". Provide better features strings to the LLVM target machinery. -> Thus ensuring that LLVM doesn't generate SSE>2 instructions for the SSE2 target (Fixes issue #82). -> Slight code improvements from using cmovs in generated code now Use the llvm popcnt intrinsic for the SSE2 target now (it now generates code that doesn't call the popcnt instruction now that we properly tell LLVM which instructions are and aren't available for SSE2.)	2011-08-26 09:54:45 -07:00
Matt Pharr	7756265503	Add double-pumped AVX target (i.e., run 16-wide). Not yet tested.	2011-08-20 11:28:22 +01:00
Matt Pharr	04c93043d6	Target handling fixes. Set the Module's target appropriately when it's first created. Compile separate 32 and 64 bit versions of the builtins-c bitcocde and load the appropriate one based on the target we're compiling for.	2011-08-15 16:03:50 +01:00
Matt Pharr	0ac4f7b620	Add various prefetch functions to the standard library.	2011-08-03 13:31:45 -07:00
Matt Pharr	a552927a6a	Cleanup implementation of target builtins code. - Renamed stdlib-sse.ll to builtins-sse.ll (etc.) in an attempt to better indicate the fact that the stuff in those files has a role beyond implementing stuff for the standard library. - Moved declarations of the various __pseudo_* functions from being done with LLVM API calls in builtins.cpp to just straight up declarations in LLVM assembly language in builtins.m4. (Much less code to do it this way, and more clear what's going on.)	2011-08-01 05:58:43 +01:00
Matt Pharr	bba7211654	Add support for int8/int16 types. Addresses issues #9 and #42 .	2011-07-21 06:57:40 +01:00
Matt Pharr	5bcc611409	Implement global atomics and a memory barrier in the standard library. This checkin provides the standard set of atomic operations and a memory barrier in the ispc standard library. Both signed and unsigned 32- and 64-bit integer types are supported.	2011-07-04 17:20:42 +01:00
Matt Pharr	c14c3ceba6	Provide both signed and unsigned int variants of bitcode-based builtins. When creating function Symbols for functions that were defined in LLVM bitcode for the standard library, if any of the function parameters are integer types, create two ispc-side Symbols: one where the integer types are all signed and the other where they are all unsigned. This allows us to provide, for example, both store_to_int16(reference int a[], uniform int offset, int val) as well as store_to_int16(reference unsigned int a[], uniform int offset, unsigned int val). functions. Added some additional tests to exercise the new variants of these. Also fixed some cases where the __{load,store}_int{8,16} builtins would read from/write to memory even if the mask was all off (which could cause crashes in some cases.)	2011-07-04 12:10:26 +01:00
Pete Couperus	fac50ba454	Use clang's preprocessor, rather than forking a process to run cpp on Mac/Linux (and not having a built-in preprocessor solution at all on Windows.) Fixes issue #32.	2011-07-04 08:35:31 +01:00
Matt Pharr	5f7e61f9b5	Another stdlib dependency improvement	2011-06-29 12:26:44 +01:00
Matt Pharr	2709c354d7	Add support for broadcast(), rotate(), and shuffle() stdlib routines	2011-06-27 17:31:44 -07:00
Matt Pharr	d340dcbfcc	Modify makefile to print out llvm version and install directory it's using	2011-06-23 16:02:09 -07:00
Matt Pharr	40bd133dec	Add dependency to make sure that bison runs (to generate parse.hh) before we try to compile lex.cpp	2011-06-22 14:47:03 -07:00
Mark Lacey	38d4ecccf4	Fix pi in two places 3.14159<2>653<6>.	2011-06-21 23:55:44 -07:00
Matt Pharr	18af5226ba	Initial commit.	2011-06-21 12:48:50 -07:00

32 Commits