aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
Matt Pharr	0575b1f38d	Update run_tests and examples makefile for scalar target. Fixed a number of tests that didn't handle the programCount == 1 case correctly.	2012-01-29 16:22:25 -08:00
Matt Pharr	f6cd01f7cf	Windows build support for scalar target.	2012-01-29 13:48:01 -08:00
Matt Pharr	f2fbc168af	Scalar target builtins bugfixes. Typo in __max_varying_double. Add declarations for half functions. Use the gen_scatter macro to get the scatter functions.	2012-01-29 13:47:44 -08:00
Matt Pharr	b50f6f1730	Fix RNG seed code in stdlib for scalar target.	2012-01-29 13:46:57 -08:00
Matt Pharr	f8a7120d9c	Detect division by 0 during constant folding and issue a sensible error.	2012-01-29 13:46:38 -08:00
Matt Pharr	20dbf59420	Don't lose source position when returning values of constant symbols.	2012-01-29 13:46:17 -08:00
Gabe Weisz	c67a286aa6	Add support for 1-wide scalar target. Issue #40.	2012-01-29 06:36:07 -08:00
Matt Pharr	c96fef6bc8	Fix silly error in generic-16.h example C++ bindings.	2012-01-27 17:04:57 -08:00
Matt Pharr	bba02f87ea	Improve implementations of unsigned <=, >= in sse4 intrinsics file.	2012-01-27 16:49:41 -08:00
Matt Pharr	12dc3f5c28	Fixes to c++ backend for new and delete. Don't include declarations of malloc/free in the generated code (get the standard ones from system headers instead). Add a cast to (uint8_t *) before calls to malloc, which C++ requires, since proper malloc returns a void *.	2012-01-27 16:49:09 -08:00
Matt Pharr	0f01a5dcbe	Handle undef values in LLVMVectorValuesAllEqual()	2012-01-27 16:48:14 -08:00
Matt Pharr	664dc3bdda	Add support for "new" and "delete" to the language. Issue #139.	2012-01-27 14:47:06 -08:00
Matt Pharr	bdba3cd97d	Bugfix: add per-lane offsets when accessing varying data through a pointer!	2012-01-27 14:44:52 -08:00
Matt Pharr	d9c0f9315a	Fix generic targets: half conversion functions weren't declared. (Broken by `1867b5b31`).	2012-01-27 14:44:43 -08:00
Matt Pharr	b7f17d435f	Fix crash in gather/scatter optimization pass.	2012-01-27 14:44:35 -08:00
Matt Pharr	37cdc18639	Issue error instead of crashing given attempted function call through non-function. Fixes issue #163.	2012-01-27 10:01:06 -08:00
Matt Pharr	5893a9c49d	Remove incorrect assert	2012-01-27 09:14:45 -08:00
Matt Pharr	24f58fa16a	Update per_lane macro to not use ID for lane number in macro expansion. This was leading to unintended consequences if WIDTH was used in macro code, which was undesirable.	2012-01-27 09:12:13 -08:00
Matt Pharr	56ffc78fa4	Require semicolons after sync, assert, and print statements. (Silly parser oversight.)	2012-01-27 09:12:13 -08:00
Matt Pharr	061e68bc77	Fix compiler crash from malformed program.	2012-01-27 09:12:13 -08:00
Matt Pharr	177e6312b4	Fix build with LLVM ToT (ConstantVector::getVectorElements() is gone now).	2012-01-27 09:07:58 -08:00
Matt Pharr	1acf4032c2	Merge branch 'master' of https://github.com/jduprat/ispc	2012-01-26 14:18:25 -08:00
Jean-Luc Duprat	9c5444698e	run_tests.py fixes: - Python 3 fixes (can't use print) - Fixed for running tests on Windows	2012-01-26 13:39:54 -08:00
Matt Pharr	65f3252760	Various fixes to test running script for Windows. Also, removed the --valgrind option and replaced it with a more general --wrap-exe option, which can be used both for running Valgrind and SDE.	2012-01-26 10:56:29 -08:00
Matt Pharr	e612abe4ba	Fix parsing of 64-bit integer constants on Windows. (i.e., use the 64-bit unsigned integer parsing function, not the 64-bit signed one.) Fixes bug #68.	2012-01-26 10:56:28 -08:00
Jean-Luc Duprat	34352e4e0e	beefed up stdin.h on Windows so it compiles ispc 1.1.3	2012-01-25 15:04:19 -08:00
Matt Pharr	1867b5b317	Use native float/half conversion instructions with the AVX2 target.	2012-01-24 15:33:38 -08:00
Matt Pharr	a5b7fca7e0	Extract constant offsets from gather/scatter base+offsets offset vectors. When we're able to turn a general gather/scatter into the "base + offsets" form, we now try to extract out any constant components of the offsets and then pass them as a separate parameter to the gather/scatter function implementation. We then in turn carefully emit code for the addressing calculation so that these constant offsets match LLVM's patterns to detect this case, such that we get the constant offsets directly encoded in the instruction's addressing calculation in many cases, saving arithmetic instructions to do these calculations. Improves performance of stencil by ~15%. Other workloads unchanged.	2012-01-24 14:41:15 -08:00
Matt Pharr	7be2c399b1	Rename various optimization passes to have more descriptive names. No functionality change.	2012-01-23 14:49:48 -08:00
Matt Pharr	d6337b3b22	Code cleanups in opt.cpp; no functional change	2012-01-23 14:36:32 -08:00
Matt Pharr	d2f8b0ace5	Add __clock to list of symbols to make internal from builtins.	2012-01-23 06:19:16 -08:00
Matt Pharr	d805e8b183	Add clock() function to standard library. Also corrected the declaration of num_cores() to return a uniform value.	2012-01-22 13:05:27 -08:00
Matt Pharr	1f0f2ec05f	Include AVX2 in supported ISAs	2012-01-22 07:05:47 -08:00
Matt Pharr	91ac3b9d7c	Back out WIP changes to opt.cpp that were inadvertently checked in.	2012-01-21 07:34:53 -08:00
Matt Pharr	d65bf2eb2f	Doxygen number bump and release notes for 1.1.3 (tag: v1.1.3)	2012-01-20 17:04:16 -08:00
Matt Pharr	1bba9d4307	Improve atomic_swap_global() to take advantage of associativity. We now do a single atomic hardware swap and then effectively do swaps between the running program instances such that the result is the same as if they had happened to run a particular ordering of hardware swaps themselves. Also cleaned up __atomic_swap_uniform_* built-in implementations to not take the mask, which they weren't using anyway. Finishes Issue #56.	2012-01-20 10:37:33 -08:00
Matt Pharr	4388338dad	Fix performance regression introduced in `be0c77d556`. Effectively, the patterns that detected when given a gather or scatter in base+offsets form, the offsets were actually a multiple of 2/4/8, were no longer working. This change not only fixes this, but also expands the set of patterns that are matched by this. For example, given offsets of the form 4*v1 + 16*v2, it identifies a scale of 4 and new offsets of v1 + 4*v2. This fix makes the volume renderer run 1.19x faster, and noise 1.54x faster.	2012-01-19 17:57:59 -08:00
Matt Pharr	2fb59c90cf	Fix C++ backend bug introduced in `d14a2de168`. (This was causing a number of tests to fail with the generic targets.)	2012-01-19 11:35:02 -07:00
Matt Pharr	68f6ea8def	For << and >> with C++, detect when all instances are shifting by the same amount. In this case, we now emit calls to potentially-specialized functions for the left/right shifts that take a single integer value for the shift amount. These in turn can be matched to the corresponding intrinsics for the SSE target. Issue #145.	2012-01-19 10:04:32 -07:00
Matt Pharr	3f89295d10	Update RNG code in stdlib to use -> operator where appropriate.	2012-01-19 10:02:47 -07:00
Matt Pharr	748b292e77	Improve code for uniform switches with a 'break' under varying control flow. Previously, when we had a switch statement with a uniform switch condition but a 'break' statement that was under varying control flow inside the switch, we'd promote the switch condition to be varying so that the break would work correctly. Now, we leave the condition as uniform and are thus able to use the more-efficient LLVM switch instruction in this case. Issue #156.	2012-01-19 08:41:19 -07:00
Matt Pharr	6451c3d99d	Fix bug with code for initializers for static arrays in generated C++ code. (This was preventing aobench from compiling successfully with the generic target.)	2012-01-18 16:55:09 -07:00
Matt Pharr	d14a2de168	Fix generic code emission when building with LLVM3.0/2.9. Specifically, don't use vector select for masked store blend there, but emit a call to a undefined __masked_store_blend_*() functions. Added implementations of these functions to the sse4.h and generic-16.h in examples/instrinsics. (Calls to these will never be generated with LLVM 3.1).	2012-01-17 23:42:22 -07:00
Matt Pharr	642150095d	Include LLVM version used to build in version info printed out.	2012-01-17 23:42:22 -07:00
Matt Pharr	3bf3ac7922	Be more conservative about using blending in place of masked store. More specifically, we do a proper masked store (rather than a load-blend-store) unless we can determine that we're accessing a stack-allocated "varying" variable. This fixes a number of nefarious bugs where given code like: uniform float a[21]; foreach (i = 0 ... 21) a[i] = 0; We'd use a blend and in turn read past the end of a[] in the last iteration. Also made slight changes to inlining in aobench; this keeps compiles to ~5s, versus ~45s without them (with this change). Fixes issue #160.	2012-01-17 23:42:22 -07:00
Matt Pharr	c6d1cebad4	Update masked_load/store implementations for generic targets to take void *s (Fixes compile errors when we try to actually use these!)	2012-01-17 23:42:22 -07:00
Matt Pharr	08189ce08c	Update "inline" qualifiers in a few examples.	2012-01-17 23:42:22 -07:00
Matt Pharr	7013d7d52f	Small documentation updates and cleanups	2012-01-17 23:42:21 -07:00
Matt Pharr	7045b76f84	Improvements to code generation for "foreach". Specialize the code for the innermost loop to not do any masking computations for the innermost dimension for the iterations where we are certainly working on a full vector's worth of data. This fix improves performance/code quality of "foreach" such that it's essentially the same as the equivalent "for" loop. Fixes issue #151.	2012-01-17 11:34:00 -08:00
Matt Pharr	58a0b4a20d	Add separate set of builtins for AVX2. (i.e., stop just reusing the ones for AVX1). For now the only difference is that the int/uint min/max functions call the new intrinsic for that. Once gather is available from LLVM, that will go here as well.	2012-01-13 14:40:01 -08:00

550 Commits