aaron/ispc - ispc - git.frat.tech


Author	SHA1	Message	Date
jbrodman	0755e4f8ff	Merge pull request #561 from dbabokin/neon_condition Fix for Windows build and making NEON target optional	2013-08-06 13:45:30 -07:00
Dmitry Babokin	dff7735af9	Fix for Windows build and making NEON target optional	2013-08-02 19:24:34 -07:00
Dmitry Babokin	fb34fc5a85	Merge pull request #559 from ifilippov/debug_phases Supporting dumping, switching off and debug printing of optimization phases.	2013-08-01 14:55:07 -07:00
Dmitry Babokin	43423c276f	Merge pull request #560 from ifilippov/perf Supporting perf.py on Mac OS	2013-08-01 13:20:01 -07:00
jbrodman	5ffc3a8f4c	Merge pull request #558 from dbabokin/win_examples Fix for examples to make them work on Windows properly	2013-08-01 08:02:42 -07:00
Ilia Filippov	3c06924a02	Supporting perf.py on Mac OS	2013-08-01 12:47:37 +04:00
Ilia Filippov	a174a90f86	Supporting dumping, switching off and debug printing of optimization phases	2013-08-01 11:37:52 +04:00
Dmitry Babokin	220f0b0b40	Renaming mandelbrot_tasks files to be different from mandelbrot	2013-07-30 19:53:12 -07:00
Dmitry Babokin	fa93cb7d0b	InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported)	2013-07-29 22:46:36 -07:00
Dmitry Babokin	663ebf7857	Merge pull request #551 from mmp/constfold Improvements to constant folding.	2013-07-24 10:27:04 -07:00
Dmitry Babokin	10c0b42d0d	Merge pull request #549 from mmp/fix-tot Fix build with LLVM top-of-tree.	2013-07-23 09:14:08 -07:00
Matt Pharr	564e61c828	Improvements to constant folding. We can now do constant folding with all basic datatypes (the previous implementation handled int32 well, but had limited, if any, coverage for other datatypes.) Reduced a bit of repeated code in the constant folding implementation through template helper functions.	2013-07-22 16:12:02 -07:00
Matt Pharr	946c39a5df	Fix build with LLVM top-of-tree. The DIBuilder::getCU() method has been removed; we now just store the compilation unit returned when we call DIBuilder::createCompileUnit.	2013-07-22 15:42:52 -07:00
Jean-Luc Duprat	2948e84846	Merge pull request #547 from mmp/arm-merge Add ARM NEON support	2013-07-22 09:24:16 -07:00
Matt Pharr	068fd8098c	Explicitly set armv7-eabi target triple on ARM. This lets the compiler generate FMA instructions, which seems desirable.	2013-07-20 11:19:10 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues.) - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Matt Pharr	b007bba59f	Replace inline assembly in task system with equivalent gcc intrinsics. gcc/icc build only: the Windows build still uses the Win32 calls for these.	2013-07-19 23:07:24 -07:00
Dmitry Babokin	abf43ad01d	Merge pull request #546 from dbabokin/release Release 1.4.4	2013-07-19 18:49:07 -07:00
Dmitry Babokin	922895de69	Changing ISPC version to 1.4.5dev	2013-07-19 18:47:43 -07:00
Dmitry Babokin	28f0bce9f2	Release 1.4.4 (tag v1.4.4)	2013-07-19 16:22:10 -07:00
Dmitry Babokin	0f82f216a2	Merge pull request #544 from mmp/master Handle SHL with a constant vector in LLVMVectorIsLinear().	2013-07-18 11:46:11 -07:00
Matt Pharr	7454b1399c	Handle SHL with a constant vector in LLVMVectorIsLinear(). LLVM3.4 seems to be turning multiplies by a constant power of 2 into the equivalent SHL, which was in turn thwarting the pattern matching for turning gathers/scatters into vector loads/stores.	2013-07-17 14:12:43 -07:00
jbrodman	4ebf46bd63	Merge pull request #543 from mmp/master Fix build with LLVM top-of-tree	2013-07-17 10:38:06 -07:00
Matt Pharr	f1cce0ef5f	Fix build with LLVM top-of-tree	2013-07-17 09:25:00 -07:00
Dmitry Babokin	8c9e873c10	Merge pull request #540 from dbabokin/embree_bug Fix for the bug introduced by --intrumentation fix	2013-07-04 10:45:06 -07:00
Dmitry Babokin	c85439e7bb	Fix for the bug introduced by --intrumentation fix	2013-07-04 21:41:57 +04:00
Ilia Filippov	fd7f87b55e	Supporting perf.py on Windows and some small corrections in it	2013-07-02 19:23:18 +04:00
Dmitry Babokin	8be4128c5a	Merge pull request #534 from ifilippov/perf add script for measuring performance	2013-07-01 05:09:03 -07:00
Ilia Filippov	806e37338c	add script for measuring performance	2013-07-01 13:30:49 +04:00
Dmitry Babokin	ec1095624a	Merge pull request #527 from tkoziara/master examples/sort added	2013-06-25 10:11:39 -07:00
Tomasz Koziara	a23d69ebe8	Copyright changed to simplify legal matters.	2013-06-25 17:28:27 +01:00
Dmitry Babokin	0aff61ffc6	Merge pull request #533 from dbabokin/patch Quick fix for LLVM 3.3 patch	2013-06-25 08:50:32 -07:00
Dmitry Babokin	05aa540984	Quick fix for LLVM 3.3 patch	2013-06-25 19:49:41 +04:00
Dmitry Babokin	033e83e490	Merge pull request #532 from dbabokin/release_1_4_3 Release 1.4.3	2013-06-25 07:42:08 -07:00
Dmitry Babokin	594485c38c	Release 1.4.3 (tag v1.4.3)	2013-06-25 18:38:21 +04:00
Dmitry Babokin	d52e2d5a8d	License update (just dates)	2013-06-25 17:02:42 +04:00
Dmitry Babokin	1e5d852e2f	Merge pull request #531 from ifilippov/qsize_fail replacement of qsize due to it's fails on MacOS	2013-06-25 05:36:45 -07:00
Ilia Filippov	cc32d913a0	replacement of qsize due to it's fails on MacOS	2013-06-25 16:27:25 +04:00
Dmitry Babokin	fc66066d4d	Merge pull request #530 from dbabokin/llvm_fix Adding LLVM patch to fix #519 with LLVM 3.3	2013-06-25 05:22:09 -07:00
Dmitry Babokin	6169338815	Adding LLVM patch to fix #519 with LLVM 3.3	2013-06-25 16:21:14 +04:00
Tomasz Koziara	86ee8db778	Parallel prefix sum added + minor amendements.	2013-06-25 12:45:51 +01:00
Dmitry Babokin	6bc8cb1ff1	Merge pull request #529 from ifilippov/instrument_fix correction of --instrument option support	2013-06-25 03:08:02 -07:00
Dmitry Babokin	0fc49b1c37	Merge pull request #528 from ifilippov/test3 Reapplying lost commits	2013-06-25 02:14:24 -07:00
Ilia Filippov	9fb981e9a0	correction of --instrument option support	2013-06-25 12:33:23 +04:00
Ilia Filippov	cba1b3cedd	additional libraries for LLVM_3_4 build	2013-06-25 12:22:53 +04:00
Ilia Filippov	12c4512932	adding two additional libraries for LLVM_3_4 build	2013-06-25 12:22:53 +04:00
Tomasz Koziara	f2452f040d	First commit of the radix sort example.	2013-06-24 18:37:44 +01:00
Dmitry Babokin	0dd1dbb568	Merge pull request #526 from dbabokin/master Tracking LLVM trunk: removing llvm::createSimplifyLibCallsPass() call	2013-06-23 23:10:19 -07:00
Dmitry Babokin	fdcec5a219	Tracking LLVM trunk: removing llvm::createSimplifyLibCallsPass() call	2013-06-24 10:08:06 +04:00
Dmitry Babokin	bebab7ab0d	Merge pull request #525 from dbabokin/debug --debug output: stdout instead of stderr	2013-06-21 03:56:17 -07:00


1342 Commits