aaron/ispc - ispc - git.frat.tech

aaron/ispc

Author	SHA1	Message	Date
Evghenii	785b2f5d24	added examples	2014-01-06 14:00:36 +01:00
Evghenii	2d8da306a1	merged with master	2013-12-25 21:32:34 +01:00
Ilia Filippov	7bf64bc490	changes in examples (windows)	2013-12-19 21:13:09 +04:00
Evghenii	ddfe782151	merged	2013-12-13 11:56:43 +01:00
Ilia Filippov	f3ff1fcbeb	supporting targets in perf windows	2013-11-26 19:12:02 +04:00
Ilia Filippov	935800d7f6	making common.props	2013-11-26 18:58:49 +04:00
evghenii	bb46b561fd	Merged with upstream/master	2013-11-22 08:13:16 +01:00
Dmitry Babokin	017e7890f7	Examples makefiles to support setting single target via ISPC_IA_TARGETS	2013-11-14 15:34:30 +04:00
Evghenii	ce5f8cd46f	replaced with fresh examples	2013-11-08 14:17:26 +01:00
Ilia Filippov	a910bfb539	Windows support	2013-11-05 16:31:01 +04:00
Ilia Filippov	87cecddabb	adding sort to performance checking	2013-09-20 18:57:20 +04:00
Dmitry Babokin	b258027061	Merge pull request #582 from tkoziara/master: Uniform memory allocation in sort example is fixed.	2013-09-16 03:29:43 -07:00
Tomasz Koziara	97068765e8	Copyright reversed.	2013-09-14 18:09:04 +01:00
Tomasz Koziara	ed825b3773	Uniform memory allocation fixed.	2013-09-13 13:14:31 +01:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs: - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken). This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure). - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Tomasz Koziara	a23d69ebe8	Copyright changed to simplify legal matters.	2013-06-25 17:28:27 +01:00
Tomasz Koziara	86ee8db778	Parallel prefix sum added + minor amendments.	2013-06-25 12:45:51 +01:00
Tomasz Koziara	f2452f040d	First commit of the radix sort example.	2013-06-24 18:37:44 +01:00

18 Commits