aaron/ispc - ispc - git.frat.tech

Author	SHA1	Message	Date
egaburov	ade8751442	taskIndex_x,y,z are passed to the task	2013-10-23 08:39:17 +02:00
evghenii	fb1a2a0a40	__masked_store_* uses vscatter now, and is thread-safe	2013-10-15 17:10:46 +03:00
Dmitry Babokin	99df2d9dbf	Switch examples on Unix from using g++ to clang++	2013-10-11 16:29:17 +04:00
evghenii	3da152a150	fixed zmm __mul for i64 with icc < 14.0.0; 4 knc::fails left, but I doubt these are due to this include	2013-10-07 18:30:22 +03:00
evghenii	4222605f87	fixed lshr/ashr/shl shifts. __mul i64 vector version for icc < 14.0.0 works only on signed, so commented it out in favour of sequential	2013-10-07 14:24:27 +03:00
evghenii	1b196520f6	knc-i1x16.h is cleaned: int32,float,double are complete, int64 is partially complete	2013-10-05 22:10:05 +03:00
evghenii	10223cfac3	working on shuffle/rotate for double, there seems to be a bug in cvt2zmm cvt2hilo	2013-10-05 15:23:55 +03:00
evghenii	8b0fc558cb	complete cleaning	2013-10-05 14:15:33 +03:00
evghenii	8a6789ef61	cleaned float added fails info	2013-10-04 14:11:09 +03:00
evghenii	57f019a6e0	cleaned int64 added fails info	2013-10-04 13:39:15 +03:00
evghenii	32c77be2f3	cleaned mask & int32, only test141 fails	2013-10-04 11:42:52 +03:00
Dmitry Babokin	2741e3c1d0	Merge pull request #616 from jbrodman/master Adding missing typecasts and guarding i64 __mul with icc version check	2013-10-01 08:59:52 -07:00
james.brodman	dc8895352a	Adding missing typecasts and guarding i64 __mul with compiler version check	2013-10-01 11:53:56 -04:00
Dmitry Babokin	c7b4164122	Redefining ISPC should not discard ISPC_FLAGS	2013-10-01 18:40:26 +04:00
Dmitry Babokin	2d6f7a7c93	Support i686 architecture recognition as x86 and enable 32 bit x86 platforms	2013-10-01 17:37:34 +04:00
jbrodman	39c2274f1a	Merge pull request #588 from egaburov/knc-modes Added knc-i1x16.h , knc-i1x8.h and knc-i1x8unsafe_fast.h	2013-09-27 11:20:56 -07:00
Dmitry Babokin	23cb59427d	Merge pull request #607 from ifilippov/testing correction of test system	2013-09-26 04:02:49 -07:00
Ilia Filippov	1c858c34f7	correction of test system	2013-09-26 14:54:15 +04:00
evghenii	019043f55e	patched half2float & float2half to pass the tests. Now only test-141 fails, but that seems to be related to the test rather than to knc-i1x16.h	2013-09-23 09:55:55 +03:00
Ilia Filippov	87cecddabb	adding sort to performance checking	2013-09-20 18:57:20 +04:00
evghenii	ddecdeb834	move remaining int64 from knc.h; some of it fails to pass tests, grep for evghenii::fails to find out which functions fail and on which tests	2013-09-20 14:55:15 +03:00
evghenii	5cabf0bef0	adding int64 support from knc.h, phase 1. bugs: __lshr & __ashr fail idiv.ispc test, __equal_i64 & __equal_i64_and_mask fail reduce_equal_8.ispc test	2013-09-20 14:13:40 +03:00
evghenii	0ed89e93fa	added fails info	2013-09-19 16:34:06 +03:00
egaburov	d68dbbc7bc	Merge remote-tracking branch 'upstream/master' into knc-modes	2013-09-19 15:08:17 +02:00
evghenii	0c274212c2	performance tuning for knc-i1x8.h. this gives good enough performance for double only. float performance is terrible	2013-09-19 16:07:22 +03:00
evghenii	dbef4fd7d7	fixed notation	2013-09-19 14:52:22 +03:00
evghenii	6a21218c13	fix warning and add KNC 1	2013-09-19 13:45:31 +03:00
Ilia Filippov	00cd90c6b0	test system	2013-09-19 12:26:57 +04:00
evghenii	3cf63362a4	small tuning	2013-09-18 20:03:08 +03:00
evghenii	e4b1f58595	performance fix.. still some issues left with equal_i1 for __vec8_i1	2013-09-18 19:14:41 +03:00
evghenii	4b1a0b4bc4	added fails	2013-09-18 18:41:22 +03:00
evghenii	922edb1128	completed knc-i1x16.h and added knc-i1x8.h with knc-i1x8unsafe_fast.h that doesn't pass several tests	2013-09-18 18:14:07 +03:00
Dmitry Babokin	191d9dede5	Merge pull request #585 from tkoziara/master Sort description.	2013-09-16 10:08:49 -07:00
Tomasz Koziara	6e0b9ddc74	Sort description.	2013-09-16 18:02:07 +01:00
Dmitry Babokin	b258027061	Merge pull request #582 from tkoziara/master Uniform memory allocation in sort example is fixed.	2013-09-16 03:29:43 -07:00
Tomasz Koziara	97068765e8	Copyright reversed.	2013-09-14 18:09:04 +01:00
Tomasz Koziara	ed825b3773	Uniform memory allocation fixed.	2013-09-13 13:14:31 +01:00
Ilia Filippov	f620cdbaa1	Changes in perf.py functionality, unification of examples, correction build warnings	2013-08-26 14:04:59 +04:00
Matt Pharr	2b2905b567	Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc. This should be a bool, not a one-wide vector of bools. The equivalent fix was previously made in generic-16.h, but not made here. (Note that many tests are still failing with these targets, but at least they compile properly now.)	2013-08-20 09:05:50 -07:00
Matt Pharr	e7f067d70c	Fix handling of __clock() builtin for "generic" targets.	2013-08-20 09:04:52 -07:00
Matt Pharr	7ab4c5391c	Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target.	2013-08-09 19:56:43 -07:00
Matt Pharr	cd9afe946c	Merge branch 'master' into arm Conflicts: Makefile builtins.cpp ispc.cpp ispc.h ispc.vcxproj opt.cpp	2013-08-06 17:39:21 -07:00
Dmitry Babokin	43423c276f	Merge pull request #560 from ifilippov/perf Supporting perf.py on Mac OS	2013-08-01 13:20:01 -07:00
Ilia Filippov	3c06924a02	Supporting perf.py on Mac OS	2013-08-01 12:47:37 +04:00
Dmitry Babokin	220f0b0b40	Renaming mandelbrot_tasks files to be different from mandelbrot	2013-07-30 19:53:12 -07:00
Dmitry Babokin	fa93cb7d0b	InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported)	2013-07-29 22:46:36 -07:00
Matt Pharr	b6df447b55	Add reduce_add() for int8 and int16 types. This maps to specialized instructions (e.g. PSADBW) when available.	2013-07-25 09:46:01 -07:00
Matt Pharr	d7b0c5794e	Add support for ARM NEON targets. Initial support for ARM NEON on Cortex-A9 and A15 CPUs. All but ~10 tests pass, and all examples compile and run correctly. Most of the examples show a ~2x speedup on a single A15 core versus scalar code. Current open issues/TODOs: - Code quality looks decent, but hasn't been carefully examined. Known issues/opportunities for improvement include: - fp32 vector divide is done as a series of scalar divides rather than a vector divide (which I believe exists, but I may be mistaken.) This is particularly harmful to examples/rt, which only runs ~1.5x faster with ispc, likely due to long chains of scalar divides. - The compiler isn't generating a vmin.f32 for e.g. the final scalar min in reduce_min(); instead it's generating a compare and then a select instruction (and similarly elsewhere). - There are some additional FIXMEs in builtins/target-neon.ll that include both a few pieces of missing functionality (e.g. rounding doubles) as well as places that deserve attention for possible code quality improvements. - Currently only the "cortex-a9" and "cortex-a15" CPU targets are supported; LLVM supports many other ARM CPUs and ispc should provide access to all of the ones that have NEON support (and aren't too obscure.) - ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately only when the compiler runs on an ARM host, though). - The Windows build hasn't been tested (though I've tried to update ispc.vcxproj appropriately). It may just work, but will more likely have various small issues. - Anything related to 64-bit ARM has seen no attention.	2013-07-19 23:07:24 -07:00
Matt Pharr	b007bba59f	Replace inline assembly in task system with equivalent gcc intrinsics. gcc/icc build only: the Windows build still uses the Win32 calls for these.	2013-07-19 23:07:24 -07:00
Ilia Filippov	fd7f87b55e	Supporting perf.py on Windows and some small corrections in it	2013-07-02 19:23:18 +04:00

209 Commits