Evghenii
|
294fb039fe
|
some tuning, adding cuda kernels
|
2013-11-14 22:33:58 +01:00 |
|
Evghenii
|
f12826bac5
|
+added approx rcp/rsqrt/rtz with ftz=true
|
2013-11-14 22:17:57 +01:00 |
|
Evghenii
|
2c8afde6d9
|
changing MF
|
2013-11-14 21:38:25 +01:00 |
|
Evghenii
|
1445202e0e
|
identified bug due to llvm-3.4
|
2013-11-14 21:18:25 +01:00 |
|
Evghenii
|
1b940fd41e
|
+1
|
2013-11-14 20:19:59 +01:00 |
|
Evghenii
|
f1fc3bdfba
|
added nvptx declaration to other target & fixed nvptx64 recognition
|
2013-11-14 20:12:58 +01:00 |
|
Evghenii
|
7aa37b19a9
|
added some more macros as quick hack...
|
2013-11-14 20:04:05 +01:00 |
|
Evghenii
|
967a49dd66
|
+1
|
2013-11-14 19:54:18 +01:00 |
|
Evghenii
|
25df23fed3
|
workaround for programIndex via preprocessor
|
2013-11-14 19:48:50 +01:00 |
|
Evghenii
|
e162d5a99d
|
programIndex still not working, found where change is needed...
|
2013-11-14 19:46:08 +01:00 |
|
Evghenii
|
918ca339b6
|
programIndex now returns laneIdx = %tid.x & (%warpsize-1), and programCount returns 32
|
2013-11-14 19:27:52 +01:00 |
|
Evghenii
|
8bb8f0eda4
|
+1
|
2013-11-14 17:04:50 +01:00 |
|
Evghenii
|
be2cc8f946
|
restored foreach in sort
|
2013-11-14 16:51:59 +01:00 |
|
Evghenii
|
599ada8354
|
added deferred shading foreach_tile
|
2013-11-14 16:49:47 +01:00 |
|
Evghenii
|
83b9cc5c0a
|
+1
|
2013-11-14 16:44:09 +01:00 |
|
Evghenii
|
af75afeb7a
|
foreach[_tiled] seems to work now
|
2013-11-14 16:29:40 +01:00 |
|
Dmitry Babokin
|
42e181112a
|
Add avx1-i32x4 to the list of supported targets
|
2013-11-14 16:21:30 +04:00 |
|
Dmitry Babokin
|
801f78f8a8
|
Rebuild *.ispc when necessary
|
2013-11-14 15:37:11 +04:00 |
|
Dmitry Babokin
|
e100040f28
|
Fix failure when --target=avx1.1-i32x8,avx2-i32x8: avx11 is no longer a valid target, so a more complete string is needed
|
2013-11-14 15:37:11 +04:00 |
|
Dmitry Babokin
|
b8a39a1b26
|
minor improvements in examples/common.mk
|
2013-11-14 15:37:10 +04:00 |
|
Dmitry Babokin
|
8f768633ad
|
Make perf.py changes work as part of alloy.py
|
2013-11-14 15:37:10 +04:00 |
|
Dmitry Babokin
|
65ea6fd48a
|
Reasoning to use sse4 bitcode file
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
d2c7b356cc
|
Ordering functions in target-[avx|sse2].ll to be in the same order. No real changes, except adding a few alwaysinline in SSE4 target
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
af58955140
|
target-[sse4|avx]_common.ll are twin brothers that differ only cosmetically. This commit makes them diffable. No real changes, except adding alwaysinline to the sse version of __max_uniform_int32/__max_uniform_uint32
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
ffc9a33933
|
avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
fbab9874f6
|
perf.py - target switch was added
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
017e7890f7
|
Examples makefiles to support setting single target via ISPC_IA_TARGETS
|
2013-11-14 15:34:30 +04:00 |
|
Evghenii
|
48644813d4
|
stmt.cpp forking on foreach
|
2013-11-14 11:30:22 +01:00 |
|
evghenii
|
c81821ed28
|
+1
|
2013-11-13 21:17:21 +01:00 |
|
Evghenii
|
42cfe97427
|
using now cuda_ispc.h
|
2013-11-13 21:06:40 +01:00 |
|
Evghenii
|
09a2c12ea0
|
added cuda_ispc.h & cuda error_strings
|
2013-11-13 21:04:59 +01:00 |
|
Evghenii
|
a0f6f264f6
|
fixed problem with new/delete and added Mel/sec counter
|
2013-11-13 20:34:01 +01:00 |
|
Evghenii
|
6f9cea5b58
|
removed binary
|
2013-11-13 19:43:45 +01:00 |
|
Evghenii
|
dd4ac42491
|
added print m
|
2013-11-13 19:43:32 +01:00 |
|
Evghenii
|
01df6ed4a9
|
added ispc timers w/o task
|
2013-11-13 19:13:04 +01:00 |
|
Evghenii
|
e71259006c
|
+1
|
2013-11-13 19:06:02 +01:00 |
|
Evghenii
|
0f161b500f
|
+1
|
2013-11-13 19:02:45 +01:00 |
|
Evghenii
|
e442139c39
|
runs, next check correctness
|
2013-11-13 18:15:52 +01:00 |
|
Evghenii
|
8b0f871c06
|
+1
|
2013-11-13 17:23:23 +01:00 |
|
Evghenii
|
61fab0340c
|
working on sort
|
2013-11-13 17:07:55 +01:00 |
|
Evghenii
|
525eacd035
|
+1
|
2013-11-13 16:32:56 +01:00 |
|
Evghenii
|
cddddfd255
|
+1
|
2013-11-13 16:23:24 +01:00 |
|
Evghenii
|
780e9f31fe
|
some tuning
|
2013-11-13 16:23:05 +01:00 |
|
Evghenii
|
c0b54aa58c
|
added Makefile_gpu
|
2013-11-13 16:20:51 +01:00 |
|
Evghenii
|
c0c1cc1ba7
|
+added Makefile and some fixes
|
2013-11-13 14:16:48 +01:00 |
|
Evghenii
|
dededd1929
|
cleaned
|
2013-11-13 13:56:45 +01:00 |
|
Evghenii
|
6cd8a8f895
|
cleaned-up
|
2013-11-13 13:47:53 +01:00 |
|
Evghenii
|
d3ade0654e
|
added Makefile
|
2013-11-13 13:45:24 +01:00 |
|
Evghenii
|
2dd7128db5
|
added Makefile
|
2013-11-13 13:40:08 +01:00 |
|
Evghenii
|
1f13a236bf
|
small tuning
|
2013-11-13 13:03:26 +01:00 |
|