6b65f6d9f4
+1
Evghenii
2013-11-15 16:21:09 +01:00
c93e71698e
restored intrinsics and added tuning options to ptxgen
Evghenii
2013-11-15 15:04:04 +01:00
f9d2ede83c
+1
Evghenii
2013-11-14 23:15:51 +01:00
53bf4573f0
+fix
Evghenii
2013-11-14 23:05:44 +01:00
86652738c0
working on rt
Evghenii
2013-11-14 22:54:37 +01:00
294fb039fe
some tuning, adding cuda kernels
Evghenii
2013-11-14 22:33:58 +01:00
f12826bac5
+added approx rcp/rsqrt/rtz with ftz=true
Evghenii
2013-11-14 22:17:57 +01:00
2c8afde6d9
changing MF
Evghenii
2013-11-14 21:38:25 +01:00
1445202e0e
identified bug due to llvm-3.4
Evghenii
2013-11-14 21:18:25 +01:00
1b940fd41e
+1
Evghenii
2013-11-14 20:19:59 +01:00
f1fc3bdfba
added nvptx declaration to other target & fixed nvptx64 recognition
Evghenii
2013-11-14 20:12:58 +01:00
7aa37b19a9
added some more macros as quick hack...
Evghenii
2013-11-14 20:04:05 +01:00
967a49dd66
+1
Evghenii
2013-11-14 19:54:18 +01:00
25df23fed3
workaround for programIndex via preprocessor
Evghenii
2013-11-14 19:48:50 +01:00
e162d5a99d
programIndex still not working, found where change is needed...
Evghenii
2013-11-14 19:46:08 +01:00
918ca339b6
now programIndex returns laneIdx = %tid.x & (%warpsize-1) & programCount returns 32
Evghenii
2013-11-14 19:27:52 +01:00
8bb8f0eda4
+1
Evghenii
2013-11-14 17:04:50 +01:00
be2cc8f946
restored foreach in sort
Evghenii
2013-11-14 16:51:59 +01:00
599ada8354
added deferred shading foreach_tile
Evghenii
2013-11-14 16:49:47 +01:00
83b9cc5c0a
+1
Evghenii
2013-11-14 16:44:09 +01:00
af75afeb7a
foreach[_tiled] seems to work now
Evghenii
2013-11-14 16:29:40 +01:00
42e181112a
Add avx1-i32x4 to the list of supported targets
Dmitry Babokin
2013-11-14 16:21:30 +04:00
801f78f8a8
Rebuild *.ispc when necessary
Dmitry Babokin
2013-11-13 22:48:14 +04:00
e100040f28
Fix bug with fail when --target=avx1.1-i32x8,avx2-i32x8 - avx11 is not a valid target anymore, need more complete string
Dmitry Babokin
2013-11-13 22:35:37 +04:00
b8a39a1b26
minor improvements in examples/common.mk
Dmitry Babokin
2013-11-13 16:34:10 +04:00
8f768633ad
Make perf.py changes work as part of alloy.py
Dmitry Babokin
2013-11-13 15:07:21 +04:00
65ea6fd48a
Reasoning to use sse4 bitcode file
Dmitry Babokin
2013-11-13 13:15:01 +04:00
d2c7b356cc
Ordering functions in target-[avx|sse2].ll to be in the same order. No real changes, except adding a few alwaysinline in SSE4 target
Dmitry Babokin
2013-11-12 14:56:52 +04:00
af58955140
target-[sse4|avx]_common.ll are twin brothers, which differ only cosmetically. This commit makes them diffable. No real changes, except adding alwaysinline to sse version of __max_uniform_int32/__max_uniform_uint32
Dmitry Babokin
2013-11-12 10:00:42 +04:00
ffc9a33933
avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag
Dmitry Babokin
2013-11-10 23:48:49 +04:00
fbab9874f6
perf.py - target switch was added
Dmitry Babokin
2013-11-10 23:47:19 +04:00
017e7890f7
Examples makefiles to support setting single target via ISPC_IA_TARGETS
Dmitry Babokin
2013-11-10 02:58:48 +04:00
48644813d4
stmt.cpp forking on foreach
Evghenii
2013-11-14 11:30:22 +01:00
c81821ed28
+1
evghenii
2013-11-13 21:17:21 +01:00
42cfe97427
using now cuda_ispc.h
Evghenii
2013-11-13 21:06:40 +01:00
09a2c12ea0
added cuda_ispc.h & cuda error strings
Evghenii
2013-11-13 21:04:59 +01:00
a0f6f264f6
fixed problem with new/delete and added Mel/sec counter
Evghenii
2013-11-13 20:34:01 +01:00
6f9cea5b58
removed binary
Evghenii
2013-11-13 19:43:45 +01:00
dd4ac42491
added print m
Evghenii
2013-11-13 19:43:32 +01:00
01df6ed4a9
added ispc timers w/o task
Evghenii
2013-11-13 19:13:04 +01:00
e71259006c
+1
Evghenii
2013-11-13 19:06:02 +01:00
0f161b500f
+1
Evghenii
2013-11-13 19:02:45 +01:00
e442139c39
runs, next check correctness
Evghenii
2013-11-13 18:15:52 +01:00
8b0f871c06
+1
Evghenii
2013-11-13 17:23:23 +01:00
61fab0340c
working on sort
Evghenii
2013-11-13 17:07:55 +01:00
525eacd035
+1
Evghenii
2013-11-13 16:32:56 +01:00
cddddfd255
+1
Evghenii
2013-11-13 16:23:24 +01:00
780e9f31fe
some tuning
Evghenii
2013-11-13 16:23:05 +01:00
c0b54aa58c
added Makefile_gpu
Evghenii
2013-11-13 16:20:51 +01:00
c0c1cc1ba7
+added Makefile and some fixes
Evghenii
2013-11-13 14:16:48 +01:00
dededd1929
cleaned
Evghenii
2013-11-13 13:56:45 +01:00
6cd8a8f895
cleaned-up
Evghenii
2013-11-13 13:47:53 +01:00
d3ade0654e
added Makefile
Evghenii
2013-11-13 13:45:24 +01:00
2dd7128db5
added Makefile
Evghenii
2013-11-13 13:40:08 +01:00
1f13a236bf
small tuning
Evghenii
2013-11-13 13:03:26 +01:00
ca1dbc3d3b
fixed cuda kernel
Evghenii
2013-11-13 12:54:52 +01:00
74db8cbab3
+1
Evghenii
2013-11-13 12:12:09 +01:00
62bc39e600
+CDP works with deferred shading
Evghenii
2013-11-13 11:57:37 +01:00
268be7f0b5
fixed ISPCSync functionality
Evghenii
2013-11-13 11:19:10 +01:00
55bf0d23c2
restored non-ptx functionality
Evghenii
2013-11-13 11:08:58 +01:00
f433aa3ad5
CDP works now
Evghenii
2013-11-13 10:43:52 +01:00
f587e0a459
handwired CDP launch
Evghenii
2013-11-12 21:20:10 +01:00
76bfcc29c2
ao1.ispc is not functional just yet :S
Evghenii
2013-11-12 19:30:41 +01:00
1d91a626f2
ISPC sync is not added
Evghenii
2013-11-12 17:02:31 +01:00
dbde936c3c
bugfix in inlined ptx, now NVCC also compiles the ptx
Evghenii
2013-11-12 16:47:47 +01:00
cf679187b1
added CDP calls into IR, next step ... check :)
Evghenii
2013-11-12 16:39:22 +01:00
fd17ad236a
export functions are now also generated... next add proper CDP calls..
Evghenii
2013-11-12 14:05:12 +01:00
dbb96c1885
need to fix launch code
Evghenii
2013-11-12 13:41:03 +01:00
4cd7e10ad3
reversed to original changes. Here is the plan: use CDP and generate only device code with a host wrapper..
Evghenii
2013-11-12 12:51:56 +01:00
3fd76d59ea
+1
Evghenii
2013-11-12 11:32:42 +01:00
f445a470df
handwired CDP launch
Evghenii
2013-11-12 11:25:43 +01:00
4e5299a9bf
added CDP
Evghenii
2013-11-12 11:19:23 +01:00
a6afef9f3f
+added some more mem management stuff
Evghenii
2013-11-12 08:31:45 +01:00
6a1fb8ea31
some kernel tuning
Evghenii
2013-11-11 14:24:13 +01:00
f2c66dc4c3
added any/none/all for bool
Evghenii
2013-11-11 12:59:40 +01:00
a91c8e15e2
added reduce_min/max_float, packed_store_active for CUDA, and now kerenls1.ispc just work :)
Evghenii
2013-11-11 12:33:39 +01:00
9c7a842163
ptx has support for half-float
Evghenii
2013-11-11 12:25:47 +01:00
3dd6173a65
added packed_store_active that can be called with active flag
Evghenii
2013-11-11 12:25:15 +01:00
e9bc2b7b54
added uniform_new/uniform_delete in util_ptx.m4 and __shfl intrinsics
Evghenii
2013-11-11 09:18:15 +01:00
38947ab71b
made CU version working
Evghenii
2013-11-10 20:10:37 +01:00
8a7801264a
added tuned code
Evghenii
2013-11-10 16:02:10 +01:00
66edc180be
working on aobench
Evghenii
2013-11-10 14:29:53 +01:00
17809992d7
working on ao
Evghenii
2013-11-10 14:26:00 +01:00
c10033211b
removed
evghenii
2013-11-10 14:17:59 +01:00
7d4ea1b6f0
added wc-timer
Evghenii
2013-11-10 14:15:16 +01:00
0dfe823c32
added kernels that use shared memory
Evghenii
2013-11-10 14:06:06 +01:00
bef275f62c
added drv_api_error_String.h
Evghenii
2013-11-10 14:05:34 +01:00
edb4c57e3d
+added host code as well and restored original main.cpp
evghenii
2013-11-10 14:07:15 +01:00
c1b3face8f
change time from sec to ms
evghenii
2013-11-10 14:04:01 +01:00
9d23c10475
deferred shading problem identified. need solution
Evghenii
2013-11-10 13:59:41 +01:00
78d509dba5
working on deferred shading
Evghenii
2013-11-10 12:10:10 +01:00
1a37135f98
+1
Evghenii
2013-11-09 21:23:34 +01:00
dbd0581cb3
+added CUDA code
Evghenii
2013-11-09 21:05:28 +01:00
946530019a
Merge branch 'nvptx' of github.com:egaburov/ispc into nvptx
Evghenii
2013-11-09 20:56:55 +01:00
8f6f6d10e7
+some tuning
Evghenii
2013-11-09 20:56:48 +01:00
3a549e5c2f
xeonphi tests added for rt
evghenii
2013-11-09 19:26:19 +01:00
dc7015c5f2
added wc-timer for host code
evghenii
2013-11-09 19:08:08 +01:00
356e9c6810
+fixed rt.cpp to compile with nvvm
Evghenii
2013-11-09 19:02:14 +01:00
d0ddec469a
Merge branch 'master' into nvptx
egaburov
2013-11-08 15:42:58 +01:00
87de3a2d06
added wc-timer for host code
evghenii
2013-11-08 15:39:57 +01:00