Evghenii
|
a82368956e
|
added ms timer
|
2013-11-18 10:16:24 +01:00 |
|
evghenii
|
6841d8ba9a
|
added Makefile_mic
|
2013-11-18 09:59:41 +01:00 |
|
Evghenii
|
927da8e861
|
change register allocation, makes code much faster
|
2013-11-18 09:46:51 +01:00 |
|
Evghenii
|
3c220a2813
|
loop unrolling, makes code 10x faster
|
2013-11-18 09:37:25 +01:00 |
|
Evghenii
|
5a01819fdc
|
unrolled loops in binomial options cuda version
|
2013-11-18 09:16:09 +01:00 |
|
Dmitry Babokin
|
4977933d81
|
Merge pull request #658 from dbabokin/fail_db
fail_db.txt update on Linux
|
2013-11-17 15:42:51 -08:00 |
|
Dmitry Babokin
|
953e467a85
|
fail_db.txt update on Linux
|
2013-11-18 03:39:09 +04:00 |
|
jbrodman
|
131ab07c2b
|
Merge pull request #657 from dbabokin/avx-i32x4
avx1-i32x4 target
|
2013-11-15 16:00:57 -08:00 |
|
Evghenii
|
1d94667a15
|
+speed-up binomial options via use of shared memory
|
2013-11-15 20:46:55 +01:00 |
|
Evghenii
|
4421fb7e19
|
+1
|
2013-11-15 20:21:31 +01:00 |
|
Dmitry Babokin
|
131ff50333
|
Adding avx1-i32x4 to alloy.py testing
|
2013-11-15 22:09:13 +04:00 |
|
Evghenii
|
bc8b5b3896
|
added cuda version
|
2013-11-15 18:09:03 +01:00 |
|
Evghenii
|
a2d12517e7
|
+options
|
2013-11-15 17:59:04 +01:00 |
|
Evghenii
|
95d6647dce
|
+1
|
2013-11-15 17:32:59 +01:00 |
|
Evghenii
|
3454f51d2c
|
added some ptx options
|
2013-11-15 17:23:22 +01:00 |
|
Evghenii
|
6b65f6d9f4
|
+1
|
2013-11-15 16:21:09 +01:00 |
|
Evghenii
|
c93e71698e
|
restored intrinsics and added tuning options to ptxgen
|
2013-11-15 15:04:04 +01:00 |
|
Evghenii
|
f9d2ede83c
|
+1
|
2013-11-14 23:15:51 +01:00 |
|
Evghenii
|
53bf4573f0
|
+fix
|
2013-11-14 23:05:44 +01:00 |
|
Evghenii
|
86652738c0
|
working on rt
|
2013-11-14 22:54:37 +01:00 |
|
Evghenii
|
294fb039fe
|
some tuning, adding cuda kernels
|
2013-11-14 22:33:58 +01:00 |
|
Evghenii
|
f12826bac5
|
+added approx rcp/rsqrt/rtz with ftz=true
|
2013-11-14 22:17:57 +01:00 |
|
Evghenii
|
2c8afde6d9
|
changing MF
|
2013-11-14 21:38:25 +01:00 |
|
Evghenii
|
1445202e0e
|
identified bug due to llvm-3.4
|
2013-11-14 21:18:25 +01:00 |
|
Evghenii
|
1b940fd41e
|
+1
|
2013-11-14 20:19:59 +01:00 |
|
Evghenii
|
f1fc3bdfba
|
added nvptx declaration to other target & fixed nvptx64 recognition
|
2013-11-14 20:12:58 +01:00 |
|
Evghenii
|
7aa37b19a9
|
added some more macros as quick hack...
|
2013-11-14 20:04:05 +01:00 |
|
Evghenii
|
967a49dd66
|
+1
|
2013-11-14 19:54:18 +01:00 |
|
Evghenii
|
25df23fed3
|
workaround for programIndex via preprocessor
|
2013-11-14 19:48:50 +01:00 |
|
Evghenii
|
e162d5a99d
|
programIndex still not working, found where change is needed...
|
2013-11-14 19:46:08 +01:00 |
|
Evghenii
|
918ca339b6
|
now programIndex returns laneIdx = %tid.x & (%warpsize-1) & programCount returns 32
|
2013-11-14 19:27:52 +01:00 |
|
Evghenii
|
8bb8f0eda4
|
+1
|
2013-11-14 17:04:50 +01:00 |
|
Evghenii
|
be2cc8f946
|
restored foreach in sort
|
2013-11-14 16:51:59 +01:00 |
|
Evghenii
|
599ada8354
|
added deferred shading foreach_tile
|
2013-11-14 16:49:47 +01:00 |
|
Evghenii
|
83b9cc5c0a
|
+1
|
2013-11-14 16:44:09 +01:00 |
|
Evghenii
|
af75afeb7a
|
foreach[_tiled] seems to work now
|
2013-11-14 16:29:40 +01:00 |
|
Dmitry Babokin
|
42e181112a
|
Add avx1-i32x4 to the list of supported targets
|
2013-11-14 16:21:30 +04:00 |
|
Dmitry Babokin
|
801f78f8a8
|
Rebuild *.ispc when necessary
|
2013-11-14 15:37:11 +04:00 |
|
Dmitry Babokin
|
e100040f28
|
Fix failure when --target=avx1.1-i32x8,avx2-i32x8: avx11 is no longer a valid target, a more complete string is needed
|
2013-11-14 15:37:11 +04:00 |
|
Dmitry Babokin
|
b8a39a1b26
|
minor improvements in examples/common.mk
|
2013-11-14 15:37:10 +04:00 |
|
Dmitry Babokin
|
8f768633ad
|
Make perf.py changes work as part of alloy.py
|
2013-11-14 15:37:10 +04:00 |
|
Dmitry Babokin
|
65ea6fd48a
|
Reasoning to use sse4 bitcode file
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
d2c7b356cc
|
Ordering functions in target-[avx|sse2].ll to be in the same order. No real changes, except adding a few alwaysinline in SSE4 target
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
af58955140
|
target-[sse4|avx]_common.ll are twin brothers, which differ only cosmetically. This commit makes them diffable. No real changes, except adding alwaysinline to sse version of __max_uniform_int32/__max_uniform_uint32
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
ffc9a33933
|
avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
fbab9874f6
|
perf.py - target switch was added
|
2013-11-14 15:34:30 +04:00 |
|
Dmitry Babokin
|
017e7890f7
|
Examples makefiles to support setting single target via ISPC_IA_TARGETS
|
2013-11-14 15:34:30 +04:00 |
|
Evghenii
|
48644813d4
|
stmt.cpp forking on foreach
|
2013-11-14 11:30:22 +01:00 |
|
evghenii
|
c81821ed28
|
+1
|
2013-11-13 21:17:21 +01:00 |
|
Evghenii
|
42cfe97427
|
using now cuda_ispc.h
|
2013-11-13 21:06:40 +01:00 |
|