Commit Graph

  • 6b65f6d9f4 +1 Evghenii 2013-11-15 16:21:09 +01:00
  • c93e71698e restored intrinsics and added tuning options to ptxgen Evghenii 2013-11-15 15:04:04 +01:00
  • f9d2ede83c +1 Evghenii 2013-11-14 23:15:51 +01:00
  • 53bf4573f0 +fix Evghenii 2013-11-14 23:05:44 +01:00
  • 86652738c0 working on rt Evghenii 2013-11-14 22:54:37 +01:00
  • 294fb039fe some tuning, adding cuda kernels Evghenii 2013-11-14 22:33:58 +01:00
  • f12826bac5 +added approx rcp/rsqrt/rtz with ftz=true Evghenii 2013-11-14 22:17:57 +01:00
  • 2c8afde6d9 changing MF Evghenii 2013-11-14 21:38:25 +01:00
  • 1445202e0e identified bug due to llvm-3.4 Evghenii 2013-11-14 21:18:25 +01:00
  • 1b940fd41e +1 Evghenii 2013-11-14 20:19:59 +01:00
  • f1fc3bdfba added nvptx declaration to other target & fixed nvptx64 recognition Evghenii 2013-11-14 20:12:58 +01:00
  • 7aa37b19a9 added some more macros as quick hack... Evghenii 2013-11-14 20:04:05 +01:00
  • 967a49dd66 +1 Evghenii 2013-11-14 19:54:18 +01:00
  • 25df23fed3 workaround for programIndex via preprocessor Evghenii 2013-11-14 19:48:50 +01:00
  • e162d5a99d programIndex still not working, found where change is needed... Evghenii 2013-11-14 19:46:08 +01:00
  • 918ca339b6 now programIndex returns laneIdx = %tid.x & (%warpsize-1) & programCount returns 32 Evghenii 2013-11-14 19:27:52 +01:00
  • 8bb8f0eda4 +1 Evghenii 2013-11-14 17:04:50 +01:00
  • be2cc8f946 restored foreach in sort Evghenii 2013-11-14 16:51:59 +01:00
  • 599ada8354 added deferred shading foreach_tile Evghenii 2013-11-14 16:49:47 +01:00
  • 83b9cc5c0a +1 Evghenii 2013-11-14 16:44:09 +01:00
  • af75afeb7a foreach[_tiled] seems to work now Evghenii 2013-11-14 16:29:40 +01:00
  • 42e181112a Add avx1-i32x4 to the list of supported targets Dmitry Babokin 2013-11-14 16:21:30 +04:00
  • 801f78f8a8 Rebuild *.ispc when necessary Dmitry Babokin 2013-11-13 22:48:14 +04:00
  • e100040f28 Fix bug with fail when --target=avx1.1-i32x8,avx2-i32x8 - avx11 is not a valid target anymore, need more complete string Dmitry Babokin 2013-11-13 22:35:37 +04:00
  • b8a39a1b26 minor improvements in examples/common.mk Dmitry Babokin 2013-11-13 16:34:10 +04:00
  • 8f768633ad Make perf.py changes work as part of alloy.py Dmitry Babokin 2013-11-13 15:07:21 +04:00
  • 65ea6fd48a Reasoning to use sse4 bitcode file Dmitry Babokin 2013-11-13 13:15:01 +04:00
  • d2c7b356cc Ordering functions in target-[avx|sse2].ll to be in the same order. No real changes, except adding a few alwaysinline in SSE4 target Dmitry Babokin 2013-11-12 14:56:52 +04:00
  • af58955140 target-[sse4|avx]_common.ll are twin brothers, which differ only cosmetically. This commit makes them diffable. No real changes, except adding alwaysinline to sse version of __max_uniform_int32/__max_uniform_uint32 Dmitry Babokin 2013-11-12 10:00:42 +04:00
  • ffc9a33933 avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag Dmitry Babokin 2013-11-10 23:48:49 +04:00
  • fbab9874f6 perf.py - target switch was added Dmitry Babokin 2013-11-10 23:47:19 +04:00
  • 017e7890f7 Examples makefiles to support setting single target via ISPC_IA_TARGETS Dmitry Babokin 2013-11-10 02:58:48 +04:00
  • 48644813d4 stmt.cpp forking on foreach Evghenii 2013-11-14 11:30:22 +01:00
  • c81821ed28 +1 evghenii 2013-11-13 21:17:21 +01:00
  • 42cfe97427 using now cuda_ispc.h Evghenii 2013-11-13 21:06:40 +01:00
  • 09a2c12ea0 added cuda_ispc.h & cuda eror_strings Evghenii 2013-11-13 21:04:59 +01:00
  • a0f6f264f6 fixed problem with new/delete and added Mel/sec counter Evghenii 2013-11-13 20:34:01 +01:00
  • 6f9cea5b58 removed binary Evghenii 2013-11-13 19:43:45 +01:00
  • dd4ac42491 added print m Evghenii 2013-11-13 19:43:32 +01:00
  • 01df6ed4a9 added ispc timers w/o task Evghenii 2013-11-13 19:13:04 +01:00
  • e71259006c +1 Evghenii 2013-11-13 19:06:02 +01:00
  • 0f161b500f +1 Evghenii 2013-11-13 19:02:45 +01:00
  • e442139c39 runs, next check correctness Evghenii 2013-11-13 18:15:52 +01:00
  • 8b0f871c06 +1 Evghenii 2013-11-13 17:23:23 +01:00
  • 61fab0340c working on sort Evghenii 2013-11-13 17:07:55 +01:00
  • 525eacd035 +1 Evghenii 2013-11-13 16:32:56 +01:00
  • cddddfd255 +1 Evghenii 2013-11-13 16:23:24 +01:00
  • 780e9f31fe some tuning Evghenii 2013-11-13 16:23:05 +01:00
  • c0b54aa58c added Makefile_gpu Evghenii 2013-11-13 16:20:51 +01:00
  • c0c1cc1ba7 +added Makefile and some fixes Evghenii 2013-11-13 14:16:48 +01:00
  • dededd1929 cleaned Evghenii 2013-11-13 13:56:45 +01:00
  • 6cd8a8f895 cleaned-up Evghenii 2013-11-13 13:47:53 +01:00
  • d3ade0654e added Makefile Evghenii 2013-11-13 13:45:24 +01:00
  • 2dd7128db5 added Makefile Evghenii 2013-11-13 13:40:08 +01:00
  • 1f13a236bf small tuning Evghenii 2013-11-13 13:03:26 +01:00
  • ca1dbc3d3b fixed cuda kernel Evghenii 2013-11-13 12:54:52 +01:00
  • 74db8cbab3 +1 Evghenii 2013-11-13 12:12:09 +01:00
  • 62bc39e600 +CDP works with deferred shading Evghenii 2013-11-13 11:57:37 +01:00
  • 268be7f0b5 fixed ISPCSync functionality Evghenii 2013-11-13 11:19:10 +01:00
  • 55bf0d23c2 restored non-ptx functionality Evghenii 2013-11-13 11:08:58 +01:00
  • f433aa3ad5 CDP works now Evghenii 2013-11-13 10:43:52 +01:00
  • f587e0a459 handwired CDP launch Evghenii 2013-11-12 21:20:10 +01:00
  • 76bfcc29c2 ao1.ispc is not functional just yet :S Evghenii 2013-11-12 19:30:41 +01:00
  • 1d91a626f2 ISPC sync is not added Evghenii 2013-11-12 17:02:31 +01:00
  • dbde936c3c bugfix in inlined ptx, now NVCC also compiles the ptx Evghenii 2013-11-12 16:47:47 +01:00
  • cf679187b1 added CDP calls into IR, next step ... check :) Evghenii 2013-11-12 16:39:22 +01:00
  • fd17ad236a export functions are now also generated... next add proper CDP calls.. Evghenii 2013-11-12 14:05:12 +01:00
  • dbb96c1885 need to fix launch code Evghenii 2013-11-12 13:41:03 +01:00
  • 4cd7e10ad3 reverted to original changes. Here is the plan: use CDP and generate only device code with host wrapper.. Evghenii 2013-11-12 12:51:56 +01:00
  • 3fd76d59ea +1 Evghenii 2013-11-12 11:32:42 +01:00
  • f445a470df handwired CDP launch Evghenii 2013-11-12 11:25:43 +01:00
  • 4e5299a9bf added CDP Evghenii 2013-11-12 11:19:23 +01:00
  • a6afef9f3f +added some more mem management stuff Evghenii 2013-11-12 08:31:45 +01:00
  • 6a1fb8ea31 some kernel tuning Evghenii 2013-11-11 14:24:13 +01:00
  • f2c66dc4c3 added any/none/all for bool Evghenii 2013-11-11 12:59:40 +01:00
  • a91c8e15e2 added reduce_min/max_float, packed_store_active for CUDA, and now kerenls1.ispc just work :) Evghenii 2013-11-11 12:33:39 +01:00
  • 9c7a842163 ptx has support for half-float Evghenii 2013-11-11 12:25:47 +01:00
  • 3dd6173a65 added packed_store_active that can be called with active flag Evghenii 2013-11-11 12:25:15 +01:00
  • e9bc2b7b54 added uniform_new/uniform_delete in util_ptx.m4 and __shfl intrinsics Evghenii 2013-11-11 09:18:15 +01:00
  • 38947ab71b made CU version working Evghenii 2013-11-10 20:10:37 +01:00
  • 8a7801264a added tuned code Evghenii 2013-11-10 16:02:10 +01:00
  • 66edc180be working on aobench Evghenii 2013-11-10 14:29:53 +01:00
  • 17809992d7 working on ao Evghenii 2013-11-10 14:26:00 +01:00
  • c10033211b removed evghenii 2013-11-10 14:17:59 +01:00
  • 7d4ea1b6f0 added wc-timer Evghenii 2013-11-10 14:15:16 +01:00
  • 0dfe823c32 added kernels that use shared memory Evghenii 2013-11-10 14:06:06 +01:00
  • bef275f62c added drv_api_error_String.h Evghenii 2013-11-10 14:05:34 +01:00
  • edb4c57e3d +added host code as well and restored original main.cpp evghenii 2013-11-10 14:07:15 +01:00
  • c1b3face8f change time from sec to ms evghenii 2013-11-10 14:04:01 +01:00
  • 9d23c10475 deferred_shading problem identified, need solution Evghenii 2013-11-10 13:59:41 +01:00
  • 78d509dba5 working on deferred shading Evghenii 2013-11-10 12:10:10 +01:00
  • 1a37135f98 +1 Evghenii 2013-11-09 21:23:34 +01:00
  • dbd0581cb3 +added CUDA code Evghenii 2013-11-09 21:05:28 +01:00
  • 946530019a Merge branch 'nvptx' of github.com:egaburov/ispc into nvptx Evghenii 2013-11-09 20:56:55 +01:00
  • 8f6f6d10e7 +some tuning Evghenii 2013-11-09 20:56:48 +01:00
  • 3a549e5c2f xeonphi tests added for rt evghenii 2013-11-09 19:26:19 +01:00
  • dc7015c5f2 added wc-timer for host code evghenii 2013-11-09 19:08:08 +01:00
  • 356e9c6810 +fixed rt.cpp to compile with nvvm Evghenii 2013-11-09 19:02:14 +01:00
  • d0ddec469a Merge branch 'master' into nvptx egaburov 2013-11-08 15:42:58 +01:00
  • 87de3a2d06 added wc-timer for host code evghenii 2013-11-08 15:39:57 +01:00