115 Commits

Author SHA1 Message Date
Evghenii
ac05de6835 merged with master 2014-02-21 08:25:28 +01:00
Dmitry Babokin
f280b32fa4 Merge pull request #736 from egaburov/native_trigonometry
Native trigonometry
2014-02-20 19:18:35 +03:00
Evghenii
690a8acb30 merged with master 2014-02-20 15:22:09 +01:00
Evghenii
4196c723eb merged with nvptx 2014-02-20 11:01:58 +01:00
Vsevolod Livinskij
cef5b2eb04 Some changes in saturation arithmetic 2014-02-10 12:40:53 +04:00
Vsevolod Livinskij
1c1614d207 Some errors in comments and code were fixed 2014-02-09 21:39:42 +04:00
Evghenii
70a9b286e5 added support for native and double precision trigonometry/transcendentals 2014-02-07 15:28:39 +01:00
Evghenii
81aa19a8f0 added use of native_transendentals, need to add IR 2014-02-07 11:49:24 +01:00
evghenii
732a315a4b removed __declspec(safe) duplicate 2014-02-05 13:04:45 +01:00
Evghenii
686c1d676d improvements 2014-02-05 12:04:36 +01:00
Evghenii
d3a6693eef adding __have_native_{rsqrtd,rcpd} to select between native support for double precision reciprocals and using slower but safe version in stdlib 2014-02-04 16:29:23 +01:00
Evghenii
fe98fe8cdc added fast approximate rcp(double) accurate to 15 digits 2014-02-04 15:23:34 +01:00
Evghenii
eb1a495a7a added support for fast approximate rsqrt(double). Provides 16-digit accuracy and is over 3x faster than 1/sqrt(double) 2014-02-04 14:44:54 +01:00
Evghenii
b0753dc93d added double-version for rcp 2014-02-02 18:20:05 +01:00
evghenii
3a72e05c3e +1 2014-02-02 18:16:48 +01:00
Evghenii
5a6b650d8b restored nonptx atomic_*_local 2014-01-28 15:56:30 +01:00
Evghenii
a3b00fdcd6 added support for global atomics 2014-01-26 14:23:26 +01:00
Evghenii
a7d4a3f922 fix for __any 2014-01-26 13:15:13 +01:00
Evghenii
fcbdd93043 half/scan for 64 bit/clock/num_cores and other additions 2014-01-25 16:43:33 +01:00
Evghenii
be6ac0408a added compile-time constant __is_nvptx_target that can be used with stdlib.ispc 2014-01-24 09:02:12 +01:00
Evghenii
1cf1dab649 fixed foreach_unique and local_atomics 2014-01-23 21:57:20 +01:00
Vsevolod Livinskij
da02236b3a Scalar implementation of no-vec functions was moved from builtins to stdlib.ispc. 2014-01-20 16:06:34 +04:00
Evghenii
f86de2be78 fix: laneIndex() must be varying 2014-01-09 09:41:57 +01:00
Evghenii
d77789d8fe +merged with master 2013-12-18 11:37:01 +01:00
Ilia Filippov
473f1cb4d2 packed_store_active2 2013-12-17 21:14:29 +04:00
Vsevolod Livinskij
9a135c48d9 Functions name change 2013-12-09 00:20:52 +04:00
Vsevolod Livinskij
65768c20ae Added tests for saturation and some fixes for generic and avx target 2013-12-05 00:34:14 +04:00
Vsevolod Livinskij
35a4d1b3a2 Add some AVX2 intrinsics 2013-11-27 00:55:57 +04:00
Vsevolod Livinskij
19f73b2ede uniform signed/unsigned int8/16 2013-11-25 19:16:02 +04:00
Evghenii
589538bf39 added stencil code 2013-11-18 12:04:00 +01:00
Evghenii
3dd6173a65 added packed_store_active that can be called with active flag 2013-11-11 12:25:15 +01:00
Evghenii
426afc7377 added workable .cu files for stencil & mandelbrot 2013-11-08 10:00:49 +01:00
egaburov
f19cf9274e Merge remote-tracking branch 'upstream/master' into nvptx 2013-10-29 15:24:40 +01:00
Evghenii
8391d05697 added blockIndex computations 2013-10-28 10:18:30 +01:00
james.brodman
4d289b16c2 Redesign after being hit with the KISS bat. 2013-10-23 14:25:43 -04:00
james.brodman
899f85ce9c Initial Support for new stdlib shift operator 2013-10-22 18:06:54 -04:00
Evghenii
6fd21d988d fixed lexer to properly read fortran-notation double constants 2013-09-16 17:15:02 +02:00
egaburov
e2a91e6de5 added support for "d"-suffix 2013-09-16 15:54:32 +02:00
Evghenii
36886971e3 revert lex.ll parse.yy stdlib.ispc to state when all constants are floats 2013-09-13 16:02:53 +02:00
Evghenii
a97eb7b7cb added clamp in double precision 2013-09-13 09:32:59 +02:00
egaburov
7364e06387 added mask64 2013-09-12 12:02:42 +02:00
egaburov
320c41ffcf added svml support. experimental. for some reason all symbols are visible.. 2013-09-11 15:16:50 +02:00
james.brodman
8db378b265 Revert "Remove support for using SVML for math lib routines."
This reverts commit d9c38b5c1f.
2013-09-04 16:01:58 -04:00
Dmitry Babokin
e06267ef1b Fix for incorrect implementation of reduce_[min|max]_[float|double], it showed up at -O0 2013-08-29 16:16:02 +04:00
Matt Pharr
5b20b06bd9 Add avg_{up,down}_int{8,16} routines to stdlib
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact.  When possible, these are
mapped to target-specific intrinsics (PADD[BW] on IA and VH[R]ADD[US]
on NEON).

A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
2013-08-06 08:41:12 -07:00
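The rounding behaviour described in the avg_{up,down}_int{8,16} commit above can be sketched in scalar C. This is only an illustration of the arithmetic for unsigned 8-bit values; the function names are hypothetical and are not the ispc stdlib signatures, and the real routines map to target intrinsics where available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper names for illustration; not the ispc stdlib signatures. */

/* Average rounded up: add 1 before halving.
 * The operands are widened first so that a + b + 1 cannot overflow. */
uint8_t avg_up_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Average rounded down: plain halving of the widened sum. */
uint8_t avg_down_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* Small self-check: one inexact case and the overflow-prone maximum. */
void avg_self_check(void) {
    assert(avg_up_u8(3, 4) == 4);      /* 3.5 rounds up to 4 */
    assert(avg_down_u8(3, 4) == 3);    /* 3.5 rounds down to 3 */
    assert(avg_up_u8(255, 255) == 255);
    assert(avg_down_u8(255, 255) == 255);
}
```

The two variants differ only when the sum is odd, which is the "result isn't exact" case the commit message refers to.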
Matt Pharr
d9c38b5c1f Remove support for using SVML for math lib routines.
This path was poorly maintained and wasn't actually available on most
targets.
2013-07-31 06:56:48 -07:00
Matt Pharr
b6df447b55 Add reduce_add() for int8 and int16 types.
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
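As a rough scalar picture of what the reduce_add() commit above computes for int8 lanes, the loop below sums the lanes into a wider accumulator so the total cannot wrap. The function name and the choice of a 32-bit result are assumptions made for this sketch; the commit does not state the stdlib return type, and the loop describes only the result, not the instruction sequence used on a given target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a horizontal add over int8 lanes.
 * The 32-bit accumulator is an assumption chosen to avoid overflow. */
int32_t reduce_add_i8(const int8_t *lanes, size_t count) {
    int32_t sum = 0;
    for (size_t i = 0; i < count; ++i) {
        sum += lanes[i];  /* each lane is promoted before the add */
    }
    return sum;
}

/* Self-check: the total exceeds the int8 range, so a narrow sum would wrap. */
void reduce_add_self_check(void) {
    const int8_t lanes[4] = {127, 127, -1, 2};
    assert(reduce_add_i8(lanes, 4) == 255);
}
```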
Matt Pharr
f7f281a256 Choose type for integer literals to match the target mask size (if possible).
On a target with a 16-bit mask (for example), we would choose the type
of an integer literal "1024" to be an int16.  Previously, we used an int32,
which is a worse fit and leads to less efficient code than an int16
on a 16-bit mask target.  (However, we'd still give an integer literal
1000000 the type int32, even in a 16-bit target.)

Updated the tests to still pass with 8 and 16-bit targets, given this
change.
2013-07-23 17:24:50 -07:00
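The literal-typing rule from the commit above can be sketched as a small selection function. This is a simplified model under stated assumptions: the enum, the function name, and the fallback to int64 are hypothetical, and the real compiler may also consider unsigned types and literal suffixes, which the commit message does not describe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type tags for this sketch; not ispc's internal representation. */
typedef enum { LIT_INT8, LIT_INT16, LIT_INT32, LIT_INT64 } lit_type;

/* Prefer the type whose width matches the target mask, if the value fits.
 * Otherwise fall back to int32, then (an assumption here) to int64. */
lit_type pick_literal_type(int64_t value, int mask_bits) {
    if (mask_bits == 8 && value >= INT8_MIN && value <= INT8_MAX)
        return LIT_INT8;
    if (mask_bits == 16 && value >= INT16_MIN && value <= INT16_MAX)
        return LIT_INT16;
    if (value >= INT32_MIN && value <= INT32_MAX)
        return LIT_INT32;
    return LIT_INT64;
}

/* Self-check using the two examples given in the commit message. */
void literal_type_self_check(void) {
    assert(pick_literal_type(1024, 16) == LIT_INT16);     /* fits the 16-bit mask */
    assert(pick_literal_type(1000000, 16) == LIT_INT32);  /* too large for int16 */
}
```

With a 32-bit mask the function returns LIT_INT32 for 1024, which matches the behaviour the commit describes as the previous default.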
Matt Pharr
9ba49eabb2 Reduce estimated costs for 8 and 16-bit min() and max() in stdlib.
These actually compile to a single instruction.
2013-07-23 16:52:43 -07:00
Matt Pharr
e7abf3f2ea Add support for mask vectors of 8 and 16-bit element types.
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements.  This
commit makes it possible to have a target with an 8- or 16-bit mask.
2013-07-23 16:50:11 -07:00