73 Commits

Author SHA1 Message Date
Vsevolod Livinskij
cef5b2eb04 Some changes in saturation arithmetic 2014-02-10 12:40:53 +04:00
Vsevolod Livinskij
1c1614d207 Some errors in comments and code were fixed 2014-02-09 21:39:42 +04:00
evghenii
09e8381ec7 change {rsqrt,rcp}_double to {rsqrt,rcp}d_decl 2014-02-05 13:05:04 +01:00
Evghenii
d3a6693eef adding __have_native_{rsqrtd,rcpd} to select between native support for double precision reciprocals and using slower but safe version in stdlib 2014-02-04 16:29:23 +01:00
Evghenii
fe98fe8cdc added fast approximate rcp(double) accurate to 15 digits 2014-02-04 15:23:34 +01:00
Evghenii
eb1a495a7a added support for fast approximate rsqrt(double). Provides 16-digit accuracy and is over 3x faster than 1/sqrt(double) 2014-02-04 14:44:54 +01:00
evghenii
3a72e05c3e +1 2014-02-02 18:16:48 +01:00
Vsevolod Livinskij
da02236b3a Scalar implementation of no-vec functions was moved from builtins to stdlib.ispc. 2014-01-20 16:06:34 +04:00
Vsevolod Livinskij
323587f10f Scalar implementation and implementation for targets which don't have h/w instructions 2014-01-02 16:48:56 +04:00
Vsevolod Livinskij
07c6f1714a Some fixes in function names and more tests were added. 2013-12-22 19:28:26 +04:00
Dmitry Babokin
d666fc3f8f Merge pull request #686 from ifilippov/ttt
packed_store_active2() - tuned version of packed_store_active()
2013-12-17 09:23:39 -08:00
Ilia Filippov
473f1cb4d2 packed_store_active2 2013-12-17 21:14:29 +04:00
Dmitry Babokin
6d51987e67 Merge pull request #642 from egaburov/launch3d
concept of 3d tasking
2013-12-17 08:40:07 -08:00
evghenii
c06ec92d0d added commas, added multi-dimensional tasking to mandelbrot_tasks & removed mandelbrot_task3d. Also adjusted documentation a bit 2013-12-13 11:49:11 +01:00
Vsevolod Livinskij
65768c20ae Added tests for saturation and some fixes for generic and avx target 2013-12-05 00:34:14 +04:00
Vsevolod Livinskij
4faff1a63c structural change 2013-11-30 10:48:18 +04:00
Vsevolod Livinskij
4c330bc38b Add code generation of saturation 2013-11-29 18:40:04 +04:00
Dmitry Babokin
6585a925be Merge pull request #641 from jbrodman/stdlibshift
Add a "shift" operator to the stdlib.
2013-10-28 14:18:31 -07:00
james.brodman
4d289b16c2 Redesign after being hit with the KISS bat. 2013-10-23 14:25:43 -04:00
egaburov
f89bad1e94 launch now passes the right info into tasking 2013-10-23 12:51:06 +02:00
james.brodman
f97a2d68c8 Bugfix for non-const shift amt and unit tests. 2013-10-22 18:29:20 -04:00
james.brodman
899f85ce9c Initial Support for new stdlib shift operator 2013-10-22 18:06:54 -04:00
Ilia Filippov
92773ada6d fix for ISPC for compfails at sse4-i8 and sse4-i16 2013-10-11 15:23:40 +04:00
egaburov
7364e06387 added mask64 2013-09-12 12:02:42 +02:00
egaburov
320c41ffcf added svml support. experimental. for some reason all symbols are visible.. 2013-09-11 15:16:50 +02:00
Matt Pharr
5b20b06bd9 Add avg_{up,down}_int{8,16} routines to stdlib
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact.  When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and V[R]HADD[US]
on NEON.)

A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
2013-08-06 08:41:12 -07:00
Matt Pharr
48ff03112f Remove __pause from stdlib_core() in utils.m4.
It wasn't ever being used, and was breaking compilation on ARM.
2013-07-30 08:44:22 -07:00
Matt Pharr
ab3b633733 Add 8-bit and 16-bit specialized NEON targets.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits.  (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
2013-07-23 17:30:32 -07:00
Matt Pharr
15a3ef370a Use @llvm.readcyclecounter to implement stdlib clock() function.
Also added a test for the clock builtin.
2013-07-23 17:24:57 -07:00
Matt Pharr
e7abf3f2ea Add support for mask vectors of 8 and 16-bit element types.
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements.  This
commit makes it possible to have a target with an 8- or 16-bit mask.
2013-07-23 16:50:11 -07:00
Dmitry Babokin
7bedb4a081 Add memory alignment dependent on the platform (16/32/64/etc) 2013-05-24 10:29:01 +04:00
Dmitry Babokin
630215f56f Defining memory routines completely separately for Windows/Unix 32/64 bit. 2013-05-24 10:29:01 +04:00
Dmitry Babokin
5362dade37 Fixing util.m4 to declare nothing unless some macro is instantiated 2013-05-24 10:29:00 +04:00
Dmitry Babokin
a47460b4c3 Efficient library implementation of broadcast 2013-05-02 00:12:16 +02:00
Dmitry Babokin
26bec62daf Removing duplicate free definition on Linux 2013-04-27 00:29:51 +04:00
Dmitry Babokin
7497e86902 Adding support for aligned memory allocation on Windows 2013-04-26 22:07:30 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
Dmitry Babokin
d36ab4cc3c Adding noalias attribute to malloc return 2013-04-25 20:39:01 +04:00
james.brodman
3aaf2ef2d4 ToT Fixes / M4 macro fix 2013-01-14 14:55:10 -05:00
Matt Pharr
765a0d8896 Use puts() rather than printf() for printing assertion failure strings.
This way, we don't lose '%'s in the assertion strings.

Issue #342.
2012-08-03 11:31:38 -07:00
Matt Pharr
6a410fc30e Emit gather instructions for the AVX2 targets.
Issue #308.
2012-07-13 12:29:05 -07:00
Matt Pharr
984a68c3a9 Rename gen_gather() macro to gen_gather_factored() 2012-07-13 12:24:12 -07:00
Matt Pharr
2c640f7e52 Add support for RDRAND in IvyBridge.
The standard library now provides a variety of rdrand() functions
that call out to RDRAND, when available.

Issue #263.
2012-07-12 06:07:07 -07:00
Matt Pharr
c09c87873e Whitespace / indentation fixes. 2012-07-11 14:29:46 -07:00
Matt Pharr
10b79fb41b Add support for non-factored variants of gather/scatter functions.
We now have two ways of approaching gather/scatters with a common base
pointer and with offset vectors.  For targets with native gather/scatter,
we just turn those into base + {1/2/4/8}*offsets.  For targets without,
we turn those into base + {1/2/4/8}*varying_offsets + const_offsets,
where const_offsets is a compile-time constant.

Infrastructure for issue #325.
2012-07-11 14:29:42 -07:00
Matt Pharr
ec0280be11 Rename gather/scatter_base_offsets functions to *factored_based_offsets*.
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
2012-07-11 14:16:39 -07:00
Matt Pharr
fb8b893b10 Fix incorrect LLVM_3_1svn tests.
1. For some time now, we provide the version without the 'svn'
2. We should be testing "not LLVM 3.0" in these cases, since they
   apply to LLVM 3.2 and beyond as well...
2012-07-09 07:09:25 -07:00
Matt Pharr
9ca80debb8 Remove stale LLVM 2.9 support from builtins/util.m4 2012-07-09 06:54:29 -07:00
Matt Pharr
d34a87404d Provide (undocumented for now) __pause() call to emit PAUSE inst. 2012-06-28 09:28:25 -07:00