Vsevolod Livinskij
cef5b2eb04
Some changes in saturation arithmetic
2014-02-10 12:40:53 +04:00
Vsevolod Livinskij
1c1614d207
Some errors in comments and code were fixed
2014-02-09 21:39:42 +04:00
evghenii
09e8381ec7
change {rsqrt,rcp}_double to {rsqrt,rcp}d_decl
2014-02-05 13:05:04 +01:00
Evghenii
d3a6693eef
adding __have_native_{rsqrtd,rcpd} to select between native support for double precision reciprocals and using slower but safe version in stdlib
2014-02-04 16:29:23 +01:00
Evghenii
fe98fe8cdc
added fast approximate rcp(double) accurate to 15 digits
2014-02-04 15:23:34 +01:00
Evghenii
eb1a495a7a
added support for fast approximate rsqrt(double). Provides 16-digit accuracy and is over 3x faster than 1/sqrt(double)
2014-02-04 14:44:54 +01:00
evghenii
3a72e05c3e
+1
2014-02-02 18:16:48 +01:00
Vsevolod Livinskij
da02236b3a
Scalar implementation of no-vec functions was moved from builtins to stdlib.ispc.
2014-01-20 16:06:34 +04:00
Vsevolod Livinskij
323587f10f
Scalar implementation and implementation for targets which don't have h/w instructions
2014-01-02 16:48:56 +04:00
Vsevolod Livinskij
07c6f1714a
Some fixes in function names, and more tests were added.
2013-12-22 19:28:26 +04:00
Dmitry Babokin
d666fc3f8f
Merge pull request #686 from ifilippov/ttt
...
packed_store_active2() - tuned version of packed_store_active()
2013-12-17 09:23:39 -08:00
Ilia Filippov
473f1cb4d2
packed_store_active2
2013-12-17 21:14:29 +04:00
Dmitry Babokin
6d51987e67
Merge pull request #642 from egaburov/launch3d
...
concept of 3d tasking
2013-12-17 08:40:07 -08:00
evghenii
c06ec92d0d
added commas, added multi-dimensional tasking to mandelbrot_tasks & removed mandelbrot_task3d. Also adjusted documentation a bit
2013-12-13 11:49:11 +01:00
Vsevolod Livinskij
65768c20ae
Added tests for saturation and some fixes for generic and avx target
2013-12-05 00:34:14 +04:00
Vsevolod Livinskij
4faff1a63c
structural change
2013-11-30 10:48:18 +04:00
Vsevolod Livinskij
4c330bc38b
Add code generation of saturation
2013-11-29 18:40:04 +04:00
Dmitry Babokin
6585a925be
Merge pull request #641 from jbrodman/stdlibshift
...
Add a "shift" operator to the stdlib.
2013-10-28 14:18:31 -07:00
james.brodman
4d289b16c2
Redesign after being hit with the KISS bat.
2013-10-23 14:25:43 -04:00
egaburov
f89bad1e94
launch now passes the right info into tasking
2013-10-23 12:51:06 +02:00
james.brodman
f97a2d68c8
Bugfix for non-const shift amt and unit tests.
2013-10-22 18:29:20 -04:00
james.brodman
899f85ce9c
Initial Support for new stdlib shift operator
2013-10-22 18:06:54 -04:00
Ilia Filippov
92773ada6d
fix for ISPC for compfails at sse4-i8 and sse4-i16
2013-10-11 15:23:40 +04:00
egaburov
7364e06387
added mask64
2013-09-12 12:02:42 +02:00
egaburov
320c41ffcf
added svml support. experimental. for some reason all symbols are visible..
2013-09-11 15:16:50 +02:00
Matt Pharr
5b20b06bd9
Add avg_{up,down}_int{8,16} routines to stdlib
...
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact. When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US]
on NEON.)
A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
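A minimal C sketch of the rounding behaviour described above, for unsigned
8-bit values only. The function names avg_up_u8 and avg_down_u8 are
hypothetical and are not the stdlib entry points; the sketch widens to 16 bits
so the intermediate sum cannot overflow, which is one plausible scalar
fallback rather than the code this commit actually adds.

```c
#include <stdint.h>

/* Hypothetical helper: average of a and b, rounding up when the sum is odd.
   The operands are widened so that a + b + 1 cannot wrap around. */
static inline uint8_t avg_up_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint16_t)a + (uint16_t)b + 1u) >> 1);
}

/* Hypothetical helper: average of a and b, rounding down when the sum is odd. */
static inline uint8_t avg_down_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((uint16_t)a + (uint16_t)b) >> 1);
}
```

The two helpers differ only when a + b is odd; for an even sum both return the
exact average.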
2013-08-06 08:41:12 -07:00
Matt Pharr
48ff03112f
Remove __pause from stdlib_core() in utils.m4.
...
It wasn't ever being used, and was breaking compilation on ARM.
2013-07-30 08:44:22 -07:00
Matt Pharr
ab3b633733
Add 8-bit and 16-bit specialized NEON targets.
...
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
Matt Pharr
53414f12e6
Add SSE4 target optimized for computation with 8-bit datatypes.
...
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits. (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
2013-07-23 17:30:32 -07:00
Matt Pharr
15a3ef370a
Use @llvm.readcyclecounter to implement stdlib clock() function.
...
Also added a test for the clock builtin.
2013-07-23 17:24:57 -07:00
Matt Pharr
e7abf3f2ea
Add support for mask vectors of 8 and 16-bit element types.
...
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements. This
commit makes it possible to have a target with an 8- or 16-bit mask.
2013-07-23 16:50:11 -07:00
Dmitry Babokin
7bedb4a081
Add memory alignment dependent on the platform (16/32/64/etc)
2013-05-24 10:29:01 +04:00
Dmitry Babokin
630215f56f
Defining memory routines completely separately for Windows/Unix 32/64 bit.
2013-05-24 10:29:01 +04:00
Dmitry Babokin
5362dade37
Fixing util.m4 to declare nothing unless some macro is instantiated
2013-05-24 10:29:00 +04:00
Dmitry Babokin
a47460b4c3
Efficient library implementation of broadcast
2013-05-02 00:12:16 +02:00
Dmitry Babokin
26bec62daf
Removing duplicate free definition on Linux
2013-04-27 00:29:51 +04:00
Dmitry Babokin
7497e86902
Adding support for aligned memory allocation on Windows
2013-04-26 22:07:30 +02:00
Dmitry Babokin
95950885cf
Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS.
2013-04-26 20:33:24 +04:00
Dmitry Babokin
d36ab4cc3c
Adding noalias attribute to malloc return
2013-04-25 20:39:01 +04:00
james.brodman
3aaf2ef2d4
ToT Fixes / M4 macro fix
2013-01-14 14:55:10 -05:00
Matt Pharr
765a0d8896
Use puts() rather than printf() for printing assertion failure strings.
...
This way, we don't lose '%'s in the assertion strings.
Issue #342.
2012-08-03 11:31:38 -07:00
Matt Pharr
6a410fc30e
Emit gather instructions for the AVX2 targets.
...
Issue #308.
2012-07-13 12:29:05 -07:00
Matt Pharr
984a68c3a9
Rename gen_gather() macro to gen_gather_factored()
2012-07-13 12:24:12 -07:00
Matt Pharr
2c640f7e52
Add support for RDRAND in IvyBridge.
...
The standard library now provides a variety of rdrand() functions
that call out to RDRAND, when available.
Issue #263.
2012-07-12 06:07:07 -07:00
Matt Pharr
c09c87873e
Whitespace / indentation fixes.
2012-07-11 14:29:46 -07:00
Matt Pharr
10b79fb41b
Add support for non-factored variants of gather/scatter functions.
...
We now have two ways of approaching gather/scatters with a common base
pointer and with offset vectors. For targets with native gather/scatter,
we just turn those into base + {1/2/4/8}*offsets. For targets without,
we turn those into base + {1/2/4/8}*varying_offsets + const_offsets,
where const_offsets is a compile-time constant.
Infrastructure for issue #325.
2012-07-11 14:29:42 -07:00
Matt Pharr
ec0280be11
Rename gather/scatter_base_offsets functions to *factored_based_offsets*.
...
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
2012-07-11 14:16:39 -07:00
Matt Pharr
fb8b893b10
Fix incorrect LLVM_3_1svn tests.
...
1. For some time now, we have provided the version without the 'svn'
2. We should be testing "not LLVM 3.0" in these cases, since they
apply to LLVM 3.2 and beyond as well...
2012-07-09 07:09:25 -07:00
Matt Pharr
9ca80debb8
Remove stale LLVM 2.9 support from builtins/util.m4
2012-07-09 06:54:29 -07:00
Matt Pharr
d34a87404d
Provide (undocumented for now) __pause() call to emit PAUSE inst.
2012-06-28 09:28:25 -07:00