2938 Commits

Author SHA1 Message Date
93c563e073 Add missing NULL check in CanBeType 2017-05-02 22:26:21 -04:00
b3b02df569 [WIP] add check for polymorphic functions 2017-05-02 14:59:04 -04:00
0887760de1 [WIP] typechecking for casts to polymorphic types 2017-04-29 15:56:43 -04:00
148c333943 Merge branch 'master' into parse 2017-04-29 11:11:39 -04:00
98e5809d24 Ignore vscode and osx build stuff 2017-04-29 11:10:36 -04:00
87b6ed7f4c [WIP] slowly getting typechecking to work 2017-04-28 23:37:06 -04:00
259f092143 [WIP] Add quantifier to polymorphic types 2017-04-28 14:04:26 -04:00
108c9c6fb5 [WIP] parses polymorphic types 2017-04-27 14:17:47 -04:00
128b40ce3c Add test for polymorphic non-exported function 2017-04-26 22:33:03 -04:00
717aec388b Merge branch 'concept-check' 2017-04-26 11:39:59 -04:00
f2287d2cd7 Added tests for typechecking 2017-04-26 11:38:17 -04:00
4182fa2967 python regex-based preprocessor proof of concept 2017-04-18 22:28:48 -04:00
d6cf38a929 Ignore llvm build directory 2017-04-11 14:08:00 -04:00
Dmitry Babokin
8c97883317 Merge pull request #1264 from dbabokin/attributelist
Renaming AttributeSet to AttributeList to follow trunk changes.
2017-03-28 17:02:28 -07:00
Dmitry Babokin
455a29c491 Renaming AttributeSet to AttributeList to follow trunk changes. 2017-03-28 16:58:49 -07:00
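For context, the rename is mechanical: the container for function attributes changed its type name on LLVM trunk. A sketch of how code straddling both APIs might guard it, with the version-macro names assumed rather than taken from ispc's source:
```cpp
#include "llvm/IR/Function.h"

// ISPC_LLVM_VERSION / ISPC_LLVM_5_0 are assumed guard names, not ispc's exact ones.
#if ISPC_LLVM_VERSION >= ISPC_LLVM_5_0
typedef llvm::AttributeList FnAttrs;  // new name on trunk (LLVM 5.0)
#else
typedef llvm::AttributeSet FnAttrs;   // old name through LLVM 4.0
#endif

// Same accessors on both sides of the rename; only the type name differs.
static void copyFnAttributes(llvm::Function *from, llvm::Function *to) {
    FnAttrs attrs = from->getAttributes();
    to->setAttributes(attrs);
}
```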
Dmitry Babokin
a618ad45bf Merge pull request #1263 from dbabokin/cbackend
Better fix for cbackend.
2017-03-22 13:29:06 -07:00
Dmitry Babokin
0ff8ae4596 Better fix for cbackend. 2017-03-22 13:27:26 -07:00
Dmitry Babokin
a5b689439b Merge pull request #1262 from dbabokin/trunk_fix
Trunk fix
2017-03-22 12:33:39 -07:00
Dmitry Babokin
f9947541a1 Whitespace fixes 2017-03-22 12:32:26 -07:00
Dmitry Babokin
c2b2b38081 Fix for trunk. It's probably temporary, if they fix the -- operator for arg_iterator in trunk. 2017-03-22 12:31:41 -07:00
Dmitry Babokin
7884c7da04 Merge pull request #1260 from dbabokin/alloy
For naming folders with llvm use dot instead of underscore.
2017-03-02 13:25:21 -08:00
Dmitry Babokin
b471e97a10 For naming folders with llvm use dot instead of underscore. 2017-03-02 13:24:15 -08:00
Dmitry Babokin
611fe0bc42 Merge pull request #1259 from dbabokin/llvm50
Enabling LLVM 5.0 and making fixes to track changes in LLVM for the past couple months.
2017-03-01 11:39:53 -08:00
Dmitry Babokin
6d649e1dff Enabling LLVM 5.0 and making fixes to track changes in LLVM for the past
couple months.
The changes are tested with LLVM 3.9, 4.0 and trunk on MacOS (sse4,
avx2, skx).
2017-03-01 11:10:34 -08:00
Dmitry Babokin
d0bfe7738a Merge pull request #1245 from dbabokin/git
Adding support for git repository instead of svn.
2016-12-01 22:11:58 +03:00
Dmitry Babokin
95d33554db Merge pull request #1244 from dbabokin/trunk_fix
Fix for trunk - change in DIBuilder interface
2016-12-01 22:10:45 +03:00
Dmitry Babokin
4298e3d0cd Fix for trunk - change in DIBuilder interface 2016-12-01 22:00:36 +03:00
Dmitry Babokin
a7fd70fa21 Adding support for using git repository instead of svn.
This is experimental for now, but going forward this will become the
primary way of working with LLVM, as they are going to switch to
git in the not too distant future.
2016-12-01 18:10:57 +03:00
Dmitry Babokin
60dc47e0a6 Merge pull request #1242 from dbabokin/fixes
SVML support for AVX512 and a couple of script fixes
2016-12-01 00:33:34 +03:00
Dmitry Babokin
ff298f21b7 Adding SVML support to AVX512 targets 2016-11-30 05:27:10 +03:00
Dmitry Babokin
f04a04a7e3 Set sysroot for CMake build on MacOS 2016-11-29 21:04:46 +03:00
Dmitry Babokin
39e7f0c2d4 3.9.0 is a better choice for us. 3.9.1 has a couple of regressions 2016-11-29 21:03:53 +03:00
Dmitry Babokin
726b260cd5 Merge pull request #1236 from suluke/llvm_change/BitcodeWriter
Support llvm 4.0: Bitcode/ReaderWriter.h -> BitCode/BitcodeWriter.h
2016-11-15 02:02:54 +03:00
Lukas Böhm
6a8ce4b412 Apply Bitcode/ReaderWriter renaming in builtins.cpp
This also fixes usage of parseBitcodeFile after [r286752](https://reviews.llvm.org/D26562)
2016-11-14 23:13:08 +01:00
Lukas Böhm
32626ea9e3 Support llvm 4.0: Bitcode/ReaderWriter.h -> BitCode/BitcodeWriter.h 2016-11-14 21:38:25 +01:00
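Both commits track LLVM 4.0 churn: the combined Bitcode/ReaderWriter.h header was split, and parseBitcodeFile switched from ErrorOr to Expected after r286752. A version-guarded sketch, with the ISPC_LLVM_VERSION macro names assumed:
```cpp
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/raw_ostream.h"

#if ISPC_LLVM_VERSION >= ISPC_LLVM_4_0      // assumed version macro
  #include "llvm/Bitcode/BitcodeReader.h"   // split headers in LLVM 4.0
  #include "llvm/Bitcode/BitcodeWriter.h"
#else
  #include "llvm/Bitcode/ReaderWriter.h"    // combined header up to 3.9
#endif

// parseBitcodeFile now returns llvm::Expected<> rather than llvm::ErrorOr<>,
// so a failure must be consumed as an llvm::Error.
static std::unique_ptr<llvm::Module> loadBitcode(llvm::MemoryBufferRef buf,
                                                 llvm::LLVMContext &ctx) {
    llvm::Expected<std::unique_ptr<llvm::Module>> mOrErr =
        llvm::parseBitcodeFile(buf, ctx);
    if (!mOrErr) {
        llvm::errs() << llvm::toString(mOrErr.takeError()) << "\n";
        return nullptr;
    }
    return std::move(*mOrErr);
}
```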
Dmitry Babokin
d4a8afd6e8 Merge pull request #1230 from Shishpan/trunkFix
Trunk fix for Rev.283004
2016-10-05 14:29:21 +03:00
Andrey Shishpanov
8acfd92f92 Trunk fix for Rev.283004 2016-10-05 14:17:14 +03:00
Dmitry Babokin
7fb4188f51 Merge pull request #1229 from Shishpan/trunkFix
Trunk fix for Rev.281284-281285.
2016-09-26 20:47:23 +03:00
Andrey Shishpanov
8b525bb8bc Trunk fix for Rev.281284-281285. 2016-09-26 20:24:36 +03:00
Dmitry Babokin
a86a16600b Merge pull request #1228 from Shishpan/trunkFix
Trunk fix for Rev.280686.
2016-09-07 14:11:37 +03:00
Andrey Shishpanov
d0341754d6 Trunk fix for Rev.280686. 2016-09-07 13:08:04 +03:00
Dmitry Babokin
f968bc1b2a Merge pull request #1227 from ned14/arm-neon-code-quality-fix
Fix ARM NEON output not always being inlined. Also improved scope for ARM NEON optimisation by LLVM, gained about 2% on my code here.
2016-09-05 18:07:46 +03:00
Niall Douglas (s [underscore] sourceforge {at} nedprod [dot] com)
7af7659ac2 Fix ARM NEON output not always being inlined. Also improved scope for ARM NEON optimisation by LLVM, gained about 2% on my code here. 2016-09-05 15:56:25 +01:00
Dmitry Babokin
a6952fd651 Merge pull request #1226 from dbabokin/test-fix
Fixing off by 1 access to local array.
2016-08-31 19:47:51 +03:00
Dmitry Babokin
4c7fb35f57 Fixing off by 1 access to local array. 2016-08-31 19:38:33 +03:00
Dmitry Babokin
87efb27dc5 Merge pull request #1225 from dbabokin/llvm40
Adding support for LLVM 4.0 (trunk)
2016-07-20 22:19:56 +03:00
Dmitry Babokin
45b306480e -Adding support for LLVM 4.0
-Switching 3.9 support to branch/release_39
-Switching 3.8 support to tags/release_381
2016-07-20 22:16:50 +03:00
Dmitry Babokin
2a68fc6c48 Merge pull request #1222 from dbabokin/192dev
Bumping version to 1.9.2dev
2016-07-08 22:03:49 +03:00
Dmitry Babokin
30d88e1683 Bumping version to 1.9.2dev 2016-07-08 21:44:59 +03:00
Dmitry Babokin
a97a69c96e Typo in Release Notes 2016-07-08 19:52:22 +03:00
Dmitry Babokin
87d0c9a2ed Merge pull request #1221 from dbabokin/191
1.9.1 release
2016-07-08 14:13:52 +03:00
Dmitry Babokin
147d16d49e Bumping version to 1.9.1 2016-07-08 14:11:33 +03:00
Dmitry Babokin
70f411148e Doc update for 1.9.1 2016-07-08 14:03:26 +03:00
Dmitry Babokin
6e8be22c8b Moving SKX_AVX512 under LLVM 3.8 2016-07-08 13:15:55 +03:00
Dmitry Babokin
e0ff380810 Merge pull request #1220 from dbabokin/skx
Adding SKX target to Windows build.
2016-07-06 11:58:46 +03:00
Dmitry Babokin
fb7b38ab59 Adding SKX target to Windows build. 2016-07-06 11:56:53 +03:00
Dmitry Babokin
1f7a7bef60 Merge pull request #1219 from dbabokin/knl_int8_mask
AVX-512 int8 mask
2016-07-04 17:17:56 +03:00
Dmitry Babokin
15cbb6fc95 Mark new intrinsic as internal functions. 2016-07-04 17:01:47 +03:00
Dmitry Babokin
f34e1093fb Take into account the target data type width instead of the mask bit width
when deciding on the type of small integer data types (int8/int16).
2016-07-04 17:01:46 +03:00
Vsevolod Livinskiy
f47e1d5cae [AVX-512] Replace i1 mask with i8 2016-07-04 17:01:46 +03:00
Dmitry Babokin
bc8bd3a189 Merge pull request #1218 from dbabokin/dwarf
Do not add DWARF version before LLVM 3.5
2016-07-04 16:59:45 +03:00
Dmitry Babokin
c6ae79cbc8 Do not add DWARF version before LLVM 3.5 2016-07-04 16:57:31 +03:00
Dmitry Babokin
528f6cef1c Merge pull request #1217 from dbabokin/dwarf
Adding --dwarf-version=X option to control emitted DWARF version.
2016-07-04 16:23:36 +03:00
Dmitry Babokin
d8b353ac98 Adding --dwarf-version=X option to control emitted DWARF version. 2016-07-04 16:13:06 +03:00
Dmitry Babokin
e3837dcc63 Merge pull request #1215 from julupu/master
Added HPX task system option
2016-06-20 12:19:37 +03:00
Julian Wolf
f9127b2d16 Added HPX task system option 2016-06-18 16:36:41 +02:00
Dmitry Babokin
e05039d623 Merge pull request #1214 from Shishpan/old38Fix
A small vital addition for r252219.
2016-06-17 17:04:42 +03:00
Andrey Shishpanov
cd7806e9c0 A small vital addition for r252219. 2016-06-17 16:55:27 +03:00
Dmitry Babokin
aea10892d3 Merge pull request #1209 from dbabokin/build_fix
Fix for LLVM trunk - TargetMachine interface has changed.
2016-05-25 14:50:32 +03:00
Dmitry Babokin
d6bacd0c7b Fix for LLVM trunk - TargetMachine interface has changed and now
requires Optional<Reloc::Model> instead of just Reloc::Model.
2016-05-25 14:46:53 +03:00
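For reference, the new signature takes llvm::Optional<llvm::Reloc::Model>, and an empty Optional selects the default relocation model. A sketch under assumed variable names, not ispc's exact call site:
```cpp
#include "llvm/ADT/Optional.h"
#include "llvm/Support/CodeGen.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"

// 'target', 'triple', 'cpuName' and 'featureString' are assumed to come from
// the usual TargetRegistry lookup; the point is the new Optional parameter.
llvm::TargetMachine *createTM(const llvm::Target *target,
                              llvm::StringRef triple, llvm::StringRef cpuName,
                              llvm::StringRef featureString) {
    llvm::TargetOptions options;
    // An empty Optional asks LLVM to pick the default relocation model;
    // before the change a concrete Reloc::Model value was mandatory.
    llvm::Optional<llvm::Reloc::Model> relocModel;
    return target->createTargetMachine(triple, cpuName, featureString,
                                       options, relocModel);
}
```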
Dmitry Babokin
c36095197d Merge pull request #1207 from Shishpan/trunkFix
Fix for LLVM 3.9 trunk, lStripUnusedDebugInfo() made empty.
2016-05-13 16:06:35 +03:00
Andrey Shishpanov
27d183d42f Fix for LLVM 3.9 trunk, lStripUnusedDebugInfo() made empty. 2016-05-13 16:02:43 +03:00
Dmitry Babokin
60d27ce4be Merge pull request #1205 from Shishpan/trunkFix
Fix for Revision 269218
2016-05-12 17:02:22 +03:00
Andrey Shishpanov
f0cd6d6902 Fix for Revision 269218 2016-05-12 16:24:14 +03:00
Dmitry Babokin
b2c9e43f04 Merge pull request #1204 from dbabokin/build_fix
Fix for one of build problems - clang complains on enum used in va-arg function.
2016-05-12 14:49:25 +03:00
Dmitry Babokin
de59f2e077 Fix for one of build problems - clang complains on enum used in va-arg
function.
2016-05-12 14:37:22 +03:00
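The warning exists because arguments passed through '...' undergo default promotions, so reading an enum back with va_arg is undefined when the enum promotes to int. A sketch of the pattern and the usual fix, with an illustrative enum rather than the actual ispc code:
```cpp
#include <cstdarg>

enum Color { RED, GREEN, BLUE };

void readArgs(int count, ...) {
    va_list ap;
    va_start(ap, count);
    // va_arg(ap, Color) is what clang complains about: the argument arrives
    // as a promoted int, so read the promoted type and cast it back.
    Color c = (Color)va_arg(ap, int);
    (void)c;
    va_end(ap);
}
```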
Dmitry Babokin
2ae7e07466 Merge pull request #1203 from dbabokin/make_doc
Updating Makefile warning message to use LLVM 3.8.
2016-05-10 23:05:06 +03:00
Dmitry Babokin
89fa6a4e75 Updating Makefile warning message to use LLVM 3.8. 2016-05-10 23:03:35 +03:00
Dmitry Babokin
8a92d557f4 Merge pull request #1199 from jkozdon/vim_syntax
Adding taskCount and taskIndex vim syntax file
2016-05-10 22:06:58 +03:00
Jeremy Edward Kozdon
99d1d2ffe3 [vim] Adding new and delete for allocation 2016-05-04 09:06:28 -07:00
Jeremy Edward Kozdon
0fdfaad053 Adding taskCount[0-3] 2016-04-29 13:37:37 -07:00
Jeremy Edward Kozdon
5db1e5fa39 Fixing task classification 2016-04-29 11:48:52 -07:00
Jeremy Edward Kozdon
b2d98b9979 Adding taskCount and taskIndex vim syntax file 2016-04-29 11:37:54 -07:00
Dmitry Babokin
8258163c31 Merge pull request #1195 from Twinklebear/master
Fix C compatibility of ISPCInstrument declaration in export header
2016-04-20 04:32:56 +04:00
Will Usher
c8b45093f0 Fix C compatibility of ISPCInstrument declaration 2016-04-19 18:16:04 -06:00
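The usual shape of such a fix is extern "C" guards around the declaration so plain-C translation units can include the generated header. A sketch using the ISPCInstrument signature as documented for ispc's --instrument callback; the guard layout here is illustrative, not the exact generated header:
```cpp
#include <stdint.h>

/* C++ consumers need C linkage for the callback, but a C compiler must not
   see the extern "C" syntax itself, hence the __cplusplus guard. */
#ifdef __cplusplus
extern "C" {
#endif
void ISPCInstrument(const char *fn, const char *note, int line, uint64_t mask);
#ifdef __cplusplus
}
#endif
```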
Dmitry Babokin
3d9c0ecd14 Merge pull request #1194 from Shishpan/foreach_tiled_fix
Fix for a bug in foreach_tiled, which caused iterations with full off mask.
2016-04-12 19:34:35 +04:00
Andrey Shishpanov
ab4cd1e8a2 Fixed the last useless step in foreach_tiled in some cases 2016-04-12 00:29:17 +03:00
Dmitry Babokin
2fab324924 Merge pull request #1193 from Shishpan/skx_patches
Replaced SKX patches with cumulative one, added some changes, …
2016-04-08 12:00:56 +04:00
Andrey Shishpanov
e04d21b997 Replaced SKX patches with cumulative one, added some changes, switched to CMAKE configuration for LLVM 3.8 and newer versions. 2016-04-07 15:56:06 +03:00
Dmitry Babokin
1efc4c8f1f Merge pull request #1191 from Shishpan/skx_patches
Added patches for SKX capability with LLVM 3.8.
2016-03-31 12:41:24 +04:00
Andrey Shishpanov
7af8f96021 Added patches for SKX capability with LLVM 3.8. 2016-03-25 18:11:28 +03:00
Dmitry Babokin
3c5b88dabf Merge pull request #1190 from pSub/fix-build-with-gcc5
Fix build with gcc5
2016-03-24 20:11:36 +03:00
Pascal Wittmann
13b748a57c Fix build with gcc5 2016-03-24 18:06:22 +01:00
Dmitry Babokin
3957dccb51 Merge pull request #1189 from jkozdon/master
Adding foreach* for vim syntax file
2016-03-24 19:21:42 +03:00
Jeremy Edward Kozdon
49919aac70 Updating last modified in vim syntax file 2016-03-24 09:19:08 -07:00
Jeremy Edward Kozdon
8ec078f3e1 Adding foreach* for vim syntax file 2016-03-24 09:14:25 -07:00
Dmitry Babokin
d80a3ed4b7 Merge pull request #1188 from Shishpan/nightly_fixes
Removed redundant check for knl, skx targets' "-march" key.
2016-03-17 13:55:31 +03:00
Andrey Shishpanov
6a01712cfd Removed redundant check for knl, skx targets' "-march" key. 2016-03-17 13:11:45 +03:00
Dmitry Babokin
0d9b120b02 Merge pull request #1187 from Shishpan/nightly_fixes
Hotfix for nightly fails with clang3.4.
2016-03-14 13:54:02 +03:00
Andrey Shishpanov
c867a87727 Hotfix for nightly fails with clang3.4. 2016-03-14 13:23:45 +03:00
Dmitry Babokin
3ea77d7ab9 Merge pull request #1185 from Shishpan/nightly_fixes
Nightly fixes
2016-03-13 21:24:01 +03:00
Dmitry Babokin
533db89c76 Merge pull request #1186 from Shishpan/trunkFix
Fix for Revision 263208
2016-03-13 21:19:36 +03:00
Andrey Shishpanov
c7e6b8302e For GCC -march=knl,-march=skx are not necessary. Moreover, not all GCC versions support this. 2016-03-13 15:18:14 +03:00
Andrey Shishpanov
d88240d227 Without this fix, the testing of SDE targets with GCC is impossible. 2016-03-13 15:07:36 +03:00
Andrey Shishpanov
5a2f0dcb4a Fix for Revision 263208 2016-03-12 21:27:58 +03:00
Dmitry Babokin
306f3468c3 Merge pull request #1184 from Shishpan/fix_knl_skx_target_ifelse
Fixed ifelse in rsqrt, rcp def. for knl, skx (compfails with old LLVM).
2016-03-11 16:18:15 +03:00
Andrey Shishpanov
7691d961c1 Fixed ifelse in rsqrt, rcp def. for knl, skx (compfails with old LLVM). 2016-03-11 15:32:59 +03:00
Dmitry Babokin
4f49ac4cb0 Merge pull request #1183 from dbabokin/master
Switching LLVM 3.8 to the official release
2016-03-09 17:30:56 +03:00
Dmitry Babokin
58bf32c27f Switching LLVM 3.8 to the official release 2016-03-09 17:30:17 +03:00
Dmitry Babokin
87d079bd30 Merge pull request #1182 from Shishpan/targetFix
Fixed nightly strange targets
2016-03-01 23:59:43 +03:00
Andrey Shishpanov
0061ecf325 Fixed nightly strange targets 2016-03-01 21:11:10 +03:00
Dmitry Babokin
1709dbe9d6 Merge pull request #1174 from dbabokin/examples_avx512knl
Adding avx512knl-i32x16 target to examples Makefiles
2016-02-25 17:17:53 +03:00
Dmitry Babokin
69dd01fc69 Adding avx512skx-i32x16 target to examples 2016-02-25 17:17:17 +03:00
Dmitry Babokin
1a0f5a0510 Merge pull request #1179 from Shishpan/SKX_def
Added SKX target for testing in alloy.py
2016-02-25 16:39:20 +03:00
Andrey Shishpanov
0ad8dfe20d Added SKX target for testing in alloy.py 2016-02-25 16:36:14 +03:00
Dmitry Babokin
1415f5b0b1 Adding avx512knl-i32x16 target to examples Makefiles 2016-02-25 14:16:31 +03:00
Dmitry Babokin
722c65d7b6 Merge pull request #1172 from Shishpan/SKX_def
added SKX target definition
2016-02-25 14:01:35 +03:00
Dmitry Babokin
b25dbfa667 Merge pull request #1178 from Shishpan/update_fail_db.txt
Updated fail_db.txt according to the old one.
2016-02-25 12:48:16 +03:00
Andrey Shishpanov
5f6e6e708c Updated fail_db.txt according to the old one. 2016-02-25 01:32:40 +03:00
Andrey Shishpanov
1324e6cdd5 added SKX target definition 2016-02-25 00:43:58 +03:00
Dmitry Babokin
28227f1a8b Merge pull request #1176 from Shishpan/trunkFix
Fix for Revision 261498
2016-02-24 22:29:36 +03:00
Andrey Shishpanov
b4aa685d52 Fix for Revision 261498 2016-02-23 17:35:26 +03:00
Dmitry Babokin
5862c9e6a5 Merge pull request #1173 from Shishpan/trunkFix
Fix for Revision 261203.
2016-02-19 14:37:35 +03:00
Andrey Shishpanov
1f87784f05 Fix for Revision 261203. 2016-02-19 14:24:11 +03:00
Dmitry Babokin
e2bf021bf7 Merge pull request #1171 from dbabokin/v191dev
Bumping to v1.9.1dev
2016-02-12 20:06:52 +03:00
Dmitry Babokin
33054bcacd Bumping to v1.9.1dev 2016-02-12 19:57:34 +03:00
Dmitry Babokin
89dfbf2125 Merge pull request #1170 from dbabokin/v190_docs
v1.9.0
2016-02-12 18:36:29 +03:00
Dmitry Babokin
e47c9152ec Bumping version to v1.9.0 2016-02-12 16:41:55 +03:00
Dmitry Babokin
e37f4691b0 Release Notes for v1.9.0 2016-02-12 16:41:55 +03:00
Dmitry Babokin
320a54b023 Docs update:
- info on v1.9.0 compatibility
- KNL support info
- copyright update
2016-02-12 16:41:55 +03:00
Dmitry Babokin
65f290b699 Correcting Target::ISAToString() to report AVX512KNL as "avx512knl", not "avx512knl-i32x16".
It affects reporting, mangling and the ISPC_TARGET_* macro.
2016-02-12 16:12:33 +03:00
Dmitry Babokin
1c0f0920f9 Merge pull request #1167 from Vsevolod-Livinskij/fix-var-struct
Fix for struct with unbound members
2016-02-06 20:00:47 +03:00
Vsevolod Livinskiy
6ea03bb3af Fix for struct with unbound members 2016-02-06 17:08:17 +03:00
Dmitry Babokin
e1cb6ca15b Merge pull request #1166 from Vsevolod-Livinskij/varying-struct
Fix cast from uniform struct to varying struct.
2016-02-05 14:57:57 +03:00
Vsevolod Livinskiy
0ef3d3b429 Define dereferencing a varying pointer to a uniform struct with a 'bound uniform' member. 2016-02-05 14:49:42 +03:00
Vsevolod Livinskiy
243d6c2625 Fix cast from uniform struct to varying struct. 2016-02-05 08:32:35 +03:00
Dmitry Babokin
4c65b4886f Merge pull request #1165 from dbabokin/copyright
Updating copyright dates for recently modified files
2016-02-04 15:15:04 +03:00
Dmitry Babokin
f6dbffd58c Updating copyright dates for recently modified files 2016-02-04 15:14:19 +03:00
Dmitry Babokin
094d6f6d2e Merge pull request #1163 from Vsevolod-Livinskij/revert-setcc
Revert SETCC patch and update faildb
2016-02-03 14:57:08 +03:00
Vsevolod Livinskiy
57588857b3 Revert SETCC patch and update faildb 2016-02-03 14:50:28 +03:00
Dmitry Babokin
6160091afd Merge pull request #1160 from Shishpan/genericImpl
Impl. for knl-generic target (fix compfails)
2016-02-03 12:03:46 +03:00
Dmitry Babokin
1c8f606fd2 Merge pull request #1162 from dbabokin/win_build
Win build
2016-02-02 17:37:37 +03:00
Dmitry Babokin
9ae39d8361 Adding information about VS support in --version output. 2016-02-02 16:31:11 +03:00
Dmitry Babokin
988e506b1c Adding ispc_version.h to VS project 2016-02-02 15:51:38 +03:00
Andrey Shishpanov
3920874a35 Impl. for knl-generic target (fix compfails) 2016-02-02 12:11:13 +03:00
Dmitry Babokin
9eedbeeb72 Merge pull request #1159 from Vsevolod-Livinskij/error-check
Add error check during multi-target compilation.
2016-02-01 16:53:52 +03:00
Vsevolod Livinskiy
cf3cf01a6c Add error check during multi-target compilation. 2016-02-01 16:45:33 +03:00
Dmitry Babokin
90bed24b99 Merge pull request #1158 from Vsevolod-Livinskij/setcc-lowering
Add patch for SETCC lowering
2016-02-01 14:15:09 +03:00
Vsevolod Livinskiy
9ed0bb240d Add patch for SETCC lowering 2016-02-01 14:09:36 +03:00
Dmitry Babokin
b0cd3924e5 Merge pull request #1157 from dbabokin/38_branch
Switching back to 3.8 branch
2016-02-01 12:45:11 +03:00
Dmitry Babokin
b41f53b333 Switching back to 3.8 branch 2016-02-01 12:40:24 +03:00
Dmitry Babokin
4ae2333c93 Merge pull request #1156 from dbabokin/cmake2
Adding support for gcc toolchain in LLVM cmake build (alloy.py)
2016-01-30 01:58:44 +03:00
Dmitry Babokin
d59b70d0ea Adding support for gcc toolchain in LLVM cmake build 2016-01-30 01:24:05 +03:00
Dmitry Babokin
4e24a94ff4 Merge pull request #1155 from dbabokin/cmake
improvements in cmake build support
2016-01-29 23:29:39 +03:00
Dmitry Babokin
40718c0ad3 Adding LLVM_ENABLE_ASSERTIONS and list of targets to cmake build. 2016-01-29 23:25:46 +03:00
Dmitry Babokin
06b9b46fb3 Adding support for using LLVM build with -DNDEBUG 2016-01-29 23:23:59 +03:00
Dmitry Babokin
1370596c5e Merge pull request #1154 from Shishpan/fixNewLLVMBuildSystem
temp fix for new llvm cmake build system, without gcc-toolchain & some keys
2016-01-29 21:47:54 +03:00
Andrey Shishpanov
4e3d8540ff temp fix for new llvm cmake build system, without gcc-toolchain & some keys 2016-01-29 15:32:51 +03:00
Dmitry Babokin
ccd8592e46 Merge pull request #1153 from ncos/trunk-fixing
Remove passes compfails; Black out avx-512 gather/scatter llc fails
2016-01-27 19:17:37 +03:00
Anton Mitrokhin
afb1132748 Remove passes compfails; Black out avx-512 gather/scatter llc fails (tracked in https://llvm.org/bugs/show_bug.cgi?id=26338) 2016-01-27 19:13:02 +03:00
Dmitry Babokin
d82a1d018e Merge pull request #1152 from Vsevolod-Livinskij/omit-frame-ptr
Add no-frame-pointer-elim attribute to all functions
2016-01-27 15:59:56 +03:00
Vsevolod Livinskiy
f6773a318f Add no-frame-pointer-elim attribute to all functions 2016-01-27 15:54:40 +03:00
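In LLVM of this vintage, frame-pointer retention is requested per function via a string attribute. A minimal sketch of applying it module-wide, assuming the attribute name given in the commit title:
```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

// Tag every defined function so codegen keeps its frame pointer; the
// "no-frame-pointer-elim" string attribute is the one named in the commit.
static void keepFramePointers(llvm::Module &m) {
    for (llvm::Function &f : m) {
        if (f.isDeclaration())
            continue;
        f.addFnAttr("no-frame-pointer-elim", "true");
    }
}
```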
Dmitry Babokin
7fc57abb23 Merge pull request #1151 from dbabokin/win_build
Win build - fixing to make it work with 3.8
2016-01-27 15:11:08 +03:00
Dmitry Babokin
e7b411f5e9 Fixing Windows build with 3.8:
- Removing LLVMipa.lib
 - Adding LLVMIRReader.lib
 - Adding -DNDEBUG to release build
2016-01-27 14:13:15 +03:00
Dmitry Babokin
c82cfc7f4a Fixing build warnings on Windows 2016-01-27 11:55:31 +03:00
Dmitry Babokin
e6edf80aa4 Merge pull request #1150 from dbabokin/comment_fix
Fix --no-omit-frame-pointer help message
2016-01-26 22:12:30 +03:00
Dmitry Babokin
5b682b36f1 Fix --no-omit-frame-pointer help message 2016-01-26 22:09:27 +03:00
Dmitry Babokin
4f6c70aee0 Merge pull request #1148 from Vsevolod-Livinskij/frame-ptr
Add --no-omit-frame-pointer option.
2016-01-22 16:43:06 +03:00
Vsevolod Livinskiy
e4a672483f Add --no-omit-frame-pointer option 2016-01-22 16:32:29 +03:00
Dmitry Babokin
b2471decb7 Merge pull request #1149 from ncos/trunk-fixing
[KNC]: Add gather_base_offsets64_double function to knc.h
2016-01-22 15:44:34 +03:00
Anton Mitrokhin
eff3ae65d8 [KNC]: Add gather_base_offsets64_double function to knc.h 2016-01-22 11:30:49 +03:00
Dmitry Babokin
9dbb839146 Merge pull request #1147 from dbabokin/38_rc1
Moving 3.8 from branch to rc1
2016-01-20 21:45:36 +03:00
Dmitry Babokin
6633d69e65 Moving 3.8 from branch to rc1 2016-01-20 21:44:58 +03:00
Dmitry Babokin
b4bbc7b1f3 Merge pull request #1146 from Vsevolod-Livinskij/intr_pass
Rearrange IntrinsicsOpt pass. This fixes a bunch of fails that show up from time to time due to memory corruption.
2016-01-20 20:25:36 +03:00
Vsevolod Livinskiy
6f886b2457 Rearrange IntrinsicsOpt pass 2016-01-20 20:21:39 +03:00
Dmitry Babokin
042c662d9d Merge pull request #1145 from dbabokin/master
LLVM 3.9 version support
2016-01-14 17:29:21 +03:00
Dmitry Babokin
536b30e12c LLVM 3.9 version support 2016-01-14 17:26:38 +03:00
Dmitry Babokin
44d8a2e01f Merge pull request #1143 from dbabokin/fix_trunk
Track changes in createFunctionAttrsPass().
2016-01-12 13:55:20 +03:00
Dmitry Babokin
b471cdb56f Track changes in createFunctionAttrsPass(). 2016-01-12 13:54:12 +03:00
Dmitry Babokin
b8ad55019c Merge pull request #1142 from dbabokin/linker_fix
Move only unused declaration, the rest will be moved by Linker.
2015-12-25 14:25:29 +03:00
Dmitry Babokin
b67d7f0b6a Move only unused declaration, the rest will be moved by Linker. 2015-12-25 14:05:21 +03:00
Dmitry Babokin
5ac46e5134 Merge pull request #1141 from dbabokin/master
Fix to make Linker work as it used to before making it too smart.
2015-12-24 20:44:04 +03:00
Dmitry Babokin
bdae4d5e23 Fix to make Linker work as it used to before making it too smart. 2015-12-24 20:34:49 +03:00
Dmitry Babokin
35c97512b5 Merge pull request #1139 from ncos/unmasked
Fix up return statements in unmasked regions
2015-12-21 22:22:35 +03:00
Anton Mitrokhin
d862672664 Fix up return statements in unmasked regions 2015-12-21 20:13:47 +03:00
Dmitry Babokin
10143eb716 Merge pull request #1138 from ncos/trunk-fixing
Fix ISPC after LLVM r255842. Fixes the build problem, though there's still a runtime problem.
2015-12-21 15:06:12 +03:00
Anton Mitrokhin
a6f0aade5c Fix ISPC after LLVM r255842 2015-12-21 14:59:14 +03:00
Dmitry Babokin
ac8fef0f22 Merge pull request #1133 from dbabokin/build_fix
Fix for change in LLVM in llvm::Linker::linkModules()
2015-12-11 18:06:31 +03:00
Dmitry Babokin
839c51ed5b Fix for change in LLVM in llvm::Linker::linkModules() 2015-12-11 18:05:14 +03:00
Dmitry Babokin
129f06488b Merge pull request #1131 from aristidb/no-isa-crash-reorder
Fix crash when not specifying --target option
2015-12-08 16:10:18 +03:00
Aristid Breitkreuz
fe9b4a6ac5 move strcasecmp on isa to _after_ comparing isa against NULL 2015-12-05 11:58:40 +01:00
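The fix above is a short-circuit ordering bug. A hedged sketch of the before/after; the variable name comes from the commit, the compared string is illustrative:
```cpp
#include <strings.h> // strcasecmp (POSIX)

// Before: strcasecmp(isa, "...") executed even when no --target was given,
// so a NULL isa was dereferenced and ispc crashed.
// After: the NULL comparison runs first and short-circuits the call.
bool isaMatches(const char *isa, const char *name) {
    return isa != NULL && strcasecmp(isa, name) == 0;
}
```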
Dmitry Babokin
2ac8ad2063 Merge pull request #1129 from Shishpan/trunkFix
Fix for rev. 254449. Changed linkModules() API
2015-12-03 13:59:23 +03:00
Andrey Shishpanov
da7c2740f4 Fix for rev. 254449. Changed linkModules() API 2015-12-03 13:24:33 +03:00
Dmitry Babokin
bb5cbf420c Merge pull request #1128 from ncos/knc-fixes
Fix fabs intrinsic in generic target (fail after LLVM r249702)
2015-12-01 10:48:22 +03:00
Anton Mitrokhin
12ac782b13 fix fabs intrinsic in generic target (fail after LLVM r249702) 2015-11-30 16:43:45 +03:00
Dmitry Babokin
d3020580ff Merge pull request #1127 from sisupov/branch0
Fix issue #1122
2015-11-24 15:30:11 +03:00
Sergey Isupov
c83674433b Fix issue #1122 2015-11-24 15:27:07 +03:00
Dmitry Babokin
0e58415bdb Merge pull request #1126 from ncos/native-knl
Fix bug in cbackend
2015-11-23 21:26:42 +03:00
Anton Mitrokhin
9a477ee926 Fix bug in cbackend 2015-11-23 16:52:07 +03:00
Dmitry Babokin
2c51fa54d0 Merge pull request #1125 from thebusytypist/fix-win32-debug-build
Try to fix unresolved symbols in Win32 Debug build.
2015-11-23 14:52:33 +03:00
thebusytypist
e23821f282 Fix unresolved symbol in Win32 Debug build.
Add an additional dependency LLVMProfileData.lib.
2015-11-20 16:48:54 -08:00
Dmitry Babokin
5afb671764 Merge pull request #1123 from Shishpan/fix_knl_generic_comp
Remove __abs_i32i64(), which is a leftover from knc.h and crashes compilation for knl.
2015-11-19 18:29:03 +03:00
Andrey Shishpanov
5884e7c3f1 deleted useless __abs_i32i64() used in knc.h 2015-11-19 18:18:28 +03:00
Dmitry Babokin
d4ebb3544d Merge pull request #1121 from ncos/native-knl
[AVX512]: Make blend optimizations possible for avx512
2015-11-11 17:46:28 +03:00
Anton Mitrokhin
ef51f8c648 [AVX512]: Make blend optimizations possible for avx512 2015-11-10 18:16:43 +03:00
Dmitry Babokin
66b1499ee0 Merge pull request #1120 from ncos/trunk-fixing
ISPC trunk fix after LLVM commits 252219 and 252380
2015-11-10 16:09:03 +03:00
Dmitry Babokin
48ffdcf169 Merge pull request #1119 from Vsevolod-Livinskij/di_fix
Remove obsolete cast.
2015-11-10 16:07:01 +03:00
Anton Mitrokhin
1948a93584 Fix ISPC build fail after LLVM commit 252219 2015-11-10 15:19:38 +03:00
Vsevolod Livinskiy
e7fd7dd6fd Remove obsolete cast. 2015-11-10 14:49:02 +03:00
Anton Mitrokhin
e2efcb50f3 Fix ISPC build fail after LLVM commit 252380 2015-11-10 14:44:17 +03:00
Dmitry Babokin
a3799fd5d0 Merge pull request #1117 from ncos/native-knl
[AVX-512]: transcendentals: add exp() implementation, no functional change
2015-11-03 13:41:17 +03:00
Anton Mitrokhin
28b402a778 [AVX-512]: transcendentals: add exp() implementation, TODO: log() and pow() 2015-11-03 13:34:11 +03:00
Dmitry Babokin
fb30a56cc3 Minor fixes 2015-11-02 23:20:15 +03:00
jbrodman
d8fd909d4f Merge pull request #1116 from jbrodman/master
Only use SROA for llvm 3.7 or newer.
2015-11-02 15:14:07 -05:00
jbrodman
37d67549bd Only use SROA for llvm 3.7 or newer. 2015-11-02 16:02:43 -05:00
jbrodman
00077390ed Replace old ScalarReplAggregates pass with the no-longer-new SROA pass. 2015-11-02 15:26:28 -05:00
jbrodman
8f7f7d2cb7 Merge pull request #1115 from jbrodman/master
Replace old ScalarReplAggregates pass with the no-longer-new SROA pass.
2015-11-02 14:38:25 -05:00
Dmitry Babokin
6732d9cb64 Merge pull request #1114 from Vsevolod-Livinskij/patch_3_7
Apply R251275 on 3.7
2015-10-28 20:54:41 +03:00
Vsevolod Livinskiy
b8413900bb Apply R251275 on 3.7 2015-10-28 19:35:59 +03:00
Dmitry Babokin
56dc2279b1 Merge pull request #1113 from ncos/stmt_zeroptr
Use 'dyn_cast_or_null' instead of 'dyn_cast'
2015-10-27 17:04:53 +03:00
Anton Mitrokhin
ed71f684be Use 'dyn_cast_or_null' instead of 'dyn_cast' 2015-10-27 17:46:47 +03:00
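For readers unfamiliar with the LLVM casting utilities: dyn_cast assumes a non-null operand, while dyn_cast_or_null tolerates null and simply yields nullptr. A sketch with a hypothetical Stmt hierarchy standing in for ispc's own classes:
```cpp
#include "llvm/Support/Casting.h"

// Hypothetical classes; ispc's Stmt hierarchy uses the same classof protocol.
struct Stmt { enum Kind { ExprK, DeclK } kind; };
struct ExprStmt : Stmt {
    static bool classof(const Stmt *s) { return s->kind == ExprK; }
};

void visit(Stmt *s) { // s may legitimately be null here
    // dyn_cast<ExprStmt>(s) would assert on a null s; the _or_null variant
    // returns nullptr, which the if-condition filters out.
    if (ExprStmt *es = llvm::dyn_cast_or_null<ExprStmt>(s)) {
        (void)es; // ... handle expression statements
    }
}
```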
Dmitry Babokin
16ae04c7f9 Merge pull request #1112 from ncos/buildfails
IR change for x86 mask load/store instructions in LLVM 3.8 (r250817)
2015-10-21 17:51:19 +03:00
Anton Mitrokhin
434aa30d12 IR change for x86 mask load/store instructions in LLVM 3.8 (r250817) 2015-10-21 17:41:13 +03:00
Dmitry Babokin
df643cecbd Merge pull request #1110 from dbabokin/dispatch_module
Set target triple and data layout for dispatch module.
2015-10-20 19:04:26 +03:00
Dmitry Babokin
f0a0e2d75d Set target triple and data layout for dispatch module. 2015-10-20 18:59:09 +03:00
Dmitry Babokin
4a3f2016d4 Merge pull request #1109 from Shishpan/trunkFix
new createSubroutineType() API
2015-10-15 15:15:24 +03:00
Andrey Shishpanov
3dbdeee5f6 new createSubroutineType() API 2015-10-15 15:45:15 +03:00
Dmitry Babokin
374e445071 Merge pull request #1107 from ncos/knc-fixes
[KNC:] shadow test fails caused by vec16_i64 perverted memory layout
2015-10-14 10:58:22 +03:00
Anton Mitrokhin
44f06304d2 - [KNC:] shadow test fails caused by vec16_i64 perverted memory layout 2015-10-14 10:48:14 +03:00
Dmitry Babokin
88461fb527 Merge pull request #1104 from pavolzetor/master
Use find() method on string to run on both Python 2 and 3
2015-10-05 11:51:25 +03:00
Pavol Klacansky
41199179de Use find() method on string to run on both Python 2 and 3 2015-10-03 11:56:00 +00:00
Dmitry Babokin
1d08598f6e Merge pull request #1102 from ncos/knc-fixes
Removed dirty workaround that fixed 'soa-18.ispc' test. It broke badly in a different scenario
2015-09-30 18:31:11 +03:00
Anton Mitrokhin
68e686df9f Removed dirty workaround that fixed 'soa-18.ispc' test. It broke badly in a different scenario 2015-09-30 17:20:38 +03:00
Dmitry Babokin
8bf8911893 Merge pull request #1101 from ncos/fail_db-update
Removed passes compfails/runfails
2015-09-21 10:54:41 +03:00
Anton Mitrokhin
b7839f6242 Removed passes compfails/runfails 2015-09-21 11:37:20 +03:00
Dmitry Babokin
c9de707455 Merge pull request #1100 from Shishpan/fixIssue#957
replaced useless bitSize with explicit byteSize - fix for #957
2015-09-17 15:29:41 +03:00
Andrey Shishpanov
293e4bccae replaced useless bitSize with explicit byteSize 2015-09-17 15:53:50 +03:00
Dmitry Babokin
f9cccad370 Merge pull request #1098 from ncos/alloy_modifications
LLVM 3.7 rc4 --> final in alloy.py
2015-09-15 10:48:45 +03:00
Anton Mitrokhin
d1d5a42be0 LLVM 3.7 rc4 --> final in alloy.py 2015-09-15 11:22:32 +03:00
Dmitry Babokin
c9c9b24beb Merge pull request #1097 from Shishpan/trunkFix
added some new names for new API
2015-09-14 10:23:28 +03:00
Andrey Shishpanov
b718ae3a52 added some new names for new API 2015-09-10 15:33:17 +03:00
Dmitry Babokin
b674a5f0ed Merge pull request #1095 from ncos/alloy_modifications
Switched 3.7 branch in alloy.py from rc2 to rc4
2015-09-01 23:06:31 +03:00
Anton Mitrokhin
3340d2add2 Switched 3.7 branch in alloy.py from rc2 to rc4 2015-09-01 20:05:46 +03:00
Dmitry Babokin
c7cdf66d8e Merge pull request #1093 from Shishpan/trunkFix
trunk 3.8 fix
2015-08-24 11:38:57 +03:00
Andrey Shishpanov
9a50b60b0c trunk 3.8 fix: all functions from llvm::initializeIPA() are already included in llvm::initializeAnalysis() 2015-08-24 01:02:21 +03:00
Dmitry Babokin
e164804f43 Merge pull request #1092 from Shishpan/ternaryFix
Added unsafe case of division with "all on" mask, fix for #1083.
2015-08-21 19:14:31 +03:00
Andrey Shishpanov
bcdef9ea71 added unsafe case of division with "all off" mask 2015-08-21 19:52:34 +03:00
Dmitry Babokin
5d509b69c8 Merge pull request #1091 from ncos/knl-rename
Fixed AVX-512 IR incompatibility issue
2015-08-17 18:22:23 +03:00
Anton Mitrokhin
7448ee97f2 Fixed AVX-512 IR incompatibility issue 2015-08-17 19:06:57 +03:00
Dmitry Babokin
4c3e5c0d2b Merge pull request #1090 from ncos/knl-rename
Rewriting target-avx512-common.ll
2015-08-16 23:56:43 +03:00
Dmitry Babokin
729cc48603 Merge pull request #1089 from ncos/trunk-fixing
Trunk fixing
2015-08-16 23:56:15 +03:00
Anton Mitrokhin
d37455925f [AVX-512]: Scatters for i32/float 2015-08-16 21:50:44 +03:00
Anton Mitrokhin
d2720e2490 [AVX-512]: Gathers for float, fixed i32 gathers 2015-08-16 21:50:44 +03:00
Andrey Shishpanov
e11022c95a replaced gathers for i32 2015-08-16 21:50:44 +03:00
Anton Mitrokhin
3fff68b9c0 removed 30 AVX-512 passes compfails (with duplicates) 2015-08-16 16:58:30 +03:00
Anton Mitrokhin
15b121039f fix ISPC trunk after LLVM commit 245019 2015-08-16 16:50:16 +03:00
Dmitry Babokin
a6c85fd0e1 Merge pull request #1088 from ncos/alloy_modifications
Changed LLVM 3.7 download path
2015-08-14 16:06:36 +03:00
Anton Mitrokhin
5e0bbf0081 Changed LLVM 3.7 download path; Fixed a small issue which might provoke alloy to use icc when not necessary 2015-08-14 16:12:34 +03:00
Dmitry Babokin
ab2d14189e Merge pull request #1087 from ncos/fail_db-update
Update fail_db.txt
2015-08-12 14:43:41 +03:00
Anton Mitrokhin
197c13b3b9 Update fail_db.txt: [AVX-512]: 7 compfails on x86-64, 36 compfails on x86; Add anonymous namespace compfails 2015-08-12 14:06:10 +03:00
Dmitry Babokin
ef1d3270a7 Merge pull request #1086 from Shishpan/trunkFix
Added new header BasicAliasAnalysis.h, which appeared in 3.8.
2015-08-10 16:11:01 +03:00
Andrey Shishpanov
327af85e3e Added new header BasicAliasAnalysis.h, which appeared in 3.8. 2015-08-10 16:37:19 +03:00
Dmitry Babokin
d295356a39 Merge pull request #1082 from ncos/regular-fixes
Fix sincos issue
2015-08-03 12:15:38 +03:00
Dmitry Babokin
a80706af2b Merge pull request #1081 from ncos/trunk-fixing
Fix ISPC buildfail after LLVM API change in r243764
2015-08-03 12:13:47 +03:00
Anton Mitrokhin
9bfc3850c7 new test file for double sin/cos/sincos 2015-08-02 15:46:15 +03:00
Anton Mitrokhin
db1d817dee Remove 'readnone' attribute away from non-readnone sincos() 2015-08-02 15:04:29 +03:00
Anton Mitrokhin
f3548037fe Fix ISPC buildfail after LLVM API change in r243764 2015-08-02 13:44:42 +03:00
Dmitry Babokin
ab0e8bba00 Merge pull request #1077 from Shishpan/trunkFix
Fix for LLVM trunk
2015-07-29 03:23:51 +03:00
Andrey Shishpanov
90c6e0172a some fixes for new LLVM API 2015-07-28 18:51:13 +03:00
Dmitry Babokin
2c4c79d55e Merge pull request #1072 from Shishpan/asinFix
solution for asin trouble from report
2015-07-23 17:55:04 +03:00
Dmitry Babokin
023ba72a9a Merge pull request #1074 from ncos/trunk-fixing
fail_db.txt update
2015-07-23 12:52:11 +03:00
Anton Mitrokhin
3f9e64e0ec Removed 6 passes runfails; Duplicated 8 llvm 3.7 runfails for llvm 3.8; Several entries reordered 2015-07-23 13:35:03 +03:00
Andrey Shishpanov
3bc99c1da1 solution for asin trouble from report 2015-07-22 14:26:51 +03:00
Dmitry Babokin
bd8d78d204 Merge pull request #1073 from Shishpan/fixIssue#1068
Fixing for problem with casting reference types, issue #1068
2015-07-22 14:01:39 +03:00
Andrey Shishpanov
22a0596fca fixing trouble with casting from float reference to int, issue #1068 2015-07-22 00:22:25 +03:00
Dmitry Babokin
dff48639e9 Merge pull request #1069 from Shishpan/trunk38fix
some fixes for trunk 3.8
2015-07-19 16:21:59 +03:00
Andrey Shishpanov
d6bb5e2264 some fixes for trunk 3.8 2015-07-17 16:57:17 +03:00
Dmitry Babokin
56f83aeb38 Merge pull request #1067 from Shishpan/branchForTrunk38Fix
Trunk38 fixes
2015-07-17 14:23:12 +03:00
Andrey Shishpanov
415f2e938c some fixes for trunk 3.8 2015-07-17 14:27:28 +03:00
Dmitry Babokin
73f7f583e4 Merge pull request #1066 from ncos/native-knl
Native knl
2015-07-16 18:06:32 +03:00
Anton Mitrokhin
04987422e7 Update fail_db for native knl 2015-07-16 18:40:13 +03:00
Anton Mitrokhin
f864338ce2 Changed GEP calls in target-avx512.ll to work with old LLVM 2015-07-16 15:32:50 +03:00
Anton Mitrokhin
23ebdbe1f2 Merge branch 'master' of https://github.com/ncos/ispc into native-knl 2015-07-16 15:01:07 +03:00
Dmitry Babokin
ef5dafb745 Merge pull request #1065 from ncos/ispc-classof
Fix ordered/unordered fcmp error in cbackend.cpp
2015-07-14 11:55:09 +03:00
Anton Mitrokhin
016781027c Add a test file for ispc's isnan() function 2015-07-14 09:58:07 +03:00
Anton Mitrokhin
f187671a89 fix ordered/unordered fcmp error in cbackend.cpp 2015-07-13 18:23:36 +03:00
Dmitry Babokin
ea89cb65bb Merge pull request #1064 from egaburov/ptxfix
ptx build fix
2015-07-10 14:48:23 +03:00
Evghenii Gaburov
ad2238e880 fix ptx 2015-07-10 13:46:54 +02:00
Dmitry Babokin
cb416611ce Merge pull request #1063 from ncos/ispc-classof
Removing RTTI from ispc - it causes problem when linking with clang libraries, which are no-rtti.
2015-07-10 13:55:11 +03:00
Anton Mitrokhin
9f083f99ac classof implementations for all classes 2015-07-10 12:27:16 +03:00
Vsevolod Livinskiy
21da408832 [AVX512]: bugfixing 2015-07-10 11:34:16 +03:00
Vsevolod Livinskiy
ba10b91648 [AVX-512]: masked_store was replaced 2015-07-09 15:36:03 +03:00
Anton Mitrokhin
8217448ee5 Id's for Stmt-inherited classes 2015-07-09 14:45:33 +03:00
Vsevolod Livinskiy
25aeedb003 [AVX-512]: masked_load_float/double was replaced 2015-07-09 13:45:14 +03:00
Anton Mitrokhin
26a93bc733 Id's for Expr-inherited classes 2015-07-09 12:38:58 +03:00
Vsevolod Livinskiy
b6d2d8dd4c [AVX-512]: rsqrt and rcp were replaced 2015-07-08 19:12:38 +03:00
Vsevolod Livinskiy
8c1bd4ec32 [AVX-512]: replace with avx512 intrinsics 2015-07-08 16:54:27 +03:00
Dmitry Babokin
2aedc1fd2b Merge pull request #1062 from Vsevolod-Livinskij/fix_for_asm
Fix for assembler file generation.
2015-07-08 15:36:34 +03:00
Vsevolod Livinskiy
b805609986 Fix for assembler file generation. 2015-07-08 14:56:18 +03:00
Anton Mitrokhin
c50ce30b00 [AVX-512]: fixed a couple of tests 2015-07-08 10:20:57 +03:00
Dmitry Babokin
3ec674a820 Merge pull request #1061 from ncos/knl-rename
Fixed default compiler value from ' ' to None
2015-06-24 11:20:10 +03:00
Anton Mitrokhin
f00b5abc40 fixed default compiler value from '' to None 2015-06-24 10:46:03 +03:00
Dmitry Babokin
0885152713 Merge pull request #1060 from ncos/trunk-fixing
Fix trunk buildfail after LLVM revision 239858
2015-06-23 18:28:44 +03:00
Anton Mitrokhin
a2dd501c11 Fix trunk buildfail after LLVM revision 239858 2015-06-23 18:19:39 +03:00
Dmitry Babokin
8de5047035 Merge pull request #1059 from ncos/knl-rename
Fix uninitialized default compiler_exe
2015-06-23 17:47:10 +03:00
Anton Mitrokhin
b20d814a24 fix uninitialized default compiler_exe 2015-06-23 17:35:54 +03:00
Dmitry Babokin
bf4ca847a5 Merge pull request #1057 from ncos/ispc-versions
New LLVM version macro
2015-06-17 17:50:36 +03:00
Anton Mitrokhin
ebc47d00a1 remove several redundant #if clauses 2015-06-17 17:40:02 +03:00
Dmitry Babokin
308ccae0c5 Merge pull request #1058 from ncos/knl-rename
Allow x86 testing for native knl target; rename knc target to 'knc-generic' in alloy.py
2015-06-17 16:09:24 +03:00
Anton Mitrokhin
c5207da9e7 allow x86 testing for native knl target; rename knc target to 'knc-generic' 2015-06-17 16:01:37 +03:00
Anton Mitrokhin
0afa3f5713 New LLVM version macro 2015-06-17 10:14:28 +03:00
Dmitry Babokin
b394f05835 Merge pull request #1056 from aguskov/master
Fixed MAXINT array size limitation. Fix for #1055.
2015-06-16 20:30:45 +03:00
Andrey Guskov
73dc899ff1 Fixed MAXINT array size limitation 2015-06-06 01:16:56 +03:00
Dmitry Babokin
0b62c28436 Merge pull request #1054 from aguskov/master
Fixed compiling for Win32
2015-06-03 19:30:43 +03:00
Andrey Guskov
742870285a Fixed compiling for Win32 2015-06-03 15:53:12 +03:00
Dmitry Babokin
6ae27009c1 Merge pull request #1053 from dbabokin/v183dev
Bumping version 1.8.3dev
2015-05-30 01:05:54 +03:00
Dmitry Babokin
33771c2163 Bumping version 1.8.3dev 2015-05-30 01:05:15 +03:00
Dmitry Babokin
603d3a879c Merge pull request #1052 from dbabokin/v182
V182
2015-05-29 21:35:03 +03:00
Dmitry Babokin
cbd8afabb7 Merge pull request #1051 from dbabokin/fail_db
Fail db: KNL fails.
2015-05-29 21:34:42 +03:00
Dmitry Babokin
284d59904e Linux knl fails update 2015-05-29 21:33:13 +03:00
Dmitry Babokin
50345012c9 News update 2015-05-29 21:31:04 +03:00
Dmitry Babokin
bf0055f045 Bumping version to v1.8.2 2015-05-29 21:15:24 +03:00
Dmitry Babokin
dd81f13238 Win knl fails update 2015-05-29 21:10:13 +03:00
Dmitry Babokin
c30f08b474 Mac knl fails update 2015-05-29 20:56:33 +03:00
Dmitry Babokin
478d7b1b06 Release notes, docs update 2015-05-29 19:59:49 +03:00
Dmitry Babokin
697082e84e Merge pull request #1050 from dbabokin/dll
Moving --dllexport to proper help section under Windows ifdef
2015-05-29 18:15:40 +03:00
Dmitry Babokin
0ed674296f Moving --dllexport to proper help section under Windows ifdef 2015-05-29 18:14:38 +03:00
Dmitry Babokin
4d96244b29 Merge pull request #1049 from ncos/knl-rename
ISPC target rename plus insignificant changes in test system
2015-05-28 18:42:21 +03:00
Anton Mitrokhin
33d8daf933 fix check_isa.cpp's flag for knl 2015-05-28 10:54:59 +03:00
Anton Mitrokhin
f1d72a7e2c target order/fail_db/sde in run_tests 2015-05-28 07:17:27 +03:00
Anton Mitrokhin
a12f3ec84b make icpc not required for native knl target 2015-05-28 06:37:04 +03:00
Anton Mitrokhin
5a5cb82043 changed target name for native knl (knl-avx512 -> avx512knl-i32x16) 2015-05-28 06:19:45 +03:00
Dmitry Babokin
289c3804a3 Merge pull request #1048 from dbabokin/alloy
Fix sde handling, when it's in path, but not in SDE_HOME, move to LLVM 3.6.1
2015-05-27 13:11:50 +03:00
Dmitry Babokin
e531864424 Moving LLVM 3.6.0 version to 3.6.1 2015-05-27 13:10:11 +03:00
Dmitry Babokin
5a2c937025 Fix sde handling, when it's in path, but not in SDE_HOME 2015-05-26 17:26:51 +03:00
Dmitry Babokin
9bc04d26fc Merge pull request #1047 from dbabokin/knl-win
Knl win build fix
2015-05-26 15:38:43 +03:00
Dmitry Babokin
31612a7e17 Merge pull request #1046 from dbabokin/out_file_fix
Fix for the bug that crashed ispc during object file write.
2015-05-26 15:38:26 +03:00
Dmitry Babokin
1a1bed04f2 fail_db.txt update for Windows 2015-05-26 15:36:00 +03:00
Dmitry Babokin
c986ffad4f Adding knl target to build. 2015-05-26 15:34:49 +03:00
Dmitry Babokin
891c1cf1f4 Fix for the bug that crashed ispc during file write (showed up on
MacOS and Windows with LLVM trunk).
2015-05-26 15:24:18 +03:00
Dmitry Babokin
c251194b15 Merge pull request #1039 from aguskov/master
Fix for use-after-free bug during dispatch module generation (shows up with 3.7).
2015-05-21 23:30:35 +03:00
Andrey Guskov
50d84e0884 Re-added name/type checking for globals 2015-05-21 19:59:48 +03:00
Dmitry Babokin
65507ea66e Merge pull request #1045 from ncos/alloy_modifications
Add 'knl-avx512' flag to 'alloy.py' and 'run_tests.py'
2015-05-21 18:13:16 +03:00
Anton Mitrokhin
b6a7ae6dca refresh fail_db.txt for knl-avx512 fails 2015-05-21 18:06:22 +03:00
Anton Mitrokhin
7f281a4642 add knl-generic and knl-avx512 targets in alloy.py 2015-05-21 17:16:14 +03:00
Dmitry Babokin
94a4793a71 Merge pull request #1044 from Vsevolod-Livinskij/sse4-fix
round2to16 was added
2015-05-21 16:59:44 +03:00
Vsevolod Livinskiy
28b49837fc round2to16 was added 2015-05-21 16:53:18 +03:00
Dmitry Babokin
149052b195 Merge pull request #1043 from Vsevolod-Livinskij/multitarget
Prohibit short vectors in knl-generic multitarget
2015-05-21 16:18:51 +03:00
Vsevolod Livinskiy
d6380ca8a1 Prohibit short vectors in knl-generic multitarget 2015-05-21 16:11:45 +03:00
Dmitry Babokin
a70dcb13d1 Merge pull request #1042 from ncos/sg_native_knl
'Native' KNL support (AVX512 target)
2015-05-21 15:44:06 +03:00
Anton Mitrokhin
5ec16356d0 [AVX512]: copyright update 2015-05-21 15:29:04 +03:00
Vsevolod Livinskiy
f5e7165537 [AVX512]: packed_load/store 2015-05-21 15:27:27 +03:00
Dmitry Babokin
838c38c91a Merge pull request #1041 from ncos/sg_trunk_fixing
Several fixes for trunk build
2015-05-21 15:05:01 +03:00
Vsevolod Livinskiy
db29cbe851 [AVX512]: knl arch for clang 2015-05-21 14:52:03 +03:00
Vsevolod Livinskiy
d7cd5986db [AVX512]: disable prefetch 2015-05-21 14:51:51 +03:00
Vsevolod Livinskiy
3514e03327 [AVX512]: disable Transcendentals and Trigonometry 2015-05-21 14:51:50 +03:00
Anton Mitrokhin
ef9c98fba8 [AVX512]: uniform float/double round/ceil/floor 2015-05-21 14:51:50 +03:00
Vsevolod Livinskiy
2110708c8e [AVX512]: sqrt/rsqrt/rcp 2015-05-21 14:51:50 +03:00
Vsevolod Livinskiy
82f5716362 [AVX512]: max/min functions 2015-05-21 14:51:50 +03:00
Anton Mitrokhin
a6b7e717f5 [AVX512]: gathers/scatters 2015-05-21 14:51:50 +03:00
Anton Mitrokhin
66b94fc37c [AVX512]: add default -sde- wrapexe to runtests.py for knl-avx512 target; float/double varying rounding 2015-05-21 14:51:50 +03:00
Anton Mitrokhin
f2743a6dc5 [AVX512]: masked_load_i8/16/32/64 2015-05-21 14:51:50 +03:00
Anton Mitrokhin
28fda1a013 [AVX512]: movmsk/any/all/none 2015-05-21 14:51:43 +03:00
Vsevolod Livinskiy
7c9d9f6ee6 [AVX512]: reduce operations were added 2015-05-21 14:51:32 +03:00
Anton Mitrokhin
2549fa12c9 [AVX512]: masked load-store (not all loads) 2015-05-21 14:51:26 +03:00
Vsevolod Livinskiy
bea7cc9a81 [AVX512]: half/float conversions 2015-05-21 14:51:21 +03:00
Vsevolod Livinskiy
9a03cd3590 [AVX512]: definitions through util.m4 were added 2015-05-21 14:51:16 +03:00
Anton Mitrokhin
46528caa5a [AVX512]: add avx-based ll file 2015-05-21 14:51:08 +03:00
Anton Mitrokhin
7628f2a6c9 [AVX512]: try generic-16 like builtins 2015-05-21 14:51:03 +03:00
Vsevolod Livinskiy
d01718aa91 [AVX512]: avx512 common file was added 2015-05-21 14:50:56 +03:00
Anton Mitrokhin
3eccce5e4f [AVX512]: new .ll file for knl target 2015-05-21 14:50:51 +03:00
Anton Mitrokhin
92650bdff0 [AVX512]: add knl cpu to the knl backwards-compatibility list 2015-05-21 14:50:41 +03:00
Anton Mitrokhin
e0eac74f83 [AVX512]: separated knl from avx2 2015-05-21 14:50:29 +03:00
Vsevolod Livinskiy
35222694e5 [AVX512]: knl target was added 2015-05-21 14:49:43 +03:00
Anton Mitrokhin
92388e8bad ISPC build fix for LLVM trunk revision 237059 2015-05-21 14:42:53 +03:00
Vsevolod Livinskiy
c9424a9989 Fix for debug information in trunk 2015-05-21 14:42:53 +03:00
Dmitry Babokin
f5c90dbd43 Merge pull request #1040 from Vsevolod-Livinskij/print-target
Debug flag for target's info was added
2015-05-21 12:13:54 +03:00
Vsevolod Livinskiy
bd65df8ad4 Debug flag for target's info was added 2015-05-21 11:33:05 +03:00
Andrey Guskov
e39bb485ea Fixed global variable handling 2015-05-19 22:16:47 +03:00
Dmitry Babokin
ebe165dff8 Merge pull request #1038 from Vsevolod-Livinskij/round_var_double
Add roundings for varying double in knc.h and knl.h
2015-05-14 18:19:15 +03:00
Vsevolod Livinskiy
12ff721ec1 Add roundings for varying double in knc.h and knl.h 2015-05-14 16:26:26 +03:00
Dmitry Babokin
8fdd9a7ef7 Merge pull request #1036 from ncos/multitarget
Forbid using --cpu in multitarget mode
2015-05-10 23:54:23 +03:00
Anton Mitrokhin
8891534a3c Forbid using --cpu in multitarget mode 2015-05-10 17:27:25 +03:00
Dmitry Babokin
be2adf40d8 Merge pull request #1035 from Vsevolod-Livinskij/multitarget
Replace static const variables in knl.h
2015-05-08 17:25:05 +03:00
Vsevolod Livinskiy
57512c1e28 Replace static const variables in knl.h 2015-05-08 17:22:12 +03:00
Dmitry Babokin
f3b8ce8cbc Merge pull request #1034 from dbabokin/static
Generic target: declare in anonymous namespace only l_array_*
2015-05-08 16:08:25 +03:00
Dmitry Babokin
14da0a7f54 Merge pull request #1033 from kku1993/fix-null-terminator
Fixed string argument missing null-terminator.
2015-05-08 16:05:14 +03:00
Kevin Ku
1c039ab96a Fixed string argument missing null-terminator. 2015-05-07 23:00:14 -04:00
Dmitry Babokin
256e8325e3 Generic target: declare in anonymous namespace only l_array_* 2015-05-08 01:10:46 +03:00
Dmitry Babokin
c76f03a263 Merge pull request #1032 from ncos/namespaces
Put struct/class definitions into anonymous namespaces (MIC, generic).
The fix is required for generic compilation with multiple ispc files.
2015-05-06 18:02:26 +03:00
Anton Mitrokhin
e0e579bd99 put struct/class definitions into anonymous namespaces (MIC, generic) 2015-05-06 13:59:23 +03:00
Dmitry Babokin
e513fc7884 Merge pull request #1031 from jbrodman/master
Ptr Diff should ignore const
2015-04-30 13:38:59 +03:00
jbrodman
c8e492d82f Ptr Diff should ignore const 2015-04-29 18:30:21 -04:00
Dmitry Babokin
aeb97374bf Merge pull request #1030 from aguskov/master
Removed unused variable causing fatal warning on GCC build
2015-04-29 19:41:10 +03:00
Andrey Guskov
2a2bfc49f1 Removed unused variable causing fatal warning on GCC build 2015-04-29 19:24:45 +03:00
Dmitry Babokin
958143a8c5 Merge pull request #1029 from aguskov/master
Typo fix in CXX output
2015-04-29 15:33:13 +03:00
Andrey Guskov
66ad192266 Typo fix in CXX output 2015-04-29 15:25:31 +03:00
Dmitry Babokin
728c55fd6f Merge pull request #1028 from aguskov/master
Fixed llvm.dbg.value mishandling in CWriter
2015-04-28 20:59:02 +03:00
Andrey Guskov
43f432c67a Fixed llvm.dbg.value mishandling in CWriter 2015-04-28 20:55:33 +03:00
Dmitry Babokin
1328efd027 Merge pull request #1027 from dbabokin/osxsave
Adding check for OSXSAVE before checking xgetbv in cpu detection code.
2015-04-24 22:04:39 +03:00
Dmitry Babokin
ad97d70a43 Adding check for OSXSAVE before checking xgetbv in cpu detection code.
Fix for #1026
2015-04-24 21:28:06 +03:00
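The ordering matters because xgetbv is only safe to execute once CPUID reports OSXSAVE; probing xgetbv first can fault on CPUs or OSes without XSAVE support. A sketch of the detection sequence for GCC/Clang on x86, illustrative rather than ispc's exact code:
```cpp
#include <cstdint>

static bool osSavesAvxState() {
    uint32_t eax, ebx, ecx, edx;
    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(1), "c"(0));
    if (!((ecx >> 27) & 1))   // OSXSAVE not set: xgetbv may fault, bail first
        return false;
    uint32_t lo, hi;
    __asm__ __volatile__("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return (lo & 0x6) == 0x6; // XMM and YMM state enabled by the OS
}
```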
Dmitry Babokin
44cf9866cb Merge pull request #1025 from aguskov/master
Fix typos in the previous LLVM debug info commit
2015-04-24 18:17:12 +03:00
Dmitry Babokin
0a8fab0737 Merge pull request #1024 from Vsevolod-Livinskij/multitarget
Support of *-generic-16 target
2015-04-24 18:16:45 +03:00
Vsevolod Livinskiy
c2039da7b8 Support of *-generic-16 target 2015-04-24 15:56:06 +03:00
Andrey Guskov
0842b82eca Fix typos in the previous LLVM debug info commit 2015-04-24 15:51:55 +03:00
Dmitry Babokin
03d3f7179b Merge pull request #1023 from aguskov/master
LLVM debug info fix, again
2015-04-23 22:50:12 +03:00
Andrey Guskov
5defbf25f1 LLVM debug info fix, again 2015-04-23 19:00:54 +03:00
Dmitry Babokin
a0cbd7e33a Merge pull request #1022 from aguskov/master
Another version of multitarget return types fix
2015-04-22 18:14:39 +03:00
Andrey Guskov
0fd170f47a Another version of multitarget return types fix 2015-04-22 18:09:31 +03:00
Dmitry Babokin
0b52022c13 Merge pull request #1021 from ispc/revert-1019-master
Revert "Multitaget function return types fixed"
2015-04-22 17:39:39 +03:00
Dmitry Babokin
0f1358b69f Revert "Multitaget function return types fixed" 2015-04-22 17:38:05 +03:00
Dmitry Babokin
8040154b2f Merge pull request #1020 from dbabokin/copyright
Copyright refresh
2015-04-22 16:40:23 +03:00
Dmitry Babokin
8e47273186 Copyright refresh 2015-04-22 16:39:11 +03:00
Dmitry Babokin
4518bce71b Merge pull request #1013 from egaburov/armfix
Added fix for ispc on ARM
2015-04-22 16:09:50 +03:00
Dmitry Babokin
8c1d7c6dc5 Merge pull request #1018 from Vsevolod-Livinskij/knl_fix
rcp23->rcp28 fix for knl.h
2015-04-22 16:07:05 +03:00
Dmitry Babokin
2983722926 Merge pull request #1019 from aguskov/master
Fix for an error when a type involving a (uniform) structure type is returned from an export function and the compilation is done for multi-target with targets of different widths.
2015-04-22 15:56:24 +03:00
Andrey Guskov
7f096f23e3 Multitarget function return types fixed 2015-04-22 15:13:56 +03:00
Vsevolod Livinskiy
858a97a485 New accuracy for knl.h 2015-04-20 22:47:00 +03:00
Dmitry Babokin
1d40f0a930 Merge pull request #1014 from Vsevolod-Livinskij/isa_update
Check for new isa for KNL and SKX was added.
2015-04-17 12:24:25 +03:00
Vsevolod Livinskiy
7729070481 Check for new isa for KNL and SKX was added. 2015-04-17 12:00:36 +03:00
Dmitry Babokin
e6ba7eb337 Merge pull request #1016 from aguskov/master
Recurring LLVM debug info fix
2015-04-16 19:38:41 +03:00
Andrey Guskov
189e892b96 Recurring LLVM debug info fix 2015-04-16 19:31:16 +03:00
Dmitry Babokin
f62ad11fb0 Merge pull request #1012 from ncos/x86-64_2_x86_64
Add x86_64 to be alias of x86-64
2015-04-16 15:19:30 +03:00
Anton Mitrokhin
e754cb99c7 allow x86_64 in alloy.py (run_tests has no problems with it) 2015-04-16 14:56:40 +03:00
Evghenii Gaburov
7e06973054 added rtc() to arm examples as well 2015-04-16 11:54:06 +00:00
Evghenii Gaburov
38375b49e3 added unsaturated integer stubs for arm IR 2015-04-16 11:48:26 +00:00
Anton Mitrokhin
b5293067fc add x86_64 to be alias of x86-64 2015-04-16 14:42:34 +03:00
Dmitry Babokin
d1d59a2a01 Merge pull request #1011 from ncos/multitarget
Fix assertion fail when specifying an illegal/unsupported target to ISPC
2015-04-16 13:51:33 +03:00
Dmitry Babokin
1a9d08a325 Merge pull request #1010 from Vsevolod-Livinskij/fix_for_nvptx
Typo fix in util-nvptx.m4
2015-04-16 13:37:40 +03:00
Anton Mitrokhin
1948d8ea42 If you want to have 'CPU_None' as CPU type, then you have to have a name for it (even if that's an empty string) 2015-04-16 13:31:14 +03:00
Vsevolod Livinskiy
41750a8336 Reorder available targets 2015-04-16 13:11:44 +03:00
Vsevolod Livinskiy
8336615eae Typo fix in util-nvptx.m4 2015-04-16 13:02:24 +03:00
Dmitry Babokin
af18f7cc97 Merge pull request #1009 from aguskov/master
Fix for LLVM trunk (debug info changes)
2015-04-16 11:55:00 +03:00
Andrey Guskov
824e47ece7 Fix for LLVM trunk 2015-04-15 16:31:23 +03:00
Dmitry Babokin
f73091f391 Merge pull request #1007 from Vsevolod-Livinskij/fix_for_trunk
Fix for trunk
2015-04-10 15:25:17 +03:00
Vsevolod Livinskiy
ac4d0bf7ed Fix for debug information in trunk. 2015-04-10 15:15:16 +03:00
Vsevolod Livinskiy
391b59930b Fix for r234201. IT SHOULD BE REVERTED IN FUTURE 2015-04-10 15:15:16 +03:00
Dmitry Babokin
15535cfb64 Merge pull request #1006 from dbabokin/tinfo
Typo fix in Makefile
2015-04-09 14:02:10 +03:00
Dmitry Babokin
82d425da04 Typo fix in Makefile 2015-04-09 14:02:49 +03:00
Dmitry Babokin
d90011a535 Merge pull request #1004 from nguillemot/patch-1
Fixed some typos
2015-04-06 14:29:02 +03:00
Nicolas Guillemot
b4af33327f Fixed some typos 2015-04-03 22:25:17 -07:00
Dmitry Babokin
8cc9d08f46 Merge pull request #1003 from ncos/knl-support
KNL support via generic target.
2015-04-03 15:47:44 +03:00
Anton Mitrokhin
74c1feed08 fixed ptr-22.ispc on knl 2015-04-03 15:40:38 +03:00
Dmitry Babokin
5535bfc83f Merge pull request #1002 from dbabokin/tinfo
Addressing build issue coming from LLVM bug 16902
2015-04-02 19:24:33 +03:00
Dmitry Babokin
3e8f33da68 Addressing build issue coming from LLVM bug 16902 2015-04-02 19:19:57 +03:00
Anton Mitrokhin
a3737e2b81 i64 store fix for knl 2015-04-02 18:28:14 +03:00
Anton Mitrokhin
81251115af packed load/store for knl 2015-04-02 17:24:57 +03:00
Anton Mitrokhin
c22ee2b0a0 some minor fixes in knl.h, updated fail_db.txt 2015-04-02 16:13:41 +03:00
Anton Mitrokhin
0292730cea 64-bit min-max functions for knl.h 2015-04-02 14:56:20 +03:00
Anton Mitrokhin
29c73f242c cast s(z)ext 64-bit function fix for knl target 2015-04-02 14:40:24 +03:00
Anton Mitrokhin
13084ece5f add knl fails on 3.4 for -O2 (except foreach-unique-6.ispc, which ends up in an infinite loop) 2015-04-02 13:00:42 +03:00
Anton Mitrokhin
704c83bcd6 Merge branch 'knl-support' of https://github.com/ncos/ispc into knl-support 2015-04-02 12:42:29 +03:00
Anton Mitrokhin
bf4376be1b remove knl fails in fail_db to change 3.6 to 3.4 later 2015-04-02 12:42:17 +03:00
Anton Mitrokhin
2534cd1439 knl/knc supports only x86-64, so adding that in alloy.py 2015-04-02 11:53:04 +03:00
Anton Mitrokhin
39abb6f621 Merge branch 'master' of https://github.com/ncos/ispc into knl-support 2015-04-02 09:07:55 +03:00
Dmitry Babokin
93a7c14e95 Merge pull request #1001 from aguskov/master
Fixed alloy.py exiting on Windows
2015-03-27 18:35:08 +03:00
Andrey Guskov
25988947e4 Fixed alloy.py exiting on Windows 2015-03-27 18:26:50 +03:00
Dmitry Babokin
108b7dced5 Merge pull request #1000 from aguskov/master
Fixed ISPC debug info generator, added debug tests
2015-03-27 18:18:17 +03:00
Anton Mitrokhin
fc1e12fb2f fixed knc target 2015-03-27 17:34:48 +03:00
Anton Mitrokhin
53f5ec09de Merge branch 'knl-support' of https://github.com/Vsevolod-Livinskij/ispc into knl-support 2015-03-27 15:22:35 +03:00
Anton Mitrokhin
aa94969472 rotate and shift for i32 not supporting negative index 2015-03-27 15:22:25 +03:00
Andrey Guskov
49a0a08805 Changed alloy default to 'nodebug' from 'nodebug+debug' 2015-03-27 15:20:09 +03:00
Vsevolod Livinskiy
e780372884 cast_trunk from i64 2015-03-27 15:15:45 +03:00
Andrey Guskov
dd567654e7 Fixed ISPC debug info generator, added debug tests 2015-03-27 14:52:18 +03:00
Anton Mitrokhin
4d371c0f21 update fail_db to track fails 2015-03-27 11:59:54 +03:00
Anton Mitrokhin
03211d5543 cast_uitofp typo fix; helper printf_v for vector types add 2015-03-27 09:58:59 +03:00
Anton Mitrokhin
dac1ba44e4 Merge branch 'knl-support' of https://github.com/Vsevolod-Livinskij/ispc into knl-support 2015-03-26 17:46:30 +03:00
Anton Mitrokhin
d33100a11c Since there is no _MM_HINT_T2 for prefetch on KNL, changing that to T1 2015-03-26 17:45:20 +03:00
Anton Mitrokhin
17e41970d3 Immediate parameter for '_mm512_alignr_epi32/64 intrinsic' 2015-03-26 17:37:05 +03:00
Vsevolod Livinskiy
89ded54929 DOWNCONV 2015-03-26 16:34:15 +03:00
Vsevolod Livinskiy
1d9d989d8d round_ps 2015-03-26 16:18:04 +03:00
Anton Mitrokhin
9526094272 purge _MM_DOWNCONV... enum from knl.h 2015-03-26 16:12:10 +03:00
Anton Mitrokhin
81cb374084 loop execution for i8/16 32-addr-bit gathers/scatters 2015-03-26 14:53:19 +03:00
Anton Mitrokhin
a5b2695771 Merge branch 'knl-support' of https://github.com/Vsevolod-Livinskij/ispc into knl-support 2015-03-26 14:42:30 +03:00
Anton Mitrokhin
b16aee93f9 gather/scatter i8/16 loop implementations 2015-03-26 14:41:30 +03:00
Vsevolod Livinskiy
cd6f8249bf cast_fptosi/ui for double to i32 2015-03-26 14:30:57 +03:00
Vsevolod Livinskiy
af9d6a2d05 Merge branch 'knl-support' of https://github.com/ncos/ispc into ncos-knl 2015-03-26 12:46:15 +03:00
Anton Mitrokhin
4c96fd3279 cast_fptou(s)i compfail fix 2015-03-26 12:38:53 +03:00
Anton Mitrokhin
0a166f245c merged in brodman's version of knl.h 2015-03-26 12:22:03 +03:00
Anton Mitrokhin
b79bd65c4e Merge branch 'master' of https://github.com/ncos/ispc into knl-support 2015-03-26 09:42:24 +03:00
Vsevolod Livinskiy
5050874007 Merge branch 'knl-support' of https://github.com/ncos/ispc into ncos-knl 2015-03-26 08:24:19 +03:00
Anton Mitrokhin
5c523332af extpackstorehi/lo_epi32 (without i8 and i16) 2015-03-25 20:09:45 +03:00
Anton Mitrokhin
de0d69ab26 extpackstorehi/lo_pd 2015-03-25 19:54:19 +03:00
Anton Mitrokhin
aefcea95cc shift mask left, like (mask << 8); may cause errors 2015-03-25 19:48:20 +03:00
Dmitry Babokin
ef826603a7 Merge pull request #999 from ncos/minor-ispc-fixes
fix for mishandled __ISPC_NO_EXTERN_C var being defined as zero
2015-03-25 19:20:08 +03:00
Anton Mitrokhin
1aa8309a9e store_ps fixed 2015-03-25 19:12:28 +03:00
Anton Mitrokhin
fc8a32425c cast_fptosi/ui 2015-03-25 18:58:36 +03:00
Anton Mitrokhin
49b9297166 cast s(z)ext -> add avx and sse; also started rewriting ext..load 2015-03-25 18:46:47 +03:00
Dmitry Babokin
30b45731ae Merge pull request #998 from aguskov/master
Fix for LLVM trunk (rL232885)
2015-03-25 18:05:45 +03:00
Anton Mitrokhin
50f716f3d8 fix for mishandled __ISPC_NO_EXTERN_C var being defined as zero 2015-03-25 16:57:30 +03:00
Andrey Guskov
a66fab4cea Fix for LLVM trunk (rL232885) 2015-03-25 16:11:50 +03:00
Dmitry Babokin
603355f6d6 Merge pull request #997 from jbrodman/master
Fix to support function pointers in header file generation.
2015-03-21 13:07:37 +03:00
jbrodman
637a4cc812 Fix to support function pointers in header file generation. 2015-03-20 14:49:03 -07:00
Dmitry Babokin
663306b8f6 Merge pull request #996 from Vsevolod-Livinskij/fix_for_trunk
Fix for trunk revision #232397
2015-03-20 16:10:55 +03:00
Anton Mitrokhin
0ffea6832d add knl.h to test knl target (needs to be rewritten with AVX-512F intrinsics) 2015-03-20 15:56:57 +03:00
Anton Mitrokhin
8cb5f311c6 knl (through sde) target support for alloy.py 2015-03-20 15:55:00 +03:00
Anton Mitrokhin
292962e8b0 modified run_tests.py to support knl compilation 2015-03-20 15:19:26 +03:00
Vsevolod Livinskiy
55cfc6f78e Fix for trunk revision #232397 2015-03-20 08:55:49 +03:00
Dmitry Babokin
eb907dc4ba Merge pull request #994 from ncos/faildb-update
Removed 180 generic-target fails (which do pass now) from 'fail_db.txt'
2015-03-19 20:48:46 +03:00
Anton Mitrokhin
bf50d8f951 removed 180 generic target fails (which do pass now) from 'fail_db.txt' 2015-03-19 20:30:54 +03:00
Dmitry Babokin
7125026f42 Merge pull request #991 from aguskov/element_ptr_inst
Added explicit types for llvm::GetElementPtrInst::Create()
2015-03-18 20:24:55 +03:00
Andrey Guskov
97eff6a185 Added explicit types for llvm::GetElementPtrInst::Create() 2015-03-18 19:01:24 +03:00
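A sketch of the API change being tracked, assuming ispc-style version guards (the macro here is hypothetical): newer LLVM requires the GEP's pointee type to be spelled out explicitly.

    #include "llvm/IR/Instructions.h"

    static llvm::Value *lMakeGEP(llvm::Type *elemTy, llvm::Value *basePtr,
                                 llvm::Value *index, llvm::Instruction *before) {
    #if defined(LLVM_3_7_PLUS)   // hypothetical version guard
        // newer LLVM: the pointee type is passed explicitly
        return llvm::GetElementPtrInst::Create(elemTy, basePtr, index, "gep", before);
    #else
        // older LLVM deduced the type from the pointer operand
        return llvm::GetElementPtrInst::Create(basePtr, index, "gep", before);
    #endif
    }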
Dmitry Babokin
fec8390f7e Merge pull request #992 from aguskov/master
Switched LLVM 3.6 from branches to tags
2015-03-18 18:43:53 +03:00
Andrey Guskov
30def668f0 Switched LLVM 3.6 from branches to tags 2015-03-18 18:36:49 +03:00
Dmitry Babokin
9e1ec40bad Merge pull request #990 from jbrodman/helpmsg
Change opt help message to be more clear.
2015-03-16 22:58:57 +03:00
jbrodman
3d35d4485d Change opt help message to be more clear. 2015-03-16 12:11:46 -07:00
Dmitry Babokin
6d43d3c180 Merge pull request #989 from aguskov/master
Lowering the severity level for Debug Info Version mismatch
2015-03-12 17:20:21 +03:00
Dmitry Babokin
e6c827bf11 Merge pull request #986 from ncos/faildb-update
Removed fail_db.txt entries for fixed tests on 'generic' and 'knc' targets
2015-03-12 17:19:57 +03:00
Andrey Guskov
75ec582a1d Lowering the severity level for Debug Info Version mismatch 2015-03-12 14:49:32 +03:00
Dmitry Babokin
c5916d58b8 Merge pull request #988 from aguskov/master
Substituted LLVM-specific IsNAN() and IsInf() with their math.h versions
2015-03-11 18:36:19 +03:00
Andrey Guskov
dba22837f6 Substituted LLVM-specific IsNAN() and IsInf() with their math.h versions 2015-03-11 18:33:49 +03:00
Dmitry Babokin
b43678b2e3 Merge pull request #987 from aguskov/master
Windows build fix
2015-03-11 18:20:13 +03:00
Andrey Guskov
972c6906d5 Fix Alloy.py + MSVC 2015-03-11 17:34:12 +03:00
Andrey Guskov
522d80bd2c Windows build fix 2015-03-11 15:58:36 +03:00
Anton Mitrokhin
38fb4cc181 removed recently fixed fails on knc from fail_db.txt 2015-03-07 08:18:20 +03:00
Dmitry Babokin
dfcdd84dc6 Merge pull request #975 from jbrodman/windll
Change dll export feature to a switch.
2015-03-06 23:56:45 +03:00
Anton Mitrokhin
97e9977aef removed fixed 'generic' target fails from fail_db.txt (appeared after knc fixing) 2015-03-06 23:54:06 +03:00
jbrodman
a54c0db457 Added missing newline. 2015-03-06 10:36:07 -08:00
Dmitry Babokin
e8ae16a7f3 Merge pull request #979 from Vsevolod-Livinskij/knc_header_long_int
Knc header long int
2015-03-06 19:10:35 +03:00
Dmitry Babokin
64dfa6182b Merge pull request #985 from ncos/soa-18-test
Fixed soa-18 test on KNC
2015-03-06 18:41:39 +03:00
Anton Mitrokhin
bead780f22 fixed soa-18 test 2015-03-06 12:32:58 +03:00
Dmitry Babokin
4b5141c0fb Merge pull request #984 from ncos/knc-backend-merge
Added '__ordered_float_and_mask' function to 'knc.h'
2015-03-06 10:20:26 +03:00
Anton Mitrokhin
6621efe139 added '__ordered_float_and_mask' function to 'knc.h' 2015-03-06 09:36:48 +03:00
Dmitry Babokin
ae0c6dc62a Merge pull request #980 from Vsevolod-Livinskij/fix_for_trunk
Fix for trunk
2015-03-05 18:32:11 +03:00
Vsevolod Livinskiy
f92d351cf0 Some codestyle changes 2015-03-05 18:04:39 +03:00
Vsevolod Livinskiy
f0aa481a2a DataLayoutPass was removed 2015-03-05 17:09:39 +03:00
Vsevolod Livinskiy
a216b2bb9c New LLVM IR load instruction 2015-03-05 16:00:30 +03:00
Vsevolod Livinskiy
29859e81ba New LLVM IR for getelementptr instruction 2015-03-05 16:00:30 +03:00
Dmitry Babokin
0b1323e070 Merge pull request #983 from ncos/cbackend-funcptr-segfault
Fixed funcptr-varying-6/7/8 tests on knc
2015-03-05 13:34:18 +03:00
Anton Mitrokhin
3a18a28001 fixed funcptr-varying-6/7/8 tests on knc 2015-03-05 13:19:10 +03:00
Dmitry Babokin
c5e11468f3 Merge pull request #981 from jbrodman/maskoffunif
Add assignments to Uniform as an unsafe thing when checking to run with ...
2015-03-03 21:47:13 +03:00
James Brodman
ab5bf7e48d Add assignments to Uniform as an unsafe thing when checking whether code can run with the mask all off 2015-03-02 10:55:12 -08:00
Vsevolod Livinskiy
19d18b6e4e Some changes were made to support older llvm versions. 2015-02-27 16:02:55 +03:00
Vsevolod Livinskiy
4c629d0a7c Small codestyle changes 2015-02-27 12:35:31 +03:00
Vsevolod Livinskiy
3718abc3d2 iN class was moved to cbackend 2015-02-27 12:30:04 +03:00
Vsevolod Livinskiy
a98bfdf011 Some changes to handle different vector widths 2015-02-27 10:34:45 +03:00
Vsevolod Livinskiy
0644b4a7fd iN class was changed 2015-02-27 10:33:26 +03:00
Vsevolod Livinskiy
a3bf0b2406 Constructor from string and operator& 2015-02-27 10:33:26 +03:00
Vsevolod Livinskiy
5302cfe062 New align structure 2015-02-27 10:33:26 +03:00
Vsevolod Livinskiy
8090285d42 Fix for cast 2015-02-27 10:33:25 +03:00
Vsevolod Livinskiy
1abce94803 cast inst 2015-02-27 10:33:25 +03:00
Vsevolod Livinskiy
e3a78ad150 Tmp commit to save progress 2015-02-27 10:31:33 +03:00
Dmitry Babokin
261dd70b6f Merge pull request #977 from ncos/knc-backend-merge
Update knc.h to work with icpc v13
2015-02-26 19:40:27 +03:00
Anton Mitrokhin
6fa75fb4b1 update knc.h to work with icpc v13 2015-02-26 19:23:23 +03:00
Dmitry Babokin
432570c98f Merge pull request #976 from Vsevolod-Livinskij/new_unordered_func
New unordered function
2015-02-26 09:35:57 +03:00
Vsevolod Livinskiy
78bd4debc6 New unordered function 2015-02-26 09:02:40 +03:00
Dmitry Babokin
f14897101b Merge pull request #974 from egaburov/sm37
added support for multiple GPU targets: tested with sm_35 (K20/K40) and sm_37 (K80)
2015-02-24 16:06:13 +03:00
Evghenii Gaburov
7bdb2b967d added NVARCH options 2015-02-24 13:54:33 +01:00
Evghenii Gaburov
f481a51f39 remove std::vector in favor of std::initializer_list in GPUTargets.h 2015-02-24 12:54:06 +01:00
Evghenii Gaburov
b35b931e3b added const 2015-02-24 12:49:48 +01:00
Evghenii Gaburov
19105494e1 added GPUTargets.h dependency in Makefile 2015-02-24 12:49:35 +01:00
jbrodman
9baade2cb5 Change dll export feature to a switch. 2015-02-23 11:43:06 -08:00
Evghenii Gaburov
fc717ebada indentation fix in usage 2015-02-22 12:26:29 +01:00
Evghenii Gaburov
ddc9d4d885 Merge branch 'sm37' of github.com:egaburov/ispc into sm37 2015-02-22 12:18:52 +01:00
Evghenii Gaburov
795f592013 added support for multiple architectures. right now, support is tested only for sm_35 and sm_37 2015-02-22 12:18:38 +01:00
Evghenii Gaburov
3d086ca6b9 added support for K80/sm_37 2015-02-22 12:18:38 +01:00
Evghenii Gaburov
66f306f325 added support for multiple architectures. right now, support is tested only for sm_35 and sm_37 2015-02-22 12:17:37 +01:00
Evghenii Gaburov
bf3b15b744 added support for K80/sm_37 2015-02-21 14:28:47 +01:00
jbrodman
24fb2b483b Merge pull request #973 from Vsevolod-Livinskij/smear_opt
Smear opt
2015-02-20 13:34:44 -08:00
Dmitry Babokin
6c5ed87c59 Merge pull request #972 from aguskov/master
Fixed CPU name retrieval
2015-02-20 18:13:22 +03:00
Vsevolod Livinskiy
f17deafc0a A small bug was fixed 2015-02-20 17:46:06 +03:00
Andrey Guskov
803b0a2811 Fixed CPU name retrieval 2015-02-20 17:36:13 +03:00
Vsevolod Livinskiy
8c4d339f25 Some codestyle changes 2015-02-20 16:50:53 +03:00
Vsevolod Livinskiy
7b0eb0e4ad Smear optimization was changed 2015-02-20 16:36:50 +03:00
Dmitry Babokin
86ba817445 Merge pull request #971 from dbabokin/docs
Typo fix
2015-02-20 11:17:56 +03:00
Dmitry Babokin
3e511b588b Typo fix 2015-02-20 11:17:10 +03:00
Dmitry Babokin
c16f9880c5 Merge pull request #970 from ncos/knc-backend-merge
Knc 'fail_db.txt' update
2015-02-19 15:46:05 +03:00
Anton Mitrokhin
719279f71d removed obsolete icpc13 knc entries in fail_db.txt 2015-02-19 15:32:09 +03:00
Anton Mitrokhin
5489894c3e update fail_db.txt to hide compfails that now pass 2015-02-19 15:12:57 +03:00
Dmitry Babokin
127a5c50b8 Merge pull request #964 from ncos/cbackend-opbc
fix for broken i32/64 to vec1_i32/64 bitcast
2015-02-19 13:21:18 +03:00
Dmitry Babokin
ca866c6d54 Merge pull request #969 from ncos/ispc_build_fails
Fix buildfail after LLVM trunk commit 229094
2015-02-19 13:08:52 +03:00
Anton Mitrokhin
4dff88d4c5 fix buildfail after LLVM trunk commit 229094 2015-02-19 13:01:50 +03:00
Dmitry Babokin
372b4583c2 Merge pull request #968 from aguskov/struct_align
Added structure alignment in headers; extended the test system
2015-02-17 19:05:51 +03:00
Andrey Guskov
ef9315200c Added structure alignment in headers; extended the test system to support alignment tests 2015-02-17 17:58:34 +03:00
Dmitry Babokin
7477a95a59 Merge pull request #967 from ncos/ispc_build_fails
Repair ispc build fail after LLVM commit 228961
2015-02-17 17:22:06 +03:00
Anton Mitrokhin
bbec080bb9 new interface for 'setAsmVerbosityDefault' (appeared in commit 228961) in LLVM 3.7 2015-02-17 17:09:23 +03:00
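A hedged sketch of this interface change, assuming the verbosity flag moved from the old static TargetMachine setter into the MC target options (the version macro is hypothetical):

    #include "llvm/Target/TargetMachine.h"

    static void lSetVerboseAsm(llvm::TargetMachine *tm) {
    #if defined(LLVM_3_7_PLUS)   // hypothetical version guard
        tm->Options.MCOptions.AsmVerbose = true;            // per-machine option
    #else
        llvm::TargetMachine::setAsmVerbosityDefault(true);  // old static setter
    #endif
    }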
Dmitry Babokin
452c7ebdea Merge pull request #965 from Vsevolod-Livinskij/knc_header_fix
Code generator was changed to support llvm.umul.with.overflow
2015-02-13 20:24:35 +03:00
Dmitry Babokin
bc1dc73cfc Merge pull request #966 from dbabokin/docs
Typo fix
2015-02-13 20:09:30 +03:00
Dmitry Babokin
e60572f62c Typo fix 2015-02-13 20:08:52 +03:00
Anton Mitrokhin
33d47c2b53 update fail_db.txt 2015-02-13 16:15:07 +03:00
Anton Mitrokhin
0bb27d839a i32/64 to <1xi32/64> bitcast 2015-02-13 15:47:41 +03:00
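A sketch of what this bitcast can lower to in the generated C++ (the types and the dummy-first-argument overloading convention mirror the intrinsics headers, but the exact code is illustrative): the scalar and the one-element vector have identical bit layouts, so a byte copy suffices.

    #include <stdint.h>
    #include <string.h>

    struct __vec1_i32 { int32_t v[1]; };

    // Illustrative: the unused first parameter selects the return type,
    // in the style of the intrinsics headers' cast_bits overloads.
    static inline __vec1_i32 __cast_bits(__vec1_i32, int32_t in) {
        __vec1_i32 ret;
        memcpy(&ret, &in, sizeof(ret));   // reinterpret the 4 bytes, no conversion
        return ret;
    }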
Vsevolod Livinskiy
1916153509 Code generator was changed to support llvm.umul.with.overflow 2015-02-13 12:56:13 +03:00
Dmitry Babokin
08b4e03f62 Merge pull request #962 from ncos/cbackend-cmpxchg
Fix llvm->c++ translation of cmpxchg instruction
2015-02-12 19:10:04 +03:00
Anton Mitrokhin
3dbc5ba6c2 update fail_db.txt 2015-02-12 18:19:35 +03:00
Anton Mitrokhin
8cf7445b93 add ifdefs for pre-3.5 LLVM versions (where cmpxchg had a different definition) 2015-02-12 17:59:44 +03:00
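For background, around LLVM 3.5 cmpxchg gained separate success/failure orderings and its result became a {value, i1} pair, which is why version guards are needed. A hedged illustration of the signature difference using IRBuilder (the guard macro is hypothetical, and the actual fix lives in the cbackend's llvm-to-C++ translation):

    #include "llvm/IR/IRBuilder.h"

    static llvm::Value *lEmitCmpXchg(llvm::IRBuilder<> &b, llvm::Value *ptr,
                                     llvm::Value *cmp, llvm::Value *newVal) {
    #if defined(LLVM_3_5_PLUS)   // hypothetical version guard
        // 3.5+: separate success and failure orderings; result is {value, i1}
        return b.CreateAtomicCmpXchg(ptr, cmp, newVal,
                                     llvm::SequentiallyConsistent,
                                     llvm::SequentiallyConsistent);
    #else
        // pre-3.5: a single ordering; result is the old value
        return b.CreateAtomicCmpXchg(ptr, cmp, newVal,
                                     llvm::SequentiallyConsistent);
    #endif
    }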
Dmitry Babokin
53d918ae9e Merge pull request #961 from Vsevolod-Livinskij/knc_header_fix
Code generator fix for compatibility with generic
2015-02-12 16:16:06 +03:00
Vsevolod Livinskiy
f84f359d8b Code generator fix for compatibility with generic 2015-02-12 16:00:22 +03:00
Anton Mitrokhin
280675eb80 add comments regarding constant 'true' field 2015-02-12 15:10:38 +03:00
Anton Mitrokhin
8eeeebf091 fixed cmpxchg bug by passing in a const bool field 2015-02-12 13:44:45 +03:00
Dmitry Babokin
ab6c7e42d6 Merge pull request #958 from Vsevolod-Livinskij/fix_for_trunk
Fix for ispc trunk.
2015-02-05 12:32:22 +03:00
Vsevolod Livinskiy
ecd9a3a79c Missing header was added. 2015-02-05 09:07:37 +03:00
Dmitry Babokin
b27ff432ee Merge pull request #956 from ncos/ispc_build_fails
Fix build fail after commit c377d: make code C++03-friendly
2015-02-04 15:07:56 +03:00
Anton Mitrokhin
bf87da7496 add spaces between '> >' 2015-02-04 14:43:08 +03:00
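The reason for those spaces: C++03 lexes ">>" as a single right-shift token, so nested template arguments must be separated. A minimal example:

    #include <vector>

    std::vector<std::vector<int> > matrix03;   // parses in C++03 and later
    // std::vector<std::vector<int>> matrix11; // only parses from C++11 on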
Dmitry Babokin
c377d8465e Merge pull request #953 from aguskov/master
New CPU naming model
2015-02-03 15:46:25 +03:00
Dmitry Babokin
c0edb37000 Merge pull request #955 from ncos/knc-backend-merge
Fixed 64-bit __ashr intrinsic
2015-02-03 15:25:29 +03:00
Dmitry Babokin
b7f4324b8f Merge pull request #954 from ncos/ispc_build_fails
ISPC build fix on LLVM 3.7 (failed after rev.227685)
2015-02-03 15:03:47 +03:00
Anton Mitrokhin
85578f0462 fixed 64-bit __ashr intrinsic 2015-02-03 14:47:37 +03:00
Anton Mitrokhin
01fb69798f fix after 3.7 buildfail on rev.227685 2015-02-03 10:28:41 +03:00
Andrey Guskov
382aacd710 Minor CPU info fix 2015-02-02 21:33:57 +03:00
Dmitry Babokin
ec2fc0cf0b Merge pull request #951 from Vsevolod-Livinskij/knc_header_fix
Knc header fix
2015-02-02 14:15:50 +03:00
Andrey Guskov
ace3f20a22 Added Broadwell architecture 2015-01-30 19:20:44 +03:00
Vsevolod Livinskiy
5b28d0e703 Fail_db update 2015-01-30 18:39:12 +03:00
Vsevolod Livinskiy
7816fae331 Fix for __vec16_i64 element extracting 2015-01-30 18:23:35 +03:00
Dmitry Babokin
64bc48863d Merge pull request #950 from ncos/knc-backend-merge
Update 'fail_db.txt' - removing knc fails
2015-01-30 12:22:35 +03:00
Anton Mitrokhin
1412f663f6 update 'fail_db.txt' after the previous fix 2015-01-30 11:33:09 +03:00
Andrey Guskov
d983203b52 Updated ISPC CPU naming scheme, added synonyms 2015-01-29 21:33:44 +03:00
Dmitry Babokin
c461049896 Merge pull request #948 from egaburov/ptxgen-fix
changed ptxgen.cpp to be a real C++ program.
2015-01-29 20:49:54 +03:00
Dmitry Babokin
54dcfdca99 Merge pull request #949 from Vsevolod-Livinskij/knc_header_fix
Fix for equal_i1
2015-01-29 19:37:51 +03:00
Vsevolod Livinskiy
d5cd049d8f Fix for equal_i1 2015-01-29 18:44:17 +03:00
evghenii
c1777bff3b some code improvement 2015-01-29 13:16:58 +01:00
evghenii
008fc6c51e fix for LIBNVVM_HOME error 2015-01-29 13:08:55 +01:00
Dmitry Babokin
933f78de7b Merge pull request #947 from ncos/knc-backend-merge
Added 'aos_to_soa' and 'soa_to_aos' implementations (gather/scatter)
2015-01-29 15:07:01 +03:00
Anton Mitrokhin
eeba937282 added 'aos_to_soa' and 'soa_to_aos' implementations (gather/scatter) 2015-01-29 14:26:01 +03:00
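A minimal sketch (hypothetical signature) of a loop-based aos_to_soa for 3-component float data, 16 programs wide: consecutive xyz triples are split into three lane-indexed arrays.

    // Hypothetical helper: convert 16 xyz triples from array-of-structures
    // layout to three structure-of-arrays vectors.
    static inline void __aos_to_soa3_float(const float *aos,
                                           float *x, float *y, float *z) {
        for (int lane = 0; lane < 16; ++lane) {
            x[lane] = aos[3 * lane + 0];
            y[lane] = aos[3 * lane + 1];
            z[lane] = aos[3 * lane + 2];
        }
    }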
evghenii
24acd3f492 first commit for modified ptxgen 2015-01-29 11:38:50 +01:00
Dmitry Babokin
5d7c147a6a Merge pull request #946 from egaburov/nvptx_cuda7
a fix for LLVM 3.5 when used with CUDA 7
2015-01-27 19:37:37 +03:00
Dmitry Babokin
f49875d5cf Merge pull request #945 from aguskov/master
LLVM trunk fix: added getTargetLowering()
2015-01-27 17:52:12 +03:00
Andrey Guskov
7e88b42107 LLVM trunk fix: added getTargetLowering() 2015-01-27 17:41:27 +03:00
evghenii
4698e665e4 LLVM 3.5 fix for CUDA 7 2015-01-27 15:35:27 +01:00
Dmitry Babokin
115c8020b1 Merge pull request #944 from egaburov/nvptx_cuda7
Change name mangling to support NVVM in CUDA7.
2015-01-27 11:02:33 +03:00
evghenii
bd28502d2c revert disable NVPTX by default in Makefile 2015-01-27 07:02:49 +01:00
evghenii
d4a77e1b44 changed function mangling to be compatible with NVVM in CUDA7 2015-01-27 07:02:12 +01:00
Dmitry Babokin
f291d90271 Merge pull request #943 from aguskov/master
Tracking LLVM trunk - changed llvm::TargetLibraryInfoWrapperPass()
2015-01-26 16:50:12 +03:00
Andrey Guskov
3f13af8e62 Fixed LLVM trunk compatibility 2015-01-26 16:36:19 +03:00
Dmitry Babokin
27ff7ac8d7 Merge pull request #942 from aguskov/master
Slight example code cleanup
2015-01-26 16:04:57 +03:00
Andrey Guskov
d359503ad8 Made AO_INSTRUMENTED example compile on Linux 2015-01-26 15:44:01 +03:00
Andrey Guskov
ed53df90a4 Eliminated MSVC warnings 2015-01-26 16:26:07 +04:00
Andrey Guskov
05a64f1302 RTC+Win32 fix that preserves SORT correctness 2015-01-26 15:23:14 +04:00
Dmitry Babokin
a6206398fc Merge pull request #941 from dbabokin/rtc
Fixing examples on Windows to compile correctly.
2015-01-23 16:29:25 +03:00
Dmitry Babokin
f065b060bd Fixing examples on Windows to compile correctly. 2015-01-23 16:29:12 +03:00
Dmitry Babokin
122142e439 Merge pull request #940 from aguskov/master
Suppressed unneeded bitcode export warnings
2015-01-21 18:55:45 +03:00
Andrey Guskov
d0673231d4 Suppressed unneeded bitcode export warnings 2015-01-21 18:52:00 +03:00
Dmitry Babokin
732d6a9e2c Merge pull request #939 from dbabokin/knc_fails
Adding KNC fails with LLVM 3.7, also adding 6 new fails (due to long ints).
2015-01-21 16:33:36 +03:00
Dmitry Babokin
00f03292e6 Adding KNC fails with LLVM 3.7, also adding 6 new fails (due to long ints). 2015-01-21 16:33:19 +03:00
Dmitry Babokin
625a249b65 Merge pull request #938 from dbabokin/llvm351
Moving from LLVM 3.5 to LLVM 3.5.1
2015-01-21 12:39:38 +03:00
Dmitry Babokin
7601532de8 Moving from LLVM 3.5 to LLVM 3.5.1 2015-01-21 12:40:04 +03:00
Dmitry Babokin
11e9f4e3e0 Merge pull request #937 from dbabokin/fails_llvm37
Adding 3.7 fails and 6 generic fails on 3.6 caused by long integers.
2015-01-20 15:49:56 +03:00
Dmitry Babokin
d01e4fbe7a Adding 3.7 fails and 6 generic fails on 3.6 caused by long integers. 2015-01-20 15:45:52 +03:00
Dmitry Babokin
09d770f833 Merge pull request #936 from aguskov/master
3.7-related copyright update
2015-01-20 15:11:48 +03:00
Andrey Guskov
2f2af816e7 3.7-related copyright update 2015-01-20 14:56:58 +03:00
Dmitry Babokin
e70379e04f Merge pull request #935 from aguskov/master
Added LLVM 3.7 support
2015-01-19 22:33:44 +03:00
Andrey Guskov
ae8b724d92 Added LLVM 3.7 support 2015-01-19 17:30:59 +03:00
Dmitry Babokin
abeda29087 Merge pull request #933 from dbabokin/v182dev
Bumping version to 1.8.2dev
2015-01-12 14:49:27 +03:00
Dmitry Babokin
9c0b4cb8b3 Bumping version to 1.8.2dev 2015-01-12 14:49:48 +03:00
Dmitry Babokin
c2016af162 Merge pull request #932 from dbabokin/v181
v1.8.1
2014-12-31 14:44:29 +03:00
Dmitry Babokin
cfd3418653 News and ReleaseNotes update. 2014-12-31 14:43:45 +03:00
Dmitry Babokin
a095fd622f Bumping version to 1.8.1 2014-12-31 14:20:38 +03:00
Dmitry Babokin
73141e58ed Merge pull request #931 from dbabokin/examples
Minor fix for examples to compile with strict C++ compiler.
2014-12-31 12:36:30 +03:00
Dmitry Babokin
c560a1b10e Merge pull request #930 from dbabokin/autodispatch
Fixing problem with autodispatch when compiled with NVPTX target
2014-12-31 12:35:43 +03:00
Dmitry Babokin
82187f8372 Minor fix for examples to compile with strict C++ compiler. 2014-12-31 12:35:11 +03:00
Dmitry Babokin
aeaef1bedf Fixing problem with autodispatch when compiled with NVPTX target 2014-12-31 12:34:12 +03:00
Dmitry Babokin
7d01e4254b Merge pull request #929 from dbabokin/deps
Allowing deps generation for multi-target.
2014-12-30 21:16:15 +03:00
Dmitry Babokin
b31dbcf756 Allowing deps generation for multi-target. 2014-12-30 21:06:53 +03:00
Dmitry Babokin
314289c52f Merge pull request #928 from dbabokin/master
Adding missing AddressOfExpr::GetLValueType()
2014-12-30 18:15:47 +03:00
Dmitry Babokin
da83196996 Adding missing AddressOfExpr::GetLValueType() 2014-12-30 18:11:39 +03:00
Dmitry Babokin
fb535fdfd0 Merge pull request #927 from dbabokin/metadata
Fixing cbackend after LLVM changes which split Value and Metadata classes.
2014-12-29 20:53:36 +03:00
Dmitry Babokin
3b1eaa65e4 Fixing cbackend after LLVM changes which split Value and Metadata
classes.
2014-12-29 20:44:54 +03:00
Dmitry Babokin
08e246c628 Merge pull request #926 from ncos/knc-backend-merge
Further improvements in knc.h
2014-12-26 17:10:53 +03:00
Anton Mitrokhin
4b48bfa87a update fail_db (fixes 279 compfails and 4 runfails). New runfails: soa-18, pmulus_vi64, funcptr-varying-7, funcptr-varying-8 2014-12-26 16:52:49 +03:00
Anton Mitrokhin
948e75dcb5 examined soa-18 runfail 2014-12-26 16:21:23 +03:00
Anton Mitrokhin
192aeb0ae3 add 'examples/intrinsics/known_fails.txt' to track difficult runfails/compfails 2014-12-26 13:58:49 +03:00
Anton Mitrokhin
2d403cf258 add __masked_load_i64 2014-12-26 02:06:16 +03:00
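A sketch of a loop-based masked i64 load in the knc.h style (types and exact signature are illustrative): only active lanes read memory, so a lane the mask disables may point at an unmapped address without faulting.

    #include <stdint.h>

    struct __vec16_i64 { int64_t v[16]; };
    struct __vec16_i1  { uint16_t m; };

    static inline __vec16_i64 __masked_load_i64(const void *p, __vec16_i1 mask) {
        __vec16_i64 ret;
        const int64_t *src = (const int64_t *)p;
        for (int lane = 0; lane < 16; ++lane)
            if (mask.m & (1 << lane))
                ret.v[lane] = src[lane];   // inactive lanes stay undefined
        return ret;
    }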
Anton Mitrokhin
5dfb96a310 add cast_bits(i64 i64) 2014-12-26 01:39:43 +03:00
Anton Mitrokhin
134accaf46 add cast_bits(i32 i32) 2014-12-26 01:38:25 +03:00
Anton Mitrokhin
3f607ade14 add cast_sext plus shl/ashr function fix 2014-12-26 01:12:14 +03:00
Anton Mitrokhin
300ff7be75 add insert_element(vec1_ ... 2014-12-25 23:11:21 +03:00
Dmitry Babokin
9a56f8fb9a Merge pull request #923 from Vsevolod-Livinskij/fix_223802
Fix for trunk after rev 223802
2014-12-25 14:52:43 +03:00
Vsevolod Livinskiy
8672aff298 New approach to the fix 2014-12-25 14:31:46 +03:00
Anton Mitrokhin
83b65ef534 add scatter_int16 2014-12-24 19:15:14 +03:00
Anton Mitrokhin
515c5f76e0 fixed 9 runfails caused by wrong permutation masks 2014-12-24 19:02:23 +03:00
Anton Mitrokhin
b1fdbd63ec add scatter64_i32/float 2014-12-24 18:23:46 +03:00
Anton Mitrokhin
d90f677cb4 add gather64_i16 2014-12-24 18:06:09 +03:00
Anton Mitrokhin
797238464a add gather64_i8 2014-12-24 17:59:17 +03:00
Anton Mitrokhin
cd53e6abed add gather_base_offsets32_i16 2014-12-24 17:51:39 +03:00
Anton Mitrokhin
1476d45536 add gather64_double 2014-12-24 17:47:46 +03:00
Anton Mitrokhin
476f186e8e add gather64_i32 2014-12-24 17:36:21 +03:00
Dmitry Babokin
8fa45f20dc Merge pull request #924 from ncos/knc-backend-merge
Fix errors generated by knc.h with icpc 13.1
2014-12-11 18:15:12 +03:00
Vsevolod Livinskiy
24e8c33506 Fix for trunk after rev 223802 2014-12-11 17:53:51 +03:00
Anton Mitrokhin
b17fcdeba2 added entries for icpc13 (the same as for icpc15) 2014-12-11 18:38:59 +04:00
Anton Mitrokhin
ea2764b345 removed fixed fails (fixed by Vsevolod); one new runfail 2014-12-11 18:35:42 +04:00
Anton Mitrokhin
4d4a512f72 fixed warnings produced by 64 bit reduce functions without affecting the functionality 2014-12-11 17:58:51 +04:00
Anton Mitrokhin
91538b5366 fixed noisy warnings in icpc13 produced by unsupported 64 bit 'reduce' functions 2014-12-11 17:31:48 +04:00
Anton Mitrokhin
e4c79418c8 '__scatter64_i64' function fixed for icc 13 2014-12-11 14:13:30 +04:00
Anton Mitrokhin
dc2e174d4f added compiler version guards, '__scatter64_i64' is not implemented 2014-12-11 13:13:16 +04:00
Dmitry Babokin
521aca9f09 Merge pull request #922 from jbrodman/windll
Add DLL storage class for export functions on Windows
2014-12-11 11:21:10 +03:00
Dmitry Babokin
0ac7ed3c97 Merge pull request #920 from ncos/ispc_build_fails
Fixing 40 sse2 compfails
2014-12-07 15:37:36 +03:00
Anton Mitrokhin
91ee4ea7b6 add guards to prevent unfolded 'ConstantExpr' from sneaking into 'lConstantElementsToMask' - we handle only Ints and Doubles there 2014-12-07 15:24:09 +03:00
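A hedged sketch of the guard pattern described here (the helper's shape is illustrative, not ispc's exact code): accept only fully folded integer/double constants when building the mask, and bail out on anything else, such as an unfolded ConstantExpr.

    #include "llvm/IR/Constants.h"

    static bool lElementToBit(llvm::Constant *elem, bool *bit) {
        if (llvm::ConstantInt *ci = llvm::dyn_cast<llvm::ConstantInt>(elem)) {
            *bit = !ci->isZero();
            return true;
        }
        if (llvm::ConstantFP *cf = llvm::dyn_cast<llvm::ConstantFP>(elem)) {
            *bit = !cf->isZero();
            return true;
        }
        return false;   // e.g. a ConstantExpr we cannot evaluate here
    }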
Dmitry Babokin
98acce1185 Merge pull request #921 from dbabokin/alloy
Fix in run_tests.py to make it work on Windows
2014-12-05 15:09:59 +03:00
Dmitry Babokin
4da0381aae Fix in run_tests.py to make it work on Windows
(it was broken by the nvptx target).
2014-12-05 16:05:56 +04:00
Dmitry Babokin
dc0db2d1cb Merge pull request #919 from Vsevolod-Livinskij/knc_header_fix
Knc header fix
2014-12-05 12:43:21 +03:00
Vsevolod Livinskiy
44ee5737a7 Changes in cast_zext 2014-12-04 23:04:25 +04:00
Vsevolod Livinskiy
9022806fc0 Changes in cast 2014-12-04 19:53:23 +04:00
Vsevolod Livinskiy
91866396ef Fix for half to float 2014-12-04 19:40:30 +04:00
Vsevolod Livinskiy
ff9380b248 Fix for count_leading_zeros 2014-12-04 19:30:30 +04:00
Vsevolod Livinskiy
b00575e95c Fix for __cast_zext(__vec16_i64, __vec16_i1) 2014-12-04 18:59:40 +04:00
jbrodman
ac524eff0d Add DLL storage class for export functions on Windows 2014-12-04 01:58:42 -08:00
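A sketch of what attaching the storage class looks like with the LLVM API of that era; the enabling flag corresponds to the switch from PR #975, and the parameter name here is hypothetical.

    #include "llvm/IR/Function.h"

    static void lMarkForDLLExport(llvm::Function *fn, bool dllExportEnabled) {
        if (dllExportEnabled)   // only when the switch asks for it
            fn->setDLLStorageClass(llvm::GlobalValue::DLLExportStorageClass);
    }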
Vsevolod Livinskiy
df9f839ed4 Merge 2014-12-04 13:45:47 +04:00
Vsevolod Livinskiy
8d51e0620e Fix for int64 comparison 2014-12-04 13:37:24 +04:00
Vsevolod Livinskiy
70adb5d283 Fix for unsigned cmp for int64 2014-12-04 13:36:19 +04:00
Vsevolod Livinskiy
7e63862dff __sqrt_varying_double 2014-12-04 13:35:03 +04:00
Vsevolod Livinskiy
190d7957d4 Fix for __shuffle2_float 2014-12-04 13:35:03 +04:00
Vsevolod Livinskiy
307c825c4e Fix for __shift_i32 2014-12-04 13:35:03 +04:00
Vsevolod Livinskiy
7e39ed115d Fix for shuffle2_double 2014-12-04 13:35:03 +04:00
Vsevolod Livinskiy
52eddcdcfd Some changes in max/min_varying_uint/int64 2014-12-04 13:35:03 +04:00
Vsevolod Livinskiy
0d9eceb668 Fix for prefetch 2014-12-04 13:35:02 +04:00
Vsevolod Livinskiy
7084432a0e Fix for max/min_varying_uint/int64 2014-12-04 13:35:02 +04:00
Dmitry Babokin
b02f30c822 Merge pull request #918 from dbabokin/trunk
Removing -debug-ir functionality for 3.6, as it was removed from LLVM.
2014-12-03 18:31:19 +03:00
Dmitry Babokin
701bd9b029 Removing -debug-ir functionality for 3.6, as it was removed from LLVM. 2014-12-03 18:28:32 +03:00
Dmitry Babokin
fb2c22c32d Merge pull request #917 from dbabokin/selfbuild
Fix for selfbuild to use both CC and CXX, but not only CC.
2014-12-03 18:27:03 +03:00
Dmitry Babokin
ddc25dffcf Fix for selfbuild to use both CC and CXX, but not only CC. 2014-12-03 15:36:36 +03:00
Dmitry Babokin
1211aa95fd Merge pull request #914 from hansbogert/master
Added cstdlib include
2014-12-01 18:43:28 +03:00
Hans van den Bogert
d1b4ad6aca Added cstdlib include
the atoi and other functions were not transitively included under clang/llvm3.5
2014-12-01 14:09:13 +01:00
Dmitry Babokin
d18b85f8ca Merge pull request #913 from Vsevolod-Livinskij/fail_db_update
Fail_db update
2014-11-27 22:34:03 +03:00
Vsevolod Livinskiy
5d2ccfbe75 Fail_db update 2014-11-27 21:09:18 +03:00
Dmitry Babokin
f1e68b6bef Merge pull request #912 from ncos/knc-backend-merge
Knc backend merge
2014-11-27 19:47:18 +03:00
Anton Mitrokhin
51a3d71fea fail_db.txt refresh 2014-11-27 19:22:22 +04:00
Anton Mitrokhin
6f518dfcc9 fixed lshr (i64 ...) function implementation 2014-11-27 18:33:44 +04:00
Anton Mitrokhin
0a987ad06f reduce_add_int8/16 implementation fixed 2014-11-27 17:41:21 +04:00
Anton Mitrokhin
7333e6ab7a Merge branch 'knc-backend-merge' of https://github.com/ncos/ispc into knc-backend-merge 2014-11-27 15:57:37 +03:00
Anton Mitrokhin
296b057a0a added debug helpers for knc-i1x16.h 2014-11-27 16:54:46 +04:00
Anton Mitrokhin
5f3128bbb2 cast_sext/sext, i64 (l)shl/r and mul 2014-11-27 16:53:25 +04:00
Dmitry Babokin
f0a5db068f Merge pull request #911 from aguskov/master
Fix for DWARF debug info generation pipeline
2014-11-26 23:56:07 +03:00
Dmitry Babokin
265d5b7924 Merge pull request #910 from dbabokin/trunk
Build fix for LLVM trunk - changed include structure
2014-11-26 14:24:54 +03:00
Dmitry Babokin
0bb5404e6b Build fix for LLVM trunk - changed include structure 2014-11-26 14:21:50 +03:00
Andrey Guskov
82cf349bf9 Fix for DWARF debug info generation pipeline
Applies to LLVM-3.4, -3.5 and -trunk builds of ISPC
2014-11-25 22:27:23 +03:00
Dmitry Babokin
707056f825 Merge pull request #908 from dbabokin/trunk
Tracking LLVM trunk: SmallPtrSet changes
2014-11-21 16:40:54 +03:00
Dmitry Babokin
22dcc9a651 Tracking LLVM trunk: SmallPtrSet changes 2014-11-21 16:37:47 +03:00
Dmitry Babokin
56c78e55d5 Merge pull request #906 from ncos/ispc_build_fails
fixed ISPC compilation fail caused by LLVM DIBuilder API change
2014-11-19 18:20:45 +03:00
Anton Mitrokhin
2e3af7f474 fixed ISPC compilation fail caused by the change of llvm::Value to llvm::Constant in DIBuilder member functions (LLVM commit 222070) 2014-11-19 13:24:46 +03:00
Dmitry Babokin
3290ccd020 Merge pull request #903 from ncos/knc-backend-merge
Add several functions to knc.h
2014-11-14 11:03:29 -08:00
Anton Mitrokhin
1934a09a3d updated fail_db.txt to hide 9 new runfails and tests that have been fixed 2014-11-14 17:22:23 +04:00
Anton Mitrokhin
7233e25907 removed unnecessary &255's (knc.h) 2014-11-14 17:17:24 +04:00
Anton Mitrokhin
615d36b236 Merge branch 'master' of https://github.com/ncos/ispc into knc-backend-merge
Conflicts:
	examples/intrinsics/knc.h
2014-11-14 11:54:34 +04:00
Dmitry Babokin
e834703bb7 Merge pull request #904 from jbrodman/master
Disable optimization to eliminate illegal results
2014-11-13 16:13:15 -08:00
Dmitry Babokin
2b956e9cad Merge pull request #902 from Vsevolod-Livinskij/knc_header_fix
Knc header fix
2014-11-13 15:26:32 -08:00
Dmitry Babokin
d3158d1efe Merge pull request #901 from ncos/ispc_build_fails
LLVM trunk build fail fix (MDNode interface change)
2014-11-13 15:25:35 -08:00
Anton Mitrokhin
b21043c309 - fixed 'cast_trunc', 'cast_zext' and 'cast_sext' implementations (the previous ones were faulty)
- added '__select (i8/16)', '__equal_i8/16_and_mask', '__not_equal_i8/16_and_mask' functions
2014-11-13 22:41:55 +04:00
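A sketch (illustrative types and signature) of the per-lane select these messages describe: pick from 'a' where the mask bit is set, from 'b' otherwise.

    #include <stdint.h>

    struct __vec16_i8 { int8_t v[16]; };
    struct __vec16_i1 { uint16_t m; };

    static inline __vec16_i8 __select(__vec16_i1 mask,
                                      __vec16_i8 a, __vec16_i8 b) {
        __vec16_i8 ret;
        for (int lane = 0; lane < 16; ++lane)
            ret.v[lane] = (mask.m & (1 << lane)) ? a.v[lane] : b.v[lane];
        return ret;
    }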
Vsevolod Livinskiy
37a4362417 Fix for __cast_uitofp 2014-11-13 20:21:02 +04:00
jbrodman
c565ec08ac Disable optimization to eliminate illegal results 2014-11-13 07:23:00 -08:00
Vsevolod Livinskiy
8f68769af8 cast double/float to/from int64 2014-11-13 18:10:31 +04:00
Anton Mitrokhin
79fa1c3d4d Added some more functions to knc.h:
- __scatter64_i64
- __scatter_base_offsets32_double
- __scatter_base_offsets32_i64
2014-11-13 17:18:47 +04:00
Vsevolod Livinskiy
6a2cb442ee __reduce_add/min/max_int64 2014-11-13 14:21:10 +04:00
Vsevolod Livinskiy
6606d20a47 insert/extract_element_i8/16 and broadcast_i8/16 2014-11-13 13:48:01 +04:00
Anton Mitrokhin
5cb50eca30 LLVM trunk build fail fix (MDNode interface change) 2014-11-13 12:02:58 +03:00
Vsevolod Livinskiy
67243f3550 Merge commit 2014-11-13 13:00:17 +04:00
Vsevolod Livinskiy
308746c7fb Minor fixes to remove copy-paste 2014-11-13 12:45:16 +04:00
Vsevolod Livinskiy
28e9032b10 __not_equal_i8/16 was added 2014-11-13 12:44:41 +04:00
Vsevolod Livinskiy
44bcce3cc8 __smear_i8/16 was added 2014-11-13 12:44:41 +04:00
Dmitry Babokin
4351fc600b Merge pull request #899 from ncos/knc-backend-merge
Added several i8/i16 functions to knc.h
2014-11-11 22:48:16 -08:00
Dmitry Babokin
a1bcbed837 Merge pull request #897 from Vsevolod-Livinskij/knc_header_fix
Knc header fix
2014-11-10 16:25:32 -08:00
Anton Mitrokhin
4152a0f9ed rewrote masked_load_i8/16 so it does not segfault 2014-11-09 22:06:10 +04:00
Anton Mitrokhin
d8f4635366 rewrote __masked_store_i8/16 functions so they won't segfault 2014-11-09 20:01:58 +04:00
Anton Mitrokhin
e7717e58b5 - added 'cast_trunc' and 'cast_zext' sets of functions
- commented out the faulty [] overload for vec16_i64
- added loop-based function macros for vec16_i8/16 data types
2014-11-09 11:59:19 +04:00
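A hedged sketch of the macro approach mentioned above: element types with no native 16-wide KNC support (i8/i16) get plain per-lane loops stamped out by a macro. The macro and type shapes are illustrative.

    #include <stdint.h>

    struct __vec16_i8 { int8_t v[16]; };

    // Expand to a per-lane loop for any 16-wide struct-of-array type.
    #define BINARY_OP_LOOP(TYPE, NAME, OP)                  \
        static inline TYPE NAME(TYPE a, TYPE b) {           \
            TYPE ret;                                       \
            for (int lane = 0; lane < 16; ++lane)           \
                ret.v[lane] = a.v[lane] OP b.v[lane];       \
            return ret;                                     \
        }

    BINARY_OP_LOOP(__vec16_i8, __add_i8, +)   // 16-lane i8 add
    BINARY_OP_LOOP(__vec16_i8, __mul_i8, *)   // 16-lane i8 multiply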
Vsevolod Livinskiy
83f3ee7cfa __shuffle2_i8/16/32/64 was added 2014-11-09 00:21:20 +04:00
Vsevolod Livinskiy
1b8afb73ad __rotate_i8/16/64 was added 2014-11-09 00:21:10 +04:00
Vsevolod Livinskiy
67cc62d619 Minor fixes to remove copy-paste 2014-11-09 00:21:08 +04:00
Vsevolod Livinskiy
12376e6a0c __shuffle_i8/16 was added. __reduce* functions were fixed. 2014-11-09 00:21:04 +04:00
Vsevolod Livinskiy
9316dd65c0 __not_equal_i8/16 was added 2014-11-09 00:21:01 +04:00
Vsevolod Livinskiy
3b1445c660 __smear_i8/16 was added 2014-11-09 00:20:58 +04:00
Vsevolod Livinskiy
39b1e4a204 masked_load/store_i16 was added 2014-11-09 00:20:24 +04:00
Dmitry Babokin
df7d65b076 Merge pull request #891 from ncos/knc-backend-merge
Add missing _cast_uitofp/fptoui, setzero and min_varying functions
2014-11-06 09:03:54 -08:00
Dmitry Babokin
247dff8ab3 Merge pull request #898 from Vsevolod-Livinskij/fix_for_trunk
Fix for trunk (rev. 221167 and rev. 221375)
2014-11-06 08:57:57 -08:00
Vsevolod Livinskiy
fcc1090595 Fix for rev.221375 2014-11-06 17:06:37 +03:00
Vsevolod Livinskiy
9e184e65cf Fix for rev.221167 2014-11-06 16:52:47 +03:00
Anton Mitrokhin
15e1f86711 - Removed the debug code that accidentally sneaked into the repo
- Put 12 knc runfails into fail_db.txt (they were invisible due to the debug code)
- Several minor fixes not affecting functionality
2014-11-06 16:48:51 +04:00
Dmitry Babokin
6bc4aee029 Merge pull request #896 from jbrodman/master
Fix nested structs being expanded in headers
2014-11-05 16:21:49 -08:00
Dmitry Babokin
b45429fc44 Merge pull request #894 from Vsevolod-Livinskij/fix_221024
Fix for rev. 221024 and 221027
2014-11-05 16:18:07 -08:00
jbrodman
6cfe21094b Fix nested structs being expanded in headers 2014-11-03 06:14:47 -08:00
Vsevolod Livinskiy
90d9b91007 Fix for rev. 221024 and 221027 2014-11-03 01:29:06 +03:00
Dmitry Babokin
89f6fdd29c Merge pull request #892 from Vsevolod-Livinskij/knc_header_fix
Fix for load/store i16
2014-10-30 15:39:00 -07:00
Dmitry Babokin
b07bf9f5c7 Merge pull request #890 from Vsevolod-Livinskij/fix_for_trunk
Fix for trunk
2014-10-30 12:25:09 -07:00
Vsevolod Livinskiy
9a8782d122 Fix for load/store i16 2014-10-30 18:37:38 +04:00
Anton Mitrokhin
e2249365f3 setzero functions for i8 and i16 2014-10-30 18:36:39 +04:00
Anton Mitrokhin
1f2079f2a8 fixed __cast_fpto<...> set of functions 2014-10-30 17:49:57 +04:00
Vsevolod Livinskiy
a239f37302 Fix for missing declaration of IntrinsicInst 2014-10-30 16:29:48 +03:00
Vsevolod Livinskiy
215ca762d0 Fix for rev. 220741. LinkModules emits the error by itself 2014-10-29 22:52:52 +04:00
Anton Mitrokhin
a4bcb1e1e2 __min_varying_<...> intrinsics added 2014-10-29 20:16:34 +04:00
Dmitry Babokin
0c0f69e3bf Merge pull request #887 from ncos/knc-backend-merge
Updated fail_db.txt to remove new passes fails
2014-10-21 17:48:29 +04:00
Anton Mitrokhin
0fa17b1385 updated fail_db.txt 2014-10-21 17:43:42 +04:00
Dmitry Babokin
92e86ac5b7 Merge pull request #885 from ncos/knc-backend-merge
Further improvements in knc.h
2014-10-21 17:18:48 +04:00
Anton Mitrokhin
8b34fb853e Commented out the faulty functions. Our code generator does not provide the necessary alignment to use them. 2014-10-21 15:42:14 +04:00
Anton Mitrokhin
09b8f65246 changed the names of 'reduce' functions (ixx -> intxx) to match the names generated by the code generator 2014-10-21 15:26:07 +04:00
Dmitry Babokin
9a3e4d8788 Merge pull request #884 from ncos/alloy_modifications
Add the '--notify-subject' option to alloy.py
2014-10-21 14:29:48 +04:00
Anton Mitrokhin
577f5e368a Add the '--notify-subject' option to alloy.py to allow non-standard notification message subjects and simplify message
filtering
2014-10-21 14:10:54 +04:00
Anton Mitrokhin
da281a99ec Merge branch 'master' of https://github.com/ncos/ispc into knc-backend-merge 2014-10-21 12:37:18 +04:00
Dmitry Babokin
1671430b50 Merge pull request #883 from dbabokin/alloy_version
Update warning
2014-10-20 21:11:21 +04:00
Dmitry Babokin
6c766be68c Update warning 2014-10-20 21:10:40 +04:00
Dmitry Babokin
883963d90d Merge pull request #881 from dbabokin/typos
typos
2014-10-17 03:42:17 +04:00
Dmitry Babokin
b84fdb1f3c typos 2014-10-17 03:40:22 +04:00
Dmitry Babokin
8645ee5546 Merge pull request #880 from dbabokin/v180
Bumping version to 1.8.1dev
2014-10-17 03:26:03 +04:00
Dmitry Babokin
ffa4b9e65c Bumping version to 1.8.1dev 2014-10-17 03:25:14 +04:00
Dmitry Babokin
c6dbb2c066 Merge pull request #879 from dbabokin/v180
v1.8.0
2014-10-17 03:22:26 +04:00
Dmitry Babokin
0af6f6460b ReleaseNotes and news update. 2014-10-17 01:24:36 +04:00
Dmitry Babokin
156f4e0fd8 Bumping version to 1.8.0 2014-10-16 23:34:38 +04:00
Dmitry Babokin
e97601d3a8 Merge pull request #878 from dbabokin/make
Typo. Keep LDFLAGS commented.
2014-10-16 20:26:29 +04:00
Dmitry Babokin
100285a325 Typo. Keep LDFLAGS commented. 2014-10-16 20:25:31 +04:00
Dmitry Babokin
9796674218 Merge pull request #877 from dbabokin/make
Fixing comment in Makefile and adding tips for building binaries for distribution on Linux.
2014-10-16 20:18:52 +04:00
Dmitry Babokin
8283cfc3ff Fixing comment in Makefile and adding tips for building binaries for
distribution on Linux.
2014-10-16 20:17:02 +04:00
Dmitry Babokin
d62fe78e6e Merge pull request #876 from dbabokin/ptx_off
PTX support refinement
2014-10-16 17:28:28 +04:00
Dmitry Babokin
5715350d96 Build clang with NVPTX support. 2014-10-16 17:24:43 +04:00
Dmitry Babokin
17ee085396 PTX support is off by default.
Fix to make it compile with PTX support off by default.
2014-10-16 17:20:26 +04:00
Dmitry Babokin
c5cf05e085 Merge pull request #875 from Vsevolod-Livinskij/fail_db_update
Fail_db update.
2014-10-16 16:36:03 +04:00
Vsevolod Livinskiy
f59c6223b4 Fail_db update for icc.v15 2014-10-16 16:28:14 +04:00
Dmitry Babokin
30270584aa Merge pull request #749 from egaburov/nvptx_clean
Experimental support for PTX with examples
2014-10-16 15:56:02 +04:00
evghenii
92377426bd changed progress bar implementation 2014-10-16 11:53:05 +02:00
Anton Mitrokhin
3d71932ca6 knc.h: added [] overloads to type definitions 2014-10-16 12:57:55 +04:00
Dmitry Babokin
ccf4d00385 Merge pull request #874 from dbabokin/alloy-gnu-toolchain
alloy.py script upgrade
2014-10-15 23:09:50 +04:00
Dmitry Babokin
f7390aaec9 Adding default sysroot detection for MacOS 10.9 (and newer), which
allows working out of the box. Otherwise C standard headers are not
found.
2014-10-15 23:02:59 +04:00
Dmitry Babokin
d79529cada Adding --with-gcc-toolchain alloy switch, which is passed to configure.
This is the only way to build functional clang on old Linux system,
which has old system gcc and new gcc (4.7+) installed in non-system
location.

Also adding switch to build only x86 target in clang.
2014-10-15 20:40:37 +04:00
Dmitry Babokin
8f9a935132 Merge pull request #873 from dbabokin/knc
KNC prefetch:
2014-10-14 20:59:48 +04:00
Dmitry Babokin
3f24c8dedc KNC prefetch:
- make L3 prefetch hit the L2$, instead of being a nop.
- fix vector prefetch to use a single intrinsic instead of two (this is
  caused by a bug in the Composer 14.0 documentation).
2014-10-14 20:24:16 +04:00
Dmitry Babokin
741df13eb8 Merge pull request #871 from ncos/knc-backend-merge
Replaced vec16_i1 type definition in knc.h from typedef to struct version, adding a constructor
2014-10-14 19:24:53 +04:00
Dmitry Babokin
8e31d7e99c Merge pull request #872 from ncos/alloy_modifications
Changed the 'newest_LLVM' variable to 3.5 and fixed an error in help message
2014-10-14 19:24:38 +04:00
Anton Mitrokhin
7ac8a9ea04 changed vec16_i1 type definition in knc.h from typedef to struct 2014-10-14 18:52:03 +04:00
Anton Mitrokhin
dcf70d887b Changed the 'newest_LLVM' variable to 3.5 and fixed an error in help message 2014-10-14 18:23:47 +04:00
evghenii
558d7ee1d3 fix compilation with gcc481 2014-10-14 15:49:50 +02:00
Dmitry Babokin
2e49ecd56f Merge pull request #870 from aguskov/master
Added a patch for LLVM 3.5 to enable proper debugging under Windows
2014-10-14 17:31:55 +04:00
evghenii
47b1f2182f fix documentation typo 2014-10-14 15:24:30 +02:00
aguskov
efee86fa2e Update 3_5_win_coff_debug_info.patch 2014-10-14 17:17:49 +04:00
evghenii
4e7ae5269b added pseudo_prefetch definitions 2014-10-14 14:48:02 +02:00
evghenii
9238c72e08 Merge branch 'master' into nvptx_clean_master 2014-10-14 14:27:00 +02:00
evghenii
83a863ea83 fix compilation for llvm 3.5 2014-10-14 14:17:36 +02:00
Andrey Guskov
75759254c3 Added a patch for LLVM 3.5 to enable proper debugging under Windows 2014-10-14 15:58:23 +04:00
Dmitry Babokin
44ed10b6f2 Merge pull request #869 from dbabokin/typo
Typo fix
2014-10-14 14:54:05 +04:00
Dmitry Babokin
82ed2ce416 Typo fix 2014-10-14 14:51:12 +04:00
Dmitry Babokin
29c0f75306 Merge pull request #868 from dbabokin/win_build
Adding missing dependency for LLVM 3.6 (Windows specific problem) and fi...
2014-10-13 19:36:37 +04:00
Dmitry Babokin
481224bbcd Adding missing dependency for LLVM 3.6 (Windows specific problem) and fixing end-of-lines in ispc.vcxproj 2014-10-13 19:31:11 +04:00
Dmitry Babokin
743f89f93f Merge pull request #867 from jbrodman/master
Fix warnings in knc.h with appropriate casting
2014-10-09 11:52:10 +04:00
James Brodman
3aa2cce504 Fix warnings in knc.h with appropriate casting 2014-10-08 17:21:36 -07:00
Dmitry Babokin
a83fec3dd0 Merge pull request #865 from ncos/ispc_build_fails
Build fail fix for ISPC with LLVM 3.6 (current trunk)
2014-10-02 18:29:20 +04:00
Dmitry Babokin
b8a9139f8e Merge pull request #864 from Vsevolod-Livinskij/opt_prefetch
Optimized prefetch
2014-10-02 17:16:51 +04:00
Vsevolod Livinskiy
eb61d5df72 Support for cache 2/3 and all targets 2014-10-02 16:25:23 +04:00
Anton Mitrokhin
57fb2a75ec new interface for 'DIBuilder::insertDeclare' and 'DIBuilder::createGlobalVariable' in LLVM 3.6 (ids: 076fd5dfc1f0600183bbc7db974dc7b39086136d and bc88cfc3512326d6c8f8dbf3c15d880c0e21feb0, respectively) 2014-10-02 15:35:19 +04:00
Vsevolod Livinskiy
0a6eb61ad0 Extend gather-scatter optimization with prefetch optimization 2014-10-02 15:21:43 +04:00
Dmitry Babokin
df38b862c2 Merge pull request #863 from ncos/knc-fails
Changed the default knc header file from 'knc-i1x16.h' to 'knc.h'
2014-10-02 13:57:59 +04:00
Dmitry Babokin
ddc6e33bc0 Merge pull request #862 from ncos/knc-backend-merge
Modification of 'knc.h'
2014-10-02 13:57:45 +04:00
Anton Mitrokhin
700fe244e7 removed debug macros 2014-10-02 13:08:47 +04:00
Anton Mitrokhin
c934a68bc4 changed 'const int' to 'const int8_t' in '__vec16_i8' constructor 2014-10-02 13:00:55 +04:00
Anton Mitrokhin
8295df5a1e fixed header in 'knc.h' 2014-09-25 22:54:29 +04:00
Anton Mitrokhin
3b16cd8c56 fixed 'knc-i1x16.h' to compile with icc v.15 beta 2014-09-25 21:29:49 +04:00
Anton Mitrokhin
4560df284b changed codestyle to 2 spaces in 'knc.h' 2014-09-25 21:29:49 +04:00
Anton Mitrokhin
832aff5d76 commented out '__vec16_i1' type, as there is no '__cast_uitofp', which produces compfails 2014-09-25 21:29:49 +04:00
Anton Mitrokhin
46bd353027 fixed 'INT_MIN' bug in '__gather64_i64' 2014-09-25 21:29:49 +04:00
Anton Mitrokhin
1a2979aa7f start fixing gather/scatter functions (INT_MIN fix) 2014-09-25 21:29:49 +04:00
Anton Mitrokhin
8b8e313dc6 started work on gather/scatter instructions 2014-09-25 21:29:48 +04:00
Anton Mitrokhin
0881463d69 changed '__vec16_i1 __equal_i64(const __vec16_i64 &a, const __vec16_i64 &b, __vec16_i1 mask)' function 2014-09-25 21:29:48 +04:00
Anton Mitrokhin
2e92989101 changed '__vec16_i64 __load(const __vec16_i64 *p)' 2014-09-25 21:29:48 +04:00
Anton Mitrokhin
1d69b954bd added several memory functions for i8 2014-09-25 21:29:48 +04:00
Anton Mitrokhin
90843b3bff changed a constructor of '__vec16_i8' 2014-09-25 21:29:48 +04:00
Anton Mitrokhin
78a7ef9fc5 added several math functions, new runfails: gather-int8-2/4; ldexp-double (previously compfailed) 2014-09-25 21:29:47 +04:00
Anton Mitrokhin
efa0ea01f3 add 'void __masked_store_i64(void *p, const __vec16_i64 &v, __vec16_i1 mask)' function. now 450 compfails and 12 runfails 2014-09-25 21:29:47 +04:00
Anton Mitrokhin
ddf5df6193 made 'int64_t __extract_element(const __vec16_i64 &v, uint32_t index)' function 2014-09-25 21:29:47 +04:00
Anton Mitrokhin
4fff0ab571 added __vec16_i32 __cast_trunc(__vec16_i32, const __vec16_i64 i64) function. ptr-diff-3/5/6 tests are apparently runfailing because of it 2014-09-25 21:29:47 +04:00
Anton Mitrokhin
85b703981d added several helper functions 2014-09-25 21:29:46 +04:00
Anton Mitrokhin
45114d3283 No actual code change: changed the code style for '__vec16_i8' struct in 'knc.h' 2014-09-25 21:29:46 +04:00
Anton Mitrokhin
723baca4c2 added several useful defines and a warning that '__vec16_i1' might not be working with embree 2014-09-25 21:29:46 +04:00
Vsevolod Livinskiy
0a5b16dbe6 some minor fixes for knc.h 2014-09-25 21:29:46 +04:00
Anton Mitrokhin
3f1362f744 changed the default knc header file from 'knc-i1x16.h' to 'knc.h' 2014-09-25 15:58:56 +04:00
Dmitry Babokin
8ff187a7b7 Merge pull request #860 from Vsevolod-Livinskij/fix_217548
LLVM trunk: DataLayoutPass has new constructor.
2014-09-12 12:09:57 +04:00
Vsevolod Livinskiy
6a99135fa2 DataLayoutPass has new constructor 2014-09-12 11:56:47 +04:00
Dmitry Babokin
c12d6ec82c Merge pull request #859 from ncos/llvm-release
Changed LLVM 3.5 svn download path to 'final'
2014-09-09 14:54:19 +04:00
Anton Mitrokhin
bf94e51418 changed 3.5 svn path to 'final' 2014-09-09 14:44:21 +04:00
Dmitry Babokin
c522aa87a8 Merge pull request #856 from jbrodman/master
Fix issue where break stmts would result in executing with mask all off.
2014-09-04 18:04:41 +04:00
jbrodman
7c9235559d Remove #include included for testing. 2014-09-03 06:59:12 -07:00
Dmitry Babokin
be681118f5 Merge pull request #855 from dbabokin/tests
Fail_db.txt update
2014-09-01 13:50:28 +04:00
Dmitry Babokin
e3bb2b423b Fail_db.txt update 2014-09-01 13:49:51 +04:00
Dmitry Babokin
47e00ed50c Merge pull request #854 from Vsevolod-Livinskij/fail_db_update
Fail_db update
2014-08-29 19:11:32 +04:00
Vsevolod Livinskiy
fd6c0d2704 Fail_db update 2014-08-29 19:03:47 +04:00
Dmitry Babokin
b1245ef28d Merge pull request #853 from Vsevolod-Livinskij/fix_216488
Fix for llvm revision 216488
2014-08-28 13:11:08 +04:00
Vsevolod Livinskiy
e0f0520c1f Fix for llvm revision 216488 2014-08-28 12:59:03 +04:00
Dmitry Babokin
1c14cfc342 Merge pull request #852 from Vsevolod-Livinskij/fix_icc_15
Fix to support icc v.15 beta
2014-08-26 16:37:19 +04:00
Dmitry Babokin
fe797352a0 Merge pull request #851 from Vsevolod-Livinskij/fix_216393
Fix for llvm revision 216393
2014-08-26 15:29:04 +04:00
Vsevolod Livinskiy
57f29e5035 Fix to support icc v.15 beta 2014-08-26 15:20:10 +04:00
Vsevolod Livinskiy
12d0cb2037 Fix for llvm revision 216393 2014-08-26 14:20:12 +04:00
Dmitry Babokin
e33cf51b28 Merge pull request #850 from Vsevolod-Livinskij/fix_216239
Fix for llvm revision 216239
2014-08-24 22:32:09 +04:00
Vsevolod Livinskiy
93c7ddbac5 Added supported LLVM versions 2014-08-24 21:37:42 +04:00
Vsevolod Livinskiy
5ba7d2696d Fix for llvm revision 216239 2014-08-24 17:35:28 +04:00
Dmitry Babokin
2351f0898a Merge pull request #847 from ncos/knc_faildb
Modifications of fail_db.txt: ptr-22 fixed; runfails #20472; generic compfails 3.5->3.6
2014-08-17 23:26:47 +04:00
Anton Mitrokhin
81091d4be8 llvm 3.6 generic-4/16 compfails (already added for llvm 3.5) 2014-08-16 21:37:26 +04:00
Anton Mitrokhin
09ac9c5d00 12 runfails tracked in #20472 (LLVM trunk commit 213898 by Chandler) 2014-08-16 13:35:22 +04:00
Anton Mitrokhin
f8b6ca2649 ptr-22 test fixed on knc. removing from fail_db.txt 2014-08-16 13:05:24 +04:00
Dmitry Babokin
f5b373b818 Merge pull request #846 from ncos/knc_faildb
Reverted fail_db modifications after LLVM sse4 fix - see LLVM bug 20540.
2014-08-15 18:03:47 +04:00
Anton Mitrokhin
fbfcc28753 Revert "added 3.6 sse4-i16x8 compfails to fail_db"
This reverts commit e3c476112c.
2014-08-15 17:48:16 +04:00
Anton Mitrokhin
9e84c0ef0c Revert "add 12 runfails first appeared in LLVM 3.5 trunk on 26/07/14. these are now in trunk 3.6. erroneous commit 213898"
This reverts commit a9954dd8e7.
2014-08-15 17:47:51 +04:00
Anton Mitrokhin
88f39db2eb Revert "compfails also appeared after LLVM commit 214628 (tracker id 20540)"
This reverts commit 68577ae8a3.
2014-08-15 17:47:25 +04:00
Dmitry Babokin
1aaf623138 Merge pull request #845 from ncos/knc-fails
Fixed 64 bit gather/scatter functions in knc.h and knc-i1x16.h (fixes stability problem on ptr-22.ispc at -O0, which appears as a segfault from time to time)
2014-08-15 16:21:55 +04:00
Anton Mitrokhin
77dc94ab22 undefined printf functions in knc-i1x16.h 2014-08-15 16:18:40 +04:00
Anton Mitrokhin
bd8d02527b removed ugly INT32_MIN define (included limits.h) and updated the copyright 2014-08-15 16:18:29 +04:00
Anton Mitrokhin
7adacf5a7b 64 bit gather/scatter fix for knc.h 2014-08-15 16:04:31 +04:00
Anton Mitrokhin
6b5b547e2f modified all gather/scatter instructions in 'knc-i1x16.h' 2014-08-15 16:04:31 +04:00
Anton Mitrokhin
9c9c77d2db changes in __scatter_base_offsets64_float 2014-08-15 16:04:31 +04:00
Dmitry Babokin
b56dd9df21 Merge pull request #844 from aguskov/build_flags
- build fix on Windows for 3.5 and 3.6
- ispc flags in examples in Debug mode were switched to -O0 -g from -O2.
2014-08-13 16:14:18 +04:00
Andrey Guskov
22b7f548ad made ISPC + LLVM-trunk compile on Windows 2014-08-13 15:46:56 +04:00
Andrey Guskov
c2a75231d6 changed examples' debug flags for Windows: added -g, -O2 switched to -O0 2014-08-13 15:11:20 +04:00
Dmitry Babokin
dd4f9c86c0 Merge pull request #843 from ncos/knc_faildb
compfails also appeared after LLVM commit 214628 (tracker id 20540)
2014-08-13 14:12:07 +04:00
Anton Mitrokhin
68577ae8a3 compfails also appeared after LLVM commit 214628 (tracker id 20540) 2014-08-13 14:07:34 +04:00
Dmitry Babokin
4a963315e5 Merge pull request #842 from ncos/knc_faildb
add 12 runfails first appeared in LLVM 3.5 trunk on 26/07/14
2014-08-13 13:48:41 +04:00
Anton Mitrokhin
a9954dd8e7 add 12 runfails first appeared in LLVM 3.5 trunk on 26/07/14. these are now in trunk 3.6. erroneous commit 213898 2014-08-13 13:14:17 +04:00
Dmitry Babokin
1588e8f7ca Merge pull request #841 from ncos/knc_faildb
added 3.6 sse4-i16x8 compfails to fail_db
2014-08-13 11:34:25 +04:00
Anton Mitrokhin
e3c476112c added 3.6 sse4-i16x8 compfails to fail_db 2014-08-13 09:43:03 +04:00
evghenii
8745888ce9 merged with master 2014-08-11 10:04:54 +02:00
Dmitry Babokin
7c9c2b25b0 Merge pull request #840 from ncos/knc_faildb
fail_db update for generic/3.5/clang
2014-08-11 12:01:21 +04:00
Anton Mitrokhin
6c622eeeee faildb update for generic on LLVM 3.5 with clang 2014-08-11 11:52:29 +04:00
Dmitry Babokin
9b053c5518 Merge pull request #839 from ncos/log_beautifier
'notify.log' trashing removed
2014-08-11 11:43:41 +04:00
Anton Mitrokhin
8a72e23b29 'notify.log' trashing removed 2014-08-11 11:11:30 +04:00
Dmitry Babokin
538d9c4dc8 Merge pull request #838 from aguskov/rt_thr_win
Fixed multithreading on Windows (alloy.py)
2014-08-07 19:56:00 +04:00
Andrey Guskov
f87ecf0573 added comments to the fix 2014-08-07 19:28:15 +04:00
Andrey Guskov
fbccf0f8b0 made multithreading work correctly on Windows by waiting on a queue instead of a thread join() loop 2014-08-06 20:23:54 +04:00
Dmitry Babokin
96d3363cb3 Merge pull request #837 from ncos/ispc_build_fails
Fix for a build fail after llvm commit 214781
2014-08-06 13:01:18 +04:00
Anton Mitrokhin
bfad04d648 fix for build fail after llvm commit 214781 2014-08-06 12:57:13 +04:00
Dmitry Babokin
8c3808c540 Merge pull request #836 from ncos/log_beautifier
Make e-mail messages look more beautiful
2014-08-06 12:48:10 +04:00
Anton Mitrokhin
d0f4ece58a merge with master 2014-08-06 12:44:47 +04:00
Dmitry Babokin
258ad709d4 Merge pull request #835 from ncos/alloy_cleanup
Removed several ugly constructions from alloy.py
2014-08-06 12:30:56 +04:00
Anton Mitrokhin
ab91a90734 some minor changes 2014-08-06 12:28:59 +04:00
Anton Mitrokhin
3ebc80ade6 fixed unresolved class use 2014-08-06 10:56:37 +04:00
Anton Mitrokhin
898a56c53b added fail counters to the top of the message body 2014-08-06 10:52:23 +04:00
Anton Mitrokhin
b2928df4f8 merge with master 2014-08-06 09:41:07 +04:00
Anton Mitrokhin
f98038a03c removed some dead code 2014-08-06 09:36:47 +04:00
Dmitry Babokin
dd07e2a457 Merge pull request #834 from ncos/log_beautifier
Make test fail counters count fails for each LLVM revision separately.
2014-08-05 17:03:32 +04:00
Anton Mitrokhin
3a2db12e77 Merge branch 'master' of https://github.com/ncos/ispc into log_beautifier 2014-08-05 16:19:55 +04:00
Anton Mitrokhin
c1f5a8d8ae made text look more beautiful 2014-08-05 16:19:16 +04:00
Anton Mitrokhin
6f3f4ff4a1 add separate test fail count for each LLVM revision 2014-08-05 16:18:11 +04:00
Dmitry Babokin
0571ca54f6 Merge pull request #833 from ncos/ispc_build_fails
Changed LLVM 3.5 repository path to http://llvm.org/svn/llvm-project/llvm/branches/release_35
2014-08-05 14:06:33 +04:00
Anton Mitrokhin
d44c4c05aa removed leading slash in a comment 2014-08-05 13:32:04 +04:00
Dmitry Babokin
00aa11f620 Merge pull request #832 from ncos/ispc-mail-tail
tail the run_tests.log (100 lines)
2014-08-05 12:46:09 +04:00
Anton Mitrokhin
02e08892b3 tail the run_tests.log (100 lines) 2014-08-05 12:39:46 +04:00
Anton Mitrokhin
b292cffa4a changed 3.5 repo path from tag to branch until rc2 is out 2014-08-05 12:31:08 +04:00
Dmitry Babokin
ef949e14bb Merge pull request #831 from ncos/remove-support-for-old-llvm
Remove support for llvm 3.1
2014-08-04 20:07:32 +04:00
Anton Mitrokhin
3dafbc4516 removed LLVM 3_1 from cbackend.cpp 2014-08-01 17:30:30 +04:00
Anton Mitrokhin
ee256cd3c6 Merge branch 'master' of https://github.com/ncos/ispc into remove-support-for-old-llvm 2014-08-01 17:28:14 +04:00
Anton Mitrokhin
10f00c6e4c removed LLVM 3.1 option from python scripts 2014-08-01 17:12:21 +04:00
Anton Mitrokhin
02e584d932 fixed broken Makefile (failed to build 3.5) 2014-08-01 17:06:34 +04:00
Anton Mitrokhin
d64e1bcd82 updated Makefile for the reversed LLVM_3_6 flag 2014-08-01 16:07:52 +04:00
Anton Mitrokhin
60fa76ccc1 reversed macros LLVM_3_6 to LLVM_3_5+ in .cpp and .h files 2014-08-01 15:40:48 +04:00
Dmitry Babokin
b51d9bde91 Merge pull request #830 from ncos/ispc_build_fails
removed 'llvm/Config/config.h' from cbackend.cpp build for LLVM 3.5+
2014-08-01 15:14:05 +04:00
Dmitry Babokin
290b8b9cc0 Merge pull request #829 from ncos/knc_faildb
update fail_db for 3.6
2014-08-01 15:12:35 +04:00
Anton Mitrokhin
d0c9b7c9b5 wiped out all LLVM 3.1 support 2014-08-01 14:54:08 +04:00
Anton Mitrokhin
10c6f840c4 removed 'llvm/Config/config.h' from cbackend.cpp build for LLVM 3.5+ 2014-08-01 12:30:57 +04:00
Anton Mitrokhin
ce7a00d174 update fail_db for 3.6 2014-08-01 12:21:50 +04:00
Dmitry Babokin
9189bbb7d3 Merge pull request #828 from ncos/ispc_build_fails
Add support for LLVM 3.6 build
2014-07-30 16:56:02 +04:00
Anton Mitrokhin
368d2f18f9 rewritten comment for util.m4 2014-07-30 16:43:41 +04:00
Anton Mitrokhin
7171701599 checked Makefile 'if' constructions, fixed ReleaseNotes.txt, added comments to util.m4 2014-07-30 16:25:39 +04:00
Anton Mitrokhin
c0fc9b7aca removed duplicates in Makefile 2014-07-30 15:15:15 +04:00
Anton Mitrokhin
f59a1db8b7 fixed 3.6 llvm build in type.cpp 2014-07-30 13:57:06 +04:00
Anton Mitrokhin
725be222ac added LLVM_3_6 var 2014-07-30 11:50:15 +04:00
Dmitry Babokin
e044db016d Merge pull request #827 from ncos/advanced-dump
Put the new code under try-catch
2014-07-22 12:47:02 +04:00
Anton Mitrokhin
b6592c796d fixed run_tests.py bug 2014-07-22 12:20:34 +04:00
Dmitry Babokin
21fb015069 Merge pull request #826 from ncos/advanced-dump
Advanced dump
2014-07-21 23:44:48 +04:00
Anton Mitrokhin
a051c9bbd8 tt_dump_read.py import bug fixed 2014-07-21 19:48:42 +04:00
Anton Mitrokhin
48e4373d28 addressed comments on the merge request 2014-07-21 19:36:35 +04:00
Anton Mitrokhin
fb82d37f09 add host name printing in stability.log and e-mail body 2014-07-21 16:18:40 +04:00
Anton Mitrokhin
ad3b66a178 renamed regression_read.py -> tt_read.py 2014-07-21 16:12:44 +04:00
Anton Mitrokhin
ece37ea4f5 regression_read.py improved 2014-07-21 16:11:27 +04:00
Dmitry Babokin
aff9427f64 Merge pull request #825 from ncos/stability_compiler
passing icpc compiler for knc target
2014-07-20 17:22:57 +04:00
Anton Mitrokhin
00d435f726 fixed typos, added command-line argument check to regression_read.py 2014-07-20 15:19:06 +04:00
Anton Mitrokhin
74160e3737 merged in the gcc_compile branch 2014-07-20 13:32:24 +04:00
Anton Mitrokhin
09ec2db6e9 add advanced test completion logging 2014-07-20 13:21:05 +04:00
Anton Mitrokhin
7c3a2435b3 passing icpc compiler for knc target 2014-07-19 14:49:49 +04:00
Dmitry Babokin
083e1db489 Merge pull request #824 from ncos/stability_compiler
pass g++ as a compiler to run_tests.py when ispc_build_compiler==gcc
2014-07-18 20:03:38 +04:00
Anton Mitrokhin
9dd1d67a39 pass g++ as a compiler to run_tests.py when ispc_build_compiler==gcc 2014-07-18 19:34:34 +04:00
Dmitry Babokin
edeeda6fa5 Merge pull request #823 from ncos/knc_faildb
updated fail_db.txt for knc target
2014-07-17 15:46:10 +04:00
Anton Mitrokhin
fd253d5e91 updated fail_db.txt for knc target 2014-07-17 15:38:58 +04:00
Dmitry Babokin
cb41a28391 Merge pull request #821 from ncos/ispc_gcc_compile_option
Add an alloy.py option to build ispc binary with g++
2014-07-16 15:14:30 +04:00
Anton Mitrokhin
21cb609423 changed 'ispc-compiler' to 'ispc-build-compiler' in error message 2014-07-16 15:12:39 +04:00
Anton Mitrokhin
111b10cd5b Merge branch 'master' of https://github.com/ncos/ispc into ispc_gcc_compile_option
Conflicts:
	alloy.py
2014-07-14 22:38:58 +04:00
Anton Mitrokhin
1461b09b5f '--ispc_compiler' changed to '--ispc_build_compiler', 'check_isa.cpp' now compiles with '--ispc_build_compiler' (g++ or clang), removed code duplication 2014-07-14 22:04:03 +04:00
Dmitry Babokin
c2d2efd37d Merge pull request #822 from dbabokin/dot2
Moving 3.4 build to LLVM 3.4.2 release.
2014-07-14 18:55:38 +04:00
Dmitry Babokin
ad67e3bb5d Moving 3.4 build to LLVM 3.4.2 release. 2014-07-14 18:53:43 +04:00
Dmitry Babokin
23b861b48c Merge pull request #820 from ncos/ispc-mail-tail
This will make 'alloy_build' shorter
2014-07-14 18:15:01 +04:00
Anton Mitrokhin
1cd5d72be4 fixed a bug 2014-07-14 17:31:06 +04:00
Anton Mitrokhin
ff2f3c896f add an alloy.py option to build ispc binary with g++ 2014-07-14 16:18:07 +04:00
Anton Mitrokhin
7cde453e36 add 'tail' capability to 'attach_mail_file' function; shortened the 'alloy_build_log.log' attachment to 100 lines in case of success and 400 in case of failure 2014-07-14 16:06:59 +04:00
Dmitry Babokin
6da9d43f69 Merge pull request #819 from ncos/Makefile-issue
Fixed Makefile target gcc to use g++ instead of clang
2014-07-11 19:09:04 +04:00
Anton Mitrokhin
d56d690d77 Fixed Makefile target gcc to use g++ instead of clang 2014-07-11 18:25:33 +04:00
Dmitry Babokin
d618670fd3 Merge pull request #818 from jbrodman/master
Add proper cast to eliminate size mismatch warning
2014-07-11 15:32:08 +04:00
jbrodman
d049746585 Add proper cast to eliminate size mismatch warning 2014-07-10 03:11:46 -07:00
Dmitry Babokin
108fa7b97c Merge pull request #817 from Vsevolod-Livinskij/hotfix
Fix for trunk 212388
2014-07-09 23:42:28 +04:00
Vsevolod Livinskiy
8b107ad67a Fix for trunk 212388+ 2014-07-09 23:19:27 +04:00
evghenii
d607b9cb31 added Licence for GPU Ocelot 2014-07-09 15:34:30 +02:00
Dmitry Babokin
20fb66fa31 Merge pull request #816 from Vsevolod-Livinskij/alloy_update
Alloy.py update
2014-07-09 15:13:54 +04:00
evghenii
b3c5a9c4d6 added #ifdef ISPC_NVPTX_ENABLED ... #endif guards 2014-07-09 12:32:18 +02:00
evghenii
44c74728bc removed file 2014-07-09 08:27:30 +02:00
evghenii
1290e8c4cf added copyright to examples 2014-07-09 08:26:54 +02:00
evghenii
ed12687837 added copyright 2014-07-09 08:08:53 +02:00
Vsevolod Livinskiy
7ecc9e0769 Some fixes in send function 2014-07-09 09:59:50 +04:00
Vsevolod Livinskiy
d890ccc92c Unique way to send e-mail 2014-07-09 09:59:50 +04:00
Vsevolod Livinskiy
97057ed7a1 Send mail with build error. 2014-07-09 09:56:00 +04:00
Vsevolod Livinskiy
5512fb6275 alloy.py determines the revision of llvm 2014-07-09 09:56:00 +04:00
evghenii
28c5326711 replaced the rest with symlinks 2014-07-09 07:50:55 +02:00
evghenii
38a35dfb9a first attempt at replacing with symlinks 2014-07-09 07:47:44 +02:00
Evghenii Gaburov
562cbef14e Merge pull request #1 from dbabokin/egaburov-nvptx_clean
Egaburov nvptx clean
2014-07-09 07:48:01 +02:00
Dmitry Babokin
48a616d8a5 Whitespace and copyright fixes in test_static*.cpp 2014-07-08 20:20:10 +04:00
Dmitry Babokin
d8e2fdf913 Whitespace and copyright fixes in examples. 2014-07-08 20:08:34 +04:00
evghenii
8894156df5 change documentation to remove llvm-3.2 dependency 2014-07-08 15:25:22 +02:00
evghenii
c117c49dc9 Merge branch 'nvptx_ll' into nvptx 2014-07-08 15:12:18 +02:00
evghenii
6fdf1a8f99 Merge branch 'master' into nvptx 2014-07-08 15:11:45 +02:00
evghenii
2dbb4d9890 remove dependence on llvm-dis from 3.2 2014-07-08 15:11:13 +02:00
Dmitry Babokin
1459207d87 Merge pull request #815 from egaburov/master
fix for cpuFromIsa
2014-07-08 15:40:33 +04:00
evghenii
fe150c539f fix for exclusive_scan_and 2014-07-08 13:33:04 +02:00
Evghenii
4a2fe338ef fix for cpuFromIsa 2014-07-08 13:10:16 +02:00
Dmitry Babokin
6be1ce40bb Merge pull request #814 from ncos/master
Added KNC test target support for alloy.py (--only-targets='knc')
2014-07-08 15:09:04 +04:00
Anton Mitrokhin
4b3b293ba7 ... != Windows -> ... == Linux in knc target generation 2014-07-08 13:50:24 +04:00
Anton Mitrokhin
502acf97e7 Fix a bug with alloy.py constantly throwing an exception 2014-07-08 12:20:25 +04:00
evghenii
3459c75fbc PTX documentation. first commit 2014-07-08 09:21:20 +02:00
evghenii
1fc75ed494 started to work on documentation 2014-07-08 08:41:29 +02:00
evghenii
2ed65c8b16 resolved conflict with run_test.py 2014-07-08 08:20:41 +02:00
Anton Mitrokhin
c6ef1cab79 Added KNC test target support for alloy.py (--only-targets='knc') 2014-07-08 10:15:08 +04:00
evghenii
0a2aede9ab added notes from April 22 2014-07-08 08:13:29 +02:00
evghenii
6f3e6f368a added cpuFromIsa to nvptx target 2014-07-08 07:53:50 +02:00
Dmitry Babokin
182f2e83f4 Merge pull request #812 from Vsevolod-Livinskij/run_tests_update
run_test.py support knc target
2014-07-07 19:08:40 +04:00
evghenii
69f3898a61 Merge branch 'master' into nvptx_merge 2014-07-07 16:30:12 +02:00
Vsevolod Livinskiy
79bee49078 New option: save bin files 2014-07-07 12:38:38 +04:00
Vsevolod Livinskiy
af9f871b93 run_test.py support knc target 2014-07-04 17:32:32 +04:00
Dmitry Babokin
699674a054 Merge pull request #811 from ncos/master
Some minor fixes in alloy.py and exception handling in run_tests.py
2014-07-04 15:40:41 +04:00
Anton Mitrokhin
4dacd7e7a2 Added some basic test subprocess exception handling and remapped error messages to the e-mail 2014-07-04 15:19:45 +04:00
Anton Mitrokhin
c2d65f7ad2 Fixed multiple message sending and added a more verbose warning regarding inconsistent ISPC_HOME 2014-07-04 15:10:36 +04:00
Dmitry Babokin
6348ca8da9 Merge pull request #807 from ottxor/master
fix LLVM_VERSION for minor versions != 0
2014-06-26 11:52:41 +04:00
Christoph Junghans
1a8002cf65 fix LLVM_VERSION for minor versions != 0
llvm version 3.4.2 got converted to 3_4.2 and not 3_4 as intended.

see https://bugs.gentoo.org/show_bug.cgi?id=515114
2014-06-25 23:44:00 -06:00
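
For illustration, a minimal sketch of the intended mapping (a hypothetical C++ helper; the actual fix lives in the Makefile's version-string handling): keep only the major and minor components and join them with an underscore, so "3.4.2" becomes "3_4".

    #include <string>

    // Hypothetical helper, not ispc source: derive the LLVM_VERSION token
    // from a dotted version string, e.g. "3.4.2" -> "3_4", "3.4" -> "3_4".
    std::string llvmVersionToken(const std::string &version) {
        size_t second = version.find('.', version.find('.') + 1);
        std::string majorMinor =
            (second == std::string::npos) ? version : version.substr(0, second);
        for (char &c : majorMinor)
            if (c == '.')
                c = '_';
        return majorMinor;
    }
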
Dmitry Babokin
8d510fd82c Merge pull request #804 from ifilippov/master
support LLVM build
2014-06-18 18:30:40 +04:00
Ilia Filippov
76ea59b40b support LLVM build 2014-06-18 17:53:42 +04:00
Dmitry Babokin
5e14469ea6 Merge pull request #802 from ifilippov/avx2
Changing +/-feature regulation to CPU regulation
2014-06-11 13:05:51 +04:00
Ilia Filippov
425540922c changing +/-feature regulation to CPU regulation 2014-06-11 10:25:31 +04:00
jbrodman
11fcaa20ae Merge pull request #799 from motiz88/patch-1
Small fixes for TBB on Windows
2014-06-09 09:52:08 -07:00
Dmitry Babokin
daae225887 Merge pull request #801 from ifilippov/master
support LLVM trunk
2014-06-09 17:15:13 +04:00
Ilia Filippov
4ed72335bd support LLVM 2014-06-09 16:53:26 +04:00
motiz88
5da05b365f Small fixes for TBB on Windows
Changed an #ifdef ISPC_IS_WINDOWS in the definition of TaskInfo to #ifdef ISPC_USE_CONCRT, and fixed two calls to taskCount() that were missing parentheses.
2014-06-05 22:06:09 +03:00
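
A hedged sketch of the shape of that change, modeled loosely on ispc's example task system (only ISPC_USE_CONCRT, ISPC_IS_WINDOWS, TaskInfo, and taskCount() come from the commit message; the other members are illustrative):

    struct TaskInfo {
        int taskIndex;
        int taskCount3d[3];
        // The broken call sites named the function without parentheses;
        // a call must be spelled taskCount().
        int taskCount() const {
            return taskCount3d[0] * taskCount3d[1] * taskCount3d[2];
        }
    #ifdef ISPC_USE_CONCRT // was: #ifdef ISPC_IS_WINDOWS
        // ConcRT-specific members go here, so Windows builds that don't
        // use ConcRT no longer compile them in.
    #endif
    };
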
Dmitry Babokin
3f183cfd06 Merge pull request #795 from jbrodman/undefinedmemberstructs
Add error messages for structs containing nested undefined structs
2014-05-28 14:03:34 +04:00
jbrodman
d3144da5eb Add error messages for structs containing nested undefined structs 2014-05-27 15:50:53 -07:00
Dmitry Babokin
e6b6766c33 Merge pull request #794 from ifilippov/master
Deleting print from 'safe_for_all_mask_off' functions
2014-05-27 19:34:48 +04:00
Ilia Filippov
2b064b272a deleting print from 'safe_for_all_mask_off' functions 2014-05-27 19:24:21 +04:00
Dmitry Babokin
5ec6c81d38 Merge pull request #793 from ifilippov/master
Support LLVM trunk (build and errors)
2014-05-27 12:39:16 +04:00
Ilia Filippov
e6131bd6a9 fixing error for LLVM trunk 2014-05-22 18:51:25 +04:00
Ilia Filippov
5f55a9b9e2 support of LLVM trunk 2014-05-22 18:50:57 +04:00
Dmitry Babokin
68f62b1fc3 Merge pull request #786 from dbabokin/copyright
Template copyright update (for html generation)
2014-04-22 19:07:48 +04:00
Dmitry Babokin
0173d60790 Template copyright update (for html generation) 2014-04-18 22:50:53 +04:00
jbrodman
39cb13a2e4 Merge pull request #785 from dbabokin/version
Bumping version to 1.7.1dev
2014-04-18 10:06:37 -07:00
Dmitry Babokin
eb8e94627d Bumping version to 1.7.1dev 2014-04-18 20:44:00 +04:00
Dmitry Babokin
0eb9c2b576 Merge pull request #784 from dbabokin/v1.7.0
v1.7.0
2014-04-18 18:24:18 +04:00
Dmitry Babokin
77de0ac342 News update 2014-04-18 18:20:22 +04:00
Dmitry Babokin
a2774f2cf5 Release notes, docs update 2014-04-18 18:19:47 +04:00
Dmitry Babokin
118542badd Merge pull request #783 from dbabokin/alias
Removing alias phases causing segfaults (second instance of the phases).
2014-04-18 00:12:02 +04:00
Dmitry Babokin
d63a94300c v1.7.0 2014-04-18 00:11:44 +04:00
Dmitry Babokin
dcc37451e5 Removing alias phases causing segfaults 2014-04-17 23:52:32 +04:00
Dmitry Babokin
fce4ba64a1 Merge pull request #782 from dbabokin/mic_perf
Fixing MIC performance issue (generate <2,2,2,2> as 2 in cpp).
2014-04-17 21:24:40 +04:00
Dmitry Babokin
096546f888 Fixing MIC performance issue, which showed up when we switched to
LLVM 3.4 (due to more aggressive optimizations): a vector of *the same*
constants should be generated as a scalar value in the cpp file, instead of
__extract_element(splat(value), 0).
I.e. <2,2,2,2> should appear in the cpp as 2, not as
__extract_element(splat(2), 0);
2014-04-17 21:03:42 +04:00
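
For illustration, a hedged sketch of the check such an emitter needs (allLanesEqual is a hypothetical name; only the <2,2,2,2> -> 2 behavior comes from the commit above):

    #include <cstdint>

    // If every lane of a constant vector holds the same value, the C++
    // backend can print that scalar directly instead of the equivalent
    // but harder-to-optimize __extract_element(splat(value), 0) form.
    static bool allLanesEqual(const int32_t *lanes, int width) {
        for (int i = 1; i < width; ++i)
            if (lanes[i] != lanes[0])
                return false;
        return true;
    }
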
Dmitry Babokin
94467fdb70 Merge pull request #781 from dbabokin/master
Revert trigonometry to stdlib implementation on MIC
2014-04-14 19:37:53 +04:00
Dmitry Babokin
141ea81ba5 Revert trigonometry to stdlib implementation on MIC 2014-04-14 19:33:52 +04:00
Dmitry Babokin
d9c09e1b81 Merge pull request #779 from jbrodman/master
Guard for single inclusion
2014-04-10 20:17:24 +04:00
jbrodman
a8b03e768c 2014. 2014-04-10 01:13:46 -07:00
jbrodman
0cd53444a4 Merge branch 'master' of https://github.com/jbrodman/ispc
Conflicts:
	examples/util/util.isph
2014-04-10 01:11:14 -07:00
jbrodman
61970e1500 guard for single inclusion 2014-04-10 01:10:16 -07:00
jbrodman
1705b5a65e guard for single inclusion 2014-04-10 01:08:12 -07:00
Dmitry Babokin
c7281ef532 Merge pull request #777 from ifilippov/debug_info
Fix of debug_info
2014-04-02 16:04:50 +04:00
Ilia Filippov
7ebea86a44 These changes fix a problem with debug info in LLVM 3.4 with structs and enums.
The reason for the problem is that ISPC generates the debug-info type of a struct
(or enum) in the scope where a variable of this type appears.

Ctx.cpp:1586
llvm::DIScope scope = GetDIScope();
llvm::DIType diType = sym->type->GetDIType(scope);

This scope is always a Lexical_Block in some function. That is not right, because
a type can be declared globally, in some function, or in some namespace. Structs
and enums fail because they don't reduce to atomic types. The "Big" fix is to save
the declaration place in the Type class, but ISPC currently doesn't know about it;
for example, this doesn't emit an error:

void function_1() { struct Foo {float x;}; uniform Foo myFoo;}
void function_2() {                        uniform Foo myFoo;}

So for now all type declarations are global, and we can simply change the scope
parameter to the current file.

The "Big" fix will come after integration with clang.
2014-04-02 15:49:57 +04:00
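
Spelled out as a before/after on the snippet above (GetDIScope() and GetDIType() are from the commit message; GetDIFile() is a hypothetical accessor for the file scope):

    llvm::DIScope scope = GetDIFile();  // was: GetDIScope(), a lexical block
    llvm::DIType diType = sym->type->GetDIType(scope);
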
Dmitry Babokin
9bc9a5aa00 Merge pull request #776 from ifilippov/master
Removing winstuff include folder
2014-04-02 13:10:17 +04:00
Ilia Filippov
b1bf08c0d9 removing winstuff 2014-04-02 13:01:25 +04:00
Dmitry Babokin
372e7d42f9 Merge pull request #775 from ifilippov/alias_new
Support LLVM trunk after r204934 and zlib commits
2014-04-01 19:27:18 +04:00
Ilia Filippov
114f58bb0b support LLVM trunk after r204934 and zlib commits 2014-04-01 18:51:05 +04:00
Dmitry Babokin
f7d0158bac Merge pull request #774 from ifilippov/alias_new
Adding warning about LLVM_HOME in Makefile
2014-04-01 17:06:37 +04:00
Ilia Filippov
cc8bae2f2c Adding warning about LLVM_HOME in Makefile 2014-04-01 16:46:26 +04:00
Evghenii
fb581818c5 Merge branch 'master' into nvptx 2014-04-01 09:10:10 +02:00
Dmitry Babokin
5a16b3eb10 Merge pull request #773 from ifilippov/alias_new
Adding functions' inline attribute when we generate DebugInfo
2014-03-26 15:37:01 +03:00
Ilia Filippov
61ac03fc08 Adding functions' inline attribute when we generate DebugInfo 2014-03-26 16:27:47 +04:00
Dmitry Babokin
84073b9bf1 Merge pull request #772 from ifilippov/alias_new
Changing overload rules to match C++ behavior
2014-03-25 13:19:06 +03:00
Ilia Filippov
ecdc695b22 Changing overload rules to match C++ behavior: Emit a warning when the best overload match has some number of non-best-matching parameters. 2014-03-25 12:41:09 +04:00
Dmitry Babokin
809a8f065c Merge pull request #771 from jbrodman/nomosoa3
Fix exported varying bug & backwards compatibility.
2014-03-24 18:19:48 +03:00
Dmitry Babokin
a8aabd78d2 Merge pull request #770 from ifilippov/alias_new
Fix in overloaded function selection logic.
2014-03-24 15:01:41 +03:00
Ilia Filippov
6f44d5b55f correction of overload issues 2014-03-24 15:47:21 +04:00
jbrodman
2c0a6d7f69 Fix exported varying bug & backwards compatibility. 2014-03-24 00:01:37 -07:00
Evghenii
4c837d94a9 replace @llvm.trap with call void asm "trap;", ""() 2014-03-19 16:35:14 +01:00
Evghenii
599624d962 fix for nvptx target 2014-03-19 11:25:33 +01:00
Evghenii
4641a15287 Merge branch 'master' into nvptx 2014-03-19 10:53:07 +01:00
Dmitry Babokin
792f04881c Merge pull request #769 from dbabokin/fails
fail_db.txt update on Linux
2014-03-18 22:08:22 +03:00
Dmitry Babokin
96fe6e4fb6 fail_db.txt update on Linux 2014-03-18 23:01:31 +04:00
Dmitry Babokin
f337401484 Merge pull request #768 from ifilippov/alias_new
Adding "const" attribute to "Atomic::Void" type
2014-03-17 14:22:39 +03:00
Ilia Filippov
02d55f24f6 adding const to Atomic::Void type 2014-03-17 14:42:55 +04:00
Dmitry Babokin
11f17ce0cd Merge pull request #767 from jbrodman/nomosoa2
Fix bugs with exported varyings.
2014-03-14 09:00:36 +03:00
jbrodman
43db682c6d Fix bugs with exported varyings. 2014-03-13 06:07:56 -07:00
Dmitry Babokin
2e5b0b7c07 Merge pull request #766 from dbabokin/saturated
Some editorial changes: copyright, etc.
2014-03-12 19:28:37 +03:00
Dmitry Babokin
ad9a0ac3df Merge pull request #765 from ifilippov/max_min
Resolving an issue with Debug Info metadata after LLVM_3_4
2014-03-12 19:24:32 +03:00
Dmitry Babokin
31b95b665b Copyright update 2014-03-12 20:19:16 +04:00
Ilia Filippov
27132e42e9 resolving an issue with Debug Info metadata after LLVM_3_4 2014-03-12 19:56:05 +04:00
Dmitry Babokin
8f8a9d89ef Removing trailing spaces in stdlib.ispc 2014-03-12 19:43:30 +04:00
Dmitry Babokin
f0ce2acc4f Copyright update, VS2010->VS2012 update, small fix in saturated arithmetic description 2014-03-12 19:42:08 +04:00
Dmitry Babokin
1c0729df59 Clarifying comment on new functions with saturated arithmetic 2014-03-12 19:40:52 +04:00
Dmitry Babokin
3f70585463 Merge pull request #764 from ifilippov/max_min
Support LLVM trunk after 203559 203213 and 203381 revisions
2014-03-12 14:09:45 +03:00
Ilia Filippov
ead5cc741d support LLVM trunk after 203559 203213 and 203381 revisions 2014-03-12 12:58:50 +04:00
Ilia Filippov
e1524891fc Merge pull request #751 from Vsevolod-Livinskij/master
Saturating multiplication for int64 was added.
2014-03-12 00:12:34 -07:00
jbrodman
e6a23a5932 Merge pull request #763 from dbabokin/debug_info
Fix for off-by-one problem in debug info with LLVM 3.3+
2014-03-11 09:18:41 -07:00
Dmitry Babokin
8999a69546 Fix for off-by-one problem in debug info with LLVM 3.3+ 2014-03-11 18:41:37 +04:00
Vsevolod Livinskij
dc00b4dd64 Undefined operation -INT64_MIN was fixed. 2014-03-08 20:11:04 +04:00
Ilia Filippov
fa8776cfa2 Merge pull request #762 from ifilippov/max_min
Support LLVM trunk
2014-03-07 04:39:06 -08:00
Ilia Filippov
47f7900cd3 support LLVM trunk 2014-03-07 16:28:56 +04:00
Dmitry Babokin
7d4f58f268 Merge pull request #761 from ifilippov/max_min
Changing uniform_min and uniform_max implementations for avx targets
2014-03-06 14:24:04 +03:00
Ilia Filippov
6738af0a0c changing uniform_min and uniform_max implementations for avx targets 2014-03-06 12:05:24 +04:00
Evghenii
335d36211c +1 2014-03-05 16:16:51 +01:00
Dmitry Babokin
d49c63ec3d Merge pull request #760 from ifilippov/master
support LLVM trunk after 202814-202842 revisions
2014-03-05 14:02:28 +03:00
Evghenii
644118cd17 runs on GPU but needs further tuning 2014-03-05 11:49:15 +01:00
Evghenii
f086b7ff9b added tasking 2014-03-05 11:06:53 +01:00
Evghenii
2a77804739 first commit 2014-03-05 09:56:40 +01:00
Ilia Filippov
9ab8f4e10e support LLVM trunk after 202814-202842 revisions 2014-03-05 10:12:30 +04:00
Vsevolod Livinskij
2e2fd394bf Documentation for saturating arithmetic was added. 2014-03-05 01:30:16 +04:00
Dmitry Babokin
eb7401e526 Merge pull request #759 from ifilippov/master
Supporting VS2012 for all examples
2014-03-04 18:33:01 +03:00
Ilia Filippov
4d05ec0e1e supporting VS2012 for all examples 2014-03-04 18:19:25 +04:00
Dmitry Babokin
1043ffae0a Merge pull request #758 from ifilippov/master
Support LLVM trunk
2014-03-04 15:14:52 +03:00
Ilia Filippov
c017e46820 support LLVM trunk after r202736 revision 2014-03-04 16:02:25 +04:00
Ilia Filippov
38ce3f368c support LLVM trunk after r202720 revision 2014-03-04 16:02:01 +04:00
Ilia Filippov
c4e35050b0 support of building with C++11 2014-03-04 16:01:18 +04:00
Dmitry Babokin
57c3c803d9 Merge pull request #757 from dbabokin/VS2012
VS2012 related updates
2014-02-28 19:14:43 +03:00
Vsevolod Livinskij
c2e05e2231 Algorithm was modified and division was changed to bit operations. 2014-02-28 20:06:46 +04:00
Dmitry Babokin
c9642aae86 Update of fail DB with results on Windows with VS2012 (3.4 and trunk). Excluding 3.3 results (too many fails) 2014-02-28 20:05:45 +04:00
Dmitry Babokin
9ef9f0bf32 Migrating VS solution files to VS2012 2014-02-28 20:01:34 +04:00
Dmitry Babokin
798b6b202c Merge pull request #756 from jbrodman/memdocfix
Update docs to warn of the difference between sizeof(float) and sizeof(uniform float)
2014-02-28 08:42:33 +03:00
jbrodman
5d58a814aa Merge pull request #744 from dbabokin/docs
Use icpc for KNC, not icc
2014-02-27 10:51:35 -08:00
jbrodman
fbb34b3f3a Merge pull request #747 from dbabokin/knc_extract_element
Knc.h fix
2014-02-27 10:51:17 -08:00
jbrodman
91621d7b17 Update docs to warn of the difference between sizeof(float) and sizeof(uniform float) 2014-02-27 02:38:57 -08:00
Dmitry Babokin
14718d8bbf Merge pull request #754 from ifilippov/master
Support LLVM trunk after r202168 r202190 revisions
2014-02-26 16:19:42 +03:00
Ilia Filippov
06c06456c4 support LLVM trunk after r202168 r202190 revisions 2014-02-26 17:06:58 +04:00
Dmitry Babokin
e95dfc7247 Merge pull request #753 from ifilippov/master
Supporting LLVM trunk after r202052 revision
2014-02-25 13:33:50 +03:00
Ilia Filippov
77e4564020 supporting LLVM trunk after r202052 revision 2014-02-25 14:25:22 +04:00
Vsevolod Livinskij
af836cda27 Saturating multiplication for int64 was added. 2014-02-23 19:48:03 +04:00
Dmitry Babokin
f0a7baf340 Remove conflicting __extract_element(__vec16_i64 ..., ...) 2014-02-22 01:10:55 +04:00
Evghenii
ed0f44f841 added .gitignore for ptxtools 2014-02-21 14:26:07 +01:00
evghenii
1a1dbdb476 runs on knc as well 2014-02-21 14:22:34 +01:00
Evghenii
259bb81b1e added softening 2014-02-21 14:18:29 +01:00
Evghenii
f09c74db21 added nbody_hermite4 2014-02-21 14:07:51 +01:00
Evghenii
42c4d3246c Merge branch 'master' into nvptx_clean 2014-02-21 12:45:01 +01:00
Evghenii
739e4e73ef make static illegal 2014-02-21 12:08:09 +01:00
Dmitry Babokin
1f2541fbec Merge pull request #748 from ifilippov/master
Revert "support of LLVM trunk after 201618 revision 'VirtualFileSystem'"
2014-02-21 13:43:18 +03:00
Ilia Filippov
95140d9d9f Revert "support of LLVM trunk after 201618 revision 'VirtualFileSystem'"
This reverts commit 7fcf408189.
2014-02-21 14:26:57 +04:00
Evghenii
b60d77c154 use __nv_* libcalls for rcp/sqrt/rsqrt 2014-02-21 10:36:46 +01:00
Dmitry Babokin
e8680760bf Merge pull request #741 from Vsevolod-Livinskij/master
Saturation arithmetic.
2014-02-21 12:30:58 +03:00
Dmitry Babokin
17d8047a93 Merge pull request #746 from ifilippov/master
Adding cases of 'cast' instructions in optimizations
2014-02-21 12:25:58 +03:00
Evghenii
a745aaf91c +1 2014-02-21 10:19:58 +01:00
Evghenii
8acfd8ea02 merged with master 2014-02-21 10:11:04 +01:00
Ilia Filippov
42e00ebb24 adding cases of 'cast' instructions in optimizations 2014-02-21 13:00:16 +04:00
evghenii
1ed4142fdd added Makefiles for knc 2014-02-21 09:37:07 +01:00
Evghenii
558d8182db used tuned omp version of tasksys for portable examples 2014-02-21 09:21:38 +01:00
Dmitry Babokin
5794d18737 Merge pull request #745 from egaburov/native_trigonometry
added transcendentals/trigonometry to builtins
2014-02-21 11:15:08 +03:00
Evghenii
6a52717184 options example 2014-02-21 09:07:45 +01:00
Evghenii
3fc95779ac added rt 2014-02-21 09:00:22 +01:00
Evghenii
a85972ffbc added volume rendering example 2014-02-21 08:51:33 +01:00
Evghenii
ee8e2b76e2 added deferred 2014-02-21 08:43:19 +01:00
Evghenii
71758e9186 added builtins for transcendentals 2014-02-21 08:34:30 +01:00
Evghenii
ac05de6835 merged with master 2014-02-21 08:25:28 +01:00
evghenii
e2d68e6119 added transcendentals/trigonometry to builtins 2014-02-21 08:17:40 +01:00
Vsevolod Livinskij
7dd7020c5f Decimal constants were replaced with hex constants. 2014-02-20 22:57:24 +04:00
Dmitry Babokin
788098b55f Use icpc for KNC, not icc 2014-02-20 20:36:39 +04:00
Dmitry Babokin
f280b32fa4 Merge pull request #736 from egaburov/native_trigonometry
Native trigonometry
2014-02-20 19:18:35 +03:00
Evghenii
690a8acb30 merged with master 2014-02-20 15:22:09 +01:00
Evghenii
7d3c248a05 added ifdefs 2014-02-20 14:46:09 +01:00
evghenii
94c0104ffa some tuning 2014-02-20 14:38:25 +01:00
evghenii
e20a8f7534 +1 2014-02-20 14:27:21 +01:00
evghenii
226bf34803 added knc Makefiles 2014-02-20 14:00:28 +01:00
Evghenii
607e010874 added mergeSort 2014-02-20 13:58:55 +01:00
evghenii
a169ff636b +added knc compilation path 2014-02-20 13:54:50 +01:00
Evghenii
463ddf1707 added radixSort 2014-02-20 13:49:58 +01:00
evghenii
0c0655c0f2 added knc compilation 2014-02-20 13:22:21 +01:00
Evghenii
93849370b1 added cpu compilation path 2014-02-20 12:57:50 +01:00
Evghenii
fc7cefcf19 adding portable examples 2014-02-20 12:43:24 +01:00
Evghenii
d4c6209aa0 added computation of elapsed msec 2014-02-20 12:30:51 +01:00
Evghenii
d6b04c58e8 moving helpers to examples/util 2014-02-20 12:27:23 +01:00
Evghenii
04118d4965 +fix for .b0 in tests 2014-02-20 11:56:21 +01:00
Evghenii
5ca952bd9a some code cleanup 2014-02-20 11:31:48 +01:00
Evghenii
24e1a98275 compiles 2014-02-20 11:20:13 +01:00
Evghenii
a8c5da0ae0 adjusted Makefile 2014-02-20 11:04:11 +01:00
Evghenii
4196c723eb merged with nvptx 2014-02-20 11:01:58 +01:00
Evghenii
11612a24ee +added additional usage info 2014-02-20 10:53:23 +01:00
Evghenii
c54a91eab3 added ptxgen 2014-02-20 10:50:17 +01:00
Evghenii
dea856b7e3 added ptxgen 2014-02-20 08:52:37 +01:00
Evghenii
01cbe3d289 changed to Makefile for gpu 2014-02-20 08:45:52 +01:00
Evghenii
ce6ca49d21 adding ptxtools 2014-02-20 08:18:18 +01:00
Dmitry Babokin
c3b041f7ae Merge pull request #742 from ifilippov/master
Support of LLVM trunk after 201618 revision 'VirtualFileSystem'
2014-02-19 18:34:45 +03:00
Dmitry Babokin
aaf2684988 Merge pull request #743 from dbabokin/fails
Fail DB update on Linux and Windows
2014-02-19 18:33:26 +03:00
Ilia Filippov
7fcf408189 support of LLVM trunk after 201618 revision 'VirtualFileSystem' 2014-02-19 19:02:56 +04:00
evghenii
c325c0e085 added layout 2014-02-19 13:40:06 +01:00
Evghenii
67be0a85c0 use rand instead of drand48 2014-02-19 12:52:12 +01:00
Evghenii
3ec910ff85 +1 2014-02-19 12:50:54 +01:00
Dmitry Babokin
f387d12872 Use single variable to track the latest LLVM to use by default 2014-02-19 15:47:04 +04:00
Dmitry Babokin
1fec6a5556 Merging branches with Windows and Linux fails 2014-02-19 15:40:59 +04:00
Evghenii
946d2c17c8 use time for seed value 2014-02-19 11:57:21 +01:00
Evghenii
c513493757 +1 2014-02-19 11:51:14 +01:00
Evghenii
869379020d some filename changes 2014-02-19 11:44:08 +01:00
Dmitry Babokin
f3bb3dcfc2 Fail DB update on Windows 2014-02-19 14:40:02 +04:00
Dmitry Babokin
681898ed1e Add VS2012 as a default toolchain on Windows 2014-02-19 14:39:14 +04:00
Evghenii
a10be8087f +1 2014-02-19 11:38:15 +01:00
Evghenii
4d61c04e5c added working ptxc file 2014-02-19 11:26:30 +01:00
Evghenii
07fe1c5659 added anyoption 2014-02-19 09:28:31 +01:00
Dmitry Babokin
12b4345672 Fail DB update on Linux 2014-02-19 12:19:55 +04:00
Dmitry Babokin
6b50bd43f7 Default alloy run: 3.4+trunk, instead of 3.3+trunk 2014-02-19 12:19:22 +04:00
Vsevolod Livinskij
735e6a8ab3 Saturation arithmetic mul and div for int8/int16/int32 and div for int64 were added 2014-02-18 02:07:13 +04:00
Vsevolod Livinskij
f5508db24f Saturation arithmetic (sub and add) was added for int32/int64. 2014-02-17 18:55:40 +04:00
Dmitry Babokin
04fda2fcbe Merge pull request #739 from ifilippov/fluky
Fix for flaky problem 'argument out of range'
2014-02-13 16:13:32 +03:00
Ilia Filippov
e7b3a1c822 fix for flaky problem 'argument out of range' 2014-02-13 16:47:33 +04:00
Dmitry Babokin
ccbcd0a80d Merge pull request #738 from ifilippov/err
Set LLVM version 3.4 in alloy.py, perf.py correction
2014-02-13 14:24:31 +03:00
Ilia Filippov
54b991cfcb set LLVM version 3.4 in alloy.py, perf.py correction 2014-02-13 15:14:04 +04:00
Dmitry Babokin
b719019b26 Merge pull request #737 from ifilippov/err
Adding patch for LLVM 3.4 for bug #712
2014-02-13 13:33:31 +03:00
Ilia Filippov
cc81cd3215 adding patch for LLVM 3.4 for bug #712 2014-02-13 12:57:05 +04:00
Dmitry Babokin
e8039cd822 Merge pull request #673 from Vsevolod-Livinskij/master
Saturation arithmetic.
2014-02-11 16:40:40 +03:00
Vsevolod
a3c77e6dc6 Merge pull request #1 from dbabokin/Vsevolod-Livinskij-master
Fix for generic-1
2014-02-11 16:35:41 +03:00
Dmitry Babokin
ea0a514e03 Fix for generic-1 2014-02-11 15:33:23 +04:00
evghenii
193bba77b0 accuracy fix 2014-02-11 11:49:03 +01:00
Evghenii
f0779f95a3 added double precision tests 2014-02-11 11:40:40 +01:00
Vsevolod Livinskij
65d947e449 Else branch with error report was added 2014-02-10 15:18:48 +04:00
Vsevolod Livinskij
cef5b2eb04 Some changes in saturation arithmetic 2014-02-10 12:40:53 +04:00
Vsevolod Livinskij
1c1614d207 Some errors in comments and code were fixed 2014-02-09 21:39:42 +04:00
evghenii
8490efe0ad fix for knc.h. Due to a bug in ICC (tested with 13.1.3 & 14.0.1), the resulting .cpp file fails to compile 2014-02-07 16:00:21 +01:00
evghenii
438cee4e21 added support for double precision/native transcendentals/trigonometry 2014-02-07 15:43:42 +01:00
Evghenii
70a9b286e5 added support for native and double precision trigonometry/transcendentals 2014-02-07 15:28:39 +01:00
Evghenii
81aa19a8f0 added use of native_transcendentals, need to add IR 2014-02-07 11:49:24 +01:00
Evghenii
668645fcda first commit 2014-02-07 11:05:36 +01:00
Evghenii
14e76108cb optimization for _all 2014-02-06 14:24:50 +01:00
Evghenii
9ecb4f4ac8 added tunings for aobench 2014-02-06 10:13:18 +01:00
Evghenii
9e1ab7c6b6 allow adding ISPC_FLAGS 2014-02-06 10:13:01 +01:00
Evghenii
8ffa84f875 added some #ifdef .. #endif for control flow tests 2014-02-06 10:12:31 +01:00
Evghenii
c8e92feb14 added additional optimization passes for PTX target 2014-02-06 10:11:58 +01:00
Evghenii
c23dd8a951 fixed __puts_nvptx 2014-02-05 17:48:04 +01:00
Evghenii
7b2ceba128 added "internal" for helper functions to avoid them being exported to PTX 2014-02-05 17:02:05 +01:00
Dmitry Babokin
2570385770 Merge pull request #730 from egaburov/double_math
Added double precision support for reciprocals: rsqrt rcp
2014-02-05 17:57:39 +03:00
Evghenii
aeb2f01a15 some performance fix. it works, but I have no idea why. checkpoint 2014-02-05 15:36:06 +01:00
evghenii
c59cff396d added {rsqrt,rcp}d support for knc.h. test-147.ispc & test-148.ispc pass. 2014-02-05 13:55:38 +01:00
evghenii
ecc9c88ff8 fix packed_store_active2 for knc-i1x8.h 2014-02-05 13:52:24 +01:00
Evghenii
eb01ffd4e6 first commit for {rsqrt,rcp}d knc support. going to test on other node now 2014-02-05 13:43:07 +01:00
Evghenii
f225b558ec added {rsqrt,rcp}d support for sse4.h 2014-02-05 13:42:45 +01:00
Evghenii
688d9c9a82 added support for rsqrtd/rcpd for generic-*.h 2014-02-05 13:20:44 +01:00
evghenii
09e8381ec7 change {rsqrt,rcp}_double to {rsqrt,rcp}d_decl 2014-02-05 13:05:04 +01:00
evghenii
732a315a4b removed __declspec(safe) duplicate 2014-02-05 13:04:45 +01:00
Evghenii
686c1d676d improvements 2014-02-05 12:04:36 +01:00
Evghenii
048da693c5 fix sqrt 2014-02-05 10:52:08 +01:00
Dmitry Babokin
9a3b949687 Merge pull request #734 from dbabokin/run_test_compiler
Update list of accepted system compilers in run_test.py
2014-02-05 12:40:07 +03:00
Dmitry Babokin
40186d3813 Update list of accepted system compilers in run_test.py 2014-02-05 13:39:28 +04:00
Dmitry Babokin
66c986ba13 Merge pull request #733 from jbrodman/master
Modify alloy.py to put dbg llvm builds in different folders. Disallow initializing void * with ptr to const. (#731)
2014-02-05 11:32:25 +03:00
jbrodman
98cfc17843 Fix bug with printing due to uneven handling of bool types 2014-02-04 08:12:02 -08:00
Evghenii
d3a6693eef adding __have_native_{rsqrtd,rcpd} to select between native support for double precision reciprocals and using slower but safe version in stdlib 2014-02-04 16:29:23 +01:00
Evghenii
fe98fe8cdc added fast approximate rcp(double) accurate to 15 digits 2014-02-04 15:23:34 +01:00
Evghenii
eb1a495a7a added support for fast approximate rsqrt(double). Provides 16-digit accuracy but is over 3x faster than 1/sqrt(double) 2014-02-04 14:44:54 +01:00
jbrodman
720975dff4 Disallow initializing void * with ptr to const. 2014-02-04 03:36:19 -08:00
jbrodman
4ee0e6996a Merge branch 'master' of https://github.com/ispc/ispc 2014-02-04 02:48:41 -08:00
jbrodman
47bdca1041 Modify alloy.py to put dbg llvm builds in different folders. 2014-02-04 02:46:07 -08:00
Evghenii
c2ed214a74 added declaration for movmsk_ptx 2014-02-03 08:57:27 +01:00
Evghenii
1a56fbc101 +1 2014-02-03 08:51:55 +01:00
Evghenii
98c82242c5 allowed static and disabled memcpy/memmove/memset operations 2014-02-03 08:02:50 +01:00
Evghenii
e6a6df1052 +1 2014-02-02 19:04:26 +01:00
Evghenii
6d034596d3 +1 2014-02-02 19:01:10 +01:00
Evghenii
92e69c8197 +improved support for dp 2014-02-02 18:51:17 +01:00
Evghenii
b0753dc93d added double-version for rcp 2014-02-02 18:20:05 +01:00
Evghenii
4515dd5c89 added tests for rcp/rsqrt double 2014-02-02 18:19:56 +01:00
evghenii
3a72e05c3e +1 2014-02-02 18:16:48 +01:00
Evghenii
796942e6fa added type definition for real 2014-02-02 12:07:53 +01:00
Evghenii
522257343b added type definition for real 2014-02-02 12:07:26 +01:00
Evghenii
3594a80e04 some scheduler changes and stuff 2014-02-02 11:12:46 +01:00
Evghenii
9cc204e83a added ISPC_HOME path 2014-02-01 12:04:17 +01:00
Evghenii
17ab194031 added -G to nvptxcc 2014-02-01 11:58:32 +01:00
Evghenii
6bf2ad27d1 merge with master 2014-02-01 11:53:57 +01:00
Evghenii
0c47a902f5 added llvm compilation 2014-01-31 20:20:46 +01:00
Evghenii
b340663cdf fixed core count 2014-01-31 20:16:17 +01:00
Evghenii
98a275a382 fixed 2014-01-31 20:13:10 +01:00
Evghenii
11bc35eb6c +1 2014-01-31 20:10:47 +01:00
Evghenii
7ef2f7352c +1 2014-01-31 19:57:43 +01:00
Evghenii
eb82195ad7 some tuning 2014-01-31 19:53:52 +01:00
Dmitry Babokin
93cf02842c Merge pull request #729 from dbabokin/win_runtest
Fix for incorrect target list on Windows (alloy.py)
2014-01-31 09:15:46 -08:00
Dmitry Babokin
90ba3fddbc Fix for incorrect target list on Windows 2014-01-31 21:11:33 +04:00
Evghenii
bead800c13 +1 2014-01-31 17:49:39 +01:00
Dmitry Babokin
45d960f4f5 Merge pull request #728 from dbabokin/win_runtest
workaround for tmp folders not being removed on Windows
2014-01-31 08:11:47 -08:00
Dmitry Babokin
ac0963a0a5 workaround for tmp folders not being removed on Windows 2014-01-31 20:11:02 +04:00
Evghenii
5cf880e8fc works 2014-01-31 16:50:08 +01:00
Evghenii
86a6cfc1d0 problem solved 2014-01-31 10:36:30 +01:00
Evghenii
33e19d3bec 1 2014-01-30 20:02:26 +01:00
Evghenii
eb3277587a +1 2014-01-30 20:01:34 +01:00
Evghenii
adef91d82d +1 2014-01-30 17:59:21 +01:00
Evghenii
50c505b845 first commit 2014-01-30 15:01:02 +01:00
Evghenii
cedf4bccf4 +1 2014-01-30 14:49:38 +01:00
Evghenii
5e37370618 ispc generates slooow code .. 2014-01-30 14:47:18 +01:00
Evghenii
f90688b089 added driver 2014-01-30 14:17:09 +01:00
Evghenii
0a83f9ab6e changes to makefile 2014-01-30 14:17:00 +01:00
Evghenii
b79915c0c2 changes to makefile 2014-01-30 14:04:29 +01:00
Evghenii
a918bda679 first commit 2014-01-30 14:04:21 +01:00
Dmitry Babokin
1ba54e3b65 Merge pull request #727 from dbabokin/fails
Update fail_db on Linux - now use clang++3.4 as system compiler
2014-01-30 03:40:48 -08:00
Evghenii
92bb233668 some tuning 2014-01-30 12:19:54 +01:00
Evghenii
420a8c986d preprocessing bugfix 2014-01-30 11:51:12 +01:00
Evghenii
900e71e5cc preprocessing bugfix 2014-01-30 11:50:52 +01:00
Evghenii
97971bef0c added performance data 2014-01-30 11:45:23 +01:00
Evghenii
e93c2b88ba some fixes 2014-01-30 11:32:27 +01:00
Evghenii
eb57852f2e added cuda version 2014-01-30 11:32:23 +01:00
Evghenii
6457241e66 some tuning 2014-01-30 10:36:53 +01:00
Evghenii
bf245d3b98 +1 2014-01-30 10:33:49 +01:00
Evghenii
4e26a1b700 +1 2014-01-30 10:26:58 +01:00
Evghenii
2f44b81d4f first commit 2014-01-30 09:51:32 +01:00
Evghenii
d65e1b30ce fix for non-pow2 number of elements 2014-01-30 09:05:27 +01:00
Evghenii
4a17760a2d some common_gpu.mk improvements 2014-01-30 08:32:17 +01:00
Evghenii
23c6325cb6 some tuning 2014-01-30 08:27:43 +01:00
Evghenii
0e4af8c057 added types for Key & Val 2014-01-29 20:32:40 +01:00
Evghenii
3bddfed542 some tuning 2014-01-29 20:21:52 +01:00
Evghenii
e6d7a493cc merge sort compiles & runs 2014-01-29 19:39:39 +01:00
Evghenii
73c6989951 +1 2014-01-29 19:22:14 +01:00
Evghenii
1be05cb03a +runs.. next step is tuning 2014-01-29 19:21:12 +01:00
Evghenii
ac4d847eac +1 2014-01-29 18:40:45 +01:00
Evghenii
3f641f487d +1 2014-01-29 16:01:23 +01:00
Evghenii
1c8517947b first commit for driver 2014-01-29 15:07:02 +01:00
Evghenii
4cb578bf60 compiles 2014-01-29 14:58:37 +01:00
Evghenii
6099492579 small improvement 2014-01-29 13:49:35 +01:00
Dmitry Babokin
8ff32bfbba Update fail_db on Linux - now use clang++3.4 as system compiler 2014-01-29 16:49:12 +04:00
Evghenii
d4b46b1295 +checkpoint 2014-01-29 13:47:39 +01:00
Evghenii
f6379dea82 added non-shared memory 2014-01-29 13:41:36 +01:00
Evghenii
784eb2d15b +1 2014-01-29 13:38:50 +01:00
Evghenii
36ee8911b4 optimization fix 2014-01-29 13:34:45 +01:00
Evghenii
97253354ac partially working cuda version 2014-01-29 13:20:41 +01:00
Evghenii
2f5eb9f6d3 added cuda 2014-01-29 12:15:19 +01:00
Evghenii
07c845be03 first commit 2014-01-29 12:15:13 +01:00
Evghenii
cb436d5b4a +1 2014-01-29 11:55:33 +01:00
Evghenii
2634ae65fd +1 2014-01-29 11:32:53 +01:00
Evghenii
922018ac2d tuning 2014-01-29 11:08:12 +01:00
Evghenii
e783afe87f +added a comment 2014-01-29 10:47:26 +01:00
Evghenii
fbcadf3d4d +some tuning 2014-01-29 10:40:48 +01:00
Evghenii
1f7b994232 first commit 2014-01-29 09:02:01 +01:00
Evghenii
c4ea5a3bfd first commit 2014-01-28 21:45:08 +01:00
Evghenii
651cc04ee1 +1 2014-01-28 21:29:20 +01:00
Evghenii
354e535b37 added different bit sorting 2014-01-28 21:16:27 +01:00
Evghenii
df51b41a45 +1 2014-01-28 20:07:34 +01:00
Evghenii
29bb129c9b +1 2014-01-28 19:44:23 +01:00
Evghenii
1e5476e573 +1 2014-01-28 17:19:34 +01:00
Evghenii
480dfc3879 +1 2014-01-28 17:04:03 +01:00
Evghenii
90a70945d6 +1 2014-01-28 16:55:59 +01:00
Evghenii
659573338c +1 2014-01-28 16:43:00 +01:00
Evghenii
5a6b650d8b restored nonptx atomic_*_local 2014-01-28 15:56:30 +01:00
Evghenii
585afa09e5 first commit alternative radix 2014-01-28 15:39:27 +01:00
Evghenii
f343e4cb0e working. next step tuning 2014-01-28 15:18:50 +01:00
Evghenii
1b993e167f tuning radixSort 2014-01-28 15:00:43 +01:00
Evghenii
d4dd945828 runs 2014-01-28 14:32:24 +01:00
Evghenii
88ffa96263 runs but incorrectly 2014-01-28 13:45:54 +01:00
Evghenii
2ae666dc7f radix sort compiles 2014-01-28 12:28:59 +01:00
Evghenii
8677890fc1 compiles 2014-01-28 12:20:12 +01:00
Evghenii
d9e8376209 other implementation 2014-01-27 16:26:55 +01:00
Evghenii
00677f73ec first commit 2014-01-27 15:21:35 +01:00
Evghenii
8289108d16 fix 2014-01-27 14:33:42 +01:00
Evghenii
711b4508c9 +fix for b8_t and b16_t 2014-01-27 14:31:22 +01:00
Evghenii
5885d47717 nbody first commit 2014-01-27 14:30:48 +01:00
Evghenii
adcf635c1f first commit for radix sort 2014-01-27 14:24:50 +01:00
Evghenii
239ec10edf +1 2014-01-27 14:12:05 +01:00
Evghenii
3ae4a7e660 first commit bitonicSort 2014-01-27 14:02:42 +01:00
Evghenii
673d814a45 first commit for __do_print in ptx. 2014-01-27 11:56:21 +01:00
Evghenii
b7b5c9ad1d it is illegal to pass a varying parameter to a task function with nvptx target 2014-01-27 10:30:09 +01:00
Evghenii
1c2dbd6a27 a fix for .b0 ptx and some other code improvements 2014-01-27 08:51:05 +01:00
Dmitry Babokin
b1dbd3fcdf Merge pull request #724 from Duta/patch-1
Minor fix in a comment
2014-01-26 23:28:08 -08:00
Bertie Wheen
a78d75f185 Minor fix in a comment 2014-01-27 03:50:35 +00:00
Evghenii
52691fbb52 +some changes to ptxgen 2014-01-26 17:34:28 +01:00
Evghenii
4ecf30530a fix for operator2 with nvptx target 2014-01-26 15:08:25 +01:00
Evghenii
a3b00fdcd6 added support for global atomics 2014-01-26 14:23:26 +01:00
Evghenii
a7d4a3f922 fix for __any 2014-01-26 13:15:13 +01:00
Dmitry Babokin
35395d7ed7 Merge pull request #723 from jbrodman/refcasting
Allow casting reference types like pointers. See Issue #721.
2014-01-25 10:14:25 -08:00
Evghenii
09ea9c9fd6 added function name mangling for operators 2014-01-25 18:06:12 +01:00
Evghenii
3e86dfe480 fix for __any 2014-01-25 17:09:11 +01:00
Evghenii
fcbdd93043 half/scan for 64 bit/clock/num_cores and other additions 2014-01-25 16:43:33 +01:00
Evghenii
805196a6a0 fixed doubles 2014-01-25 15:31:56 +01:00
Evghenii
bd34729217 added floor/ceil/round for float/double 2014-01-25 12:20:38 +01:00
Evghenii
6917c161c8 fixed reduce_equal 2014-01-25 11:39:37 +01:00
Evghenii
156aa4c139 partial support for reduce equal 2014-01-24 17:29:26 +01:00
Evghenii
ddb9b2fc47 added basic printing from ptx 2014-01-24 13:44:38 +01:00
jbrodman
00aeef9f1b Allow casting reference types like pointers. See Issue #721. 2014-01-24 03:37:27 -08:00
Evghenii
c76c916475 removed insert/extract_void 2014-01-24 12:37:18 +01:00
Evghenii
9090d8b128 added support for assert 2014-01-24 12:18:20 +01:00
Evghenii
5a8351d7ea added varying new/delete 2014-01-24 09:22:55 +01:00
Evghenii
be6ac0408a added compile-time constant __is_nvptx_target that can be used with stdlib.ispc 2014-01-24 09:02:12 +01:00
Evghenii
1a07aed6aa foreach_unique now works on atomic data types, not yet on pointers. enum is not tested. All tests/foreach-unique-*.ispc pass 2014-01-24 08:30:50 +01:00
Evghenii
1cf1dab649 fixed foreach_unique and local_atomics 2014-01-23 21:57:20 +01:00
Evghenii
f0d3501dbd atomic globals now fail compilation. 2014-01-23 19:57:58 +01:00
Evghenii
da7a2c0c7f added emulation of "soa" data types via shared memory 2014-01-23 16:17:06 +01:00
Evghenii
0091973bca packed_load and packed_store2 added 2014-01-23 14:34:00 +01:00
Evghenii
e734bbc0cc added new Errors 2014-01-23 13:11:02 +01:00
Evghenii
ce88e95032 added Error that "soa" data types are not supported with nvptx target 2014-01-23 11:16:08 +01:00
Evghenii
e87e332d2f identified issue with __movmsk. 2014-01-23 10:45:10 +01:00
Evghenii
2e7609156a fixes for exclusive_scan_and/or_i32 and shuffle2 and __movmsk 2014-01-23 10:24:44 +01:00
Evghenii
06313e0ec3 exclusive_scan_and is supported, but must be called outside if-statements. In principle the others must do the same 2014-01-22 22:12:51 +01:00
Evghenii
08d78e6be5 partial exclusive_scan support 2014-01-22 21:55:22 +01:00
Evghenii
11964a8ce8 added broadcast 2014-01-22 20:46:41 +01:00
Evghenii
7d0aa7a336 added shift 2014-01-22 20:43:53 +01:00
Evghenii
39962623cc added shuffle 2014-01-22 19:18:45 +01:00
Evghenii
5cde87ce80 added reduce_add/min/max 2014-01-22 16:55:08 +01:00
Evghenii
e918fbf9f2 restored non-debug mode 2014-01-22 10:27:34 +01:00
Evghenii
e87ac449e6 added nvptx 2014-01-22 10:26:36 +01:00
Evghenii
6931f87fcd added support to run tests via NVVM 2014-01-22 10:16:37 +01:00
Evghenii
98fc43d859 Merge branch 'master' into nvptx 2014-01-21 20:05:27 +01:00
Dmitry Babokin
624aecf72e Merge pull request #720 from ifilippov/err
Adding patch for LLVM. Fix for trunk (select)
2014-01-21 06:31:40 -08:00
Dmitry Babokin
6ab22c6e92 Merge pull request #719 from ifilippov/trunk
Supporting LLVM trunk
2014-01-21 06:30:28 -08:00
Dmitry Babokin
2b8403c7f7 Merge pull request #714 from ifilippov/master
Switching to 1d tasking in 'mandelbrot_tasks' benchmark
2014-01-21 06:28:02 -08:00
Evghenii
5376743281 added "const" before "static uniform" in constant folding tests 2014-01-21 14:59:25 +01:00
Evghenii
215abab544 bugfix 2014-01-21 14:55:41 +01:00
Evghenii
bc99897fbb +fixed some examples, found some bugs, and bugs in ptxas/cuda 2014-01-21 14:51:27 +01:00
Ilia Filippov
3b2fe1d8f6 adding patch for LLVM. Fix for trunk (select) 2014-01-21 17:26:05 +04:00
Ilia Filippov
aa31957d84 supporting LLVM trunk 2014-01-21 14:21:26 +04:00
Evghenii
5a773ed62a some cfor test fixes for > 16 lanes 2014-01-20 16:42:33 +01:00
Evghenii
47c17305cf fixed double precision test call 2014-01-20 16:22:43 +01:00
Evghenii
c39a5e0de8 change extension of tmp files from .ispc to _ispc 2014-01-20 16:08:24 +01:00
Ilia Filippov
d87748c5dd switching to 1d tasking in 'mandelbrot_tasks' benchmark 2014-01-20 18:42:59 +04:00
Evghenii
63d3ac6679 Merge branch 'master' into nvptx 2014-01-20 13:47:24 +01:00
Evghenii
4581f10207 some changes 2014-01-20 13:46:49 +01:00
Vsevolod Livinskij
da02236b3a Scalar implementation of no-vec functions was moved from builtins to stdlib.ispc. 2014-01-20 16:06:34 +04:00
Dmitry Babokin
37e12045fb Merge pull request #717 from ifilippov/export_alias
Adding noalias attribute to uniform pointer parameters of export functions
2014-01-15 05:47:02 -08:00
Ilia Filippov
9552fc0724 adding noalias attribute to uniform pointer parameters of export functions 2014-01-15 17:39:47 +04:00
Dmitry Babokin
0f56c11101 Merge pull request #716 from ifilippov/export_alias
Adding noalias attribute to uniform pointer parameters of export functions
2014-01-15 03:51:16 -08:00
Dmitry Babokin
6d59ef49f7 Merge pull request #713 from ifilippov/perf_correction
Support of LLVM trunk after changes in 198438, 199041, 199082 revisions.
2014-01-15 03:49:15 -08:00
Ilia Filippov
741dfaa2ea adding noalias attribute to uniform pointer parameters of export functions 2014-01-15 15:15:42 +04:00
Ilia Filippov
5fa8bd3c78 changes to support LLVM trunk 2014-01-15 14:17:35 +04:00
Dmitry Babokin
86331022eb Merge pull request #687 from jbrodman/nomosoa
Add support for pointers to varying data in exported functions.
2014-01-13 09:03:45 -08:00
james.brodman
bdfcc615ea change if/for order. 2014-01-13 11:40:58 -05:00
james.brodman
05eac8631b typo fix. 2014-01-13 11:23:28 -05:00
Dmitry Babokin
7b537fd3b6 Merge pull request #711 from ifilippov/perf_correction
Restoring 1d tasking in mandelbrot_tasks under ifdef
2014-01-13 04:50:58 -08:00
Ilia Filippov
1075b76612 restoring 1d tasking in mandelbrot_tasks under ifdef 2014-01-13 16:41:30 +04:00
Evghenii
66faf8b4e4 some examples tuning 2014-01-13 09:52:34 +01:00
Evghenii
8f17468fa3 patch for stencil example 2014-01-13 08:50:21 +01:00
james.brodman
7726e87cd7 Add utility include file to return programCount 2014-01-10 16:23:48 -05:00
james.brodman
34b412bdf8 Add docs and example 2014-01-10 13:55:32 -05:00
Evghenii
8190869714 added basic support for nvptx target 2014-01-10 08:47:59 +01:00
Evghenii
84134678dc ISPC can emit LLVM PTX now 2014-01-10 07:53:09 +01:00
Evghenii
9389b6e3ef basic optimization pass fails 2014-01-10 06:34:44 +01:00
evghenii
9053eed4b4 added basic optimization pass that promotes uniform into varying variables (not arrays) for nvptx target 2014-01-10 06:32:57 +01:00
Evghenii
b6b8855728 +1 2014-01-09 14:45:52 +01:00
Evghenii
2c043f67d0 +some uniform related improvements 2014-01-09 14:37:27 +01:00
Evghenii
1ed438dcdb cleaned up the code a bit for treatment of non-static uniform variables. all stored in shared memory 2014-01-09 13:02:50 +01:00
Evghenii
db6f526b78 added experimental support for uniform variables, not only arrays. makes applications slower 2014-01-09 12:18:10 +01:00
Evghenii
07c818833a fix test_static_nvptx 2014-01-09 09:49:42 +01:00
Evghenii
f86de2be78 fix: laneIndex() must be varying 2014-01-09 09:41:57 +01:00
Evghenii
5f859e4885 added addrspace(3,4)->addrspace(0) conversion to ctx->GetElementPtrInst. Appears to work now. 2014-01-08 19:31:28 +01:00
Evghenii
cc53fa4c14 fixed addrspace(3) pointer offset computation during conversion. Now it is done in GetElementPtr 2014-01-08 18:43:33 +01:00
Evghenii
a401eb5a3b added DeviceCacheSettings 2014-01-08 15:38:02 +01:00
Evghenii
69c5e0aae7 convert pointers in function arguments to addrspace(3). there is still a problem with shared memory. need to figure out which one .. 2014-01-08 15:12:32 +01:00
Evghenii
d3dc5e0df1 patched examples to work with uniform for nvptx. function calls with non-generic pointers fail. need fix 2014-01-08 14:25:21 +01:00
Evghenii
f98a8cc22f allow declaration of varying arrays in global scope. seems to work automatically. needs more testing 2014-01-08 13:39:31 +01:00
Evghenii
de4d66c56f added addrspace(4)/constant memory for const uniform declarations 2014-01-08 13:27:24 +01:00
Evghenii
f011b3cb22 added Error when varying variable is defined in global scope with nvptx target 2014-01-08 11:46:36 +01:00
Evghenii
8347c766f0 added uniform memory test. 2014-01-08 11:16:51 +01:00
Evghenii
0a66f17897 experimental support for non-constant [non-static] uniform arrays mapped to addrspace(3) 2014-01-08 11:06:14 +01:00
Evghenii
f0b49995e5 fix offset computations 2014-01-07 19:13:10 +01:00
Evghenii
21313e52b4 added local ptr correction of store instruction. change compilation to llvm ptx for tests 2014-01-07 18:54:23 +01:00
Evghenii
1303b07b72 added correction for local pointer computations 2014-01-07 18:45:22 +01:00
evghenii
7d37f7b634 added separate function that deals with local pointers 2014-01-07 18:29:44 +01:00
Evghenii
9b74e60185 added conversion from addrspace(3)/__local/__shared__ to addrspace(0)/generic when PtrToInt is called 2014-01-07 14:29:55 +01:00
Evghenii
96597d3716 identified where to add shared memory. other problems showed up.. 2014-01-07 11:07:54 +01:00
evghenii
7e63cafc85 fixed common-1 2014-01-06 20:24:37 +01:00
Evghenii
ee9d1ad26f +testing works 2014-01-06 18:13:06 +01:00
Evghenii
277fd2f8eb +retire for today 2014-01-06 17:02:26 +01:00
Evghenii
f130bfd25c added compilation script for ptx 2014-01-06 16:54:19 +01:00
Evghenii
4562304a29 working on testing system for nvptx 2014-01-06 16:35:48 +01:00
Evghenii
3972d740a6 added mask for tasking function 2014-01-06 16:18:28 +01:00
Evghenii
7fbe2eba59 fixed sync 2014-01-06 15:51:14 +01:00
Evghenii
91d4ae46f6 sort --fails 2014-01-06 15:38:30 +01:00
Evghenii
79ebf07882 added .ptx->.o generation routines. experimental 2014-01-06 14:38:27 +01:00
Evghenii
18a50aa679 further cleaning... 2014-01-06 14:34:28 +01:00
Evghenii
4d6a1d4cc0 remove object file 2014-01-06 14:11:37 +01:00
Evghenii
a8a2cf9bdb change nvptx64->nvptx 2014-01-06 14:04:27 +01:00
Evghenii
785b2f5d24 added examples 2014-01-06 14:00:36 +01:00
evghenii
c7ed130cce Merge branch 'master' into nvptx 2014-01-06 14:00:27 +01:00
Evghenii
bf8a16b0e1 merge with sm35 2014-01-06 13:53:02 +01:00
Evghenii
546f9cb409 MAJOR CHANGE--- STOP WITH THIS BRANCH-- 2014-01-06 13:51:02 +01:00
Evghenii
77113fbffd remove file 2014-01-06 12:48:32 +01:00
Evghenii
260b1ad887 added deferred example 2014-01-06 12:26:49 +01:00
Vsevolod Livinskij
97cc5b7f48 Added varying CFG and non-overflow part of the tests. 2014-01-06 15:24:09 +04:00
Evghenii
774e907ecf +1 2014-01-06 11:50:18 +01:00
Evghenii
27d957f0cb added taskCount 2014-01-06 11:01:38 +01:00
Evghenii
6878f83677 added options example 2014-01-06 11:01:26 +01:00
Evghenii
c75969d7c8 added LLVM_GPU and NVVM_GPU compiler rules 2014-01-06 10:27:25 +01:00
Evghenii
6500a98747 +checkpoint: fix .ispc kernels for compilation with llc->ptx 2014-01-06 08:52:41 +01:00
Evghenii
6c93793750 +checkpoint 2014-01-06 08:41:30 +01:00
Evghenii
e9058756b5 +checkpoint 2014-01-06 08:39:19 +01:00
Evghenii
15299571f9 use trunk and compile nvvm,llvm,cu files 2014-01-06 08:35:07 +01:00
Evghenii
b2368e243c +1 2014-01-06 08:15:50 +01:00
Evghenii
0d944ac87e use trunk llvm, use openmp in tasksys 2014-01-05 19:04:40 +01:00
Evghenii
730e0f7098 fixed helpers and makefile 2014-01-05 17:25:31 +01:00
Evghenii
6e2d59e279 added stencil 2014-01-05 17:24:38 +01:00
Evghenii
07049d6c76 volume rendering is working 2014-01-05 17:03:29 +01:00
Evghenii
eb97ef149a added gpu makefile, but doesn't work just yet 2014-01-05 16:34:00 +01:00
Evghenii
2198d8ba51 +created overloaded new/delete 2014-01-05 16:20:56 +01:00
Evghenii
cd929b17a7 working on deferred 2014-01-05 15:30:18 +01:00
Evghenii
50e3b68586 +some changes. isolating problem 2014-01-05 15:20:51 +01:00
Evghenii
6c0d5a3334 rt compiles and runs both ispc and cuda versions 2014-01-05 12:57:23 +01:00
Evghenii
4fa7f98074 fix 2014-01-05 12:24:20 +01:00
Evghenii
a7c2a7d4f9 +1 2014-01-05 12:23:48 +01:00
Evghenii
62deb82bf8 working on rt example 2014-01-05 11:45:19 +01:00
Evghenii
10798cf2ae added cuda file 2014-01-05 11:11:37 +01:00
Evghenii
ac89a7d264 added cuda version 2014-01-05 11:10:50 +01:00
Evghenii
c429e930b2 +1 2014-01-05 11:04:10 +01:00
Evghenii
2f01af0595 +1 2014-01-05 11:01:19 +01:00
Evghenii
25ea3e4dc1 +some fix 2014-01-05 10:38:34 +01:00
Evghenii
478f4687b1 fixed helpers and added ao_bench example 2014-01-05 10:30:25 +01:00
Evghenii
89169d5506 +1 2014-01-05 10:12:45 +01:00
Evghenii
fd429e4fda added common_gpu makefile helper 2014-01-05 10:00:44 +01:00
Evghenii
034e6b9a0b added Makefile for gpu and cpu 2014-01-05 09:58:16 +01:00
evghenii
a784434e1f compiles 2014-01-04 23:06:49 +01:00
evghenii
be7f45b3bf don't remove bitcode 2014-01-04 22:42:42 +01:00
evghenii
916ccca3a1 +1 2014-01-04 22:39:40 +01:00
evghenii
c7bf732fdf added cuda helpers 2014-01-04 19:40:26 +01:00
evghenii
772d240782 fixes 2014-01-04 19:12:11 +01:00
Evghenii
155bc0534a working on ptx examples 2014-01-04 18:36:49 +01:00
evghenii
c4c30a416a fix function name mangling 2014-01-02 17:01:10 +01:00
Vsevolod Livinskij
323587f10f Scalar implementation and implementation for targets which don't have h/w instructions 2014-01-02 16:48:56 +04:00
evghenii
298b4b8b5b +1 2014-01-02 13:16:14 +01:00
evghenii
aa90a48872 +1 2014-01-01 13:41:52 +01:00
evghenii
71481150c7 in PTX mode, add ___export to exported function unmangled name 2014-01-01 10:35:25 +01:00
james.brodman
b2f6043181 Merge branch 'master' into nomosoa 2013-12-30 13:25:51 -05:00
james.brodman
387c3b317c remove get_programCount from stdlib 2013-12-27 16:27:08 -05:00
Dmitry Babokin
07c281738a Merge pull request #701 from ifilippov/alias_new
Adding Alias Analysis passes
2013-12-27 07:33:37 -08:00
Ilia Filippov
4ef38e1615 Adding some optimization passes between two Alias Analysis passes 2013-12-27 19:22:19 +04:00
Dmitry Babokin
904a575f93 Merge pull request #700 from ifilippov/alloy
Correction of checking tools in testing environment
2013-12-26 05:02:01 -08:00
evghenii
92a232b59b Merge branch 'sm35' of github.com:egaburov/ispc into sm35 2013-12-26 13:04:55 +01:00
evghenii
2621863d08 added test 2013-12-26 13:04:47 +01:00
Ilia Filippov
e34f0cc250 correction of checking tools in testing environment 2013-12-26 14:23:46 +04:00
Dmitry Babokin
5cfd773ec9 Adding Alias Analysis phases 2013-12-26 10:54:05 +04:00
Evghenii
c3165df156 +1 2013-12-25 23:33:48 +01:00
evghenii
2f1008e2c3 added driver function 2013-12-25 23:15:17 +01:00
Evghenii
2d8da306a1 merged with master 2013-12-25 21:32:34 +01:00
evghenii
1672ec80d0 +! 2013-12-25 20:12:36 +01:00
Dmitry Babokin
6c7e7325ec Merge pull request #699 from dbabokin/mac_build
Mac build fix on 10.9 #688
2013-12-24 06:41:34 -08:00
Dmitry Babokin
a69c4527a1 Bumping up 3.4 version from rc3 to final 2013-12-24 18:39:17 +04:00
Dmitry Babokin
34a588511f Checkout and install with clang standard library headers on MacOS 10.9 2013-12-24 18:38:25 +04:00
Dmitry Babokin
a615df9282 Merge pull request #698 from dbabokin/sext_and
Don't do sext+and optimization for generic targets
2013-12-23 04:34:14 -08:00
Dmitry Babokin
949984db18 Don't do sext+and optimization for generic targets 2013-12-23 16:31:33 +04:00
Vsevolod Livinskij
07c6f1714a Some fixes in function names; more tests were added. 2013-12-22 19:28:26 +04:00
jbrodman
c668165ca7 Merge pull request #695 from jbrodman/master
Add missing __cast_sext(__vec16_i32,__vec16_i1)
2013-12-20 13:51:41 -08:00
james.brodman
9f933b500b Add missing __cast_sext(__vec16_i32,__vec16_i1) 2013-12-20 16:45:27 -05:00
Dmitry Babokin
136f2d9549 Merge pull request #694 from dbabokin/br_161
Bumping ISPC version to 1.6.1dev
2013-12-19 10:33:13 -08:00
Dmitry Babokin
799e476b48 Bumping ISPC version to 1.6.1dev 2013-12-19 22:29:02 +04:00
Dmitry Babokin
952490a1b8 Merge pull request #693 from dbabokin/release_151
Release 1.6.0
2013-12-19 09:26:51 -08:00
Dmitry Babokin
bfb834fb24 Merge pull request #691 from dbabokin/docs
Documentation update for overloaded operators and packed_store_active2()
2013-12-19 09:26:37 -08:00
Dmitry Babokin
7faac4c241 Merge pull request #692 from ifilippov/embree
Changes in examples to make them work on Windows with VS2012, and some fixes for VS2010.
2013-12-19 09:24:37 -08:00
Dmitry Babokin
040605a83c Bumping up ispc version to 1.6.0 2013-12-19 21:17:42 +04:00
Dmitry Babokin
f936269a1e News update for 1.6.0 2013-12-19 21:14:22 +04:00
Ilia Filippov
7bf64bc490 changes in examples (windows) 2013-12-19 21:13:09 +04:00
Dmitry Babokin
5d51f8c7a7 Adding release notes for 1.6.0 2013-12-19 21:07:15 +04:00
Dmitry Babokin
f802164cce Fixing some typos in docs and adding operators to language description 2013-12-19 20:38:21 +04:00
Dmitry Babokin
e70936e654 Merge pull request #690 from ifilippov/generic
Adding __packed_store_active2 to generic targets
2013-12-19 06:14:47 -08:00
Ilia Filippov
15816eb07e adding __packed_store_active2 to generic targets 2013-12-19 17:50:18 +04:00
Dmitry Babokin
ca6b3dfa1c Vim syntax support for operators 2013-12-19 16:53:41 +04:00
Dmitry Babokin
bdeaf7e88c Documentation update for overloaded operators and packed_store_active2() 2013-12-19 16:53:21 +04:00
Dmitry Babokin
6198561d70 Merge pull request #689 from dbabokin/34_rc3
Bumping LLVM 3.4 from rc2 to rc3 in alloy.py
2013-12-19 01:08:27 -08:00
Dmitry Babokin
5d1cda9869 Bumping LLVM 3.4 from rc2 to rc3 in alloy.py 2013-12-19 13:05:46 +04:00
Evghenii
d4b8a0f2eb added packed_store_active2 declaration in IR 2013-12-18 11:40:03 +01:00
Evghenii
d77789d8fe +merged with master 2013-12-18 11:37:01 +01:00
james.brodman
4a4da858cf Clean up exported varyings and add support for querying program count from C/C++ 2013-12-17 15:55:59 -05:00
Dmitry Babokin
d666fc3f8f Merge pull request #686 from ifilippov/ttt
packed_store_active2() - tuned version of packed_store_active()
2013-12-17 09:23:39 -08:00
Ilia Filippov
473f1cb4d2 packed_store_active2 2013-12-17 21:14:29 +04:00
Dmitry Babokin
6d51987e67 Merge pull request #642 from egaburov/launch3d
concept of 3d tasking
2013-12-17 08:40:07 -08:00
Evghenii
59b989d243 fix for --target=sse4-i18x16 2013-12-17 16:06:20 +01:00
Evghenii
63ecf009ec fix compilation for Visual Studio 2013-12-17 15:06:29 +01:00
Dmitry Babokin
961116f4d5 Merge pull request #685 from dbabokin/34_alloy
Adding missing 3.4 handling in alloy.py (for alloy-build)
2013-12-17 05:14:33 -08:00
Dmitry Babokin
37f3c0926c Adding missing 3.4 handling in alloy.py (for alloy-build) 2013-12-17 17:11:57 +04:00
Dmitry Babokin
9a524fb838 Merge pull request #684 from ifilippov/complex_1
Improvement of varying structures gather
2013-12-16 06:55:31 -08:00
Ilia Filippov
b5dc78b06e adding support of shl instruction in lExtractConstantOffset optimization 2013-12-16 16:01:14 +04:00
evghenii
b506c92d21 restored-x2 2013-12-13 13:55:58 +01:00
Evghenii
cdb50beaa2 added nativeVectorAlignment for nvptx64 2013-12-13 12:07:29 +01:00
Evghenii
20845eb117 Merge branch 'sm35' of github.com:egaburov/ispc into sm35 2013-12-13 11:56:56 +01:00
Evghenii
ddfe782151 merged 2013-12-13 11:56:43 +01:00
evghenii
c06ec92d0d added commas, added multi-dimensional tasking to mandelbrot_tasks & removed mandelbrot_task3d. Also adjusted documentation a bit 2013-12-13 11:49:11 +01:00
james.brodman
01432670fd Fix header file for multi-target output with pointers to varying in exported functions. 2013-12-12 13:27:23 -05:00
Dmitry Babokin
7bc1a41814 Merge pull request #683 from ifilippov/trunk
Match LLVM trunk - deleting isPrimitiveType()
2013-12-12 07:55:46 -08:00
Ilia Filippov
b19937c4dc deleting isPrimitiveType() 2013-12-12 19:25:02 +04:00
Dmitry Babokin
eef436b889 Merge pull request #682 from dbabokin/34_patch
Removing patch which is already present in 3.4/rc2
2013-12-10 04:14:39 -08:00
Dmitry Babokin
e618b1e14b Removing patch which is already present in 3.4/rc2 2013-12-10 16:11:48 +04:00
Dmitry Babokin
ea1473cbe8 Merge pull request #681 from dbabokin/34_rc2
Update of 3.4 branch to rc2
2013-12-10 03:28:29 -08:00
Dmitry Babokin
be21d190e2 Update of 3.4 branch to rc2 2013-12-10 15:26:30 +04:00
Dmitry Babokin
189459f2b9 Merge pull request #680 from ifilippov/perf_correction
Changes in examples
2013-12-10 03:17:41 -08:00
Ilia Filippov
98c56c214a changes in examples 2013-12-10 15:02:58 +04:00
evghenii
2951cad365 added description for multi-dimensional tasking 2013-12-09 13:10:26 +01:00
Vsevolod Livinskij
9a135c48d9 Function name changes 2013-12-09 00:20:52 +04:00
Vsevolod Livinskij
02dc2f460e Merge branch 'master' of https://github.com/ispc/ispc 2013-12-06 17:22:19 +04:00
Vsevolod Livinskij
ea94658411 Some saturation tests fixes 2013-12-06 17:20:37 +04:00
james.brodman
d10c0d9545 Add dynamic dispatch support for export functions with pointers to varying data as arguments 2013-12-05 17:47:58 -05:00
Dmitry Babokin
8766e44b95 Merge pull request #679 from ifilippov/perf_correction
Increase data for examples
2013-12-05 07:44:47 -08:00
Ilia Filippov
5012ba34b4 increase data for examples 2013-12-05 19:38:46 +04:00
Dmitry Babokin
7022163e87 Merge pull request #678 from dbabokin/win_patch_update
Adding r196261 patch to previous vzeroupper fix for 3.3
2013-12-04 13:18:23 -08:00
jbrodman
90cc7b25d4 Merge pull request #677 from dbabokin/alignment
Fixing --opt=force-aligned-memory for LLVM 3.3+
2013-12-04 13:16:59 -08:00
Dmitry Babokin
040ef0bc49 Adding r196261 patch to previous vzeroupper fix for 3.3 2013-12-05 00:50:55 +04:00
Vsevolod Livinskij
7642bdd73b Merge branch 'master' of https://github.com/ispc/ispc 2013-12-05 00:38:29 +04:00
Vsevolod Livinskij
65768c20ae Added tests for saturation and some fixes for generic and avx target 2013-12-05 00:34:14 +04:00
james.brodman
a448ccf20c Merge branch 'master' into nomosoa 2013-12-04 13:52:44 -05:00
jbrodman
d5b7c51e40 Merge pull request #676 from dbabokin/alloy_34
Adding LLVM 3.4 definition to alloy.py
2013-12-04 10:39:45 -08:00
Dmitry Babokin
2d2d14744b Fixing --opt=force-aligned-memory for LLVM 3.3+ 2013-12-04 19:00:02 +04:00
Dmitry Babokin
f61f1a2020 Fixing run_tests.py to understand LLVM 3.4 2013-12-03 19:52:11 +04:00
Dmitry Babokin
31ee2951ce Adding LLVM 3.4 definition to alloy.py 2013-12-03 19:40:30 +04:00
Vsevolod Livinskij
d46a54348a Merge branch 'master' of https://github.com/ispc/ispc 2013-12-02 23:30:44 +04:00
jbrodman
4a53ed1201 Merge pull request #674 from dbabokin/vs2013
Adapting Windows VS project files to work with VS2012/2013
2013-12-02 08:49:12 -08:00
jbrodman
d2e57cfcac Merge pull request #671 from dbabokin/select
Select optimization
2013-12-02 08:43:30 -08:00
Dmitry Babokin
e172d7f1a9 Update build messages (Windows) 2013-12-01 16:18:06 +04:00
Dmitry Babokin
3bc4788acb Fix errors with VS2013 2013-12-01 04:40:23 +04:00
Vsevolod Livinskij
4faff1a63c structural change 2013-11-30 10:48:18 +04:00
Vsevolod Livinskij
4c330bc38b Add code generation of saturation 2013-11-29 18:40:04 +04:00
Dmitry Babokin
67d1985550 Merge pull request #672 from ifilippov/master
Fix for LLVM trunk (3.5)
2013-11-29 02:47:55 -08:00
Ilia Filippov
b94b89ba68 support of LLVM trunk 2013-11-29 14:24:21 +04:00
Vsevolod Livinskij
bec6662338 Some changes for avx1 and avx1.1 in saturation 2013-11-29 03:45:25 +04:00
Vsevolod Livinskij
42c148bf75 Changes for sse2 and sse4 in saturation 2013-11-29 03:33:40 +04:00
Dmitry Babokin
d6dfbcd743 Run alloy -j<num cores> by default 2013-11-28 21:44:12 +04:00
Dmitry Babokin
be813ea0a2 Select optimization for LLVM 3.3 2013-11-28 21:43:05 +04:00
Dmitry Babokin
c751e44c6c Merge pull request #670 from dbabokin/sext_patch
LLVM patches for sse4-i16x8 and sse4-i8x16 targets
2013-11-28 01:56:09 -08:00
Dmitry Babokin
eaa483d6e4 fail_db update (Linux) 2013-11-28 13:52:05 +04:00
Dmitry Babokin
672d43a6cf Adding patch for sse4-i16x8 and sse4-i8x16 targets 2013-11-28 13:52:04 +04:00
Dmitry Babokin
9b19f0aaba Merge pull request #669 from dbabokin/fail_db
fail_db.txt update with LLVM 3.5 (trunk) results on Linux
2013-11-26 15:28:00 -08:00
Dmitry Babokin
218d2892e8 fail_db.txt update with LLVM 3.5 (trunk) results on Linux 2013-11-27 03:24:17 +04:00
Vsevolod Livinskij
35a4d1b3a2 Add some AVX2 intrinsics 2013-11-27 00:55:57 +04:00
Dmitry Babokin
f6fa63bdef Merge pull request #668 from ifilippov/tests
Changing examples on windows
2013-11-26 07:15:50 -08:00
Ilia Filippov
f3ff1fcbeb supporting targets in perf windows 2013-11-26 19:12:02 +04:00
Ilia Filippov
935800d7f6 making common.props 2013-11-26 18:58:49 +04:00
Dmitry Babokin
726fc93634 Merge pull request #667 from ifilippov/warning
Changing error to warning: mismatch in size/layout of global variable
2013-11-26 05:23:43 -08:00
Ilia Filippov
8b972f2ed6 Changing error to warning: mismatch in size/layout of global variable 2013-11-26 17:08:06 +04:00
evghenii
ef9e212eec Merge remote-tracking branch 'upstream/master' into nvptx 2013-11-26 13:24:43 +01:00
evghenii
18d6986c22 Merge remote-tracking branch 'upstream/master' into sm35 2013-11-26 13:24:32 +01:00
Vsevolod Livinskij
19f73b2ede uniform signed/unsigned int8/16 2013-11-25 19:16:02 +04:00
Dmitry Babokin
b4102a4510 Merge pull request #665 from ifilippov/master
fix of perf.py
2013-11-22 06:36:22 -08:00
Evghenii
e192e8ea9e +change static naming in IR to make it compatible with NVVM 2013-11-22 14:43:14 +01:00
Ilia Filippov
18f90e6339 fix of perf.py 2013-11-22 17:06:19 +04:00
Evghenii
406aad78fe first support for integration with NVCC/CUDART API 2013-11-22 13:06:51 +01:00
Evghenii
280f3515b5 +1 2013-11-22 11:23:34 +01:00
Evghenii
522daa26a6 added PTX generator from NVVM 2013-11-22 08:18:59 +01:00
evghenii
bb46b561fd Merged with upstream/master 2013-11-22 08:13:16 +01:00
Evghenii
828a5d45cd Merge remote-tracking branch 'upstream/master' into nvptx 2013-11-22 08:10:08 +01:00
Dmitry Babokin
0f7ac1cc90 Merge pull request #664 from ifilippov/3_5
Support for LLVM 3.5
2013-11-21 07:35:35 -08:00
Dmitry Babokin
019ff4709c Merge pull request #663 from ifilippov/perf
Adding multiple targets in perf.py
2013-11-21 07:11:35 -08:00
Ilia Filippov
3fd9d5a025 support of LLVM 3.5 2013-11-21 19:09:43 +04:00
Ilia Filippov
924858509d checking targets in perf.py 2013-11-21 19:05:35 +04:00
Evghenii
6f200d310f fixed to work with LLVM 3.2 2013-11-21 11:03:03 +01:00
Evghenii
321b087039 added driver API error strings 2013-11-21 09:22:07 +01:00
jbrodman
357f115f11 Merge pull request #661 from dbabokin/task_diagnostics
Fix task system diagnostic to report the real reason for the semaphore allocat...
2013-11-20 09:16:51 -08:00
Dmitry Babokin
5531586c35 Fix for existing semaphore problem 2013-11-20 19:19:15 +04:00
Dmitry Babokin
40da411fa5 Fix task system diagnostic to report the real reason for the semaphore allocation failure 2013-11-20 17:22:50 +04:00
Dmitry Babokin
676c367db1 Merge pull request #660 from dbabokin/fail_db
fail_db.txt update on Linux with new passes
2013-11-19 09:20:30 -08:00
Dmitry Babokin
5722d17924 fail_db.txt update on Linux with new passes 2013-11-19 21:17:54 +04:00
Ilia Filippov
97298eb112 multiple targets in perf.py 2013-11-19 17:37:52 +04:00
Evghenii
b4df3663a9 Merge branch 'sm35_foreach' of github.com:egaburov/ispc into sm35_foreach 2013-11-19 10:30:24 +01:00
Evghenii
86567ba96f +1 2013-11-19 09:11:20 +01:00
evghenii
0cf7043f5f added Makefile_knc 2013-11-18 22:09:40 +01:00
evghenii
0df23312ac +1 2013-11-18 22:00:01 +01:00
evghenii
e8fc32a7dc Merge branch 'sm35_foreach' of github.com:egaburov/ispc into sm35_foreach 2013-11-18 21:59:09 +01:00
evghenii
e4ee1692fb added KNC Makefile 2013-11-18 21:58:56 +01:00
Evghenii
4bc8c79bd3 fixed cuda kernel 2013-11-18 13:28:12 +01:00
Evghenii
915dc4be7f +1 2013-11-18 13:24:01 +01:00
Evghenii
cf2116e167 +1 2013-11-18 13:16:30 +01:00
Evghenii
64762c5acd +1 2013-11-18 13:15:05 +01:00
Evghenii
4f9b8ebc73 +1 2013-11-18 13:11:57 +01:00
Evghenii
ee61a265f4 fixed kernel 2013-11-18 13:01:36 +01:00
Evghenii
db4abfe198 +1 2013-11-18 12:58:30 +01:00
Evghenii
4b7dbbf43b added cuda kernel 2013-11-18 12:46:30 +01:00
Evghenii
db13639460 change # options 2013-11-18 12:42:23 +01:00
evghenii
1cdb4c21c2 makefile fix 2013-11-18 12:25:36 +01:00
evghenii
008d9371b1 added Makefile for KNC 2013-11-18 12:19:45 +01:00
evghenii
6640dd0a6c added Makefile for KNC 2013-11-18 12:19:33 +01:00
Evghenii
589538bf39 added stencil code 2013-11-18 12:04:00 +01:00
Evghenii
8d4dd13750 changes 2013-11-18 11:58:19 +01:00
Dmitry Babokin
754a3208f2 Merge pull request #659 from ifilippov/master
patch for LLVM 3.3 and test correction at avx2
2013-11-18 02:46:15 -08:00
Evghenii
4d278a654e +1 2013-11-18 10:58:24 +01:00
Evghenii
9eaec6d58a changed avx->avx-x2 2013-11-18 10:56:22 +01:00
Evghenii
616cd3316c +1 2013-11-18 10:54:29 +01:00
Ilia Filippov
4579d339ea patch for LLVM 3.3 and test correction at avx2 2013-11-18 13:53:21 +04:00
Evghenii
36f584d341 +1 2013-11-18 10:51:54 +01:00
Evghenii
6387bbf193 Merge branch 'sm35_foreach' of github.com:egaburov/ispc into sm35_foreach 2013-11-18 10:45:02 +01:00
Evghenii
de177e7529 +using kernels1 in deferred 2013-11-18 10:44:52 +01:00
evghenii
f17fdfdbef added KNC makefile 2013-11-18 10:41:52 +01:00
Evghenii
45d6bf196a change timings to ms 2013-11-18 10:37:17 +01:00
evghenii
5e2fff91f8 +1 2013-11-18 10:33:04 +01:00
evghenii
e13f8dd359 +1 2013-11-18 10:30:18 +01:00
Evghenii
0897f3c1c4 Merge branch 'sm35_foreach' of github.com:egaburov/ispc into sm35_foreach 2013-11-18 10:17:02 +01:00
Evghenii
a82368956e added ms timer 2013-11-18 10:16:24 +01:00
evghenii
6841d8ba9a added Makefile_mic 2013-11-18 09:59:41 +01:00
Evghenii
927da8e861 change register allocation, makes code much faster 2013-11-18 09:46:51 +01:00
Evghenii
3c220a2813 loop unrolling, makes code 10x faster 2013-11-18 09:37:25 +01:00
Evghenii
5a01819fdc unrolled loops in binomial options cuda version 2013-11-18 09:16:09 +01:00
Dmitry Babokin
4977933d81 Merge pull request #658 from dbabokin/fail_db
fail_db.txt update on Linux
2013-11-17 15:42:51 -08:00
Dmitry Babokin
953e467a85 fail_db.txt update on Linux 2013-11-18 03:39:09 +04:00
jbrodman
131ab07c2b Merge pull request #657 from dbabokin/avx-i32x4
avx1-i32x4 target
2013-11-15 16:00:57 -08:00
Evghenii
1d94667a15 +speed-up binomial options via use of shared memory 2013-11-15 20:46:55 +01:00
Evghenii
4421fb7e19 +1 2013-11-15 20:21:31 +01:00
Dmitry Babokin
131ff50333 Adding avx1-i32x4 to alloy.py testing 2013-11-15 22:09:13 +04:00
Evghenii
bc8b5b3896 added cuda version 2013-11-15 18:09:03 +01:00
Evghenii
a2d12517e7 +options 2013-11-15 17:59:04 +01:00
Evghenii
95d6647dce +1 2013-11-15 17:32:59 +01:00
Evghenii
3454f51d2c added some ptx options 2013-11-15 17:23:22 +01:00
Evghenii
6b65f6d9f4 +1 2013-11-15 16:21:09 +01:00
Evghenii
c93e71698e restored intrinsics and added tuning options to ptxgen 2013-11-15 15:04:04 +01:00
Evghenii
f9d2ede83c +1 2013-11-14 23:15:51 +01:00
Evghenii
53bf4573f0 +fix 2013-11-14 23:05:44 +01:00
Evghenii
86652738c0 working on rt 2013-11-14 22:54:37 +01:00
Evghenii
294fb039fe some tuning, adding cuda kernels 2013-11-14 22:33:58 +01:00
Evghenii
f12826bac5 +added approx rcp/rsqrt/rtz with ftz=true 2013-11-14 22:17:57 +01:00
Evghenii
2c8afde6d9 changing MF 2013-11-14 21:38:25 +01:00
Evghenii
1445202e0e identified bug due to llvm-3.4 2013-11-14 21:18:25 +01:00
Evghenii
1b940fd41e +1 2013-11-14 20:19:59 +01:00
Evghenii
f1fc3bdfba added nvptx declaration to other target & fixed nvptx64 recognition 2013-11-14 20:12:58 +01:00
Evghenii
7aa37b19a9 added some more macros as quick hack... 2013-11-14 20:04:05 +01:00
Evghenii
967a49dd66 +1 2013-11-14 19:54:18 +01:00
Evghenii
25df23fed3 workaround for programIndex via preprocessor 2013-11-14 19:48:50 +01:00
Evghenii
e162d5a99d programIndex still not working, found where change is needed... 2013-11-14 19:46:08 +01:00
Evghenii
918ca339b6 now programIndex returns laneIdx = %tid.x & (%warpsize-1) & programCount returns 32 2013-11-14 19:27:52 +01:00
Evghenii
8bb8f0eda4 +1 2013-11-14 17:04:50 +01:00
Evghenii
be2cc8f946 restored foreach in sort 2013-11-14 16:51:59 +01:00
Evghenii
599ada8354 added deferred shading foreach_tile 2013-11-14 16:49:47 +01:00
Evghenii
83b9cc5c0a +1 2013-11-14 16:44:09 +01:00
Evghenii
af75afeb7a foreach[_tiled] seems to work now 2013-11-14 16:29:40 +01:00
Dmitry Babokin
42e181112a Add avx1-i32x4 to the list of supported targets 2013-11-14 16:21:30 +04:00
Dmitry Babokin
801f78f8a8 Rebuild *.ispc when necessary 2013-11-14 15:37:11 +04:00
Dmitry Babokin
e100040f28 Fix failure with --target=avx1.1-i32x8,avx2-i32x8: avx11 is no longer a valid target; the complete target string is now required 2013-11-14 15:37:11 +04:00
Dmitry Babokin
b8a39a1b26 minor improvements in examples/common.mk 2013-11-14 15:37:10 +04:00
Dmitry Babokin
8f768633ad Make perf.py changes work as part of alloy.py 2013-11-14 15:37:10 +04:00
Dmitry Babokin
65ea6fd48a Reasoning to use sse4 bitcode file 2013-11-14 15:34:30 +04:00
Dmitry Babokin
d2c7b356cc Reordering functions in target-[avx|sse2].ll so they appear in the same order. No real changes, except adding a few alwaysinline in SSE4 target 2013-11-14 15:34:30 +04:00
Dmitry Babokin
af58955140 target-[sse4|avx]_common.ll are twin brothers which differ only cosmetically. This commit makes them diffable. No real changes, except adding alwaysinline to the sse version of __max_uniform_int32/__max_uniform_uint32 2013-11-14 15:34:30 +04:00
Dmitry Babokin
ffc9a33933 avx1-i32x4 implementation as sse4-i32x4 with avx target-feature flag 2013-11-14 15:34:30 +04:00
Dmitry Babokin
fbab9874f6 perf.py - target switch was added 2013-11-14 15:34:30 +04:00
Dmitry Babokin
017e7890f7 Examples makefiles to support setting single target via ISPC_IA_TARGETS 2013-11-14 15:34:30 +04:00
Evghenii
48644813d4 stmt.cpp forking on foreach 2013-11-14 11:30:22 +01:00
evghenii
c81821ed28 +1 2013-11-13 21:17:21 +01:00
Evghenii
42cfe97427 using now cuda_ispc.h 2013-11-13 21:06:40 +01:00
Evghenii
09a2c12ea0 added cuda_ispc.h & cuda error_strings 2013-11-13 21:04:59 +01:00
Evghenii
a0f6f264f6 fixed problem with new/delete and added Mel/sec counter 2013-11-13 20:34:01 +01:00
Evghenii
6f9cea5b58 removed binary 2013-11-13 19:43:45 +01:00
Evghenii
dd4ac42491 added print m 2013-11-13 19:43:32 +01:00
Evghenii
01df6ed4a9 added ispc timers w/o task 2013-11-13 19:13:04 +01:00
Evghenii
e71259006c +1 2013-11-13 19:06:02 +01:00
Evghenii
0f161b500f +1 2013-11-13 19:02:45 +01:00
Evghenii
e442139c39 runs, next check correctness 2013-11-13 18:15:52 +01:00
Evghenii
8b0f871c06 +1 2013-11-13 17:23:23 +01:00
Evghenii
61fab0340c working on sort 2013-11-13 17:07:55 +01:00
Evghenii
525eacd035 +1 2013-11-13 16:32:56 +01:00
Evghenii
cddddfd255 +1 2013-11-13 16:23:24 +01:00
Evghenii
780e9f31fe some tuning 2013-11-13 16:23:05 +01:00
Evghenii
c0b54aa58c added Makefile_gpu 2013-11-13 16:20:51 +01:00
Evghenii
c0c1cc1ba7 +added Makefile and some fixes 2013-11-13 14:16:48 +01:00
Evghenii
dededd1929 cleaned 2013-11-13 13:56:45 +01:00
Evghenii
6cd8a8f895 cleaned-up 2013-11-13 13:47:53 +01:00
Evghenii
d3ade0654e added Makefile 2013-11-13 13:45:24 +01:00
Evghenii
2dd7128db5 added Makefile 2013-11-13 13:40:08 +01:00
Evghenii
1f13a236bf small tuning 2013-11-13 13:03:26 +01:00
Evghenii
ca1dbc3d3b fixed cuda kernel 2013-11-13 12:54:52 +01:00
Evghenii
74db8cbab3 +1 2013-11-13 12:12:09 +01:00
Evghenii
62bc39e600 +CDP works with deferred shading 2013-11-13 11:57:37 +01:00
Evghenii
268be7f0b5 fixed ISPCSync functionality 2013-11-13 11:19:10 +01:00
Evghenii
55bf0d23c2 resotred non-ptx functionality 2013-11-13 11:08:58 +01:00
Evghenii
f433aa3ad5 CDP works now 2013-11-13 10:43:52 +01:00
Evghenii
f587e0a459 handwired CDP launch 2013-11-12 21:20:10 +01:00
Evghenii
76bfcc29c2 ao1.ispc is not functional just yet :S 2013-11-12 19:30:41 +01:00
Evghenii
1d91a626f2 ISPC sync is not added 2013-11-12 17:02:31 +01:00
Evghenii
dbde936c3c bugfix in inlined ptx, now NVCC also compiles the ptx 2013-11-12 16:47:47 +01:00
Evghenii
cf679187b1 added CDP calls into IR, next step ... check :) 2013-11-12 16:39:22 +01:00
Evghenii
fd17ad236a export functions are now also generated... next add proper CDP calls.. 2013-11-12 14:05:12 +01:00
Evghenii
dbb96c1885 need to fix launch code 2013-11-12 13:41:03 +01:00
Evghenii
4cd7e10ad3 reverted to original changes. Here is the plan: use CDP and generate only device code with a host wrapper.. 2013-11-12 12:51:56 +01:00
Evghenii
3fd76d59ea +1 2013-11-12 11:32:42 +01:00
Evghenii
f445a470df handwired CDP launch 2013-11-12 11:25:43 +01:00
Evghenii
4e5299a9bf added CDP 2013-11-12 11:19:23 +01:00
Evghenii
a6afef9f3f +added some more mem management stuff 2013-11-12 08:31:45 +01:00
Evghenii
6a1fb8ea31 some kernel tuning 2013-11-11 14:24:13 +01:00
Evghenii
f2c66dc4c3 added any/none/all for bool 2013-11-11 12:59:40 +01:00
Evghenii
a91c8e15e2 added reduce_min/max_float, packed_store_active for CUDA, and now kernels1.ispc just works :) 2013-11-11 12:33:39 +01:00
Evghenii
9c7a842163 ptx has support for half-float 2013-11-11 12:25:47 +01:00
Evghenii
3dd6173a65 added packed_store_active that can be called with active flag 2013-11-11 12:25:15 +01:00
Evghenii
e9bc2b7b54 added uniform_new/uniform_delete in util_ptx.m4 and __shfl intrinsics 2013-11-11 09:18:15 +01:00
Evghenii
38947ab71b made CU version working 2013-11-10 20:10:37 +01:00
Evghenii
8a7801264a added tuned code 2013-11-10 16:02:10 +01:00
Evghenii
66edc180be working on aobench 2013-11-10 14:29:53 +01:00
Evghenii
17809992d7 working on ao 2013-11-10 14:26:00 +01:00
evghenii
c10033211b removed 2013-11-10 14:17:59 +01:00
Evghenii
7d4ea1b6f0 added wc-timer 2013-11-10 14:15:16 +01:00
Evghenii
0dfe823c32 added kernels that use shared memory 2013-11-10 14:06:06 +01:00
Evghenii
bef275f62c added drv_api_error_String.h 2013-11-10 14:05:34 +01:00
evghenii
edb4c57e3d +added host code as well and restored original main.cpp 2013-11-10 14:07:15 +01:00
evghenii
c1b3face8f change time from sec to ms 2013-11-10 14:04:01 +01:00
Evghenii
9d23c10475 deferred_shading problem identified. need solution 2013-11-10 13:59:41 +01:00
Evghenii
78d509dba5 working on deferred shading 2013-11-10 12:10:10 +01:00
Evghenii
1a37135f98 +1 2013-11-09 21:23:34 +01:00
Evghenii
dbd0581cb3 +added CUDA code 2013-11-09 21:05:28 +01:00
Evghenii
946530019a Merge branch 'nvptx' of github.com:egaburov/ispc into nvptx 2013-11-09 20:56:55 +01:00
Evghenii
8f6f6d10e7 +some tuning 2013-11-09 20:56:48 +01:00
evghenii
3a549e5c2f xeonphi tests added for rt 2013-11-09 19:26:19 +01:00
evghenii
dc7015c5f2 added wc-timer for host code 2013-11-09 19:08:08 +01:00
Evghenii
356e9c6810 +fixed rt.cpp to compile with nvvm 2013-11-09 19:02:14 +01:00
egaburov
d0ddec469a Merge branch 'master' into nvptx 2013-11-08 15:42:58 +01:00
evghenii
87de3a2d06 added wc-timer for host code 2013-11-08 15:39:57 +01:00
Evghenii
eb8e1a2160 +added wc-timer 2013-11-08 15:27:51 +01:00
Evghenii
ce5f8cd46f replaced with fresh examples 2013-11-08 14:17:26 +01:00
Evghenii
b2f62d51b0 tuned stencil 2013-11-08 14:15:27 +01:00
Evghenii
b3c68af40a added volume rendering to run on GPU 2013-11-08 13:57:16 +01:00
Evghenii
348100ba42 remove stencil.cubin 2013-11-08 10:01:15 +01:00
Evghenii
426afc7377 added workable .cu files for stencil & mandelbrot 2013-11-08 10:00:49 +01:00
jbrodman
b04caabf39 Merge pull request #656 from egaburov/knc-fix
restored ISPC_FORCE_ALIGNED_MEMORY
2013-11-07 17:36:36 -08:00
evghenii
32cfdd52d3 Merge branch 'master' into knc-fix 2013-11-05 15:46:54 +01:00
evghenii
015af03bdc changed back to #define ISPC_FORCE_ALIGNED_MEMORY aligned_ld/st #else unaligned ld/st #endif. However load<64>/store<64> will still be unaligned w/o this define because of failures related to issue #632 2013-11-05 15:41:14 +01:00
Dmitry Babokin
99946ae0e6 Merge pull request #655 from ifilippov/windows
Windows support in alloy
2013-11-05 05:51:37 -08:00
Ilia Filippov
a910bfb539 Windows support 2013-11-05 16:31:01 +04:00
Evghenii
cb7cbec0d5 added cuda examples 2013-11-04 11:44:49 +01:00
Evghenii
cb6614da42 fixed the code so that non-task code is also emitted for the host 2013-11-04 11:33:37 +01:00
Evghenii
6fae459847 a+1 2013-11-04 10:22:05 +01:00
Dmitry Babokin
bc7c54e5c8 Merge pull request #652 from jbrodman/ptrarithlvalues
Fix a case for a missing lvalue for pointer arithmetic
2013-11-01 10:39:27 -07:00
Dmitry Babokin
ab835d8086 Merge pull request #653 from jbrodman/master
Fix for Visual Studio compilation error.
2013-11-01 10:39:13 -07:00
james.brodman
0f7050d3aa More standards-compliant. VS doesn't like non-constant-length local arrays. 2013-10-31 19:51:13 -04:00
Evghenii
2cef101022 now emits host object file with ptx in it... next step .. testing 2013-10-31 18:05:04 +01:00
Evghenii
0a069f7de2 added comment 2013-10-31 16:06:44 +01:00
Evghenii
dcf9c280ee some cleaning 2013-10-31 16:05:06 +01:00
Evghenii
a2fd124997 forced module name & ptx string to be generated only once 2013-10-31 16:04:30 +01:00
Evghenii
63917f8cc2 now generates CUDALaunch call. A few tweaks are still necessary 2013-10-31 16:01:34 +01:00
Evghenii
e7ddb9e642 now adds function&module name. next step adding pointer to parameter list 2013-10-30 22:41:01 +01:00
james.brodman
ec17082864 Add unittest. 2013-10-30 17:21:10 -04:00
james.brodman
9ce6fbe1fa Support using pointer arithmetic as lvalue 2013-10-30 17:07:26 -04:00
Evghenii
8db3d25844 moved PtxString to Globals 2013-10-30 21:05:22 +01:00
Evghenii
f9ec1a0097 .. work in progress to embed PTX into host code .. 2013-10-30 16:47:30 +01:00
jbrodman
61c33969a2 Merge pull request #651 from jbrodman/shiftfix
Fix logic that looks for shift builtins.
2013-10-29 11:08:52 -07:00
james.brodman
85eb4cf0d6 Fix logic that looks for shift builtins. 2013-10-29 14:02:32 -04:00
Evghenii
47cc470bf6 change nativeVectorWidth from 1 -> 32 for nvptx64 2013-10-29 16:07:12 +01:00
egaburov
60881499dc Merge branch 'nvptx' of github.com:egaburov/ispc into nvptx 2013-10-29 15:25:14 +01:00
egaburov
f19cf9274e Merge remote-tracking branch 'upstream/master' into nvptx 2013-10-29 15:24:40 +01:00
Evghenii
ed9bca0e12 add __soa_to_aos*_float1 and __aos_to_soa*_float1 builtins 2013-10-29 15:06:08 +01:00
Evghenii
f15cdc03e3 nvptx64 generates 2 targets: task and normal function for nvptx64 and export for avx only 2013-10-29 14:46:51 +01:00
Evghenii
b31fc6f66d now can generate both targets for nvptx64. m_isPTX is set true, to distinguish when to either skip or exclusively use export 2013-10-29 14:17:11 +01:00
Evghenii
8baef6daa3 +1 2013-10-29 14:01:53 +01:00
Evghenii
ac700d4860 checkpoint 2013-10-29 13:36:31 +01:00
Evghenii
b2baa35c3d added correct datalayout for nvptx64 2013-10-29 11:34:01 +01:00
Evghenii
b50d3944ea allow easy switching between llvm versions 2013-10-29 10:22:07 +01:00
Evghenii
f115a32073 fix llvm 3.2 compilation 2013-10-29 10:21:56 +01:00
Evghenii
57aefdf830 accepts ptx extension when target is nvptx64 2013-10-29 10:21:48 +01:00
Dmitry Babokin
0d48ace15e Merge pull request #649 from dbabokin/dispatch2
Typo fix
2013-10-28 14:35:55 -07:00
Dmitry Babokin
362ee06b9f Typo fix 2013-10-29 01:35:26 +04:00
jbrodman
9004f090c5 Merge pull request #646 from dbabokin/docs
Docs fix in memory management section
2013-10-28 14:26:17 -07:00
jbrodman
6948120094 Merge pull request #647 from dbabokin/dispatch2
CPU check fix
2013-10-28 14:25:41 -07:00
Dmitry Babokin
6585a925be Merge pull request #641 from jbrodman/stdlibshift
Add a "shift" operator to the stdlib.
2013-10-28 14:18:31 -07:00
james.brodman
e682b19eda Remove zero initialization for __vec4_i32 2013-10-28 17:13:07 -04:00
james.brodman
8ee3178166 Add Performance Warning 2013-10-28 16:51:02 -04:00
james.brodman
09a6e37154 Source cleanup. 2013-10-28 16:37:33 -04:00
james.brodman
1b8e745ffe remove condition. Don't use gcc 4.7 for tests. 2013-10-28 16:36:59 -04:00
james.brodman
6a1952d1f9 Merge branch 'stdlibshift' of https://github.com/jbrodman/ispc into stdlibshift 2013-10-28 16:16:30 -04:00
james.brodman
9ba7b96825 Make the new optimization play nicely with the others. 2013-10-28 16:14:31 -04:00
Dmitry Babokin
ef5d2dc043 Merge pull request #648 from dbabokin/fail_db
fail_db update on all platforms
2013-10-28 12:08:30 -07:00
Dmitry Babokin
1f0f852fda Standalone checker for detecting the best ISA supported on the host 2013-10-28 22:54:14 +04:00
Dmitry Babokin
a166eb7ea1 Check AVX OS support in host cpu check code 2013-10-28 22:41:23 +04:00
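(For reference, the standard form of such a check: CPUID leaf 1 must report the OSXSAVE bit, and XGETBV with ECX=0 must show XCR0 bits 1 and 2 set, i.e. the OS saves both XMM and YMM register state on context switches. Presumably that is what the host checker added above implements.)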
james.brodman
02681d531e Minor tweak for interface. 2013-10-28 12:56:43 -04:00
james.brodman
641d882ea6 Add shift support for knc targets. This is not optimized. 2013-10-28 12:43:42 -04:00
james.brodman
1e80b3b0d7 Add shift support for generic-16 target. 2013-10-28 12:20:32 -04:00
Evghenii
ff98271a43 using mask i1 for nvptx64 2013-10-28 17:03:00 +01:00
Evghenii
500ad7fb51 using mask i1 for nvptx64 2013-10-28 17:01:03 +01:00
Evghenii
4f486333ed now nvptx allows extern "C" task void, which emits a kernel that should (?) be callable by driver API from external code 2013-10-28 16:47:40 +01:00
Evghenii
9a677b62ab Merge branch 'launch3d' into nvptx 2013-10-28 14:28:21 +01:00
Evghenii
68ced6ce46 automatically adds -D__NVPTX__ when nvptx64 target is chosen 2013-10-28 14:08:32 +01:00
Evghenii
1bd5360d3b now NVPTX64 automatically emits an unmasked extern "C" for task functions with kernel attributes 2013-10-28 13:58:01 +01:00
Evghenii
a7aa1ac1cf now nvptx allows extern "C" task void, which emits a kernel that should (?) be callable by driver API from external code 2013-10-28 12:57:09 +01:00
Evghenii
ae23320417 added metadata for tasks with nvptx64 target. now tasks are kernels callable from the host 2013-10-28 12:10:40 +01:00
Evghenii
b68a751f4e generating proper tasking function for nvptx 2013-10-28 11:36:08 +01:00
Evghenii
8391d05697 added blockIndex computations 2013-10-28 10:18:30 +01:00
Dmitry Babokin
63a3214cc6 Removing fails with g++4.4/g++4.7, as we are using clang by default now 2013-10-28 12:45:39 +04:00
Dmitry Babokin
4382902894 Fail_db update on Windows: 3.3 update and adding 3.4 2013-10-28 12:31:24 +04:00
Dmitry Babokin
103ef25f12 Docs fix in memory management section 2013-10-27 23:01:20 +04:00
Evghenii
84a7a5d1cb added tests for 3d launch 2013-10-26 16:16:28 +02:00
Evghenii
ac095dbf3e working on nvptx 2013-10-26 16:12:33 +02:00
Dmitry Babokin
a508bd4290 MacOS fails update 2013-10-26 14:50:45 +04:00
Dmitry Babokin
c1de753db6 Merge pull request #645 from ifilippov/636
fixing problem 644
2013-10-25 12:01:20 -07:00
Ilia Filippov
621679245a fixing problem 644 2013-10-25 22:44:37 +04:00
Dmitry Babokin
58aea1b61c Fail_db update with Linux passed with LLVM 3.4 2013-10-25 21:42:57 +04:00
Dmitry Babokin
9b5ee1b31b fail_db update on Linux 2013-10-25 02:07:51 +04:00
Evghenii
383e804ec1 changed notation from taskIndex1,2,3 -> taskIndex0,1,2 2013-10-24 17:20:56 +02:00
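A minimal sketch of the 3D tasking notation these commits converge on (illustrative only: launch[...] and taskIndex0/1/2 are the spellings taken from the commit messages here and below; the function names and the CHUNK constant are hypothetical):

    #define CHUNK 64   // hypothetical block width along x

    task void add_one_block(uniform float f[], uniform int nx, uniform int ny) {
        // taskIndex0/1/2 locate this task within the 3D launch grid
        uniform int x0 = taskIndex0 * CHUNK;
        uniform int base = (taskIndex2 * ny + taskIndex1) * nx;
        foreach (i = x0 ... min(x0 + CHUNK, nx))
            f[base + i] += 1.0f;
    }

    export void add_one_3d(uniform float f[], uniform int nx,
                           uniform int ny, uniform int nz) {
        // per the commit below, launch [n0,n1,n2] and launch [n2][n1][n0]
        // are equivalent spellings
        launch[(nx + CHUNK - 1) / CHUNK, ny, nz] add_one_block(f, nx, ny);
    }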
Dmitry Babokin
34f7986e19 Merge pull request #643 from ifilippov/testing
patch and regression test for problem with vzeroupper
2013-10-24 05:31:33 -07:00
egaburov
c5fc47cc19 tasksys cleaning 2013-10-24 14:09:46 +02:00
Ilia Filippov
814ee67519 patch and regression test for problem with vzeroupper 2013-10-24 16:03:55 +04:00
Evghenii
43761173ec changed notation, task[Index,Count]_[x,y,z] -> task[Index,Count][1,2,3]. Change launch <<< nx,ny,nz >>> into launch [nx,ny,nz] or equivalent launch [nz][ny][nx]. The programmer can pick whichever form they like most 2013-10-24 13:16:23 +02:00
james.brodman
d2b89e0e37 Tweak generic target. 2013-10-23 18:01:01 -04:00
james.brodman
c4ad8f6ed4 Add docs/generic impls 2013-10-23 15:51:59 -04:00
james.brodman
4d289b16c2 Redesign after being hit with the KISS bat. 2013-10-23 14:25:43 -04:00
egaburov
e6c8765891 fixed tasksys.cpp for 3d tasking 2013-10-23 13:18:22 +02:00
egaburov
f89bad1e94 launch now passes the right info into tasking 2013-10-23 12:51:06 +02:00
egaburov
ade8751442 taskIndex_x,y,z are passed to the task 2013-10-23 08:39:17 +02:00
james.brodman
f97a2d68c8 Bugfix for non-const shift amt and unit tests. 2013-10-22 18:29:20 -04:00
james.brodman
899f85ce9c Initial Support for new stdlib shift operator 2013-10-22 18:06:54 -04:00
egaburov
78a05777bc added taskIndex_x,y,z and taskCount_x,y,z 2013-10-22 16:18:40 +02:00
Dmitry Babokin
e436e33771 Merge pull request #639 from ifilippov/master
correcting errors in generic targets after operators support
2013-10-21 02:32:05 -07:00
Dmitry Babokin
faa69dc71f Merge pull request #638 from ifilippov/testing
time in alloy
2013-10-21 02:30:05 -07:00
Ilia Filippov
d72590ede6 correcting errors in generic targets after operators support 2013-10-21 12:35:53 +04:00
Ilia Filippov
c378429ffb time in alloy 2013-10-18 19:45:39 +04:00
Dmitry Babokin
ebe1831e71 Merge pull request #636 from ifilippov/master
support of operators
2013-10-18 03:29:47 -07:00
Dmitry Babokin
8cdc70a5b9 Merge pull request #637 from dbabokin/egaburov-avx2-i64x4
Windows support and testing update for avx1.1-i64x4 and avx2-i64x4 targets.
2013-10-18 03:25:10 -07:00
Dmitry Babokin
6244902931 Updating fail_db with new Windows fails 2013-10-18 14:23:03 +04:00
Dmitry Babokin
1bd5b704c6 Adding support for build on Windows for avx1.1-i64x4 and avx2-i64x4 2013-10-18 14:23:03 +04:00
Dmitry Babokin
2117002c01 Adding testing support for avx1.1-i64x4 and avx2-i64x4 targets 2013-10-18 14:23:03 +04:00
Dmitry Babokin
0bd519da49 Merge pull request #631 from egaburov/avx2-i64x4
added avx2-i64x4 and avx1.1-i64x4 targets
2013-10-18 03:17:57 -07:00
Ilia Filippov
2e724b095e support of operators 2013-10-18 13:45:15 +04:00
egaburov
1710b9171f removed LLVM_3_0 legacy part and changed copyright to 2013 2013-10-18 08:53:01 +02:00
jbrodman
599153d86b Merge pull request #634 from egaburov/patch_knc
Making __masked_store_* thread-safe
2013-10-17 08:43:37 -07:00
evghenii
fb1a2a0a40 __masked_store_* uses vscatter now, and is thread-safe 2013-10-15 17:10:46 +03:00
egaburov
7e9b4c0924 added avx2-i64x4 and avx1.1-i64x4 targets 2013-10-15 10:02:10 +02:00
Dmitry Babokin
96112f9d45 Merge pull request #630 from ifilippov/testing
new changes in test system
2013-10-14 03:13:35 -07:00
Ilia Filippov
496845df60 new changes in test system 2013-10-14 12:23:14 +04:00
egaburov
8808a8cc9c Merge remote-tracking branch 'upstream/master' into nvptx 2013-10-13 13:03:00 +02:00
Dmitry Babokin
ba4977cd84 Merge pull request #629 from mmp/typo
Fix small typo for NEON targets in Target::SupportedTargets
2013-10-12 06:21:44 -07:00
Matt Pharr
e751977b72 Fix small typo for NEON targets in Target::SupportedTargets 2013-10-12 06:15:57 -07:00
Dmitry Babokin
5409fc2e19 Merge pull request #628 from dbabokin/fail_db
Fail_db update on Mac with clang 3.3 compiler
2013-10-12 03:10:59 -07:00
Dmitry Babokin
a4d6240ab4 Fail_db update on Mac with clang 3.3 compiler 2013-10-12 14:07:41 +04:00
Dmitry Babokin
78a26c8743 Merge pull request #627 from dbabokin/clang_build
Switch over to clang as default C++ compiler for build and test
2013-10-11 07:13:35 -07:00
Dmitry Babokin
a45e147d38 Update fail_db with new passes after change in trunk 2013-10-11 18:09:28 +04:00
Dmitry Babokin
477a688c68 fail_db.txt update with fails with clang 3.3 as a C++ compiler for tests (Linux) 2013-10-11 16:29:17 +04:00
Dmitry Babokin
d129e33b51 Enable clang as default C++ compiler used by run_tests.py and perf.py 2013-10-11 16:29:17 +04:00
Dmitry Babokin
17b54cb0c8 Fix problem with building ISPC by clang 3.4 2013-10-11 16:29:17 +04:00
Dmitry Babokin
99df2d9dbf Switch examples on Unix from using g++ to clang++ 2013-10-11 16:29:17 +04:00
Dmitry Babokin
8297edd251 Switching default compiler on Unix from g++ to clang++ 2013-10-11 16:29:16 +04:00
Dmitry Babokin
b43861b467 Merge pull request #626 from ifilippov/fix1
fix for ISPC for compfails at sse4-i8 and sse4-i16
2013-10-11 04:31:22 -07:00
Ilia Filippov
92773ada6d fix for ISPC for compfails at sse4-i8 and sse4-i16 2013-10-11 15:23:40 +04:00
Dmitry Babokin
e3db1091a8 Merge pull request #624 from ifilippov/fix1
patch for LLVM for fails on avx-x2 target
2013-10-11 04:03:35 -07:00
Ilia Filippov
7abbe97ee9 patch for LLVM for fails at avx-x2 2013-10-11 14:00:47 +04:00
Dmitry Babokin
c7606bb93b Merge pull request #623 from ifilippov/testing
adding --extra option and correcting paths to the ispc compiler
2013-10-10 04:52:22 -07:00
Ilia Filippov
0d9594354a adding --extra option and correcting paths to the ispc compiler 2013-10-10 15:38:08 +04:00
jbrodman
c18fa15db1 Merge pull request #622 from jbrodman/shengfu
Fix segfault when using both -g and -MMM
2013-10-08 16:01:27 -07:00
james.brodman
44912e6b1e Fix segfault when using both -g and -MMM 2013-10-08 18:27:03 -04:00
egaburov
5d56d29240 merged with master 2013-10-08 19:13:30 +02:00
james.brodman
0439605e94 Weird merge problems. 2013-10-07 14:20:59 -04:00
jbrodman
1485419414 Merge pull request #618 from egaburov/clean_knc
Cleaned knc-i1x16.h
2013-10-07 08:36:57 -07:00
evghenii
3da152a150 fixed zmm __mul for i64 with icc < 14.0.0, 4 knc::fails left, but I doubt these are due to this include.. 2013-10-07 18:30:22 +03:00
evghenii
4222605f87 fixed lshr/ashr/shl shifts. __mul i64 vector version for icc < 14.0.0 works only on signed, so commented it out in favour of sequential 2013-10-07 14:24:27 +03:00
evghenii
1b196520f6 knc-i1x16.h is cleaned: int32,float,double are complete, int64 is partially complete 2013-10-05 22:10:05 +03:00
evghenii
10223cfac3 working on shuffle/rotate for double, there seems to be a bug in cvt2zmm cvt2hilo 2013-10-05 15:23:55 +03:00
evghenii
8b0fc558cb complete cleaning 2013-10-05 14:15:33 +03:00
evghenii
8a6789ef61 cleaned float added fails info 2013-10-04 14:11:09 +03:00
evghenii
57f019a6e0 cleaned int64 added fails info 2013-10-04 13:39:15 +03:00
evghenii
32c77be2f3 cleaned mask & int32, only test141 fails 2013-10-04 11:42:52 +03:00
james.brodman
ac79f3f345 format change 2013-10-01 12:31:33 -04:00
james.brodman
d45c5767d8 Due diligence tweaks. 2013-10-01 12:17:57 -04:00
Dmitry Babokin
2741e3c1d0 Merge pull request #616 from jbrodman/master
Adding missing typecasts and guarding i64 __mul with icc version check
2013-10-01 08:59:52 -07:00
james.brodman
dc8895352a Adding missing typecasts and guarding i64 __mul with compiler version check 2013-10-01 11:53:56 -04:00
Dmitry Babokin
959a741c14 Merge pull request #612 from dbabokin/missing_target
Adding support for avx1-i64x4 target in test system.
2013-10-01 07:43:26 -07:00
Dmitry Babokin
af95188f29 Merge pull request #615 from dbabokin/32bit_examples
Make examples 32-bit friendly.
2013-10-01 07:42:46 -07:00
Dmitry Babokin
c7b4164122 Redefining ISPC should not discard ISPC_FLAGS 2013-10-01 18:40:26 +04:00
Dmitry Babokin
4a908cf38d Merge pull request #614 from ifilippov/testing
pipe correction and some other small changes in test system (fixes problems with alloy.py on MacOS)
2013-10-01 07:35:48 -07:00
Ilia Filippov
b2cf0209b1 pipe correction and some other small changes in test system 2013-10-01 18:01:29 +04:00
Dmitry Babokin
2d6f7a7c93 Support i686 architecture recognition as x86 and enable 32 bit x86 platforms 2013-10-01 17:37:34 +04:00
Dmitry Babokin
660f14245b Merge pull request #611 from dbabokin/typos
Typo fix and copyright update
2013-09-30 08:40:38 -07:00
Dmitry Babokin
49cefc2e97 Updating fail_db for new target 2013-09-30 19:20:18 +04:00
Dmitry Babokin
7942bdb728 Typo fix and copyright update 2013-09-30 18:09:59 +04:00
Dmitry Babokin
758efebb3c Add missing testing support for avx1-i64x4 target 2013-09-30 17:54:59 +04:00
Dmitry Babokin
efad72fef4 Merge pull request #609 from dbabokin/1_5_0
Release 1.5.0
2013-09-27 14:35:47 -07:00
Dmitry Babokin
3b4cc90800 Changing ISPC to 1.5.dev 2013-09-28 01:32:00 +04:00
Dmitry Babokin
8a39af8f72 Release 1.5.0 2013-09-27 23:27:05 +04:00
james.brodman
ecb1174a18 leaving myself notes for later 2013-09-27 14:23:04 -04:00
jbrodman
39c2274f1a Merge pull request #588 from egaburov/knc-modes
Added knc-i1x16.h, knc-i1x8.h and knc-i1x8unsafe_fast.h
2013-09-27 11:20:56 -07:00
Dmitry Babokin
570246a016 Merge pull request #608 from dbabokin/fail_db
Fail_db.txt update
2013-09-27 07:16:52 -07:00
Dmitry Babokin
8e71dbd6c1 Adding comments to fail_db.txt 2013-09-27 18:12:12 +04:00
Dmitry Babokin
da52ae844f Adding AVX2 fails on Windows 2013-09-27 18:06:28 +04:00
Dmitry Babokin
396aaae098 Add fails with VS2010 on Windows 2013-09-27 17:00:17 +04:00
Dmitry Babokin
5855ae7460 Add fails with gcc 4.7 on Mac 2013-09-27 02:32:01 +04:00
Dmitry Babokin
2a83cefd5b Add fails with gcc 4.7 on Linux 2013-09-26 19:07:38 +04:00
Dmitry Babokin
dfc723bc19 Add fails with gcc 4.4 on Linux 2013-09-26 16:34:49 +04:00
Dmitry Babokin
23cb59427d Merge pull request #607 from ifilippov/testing
correction of test system
2013-09-26 04:02:49 -07:00
Ilia Filippov
1c858c34f7 correction of test system 2013-09-26 14:54:15 +04:00
james.brodman
9f7a4aa867 Update to include latest changes.
Merge branch 'master' into nomosoa
2013-09-25 19:17:56 -04:00
Dmitry Babokin
4285b5a89e Merge pull request #591 from dbabokin/dispatch
Adding check for OS AVX support to auto-dispatch code
2013-09-23 08:42:22 -07:00
Dmitry Babokin
a80696f98f Merge pull request #594 from ifilippov/testing
Incremental fixes for initial implementation of test system
2013-09-23 07:26:19 -07:00
Ilia Filippov
af5da885a5 small corrections of test system 2013-09-23 18:18:48 +04:00
Dmitry Babokin
349062d89c Merge pull request #596 from ifilippov/patch_3_
adding patch for LLVM 3.3 which increases performance after regression
2013-09-23 07:10:26 -07:00
Ilia Filippov
5a9b3b3abb adding patch for LLVM 3.3 which increases performance after regression 2013-09-23 18:01:03 +04:00
Dmitry Babokin
ea50609829 Merge pull request #595 from ifilippov/sort
adding sort to performance checking
2013-09-23 03:22:28 -07:00
evghenii
019043f55e patched half2float & float2half to pass the tests. Now only test-141 fails, but it seems to be test-related rather than knc-i1x16.h related 2013-09-23 09:55:55 +03:00
egaburov
6319a84ded Merge branch 'knc-modes' of github.com:egaburov/ispc into knc-modes 2013-09-23 08:12:13 +02:00
egaburov
8eeb446f74 Merge remote-tracking branch 'upstream/master' into knc-modes 2013-09-23 08:12:00 +02:00
jbrodman
1913199a45 Merge pull request #597 from pgurd/master
- Add Silvermont (--cpu=slm) option for llvm 3.4+.
2013-09-20 14:08:35 -07:00
Preston Gurd
4b26b8b430 Remove redundant "slm". 2013-09-20 16:44:01 -04:00
Preston Gurd
9e0e9dbecc - Add Silvermont (--cpu=slm) option for llvm 3.4+.
- Change default Sandybridge isa name to avx1-i32x8 from avx-i32x8,
  to conform with replacement of avx-i32x8 by avx1-i32x8 everywhere else.
- Add "target-cpu" attribute, when using AttrBuilder, to correct a problem
  whereby llvm would switch from the command line cpu setting
  to the native (auto-detected) cpu setting on second and subsequent
  functions. e.g. if I wanted to build for Silvermont on a Sandy Bridge
  machine, ispc/llvm would correctly use Silvermont and turn on the
  Silvermont scheduler. For the second and subsequent functions,
  it would auto-detect Sandy Bridge, but still run the Silvermont
  scheduler.
2013-09-20 14:42:46 -04:00
Ilia Filippov
87cecddabb adding sort to performance checking 2013-09-20 18:57:20 +04:00
evghenii
ddecdeb834 moved remaining int64 from knc.h; some fail to pass tests. grep for evghenii::fails to find out which functions fail and on which tests 2013-09-20 14:55:15 +03:00
evghenii
5cabf0bef0 adding int64 support from knc.h, phase 1. bugs: __lshr & __ashr fail idiv.ispc test, __equal_i64 & __equal_i64_and_mask fail reduce_equal_8.ispc test 2013-09-20 14:13:40 +03:00
Dmitry Babokin
0647c02561 Merge pull request #593 from ifilippov/testing
change head to trunk
2013-09-19 06:56:08 -07:00
Ilia Filippov
491c58aef3 change head to trunk 2013-09-19 17:47:10 +04:00
egaburov
28737f7ab4 Merge remote-tracking branch 'upstream/master' into knc-modes 2013-09-19 15:34:46 +02:00
evghenii
0ed89e93fa added fails info 2013-09-19 16:34:06 +03:00
Dmitry Babokin
0663cb41ff Merge pull request #592 from dbabokin/double
Typo fix in tests/double-consts.ispc
2013-09-19 06:29:14 -07:00
Dmitry Babokin
b2678b4338 Typo fix in tests/double-consts.ispc 2013-09-19 17:27:58 +04:00
egaburov
d68dbbc7bc Merge remote-tracking branch 'upstream/master' into knc-modes 2013-09-19 15:08:17 +02:00
evghenii
0c274212c2 performance tuning for knc-i1x8.h. this gives good enough performance for double only. float performance is terrible 2013-09-19 16:07:22 +03:00
evghenii
dbef4fd7d7 fixed notation 2013-09-19 14:52:22 +03:00
Dmitry Babokin
43245bbc11 Adding check for OS AVX support to auto-dispatch code 2013-09-19 15:39:56 +04:00
Dmitry Babokin
971d5f8d12 Merge pull request #590 from ifilippov/testing
Redesigned, enhanced and improved test system.
2013-09-19 04:23:20 -07:00
evghenii
6a21218c13 fix warning and add KNC 1 2013-09-19 13:45:31 +03:00
Ilia Filippov
bb8f7d4e3f removing LLVM 3.1 and 3.2 from default testing 2013-09-19 14:37:26 +04:00
Dmitry Babokin
67362935f4 Merge pull request #589 from egaburov/master
fixed support for numbers like .1d2
2013-09-19 01:59:45 -07:00
egaburov
3b8c7c9126 merged with upstream/master 2013-09-19 10:57:58 +02:00
egaburov
5f20dd01e8 merged with master 2013-09-19 10:56:39 +02:00
Dmitry Babokin
a5782dfb2d Merge pull request #586 from dbabokin/double
Test, documentation and vim support for double precision constants
2013-09-19 01:51:13 -07:00
Dmitry Babokin
1c527ae34c Adding tests and vim support for double constant of the form .1d41 2013-09-19 12:49:45 +04:00
Dmitry Babokin
f45f6cb32a Test, documentation and vim support for double precision constants 2013-09-19 12:49:45 +04:00
Ilia Filippov
00cd90c6b0 test system 2013-09-19 12:26:57 +04:00
egaburov
406e2eb8d0 fix double precision input to support .123d321 type of input 2013-09-19 09:16:37 +02:00
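As a worked example of the notation the double-precision commits here and below add (hedged; the forms are spelled from the constants quoted in the messages, e.g. .1d2 and .123d321):

    uniform double a = .1d2;   // Fortran-style exponent: 0.1 * 10^2 == 10.0, double
    uniform double b = 2D0;    // case-insensitive suffix: 2.0 as a double
    uniform float  c = 1.5f;   // f-suffix constants remain single precision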
evghenii
3cf63362a4 small tuning 2013-09-18 20:03:08 +03:00
evghenii
e4b1f58595 performance fix.. still some issues left with equal_i1 for __vec8_i1 2013-09-18 19:14:41 +03:00
evghenii
4b1a0b4bc4 added fails 2013-09-18 18:41:22 +03:00
evghenii
922edb1128 completed knc-i1x16.h and added knc-i1x8.h with knc-i1x8unsafe_fast.h that doesn't pass several tests.. 2013-09-18 18:14:07 +03:00
egaburov
0672289cb9 Merge branch 'double' of https://github.com/dbabokin/ispc 2013-09-18 15:24:35 +02:00
Dmitry Babokin
ff547fc6cb Merge pull request #584 from egaburov/d-suffix
Added fortran notation double precision support
2013-09-17 12:46:59 -07:00
Dmitry Babokin
fa78d548cc Test, documentation and vim support for double precision constants 2013-09-17 23:36:16 +04:00
Dmitry Babokin
191d9dede5 Merge pull request #585 from tkoziara/master
Sort description.
2013-09-16 10:08:49 -07:00
Tomasz Koziara
6e0b9ddc74 Sort description. 2013-09-16 18:02:07 +01:00
egaburov
eef4e11768 now it is also case-insensitive 2013-09-16 17:25:13 +02:00
Evghenii
6fd21d988d fixed lexer to properly read fortran-notation double constants 2013-09-16 17:15:02 +02:00
egaburov
2332490481 added fortran_double_constant 2013-09-16 16:31:41 +02:00
egaburov
e2a91e6de5 added support for "d"-suffix 2013-09-16 15:54:32 +02:00
Dmitry Babokin
b258027061 Merge pull request #582 from tkoziara/master
Uniform memory allocation in sort example is fixed.
2013-09-16 03:29:43 -07:00
Tomasz Koziara
97068765e8 Copyright reverted. 2013-09-14 18:09:04 +01:00
Dmitry Babokin
ec5cbb8117 Merge pull request #583 from dbabokin/win-fix
Fix for Windows build to include new target: avx-i64x4
2013-09-13 15:02:09 -07:00
Dmitry Babokin
ce99b17616 Fix for Windows build to include new target: avx-i64x4 2013-09-14 02:00:23 +04:00
Dmitry Babokin
06aa2067d9 Merge pull request #578 from egaburov/master
added --target=avx-i64x4 & svml support for all sse/avx modes
2013-09-13 09:40:24 -07:00
Evghenii
36886971e3 revert lex.ll parse.yy stdlib.ispc to state when all constants are floats 2013-09-13 16:02:53 +02:00
Evghenii
9861375f0c renamed avx-i64x4 -> avx1-i64x4 2013-09-13 15:07:14 +02:00
Tomasz Koziara
ed825b3773 Uniform memory allocation fixed. 2013-09-13 13:14:31 +01:00
egaburov
a9913c8337 changed lexer/parser to be able to read float constants, if they have "f"-suffix 2013-09-13 10:26:15 +02:00
Evghenii
a97eb7b7cb added clamp in double precision 2013-09-13 09:32:59 +02:00
egaburov
715b828266 fixed float constants to be read as doubles 2013-09-13 09:25:52 +02:00
Evghenii
40af8d6ed5 fixed segfault in tests/launch-*.ispc. nativeVectorWidth in avx-i64x4 was set to 4. Fixed 2013-09-12 20:25:44 +02:00
Evghenii
059d80cc11 included suggested changes, ./tests/launch-*.ispc still fails. something is mask64 related, not sure what. help... 2013-09-12 17:18:12 +02:00
egaburov
7364e06387 added mask64 2013-09-12 12:02:42 +02:00
egaburov
efc20c2110 added svml support to all sse/avx modes 2013-09-11 17:07:54 +02:00
egaburov
19379db3b6 svml cleanup 2013-09-11 16:48:56 +02:00
egaburov
9cf8e8cbf3 builtins fix for double precision svml and __stdlib_asin 2013-09-11 15:23:45 +02:00
egaburov
7a32699573 added svml.m4 2013-09-11 15:18:03 +02:00
egaburov
320c41ffcf added svml support. experimental. for some reason all symbols are visible.. 2013-09-11 15:16:50 +02:00
egaburov
9c79d4d182 added avxh with vectorWidth=4 support, use --target=avxh to enable it 2013-09-11 12:58:02 +02:00
jbrodman
582cfe55b6 Merge pull request #575 from jbrodman/master
Revert "Remove support for using SVML for math lib routines."
2013-09-05 10:31:23 -07:00
james.brodman
8db378b265 Revert "Remove support for using SVML for math lib routines."
This reverts commit d9c38b5c1f.
2013-09-04 16:01:58 -04:00
jbrodman
71a7564317 Merge pull request #574 from jbrodman/uniftypedef
Fix to respect uniform/varying qualifiers inside of typedefs.
2013-09-03 13:14:00 -07:00
jbrodman
c14b035a46 Merge pull request #572 from ifilippov/master
correction of adding -Werror option
2013-08-30 11:17:01 -07:00
jbrodman
cf2eaa0014 Merge pull request #569 from dbabokin/unmasked
Fix for incorrect implementation of reduce_[min|max]_[float|double]
2013-08-30 11:16:42 -07:00
jbrodman
cb92d54808 Merge pull request #570 from dbabokin/docs
Minor docs fixes.
2013-08-30 11:16:26 -07:00
james.brodman
97d430d5cd Fix to respect uniform/varying qualifiers inside of typedefs. 2013-08-30 14:13:08 -04:00
Ilia Filippov
320b1700ff correction of adding -Werror option 2013-08-30 16:01:01 +04:00
Dmitry Babokin
e06267ef1b Fix for incorrect implementation of reduce_[min|max]_[float|double], it showed up at -O0 2013-08-29 16:16:02 +04:00
Dmitry Babokin
501a23ad20 Typo fixes in docs 2013-08-29 14:48:09 +04:00
Dmitry Babokin
c1cc80b1d5 Merge pull request #568 from jbrodman/master
Fix against LLVM ToT
2013-08-27 14:08:12 -07:00
james.brodman
28080b0c22 Fix build against 3.4 2013-08-27 16:56:00 -04:00
james.brodman
be3a40e70b Fix for 3.4 2013-08-27 15:15:16 -04:00
Dmitry Babokin
5d8ebf3ca1 Fixing r183327-AVX2-GATHER.patch file permissions 2013-08-27 18:27:06 +04:00
Dmitry Babokin
443987f536 fixing ispc.rst file properties (should not be executable) 2013-08-27 15:33:44 +04:00
Dmitry Babokin
f6ce969d9f Merge pull request #567 from ifilippov/master
Changes in perf.py functionality, unification of examples, correcting build warnings
2013-08-26 03:26:28 -07:00
Ilia Filippov
f620cdbaa1 Changes in perf.py functionality, unification of examples, correcting build warnings 2013-08-26 14:04:59 +04:00
james.brodman
090dec8549 Output regular header for multiple targets + fix exported varying types. 2013-08-22 13:23:22 -04:00
Dmitry Babokin
3f2217646e Merge pull request #562 from mmp/arm
New target naming scheme, new targets (SSE4-i8x16 and SSE4-i16x8), plus some cleanup and improvements.
2013-08-22 08:33:25 -07:00
Matt Pharr
611477e214 Revert change to lEmitVaryingSelect().
Using vector select versus a store and masked load for varying vector
selects seems to give worse code.  This may be related to
http://llvm.org/bugs/show_bug.cgi?id=16941.
2013-08-22 07:50:25 -07:00
Dmitry Babokin
9bb5c314cd Merge pull request #565 from dbabokin/run_tests
run_tests.py fix and new switch.
2013-08-22 01:48:22 -07:00
Dmitry Babokin
f31a31478b Moving time calculation earlier 2013-08-22 12:41:57 +04:00
Dmitry Babokin
5fb30939be Fix for #564, using wrong ispc in run_tests.py 2013-08-21 19:46:18 +04:00
Dmitry Babokin
60b413a9cb Adding --non-interactive switch to run_tests.py 2013-08-21 19:25:30 +04:00
JCB
3e9d784013 Support exported arrays of varyings 2013-08-20 16:14:29 -04:00
JCB
0452b77169 Generate multiple headers for multiple targets. 2013-08-20 15:25:53 -04:00
JCB
10b8c481f5 initial support for exported varying 2013-08-20 15:14:15 -04:00
Matt Pharr
502f8fd76b Reduce debug spew on failing idiv.ispc tests 2013-08-20 09:22:09 -07:00
Matt Pharr
2b2905b567 Fix (preexisting) bugs in generic-32/64.h with type of "__any", etc.
This should be a bool, not a one-wide vector of bools.  The equivalent
fix was previously made in generic-16.h, but not made here.  (Note that
many tests are still failing with these targets, but at least they
compile properly now.)
2013-08-20 09:05:50 -07:00
Matt Pharr
e7f067d70c Fix handling of __clock() builtin for "generic" targets. 2013-08-20 09:04:52 -07:00
Matt Pharr
d976da7559 Speed up idiv test (don't test int32 as thoroughly) 2013-08-20 08:49:51 -07:00
Dmitry Babokin
84dbd66d10 Merge pull request #563 from jbrodman/debugopt
Separate -O and -g
2013-08-15 13:10:13 -07:00
james.brodman
6be3c24ee5 Separate -O and -g 2013-08-15 15:24:46 -04:00
Matt Pharr
42f31aed69 Another attempt at fixing the Windows build (added sse4-8/sse4-16 targets). 2013-08-14 11:02:45 -07:00
Matt Pharr
ed017c42f1 Fix ispc.vcxproj for Windows builds 2013-08-11 07:47:20 -07:00
Matt Pharr
4766467271 Revert ispc.vcxproj to version from top-of-tree. 2013-08-10 11:23:39 -07:00
Matt Pharr
ea8591a85a Fix build with LLVM top-of-tree (link libcurses) 2013-08-10 11:22:43 -07:00
Matt Pharr
7ab4c5391c Fix build with LLVM 3.2 and generic-4 / examples/sse4.h target. 2013-08-09 19:56:43 -07:00
Matt Pharr
0c5742b6f8 Implement new naming scheme for --target.
Now targets are named like "<isa>-i<mask size>x<gang size>", e.g.
"sse4-i8x16", or "avx2-i32x16".

The old target names are still supported.
2013-08-08 19:23:44 -07:00
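So, for example, avx2-i32x16 reads as the AVX2 ISA with 32-bit mask elements and a 16-wide gang, and sse4-i8x16 as SSE4 with 8-bit mask elements and 16 programs per gang.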
Matt Pharr
1d76f74b16 Fix compiler warnings 2013-08-07 12:53:39 -07:00
Matt Pharr
5e5d42b918 Fix build with LLVM 3.1 2013-08-06 17:55:37 -07:00
Matt Pharr
cd9afe946c Merge branch 'master' into arm
Conflicts:
	Makefile
	builtins.cpp
	ispc.cpp
	ispc.h
	ispc.vcxproj
	opt.cpp
2013-08-06 17:39:21 -07:00
Matt Pharr
1276ea9844 Revert "Remove support for building with LLVM 3.1"
This reverts commit d3c567503b.

Conflicts:
	opt.cpp
2013-08-06 17:00:35 -07:00
jbrodman
0755e4f8ff Merge pull request #561 from dbabokin/neon_condition
Fix for Windows build and making NEON target optional
2013-08-06 13:45:30 -07:00
Matt Pharr
ccdbddd388 Add peephole optimization to match int8/int16 averages.
Match the following patterns in IR, turning them into target-specific
intrinsics (e.g. PAVGB on x86) when possible.

(unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1)/2)
(unsigned int8)(((unsigned int16)a + (unsigned int16)b)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b + 1)/2)
(unsigned int16)(((unsigned int32)a + (unsigned int32)b)/2)
(int8)(((int16)a + (int16)b + 1)/2)
(int8)(((int16)a + (int16)b)/2)
(int16)(((int32)a + (int32)b + 1)/2)
(int16)(((int32)a + (int32)b)/2)
2013-08-06 08:59:46 -07:00
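In source terms, the first pattern above corresponds to code like the following hedged ispc sketch (the function name is invented for illustration):

    // Rounding-up unsigned 8-bit average; the widening add preserves the
    // carry from the +1, and the peephole pass can lower this to PAVGB.
    unsigned int8 avg_up_sketch(unsigned int8 a, unsigned int8 b) {
        return (unsigned int8)(((unsigned int16)a + (unsigned int16)b + 1) / 2);
    }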
Matt Pharr
5b20b06bd9 Add avg_{up,down}_int{8,16} routines to stdlib
These compute the average of two given values, rounding up and down,
respectively, if the result isn't exact.  When possible, these are
mapped to target-specific intrinsics (PAVG[BW] on IA and VH[R]ADD[US]
on NEON.)

A subsequent commit will add pattern-matching to generate calls to
these intrinsics when the corresponding patterns are detected in the
IR.
2013-08-06 08:41:12 -07:00
Dmitry Babokin
dff7735af9 Fix for Windows build and making NEON target optional 2013-08-02 19:24:34 -07:00
Dmitry Babokin
fb34fc5a85 Merge pull request #559 from ifilippov/debug_phases
Supporting dumping, switching off and debug printing of optimization phases.
2013-08-01 14:55:07 -07:00
Dmitry Babokin
43423c276f Merge pull request #560 from ifilippov/perf
Supporting perf.py on Mac OS
2013-08-01 13:20:01 -07:00
jbrodman
5ffc3a8f4c Merge pull request #558 from dbabokin/win_examples
Fix for examples to make them work on Windows properly
2013-08-01 08:02:42 -07:00
Ilia Filippov
3c06924a02 Supporting perf.py on Mac OS 2013-08-01 12:47:37 +04:00
Ilia Filippov
a174a90f86 Supporting dumping, switching off and debug printing of optimization phases 2013-08-01 11:37:52 +04:00
Matt Pharr
4f48d3258a Documentation updates for NEON 2013-07-31 20:06:04 -07:00
Matt Pharr
d9c38b5c1f Remove support for using SVML for math lib routines.
This path was poorly maintained and wasn't actually available on most
targets.
2013-07-31 06:56:48 -07:00
Matt Pharr
d3c567503b Remove support for building with LLVM 3.1 2013-07-31 06:46:45 -07:00
Matt Pharr
d7562d3836 Merge branch 'master' into arm 2013-07-31 06:38:17 -07:00
Dmitry Babokin
220f0b0b40 Renaming mandelbrot_tasks files to be different from mandelbrot 2013-07-30 19:53:12 -07:00
Matt Pharr
48ff03112f Remove __pause from stdlib_core() in utils.m4.
It wasn't ever being used, and was breaking compilation on ARM.
2013-07-30 08:44:22 -07:00
Matt Pharr
ab3b633733 Add 8-bit and 16-bit specialized NEON targets.
Like SSE4-8 and SSE4-16, these use 8-bit and 16-bit values for mask
elements, respectively, and thus should generate the best code when used
for computation with datatypes of those sizes.
2013-07-30 08:44:16 -07:00
Dmitry Babokin
fa93cb7d0b InterlockedAdd -> InterlockedExchangeAdd for better portability (InterlockedAdd is not always supported) 2013-07-29 22:46:36 -07:00
egaburov
153fbc3d7d some changes 2013-07-29 11:05:05 +02:00
egaburov
307abc8db7 adding test folder to test ptx generator 2013-07-28 15:57:10 +02:00
egaburov
af61c9bae3 working on target-nvptx64... need to add nvptx64 2013-07-28 15:50:08 +02:00
egaburov
67b549a937 Added nvptx64 target. Things to do:
1. builtins/target-nvptx64.ll to write, now it is just a copy of target-generic-1.ll
2. add __global__ & __device__ scope
3. make code work for a single cuda thread
4. use tasks to work as a block grid and programIndex as laneIdx, programCount as warpSize
5. ... and more...
2013-07-28 14:31:43 +02:00
Matt Pharr
b6df447b55 Add reduce_add() for int8 and int16 types.
This maps to specialized instructions (e.g. PSADBW) when available.
2013-07-25 09:46:01 -07:00
Matt Pharr
2d063925a1 Explicitly call the PBLENDVB intrinsic for i8 blending with sse4-8.
This is slightly cleaner than trunc-ing the i8 mask to i1 and using
a vector select.  (And is probably safer in terms of generating good code.)
2013-07-25 09:46:01 -07:00
Matt Pharr
bba84f247c Improved optimization of vector select instructions.
Various LLVM optimization passes are turning code like:

%cmp = icmp lt <8 x i32> %foo, %bar
%cmp32 = sext <8 x i1> %cmp to <8 x i32>
. . .
%cmp1 = trunc <8 x i32> %cmp32 to <8 x i1>
%result = select <8 x i1> %cmp1, . . .

Into:

%cmp = icmp lt <8 x i32> %foo, %bar
%cmp32 = zext <8 x i1> %cmp to <8 x i32>   # note: zext
. . .
%cmp1 = icmp ne <8 x i32> %cmp32, zeroinitializer
%result = select <8 x i1> %cmp1, …

Which in turn isn't matched well by the LLVM code generators, which
in turn leads to fairly inefficient code.  (i.e. it doesn't just emit
a vector compare and blend instruction.)

Also, renamed VSelMovmskOptPass to InstructionSimplifyPass to better
describe its functionality.
2013-07-25 09:46:01 -07:00
Matt Pharr
780b0dfe47 Add SSE4-16 target.
Along the lines of sse4-8, this is an 8-wide target for SSE4, using
16-bit elements for the mask.  It's thus (in principle) the best
target for SIMD computation with 16-bit datatypes.
2013-07-25 09:46:01 -07:00
Matt Pharr
04d61afa23 Fix bug in lEmitVaryingSelect() for targets with i1 mask types.
Commit 53414f12e6 introduced a bug where lEmitVaryingSelect() would
try to truncate a vector of i1s to a vector of i1s, which in turn
made LLVM's IR analyzer unhappy.
2013-07-25 09:45:20 -07:00
Dmitry Babokin
663ebf7857 Merge pull request #551 from mmp/constfold
Improvements to constant folding.
2013-07-24 10:27:04 -07:00
Matt Pharr
53414f12e6 Add SSE4 target optimized for computation with 8-bit datatypes.
This change adds a new 'sse4-8' target, where programCount is 16 and
the mask element size is 8-bits.  (i.e. the most appropriate sizing of
the mask for SIMD computation with 8-bit datatypes.)
2013-07-23 17:30:32 -07:00
Matt Pharr
15a3ef370a Use @llvm.readcyclecounter to implement stdlib clock() function.
Also added a test for the clock builtin.
2013-07-23 17:24:57 -07:00
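A hedged usage sketch for that stdlib routine (assuming clock() returns a uniform int64 cycle count, which matches @llvm.readcyclecounter; the function name is invented):

    export uniform int64 cycles_for_sqrt(uniform float a[], uniform int n) {
        uniform int64 start = clock();
        foreach (i = 0 ... n)
            a[i] = sqrt(a[i]);
        return clock() - start;   // elapsed cycles for the loop
    }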
Matt Pharr
c14659c675 Fix bug in lGetConstantInt() in parse.yy.
Previously, we weren't handling signed/unsigned constant types correctly.
2013-07-23 17:24:57 -07:00
Matt Pharr
f7f281a256 Choose type for integer literals to match the target mask size (if possible).
On a target with a 16-bit mask (for example), we would choose the type
of an integer literal "1024" to be an int16.  Previously, we used an int32,
which is a worse fit and leads to less efficient code than an int16
on a 16-bit mask target.  (However, we'd still give an integer literal
1000000 the type int32, even in a 16-bit target.)

Updated the tests to still pass with 8 and 16-bit targets, given this
change.
2013-07-23 17:24:50 -07:00
Matt Pharr
9ba49eabb2 Reduce estimated costs for 8 and 16-bit min() and max() in stdlib.
These actually compile to a single instruction.
2013-07-23 16:52:43 -07:00
Matt Pharr
e7abf3f2ea Add support for mask vectors of 8 and 16-bit element types.
There were a number of places throughout the system that assumed that the
execution mask would only have either 32-bit or 1-bit elements.  This
commit makes it possible to have a target with an 8- or 16-bit mask.
2013-07-23 16:50:11 -07:00
Matt Pharr
83e1630fbc Add support for fast division of varying int values by small constants.
For varying int8/16/32 types, divides by small constants can be
implemented efficiently through multiplies and shifts with integer
types of twice the bit-width; this commit adds this optimization.
    
(Implementation is based on Halide.)
2013-07-23 16:49:56 -07:00
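To make the trick concrete, here is a minimal C++ sketch of the same idea (illustrative only; ispc's actual implementation follows Halide and covers all small divisors): dividing a uint8 by the constant 3 exactly, using a multiply and shift in a 16-bit intermediate. The magic constant 171 = ceil(2^9 / 3) is specific to this divisor/width pair.

    #include <cassert>
    #include <cstdint>

    // Exact x/3 for any uint8_t x via a widened multiply and shift.
    // 171 == ceil(2^9 / 3); the rounding error never reaches the next
    // integer for x <= 255, so the result is exact.
    static inline uint8_t div3_u8(uint8_t x) {
        return static_cast<uint8_t>((static_cast<uint16_t>(x) * 171u) >> 9);
    }

    int main() {
        for (int x = 0; x <= 255; ++x)   // exhaustive check of all inputs
            assert(div3_u8(static_cast<uint8_t>(x)) == x / 3);
        return 0;
    }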
Matt Pharr
0277ba1aaa Improve warnings for right shift by varying amounts.
Fixes:
- Don't issue a warning when the shift is by the same amount in all
  vector lanes.
- Do issue a warning when it's a compile-time constant but the values
  are different in different lanes.

Previously, we warned iff the shift amount wasn't a compile-time constant.
2013-07-23 16:49:07 -07:00
Matt Pharr
753c001e69 Merge branch 'master' of https://github.com/ispc/ispc into constfold 2013-07-23 16:12:04 -07:00
Dmitry Babokin
10c0b42d0d Merge pull request #549 from mmp/fix-tot
Fix build with LLVM top-of-tree.
2013-07-23 09:14:08 -07:00
Matt Pharr
564e61c828 Improvements to constant folding.
We can now do constant folding with all basic datatypes (the previous
implementation handled int32 well, but had limited, if any, coverage
for other datatypes.)

Reduced a bit of repeated code in the constant folding implementation
through template helper functions.
2013-07-22 16:12:02 -07:00
Matt Pharr
946c39a5df Fix build with LLVM top-of-tree.
The DIBuilder::getCU() method has been removed; we now just store the
compilation unit returned when we call DIBuilder::createCompileUnit.
2013-07-22 15:42:52 -07:00
Jean-Luc Duprat
2948e84846 Merge pull request #547 from mmp/arm-merge
Add ARM NEON support
2013-07-22 09:24:16 -07:00
Matt Pharr
068fd8098c Explicitly set armv7-eabi target triple on ARM.
This lets the compiler generate FMA instructions, which seems
desirable.
2013-07-20 11:19:10 -07:00
Matt Pharr
d7b0c5794e Add support for ARM NEON targets.
Initial support for ARM NEON on Cortex-A9 and A15 CPUs.  All but ~10 tests
pass, and all examples compile and run correctly.  Most of the examples
show a ~2x speedup on a single A15 core versus scalar code.

Current open issues/TODOs
- Code quality looks decent, but hasn't been carefully examined.  Known
  issues/opportunities for improvement include:
  - fp32 vector divide is done as a series of scalar divides rather than
    a vector divide (which I believe exists, but I may be mistaken.)
    This is particularly harmful to examples/rt, which only runs ~1.5x
    faster with ispc, likely due to long chains of scalar divides.
  - The compiler isn't generating a vmin.f32 for e.g. the final scalar
    min in reduce_min(); instead it's generating a compare and then a
    select instruction (and similarly elsewhere).
  - There are some additional FIXMEs in builtins/target-neon.ll that
    include both a few pieces of missing functionality (e.g. rounding
    doubles) as well as places that deserve attention for possible
    code quality improvements.

- Currently only the "cortex-a9" and "cortex-a15" CPU targets are
  supported; LLVM supports many other ARM CPUs and ispc should provide
  access to all of the ones that have NEON support (and aren't too
  obscure.)

- ~5 of the reduce-* tests hit an assertion inside LLVM (unfortunately,
   only when the compiler runs on an ARM host).

- The Windows build hasn't been tested (though I've tried to update
  ispc.vcxproj appropriately).  It may just work, but will more likely
  have various small issues.

- Anything related to 64-bit ARM has seen no attention.
2013-07-19 23:07:24 -07:00
Matt Pharr
b007bba59f Replace inline assembly in task system with equivalent gcc intrinsics.
gcc/icc build only: the Windows build still uses the Win32 calls for
these.
2013-07-19 23:07:24 -07:00
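An assumed example of the kind of substitution meant (the commit doesn't list the specific calls): a gcc atomic builtin replacing hand-written lock-prefixed inline assembly.

    #include <cstdint>

    // Before: inline asm along the lines of "lock; xaddl %0, %1".
    // After: the equivalent gcc builtin (also supported by icc).
    static inline int32_t atomicAdd(volatile int32_t *p, int32_t v) {
        return __sync_fetch_and_add(p, v);
    }

    int main() {
        volatile int32_t counter = 0;
        atomicAdd(&counter, 1);
        return counter == 1 ? 0 : 1;
    }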
Dmitry Babokin
abf43ad01d Merge pull request #546 from dbabokin/release
Release 1.4.4
2013-07-19 18:49:07 -07:00
Dmitry Babokin
922895de69 Changing ISPC version to 1.4.5dev 2013-07-19 18:47:43 -07:00
Dmitry Babokin
28f0bce9f2 Release 1.4.4 2013-07-19 16:22:10 -07:00
Dmitry Babokin
0f82f216a2 Merge pull request #544 from mmp/master
Handle SHL with a constant vector in LLVMVectorIsLinear().
2013-07-18 11:46:11 -07:00
Matt Pharr
7454b1399c Handle SHL with a constant vector in LLVMVectorIsLinear().
LLVM3.4 seems to be turning multiplies by a constant power of 2 into
the equivalent SHL, which was in turn thwarting the pattern matching
for turning gathers/scatters into vector loads/stores.
2013-07-17 14:12:43 -07:00
jbrodman
4ebf46bd63 Merge pull request #543 from mmp/master
Fix build with LLVM top-of-tree
2013-07-17 10:38:06 -07:00
Matt Pharr
f1cce0ef5f Fix build with LLVM top-of-tree 2013-07-17 09:25:00 -07:00
Dmitry Babokin
8c9e873c10 Merge pull request #540 from dbabokin/embree_bug
Fix for the bug introduced by the --instrument fix
2013-07-04 10:45:06 -07:00
Dmitry Babokin
c85439e7bb Fix for the bug introduced by the --instrument fix 2013-07-04 21:41:57 +04:00
Ilia Filippov
fd7f87b55e Supporting perf.py on Windows and some small corrections in it 2013-07-02 19:23:18 +04:00
Dmitry Babokin
8be4128c5a Merge pull request #534 from ifilippov/perf
add script for measuring performance
2013-07-01 05:09:03 -07:00
Ilia Filippov
806e37338c add script for measuring performance 2013-07-01 13:30:49 +04:00
Dmitry Babokin
ec1095624a Merge pull request #527 from tkoziara/master
examples/sort added
2013-06-25 10:11:39 -07:00
Tomasz Koziara
a23d69ebe8 Copyright changed to simplify legal matters. 2013-06-25 17:28:27 +01:00
Dmitry Babokin
0aff61ffc6 Merge pull request #533 from dbabokin/patch
Quick fix for LLVM 3.3 patch
2013-06-25 08:50:32 -07:00
Dmitry Babokin
05aa540984 Quick fix for LLVM 3.3 patch 2013-06-25 19:49:41 +04:00
Dmitry Babokin
033e83e490 Merge pull request #532 from dbabokin/release_1_4_3
Release 1.4.3
2013-06-25 07:42:08 -07:00
Dmitry Babokin
594485c38c Release 1.4.3 2013-06-25 18:38:21 +04:00
Dmitry Babokin
d52e2d5a8d License update (just dates) 2013-06-25 17:02:42 +04:00
Dmitry Babokin
1e5d852e2f Merge pull request #531 from ifilippov/qsize_fail
replacement of qsize due to its failures on MacOS
2013-06-25 05:36:45 -07:00
Ilia Filippov
cc32d913a0 replacement of qsize due to its failures on MacOS 2013-06-25 16:27:25 +04:00
Dmitry Babokin
fc66066d4d Merge pull request #530 from dbabokin/llvm_fix
Adding LLVM patch to fix #519 with LLVM 3.3
2013-06-25 05:22:09 -07:00
Dmitry Babokin
6169338815 Adding LLVM patch to fix #519 with LLVM 3.3 2013-06-25 16:21:14 +04:00
Tomasz Koziara
86ee8db778 Parallel prefix sum added + minor amendments. 2013-06-25 12:45:51 +01:00
Dmitry Babokin
6bc8cb1ff1 Merge pull request #529 from ifilippov/instrument_fix
correction of --instrument option support
2013-06-25 03:08:02 -07:00
Dmitry Babokin
0fc49b1c37 Merge pull request #528 from ifilippov/test3
Reapplying lost commits
2013-06-25 02:14:24 -07:00
Ilia Filippov
9fb981e9a0 correction of --instrument option support 2013-06-25 12:33:23 +04:00
Ilia Filippov
cba1b3cedd additional libraries for LLVM_3_4 build 2013-06-25 12:22:53 +04:00
Ilia Filippov
12c4512932 adding two additional libraries for LLVM_3_4 build 2013-06-25 12:22:53 +04:00
Tomasz Koziara
f2452f040d First commit of the radix sort example. 2013-06-24 18:37:44 +01:00
Dmitry Babokin
0dd1dbb568 Merge pull request #526 from dbabokin/master
Tracking LLVM trunk: removing llvm::createSimplifyLibCallsPass() call
2013-06-23 23:10:19 -07:00
Dmitry Babokin
fdcec5a219 Tracking LLVM trunk: removing llvm::createSimplifyLibCallsPass() call 2013-06-24 10:08:06 +04:00
Dmitry Babokin
bebab7ab0d Merge pull request #525 from dbabokin/debug
--debug output: stdout instead of stderr
2013-06-21 03:56:17 -07:00
Dmitry Babokin
fb771b6aa3 --debug output: stdout instead of stderr 2013-06-20 22:47:29 +04:00
jbrodman
8156559475 Merge pull request #522 from dbabokin/broadcast
Fix for #520
2013-06-18 11:47:24 -07:00
jbrodman
9f5e51cd01 Merge pull request #523 from dbabokin/tot
Tracking ToT changes
2013-06-18 11:47:16 -07:00
Dmitry Babokin
27daab2f1b Fix for #520 2013-06-18 22:15:49 +04:00
Dmitry Babokin
c4d404b15f Tracking ToT changes: changes in MCContext interface 2013-06-18 22:13:14 +04:00
Dmitry Babokin
95fcdc36ee Tracking ToT changes, which now require linking the Option library. This is Unix-only; Windows will be fixed separately 2013-06-18 22:12:33 +04:00
Dmitry Babokin
2fdaba53c1 Merge pull request #517 from ifilippov/bug_34
Fix for tests/soa-22 on x86/sse4 - cleanup in function LLVMFlattenInsertChain().
2013-06-14 08:40:01 -07:00
Ilia Filippov
5c89080469 changes in function LLVMFlattenInsertChain 2013-06-14 16:38:54 +04:00
Ilia Filippov
d92f9df17c changes in function LLVMFlattenInsertChain 2013-06-14 15:21:45 +04:00
Dmitry Babokin
f551390420 Merge pull request #516 from ifilippov/master
Changes to support skipping tests.
2013-06-13 08:48:29 -07:00
Ilia Filippov
8642b4d89f changing run_tests to support skipping tests and time 2013-06-13 19:25:34 +04:00
Ilia Filippov
6fb70c307d changing run_tests to support skipping tests and time 2013-06-13 19:00:02 +04:00
Ilia Filippov
d08346fbcf changes to support skipping tests 2013-06-13 16:47:10 +04:00
jbrodman
141d240a91 Merge pull request #513 from dbabokin/release_142
Release 1.4.2, 11 June 2013
2013-06-11 07:47:37 -07:00
Dmitry Babokin
cf9ceb6bf9 Release 1.4.2, 11 June 2013 2013-06-11 17:18:54 +04:00
Dmitry Babokin
7589ae0de5 Merge pull request #512 from ifilippov/bug_34
Fix to track LLVM 3.4 ToT changes
2013-06-04 07:10:04 -07:00
jbrodman
f46e5b37e9 Merge pull request #511 from dbabokin/win32
Fix for #503 - avoid omitting frame pointer on Win32
2013-06-04 06:43:53 -07:00
Ilia Filippov
560acd5017 changes to support createFunction() with DICompositeType argument in LLVM_3_4 2013-06-04 15:48:39 +04:00
Dmitry Babokin
2267f278d2 Fix for #503 - avoid omitting frame pointer on Win32 2013-06-04 14:51:36 +04:00
jbrodman
0feeef585c Merge pull request #509 from jbrodman/master
Change generic-16's knc.h to use __mmask16 instead of a struct.
2013-05-30 13:21:23 -07:00
james.brodman
6211966c55 Change mask to use __mmask16 instead of a struct. 2013-05-30 16:04:44 -04:00
Dmitry Babokin
92f591b4bd Merge pull request #508 from dbabokin/master
Bumping version to 1.4.1dev
2013-05-28 08:59:13 -07:00
Dmitry Babokin
29ceb42b7b Bumping version to 1.4.1dev 2013-05-28 19:58:27 +04:00
Dmitry Babokin
adaabe5993 Merge pull request #507 from dbabokin/master
Bumping up to 1.4.1 version
2013-05-28 08:49:14 -07:00
Dmitry Babokin
6c392ee4a1 Changes for 1.4.1 release 2013-05-28 19:46:30 +04:00
jbrodman
7699eda5ba Merge pull request #506 from jbrodman/master
Typo Fix
2013-05-28 08:13:03 -07:00
james.brodman
d8b5fd5409 Typo fix. 2013-05-28 11:13:43 -04:00
Dmitry Babokin
b37ffdbe85 Merge pull request #505 from dbabokin/release
Changes for 1.4.0 release
2013-05-27 06:03:22 -07:00
Dmitry Babokin
481bcc732b Changes for 1.4.0 release 2013-05-27 16:48:41 +04:00
jbrodman
ce175aee4c Merge pull request #504 from dbabokin/attr
Adding missing attributes on exported functions
2013-05-24 07:38:03 -07:00
jbrodman
50896b373b Merge pull request #502 from dbabokin/malloc
Aligned memory allocation generalization.
2013-05-24 07:37:41 -07:00
Dmitry Babokin
1a40f936df Windows build cleanup: moving generated lex.cc and parse.cc to build dir 2013-05-24 10:29:01 +04:00
Dmitry Babokin
1024ba9b0f Parameter cast for posix_memalign(), otherwise gcc issues an error 2013-05-24 10:29:01 +04:00
Dmitry Babokin
1a7ac8b804 Enable memory alignment management via compiler options 2013-05-24 10:29:01 +04:00
Dmitry Babokin
7bedb4a081 Add memory alignment dependant on the platform (16/32/64/etc) 2013-05-24 10:29:01 +04:00
Dmitry Babokin
630215f56f Defining memory routines completely separately for Windows/Unix 32/64 bit. 2013-05-24 10:29:01 +04:00
Dmitry Babokin
6f0e5fd402 Adding RUNTIME define to gen-bitcode-* files generation command 2013-05-24 10:29:01 +04:00
Dmitry Babokin
66ec43739a Moving temporary files to Debug/Release folder on Windows 2013-05-24 10:29:01 +04:00
Dmitry Babokin
44f9d1ed78 Fix for CYGWIN build warnings (Windows style slash, instead of Unix style) 2013-05-24 10:29:01 +04:00
Dmitry Babokin
c6d479b8ad Enabling 32/64 bit version library build on Windows 2013-05-24 10:29:00 +04:00
Dmitry Babokin
80e2f4e342 Removing redundant Debug/Release records in VS proj file - they are unified 2013-05-24 10:29:00 +04:00
Dmitry Babokin
4b388edca9 Splitting .ll files to be compiled in two versions - 32 and 64 bit. Unix only 2013-05-24 10:29:00 +04:00
Dmitry Babokin
5362dade37 Fixing util.m4 to declare nothing unless some macro is instantiated 2013-05-24 10:29:00 +04:00
Dmitry Babokin
3d24265d50 Adding missing attributes on exported functions 2013-05-24 10:28:06 +04:00
Dmitry Babokin
7057fc2930 Merge pull request #501 from jbrodman/sven
Fix branch tests for breaks and returns.
2013-05-21 08:07:10 -07:00
jbrodman
e661ee95ff Merge pull request #499 from dbabokin/debug
Fix for #462: broken debug info support with LLVM 3.3+
2013-05-21 08:00:14 -07:00
jbrodman
333b89fa07 Merge pull request #500 from dbabokin/docs
Bringing docs/ispc.rst in sync with ispc.html on the web site
2013-05-21 07:59:18 -07:00
jbrodman
04e888548e Merge pull request #497 from dbabokin/knc-vector
Minor fix for generic DataLayout
2013-05-21 07:59:01 -07:00
james.brodman
403d9e1059 Update break/continue test to use contribution of function mask. 2013-05-21 10:52:38 -04:00
james.brodman
4ea02c59d8 Disable break optimization and change return check to use full mask. 2013-05-21 10:00:22 -04:00
Dmitry Babokin
de7ba7a55b Bringing docs/ispc.rst in sync with ispc.html on the web site (some changes were made there directly) 2013-05-21 16:44:46 +04:00
Dmitry Babokin
23ba61e76f Fix for #462: broken debug info support with LLVM 3.3+ 2013-05-20 22:28:47 +04:00
Dmitry Babokin
c0aa7e0314 Merge pull request #498 from jbrodman/master
Codegen Cleanup
2013-05-17 11:31:31 -07:00
james.brodman
9f44e597d6 Additional Not -> Xor w/ MaskAllOn 2013-05-15 18:15:41 -04:00
james.brodman
60c5bef90f Simplify ~mask codegen to emit single XOR like other places in the code. 2013-05-15 16:57:41 -04:00
Dmitry Babokin
a38fcf1127 Merge pull request #496 from dbabokin/llvm34
Enabling llvm 3.4
2013-05-15 04:40:47 -07:00
Dmitry Babokin
f22e237381 Minor fix for generic DataLayout 2013-05-13 20:24:51 +04:00
Dmitry Babokin
b6b9daa3c5 Enabling llvm 3.4 2013-05-13 19:25:31 +04:00
jbrodman
d958b0b9d6 Merge pull request #495 from jbrodman/master
knc.h cleanup
2013-05-10 13:24:54 -07:00
james.brodman
7b2eaf63af knc.h cleanup 2013-05-10 13:36:18 -04:00
Dmitry Babokin
aef20b536a Merge pull request #489 from jbrodman/gacteon
Fix for cases where valid lvalues were not being computed.
2013-05-04 02:38:11 -07:00
jbrodman
1a34b1410f Merge pull request #488 from dbabokin/broadcast_library
Efficient library implementation of broadcast
2013-05-03 11:04:26 -07:00
james.brodman
5af2f80bc5 Fix for cases where valid lvalues were not being computed. 2013-05-03 12:12:42 -04:00
Dmitry Babokin
4d241736f0 Merge pull request #487 from dbabokin/win_output
Typos and formatting fixes on Windows
2013-05-01 17:07:28 -07:00
Dmitry Babokin
a47460b4c3 Efficient library implementation of broadcast 2013-05-02 00:12:16 +02:00
Dmitry Babokin
32be338f60 Minor indentation fix 2013-05-02 00:05:17 +02:00
Dmitry Babokin
549655bff4 Adding new line to error/warning message on Windows and fixing some typos. 2013-05-01 20:22:01 +02:00
jbrodman
3e18cec691 Merge pull request #486 from jbrodman/master
Add check for Enum type in Assert
2013-04-30 13:12:46 -07:00
james.brodman
658dd3486b Add check for enum type in Assert. 2013-04-30 16:10:57 -04:00
jbrodman
018e9a12a3 Merge pull request #484 from dbabokin/malloc
Fix for aligned move of unaligned data in 32 bit platforms.
2013-04-30 12:02:04 -07:00
jbrodman
2027a6ac12 Merge pull request #483 from dbabokin/win_rm_files
Fix for removing temp files on Windows
2013-04-30 11:03:48 -07:00
Dmitry Babokin
26bec62daf Removing duplicate free() definition on Linux 2013-04-27 00:29:51 +04:00
Dmitry Babokin
7497e86902 Adding support for aligned memory allocation on Windows 2013-04-26 22:07:30 +02:00
Dmitry Babokin
e084f1c311 Adding missing copyright info in Makefile 2013-04-26 19:11:20 +02:00
Dmitry Babokin
95950885cf Use posix_memalign to allocate 16-byte aligned memory on Linux/MacOS. 2013-04-26 20:33:24 +04:00
Dmitry Babokin
9cd84aeea9 Fix for removing temp files on Windows 2013-04-25 22:50:37 +02:00
jbrodman
d324ec247e Merge pull request #482 from dbabokin/sprintf
Some more cleanup and stability fixes
2013-04-25 12:07:27 -07:00
Dmitry Babokin
cbb0d6ce06 Don't run many threads when only one test is specified 2013-04-25 21:12:16 +04:00
Dmitry Babokin
d36ab4cc3c Adding noalias attribute to malloc return 2013-04-25 20:39:01 +04:00
Dmitry Babokin
1069a3c77e Removing some sources of warnings in sse4.h and trailing spaces 2013-04-25 03:40:32 +04:00
Dmitry Babokin
36da23c9c5 MacOS-specific fix: enable translation of symbols with variant names. It's needed for fputs() translation from __do_print() for generic targets. 2013-04-25 03:11:39 +04:00
Dmitry Babokin
e756daa261 Remove sprintf warnings on Windows and fix sprintf-related fails on Mac 2013-04-24 22:36:48 +02:00
jbrodman
65ac336211 Merge pull request #481 from dbabokin/win_cleanup
Set of changes for better build and testing experience on Windows
2013-04-24 08:36:53 -07:00
Dmitry Babokin
14fe987956 More portable way of doing print in run_tests.py 2013-04-24 17:11:30 +02:00
Dmitry Babokin
c5acf239f2 Pass lock as a parameter to subprocesses to make task counter work on Windows 2013-04-24 01:14:46 +02:00
Dmitry Babokin
a02500b112 Make the progress update look good in a standard 80-char terminal (print on a single line).
Issue the message about which compiler is used only once on Windows.
2013-04-24 00:21:18 +02:00
Dmitry Babokin
8fea85a85c Pass total number of tests as an explicit parameter to subprocesses, so it works on Windows 2013-04-23 23:47:21 +02:00
Dmitry Babokin
6f42bfc640 Fixing native testing on Windows
All temporary files are stored in tmp* directories, including for generic targets
Generic targets are handled correctly on Windows now (tests still fail for other reasons)
2013-04-23 22:47:38 +02:00
Dmitry Babokin
6b8741dbd7 Removing 4244 warning and cleanup of linked libraries list 2013-04-23 15:21:33 +02:00
jbrodman
ce5d4b6ccc Merge pull request #477 from dbabokin/broadcast
One more opportunity to do better broadcast
2013-04-18 07:57:47 -07:00
Dmitry Babokin
eb2e5f378c Comment fixes 2013-04-18 15:36:35 +04:00
jbrodman
9251ea2b22 Merge pull request #476 from dbabokin/win_dl
Fix for DataLayout inconsistency on win32
2013-04-17 13:15:03 -07:00
Dmitry Babokin
cb650d6100 One more opportunity to do better broadcast 2013-04-17 20:56:32 +04:00
jbrodman
3f3d9219ea Merge pull request #475 from dbabokin/path
Fix for #474: colon separated path in -I
2013-04-17 07:44:37 -07:00
Dmitry Babokin
11528b0def Fix for #474: colon separated path in -I 2013-04-17 18:38:57 +04:00
Dmitry Babokin
a5f0e713d6 Fix for DataLayout inconsistency on win32 2013-04-17 17:20:17 +04:00
jbrodman
de3a864bd6 Merge pull request #473 from dbabokin/win_test
Avoid double spaces in error messages to get same messages on Win and Unix
2013-04-16 07:22:11 -07:00
Dmitry Babokin
af551b3c09 Avoid double spaces in error messages to get same messages on Windows and Unix 2013-04-15 19:10:10 +04:00
jbrodman
5d64d23b61 Merge pull request #472 from dbabokin/gcc47
Fix for the bug with gcc4.7: incorrect usage of ArrayRef
2013-04-15 04:37:55 -07:00
Dmitry Babokin
269a00f9ec Fix for the bug with gcc4.7: incorrect usage of ArrayRef 2013-04-15 13:16:28 +04:00
jbrodman
29c482789f Merge pull request #471 from dbabokin/multi_targets
#469: Fix for multi-target compilation
2013-04-12 07:58:05 -07:00
Dmitry Babokin
a0462fe1ee #469: Fix for multi-target compilation 2013-04-12 14:06:12 +04:00
jbrodman
78a840f48d Merge pull request #467 from dbabokin/broadcast
Broadcast implementation as InsertElement+Shuffle and related improvements
2013-04-11 13:42:56 -07:00
Dmitry Babokin
7371d82bdf Changes in cbackend.cpp to match broadcast generation changes 2013-04-12 00:10:41 +04:00
Dmitry Babokin
4c35d9456a Additional cleanup to enable more broadcasts 2013-04-10 15:34:21 +04:00
Dmitry Babokin
0704081d91 ctx.h copyright fix 2013-04-10 02:33:41 +04:00
Dmitry Babokin
5898532605 Broadcast implementation as InsertElement+Shuffle and related improvements 2013-04-10 02:18:24 +04:00
Dmitry Babokin
603abf70dc Merge pull request #463 from jbrodman/master
Fix to avoid generating 64-bit gather/scatter when force32BitAddressing is set
2013-04-08 05:14:25 -07:00
james.brodman
0a3822f2e5 Fix to make sure we're generating 32-bit gather/scatter when force32bitaddressing is set. 2013-04-05 16:21:05 -04:00
Dmitry Babokin
f76eb2b7f5 Merge pull request #460 from Vsevolod-Livinskij/master
Fix for issue #453
2013-04-04 08:59:49 -07:00
jbrodman
08bbf5f5ef Merge pull request #458 from dbabokin/dl_check
DataLayout consistency check
2013-04-03 11:03:29 -07:00
Vsevolod Livinskij
4ea08116b8 Issue #453: now run_tests.py checks ispc_exe availability 2013-04-03 18:04:11 +04:00
Vsevolod Livinskij
78e03c6402 Issue #453: now run_tests.py checks ispc_exe availability otherwise prints an error message and exits 2013-04-03 16:51:16 +04:00
Vsevolod Livinskij
4ab89de343 Issue #453: now run_tests.py checks ispc_exe availability otherwise prints an error message 2013-04-03 02:58:16 +04:00
Vsevolod Livinskij
6db460fb81 Issue #453: now run_tests.py checks ispc_exe availability otherwise prints an error message 2013-04-03 02:34:42 +04:00
Dmitry Babokin
be859df51e Fix for #457 - issue with compiler Unicode output 2013-04-03 02:23:06 +04:00
Dmitry Babokin
3c533f2ba4 Fix for the issue #449: warning due to DataLayout mismatch 2013-04-02 19:08:44 +04:00
Dmitry Babokin
9cf037c90f Merge pull request #451 from Vsevolod-Livinskij/master
Issue #436
2013-04-02 02:12:57 -07:00
Vsevolod Livinskij
9e0425e824 Checks the required compiler otherwise prints an error message and exits program 2013-03-29 18:42:02 +04:00
Vsevolod Livinskij
2960479095 Issue 2013-03-29 18:30:23 +04:00
jbrodman
baf1ff033a Merge pull request #450 from jbrodman/master
Added implementations of 3 intrinsics for double precision vectors
2013-03-28 08:57:02 -07:00
james.brodman
52dcbf087a Implemented 3 more intrinsics on double precision vectors 2013-03-28 11:55:53 -04:00
jbrodman
d1c6331924 Merge pull request #448 from dbabokin/target_class
Redesign of Target class with several fixes
2013-03-25 10:21:00 -07:00
Dmitry Babokin
0af2a13349 DataLayout is changed to be managed from single place. v4-128-128 is added to generic DataLayout 2013-03-23 14:38:51 +04:00
Dmitry Babokin
7f0c92eb4d Fix for #431: memory leak due to multiple TargetMachine creation 2013-03-23 14:33:45 +04:00
Dmitry Babokin
0f86255279 Target class redesign: data moved to private. Also empty target-feature attribute is not added anymore (generic targets). 2013-03-23 14:28:05 +04:00
jbrodman
b2517c8a18 Merge pull request #445 from dbabokin/master
Adding information about compiler used to build ISPC to make output.  Also adds line endings cleanup.
2013-03-19 08:09:31 -07:00
Dmitry Babokin
95d0c5e67b Merge branch 'master' of https://github.com/dbabokin/ispc 2013-03-18 16:40:54 +04:00
Dmitry Babokin
0885b2bf23 Merge pull request #444 from dbabokin/master
Fixing tabs and trailing white spaces
2013-03-18 05:39:13 -07:00
Dmitry Babokin
706337bb5a Doxygen fix: don't identify the words Type and Target as class names 2013-03-18 16:31:56 +04:00
Dmitry Babokin
3f8a678c5a Editorial change: fixing trailing white spaces and tabs 2013-03-18 16:17:55 +04:00
Dmitry Babokin
0f631ad49b Add info about the compiler used for the ispc build to Makefile output 2013-03-18 12:30:06 +04:00
Dmitry Babokin
5bc3b4f768 Merge pull request #442 from dbabokin/master
Preprocessor should consider "line comments"
2013-03-14 01:48:49 -07:00
jbrodman
e8bd464ab2 Merge pull request #443 from jbrodman/master
Tweak to sse4.h intrinsics include file.
2013-03-13 07:59:27 -07:00
james.brodman
ef1af547e2 Change sse4.h to enable inlining. 2013-03-13 10:55:53 -04:00
Dmitry Babokin
f2dcad27bb Fix for LLVM 3.1 and #441 2013-03-12 21:13:08 +04:00
Dmitry Babokin
01992006b2 Fix for #441: Preprocessor complains about code commented out by // 2013-03-12 18:56:32 +04:00
jbrodman
487820fb4d Merge pull request #437 from dbabokin/docs
Docs
2013-03-04 10:48:35 -08:00
Dmitry Babokin
d760d67598 Doxygen fix: don't identify the word Declaration as a class 2013-03-04 03:28:02 +04:00
Dmitry Babokin
3cb827ac56 Fix for some typos in User's Guide 2013-03-04 03:04:46 +04:00
pengtu
15c5dfc12b Merge pull request #435 from dbabokin/master
Tracking ToT changes in DIBuilder interface
2013-02-28 12:25:45 -08:00
Dmitry Babokin
524939dc5b Fix for issue #430 2013-02-27 18:03:07 +04:00
Dmitry Babokin
51fdff208e Tracking ToT changes in DIBuilder interface 2013-02-25 14:50:33 +04:00
jbrodman
65a309648b Merge pull request #434 from dbabokin/master
Fix for #433
Reading the LLVM lists makes me concur.  The functionality (if it was even needed) was merged into other existing infrastructure.
2013-02-23 09:09:30 -08:00
Dmitry Babokin
7d08eeb8dd Fix for #433: fix for ToT changes, removal of llvm::createGCInfoDeleter() 2013-02-23 20:49:56 +04:00
Dmitry Babokin
9e0428ba0d One more missing file in doxygen.cfg: cbackend.cpp 2013-02-23 20:45:31 +04:00
Jean-Luc Duprat
15461a2460 Merge pull request #432 from dbabokin/master
Some minor changes in Makefile and doxygen.cfg
2013-02-22 10:29:17 -08:00
Dmitry Babokin
88f21b5c57 Update for doxygen.cfg: adding ast.* and func.* and fixing builtins-c.c => builtins/builtins.c 2013-02-21 18:13:22 +04:00
Dmitry Babokin
bee3029764 Adding debug and clang targets, changing asan target 2013-02-21 17:26:21 +04:00
Dmitry Babokin
150d6d1f56 Adding Address Sanitizer build 2013-02-15 06:50:26 -08:00
jbrodman
d03e4ac100 Merge pull request #429 from dbabokin/master
Fix for #428
2013-02-11 11:10:32 -08:00
Dmitry Babokin
8d8d9c63fe Fix for #349: build issue when no git found 2013-02-11 11:01:46 -08:00
Dmitry Babokin
52147ce631 Fixing issue #428: need to specify LLVM libs explicitly 2013-02-11 04:15:50 -08:00
jbrodman
5af41df8a5 Merge pull request #427 from jbrodman/master
Clang ToT fixes.
2013-02-04 10:56:37 -08:00
james.brodman
775ecd6dfe Tracking ToT changes. Clang PP APIs changed. 2013-01-30 11:57:33 -05:00
jbrodman
b139235c62 Merge pull request #425 from jbrodman/master
Tracking Attribute API changes in ToT
2013-01-23 07:55:33 -08:00
james.brodman
8f2c910600 Tracking ToT changes to Attribute API 2013-01-23 10:57:05 -05:00
jbrodman
7e67f01d4b Merge pull request #424 from jbrodman/master
Tracking Attribute API changes in ToT
2013-01-22 07:45:20 -08:00
james.brodman
ad7e800446 Tracking Attribute API Changes in ToT 2013-01-22 10:46:42 -05:00
Jean-Luc Duprat
6326924de7 Fixes to the implementations of any() and none() in the stdlib.
These make sure that inactive vector lanes do not interfere with the results
2013-01-18 11:19:54 -08:00
jbrodman
801a3a6ff5 Merge pull request #423 from jbrodman/master
Properly size double/int64 initializers.
2013-01-18 08:56:48 -08:00
james.brodman
a4e94a26ba Tweak to not oversize short vec types for 64 bit values 2013-01-17 15:45:51 -05:00
jbrodman
44e6be7914 Merge pull request #422 from jbrodman/master
Fixes to build with ToT / M4 Macro Fix
2013-01-14 11:57:25 -08:00
james.brodman
3aaf2ef2d4 ToT Fixes / M4 macro fix 2013-01-14 14:55:10 -05:00
jbrodman
8f902fde9c Merge pull request #420 from jbrodman/master
Fix for c++ backend
2013-01-08 11:55:03 -08:00
james.brodman
b6023c517e Fix/Hack to avoid the cbackend generating spurious array type declarations. 2013-01-08 14:53:17 -05:00
james.brodman
42d77e9191 Modified to mirror asin.ispc and not fail. 2013-01-08 14:33:32 -05:00
jbrodman
312a7582df Merge pull request #419 from jbrodman/master
Fix to acos.ispc test
2013-01-08 11:32:34 -08:00
jbrodman
dc939eba78 Merge pull request #418 from mmp/master
Fix build with LLVM top-of-tree, fix warnings, remove LLVM 3.0 support
2013-01-08 10:28:02 -08:00
jbrodman
f8bec51de2 Merge pull request #411 from pengtu/master
Simple fixes to allow SOA pointers and arrays to be passed as function arguments.
2013-01-08 08:40:01 -08:00
Matt Pharr
0bf1320a32 Remove support for building with LLVM 3.0 2013-01-06 12:27:53 -08:00
Matt Pharr
81dbd504aa Small fixes to eliminate compiler warnings when using clang 2013-01-06 12:10:54 -08:00
Matt Pharr
63dd7d9859 Fix build to work with LLVM top-of-tree again 2013-01-06 12:02:08 -08:00
Jean-Luc Duprat
2063d34f3e Merge pull request #414 from jbrodman/master
Fix to build with 3.2
2013-01-03 11:00:45 -08:00
james.brodman
83fdc2e5ad Fix to build with 3.2. LLVM API Change? 2013-01-03 13:43:47 -05:00
Peng Tu
6ba7368ab0 Fix two compile-time errors to allow SOA pointers and arrays to be passed as function arguments. 2012-12-11 17:20:15 -08:00
Jean-Luc Duprat
c2805942a9 Merge pull request #409 from mmp/master
Bugfix for issue #408.
2012-12-06 09:46:34 -08:00
Matt Pharr
9892c8bf9a Fix logic for ordering of struct declarations in generated header files.
When a struct had an array of another struct type as a member, we weren't
detecting that the struct type in the array needed to be declared before the
enclosing struct type.

Fixes issue #408.
2012-12-06 11:39:22 -05:00
Matt Pharr
23e5877509 merge 2012-12-02 14:32:52 -08:00
Matt Pharr
8cbfde6092 Small fixes to build with LLVM top-of-tree (now numbered as version 3.3) 2012-12-02 14:29:24 -08:00
Jean-Luc Duprat
24087ff3cc Expose none() in the ISPC standard library.
On KNC: all(), any() and none() do not generate a redundant movmsk instruction.
2012-11-27 13:38:28 -08:00
Jean-Luc Duprat
6827001c1d Merge pull request #406 from pengtu/master
Fix ISPC with LLVM TOT build problem
2012-11-22 09:27:10 -08:00
Peng Tu
16b0806d40 Fix LLVM TOT build issue. 2012-11-21 19:09:10 -08:00
Jean-Luc Duprat
2129b1e27d knc.h: Fixed __rsqrt_varying_float() to use _mm512_invsqrt_ps() instead of _mm512_invsqrt_pd()
This was a typo.
2012-11-21 15:40:35 -08:00
Jean-Luc Duprat
a267762f59 Merge pull request #404 from mmp/master
Fix build with LLVM top-of-tree
2012-11-21 10:37:40 -08:00
Jean-Luc Duprat
65ca795030 Merge pull request #405 from jbrodman/master
Tweaked Scalar Repl of Aggregates Optimization
2012-11-19 13:12:26 -08:00
Matt Pharr
e82b649ec0 Fix build with LLVM top-of-tree (various changes to clang entrypoints). 2012-11-16 11:04:11 -08:00
james.brodman
275cdb1713 Merge branch 'master' of https://github.com/ispc/ispc 2012-11-14 13:30:45 -05:00
Jean-Luc Duprat
d3b86dcc90 KNC: fix implementation of __all() to use KNCni mask test instructions... 2012-11-14 09:24:01 -08:00
james.brodman
c736b75075 Merge branch 'master' of https://github.com/ispc/ispc 2012-11-13 17:08:09 -05:00
Jean-Luc Duprat
b601331362 Approximation for inverse sqrt and reciprocal provided in fast math mode.
- RCP was actually slow in fast-math mode
- Inverse sqrt did not expose the fast approximation
2012-11-13 14:01:35 -08:00
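Not the KNC code itself, but the generic fast-math pattern this alludes to, sketched scalar in C++ (the bit-level seed constant is the classic one, assumed here purely for illustration): a cheap initial guess refined by one Newton-Raphson step.

    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Fast approximate 1/sqrt(x): bit-trick seed plus one Newton-Raphson
    // step y' = y * (1.5 - 0.5 * x * y * y).
    float fastRsqrt(float x) {
        uint32_t i;
        std::memcpy(&i, &x, sizeof i);
        i = 0x5f3759df - (i >> 1);       // crude seed for 1/sqrt(x)
        float y;
        std::memcpy(&y, &i, sizeof y);
        return y * (1.5f - 0.5f * x * y * y);
    }

    int main() {
        printf("approx %f vs exact %f\n", fastRsqrt(4.0f), 1.0f / std::sqrt(4.0f));
        return 0;
    }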
ptu1
32d44a5b9e Merge branch 'master' of ssh://fmygit6001.fm.intel.com:29418/ssg_dpd_tpi_ispc-ispc_git 2012-11-13 12:47:13 -08:00
ptu1
810784da1f Set the ScalarReplAggregate maximum structure size based on target vector width. 2012-11-13 12:35:45 -08:00
james.brodman
d517b37f3f Merge branch 'master' of https://github.com/ispc/ispc 2012-11-09 10:14:18 -05:00
Jean-Luc Duprat
adeef0af01 Merge pull request #403 from jbrodman/master
Fixed =/== error for KNC intrinsic implementation of __all()
2012-11-08 13:57:42 -08:00
james.brodman
97ddc1ed10 Fixed =/== error in __all() 2012-11-08 16:30:12 -05:00
james.brodman
bf580648a1 Merge branch 'master' of https://github.com/ispc/ispc 2012-11-06 12:03:27 -05:00
Jean-Luc Duprat
ecc54fa0eb Merge pull request #402 from pengtu/master
Fix a bug where an unsigned index variable in a subscript is sign-extended to 64 bits
2012-11-05 21:51:38 -08:00
Peng Tu
04d32ae3e6 Inside LLVM, both signed and unsigned integers are represented with the same type - i32 - effectively a signed int32. On a 64-bit target, we must generate an explicit sext/zext during LLVM IR creation to promote the array index to 64 bits. Otherwise, an unsigned int index becomes a signed int index in the LLVM IR.
I limit the fix to uniform indices to avoid widening a varying index vector to 64 bits.  This means that the 32-bit values in varying indices must be positive and smaller than 2^31 at runtime for a program to behave correctly.
2012-11-05 15:02:15 -08:00
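A small C++ illustration of why the extension kind matters (hypothetical values; assumes two's-complement conversion): sign-extending an unsigned 32-bit index >= 2^31 yields a negative 64-bit offset, while zero-extending preserves the value.

    #include <cstdint>
    #include <cstdio>

    int main() {
        uint32_t idx = 0x80000000u;                   // unsigned index >= 2^31
        int64_t as_sext = static_cast<int32_t>(idx);  // sign-extend: -2147483648
        int64_t as_zext = idx;                        // zero-extend:  2147483648
        printf("sext: %lld  zext: %lld\n", (long long)as_sext, (long long)as_zext);
        return 0;
    }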
james.brodman
a18b3ae88e Merge branch 'master' of https://github.com/ispc/ispc 2012-10-31 15:25:41 -04:00
james.brodman
e57801a5d1 Typo Fix 2012-10-31 15:25:26 -04:00
ingowald
da4390aede Merge pull request #401 from pengtu/master
Fix a "continue" handling bug in foreach_unique/foreach_active
2012-10-30 01:46:01 -07:00
Peng Tu
9e85667219 Merge remote branch 'upstream/master' 2012-10-29 22:51:22 -07:00
Peng Tu
b80867d473 Move the call to RestoreContinuedLanes from bbBody to the correct place of bbCheckForMore for foreach_unique and foreach_active. 2012-10-29 17:27:11 -07:00
Jean-Luc Duprat
d742dcce59 Merge pull request #400 from jbrodman/master
Fixes a compile error in examples/intrinsics/sse4.h. Assignment was used instead of equality comparison.
2012-10-26 14:08:56 -07:00
james.brodman
7a7af3d5f9 Merge branch 'master' of https://github.com/jbrodman/ispc 2012-10-26 16:55:53 -04:00
jbrodman
e323b1d0ad Fixed compile error: == instead of = 2012-10-26 16:55:28 -04:00
james.brodman
3c18c7a713 Fixed compile error: == instead of = 2012-10-26 16:52:54 -04:00
james.brodman
7c16292cb7 Merge branch 'master' of https://github.com/ispc/ispc 2012-10-24 13:49:04 -04:00
Gerrit Code Review
d665e2e85b Initial empty repository 2012-10-24 09:53:29 -07:00
Matt Pharr
172a189c6f Fix build with LLVM top-of-tree 2012-10-17 11:11:50 -07:00
Matt Pharr
406fbab40e Fix bugs in declarations of __any, __all, and __none in examples/intrinsics.
They return bool, not vector of bool.
2012-10-17 10:55:50 -07:00
Matt Pharr
09dc217f8c Fix hex constant in lParseInteger() (missing an f) 2012-10-16 06:03:33 -07:00
Matt Pharr
9002837750 Remove incorrect assert in tasksys.cpp 2012-10-15 10:43:46 -07:00
Matt Pharr
411d5b44ef Add ISPC_HAS_RAND definition on targets that have a HW RNG.
This lets us check for a functioning rdrand() call in the stdlib
more reliably.  Fixes issue #333.
2012-10-03 09:18:12 -07:00
Matt Pharr
360cc8044e Improve RNG documentation.
Issue #390.
2012-10-03 08:33:43 -07:00
Matt Pharr
ec2e9b5e79 Fix typo in assert() documentation.
Issue #388.
2012-10-03 08:26:38 -07:00
Matt Pharr
881dba61e4 Fix build with LLVM top-of-tree 2012-09-28 06:07:01 -07:00
Matt Pharr
6412876f64 Remove unused __reduce_add_uint{32,64} target functions.
The stdlib code just calls the signed int{32,64} functions,
which gives the right result for the unsigned case anyway.
The various targets didn't consistently define the unsigned
variants in any case.
2012-09-28 05:55:41 -07:00
Matt Pharr
538d51cbfe Add GMRES example 2012-09-20 14:06:55 -07:00
Jean-Luc Duprat
3dd9ff3d84 knc.h:
	Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
	Fixed usage of loadunpack and packstore to use proper memory offsets
	Fixed implementations of __masked_load_*()/__masked_store_*() that incorrectly (un)packed the loaded lanes
	Cleaned up usage of _mm512_undefined_*(); it is now mostly confined to constructors
	Minor cleanups

knc2x.h:
	Fixed usage of loadunpack and packstore to use proper memory offsets
	Fixed implementations of __masked_load_*()/__masked_store_*() that incorrectly (un)packed the loaded lanes
	Properly pick up on ISPC_FORCE_ALIGNED_MEMORY when --opt=force-aligned-memory is used
	__any() and __none() speedups
	Cleaned up usage of _mm512_undefined_*(); it is now mostly confined to constructors
2012-09-19 17:11:04 -07:00
Ingo Wald
7f386923b0 Merge branch 'master' of https://github.com/ispc/ispc 2012-09-17 15:54:25 +02:00
Ingo Wald
d2312b1fbd now using the ASSUME_ALIGNED flag in knc.h 2012-09-17 15:54:00 +02:00
Ingo Wald
6655373ac3 commit test 2012-09-17 15:51:37 +02:00
Ingo Wald
d492af7bc0 64-bit gather/scatter, aligned load/store, i8 support 2012-09-17 03:39:02 +02:00
Matt Pharr
230a7b7374 Fix bug with floating-point constant zero vectors.
Issue #377.
2012-09-14 14:24:51 -07:00
Jean-Luc Duprat
4204a752f7 Merge branch 'master' of https://github.com/ispc/ispc 2012-09-14 14:12:49 -07:00
Jean-Luc Duprat
0e88d5f97f Fixed unaligned masked stores on KNC 2012-09-14 14:11:41 -07:00
Matt Pharr
a13e7f2435 #define ISPC_FORCE_ALIGNED_MEMORY, if appropriate, in C++ output. 2012-09-14 13:53:12 -07:00
Matt Pharr
be2108260e Add --opt=force-aligned-memory option.
This forces all vector loads/stores to be done assuming that the given
pointer is aligned to the vector size, thus allowing the use of sometimes
more-efficient instructions.  (If it isn't the case that the memory is
aligned, the program will fail!).
2012-09-14 13:49:45 -07:00
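The practical consequence for callers is that every buffer touched by such code must be vector-size aligned. A hedged sketch of one way to guarantee that, using the posix_memalign() call mentioned elsewhere in this log (the 32-byte alignment is an assumption, matching an 8-wide single-precision target):

    #include <cstdlib>

    int main() {
        void *p = nullptr;
        // Alignment must match (or exceed) the target's vector size.
        if (posix_memalign(&p, 32, 1024 * sizeof(float)) != 0)
            return 1;
        float *data = static_cast<float *>(p);
        // ... safe to pass 'data' to code built with --opt=force-aligned-memory ...
        free(data);
        return 0;
    }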
Matt Pharr
59b0a2b208 Mark __any(), __all(), and __none() as internal after they're linked in.
This fixes multiple symbol definition errors when compiling a single binary
for multiple ISA targets.
2012-09-14 13:32:42 -07:00
Matt Pharr
05a5a42a08 Don't force loads/stores from varying types to be unaligned.
These should always actually be aligned in memory.
2012-09-14 12:17:33 -07:00
Jean-Luc Duprat
f0b0618484 Added the following mask tests: __any(), __all(), __none() for all supported targets.
This allows for more efficient code generation of KNC.
2012-09-14 11:06:18 -07:00
Ingo Wald
4ecdbe4bd9 two changes:
- exported structs now get protected with #ifdef/#define blocks (allows including multiple ispc-generated header files into the same C source)
- when creating offload stubs, encountering an 'export' function for which we cannot produce a stub will only trigger a warning, not an error.
2012-09-08 16:09:04 +02:00
Matt Pharr
9e9f266e52 Add files inadvertently missed in c58d92d46b.
Truly fixes issue #363.
2012-09-07 13:27:07 -07:00
Matt Pharr
0ce67f37ac Use LLVM_VERSION env variable to get LLVM version with MSVC build.
Previously, it was set directly in the ispc.vcxproj file.

Issue #371.
2012-09-06 06:04:32 -07:00
Matt Pharr
ddcd0a49ec Fix bugs with handling of 'continue' statements in foreach_* loops. 2012-09-05 10:16:58 -07:00
Matt Pharr
63b8fac852 Improve naming of temporary variable in IR 2012-09-05 10:13:45 -07:00
Matt Pharr
def8d7850b Fix crasher with malformed programs 2012-09-05 08:43:46 -07:00
Jean-Luc Duprat
0442efc856 Merge branch 'master' of https://github.com/ispc/ispc 2012-09-04 11:00:03 -07:00
Jean-Luc Duprat
f928bbb53c Updated usage of Intel® Initial Many Core Instructions (IMCI). 2012-09-04 10:57:25 -07:00
Jean-Luc Duprat
1ab7500dbb Updated user's guide to comply with Intel® Xeon Phi™ brand usage guidelines 2012-09-04 10:53:01 -07:00
Matt Pharr
c58d92d46b Issue error if a vector-typed parameter is used in an exported function.
Issue #363.
2012-08-31 06:59:58 -07:00
Matt Pharr
8276e912fd Switch to LLVM 3.1 for default for MSVC builds. Also fixes issue #374 2012-08-31 05:58:39 -07:00
Jean-Luc Duprat
e0490d0df5 Minor fixes needed for building on windows. 2012-08-30 10:56:13 -07:00
Jean-Luc Duprat
11db466a88 Implement the KNC prefetch API so that ISPC prefetch_*() stdlib functions may be used. 2012-08-30 10:24:31 -07:00
Matt Pharr
caaee0b666 Fix crash when using launch with non-task-qualified function 2012-08-29 09:06:47 -07:00
Matt Pharr
f2f470f369 Merge pull request #369 from jduprat/master
Task system updates
2012-08-28 14:01:37 -07:00
Jean-Luc Duprat
09bb36f58c Updated the task system in the example directory to support:
Cilk (cilk_for), OpenMP (#pragma omp parallel for), TBB(tbb::task_group and tbb::parallel_for)
as well as a new pthreads-based model that fully subscribes the machine (good for KNC).
With major contributions from Ingo Wald and James Brodman.
2012-08-28 11:13:12 -07:00
Matt Pharr
21719df6fd remove assert that hit with fast-math if user defined their own functions named rcp() 2012-08-21 16:39:36 -07:00
Matt Pharr
39329809dd fix crash with malformed program 2012-08-21 16:35:31 -07:00
Matt Pharr
44797e2925 remove incorrect assert 2012-08-21 16:27:49 -07:00
Jean-Luc Duprat
c8f373d119 Merge branch 'master' of https://github.com/ispc/ispc 2012-08-15 17:42:00 -07:00
Jean-Luc Duprat
8a22c63889 knc2x.h
Introduced knc2x.h, which supports 2x interleaved code generation for KNC (use the target generic-32).
This implementation is even more experimental and incomplete than knc.h, but is useful already (mandelbrot works, for example)

knc.h:
Switch to new intrinsic names _mm512_set_1to16_epi32() -> _mm512_set1_epi32(), etc...
Fix the declaration of the unspecialized template for __smear_*(), __setzero_*(), __undef_*()
Specifically mark a few vectors as _mm512_undefined_*() in __load<>()
Fixed some implementations of __smear_*(), __setzero_*(), __undef_*() to remove unnecessary dependent instructions.
Implemented ISPC reductions by simply calling existing intrinsic reductions, which are slightly more efficient than our previous implementation.  Also added reductions for double types.
2012-08-15 17:41:10 -07:00
Matt Pharr
1a4434d314 Fix build with LLVM top-of-tree 2012-08-11 09:28:48 -07:00
Jean-Luc Duprat
165a13b13e knc.h:
vec16_i64 improved with the addition of the following: __extract_element(), insert_element(), __sub(), __mul(),
		   __sdiv(), __udiv(), __and(), __or(), __xor(), __shl(), __lshr(), __ashr(), __select()
	Fixed a bug in the __mul(__vec16_i64, __vec16_i32) implementation
	Constructors are all explicitly inlined, copy constructor and operator=() explicitly provided
	Load and stores for __vec16_i64 and __vec16_d use aligned instructions when possible
	__rotate_i32() now has a vector implementation
	Added several reductions: __reduce_add_i32(), __reduce_min_i32(), __reduce_max_i32(),
	       __reduce_add_f(), __reduce_min_f(), __reduce_max_f()
2012-08-10 12:20:10 -07:00
Matt Pharr
43364b2d69 Loosen tolerances to test passes with FMA on AVX2 2012-08-10 06:52:14 -07:00
Matt Pharr
6eaecd20d5 Mark __{get,set}_system_isa builtins as "internal" functions.
This ensures that they have static linkage, which in turn lets one
have multiple object files compiled to multiple targets without having
those cause link errors.

Issue #355.
2012-08-09 16:12:07 -07:00
Matt Pharr
c80bfeacf6 Fix crashes when input program tried to access undefined struct types.
(This in particular would happen when there was an error in the body of a struct
definition and we were left with an UndefinedStructType and then later tried to
do loads/stores from/to it.)

Issue #356.
2012-08-09 14:59:29 -07:00
Matt Pharr
2a19cc1758 Fix cases where we were trying to type cast instead of type convert.
Also, removed erroneous checks about the type of the test expression
in DoStmt and ForStmt.

These together were preventing conversion of pointer types to boolean
values, so things like "while (ptr)" would improperly not compile.

Issue #346.
2012-08-03 12:47:53 -07:00
Matt Pharr
8f5189f606 Type convert arrays in select expressions to pointers to the first element.
Fixes issue #345.
2012-08-03 11:53:59 -07:00
Matt Pharr
49dde7c6f2 Fix bug in declaration of double-precision sqrt intrinsic for AVX targets.
This was preventing sqrts of uniform double values from being compiled
properly.

Issue #344.
2012-08-03 11:43:31 -07:00
Matt Pharr
765a0d8896 Use puts() rather than printf() for printing assertion failure strings.
This way, we don't lose '%'s in the assertion strings.

Issue #342.
2012-08-03 11:31:38 -07:00
Matt Pharr
19d8f2e258 Generate FMA instructions with AVX2 (when possible).
Issue #320.
2012-08-03 10:43:41 -07:00
Matt Pharr
e6aec96e05 Fix build with LLVM top-of-tree 2012-08-03 09:59:41 -07:00
Jean-Luc Duprat
a2d42c3242 KNC: all masked_load_*() and masked_store_*() functions need to do unaligned accesses 2012-08-01 14:37:25 -07:00
Jean-Luc Duprat
52836aae87 Minor documentation clarification on the impact of the ICC -fp-model except option. 2012-08-01 10:24:35 -07:00
Matt Pharr
bda566d6a7 Fix incorrect assertion 2012-08-01 08:11:32 -07:00
Jean-Luc Duprat
63ed90b0fd docs/build.sh runs rst2html rather than rst2html.py
Explicitly documented the fact that ICC needs the -mmic flag to compile for KNC.
Updated ISPC User Guide with details on ICC compiler options that impact FP performance in generated code.
2012-07-30 11:47:25 -07:00
Matt Pharr
0bb4d282e2 Add sys/types.h include for linux/osx. 2012-07-23 08:32:41 -07:00
Matt Pharr
ae89a65dad Fix bug that caused unterminated basic blocks.
Issue #339.
2012-07-23 08:24:18 -07:00
Matt Pharr
e9fe9f5043 Add cpu strings for Ivy Bridge and HSW.
Default to avx2 ISA for HSW CPUs.
2012-07-23 08:24:18 -07:00
Matt Pharr
ce8dc5927c Fix bug in FunctionEmitContext::MatchIntegerTypes
Cause of issue #329.
2012-07-20 10:05:17 -07:00
Matt Pharr
f6989cce38 Disallow native output with generic targets, C++ output with non-generic targets.
Also wrote FAQs about why this is the way it is.
Issue #334.
2012-07-20 09:55:50 -07:00
Jean-Luc Duprat
6dbbf9aa80 Merge branch 'master' of https://github.com/ispc/ispc 2012-07-19 17:33:00 -07:00
Jean-Luc Duprat
fe6282e837 Fixed small issue with name mangling introduced in aecd6e08 2012-07-19 17:32:49 -07:00
Matt Pharr
51210a869b Support core-avx-i and core-avx2 CPU types.
(And map them to avx1.1 and avx2 targets, respectively.)
2012-07-19 10:15:59 -07:00
Matt Pharr
658652a9ff Merge pull request #331 from jduprat/master
New templated API for __setzero() __undef() and __smear()
2012-07-18 16:39:38 -07:00
Jean-Luc Duprat
aecd6e0878 All the smear(), setzero() and undef() APIs are now templated on the return type.
Modified ISPC's internal mangling to pass these through unchanged.
Tried hard to make sure this is not going to introduce an ABI change.
2012-07-17 17:06:36 -07:00
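The general shape of that API change, sketched in plain C++ with hypothetical type names (not the actual headers): templating on the return type replaces passing a dummy first argument just to select an overload.

    #include <cstdint>

    struct vec16_i32 { int32_t v[16]; };

    // Old style: the unused first parameter only exists to pick the overload.
    inline vec16_i32 smear(vec16_i32 /*unused*/, int32_t x) {
        vec16_i32 r;
        for (int i = 0; i < 16; ++i) r.v[i] = x;
        return r;
    }

    // New style: the caller names the return type explicitly.
    template <typename V> V smear(int32_t x);

    template <> inline vec16_i32 smear<vec16_i32>(int32_t x) {
        vec16_i32 r;
        for (int i = 0; i < 16; ++i) r.v[i] = x;
        return r;
    }

    int main() {
        vec16_i32 a = smear<vec16_i32>(42);   // no dummy argument needed
        return a.v[0] == 42 ? 0 : 1;
    }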
Jean-Luc Duprat
1334a84861 Merge branch 'master' of https://github.com/ispc/ispc 2012-07-17 11:46:30 -07:00
Matt Pharr
6a410fc30e Emit gather instructions for the AVX2 targets.
Issue #308.
2012-07-13 12:29:05 -07:00
Matt Pharr
984a68c3a9 Rename gen_gather() macro to gen_gather_factored() 2012-07-13 12:24:12 -07:00
Matt Pharr
daf5aa8e8b Run inst combine before memory optimizations.
We were previously emitting 64-bit indexing for some gathers where
32-bit was actually fine, due to some adds of constant vectors
that hadn't been simplified into a single constant.
2012-07-13 12:14:53 -07:00
Matt Pharr
98b2e0e426 Fixes for intrinsics unsupported in earlier LLVM versions.
Specifically, don't use the half/float conversion routines with
LLVM 3.0, and don't try to use RDRAND with anything before LLVM 3.2.
2012-07-13 12:14:10 -07:00
Matt Pharr
9a1932eaf7 Only set gcc's "-msse4.2", etc, option when compiling for generic targets.
We don't need it when ispc is just generating an object file directly, and gcc
on OS X doesn't recognize -mavx.
2012-07-13 12:02:05 -07:00
Matt Pharr
371d4be8ef Fix bugs in detection of Ivy Bridge systems.
We were incorrectly characterizing them as basic AVX1 without further
extensions, due to a bug in the logic to check CPU features.
2012-07-12 14:11:15 -07:00
Matt Pharr
d180031ef0 Add more tests of basic gather functionality. 2012-07-12 14:05:38 -07:00
Jean-Luc Duprat
e09e953bbb Added a few functions: __setzero_i64() __cast_sext(__vec16_i64, __vec16_i32), __cast_zext(__vec16_i32)
__min_varying_int32(), __min_varying_uint32(), __max_varying_int32(), __max_varying_uint32()
Fixed the signature of __smear_i64() to match current codegen
2012-07-12 10:32:38 -07:00
Matt Pharr
2c640f7e52 Add support for RDRAND in IvyBridge.
The standard library now provides a variety of rdrand() functions
that call out to RDRAND, when available.

Issue #263.
2012-07-12 06:07:07 -07:00
Matt Pharr
2bacebb1fb Doc fixes (Crystal Lemire). 2012-07-11 19:51:28 -07:00
Jean-Luc Duprat
df18b2a150 Fixed missing tmp var needed for use with gather intrinsic 2012-07-11 15:43:11 -07:00
Matt Pharr
216ac4b1a4 Stop factoring out constant offsets for gather/scatter if instr is available.
For KNC (gather/scatter), it's not helpful to factor base+offsets gathers
and scatters into base_ptr + {1/2/4/8} * varying_offsets + const_offsets.
Now, if a HW instruction is available for gather/scatter, we just factor
into base + {1/2/4/8} * offsets (if possible).  Not only is this simpler,
but it also lets us pass the scale factor along to the scale-by-2/4/8
addressing available directly in those instructions.

Finishes issue #325.
2012-07-11 14:52:29 -07:00
Jean-Luc Duprat
898cded646 Merge branch 'master' of https://github.com/ispc/ispc
Conflicts:
	examples/intrinsics/knc.h
2012-07-11 14:45:00 -07:00
Matt Pharr
c09c87873e Whitespace / indentation fixes. 2012-07-11 14:29:46 -07:00
Matt Pharr
10b79fb41b Add support for non-factored variants of gather/scatter functions.
We now have two ways of approaching gather/scatters with a common base
pointer and with offset vectors.  For targets with native gather/scatter,
we just turn those into base + {1/2/4/8}*offsets.  For targets without,
we turn those into base + {1/2/4/8}*varying_offsets + const_offsets,
where const_offsets is a compile-time constant.

Infrastructure for issue #325.
2012-07-11 14:29:42 -07:00
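A scalar C++ sketch of the two addressing forms (illustrative only, with assumed helper names): targets with native gather want base + {1/2/4/8}*offsets, while targets without it benefit from also splitting out a compile-time-constant part.

    #include <cstdint>

    // Native-gather form: addr = base + scale * offset, scale in {1,2,4,8}.
    inline const float *addrNative(const float *base, int64_t offset) {
        return reinterpret_cast<const float *>(
            reinterpret_cast<const char *>(base) + 4 * offset);
    }

    // Factored form: addr = base + scale * varying_offset + const_offset,
    // where const_offset is known at compile time.
    inline const float *addrFactored(const float *base, int64_t varyingOffset,
                                     int64_t constOffset) {
        return reinterpret_cast<const float *>(
            reinterpret_cast<const char *>(base) + 4 * varyingOffset + constOffset);
    }

    int main() {
        float data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
        // Both forms address element 3 here: 4*3 == 4*2 + 4 bytes.
        return (*addrNative(data, 3) == 3.0f &&
                *addrFactored(data, 2, 4) == 3.0f) ? 0 : 1;
    }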
Matt Pharr
ec0280be11 Rename gather/scatter_base_offsets functions to *factored_based_offsets*.
No functional change; just preparation for having a path that doesn't
factor the offsets into constant and varying parts, which will be better
for AVX2 and KNC.
2012-07-11 14:16:39 -07:00
Matt Pharr
8e19d54e75 Merge pull request #328 from jduprat/explicit_isa_in_tests
Explicit isa in tests
2012-07-10 20:49:37 -07:00
Jean-Luc Duprat
3c070e5e20 run_tests.py will only attempt to use the -mmic flag when the knc.h header is used 2012-07-10 17:07:56 -07:00
Jean-Luc Duprat
dde599f48f run_tests.py now picks the ISA via a -m flag based on the target selected, rather than always picking -msse4.2;
this is needed because -msse4.2 is not supported on KNC.
2012-07-10 16:39:18 -07:00
Jean-Luc Duprat
cc15ecfb3a Merge branch 'master' of https://github.com/ispc/ispc
Conflicts:
	cbackend.cpp
	examples/intrinsics/generic-16.h
	examples/intrinsics/generic-32.h
	examples/intrinsics/generic-64.h
	examples/intrinsics/knc.h
	examples/intrinsics/sse4.h
2012-07-10 16:36:08 -07:00
Jean-Luc Duprat
7a7c54bd59 Minor fixes to knc.h that resulted from integrating bea88ab122 2012-07-10 16:10:48 -07:00
Jean-Luc Duprat
bea88ab122 Integrated changes from mmp/and-fold-opt:
Add peephole optimization to eliminate some mask AND operations.

On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.

This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "and(equalfloat(a, b), c)" is changed to
"_equal_float_and_mask(a, b, c)", saving an instruction in the end.

Issue #319.

Merge commit '8ef6bc16364d4c08aa5972141748110160613087'

Conflicts:
	examples/intrinsics/knc.h
	examples/intrinsics/sse4.h
2012-07-10 10:33:24 -07:00
Matt Pharr
926b3b9ee3 Fix bugs with mask-handling for switch/do/for/while statements.
All of these pass the current mask to FunctionEmitContext::SetBlockEntryMask()
so that when a break/continue/return is encountered, it can test to see if all
lanes have followed that path and then return; this in turn ensures that we never
run statements with an all-off execution mask.

These functions were passing the function internal mask, not the full mask, and
thus could end up executing code with the mask all off if some lanes were
disabled by an outer function.  (The new tests test this case.)
2012-07-09 15:13:30 -07:00
Matt Pharr
bc7775aef2 Fix __ordered and _unordered floating point functions for C++ target.
Fixes include adding "_float" and "_double" suffixes as appropriate as well
as providing a number of missing implementations.

This fixes a number of failures in the half* tests.
2012-07-09 14:35:51 -07:00
Matt Pharr
107669686c Fix naming of some comparison ops in knc.h 2012-07-09 12:43:15 -07:00
Matt Pharr
bb11b3ab66 Fix build with LLVM 3.0 2012-07-09 10:45:36 -07:00
Jean-Luc Duprat
516ba85abd Merge pull request #322 from mmp/vector-constants
Vector constants
2012-07-09 09:28:26 -07:00
Jean-Luc Duprat
098277b4f0 Merge pull request #321 from mmp/setzero
More varied support for constant vectors from C++ backend.
2012-07-09 08:57:05 -07:00
Matt Pharr
950a989744 Add test that was supposed to go with 080241b7d1 2012-07-09 08:21:15 -07:00
Matt Pharr
fb8b893b10 Fix incorrect LLVM_3_1svn tests.
1. For some time now, we provide the version without the 'svn'
2. We should be testing "not LLVM 3.0" in these cases, since they
   apply to LLVM 3.2 and beyond as well...
2012-07-09 07:09:25 -07:00
Matt Pharr
9ca80debb8 Remove stale LLVM 2.9 support from builtins/util.m4 2012-07-09 06:54:29 -07:00
Matt Pharr
080241b7d1 Fix bugs with handling types of integer constants.
We now follow the rule that the type of an integer constant is
the first of int32, uint32, int64, uint64, that can hold the
value.  (Unless 'u' or 'l' suffixes have been provided.)

Fixes issue #299.
2012-07-08 08:43:03 -07:00
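A C++ sketch of the stated rule (illustrative, not the parser's code): pick the first of int32, uint32, int64, uint64 that can hold the value, absent a 'u' or 'l' suffix.

    #include <cstdint>
    #include <cstdio>

    // First of int32, uint32, int64, uint64 that holds v (no suffix given).
    const char *typeForIntegerConstant(uint64_t v) {
        if (v <= INT32_MAX)  return "int32";
        if (v <= UINT32_MAX) return "uint32";
        if (v <= INT64_MAX)  return "int64";
        return "uint64";
    }

    int main() {
        printf("%s\n", typeForIntegerConstant(1024));                  // int32
        printf("%s\n", typeForIntegerConstant(3000000000ull));         // uint32
        printf("%s\n", typeForIntegerConstant(10000000000ull));        // int64
        printf("%s\n", typeForIntegerConstant(0xFFFFFFFFFFFFFFFFull)); // uint64
        return 0;
    }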
Matt Pharr
0d534720bb Fix bug with constant folding of select expressions.
We would sometimes pass an int32_t * to the ConstExpr constructor
but claim the underlying type was uint32, which made it grumpy.
2012-07-08 08:36:51 -07:00
Matt Pharr
1dc4424a30 Only override module datalayout for generic targets.
Doing it for all targets was causing a number of tests to fail.
(Actual root cause not determined.)
2012-07-07 15:12:50 -07:00
Matt Pharr
57f0cf30c0 Fix small typos in documentation. 2012-07-07 11:19:57 -07:00
Matt Pharr
8ef6bc1636 Add peephole optimization to eliminate some mask AND operations.
On KNC, the various vector comparison instructions can optionally
be masked; if a mask is provided, the result is effectively that
the value returned is the AND of the mask with the result of the
comparison.

This change adds an optimization pass to the C++ backend that looks
for vector ANDs where one operand is a comparison and rewrites
them--e.g. "__and(__equal_float(a, b), c)" is changed to
"__equal_float_and_mask(a, b, c)", saving an instruction in the end.

Issue #319.
2012-07-07 08:35:38 -07:00
Matt Pharr
974b40c8af Add type suffix to comparison ops in C++ output.
e.g. "__equal()" -> "__equal_float()", etc.

No functional change; this is necessary groundwork for a forthcoming
peephole optimization that eliminates ANDs of masks in some cases.
2012-07-07 07:50:59 -07:00
Matt Pharr
45e9e0be0b Map comparison predicates to strings for C++ output in a stand-alone function. 2012-07-06 16:00:09 -07:00
Matt Pharr
ec0918045d Issue error if compiling for multiple targets and program is coming from stdin.
We currently don't support this, so at least now we issue an intelligible error
message in this case.

Issue #269.
2012-07-06 13:21:53 -07:00
Matt Pharr
38bcecd2f3 Print a useful error if llvm-config isn't found when building.
Previously, there was a ton of unintelligible error spew.

Issue #273.
2012-07-06 13:18:11 -07:00
Matt Pharr
aabbdba068 Switch a few remaining fprintf() calls to use Warning()/Error(). 2012-07-06 12:56:45 -07:00
Matt Pharr
84c183da1f Issue error if a non "generic" target is used with C++ emission.
Issue #314.
2012-07-06 12:56:24 -07:00
Matt Pharr
b363b98211 Improve handling of datalayout for generic targets.
Flag 32-bit vector types as only requiring 32-bit alignment (preemptive
bug fix for 32xi1 vectors).

Force module datalayouts to be the same before linking them to silence
an LLVM warning.

Finishes issue #309.
2012-07-06 12:51:17 -07:00
Matt Pharr
8defbeb248 Handle llvm.objectsize intrinsic in C++ backend.
Partially addresses issue #309.
2012-07-06 12:29:23 -07:00
Matt Pharr
f52d227d80 Remove extra newline in error message 2012-07-06 11:31:29 -07:00
Matt Pharr
78cb45fb25 Improve error message with ambiguous function overloads.
Issue #316.
2012-07-06 11:25:57 -07:00
Matt Pharr
2d8026625b Always check the execution mask after break/continue/return.
When "break", "continue", or "return" is used under varying control flow,
we now always check the execution mask to see if all of the program
instances are executing it.  (Previously, this was only done with "cbreak",
"ccontinue", and "creturn", which are now deprecated.)

An important effect of this change is that it fixes a family of cases
where we could end up running with an "all off" execution mask, which isn't
supposed to happen, as it leads to all sorts of invalid behavior.

This change does cause the volume rendering example to run 9% slower, but
doesn't affect the other examples.

Issue #257.
2012-07-06 11:09:11 -07:00
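A schematic C++ rendering of the check being made unconditional (illustrative pseudocode of the idea, not the actual codegen): after a varying break disables lanes, test whether any lane is still running before executing further statements.

    #include <cstdint>
    #include <cstdio>

    // movmsk-style reduction: one bit per active program instance.
    inline bool anyLaneActive(uint32_t mask) { return mask != 0; }

    int main() {
        uint32_t mask = 0xFu;                    // 4 lanes, all active
        for (int iter = 0; ; ++iter) {
            // ... loop statements executed under 'mask' ...
            uint32_t breakLanes = 1u << iter;    // pretend one lane breaks per trip
            mask &= ~breakLanes;                 // varying 'break' disables lanes
            if (!anyLaneActive(mask)) {          // the now-unconditional check
                printf("all lanes off after %d iterations\n", iter + 1);
                break;                           // never run with an all-off mask
            }
        }
        return 0;
    }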
Matt Pharr
73afab464f Provide mask at block entry for switch statements.
This fixes a crash if 'cbreak' was used in a 'switch'.  Renamed
FunctionEmitContext::SetLoopMask() to SetBlockEntryMask(), and
similarly the loopMask member variable.
2012-07-06 11:08:05 -07:00
Matt Pharr
8aa139b6be For C++ output, store constant vector values in local arrays.
When we have a constant vector of primitive types, we now generate
a definition of a static const array of the individual values.  This
in turn allows us to emit a simple aligned vector load to get the
constant vector value, rather than inefficiently inserting the values
into a vector.

Issue #318.
2012-07-06 08:57:09 -07:00
Matt Pharr
e5fe0eabdc Update __load() builtins to take const pointers. 2012-07-06 08:47:47 -07:00
Matt Pharr
0d3993fa25 More varied support for constant vectors from C++ backend.
If we have a vector of all zeros, a __setzero_* function call is emitted,
permitting calling specialized intrinsics for this.  Undefined values
are reflected with an __undef_* call, which similarly allows passing that
information along.

This change also includes a cleanup to the signature of the __smear_*
functions; since they already have different names depending on the
scalar value type, we don't need to use the trick of passing an
undefined value of the return vector type as the first parameter as
an indirect way to overload by return value.

Issue #317.
2012-07-05 20:19:11 -07:00
Jean-Luc Duprat
ac421f68e2 Ongoing support for int64 for KNC:
Fixes to __load and __store.
Added __add, __mul, __equal, __not_equal, __extract_elements, __smear_i64, __cast_sext, __cast_zext,
and __scatter_base_offsets32_float.

__rcp_varying_float now has a fast-math and full-precision implementation.
2012-07-05 17:05:42 -07:00
Jean-Luc Duprat
b9d1f0db18 Ongoing support for int64 for KNC:
Fixes to __load and __store.
Added __add, __mul, __equal, __not_equal, __extract_elements, __smear_i64, __cast_sext, __cast_zext,
and __scatter_base_offsets32_float.

__rcp_varying_float now has a fast-math and full-precision implementation.
2012-07-05 16:56:13 -07:00
Matt Pharr
6aad4c7a39 Bump version number to 1.3.1dev 2012-07-05 13:35:34 -07:00
Matt Pharr
4186ef204d Fix build with LLVM top of tree. 2012-07-05 13:35:01 -07:00
Matt Pharr
ae7a094ee0 Merge pull request #315 from NicolasT/master
Fix build on Fedora 17
2012-07-04 08:21:03 -07:00
Nicolas Trangez
3a007f939a Build: Include unistd.h where required
Some modules require an include of unistd.h (e.g. for getcwd and isatty
definitions).

These changes were required to build successfully on a Fedora 17 system,
using GCC 4.7.0 & glibc-headers 2.15.
2012-07-04 14:49:00 +02:00
Matt Pharr
b8503b9255 News and doxygen version number bump for 1.3.0 2012-06-29 08:38:38 -07:00
Matt Pharr
b7bc76d3cc Documentation updates for 1.3.0. 2012-06-29 08:35:29 -07:00
Matt Pharr
27d6c12972 Bump ISPC_MINOR_VERSION to 3 2012-06-28 16:15:46 -07:00
Matt Pharr
b69d783e09 Bump version to 1.3.0 2012-06-28 15:35:52 -07:00
Matt Pharr
3b2ff6301c Use fputs() rather than puts() for printing final result from print().
puts() sillily adds an undesired newline.
2012-06-28 12:29:40 -07:00
Matt Pharr
6c7043916e Silence bogus compiler warning 2012-06-28 12:11:56 -07:00
Matt Pharr
96a6e75b71 Fix issues with LLVM 3.0 and 3.1 build in cbackend.cpp
Should fix issue #312.
2012-06-28 12:11:27 -07:00
Matt Pharr
a91e4e7981 Fix missing ;s from 66d4c2ddd9 2012-06-28 12:04:58 -07:00
Jean-Luc Duprat
95d8f76ec3 Added preliminary support for Intel's Xeon Phi KNC processor.
float, int32, and double support is included; int8, int16, and int64 are
not supported yet.

This is work in progress and not considered stable yet.
2012-06-28 12:00:55 -07:00
Jean-Luc Duprat
66d4c2ddd9 When the --emit-c++ option is used, the state of the --opt=fast-math option is passed into the generated C++ code.
If --opt=fast-math is used then the generated code contains:
   #define ISPC_FAST_MATH 1
Otherwise it contains:
   #undef ISPC_FAST_MATH

This allows the generic headers to support the user's request.
2012-06-28 11:17:11 -07:00
Jean-Luc Duprat
8115ca739a Added preliminary support for Intel's Xeon Phi KNC processor.
float, int32, and double support is included; int8, int16, and int64 are not supported yet.
This is work in progress and not considered stable yet.
2012-06-28 10:54:09 -07:00
Jean-Luc Duprat
ec4021bbf4 When the --emit-c++ option is used, the state of the --opt=fast-math option is passed into the generated C++ code.
If --opt=fast-math is used then the generated code contains:
   #define ISPC_FAST_MATH 1
Otherwise it contains:
   #undef ISPC_FAST_MATH

This allows the generic headers to support the user's request.
2012-06-28 10:42:29 -07:00
Jean-Luc Duprat
e431b07e04 Changed the C API to use templates to indicate memory alignment to the C compiler
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)

Updated generic-32.h and generic-64.h to the new memory API
2012-06-28 09:29:15 -07:00
Matt Pharr
d34a87404d Provide (undocumented for now) __pause() call to emit PAUSE inst. 2012-06-28 09:28:25 -07:00
Matt Pharr
f38770bf2a Fix build with LLVM ToT 2012-06-28 07:36:10 -07:00
Jean-Luc Duprat
dc9998ccaf Missed a few minor fixes to generic-64.h in previous commit 2012-06-27 17:14:03 -07:00
Jean-Luc Duprat
f1b3703389 Changed the C API to use templates to indicate memory alignment to the C compiler
This should help with performance of the generated code.
Updated the relevant header files (sse4.h, generic-16.h, generic-32.h, generic-64.h)

Updated generic-32.h and generic-64.h to the new memory API
2012-06-27 16:59:26 -07:00
Jean-Luc Duprat
b6a8d0ee7f Merge branch 'master' of git://github.com/ispc/ispc 2012-06-27 10:15:24 -07:00
Jean-Luc Duprat
2a4dff38d0 cbackend.cpp now makes explicit use of the llvm namespace
(Rather than implicitly with a using declaration.)  This will
allow for some further changes to ISPC's C backend, without collision
with ISPC's namespace. This change aims to have no effect on the code
generated by the compiler; it should be a big no-op, except for its
side effects on maintainability.
2012-06-27 08:30:30 -07:00
Jean-Luc Duprat
665c564dcf cbackend.cpp now makes explicit use of the llvm namespace, rather than implicitly with a using declaration.
This will allow for some further changes to ISPC's C backend, without collision with ISPC's namespace.
This change aims to have no effect on the code generated by the compiler; it should be a big no-op, except
for its side effects on maintainability.
2012-06-26 22:15:31 -07:00
Jean-Luc Duprat
ed71413e04 Merge branch 'master' of git://github.com/ispc/ispc 2012-06-26 14:32:27 -07:00
Jean-Luc Duprat
4b5e49b00b Merge branch 'master' of github.com:jduprat/ispc 2012-06-26 14:32:01 -07:00
Matt Pharr
f558ee788e Fix bug with generating implicit zero initializer values.
Issue #300.
2012-06-26 11:58:16 -07:00
Matt Pharr
ceb8ca680c Fix crash in codegen for assert() with malformed program.
Issue #302.
2012-06-26 11:54:55 -07:00
Matt Pharr
79ebcbec4b Fix crash in SwitchStmt::TypeCheck() with malformed programs. 2012-06-26 11:21:33 -07:00
Matt Pharr
2c7b650240 Add FAQ to explain how to launch per-instance tasks with foreach_active and unmasked.
Issue #227.
2012-06-22 14:32:05 -07:00
Matt Pharr
54459255d4 Add unmasked { } statement.
This reestablishes an "all on" execution mask for the gang, which can
be useful for nested parallelism.
2012-06-22 14:30:58 -07:00
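A minimal sketch of the new statement (the function and names are illustrative):

    void mark(uniform float data[], varying int &touched) {
        varying float v = data[programIndex];
        touched = 0;
        if (v < 0) {
            unmasked {
                // runs with the execution mask reestablished to "all
                // on" for the whole gang, regardless of which program
                // instances took the branch above
                touched = 1;
            }
        }
    }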
Matt Pharr
b4a078e2f6 Add foreach_active iteration statement.
Issue #298.
2012-06-22 10:35:43 -07:00
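A usage sketch (hypothetical function; assumes the stdlib's extract() and print()):

    void show(varying int value) {
        foreach_active (i) {
            // the body runs serially, once per executing program
            // instance; 'i' is the uniform index of that instance
            uniform int v = extract(value, i);
            print("value for instance %: %\n", i, v);
        }
    }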
Matt Pharr
ed13dd066b Distinguish between 'regular' foreach and foreach_unique in FunctionEmitContext
We need to do this since it's illegal to have nested foreach statements, but
nested foreach_unique, or foreach_unique inside foreach, etc., are all fine.
2012-06-22 06:04:00 -07:00
Matt Pharr
2b4a3b22bf Issue an error if the user has nested foreach statements.
Partially addresses issue #280.  (We should support them properly,
but at least now we don't silently generate incorrect code.)
2012-06-21 16:53:27 -07:00
Matt Pharr
8b891da628 Allow referring to the struct type being defined in its members.
It's now legal to write:

struct Foo { Foo *next; };

previously, a predeclaration "struct Foo;" was required.  This fixes
issue #287.

This change also fixes a bug where multiple forward declarations 
"struct Foo; struct Foo;" would incorrectly issue an error on the
second one.
2012-06-21 16:44:04 -07:00
Matt Pharr
5a2c8342eb Allow structs with no members.
Issue #289.
2012-06-21 16:07:31 -07:00
Matt Pharr
50eb4bf53a Change print() implementation to accumulate string locally before printing.
The string to be printed is accumulated into a local buffer before being sent to
puts().  This ensures that if multiple threads are running and printing at the
same time, their output won't be interleaved within individual print statements
(it still may be interleaved across different print statements, just like in C).

Issue #293.
2012-06-21 14:41:53 -07:00
Matt Pharr
3c10ddd46a Fix declaration of size_t.
It should be an unsigned integer type.
2012-06-21 14:40:24 -07:00
Matt Pharr
0b7f9acc70 Align <16 x i1> vectors to just 16 bits for generic targets.
Partially addresses issue #259.
2012-06-21 10:25:33 -07:00
Matt Pharr
10fbaec247 Fix C++ output for unordered fp compares.
Fixes a bug introduced in 46716aada3.
2012-06-21 09:57:19 -07:00
Matt Pharr
007a734595 Add support for 'unmasked' function qualifier. 2012-06-20 15:36:00 -07:00
Matt Pharr
46716aada3 Switch to unordered floating point compares.
In particular, this gives us desired behavior for NaNs (all compares
involving a NaN evaluate to true).  This in turn allows writing the
canonical isnan() function as "v != v".

Added isnan() to the standard library as well.
2012-06-20 13:25:53 -07:00
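The NaN-detection idiom this enables, sketched (the stdlib's isnan() is the supported spelling; the name here is hypothetical):

    bool probablyNaN(float v) {
        // with unordered compares, any comparison involving a NaN
        // evaluates to true, so this is true exactly when v is NaN
        return v != v;
    }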
Matt Pharr
3bc66136b2 Add foreach_unique iteration construct.
Idea via Ingo Wald / IVL compiler.
2012-06-20 10:04:24 -07:00
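A usage sketch (hypothetical names):

    void perBucket(varying int bucket) {
        foreach_unique (b in bucket) {
            // the body runs once per distinct value in 'bucket';
            // within it, 'b' is uniform and the execution mask covers
            // exactly the program instances holding that value
            print("processing bucket %\n", b);
        }
    }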
Matt Pharr
fae47e0dfc Update stdlib to not use "in" as a variable name.
Preparation for foreach_unique, which uses that as a keyword.
2012-06-20 10:04:24 -07:00
Matt Pharr
bd52e86486 Issue error on attempt to dereference void pointer types.
Issue #288.
2012-06-18 19:51:19 -07:00
Matt Pharr
b2f6ed7209 Fix usage of CastType 2012-06-18 16:26:31 -07:00
Matt Pharr
4b334fd2e2 Fix linkage for programIndex et al. when not debugging.
We now use InternalLinkage for the 'programIndex' symbol (and similar)
if we're not compiling with debugging symbols.  This prevents those
symbol names/definitions from polluting the global namespace for
the common case.

Basically addresses Issue #274.
2012-06-15 11:50:16 -07:00
Matt Pharr
a23a7006e3 Don't issue error incorrectly with forward decl. of exported function.
Issue #281.
2012-06-15 10:54:50 -07:00
Matt Pharr
f47171a17c Don't check for "all off" mask at function entry.
We should never be running with an all off mask and thus should never
enter a function with an all off mask.  No performance change from
removing this, however.

Issue #282.
2012-06-15 10:14:53 -07:00
Matt Pharr
4945dc3682 Add contributors link to docs HTML templates 2012-06-13 06:11:08 -07:00
Matt Pharr
ada66b5313 Make more attempts to pull out constant offsets for gather/scatter.
The "base+offsets" variants of gather decompose the integer offsets into
compile-time constant and compile-time unknown elements.  (The coalescing
optimization, then, depends on this decomposition being done well--having
as much as possible in the constant component.)  We now make multiple
efforts to improve this decomposition as we run optimization passes; in
some cases we're able to move more over to the constant side than was
first possible.

This in particular fixes issue #276, a case where coalescing was expected
but didn't actually happen.
2012-06-12 16:21:14 -07:00
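As a hypothetical ISPC fragment, the offsets in an access like the one below decompose into a compile-time-constant part (the +2) and an unknown part (the varying i); moving as much as possible into the constant component is what enables coalescing:

    uniform float buf[1024];

    varying float get(varying int i) {
        // the '+ 2' lands in the constant component of the decomposed
        // offsets; the varying 'i' is the unknown component
        return buf[i + 2];
    }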
Matt Pharr
96450e17a3 Do all memory op improvements in a single optimization pass.
Rather than having separate passes to do conversion, when possible, of:

- General gather/scatter of a vector of pointers to g/s of
  a base pointer and integer offsets
- Gather/scatter to masked load/store, load+broadcast
- Masked load/store to regular load/store

Now all are done in a single ImproveMemoryOps pass.  This change was in
particular to address some phase ordering issues that showed up with
multidimensional array access wherein after determining that an outer
dimension had the same index value, we previously weren't able to take
advantage of the uniformity of the resulting pointer.
2012-06-12 13:56:17 -07:00
Matt Pharr
40a295e951 Fix bug where "avx-x2" target would cause AVX1.1 to be used. 2012-06-12 13:37:38 -07:00
Matt Pharr
d6c6f95373 Do all replacements of __pseudo* memory ops in a single optimization pass.
Collected the old PseudoGSToGSPass and PseudoMaskedStorePass into a single
pass, ReplacePseudoMemoryOpsPass, which handles both of their tasks.
2012-06-12 13:10:03 -07:00
Matt Pharr
19b46be20d Remove load_and_broadcast from built-ins.
Now that we never ever run with the mask all off, we no longer need
to keep that logic in a built-in function just so the mask can be checked.  In
the one place where it was used (turning gathers to the same location
into a load and broadcast), we now just emit the code for that
directly.
2012-06-12 12:30:57 -07:00
Ingo Wald
789e04ce90 Add support for host/device stub functions for offload. 2012-06-12 10:23:49 -07:00
Matt Pharr
dd4f0a600b Update AVX1.1 targets to not include declarations of half/float routines in bit code. 2012-06-08 15:57:36 -07:00
Matt Pharr
6c7df4cb6b Add initial support for "avx1.1" targets for Ivy Bridge.
So far, only the use of the float/half conversion instructions distinguishes
this from the "avx1" target.

Partial work on issue #263.
2012-06-08 15:55:00 -07:00
Matt Pharr
79e0a9f32a Fix codegen bug with foreach_tiled.
When the outermost dimension(s) were partially active, but the innermost
dimension was all on, we'd inadvertently use an incorrect "all on"
execution mask.

Fixes issues #177 and #200.
2012-06-08 14:56:18 -07:00
Matt Pharr
6c9bc63a1c Improve SourcePos reporting of the origin of the gather for gather warnings. 2012-06-08 13:33:11 -07:00
Matt Pharr
28a821df7d Improve wording of gather/scatter performance warnings. 2012-06-08 13:32:57 -07:00
Matt Pharr
27e39954d6 Fix a number of issues in examples/intrinsics/sse4.h.
This had gotten fairly out of date, after recent changes to C++ output.
Roughly 15 tests still fail with this target.

Issue #278.
2012-06-08 12:52:36 -07:00
Matt Pharr
e730a5364b Issue error if any complex assignment operator is used with a struct type.
Issue #275.
2012-06-08 11:29:02 -07:00
Matt Pharr
92b3ae41dd Don't print request to file bug on fatal error twice. 2012-06-08 11:23:45 -07:00
Matt Pharr
89a2566e01 Add separate variants of memory built-ins for floats and doubles.
Previously, we'd bitcast e.g. a vector of floats to a vector of i32s and then
use the i32 variant of masked_load/masked_store/gather/scatter.  Now, we have
separate float/double variants of each of those.
2012-06-07 14:47:16 -07:00
Matt Pharr
1ac3e03171 Gather/scatter function improvements in builtins.
More naming consistency: _i32 rather than i32, now.

Also improved the m4 macros to generate these sequences to not require as
many parameters.
2012-06-07 14:19:23 -07:00
Matt Pharr
b86d40091a Improve naming of masked load/store instructions in builtins.
Now, use _i32 suffixes, rather than _32, etc.  Also cleaned up the m4
macro to generate these functions, using WIDTH to get the target width,
etc.
2012-06-07 13:58:31 -07:00
Matt Pharr
91d22d150f Update load_and_broadcast built-in
Change function suffix to "_i32", etc, from "_32"

Improve load_and_broadcast macro in util.m4 to grab vector width from 
WIDTH variable rather than taking it as a parameter.
2012-06-07 13:33:17 -07:00
Matt Pharr
1d29991268 Indentation fixes in builtins/ 2012-06-07 13:23:07 -07:00
Matt Pharr
6f0a2686dc Use %a format for printf() for float constants on non-Windows platforms. 2012-06-07 13:20:03 -07:00
Matt Pharr
f06caabb07 Generate better code for break statements in varying loops (sometimes).
If we have a simple varying 'if' statement where the only code in the body is
a single 'break', then emit special case code that just updates the execution
mask directly.

Surprisingly, this leads to better generated code (e.g. Mandelbrot 7.1x on AVX
vs 5.8x before).  It's not clear why the general code generation path for
break doesn't generate the equivalent code; this topic should be investigated
further.  (Issue #277).
2012-06-06 11:08:42 -07:00
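The pattern that triggers the special case, sketched with Mandelbrot-style hypothetical names:

    varying float escape(varying float c) {
        varying float z = 0;
        for (uniform int iter = 0; iter < 256; ++iter) {
            z = z * z + c;
            // a varying 'if' whose body is a single 'break' now just
            // updates the execution mask directly
            if (abs(z) > 4.0)
                break;
        }
        return z;
    }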
Matt Pharr
3c869802fb Always store multiply-used vector compares in temporary variables (C++ output). 2012-06-06 11:08:42 -07:00
Matt Pharr
7b6bd90903 Remove various equality checks between GetInternalMask() and LLVMMaskAllOn
These were never kicking in, since GetInternalMask() always loads from the
mask storage memory.
2012-06-06 11:08:42 -07:00
Matt Pharr
967bfa9c92 Silence compiler warning. 2012-06-06 08:08:55 -07:00
Matt Pharr
592affb984 Add experimental (and undocumented for now) export syntax.
This allows adding types to the list that are included in the automatically-generated
header files.

struct Foo { . . . };
struct Bar { . . . };

export { Foo, Bar };
2012-06-05 12:51:21 -07:00
Matt Pharr
96aaf6d53b Fix build with LLVM top of tree. 2012-06-05 12:28:05 -07:00
Matt Pharr
1397dbdabc Don't generate colorized output escapes when stderr isn't a TTY.
When piping to a file, more/less, etc., this is generally undesirable.

This behavior can be overridden with the --colorized-output command-line
flag.
2012-06-04 09:20:57 -07:00
Matt Pharr
6118643232 Handle more error cases if the user tries to declare a method. 2012-06-04 09:07:13 -07:00
Matt Pharr
71198a0b54 Don't indent too much in errors/warnings if the filename is long. 2012-06-04 08:53:43 -07:00
Matt Pharr
22cb80399f Issue error if user tries to declare a method. 2012-06-04 08:50:13 -07:00
Jean-Luc Duprat
fa1fd8a576 Merged Upstream 2012-06-01 11:13:16 -07:00
Matt Pharr
6df7d31a5b Fix incorrect assertion.
Issue #272.
2012-05-30 16:34:59 -07:00
Matt Pharr
ef049e92ef Handle undefined struct types when generating headers. 2012-05-30 16:28:21 -07:00
Matt Pharr
fe8b109ca5 Fix more tests for 32 and 64-wide execution. 2012-05-30 13:06:07 -07:00
Matt Pharr
8fd9b84a80 Update seed_rng() in stdlib to take a varying seed.
Previously, we were trying to take a uniform seed and then shuffle that
around to initialize the state for each of the program instances.  This
was becoming increasingly untenable and brittle.

Now a varying seed is expected and used.
2012-05-30 10:35:41 -07:00
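A usage sketch consistent with the new interface (assuming the stdlib's RNGState, seed_rng(), and frandom()):

    varying float noise() {
        RNGState state;
        // a varying seed, so each program instance gets its own
        // distinct RNG state
        seed_rng(&state, 0x12345u + programIndex);
        return frandom(&state);
    }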
Matt Pharr
5cb53f52c3 Fix various tests/[frs]* files to be correct with 32 and 64-wide targets.
Still todo: tests/c*, tests/test-*
2012-05-30 10:31:12 -07:00
Matt Pharr
d86653668e Fix a number of tests to work correctly with 32/64-wide targets.
Still to be reviewed/fixed: tests/test-*, tests/[cfrs]*
2012-05-29 10:16:43 -07:00
Matt Pharr
5084712a15 Fix bugs in examples/intrinsics/generic-64.h
There were a number of situations where we were left-shifting 1 by a
lane index that were failing due to shifting beyond 32-bits.  Fixed
by shifting the 64-bit constant value 1ull.
2012-05-29 08:31:10 -07:00
Jean-Luc Duprat
ece65cab18 Fix some tests for up to 64-wide gangs 2012-05-29 07:52:50 -07:00
Matt Pharr
1f6075506c Fix linux build (Jean-Luc Duprat) 2012-05-28 19:45:16 -07:00
Matt Pharr
51ade48e3d Fix some of the reduce-* tests for 32 and 64-wide targets 2012-05-25 14:47:06 -07:00
Matt Pharr
21c43737fe Fix bug in examples/intrinsics/generic-32.h 2012-05-25 14:27:30 -07:00
Matt Pharr
6c7bcf00e7 Add examples/intrinsics/generic-64.h. 2012-05-25 14:27:19 -07:00
Matt Pharr
7a2142075c Add examples/intrinsics/generic-32.h implementation.
Roughly 100 tests fail with this; all the tests need to be audited
for assumptions that 16 is the widest width possible…
2012-05-25 12:37:59 -07:00
Matt Pharr
e8e9baa417 Update test_static.cpp to handle up to 64-wide 2012-05-25 12:14:58 -07:00
Matt Pharr
449d956966 Add support for generic-64 target. 2012-05-25 11:57:28 -07:00
Matt Pharr
90db01d038 Represent MOVMSK'ed masks with int64s rather than int32s.
This allows us to scale up to 64-wide execution.
2012-05-25 11:57:23 -07:00
Matt Pharr
38cea6dc71 Issue error if "typedef" is inadvertently included in function definition.
Issue #267.
2012-05-25 11:09:26 -07:00
Matt Pharr
64807dfb3b Add AssertPos() macro that provides rough source location in error
It can sometimes be useful to know the general place we were in the program
when an assertion hit; when the position is available / applicable, this
macro is now used.

Issue #268.
2012-05-25 10:59:45 -07:00
Matt Pharr
d943455e10 Issue error on overloaded "export"ed functions.
Issue #270.
2012-05-25 10:35:34 -07:00
Matt Pharr
fd03ba7586 Export reference parameters as C++ references, not pointers. 2012-05-24 07:12:48 -07:00
Matt Pharr
2c5a57e386 Fix bugs related to varying pointers to functions that return void. 2012-05-23 14:29:17 -07:00
Matt Pharr
e8858150cb Allow redundant semicolons at global scope. (Ingo Wald) 2012-05-23 14:20:20 -07:00
Matt Pharr
333f901187 Fix build with LLVM 3.2 dev top-of-tree 2012-05-23 14:19:50 -07:00
Matt Pharr
7dd4d6c75e Update for LLVM 3.2dev API change 2012-05-22 15:53:14 -07:00
Matt Pharr
99f57cfda6 Issue more sensible error message for varying pointers in exported functions. 2012-05-18 12:00:11 -07:00
Matt Pharr
4d1eb94dfd Fix bug in AddElementOffset() error checking. 2012-05-18 11:57:05 -07:00
Matt Pharr
22d584f302 Don't issue perf. warnings for various conversions with generic target. 2012-05-18 11:56:11 -07:00
Matt Pharr
72c41f104e Fix various malformed program crashes. 2012-05-18 10:44:45 -07:00
Matt Pharr
8d3ac3ac1e Fix build with LLVM ToT 2012-05-18 10:09:09 -07:00
Matt Pharr
299ae186f1 Expect support for half and transcendentals from all generic targets 2012-05-18 06:13:45 -07:00
Matt Pharr
f4df2fb176 Improvements to mask update code for generic targets.
Rather than XOR'ing with a temporary 'all-on' vector, we call
__not.  Also, we call out to __and_not1 and __and_not2, for an
AND where the first or second operand, respectively, has had
NOT applied to it.
2012-05-16 13:52:51 -07:00
Matt Pharr
625fbef613 Fix Windows build 2012-05-15 12:19:10 -07:00
Matt Pharr
fbed0ac56b Remove allOffMaskIsSafe from Target
The intent of this was to indicate whether it was safe to run code
with an 'all off' mask on the given target (and then sometimes be
more flexible about e.g. running both true and false blocks of if
statements, etc.)

The problem is that even if the architecture has full native mask support,
it's still not safe to run 'uniform' memory operations with the mask all
off.  Even more tricky, we sometimes transform masked varying memory operations
to uniform ones during optimization (e.g. gather->load and broadcast).

This fixes a number of the tests/switch-* tests that were failing on the
generic targets due to this issue.
2012-05-09 14:18:47 -07:00
Matt Pharr
dc120f3962 Fix regression in masked_store_blend for generic target.
In ee1fe3aa9f, the LLVM_VERSION define was updated to never
have the 'svn' suffix and the build was updated to handle LLVM
3.2.  This file had a check for LLVM_3_1svn that was no longer
hitting.

This fixes some issues with unnecessary loads and stores
in generated C++ code for the generic targets.
2012-05-09 14:18:47 -07:00
Matt Pharr
4f053e5b83 Pass OPT flags when linking 2012-05-08 13:25:09 -07:00
Matt Pharr
c6241581a0 Add an extra parameter to __smear functions to encode return type.
Now, the __smear* functions in generated C++ code have an unused first
parameter of the desired return type; this allows us to have headers
that include variants of __smear for multiple target widths.  (This
approach is necessary since we can't overload by return type in C++.)

Issue #256.
2012-05-08 09:54:23 -07:00
Nipunn Koorapati
041ade66d5 Placated compiler by initializing variable 2012-05-06 06:59:17 -07:00
Nipunn Koorapati
067a2949ba Added syntax highlighting for 'uniform' and 'varying' types. 2012-05-06 06:58:53 -07:00
Matt Pharr
55c754750e Remove a number of redundant/unneeded optimization passes.
Performance and code quality of the performance suite are unchanged;
compilation times are improved by another 20% or so for simple
programs (e.g. rt.ispc).  One very complex program compiles
about 2.4x faster now.
2012-05-05 15:47:24 -07:00
Matt Pharr
72b6c12856 Notify LLVM pass mgr that the MakeInternalFuncsStaticPass doesn't change the CFG. 2012-05-05 15:47:24 -07:00
Matt Pharr
15ea0af687 Add -f option to run_tests.py
This allows providing additional command-line arguments to ispc,
e.g. to force compilation with -O1, -g, etc.
2012-05-05 15:47:24 -07:00
Matt Pharr
ee7e367981 Do global dead code elimination early in optimization.
This gives a 15-20% speedup in compilation time for simple
programs (but only ~2% for the big 21k monster program).
2012-05-05 15:47:19 -07:00
Matt Pharr
8006589828 Use llvm::SmallVectors for struct member types and function types.
Further reduction of dynamic memory allocation...
2012-05-04 13:55:38 -07:00
Matt Pharr
413264eaae Make return values const &s to save copying. 2012-05-04 13:55:38 -07:00
Matt Pharr
7db8824da2 Reduce dynamic memory allocation in getting unif/varying variants of AtomicTypes 2012-05-04 13:55:38 -07:00
Matt Pharr
e1bc010bd1 More reduction of dynamic allocations in lDoTypeConv() 2012-05-04 13:55:38 -07:00
Matt Pharr
bff02017da Cache const/non-const variants of Atomic and ReferenceTypes.
More reduction of dynamic memory allocation.
2012-05-04 13:55:38 -07:00
Matt Pharr
c0019bd8e5 Cache type and lvalue type in IndexExpr and MemberExpr
This saves a bunch of redundant work and unnecessary duplicated
memory allocations.
2012-05-04 13:55:38 -07:00
Matt Pharr
e495ef2c48 Reduce dynamic memory allocation by reusing scope maps in symbol table. 2012-05-04 13:55:38 -07:00
Matt Pharr
78d62705cc Cache element types in StructType.
Previously, GetElementType() would end up causing dynamic allocation to
happen to compute the final element type (turning types with unbound
variability into the same type with the struct's variability) each time it was
called, which was wasteful and slow.  Now we cache the result.

Another 20% perf on compiling that problematic program.
2012-05-04 13:55:38 -07:00
Matt Pharr
2791bd0015 Improve performance of lCheckTypeEquality()
We don't need to explicitly create the non-const Types to do type
comparison when ignoring const-ness in the check.

We can also save some unnecessary dynamic memory allocation by
keeping strings returned from GetStructName() as references to strings.

This gives another 10% on front-end perf on that big program.
2012-05-04 13:55:38 -07:00
Matt Pharr
7cf66eb61f Small optimizations to various AtomicType methods. 2012-05-04 13:55:38 -07:00
Matt Pharr
944c53bff1 Stop using dynamic_cast for Types.
We now have a set of template functions CastType<AtomicType>, etc., that in
turn use a new typeId field in each Type instance, allowing them to be inlined
and to be quite efficient.

This improves front-end performance for a particular large program by 28%.
2012-05-04 13:55:38 -07:00
Matt Pharr
c756c855ea Compile with -O2 by default on Linux/OSX. 2012-05-04 13:55:37 -07:00
Matt Pharr
58bb2826b2 Perf: cache connection between const/non-const struct variants.
In one very large program, we were spending quite a bit of time repeatedly
getting const variants of StructTypes.  This speeds up the front-end by
about 40% for that test case.

(This is something of a band-aid, pending uniquing types.)
2012-05-04 13:55:37 -07:00
Nipunn Koorapati
b7bef87a4d Added README for vim syntax highlighting. 2012-05-03 14:23:33 -07:00
Matt Pharr
0c1b206185 Pass log/exp/pow transcendentals through to targets that support them.
Currently, this is the generic targets.
2012-05-03 13:49:56 -07:00
Matt Pharr
7d7e99a92c Update ISPC_MINOR_VERSION to 2
(This should have been done with the 1.2.0 release!)
2012-05-03 12:04:24 -07:00
Matt Pharr
1ba8d7ef74 Fix test that had undefined behavior. 2012-05-03 11:11:21 -07:00
Matt Pharr
d99bd279e8 Add generic-32 target. 2012-05-03 11:11:06 -07:00
Matt Pharr
ee1fe3aa9f Update build to handle existence of LLVM 3.2 dev branch.
We now compile with LLVM 3.0, 3.1, and 3.2svn.
2012-05-03 08:25:25 -07:00
Matt Pharr
c4b1d79c5c When a function is defined, set its symbol's position to the code position.
Before, if the function was declared before being defined, then the symbol's
SourcePos would be left set to the position of the declaration.  This ended
up getting the debugging symbols mixed up in this case, which was undesirable.
2012-04-28 20:28:39 -07:00
Matt Pharr
a1a43cdfe0 Fix bug so that programIndex (et al.) are available in the debugger.
It's now possible to successfully print out the value of programIndex,
programCount, etc., in the debugger.  The issue was that they were
defined as having InternalLinkage, which meant that DCE removed them
at the end of compilation.  Now they're declared to have WeakODRLinkage,
which ensures that one copy survives (but there aren't multiply-defined
symbols when compiling multiple files.)
2012-04-28 17:12:57 -07:00
Matt Pharr
27b62781cc Fix bug in lStripUnusedDebugInfo().
This was causing an assert to hit in llvm's DwarfDebug.cpp.
2012-04-28 13:06:29 -10:00
Matt Pharr
0c5d7ff8f2 Add rygorous's float->srgb8 conversion routine to the stdlib.
Issue #230
2012-04-27 10:03:19 -10:00
Matt Pharr
0e2b315ded Add FAQ about foreach code generation.
(i.e. "why's there that extra stuff at the end and what can I do
about it if it's not necessary?")

Issue #231.
2012-04-27 09:35:37 -10:00
Matt Pharr
3e74d1c544 Fix documentation bug with typedef. 2012-04-25 17:15:20 -10:00
Matt Pharr
da690acce5 Fix build with LLVM 3.0 2012-04-25 14:27:33 -10:00
Matt Pharr
0baa2b484d Fix multiple bugs related to DIBuilder::createFunction() call.
The DIType passed to this method should correspond to the
FunctionType of the function, not its return type.

The first parameter should be the DIScope for the compile unit,
not the DIFile.

We previously had the unmangled function name and the mangled
function name interchanged.

The argument corresponding to "first line number of the function" was
missing, which in turn led to subsequent arguments being off, and thus
providing bogus values vs. what was supposed to be passed.

Rename FunctionEmitContext::diFunction to diSubprogram, to better
reflect its type.
2012-04-25 08:43:11 -10:00
Matt Pharr
260d7298c3 Strip unused debugging metadata after done with compilation.
Debugging information for functions that are inlined or static and
not used still hangs around after compilation; now we go through the
debugging info and remove the entries for any DISubprograms that
don't have their original functions left in the Module after
optimization.
2012-04-25 08:43:11 -10:00
Matt Pharr
d5cc2ad643 Call Verify() methods of various debugging llvm::DI* types after creation. 2012-04-25 08:43:11 -10:00
Matt Pharr
12706cd37f Debugging optimization pass updates
Don't run mem2reg with -O0 anymore, but do run the intrinsics opt pass, which
allows some CFG simplification due to the mask being all on, etc.
2012-04-25 08:43:11 -10:00
Matt Pharr
7167442d6e Debugging info: include parameter number for function params. 2012-04-25 08:43:11 -10:00
Matt Pharr
8547101c4b Debugging info: produce more descriptive producer string 2012-04-25 08:43:11 -10:00
Matt Pharr
5d58a9e4c2 Merge pull request #250 from jfpoole/master
Fix 32-bit samples on Mac OS X.
2012-04-23 17:12:46 -07:00
John Poole
cd98a29a4b Fix 32-bit samples on Mac OS X.
On Mac OS X and Linux rdtsc() didn't save and restore 32-bit registers.

This patch fixes issue #87.
2012-04-23 16:00:07 -07:00
Matt Pharr
903714fd40 Merge pull request #248 from nipunn1313/master
Goto with incorrect label now suggests labels based on string distance
2012-04-21 14:43:57 -07:00
Nipunn Koorapati
138c7acf22 Error() and Warning() functions for reporting compiler errors/warnings now respect newlines as part of valid error messages. 2012-04-21 01:44:10 -04:00
Matt Pharr
03b2b8ae8f Bump version number to 1.2.3dev 2012-04-20 14:31:46 -07:00
Matt Pharr
016b502d46 Update release notes for 1.2.2, bump version number in doxygen 2012-04-20 14:26:00 -07:00
Matt Pharr
c5f6653564 Bump version number to 1.2.2 2012-04-20 11:54:12 -07:00
Matt Pharr
cf9a4e209e Fix malformed program crash. 2012-04-20 11:53:43 -07:00
Nipunn Koorapati
040421942f Goto statements with a bad label produce an error message.
Now it also produces a short list of suggestions based on string distance.
2012-04-20 14:42:14 -04:00
Matt Pharr
4dfc596d38 Fix MSVC warnings. 2012-04-20 10:50:39 -07:00
Matt Pharr
fe83ef7635 Merge pull request #247 from nipunn1313/master
Fixed compiler warning
2012-04-20 09:26:57 -07:00
Nipunn Koorapati
db8b08131f Fixed compile error which shows up on LLVM 3.0 2012-04-20 12:17:09 -04:00
Matt Pharr
32815e628d Improve naming of llvm Instructions created.
We now try harder to keep the names of instructions related to the
initial names of variables they're derived from and so forth.  This
is useful for making both LLVM IR as well as generated C++ code
easier to correlate back to the original ispc source code.

Issue #244.
2012-04-19 16:36:46 -07:00
Matt Pharr
71bdc67a45 Add LLVMGetName() utility routines.
Infrastructure for issue #244.
2012-04-19 16:24:40 -07:00
Matt Pharr
cb9f50ef63 C++ backend: mangle variable names less.
This makes the generated code a little easier to connect with the
original program.
2012-04-19 13:11:47 -07:00
Matt Pharr
12c754c92b Improved handling of splatted constant vectors in C++ backend.
Now, when we're printing out a constant vector value, we check to see
if it's a splat and call out to one of the __splat_* functions in
the generated code if so.
2012-04-19 13:11:15 -07:00
Matt Pharr
e4b3d03da5 When available, use ANSI escapes to colorize diagnostic output.
Issue #245.
2012-04-19 11:36:28 -07:00
Matt Pharr
cc26b66e99 Improve source position reporting for scatters.
Now, we only highlight the memory write--not both sides of the
assignment expression.
2012-04-19 11:23:20 -07:00
Matt Pharr
34d81fa522 Fix bugs in tests.
These two tests were walking past the end of the aFOO[] array, which
in turn was leading to failures with the generic-16/c++ output path.
2012-04-19 10:33:33 -07:00
Matt Pharr
49f1a5c2b3 Add print() statements to tests to indicate failure details.
These tests all fail with generic-16/c++ output currently; however, the
output indicates that it's just small floating-point differences.
(Though the question remains, why are those differences popping up?)
2012-04-19 10:32:55 -07:00
Matt Pharr
326c45fa17 Fix bugs in LLVMExtractFirstVectorElement().
When we're manually scalarizing the extraction of the first element
of a vector value, we need to be careful about handling constant values
and about where new instructions are inserted.  The old code was
sloppy about this, which in turn led to invalid IR in some cases.
For example, the two bugs below were essentially due to generating
an extractelement inst from a zeroinitializer value and then inserting
it in the wrong bblock such that a phi node that used that value was
malformed.

Fixes issues #240 and #229.
2012-04-19 09:45:04 -07:00
Matt Pharr
a2bb899a6b Opt debug printing improvement
Now, just match the prefix of the provided function name of interest,
which allows us to not worry about managing details.
2012-04-19 09:34:54 -07:00
Matt Pharr
9fedb1674e Improve basic block dumping from optimization passes.
Now done via a macro, which is cleaner.  It's also now possible to
specify a single function to watch, which is useful for debugging.
2012-04-18 15:46:18 -07:00
Matt Pharr
7c91b01125 Handle more forms of constant vectors in lGetMask().
Various optimization passes depend on turning a compile-time constant
mask into a bit vector; it turns out that in LLVM 3.1, constant vectors
of ints/floats are represented with llvm::ConstantDataVector, but
constant vectors of bools use llvm::ConstantVector (which is what LLVM
3.0 uses for all constant vectors).  Now lGetMask() always does the
llvm::ConstantVector path, to cover this case.

This improves generated C++ code by eliminating things like select
with an all on/off mask, turning movmask calls with constants into
constant values, etc.
2012-04-18 11:39:11 -07:00
Matt Pharr
c202e9e106 Add debugging printing code to optimization passes.
Now all of the passes dump out the basic block before and after
they do their thing when --debug is enabled.
2012-04-18 11:39:10 -07:00
Matt Pharr
645a8c9349 Fix serious bug in VSelMovmskOpt
When the mask was all off, we'd choose the incorrect operand!

(This bug was masked since this optimization wasn't triggering as
intended, due to other issues to be fixed in a forthcoming commit.
2012-04-18 11:39:10 -07:00
Jean-Luc Duprat
093fdcf3df Fixed bad integration 2012-04-18 09:39:54 -07:00
Jean-Luc Duprat
7abda5e8c2 Merge branch 'master' of git://github.com/ispc/ispc 2012-04-18 09:24:35 -07:00
Matt Pharr
abf7c423bb Fix build with LLVM 3.0 2012-04-18 06:14:55 -07:00
Matt Pharr
55d5c07d00 Issue errors when doing illegal things with incomplete struct types.
Issue an error, rather than crashing, if the user has declared a
struct type but not defined it and subsequently tries to:

- dynamically allocate an instance of the struct type
- do pointer math with a pointer to the struct type
- compute the size of the struct type
2012-04-18 06:08:05 -07:00
Jean-Luc Duprat
0a9b272fe4 Merge branch 'master' of git://github.com/ispc/ispc 2012-04-17 15:34:36 -07:00
Matt Pharr
b9d6ba2aa0 Always set target info, even when compiling to generic targets.
This allows the SROA pass to eliminate a lot of allocas and loads and
stores, which helps a lot for performance.
2012-04-17 15:10:30 -07:00
Matt Pharr
a0c9f7823b C++ backend fixes.
Handle calls to llvm.trap()
Declare functions before globals
Handle memset()
2012-04-17 15:09:42 -07:00
Jean-Luc Duprat
4477a9c59a Merge branch 'master' of git://github.com/ispc/ispc
Conflicts:
	decl.cpp
2012-04-17 10:38:07 -07:00
Matt Pharr
99a27fe241 Add support for forward declarations of structures.
Now a declaration like 'struct Foo;' can be used to establish the
name of a struct type, without providing a definition.  One can
pass pointers to such types around the system, but can't do much
else with them (as in C/C++).

Issue #125.
2012-04-16 06:27:21 -07:00
Matt Pharr
fefa86e0cf Remove LLVM_TYPE_CONST #define / usage.
Now with LLVM 3.0 and beyond, types aren't const.
2012-04-15 20:11:27 -07:00
Matt Pharr
098c4910de Remove support for building with LLVM 2.9.
A forthcoming change uses some features of LLVM 3.0's new type
system, and it's not worth back-porting this to also all work
with LLVM 2.9.
2012-04-15 20:08:51 -07:00
Matt Pharr
17b7148300 Initial implementation of FunctionType::GetDIType 2012-04-13 19:50:45 -07:00
Matt Pharr
f4a2ef28e3 Fix crashes from malformed programs. 2012-04-13 19:42:07 -07:00
Matt Pharr
f0d013ee76 Fix incorrect assert. Issue #241 2012-04-12 20:19:41 -07:00
Matt Pharr
5ece6fec04 Substantial rewrite (again) of decl handling.
The decl.* code now no longer interacts with Symbols, but just returns
names, types, initializer expressions, etc., as needed.  This makes the
code a bit more understandable.

Fixes issues #171 and #130.
2012-04-12 17:28:30 -07:00
Matt Pharr
d88dbf3612 Fix two bugs with resolving unbound variability.
We still need to call ResolveUnboundVariability even if the
type returns false from HasUnboundVariability; we may have,
for example, a pointer type where the pointer is resolved,
but the pointed-to type is unresolved.

Fixes issue #228.
2012-04-12 11:40:28 -07:00
Matt Pharr
2a18efef82 Use type conversion machinery when processing expr lists for initializers.
Once we're down to something that's not another nested expr list, use 
TypeConvertExpr() to convert the expression to the type we need.  This should
allow simplifying a number of the GetConstant() implementations, to remove
partial reimplementation of type conversion there.

For now, this change finishes off issue #220.
2012-04-12 11:23:02 -07:00
Matt Pharr
fd846fbe77 Fix bug in __gather_base_offsets_32.
In short, we weren't correctly zeroing the compile-time constant portion
of the offsets for lanes that aren't executing. (!)

Fixes issue #235.
2012-04-12 10:28:15 -07:00
Matt Pharr
ca7cc4744e Fix bug with taking references of temporaries.
Previously, the compiler would crash if e.g. the program passed a
temporary value to a function taking a const reference.  This change
fixes ReferenceExpr::GetValue() to handle this case and allocate
temporary storage for the temporary so that the pointer to that
storage can be used for the reference value.
2012-04-12 06:08:19 -07:00
Matt Pharr
491fa239bd Add atomic swap and cmpxchg for void * as well.
Issue #232.
2012-04-11 06:12:31 -07:00
Matt Pharr
66765dc123 Fix printing of function overload candidates in error message. 2012-04-11 06:11:52 -07:00
Matt Pharr
70a5348f43 Add size_t, ptrdiff_t, and [u]intptr_t types. 2012-04-11 05:32:53 -07:00
Matt Pharr
2aa61007c6 Remove memory_barrier() calls from atomics.
This was unnecessary overhead to impose on all callers; the user
should handle these as needed on their own.

Also added some explanatory text to the documentation that highlights
that memory_barrier() is only needed across HW threads/cores, not
across program instances in a gang.
2012-04-10 19:37:03 -07:00
Matt Pharr
acfbe77ffc Fix typo. 2012-04-10 19:27:37 -07:00
Matt Pharr
08696653ca Don't include struct member types in mangled string.
Not only was this quite verbose, it was unnecessary since we do type
equality by name.  This also needed to be fixed before we could
handle structs declared like "struct Foo;", when we then e.g. have
other structs with Foo * members.
2012-04-10 19:27:31 -07:00
Matt Pharr
8a1a214ca9 Provide required alignment when generating debug info for pointer types. 2012-04-09 14:36:39 -07:00
Matt Pharr
7aaeb27e0f Remove duplicate test. 2012-04-09 14:23:17 -07:00
Matt Pharr
972043c146 Fix serious bug in handling constant-valued initializers.
In InitSymbol(), we try to be smart and emit a memcpy when there
are a number of values to store (e.g. for arrays, structs, etc.)

Unfortunately, this wasn't working as desired for bools (i.e. i1 types),
since the SizeOf() call that tried to figure out how many bytes to
copy would return 0 bytes, due to dividing the number of bits to copy
by 8.

Fixes issue #234.
2012-04-09 14:23:08 -07:00
Matt Pharr
8475dc082a Bump version number to 1.2.2dev 2012-04-06 16:16:50 -07:00
Matt Pharr
d0e583b29c Release notes and doxygen version number bump for 1.2.1 2012-04-06 16:02:19 -07:00
Matt Pharr
c8feee238b Bump release number to 1.2.1 2012-04-06 15:30:54 -07:00
Matt Pharr
6712ecd928 Merge pull request #233 from nipunn1313/master
Ability to point build to custom version of llvm and clang
2012-04-06 15:24:12 -07:00
Nipunn Koorapati
d0c7b5d35c Merge remote-tracking branch 'upstream/master' 2012-04-06 17:58:21 -04:00
Nipunn Koorapati
802add1f97 Added to the Makefile the ability to point to a
custom installation of llvm and clang.
2012-04-06 17:54:55 -04:00
Matt Pharr
95556811fa Fix linux build 2012-04-05 20:39:39 -07:00
Matt Pharr
581472564d Print "friendly" ispc message when abort/seg fault signal is thrown.
Make crashes that happen in LLVM less inscrutable.

Issue #222.
2012-04-05 15:51:44 -07:00
Matt Pharr
c7dc8862a5 Add FAQs about various language details.
One of these finishes off issue #225.
2012-04-05 15:24:26 -07:00
Matt Pharr
4f8cf019ca Add pass to verify module before starting optimizations. 2012-04-05 08:49:39 -07:00
Matt Pharr
4c9ac7fcf1 Fix build with LLVM 2.9. 2012-04-05 08:22:40 -07:00
Matt Pharr
1dac05960a Fix build with LLVM 3.1 ToT 2012-04-05 08:17:56 -07:00
Matt Pharr
c27418da77 Add checks about references to non-lvalues.
Both ReturnStmt and DeclStmt now check the values being associated
with references to make sure that they are legal (e.g. it's illegal
to assign a varying lvalue, or a compile-time constant to a reference
type).  Previously we didn't catch this and would end up hitting
assertions in LLVM when code did this stuff.

Mostly fixes issue #225 (except for adding a FAQ about what this
error message means.)
2012-04-04 05:56:22 -07:00
Matt Pharr
637d076e99 Remove half/float conversion functions from AVX2 output.
(We were leaving around unused/unnecessary __half_to_float_uniform 
and the like, which in turn called out to the corresponding instruction.)
2012-04-03 12:18:38 -07:00
Matt Pharr
391678a5b3 Update function overload resolution logic.
Closer compatibility with C++: given a non-reference type, treat matching
to a non-const reference of that type as a better match than a const
reference of that type (rather than both being equal cost).

Issue #224.
2012-04-03 10:40:41 -07:00
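A sketch of the preference this establishes (hypothetical declarations):

    void scale(float &v);         // non-const reference
    void scale(const float &v);   // const reference

    void caller() {
        float x = 1;
        // as in C++, the non-const reference overload is now the
        // better (cheaper) match for a non-const argument
        scale(x);
    }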
Matt Pharr
4cd0cf1650 Revamp handling of function types, conversion to function ptr types.
Implicit conversion to function types is now a more standard part of
the type conversion infrastructure, rather than special cases of things
like FunctionSymbolExpr immediately returning a pointer type, etc.

Improved AddressOfExpr::TypeCheck() to actually issue errors in cases
where it's illegal to take the address of an expression.

Added AddressOfExpr::GetConstant() implementation that handles taking
the address of functions.

Issue #223.
2012-04-03 10:09:07 -07:00
Matt Pharr
b813452d33 Don't issue a slew of warnings if a bogus cpu type is specified.
Issue #221.
2012-04-03 06:13:28 -07:00
Matt Pharr
eb85da81e1 Further improvements to error reporting with function types.
Issue #219.
2012-04-03 05:55:50 -07:00
Matt Pharr
920cf63201 Improve error message about incompatible function types.
When reporting that a function has illegally been overloaded only
by return type, include "task", "export", and "extern "C"", as appropriate
in the error message to make clear what the issue is.

Finishes issue #216.
2012-04-03 05:43:23 -07:00
Matt Pharr
dc09d46bf4 Don't emit type declarations for extern'ed globals in generated header files.
This actually wasn't a good idea, since we'd like ispc programs to be able to
have varying globals that they use internally among ispc code, without getting
errors about varying globals when generating headers.

Issue #214.
2012-04-03 05:36:21 -07:00
Matt Pharr
05d1b06eeb Fixes to get the C++ backend more working again. 2012-03-30 16:56:30 -07:00
Matt Pharr
c1661eb06b Allow calling GetAs{Non}ConstType() for FunctionTypes.
It's just a no-op, though, rather than an assertion failure as before.
2012-03-30 16:56:30 -07:00
Jean-Luc Duprat
e9626a1d10 Added macro PRId64 to opt.cpp for compilation on Windows 2012-03-30 16:56:30 -07:00
Matt Pharr
560bf5ca09 Updated logic for selecting target ISA when not specified.
Now, if the user specified a CPU then we base the ISA choice on that--only
if no CPU and no target is specified do we use the CPUID-based check to
pick a vector ISA.

Improvement to fix to #205.
2012-03-30 16:36:12 -07:00
Jean-Luc Duprat
512f8d8b60 Fixed binary AND to logical AND 2012-03-29 17:03:22 -07:00
Matt Pharr
87c8a89349 Make 'export' a type qualifier, not a storage class.
In particular, this makes it legal to do "extern export foo()", among
other things.

Partially addresses issue #216.
2012-03-29 13:16:55 -07:00
Matt Pharr
255791f18e Fix to get correct variable names for extern globals that are later defined. 2012-03-29 11:50:15 -07:00
Matt Pharr
d5e3416e8e Fix bug in default argument handling introduced in 540fc6c2f3 2012-03-28 14:29:58 -07:00
Matt Pharr
5b2d43f665 Fix global variable code to correctly handle extern declarations.
When we have an "extern" global, now we no longer inadvertently define
storage for it.  Further, we now successfully do define storage when we
encounter a definition following one or more extern declarations.

Issues #215 and #217.
2012-03-28 14:15:49 -07:00
Matt Pharr
540fc6c2f3 Fix bugs with default parameter values for pointer-typed function parameters.
In particular "void foo(int * ptr = NULL)" and the like work now.

Issue #197.
2012-03-28 11:51:56 -07:00
Matt Pharr
b3c5043dcc Don't enable llvm's UnsafeFPMath option when --opt=fast-math is supplied.
This was causing functions like round() to fail on SSE2, since it has code
that does:

    x += 0x1.0p23f;
    x -= 0x1.0p23f;

which was in turn being undesirably optimized away.

Fixes issue #211.
2012-03-28 10:26:39 -07:00
Matt Pharr
d0d9aae968 Fix parser so that spaces aren't needed around "..." in foreach statements.
Issue #207.
2012-03-28 10:10:51 -07:00
Matt Pharr
3270e2bf5a Call CPUID to more reliably detect level of SSE/AVX that the host supports.
Fixes, I hope, issue #205.
2012-03-28 09:20:06 -07:00
Matt Pharr
013a3e7567 Support concatenation of adjacent string literals in the parser.
Fixes issue #208.
2012-03-28 08:52:09 -07:00
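So, for example (a sketch):

    void report() {
        // adjacent string literals are concatenated, as in C
        print("this long message can now be split "
              "across two source lines\n");
    }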
Matt Pharr
8368ba8539 Add missing checks for NULL current basic block in stmt code.
Fixes crashes if, for example, these statement types appeared after early
returns in the middle of functions.
2012-03-28 08:48:33 -07:00
Matt Pharr
ca0310e335 Merge pull request #213 from nipunn1313/master
Fixed compiler warning in expression type caster
2012-03-28 06:41:00 -07:00
Nipunn Koorapati
4690a678c1 Added parentheses around a || b && c statement in TypeCastExpr
to placate the compiler warning and make the code easier to understand.
2012-03-28 02:44:28 -04:00
Matt Pharr
f8a39402a2 Implement new, simpler function overload resolution algorithm.
We now give each conversion a cost and then find the minimum sum
of costs for all of the possible overloads.

Fixes issue #194.
2012-03-27 13:25:11 -07:00
Jean-Luc Duprat
b923e4daea Added macro PRId64 to opt.cpp for compilation on Windows 2012-03-27 12:46:59 -07:00
Matt Pharr
247775d1ec Fix type conversion to allow array -> void * conversions.
Fixes issue #193.
2012-03-27 10:07:54 -07:00
Matt Pharr
6e9fea377d Type convert NULL to other pointer types for function call arguments.
Fixes issue #198.
2012-03-27 09:50:21 -07:00
Matt Pharr
ca5c65d032 Fix bugs where typecasting an expression to void would cause it to disappear.
This was obviously problematic in cases where the expression was a function
call or the like, with side effects.

Fixes issue #199.
2012-03-27 09:33:43 -07:00
Matt Pharr
f9dc621ebe Fix bug when doing pointer math with varying integer offsets.
We were incorrectly trying to type convert the varying offset to a
uniform value, which in turn led to an incorrect compile-time error.

Fixes issue #201.
2012-03-27 09:17:40 -07:00
Matt Pharr
ffe484c31e Implement simpler approach for header file struct emission.
Rather than explicitly building a DAG and doing a topological sort,
just traverse structs recursively and emit declarations for all of
their dependent structs before emitting the original struct declaration.

Not only is this simpler than the previous implementation, but it
fixes a bug where we'd hit an assert if we had a struct with multiple
contained members of another struct type.
2012-03-27 09:06:10 -07:00
Matt Pharr
62cd3418ca Add test for the bug in issue #204. 2012-03-27 09:04:45 -07:00
Matt Pharr
d8a8f3a996 For symbols that are references, return uniform ptr type as lvalue type.
Fixes issue #204.
2012-03-27 08:52:14 -07:00
Matt Pharr
0ad8dbbfc9 Fix documentation bug: atan2 arguments were reversed.
Issue #203.
2012-03-27 08:03:02 -07:00
Matt Pharr
e15a1946c6 Documentation: add ISPC_TARGET_AVX2 as a possible target #define 2012-03-27 08:02:39 -07:00
Matt Pharr
8878826661 Add non-short-circuiting and(), or(), select() to stdlib. 2012-03-26 09:37:59 -07:00
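A sketch of how these differ from the built-in operators (hypothetical function):

    varying float pick(varying float x, varying float y) {
        // unlike && and ||, and() and or() always evaluate both
        // operands; there is no short-circuit branch
        bool m = and(x > 0, y > 0);
        // select() is a per-lane 'm ? x : y'
        return select(m, x, y);
    }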
Matt Pharr
95a8b6e5e8 Fix & vs. && in logical test.
Issue #196.
2012-03-25 17:38:34 -07:00
Matt Pharr
388d0d2cfd Add #include <string.h>
Fixes build on linux and windows.  (Strangely, this didn't break the
OSX build.)

Issue #195.
2012-03-25 17:38:15 -07:00
Matt Pharr
d3a374e71c Fix malformed program crasher. 2012-03-25 13:10:23 -07:00
Matt Pharr
1da2834b1e Allow the last member of a struct to be an unsized/zero-length array.
This enables the C trick of allocating a dynamic amount of storage for
the struct in order to extend out the array to the desired length.
2012-03-25 13:10:12 -07:00
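The pattern this enables, sketched (allocation shown from the C/C++ side as a comment, since the sizing trick happens there):

    struct Buffer {
        int count;
        float data[];   // unsized trailing array, now allowed
    };
    // C-side allocation then extends the array to 'n' elements:
    //   Buffer *b = (Buffer *)malloc(sizeof(Buffer) + n * sizeof(float));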
Matt Pharr
ca3100874f Add FAQ about why varying values can't be passed to exported functions. 2012-03-25 11:35:28 -07:00
Matt Pharr
117f48a331 Don't include foreach stmts in cost estimates from EstimateCost().
Because they reestablish an 'all on' mask inside their body, it doesn't
make sense to include their cost when evaluating whether it's worth
re-establishing an 'all on' mask dynamically.  (This does mean that
EstimateCost()'s return value isn't the most obvious thing, but currently
in all the cases where we need it, this is the more appropriate value to
return.)
2012-03-25 10:32:44 -07:00
Matt Pharr
89bbceefee Make sure that foreach() statements never execute with an "all off" mask. 2012-03-25 10:07:12 -07:00
Matt Pharr
7e18f0e247 Small improvement to float->half function in stdlib.
Rewrite things to be able to do a float MINPS, for slightly
better code on SSE2 (which has that but not a signed int
min).  SSE2 code now 23 instructions (vs 21 intrinsics).
2012-03-23 16:09:32 -07:00
Jean-Luc Duprat
29c2f24faf Merge branch 'master' of git://github.com/ispc/ispc 2012-03-22 16:33:05 -07:00
Matt Pharr
3bb2dee275 Update float_to_half() with more efficient version from @rygorous 2012-03-22 13:36:26 -07:00
Matt Pharr
88cd5584e8 Add Debug() statement to report on if stmt cost/safety test results. 2012-03-22 13:36:26 -07:00
Jean-Luc Duprat
41f9ce2560 Merge branch 'master' of git://github.com/ispc/ispc 2012-03-22 10:02:05 -07:00
Matt Pharr
20044f5749 Distinguish between dereferencing pointers and references.
We now have separate Expr implementations for dereferencing pointers
and automatically dereferencing references.  This is in particular
necessary so that we can detect attempts to dereference references
with the '*' operator in programs and issue an error in that case.

Fixes issue #192.
2012-03-22 06:48:02 -07:00
Jean-Luc Duprat
833f0a6aa7 Merge branch 'master' of git://github.com/ispc/ispc 2012-03-21 17:07:18 -07:00
Matt Pharr
10c5ba140c Much more efficient half_to_float() code, via @rygorous.
Also, switch deferred shading example to use it. (Rather than
the "fast" half to float that doesn't handle deforms, etc.)
2012-03-21 16:13:04 -07:00
Matt Pharr
316de0b880 Make various Expr::EstimateCost() implementations return 0 if operand(s) are constants.
(Assume that constant folding will make these be free.)
2012-03-21 16:12:35 -07:00
Matt Pharr
989966f81b Annotate std lib functions with __declspec safe, cost, as appropriate. 2012-03-21 16:12:32 -07:00
Matt Pharr
ccd550dc52 __declspec support for function declarations.
safe: indicates that the function can safely be called with an "all off"
execution mask.

costN: (N an integer) overrides the cost estimate for the function with
the given value.
2012-03-21 16:11:50 -07:00
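Declaration sketches (hypothetical functions; the qualifier spelling follows the syntax described above):

    // safe to call even when the execution mask is all off
    __declspec(safe) float min3(float a, float b, float c);

    // 'safe', and also overrides the function's cost estimate to 2
    __declspec(safe, cost2) float sqr(float x);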
Matt Pharr
ddf350839a Add ability to parse __declspec lists to parser. 2012-03-21 16:11:50 -07:00
Matt Pharr
6a7dd2787a Fix bug in check for varying parameters in exported functions.
In particular, we weren't checking to see if the pointed-to type of
pointer parameters was varying.

Fixes issue #191.
2012-03-21 10:06:53 -07:00
Jean-Luc Duprat
385771e73e Merge branch 'master' of git://github.com/ispc/ispc 2012-03-20 13:31:31 -07:00
Matt Pharr
349ab0b9c5 Bump version number to 1.2.1dev 2012-03-20 12:46:23 -07:00
Matt Pharr
b5e6c6a2f3 update news to include paper 2012-03-20 12:05:23 -07:00
Matt Pharr
2832ea641f Release notes, bump doxygen version for 1.2.0 release 2012-03-20 11:58:39 -07:00
Matt Pharr
cb7edf2725 Set version to 1.2.0 for release builds 2012-03-20 11:13:50 -07:00
Matt Pharr
f1f1be2822 Remove twine op that caused crash on Windows, fix warning 2012-03-20 11:13:02 -07:00
Matt Pharr
7dffd65609 Add __foreach_active statement to loop over active prog. instances.
For now this has the __ prefix, as an experimental feature currently only
used in the standard library implementation.  It's probably worth making
something along these lines an official feature, but I'm not sure if this
in its current form is quite the right thing.
2012-03-20 08:46:00 -07:00
Matt Pharr
2c8a44e28b Merge pull request #189 from guanqun/fix-extern-c-error
calls to C/C++ functions should not be mangled.
2012-03-20 05:55:09 -07:00
Matt Pharr
39bb95a6ee Merge pull request #190 from guanqun/fix-output-option
fix --outfile option error
2012-03-20 05:54:29 -07:00
Lu Guanqun
da9dba80a0 fix --outfile option error 2012-03-20 09:44:49 +08:00
Lu Guanqun
12f3285f9b calls to C/C++ functions should not be mangled.
Otherwise, linker will never find the correct function.
2012-03-20 09:27:57 +08:00
Matt Pharr
7e954e4248 Don't issue gather/scatter warnings in the 'extra' bits of foreach loops.
With AOS data, we can often coalesce the accesses into gathers for the main
part of foreach loops but only fail on the last bits where the mask is not
all on (since the coalescing code doesn't handle mixed masks, yet.) Before,
we'd report success with coalescing and then also report that gathers were needed
for the same accesses that were coalesced, which was a) confusing, and b)
didn't accurately represent what was going on for the majority of the loop
iterations.
2012-03-19 15:08:35 -07:00
Matt Pharr
d74cc6397b Fix significant bug in mask management in code generated for 'foreach'.
In particular, we 1. weren't setting the function mask to 'all on', such that
any mixed function mask would in turn apply inside the foreach loop, and 2.
weren't always setting the internal mask to 'all on' before doing any additional
masking based on the iteration variables.
2012-03-19 15:06:35 -07:00
Matt Pharr
777343331e Print numeric version number with --version. 2012-03-19 14:41:25 -07:00
Matt Pharr
a062653743 Add patterns to better match code generated when accessing SOA data.
In particular, LLVMVectorIsLinear() and LLVMVectorValuesAllEqual() are able
to reason a bit about the effects of the shifts and the ANDs that are
generated from SOA indexing calculations, so that they can detect more cases
where a linear sequence of locations are in fact being accessed in
the presence of SOA data.
2012-03-19 12:04:39 -07:00
Matt Pharr
57af0eb64f Still do the gather/scatter -> load/store pass even if leaving 'pseudo' memory ops unchanged. 2012-03-19 12:04:38 -07:00
Matt Pharr
60aae16752 Move check for linear vector to LLVMVectorIsLinear() function. 2012-03-19 11:57:04 -07:00
Matt Pharr
e264d95019 LLVMVectorValuesAllEqual() improvements.
Clean up the API, so the caller doesn't have to pass in a vector so
the function can track PHI nodes (do that internally instead.)

Handle casts in lValuesAreEqual().
2012-03-19 11:54:18 -07:00
Matt Pharr
0664f5a724 Add LLVMExtractVectorInts() function, use it in the opt code. 2012-03-19 11:48:38 -07:00
Matt Pharr
17c6a19527 Add LLVMExtractFirstVectorElement() function (and use it).
For cases where it turns out that we just need the first element of
a vector (e.g. because we've determined that all of the values are
equal), it's often more efficient to only compute that one value
with scalar operations than to compute the whole vector's worth and
then just use one value.  This function tries to rewrite a vector
computation to the scalar equivalent, if possible.

(Partial work-around to http://llvm.org/bugs/show_bug.cgi?id=11775.)

Note that sometimes this is the wrong thing to do--if we need the entire
vector value for other purposes, for example.
2012-03-19 11:48:33 -07:00
Matt Pharr
cbc8b8259b Use LLVMIntAsType() in opt code instead of locally-defined equivalent. 2012-03-19 11:36:00 -07:00
Matt Pharr
1067a2e4be Add LLVMShuffleVectors() and LLVMConcatVectors() functions.
These were local functions in opt.cpp that are now public via the
llvmutil.* files.
2012-03-19 11:34:52 -07:00
Matt Pharr
74a031a759 Small improvements to debug info printing in opt.cpp 2012-03-19 11:32:08 -07:00
Matt Pharr
ee437193fb Add LLVMDumpValue() utility routine 2012-03-19 11:31:27 -07:00
Matt Pharr
436c53037e Fix assertion in FunctionEmitContext::storeUniformToSOA() 2012-03-19 11:29:14 -07:00
Matt Pharr
f55ba9d3cb Remove (highly verbose) Debug() call for type conversions. 2012-03-19 11:28:55 -07:00
Matt Pharr
8adb99b768 Improve source locations reported with warnings. 2012-03-19 11:28:34 -07:00
Matt Pharr
13c42412d2 Issue perf. warning if SOA width narrower than gang size is used. 2012-03-19 11:28:16 -07:00
Matt Pharr
75507d8b35 Remove error message if old 'reference' keyword is used. 2012-03-19 11:27:53 -07:00
Matt Pharr
ddfe4932ac Fix parsing of 'launch' so that angle brackets can be removed.
Issue #6.
2012-03-19 11:27:32 -07:00
Jean-Luc Duprat
cf208cc2e3 Merge branch 'master' of git://github.com/ispc/ispc 2012-03-17 12:59:43 -07:00
Matt Pharr
28ac016928 Fix bugs in checks for varying parameters in exported functions.
In short, we inadvertently weren't checking whether pointers themselves
were varying, which in turn led to an assertion later if an exported
function did have a varying parameter.

Issue #187.
2012-03-15 07:20:36 -05:00
Jean-Luc Duprat
f4ae41d006 Merge branch 'master' of git://github.com/ispc/ispc 2012-03-14 09:58:52 -07:00
Matt Pharr
9ec8e5a275 Fix compile warnings on Linux 2012-03-12 13:12:23 -07:00
Matt Pharr
a473046058 Once again fix for LLVM 3.1 TOT API changes 2012-03-11 15:04:26 -07:00
Matt Pharr
a69b7a5a01 Fix build with LLVM 3.1 TOT 2012-03-10 13:06:53 -08:00
Matt Pharr
640918bcc0 Call fclose() in deferred example. (Andy Zhang). 2012-03-07 08:50:10 -08:00
Matt Pharr
f39fbdb3fc Add various new functions to "internal" functions list.
Building with multiple compilation targets in a single binary was
broken due to multiple symbol definitions.
2012-03-05 16:41:20 -08:00
Matt Pharr
50d4d81062 Add file in docs/ for news page on website 2012-03-05 16:10:20 -08:00
Matt Pharr
3b95452481 Add memcpy(), memmove() and memset() to the standard library.
Issue #183.
2012-03-05 16:09:00 -08:00
Matt Pharr
c152ae3c32 Add single-precision asin() and acos() to stdlib.
Issue #184.
2012-03-05 13:32:13 -08:00
Matt Pharr
f6cbaa78e8 Update stdlib documentation to match recent pointed-to default variability changes 2012-03-05 13:32:12 -08:00
Matt Pharr
7adb250b59 Added tests and documentation for soa<> rate qualifier. 2012-03-05 09:58:10 -08:00
Matt Pharr
db5db5aefd Add native support for (AO)SOA data layout.
There's now a SOA variability class (in addition to uniform,
varying, and unbound variability); the SOA factor must be a
positive power of 2.

When applied to a type, the leaf elements of the type (i.e.
atomic types, pointer types, and enum types) are widened out
into arrays of the given SOA factor.  For example, given

struct Point { float x, y, z; };

Then "soa<8> Point" has a memory layout of "float x[8], y[8],
z[8]".

Furthermore, array indexing syntax has been augmented so that
when indexing into arrays of SOA-variability data, the two-stage
indexing (first into the array of soa<> elements and then into
the leaf arrays of SOA data) is performed automatically.
2012-03-05 09:58:10 -08:00
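(A sketch of the indexing this enables, reusing the Point struct above; the array size is arbitrary:)

    soa<8> Point pts[64];           // leaves laid out as float x[8], y[8], z[8] blocks
    float v = pts[programIndex].x;  // two-stage SOA indexing happens automatically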
Matt Pharr
8fdf84de04 Disable debugging printing code. 2012-03-05 09:58:09 -08:00
Matt Pharr
ff5cbe80d1 Add more files to .gitignore 2012-03-05 09:58:09 -08:00
Matt Pharr
e013e0a374 Handle extract instructions in the lGetBasePtrAndOffsets() pattern matching code. 2012-03-05 09:58:09 -08:00
Matt Pharr
b7df312ca7 Small improvements to error location reporting, assertions in expr.cpp 2012-03-05 09:58:09 -08:00
Matt Pharr
ce82c3c0ae Return from function after storing initializer value. 2012-03-05 09:58:09 -08:00
Matt Pharr
2f958cfbda Fix cases where malformed program could cause crash. 2012-03-05 09:58:09 -08:00
Matt Pharr
8ef41dfd97 Represent variability with small helper class rather than an enum.
This provides part of the basis for representing SOA width in terms
of variability, but there should be no functional changes in this
checkin.
2012-03-05 09:58:09 -08:00
Matt Pharr
3082ea4765 Require Type::Equal() for all type equality comparisons.
Previously, we uniqued AtomicTypes, so that they could be compared
by pointer equality, but with forthcoming SOA variability changes,
this would become too unwieldy (lacking a more general / ubiquitous
type uniquing implementation.)
2012-03-05 09:58:09 -08:00
Matt Pharr
e482d29951 Add LLVM{U}IntAsType() utility routine 2012-03-05 09:58:09 -08:00
Matt Pharr
ff48dd7bfb Remove unused SOAArrayType class and Type::GetSOAType() methods. 2012-03-05 09:58:09 -08:00
Matt Pharr
7bf9c11822 Add uniform variants of RNG functions to stdlib 2012-03-05 09:56:30 -08:00
Matt Pharr
f7937f1e4b Fix build with LLVM2.9/3.0 2012-03-03 10:30:56 -08:00
Matt Pharr
0115eeabfe Update deferred example to take advantage of new pointer variability rules. 2012-02-29 14:27:53 -08:00
Matt Pharr
4b9c3ec0da Fix bug in StructType::GetElementType().
We were only resolving unbound variability for the top-level type,
which isn't enough if we have e.g. an unbound-variability pointer
pointing to some type with unbound variability.
2012-02-29 14:27:53 -08:00
Matt Pharr
55b81e35a7 Modify rules for default variability of pointed-to types.
Now, the pointed-to type is always uniform by default (if an explicit
rate qualifier isn't provided).  This rule is easier to remember and
seems to work well in more cases than the previous rule from 6d7ff7eba2.
2012-02-29 14:27:53 -08:00
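(Under the new rule, sketched here; contrast with the 6d7ff7eba2 example further down the log:)

    float *foo;           // varying pointer to uniform float, as before
    float * uniform bar;  // uniform pointer to uniform float (previously varying)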
Matt Pharr
2a1c7f2d47 Fix bug with indexing into varying pointer w/uniform index.
Issue #182.
2012-02-25 10:19:21 -08:00
Matt Pharr
8603f9838f Issue an error if "uniform" or "varying" qualifiers are applied to void types.
Issue #179.
2012-02-21 12:26:42 -08:00
Matt Pharr
95224f3f11 Improve detection of cases where 32-bit gather/scatter can be used.
Previously, we weren't noticing that an <n x i64> zero vector could
be represented as an <n x i32> without error.
2012-02-21 12:13:25 -08:00
Matt Pharr
f81acbfe80 Implement unbound variability for struct types.
Now, if a struct member has an explicit 'uniform' or 'varying'
qualifier, then that member has that variability, regardless of
the struct's own variability.  Members without
'uniform' or 'varying' have unbound variability, and in turn
inherit the variability of the struct.

As a result of this, now structs can properly be 'varying' by default,
just like all the other types, while still having sensible semantics.
2012-02-21 10:28:31 -08:00
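(A minimal sketch of the new member rules:)

    struct Foo {
        uniform int a;  // always uniform, whatever the struct's variability
        varying int b;  // always varying
        int c;          // unbound: inherits the struct's variability
    };
    uniform Foo f;      // f.a and f.c are uniform; f.b stays varying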
Matt Pharr
6d7ff7eba2 Update defaults for variability of pointed-to types.
Now, if rate qualifiers aren't used to specify otherwise, varying
pointers point to uniform types by default.  As before, uniform
pointers point to varying types by default.

   float *foo;  // varying pointer to uniform float
   float * uniform foo;  // uniform pointer to varying float

These defaults seem to require the least amount of explicit
uniform/varying qualifiers for most common cases, though TBD if it
would be easier to have a single rule that e.g. the pointed-to type
is always uniform by default.
2012-02-21 06:27:34 -08:00
Matt Pharr
ad429db7e8 Generate more efficient code for variable initializers.
If the initializer is a compile-time constant (or at least a part of it
is), then store the constant value in a module-local constant global
value and then memcpy the value into the variable.  This, in turn,
turns into much better assembly in the end.

Issue #176.
2012-02-14 13:51:23 -08:00
Matt Pharr
4c07abbaf4 Support returning NULL pointer values from ConstExpr::GetConstant() 2012-02-14 13:49:18 -08:00
Matt Pharr
e3c0551129 Handle uniform short-vector types in ExprList::GetConstant() 2012-02-14 13:48:43 -08:00
Matt Pharr
8971baa42b Fix silly bug in ConstExpr::GetConstant() with enum types.
(They would be incorrectly matched as int8 types.)
2012-02-14 13:48:10 -08:00
Matt Pharr
317a1f51f7 Allow fewer initializer values in initializer expr lists than expected.
We now match C's behavior, where if we have an initializer list with
too-few values for the underlying type, any additional elements are
initialized to zero.

Fixes issue #123.
2012-02-14 13:47:11 -08:00
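(For example, matching C:)

    uniform int a[4] = { 1, 2 };  // a[2] and a[3] are zero-initialized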
Matt Pharr
c63d139482 Add FunctionEmitContext::MemcpyInst() 2012-02-14 13:43:59 -08:00
Matt Pharr
9e682362e9 Fix bug in ArrayType::SizeUnsizedArrays().
If given an initializer list with too many elements for the actual array
size, in some cases we would incorrectly resize the explicitly sized array
to be the size implied by the initializer list.
2012-02-14 13:43:38 -08:00
Matt Pharr
56ec939692 Add perfbench to examples.sln for Windows 2012-02-14 10:07:08 -08:00
Matt Pharr
a86b942730 Fix cases in coalesce opt where offsets would be truncated to 32 bits 2012-02-14 10:05:07 -08:00
Matt Pharr
52eb4c6014 Fix warnings with Windows build 2012-02-14 10:01:45 -08:00
Matt Pharr
f4adbbf90c Merge a number of cbackend changes from the LLVM dev tree.
This fixes a number of failing tests with LLVM 3.1svn when
using the generic targets.

Issue #175.
2012-02-13 16:52:38 -08:00
Matt Pharr
cc86e4a7d2 Disable coalescing optimizations when using generic target.
The main issue is that they end up generating a number of smaller
vector ops (e.g. 4-wide and 8-wide on the 16-wide generic target),
which the examples/intrinsics implementations don't currently
support.

This fixes a number of failing tests for now; it may be worth
generalizing the stuff in examples/intrinsics at some point,
since as a general principle, e.g. if generating LLVM IR output,
the coalescing optimizations are still desirable.

Issue #175.
2012-02-13 16:52:01 -08:00
Matt Pharr
e864447e4a Fix silly bug in vector scale extraction optimization.
(Introduced in f20a2d2ee.  How did this ever pass tests?)
2012-02-13 12:06:45 -08:00
Matt Pharr
73bf552cd6 Add support for coalescing memory accesses from gathers.
There are two related optimizations that happen now.  (These
currently only apply for gathers where the mask is known to be
all on, and to gathers that are accessing 32-bit sized elements,
but both of these may be generalized in the future.)

First, for any single gather, we are now more flexible in mapping it
to individual memory operations.  Previously, we would only either map
it to a general gather (one scalar load per SIMD lane), or an 
unaligned vector load (if the program instances could be determined
to be accessing a sequential set of locations in memory.)

Now, we are able to break gathers into scalar, 2-wide (i.e. 64-bit),
4-wide, or 8-wide loads.  Further, we now generate code that shuffles
these loads around.  Doing fewer, larger loads in this manner, when
possible, can be more efficient.

Second, we can coalesce memory accesses across multiple gathers. If 
we have a series of gathers without any memory writes in the middle,
then we try to analyze their reads collectively and choose an efficient
set of loads for them.  Not only does this help if different gathers
reuse values from the same location in memory, but it's specifically
helpful when data with AOS layout is being accessed; in this case,
we're often able to generate wide vector loads and appropriate shuffles
automatically.
2012-02-10 13:10:39 -08:00
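(An illustrative AOS case that benefits; the Color struct and sum_red() are hypothetical, and reduce_add() is assumed from the stdlib:)

    struct Color { float r, g, b; };  // AOS layout
    uniform float sum_red(uniform Color c[], uniform int n) {
        float sum = 0;
        foreach (i = 0 ... n)
            sum += c[i].r;  // neighboring instances read adjacent structs; these
                            // gathers can now coalesce into wide loads plus shuffles
        return reduce_add(sum);
    }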
Matt Pharr
f20a2d2ee9 Generalize code to extract scales by 2/4/8 from addressing calculations.
Now, if we have a scale by 16, say, we extract out the scalar scale
of 8 and leave an explicit scale by 2.
2012-02-10 12:35:44 -08:00
Matt Pharr
0c25bc063c Add lGEPInst() utility routine to opt.cpp.
Deal with the messiness of LLVM API changes when creating
these in a single place.
2012-02-10 12:32:15 -08:00
Matt Pharr
db72781d2a Fix C++ backend to not assert with LLVM 3.1 svn builds. 2012-02-10 12:30:31 -08:00
Matt Pharr
0c8ad09040 Fix placement of ParserInit() call
This makes it possible to use fuzz testing even without --nostdlib!
2012-02-10 12:29:57 -08:00
Matt Pharr
49880ab761 Constant fold more cases in SelectExpr::Optimize()
Specifically, if both of the expressions are compile-time constants
and the condition is a varying compile-time constant (even if not 
all true or all false), then we can assemble a compile-time constant
result.
2012-02-10 12:28:54 -08:00
Matt Pharr
fe2d9aa600 Add perfbench to examples: a few small microbenchmarks. 2012-02-10 12:27:13 -08:00
Matt Pharr
1dead425e4 Don't indent *too* much on continued lines with warnings/errors. 2012-02-10 12:26:35 -08:00
Matt Pharr
adb1e47a59 Add FAQ about how to cross-inline ispc and C/C++ code. 2012-02-10 12:26:19 -08:00
Matt Pharr
ffba8580c1 Make sure that non-zero exit code is returned when input file not found.
Fixes issue #174.
2012-02-08 19:53:05 -08:00
Alex Reece
ea18427d29 Remove UnwindInst
Code no longer builds against head of LLVM branch after revision 149906
removed the unwind instruction.
2012-02-07 15:46:22 -08:00
Jean-Luc Duprat
97d42f5c53 Merge remote-tracking branch 'matt/master' 2012-02-07 12:50:31 -08:00
Matt Pharr
f3089df086 Improve error handling and reporting in the parser.
Add a number of additional error cases in the grammar.

Enable bison's extended error reporting, to get better messages about the
context of errors and the expected (but not found) tokens at errors.

Improve the printing of these by providing an implementation of yytnamerr
that rewrites things like "TOKEN_MUL_ASSIGN" to "*=" in error messages.

Print the source location (using Error()) when yyerror() is called; wiring
this up seems to require no longer building a 'pure parser' but having
yylloc as a global, which in turn led to updating all of the uses of
it (which previously accessed it as a pointer).

Updated a number of tests_errors for the resulting changes in error text.
2012-02-07 11:13:32 -08:00
Matt Pharr
157e7c97ae Fix a variety of cases in the parser that could crash with malformed programs. 2012-02-07 11:08:00 -08:00
Matt Pharr
bb8e13e3c9 Add support for -I command-line argument to specify #include search directories. 2012-02-07 08:39:01 -08:00
Matt Pharr
5b4673e8eb Fix build with LLVM 2.9. 2012-02-07 08:37:13 -08:00
Matt Pharr
5b9de8cc07 Fix test to account for updated error message. 2012-02-07 08:36:56 -08:00
Matt Pharr
33ea934c8f Fix over-aggressive check in DereferenceExpr::TypeCheck()
(Reference types are allowed as well.)
2012-02-07 08:18:33 -08:00
Matt Pharr
6b3e14b0a4 Add command-line option to enable debugging output from parser. 2012-02-06 15:35:43 -08:00
Matt Pharr
098ceb5567 Issue error on attempted type convert from/to function type. 2012-02-06 15:35:43 -08:00
Matt Pharr
8e2b0632e8 Issue an error if an array of references is declared.
(More malformed program fixes.)
2012-02-06 15:35:43 -08:00
Matt Pharr
420d373d89 Move assert so that an error is issued for "break" outside of loops. 2012-02-06 15:35:43 -08:00
Matt Pharr
a59fd7eeb3 Fix a missing return value in the parser. 2012-02-06 15:35:43 -08:00
Matt Pharr
ee91fa1228 Make sure the program doesn't have a dereference of a non-pointer type. 2012-02-06 15:35:43 -08:00
Matt Pharr
a2b5ce0172 Add --help-dev option, only print developer options when it is used. 2012-02-06 15:35:43 -08:00
Matt Pharr
3efbc71a01 Add fuzz testing of input programs.
When the --fuzz-test command-line option is given, the input program
will be randomly perturbed by the lexer in an effort to trigger
assertions or crashes in the compiler (neither of which should ever
happen, even for malformed programs.)
2012-02-06 15:34:47 -08:00
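(A hypothetical invocation; foo.ispc stands for any input file:)

    ispc --fuzz-test foo.ispc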
Matt Pharr
b7c5af7e64 Prohibit returning functions from functions.
(Fix malformed program crasher)
2012-02-06 14:46:03 -08:00
Matt Pharr
f939015b97 Default to int32 for declarations without specified types.
(e.g. "uniform foo" == "uniform int32 foo")
2012-02-06 14:46:03 -08:00
Matt Pharr
a9ed71f553 Bug fixes to avoid NULL pointer derefs with malformed programs. 2012-02-06 14:45:58 -08:00
Matt Pharr
96a429694f 80 column fixes 2012-02-06 14:44:55 -08:00
Matt Pharr
fddc5e022e Fix typo in IfStmt::EstimateCost() 2012-02-06 14:44:54 -08:00
Matt Pharr
2236d53def Issue error if &=, |=, ^=, <<=, or >>= used with floats. 2012-02-06 14:44:54 -08:00
Matt Pharr
4e018d0a20 Improve tracking of source position in the presence of /* */ comments.
Don't let the preprocessor remove comments anymore, so that the rules
in lex.ll can handle them.  Fix lCComment() to update the source
position as it eats characters in comments.
2012-02-06 14:44:54 -08:00
Matt Pharr
977b983771 Issue error on "void" typed variable, function parameter, or struct member. 2012-02-06 14:44:48 -08:00
Matt Pharr
fa7a7fe23e Fix error handling in type code. 2012-02-06 12:39:14 -08:00
Matt Pharr
724a843bbd Add --quiet option to suppress all diagnostic output 2012-02-06 12:39:09 -08:00
Matt Pharr
a9ec745275 Release notes, bump doxygen release number for 1.1.4 2012-02-04 15:38:17 -08:00
Matt Pharr
c2ecc15b93 Add missing "varying/varying" atomic_compare_exchange_global() functions. 2012-02-03 13:19:15 -08:00
Matt Pharr
83c8650b36 Add support for "local" atomics.
Also updated aobench example to use them, which in turn allows using
foreach() and thence a much cleaner implementation.

Issue #58.
2012-02-03 13:15:21 -08:00
Matt Pharr
89cb809922 Short-circuit evaluation of ? : operator for varying tests.
? : now short-circuits evaluation of the expressions following
the boolean test for varying test types.  (It already did this
for uniform tests).

Issue #169.
2012-02-01 11:03:58 -08:00
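(A sketch of the kind of varying select this now handles lazily, with hypothetical names:)

    // array[index] is evaluated only in the lanes where the test passes,
    // even though 'index < count' is a varying condition.
    float x = (index < count) ? array[index] : 0.0;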
Matt Pharr
fdb4eaf437 Fix bug in &&/|| short-circuiting.
Use the full mask, not the internal mask, when checking "any lanes running"
before evaluating expressions.

Added some more tests to try to cover this case.
2012-02-01 08:17:25 -08:00
Matt Pharr
0432f97555 Fix build with LLVM 3.1 TOT 2012-01-31 14:10:07 -08:00
Matt Pharr
8d1631b714 Constant fold in SelectExpr::Optimize().
Resolves issue #170.
2012-01-31 12:22:11 -08:00
Matt Pharr
dac091552d Fix errors in tests for scalar target.
Issue #167.
2012-01-31 11:57:12 -08:00
Matt Pharr
ea027a95a8 Fix various places in deferred shading example that assumed programCount >= 4.
This gets deferred closer to working with the scalar target, but there are still
some issues.  (Partially in gamma correction / final clamping, it seems.)

This fix causes a ~0.5% performance degradation with e.g. the AVX target, 
though it's not clear that it's worth having a separate code path in order to
not lose this small amount of perf.

(Partially addresses issue #167)
2012-01-31 11:46:33 -08:00
Matt Pharr
f73abb05a7 Fix bug in handling scatters where all instances go to the same location.
Previously, we'd pick one lane and generate a regular store for its value.
This was the wrong thing to do, since we also should have been checking
that the mask was on (for the lane that was chosen).  This bug didn't
become evident until the scalar target was added, since many stores fall
into this case with that target.

Now, we just leave those as regular scatters.

Fixes most of the failing tests for the scalar target listed in issue #167.
2012-01-31 11:06:14 -08:00
Matt Pharr
d71c49494f Missed pass that should be skipped when pseudo memory ops are supposed to be left unchanged. 2012-01-31 11:02:23 -08:00
Matt Pharr
25665f0841 Implement NullPointerExpr::GetConstant()
Also reworked TypeCastExpr::GetConstant() to just forward the request along
and moved the code that was previously there to handle uniform->varying
smears of function pointers to FunctionSymbolExpr::GetConstant().

Fixes issue #168.
2012-01-31 09:37:39 -08:00
Matt Pharr
1eec27f890 Scalar target fixes.
Don't issue warnings about all instances writing to the same location if
there is only one program instance in the gang.

Be sure to report that all values are equal in one-element vectors in
LLVMVectorValuesAllEqual().

Issue #166.
2012-01-31 08:52:11 -08:00
Matt Pharr
950f86200b Fix examples/tasksys.cpp to compile with 32-bit targets.
(Change a cmpxchgq to a cmpxchgl.)  Note that a number of the examples
still don't work with 32-bit compilation; the reason is still TBD.
2012-01-30 15:03:54 -08:00
Matt Pharr
e19f4931d1 Short-circuit evaluation of && and || operators.
We now follow C's approach of evaluating these: we don't evaluate
the second expression in the operator if the value of the first one
determines the overall result.  Thus, these can now be used 
idiomatically like (index < limit && array[index] > 0) and such.

For varying expressions, the mask is set appropriately when evaluating
the second expression.

(For expressions that can be determined to be both simple and safe to
evaluate with the mask all off, we still evaluate both sides and compute
the logical op result directly, which saves a number of branches and tests.
However, the effect of this should never be visible to the programmer.)

Issue #4.
2012-01-30 05:58:41 -08:00
Matt Pharr
0575b1f38d Update run_tests and examples makefile for scalar target.
Fixed a number of tests that didn't handle the programCount == 1
case correctly.
2012-01-29 16:22:25 -08:00
Matt Pharr
f6cd01f7cf Windows build support for scalar target. 2012-01-29 13:48:01 -08:00
Matt Pharr
f2fbc168af Scalar target builtins bugfixes.
Typo in __max_varying_double.
Add declarations for half functions.
Use the gen_scatter macro to get the scatter functions.
2012-01-29 13:47:44 -08:00
Matt Pharr
b50f6f1730 Fix RNG seed code in stdlib for scalar target. 2012-01-29 13:46:57 -08:00
Matt Pharr
f8a7120d9c Detect division by 0 during constant folding and issue a sensible error. 2012-01-29 13:46:38 -08:00
Matt Pharr
20dbf59420 Don't lose source position when returning values of constant symbols. 2012-01-29 13:46:17 -08:00
Gabe Weisz
c67a286aa6 Add support for 1-wide scalar target.
Issue #40.
2012-01-29 06:36:07 -08:00
Matt Pharr
c96fef6bc8 Fix silly error in generic-16.h example C++ bindings. 2012-01-27 17:04:57 -08:00
Matt Pharr
bba02f87ea Improve implementations of unsigned <=, >= in sse4 intrinsics file. 2012-01-27 16:49:41 -08:00
Matt Pharr
12dc3f5c28 Fixes to c++ backend for new and delete
Don't include declarations of malloc/free in the generated code (get
the standard ones from system headers instead).

Add a cast to (uint8_t *) before calls to malloc, which C++ requires,
since proper malloc returns a void *.
2012-01-27 16:49:09 -08:00
Matt Pharr
0f01a5dcbe Handle undef values in LLVMVectorValuesAllEqual() 2012-01-27 16:48:14 -08:00
Matt Pharr
664dc3bdda Add support for "new" and "delete" to the language.
Issue #139.
2012-01-27 14:47:06 -08:00
Matt Pharr
bdba3cd97d Bugfix: add per-lane offsets when accessing varying data through a pointer! 2012-01-27 14:44:52 -08:00
Matt Pharr
d9c0f9315a Fix generic targets: half conversion functions weren't declared.
(Broken by 1867b5b31).
2012-01-27 14:44:43 -08:00
Matt Pharr
b7f17d435f Fix crash in gather/scatter optimization pass. 2012-01-27 14:44:35 -08:00
Matt Pharr
37cdc18639 Issue error instead of crashing given attempted function call through non-function.
Fixes issue #163.
2012-01-27 10:01:06 -08:00
Matt Pharr
5893a9c49d Remove incorrect assert 2012-01-27 09:14:45 -08:00
Matt Pharr
24f58fa16a Update per_lane macro to not use ID for lane number in macro expansion
This was leading to unintended consequences if WIDTH was used in macro code,
which was undesirable.
2012-01-27 09:12:13 -08:00
Matt Pharr
56ffc78fa4 Require semicolons after sync, assert, and print statements.
(Silly parser oversight.)
2012-01-27 09:12:13 -08:00
Matt Pharr
061e68bc77 Fix compiler crash from malformed program. 2012-01-27 09:12:13 -08:00
Matt Pharr
177e6312b4 Fix build with LLVM ToT (ConstantVector::getVectorElements() is gone now). 2012-01-27 09:07:58 -08:00
Matt Pharr
1acf4032c2 Merge branch 'master' of https://github.com/jduprat/ispc 2012-01-26 14:18:25 -08:00
Jean-Luc Duprat
0db752f3a2 Merge branch 'master' of github.com:jduprat/ispc 2012-01-26 13:43:15 -08:00
Jean-Luc Duprat
9c5444698e run_tests.py fixes:
- Python 3 fixes (can't use print)
- Fixes for running tests on Windows
2012-01-26 13:39:54 -08:00
Matt Pharr
65f3252760 Various fixes to test running script for Windows.
Also, removed the --valgrind option and replaced it with a more
general --wrap-exe option, which can be used both for running
Valgrind and SDE.
2012-01-26 10:56:29 -08:00
Matt Pharr
e612abe4ba Fix parsing of 64-bit integer constants on Windows.
(i.e., use the 64-bit unsigned integer parsing function,
not the 64-bit signed one.)

Fixes bug #68.
2012-01-26 10:56:28 -08:00
Jean-Luc Duprat
ee8b6ebbf6 Merge remote-tracking branch 'matt/master' 2012-01-26 10:41:13 -08:00
Jean-Luc Duprat
34352e4e0e Beefed up stdint.h on Windows so that ispc 1.1.3 compiles 2012-01-25 15:04:19 -08:00
Matt Pharr
1867b5b317 Use native float/half conversion instructions with the AVX2 target. 2012-01-24 15:33:38 -08:00
Matt Pharr
a5b7fca7e0 Extract constant offsets from gather/scatter base+offsets offset vectors.
When we're able to turn a general gather/scatter into the "base + offsets"
form, we now try to extract out any constant components of the offsets and
then pass them as a separate parameter to the gather/scatter function
implementation.

We then in turn carefully emit code for the addressing calculation so that
these constant offsets match LLVM's patterns to detect this case, such that
we get the constant offsets directly encoded in the instruction's addressing
calculation in many cases, saving arithmetic instructions to do these
calculations.

Improves performance of stencil by ~15%.  Other workloads unchanged.
2012-01-24 14:41:15 -08:00
Matt Pharr
7be2c399b1 Rename various optimization passes to have more descriptive names.
No functionality change.
2012-01-23 14:49:48 -08:00
Matt Pharr
d6337b3b22 Code cleanups in opt.cpp; no functional change 2012-01-23 14:36:32 -08:00
Matt Pharr
d2f8b0ace5 Add __clock to list of symbols to make internal from builtins. 2012-01-23 06:19:16 -08:00
Matt Pharr
d805e8b183 Add clock() function to standard library.
Also corrected the declaration of num_cores() to return a
uniform value.
2012-01-22 13:05:27 -08:00
Matt Pharr
1f0f2ec05f Include AVX2 in supported ISAs 2012-01-22 07:05:47 -08:00
Matt Pharr
91ac3b9d7c Back out WIP changes to opt.cpp that were inadvertently checked in. 2012-01-21 07:34:53 -08:00
Matt Pharr
d65bf2eb2f Doxygen number bump and release notes for 1.1.3 2012-01-20 17:04:16 -08:00
Matt Pharr
1bba9d4307 Improve atomic_swap_global() to take advantage of associativity.
We now do a single atomic hardware swap and then effectively do 
swaps between the running program instances such that the result
is the same as if they had happened to run a particular ordering
of hardware swaps themselves.

Also cleaned up __atomic_swap_uniform_* built-in implementations
to not take the mask, which they weren't using anyway.

Finishes Issue #56.
2012-01-20 10:37:33 -08:00
Matt Pharr
4388338dad Fix performance regression introduced in be0c77d556
Effectively, the patterns that detected, for a gather or scatter
in base+offsets form, that the offsets were actually a multiple of
2/4/8 were no longer working.

This change not only fixes this, but also expands the set of
patterns that are matched by this.  For example, given offsets of
the form 4*v1 + 16*v2, it identifies a scale of 4 and new offsets
of v1 + 4*v2.

This fix makes the volume renderer run 1.19x faster, and noise 1.54x
faster.
2012-01-19 17:57:59 -08:00
Matt Pharr
2fb59c90cf Fix C++ backend bug introduced in d14a2de168.
(This was causing a number of tests to fail with the generic
targets.)
2012-01-19 11:35:02 -07:00
Matt Pharr
68f6ea8def For << and >> with C++, detect when all instances are shifting by the same amount.
In this case, we now emit calls to potentially-specialized functions for the
left/right shifts that take a single integer value for the shift amount.  These
in turn can be matched to the corresponding intrinsics for the SSE target.

Issue #145.
2012-01-19 10:04:32 -07:00
Matt Pharr
3f89295d10 Update RNG code in stdlib to use -> operator where appropriate. 2012-01-19 10:02:47 -07:00
Matt Pharr
748b292e77 Improve code for uniform switches with a 'break' under varying control flow.
Previously, when we had a switch statement with a uniform switch condition
but a 'break' statement that was under varying control flow inside the
switch, we'd promote the switch condition to be varying so that the
break would work correctly.

Now, we leave the condition as uniform and are thus able to use the
more-efficient LLVM switch instruction in this case.

Issue #156.
2012-01-19 08:41:19 -07:00
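(A sketch of the pattern in question, with hypothetical names:)

    void f(uniform int mode, float v) {
        switch (mode) {    // uniform condition: now stays an LLVM switch
        case 0:
            if (v < 0)     // varying control flow around the 'break'
                break;     // previously forced the condition to be promoted to varying
            v *= 2;
            break;
        default:
            v = 0;
        }
    }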
Matt Pharr
6451c3d99d Fix bug with code for initializers for static arrays in generated C++ code.
(This was preventing aobench from compiling successfully with the generic
target.)
2012-01-18 16:55:09 -07:00
Matt Pharr
d14a2de168 Fix generic code emission when building with LLVM3.0/2.9.
Specifically, don't use vector select for masked store blend there,
but emit calls to undefined __masked_store_blend_*() functions.

Added implementations of these functions to the sse4.h and generic-16.h
in examples/intrinsics.  (Calls to these will never be generated with
LLVM 3.1).
2012-01-17 23:42:22 -07:00
Matt Pharr
642150095d Include LLVM version used to build in version info printed out. 2012-01-17 23:42:22 -07:00
Matt Pharr
3bf3ac7922 Be more conservative about using blending in place of masked store.
More specifically, we do a proper masked store (rather than a load-
blend-store) unless we can determine that we're accessing a stack-allocated
"varying" variable.  This fixes a number of nefarious bugs where given
code like:

    uniform float a[21];
    foreach (i = 0 ... 21)
        a[i] = 0;

We'd use a blend and in turn read past the end of a[] in the last
iteration.

Also made slight changes to inlining in aobench; with this change, they
keep compiles to ~5s, versus ~45s without them.

Fixes issue #160.
2012-01-17 23:42:22 -07:00
Matt Pharr
c6d1cebad4 Update masked_load/store implementations for generic targets to take void *s
(Fixes compile errors when we try to actually use these!)
2012-01-17 23:42:22 -07:00
Matt Pharr
08189ce08c Update "inline" qualifiers in a few examples. 2012-01-17 23:42:22 -07:00
Matt Pharr
7013d7d52f Small documentation updates and cleanups 2012-01-17 23:42:21 -07:00
Matt Pharr
7045b76f84 Improvements to code generation for "foreach"
Specialize the code for the innermost loop to not do any masking
computations for the innermost dimension for the iterations where
we are certainly working on a full vector's worth of data.

This fix improves performance/code quality of "foreach" such that
it's essentially the same as the equivalent "for" loop.

Fixes issue #151.
2012-01-17 11:34:00 -08:00
Matt Pharr
58a0b4a20d Add separate set of builtins for AVX2.
(i.e., stop just reusing the ones for AVX1).

For now the only difference is that the int/uint min/max
functions call the new intrinsic for that.  Once gather is
available from LLVM, that will go here as well.
2012-01-13 14:40:01 -08:00
Matt Pharr
0f8eee9809 Fix cases in optimization code to not inadvertently match calls to func ptrs.
If we call a function pointer, CallInst::getCalledFunction() returns NULL; we
need to be careful about this case when we're matching various function calls
in optimization passes.

(Fixes a crash.)
2012-01-12 10:33:06 -08:00
Matt Pharr
0740299860 Fix switch test 2012-01-12 09:45:31 -08:00
Matt Pharr
652215861e Update dynamic target dispatch code to support AVX2. 2012-01-12 08:37:18 -08:00
Matt Pharr
602209e5a8 Tiny updates to documentation, comment for switch stuff. 2012-01-12 05:55:42 -08:00
Matt Pharr
b60f8b4f70 Fix merge conflicts 2012-01-11 17:13:51 -08:00
Jean-Luc Duprat
f2b99ccb08 Made run_tests.py executable 2012-01-11 10:06:41 -08:00
Matt Pharr
b67446d998 Add support for "switch" statements.
Switches with both uniform and varying "switch" expressions are
supported.  Switch statements with varying expressions and very
large numbers of labels may not perform well; some issues to be
filed shortly will track opportunities for improving these.
2012-01-11 09:16:31 -08:00
Matt Pharr
9670ab0887 Add missing cases to watch out for in lCheckAllOffSafety()
Previously, we weren't checking for member expressions that dereferenced
a pointer or pointer dereference expressions--only array indexing!
2012-01-11 09:16:31 -08:00
Matt Pharr
0223bb85ee Fix bug in StmtList::EmitCode()
Previously, we would return immediately if the current basic block
was NULL; however, this is the wrong thing to do in that goto labels
and case/default labels in switch statements will establish a new
current basic block even if the current one is NULL.
2012-01-11 09:14:39 -08:00
Jean-Luc Duprat
fd81255db1 Removed mutex support for OSX 10.5
Allow running from the build directory even if it is not on the path
Properly decode subprocess stdout/stderr as UTF-8
Added newlines that were mistakenly left out of the print->sys.stdout.write() conversion in the previous CL
Python 3:
 - fixed error message comparison
 - explicit list creation
Windows:
 - forward/back slash annoyances
 - added stdint.h with definitions for int32_t, int64_t
 - compile_error_files and run_error_files were being appended to improperly
2012-01-10 16:55:00 -08:00
Matt Pharr
8a8e1a7f73 Fix bug with multiple EmitCode() calls due to missing braces.
In short, we were inadvertently trying to emit each function's
code a second time if the function had a mask check at the start
of it.  StmtList::EmitCode() was covering this error up by
not emitting code if the current basic block is NULL.
2012-01-10 16:50:13 -08:00
Jean-Luc Duprat
ef05fbf424 run_tests.py more compatible with python 3.x
except for the mutex class...
2012-01-10 13:12:38 -08:00
Jean-Luc Duprat
fa01b63fa5 Remove assumption that . is in the PATH in run_tests.py 2012-01-10 11:41:08 -08:00
Jean-Luc Duprat
63d3d25030 Fixed off by one error in array size generated by bitcode2cpp.py 2012-01-10 11:22:13 -08:00
Jean-Luc Duprat
a8db866228 Python build compatible on both python 2 and 3 2012-01-10 10:42:15 -08:00
Jean-Luc Duprat
0519eea951 Makefile does not hardcode link paths on Linux
Link statically for both x86 and x86-64
2012-01-10 10:34:57 -08:00
Jean-Luc Duprat
5d67252ed0 Python scripts now compatible with both 2.x and 3.x releases of python 2012-01-09 13:56:05 -08:00
Jean-Luc Duprat
59f4c9985e Python files compatible with python 3 2012-01-06 16:56:09 -08:00
1031 changed files with 192298 additions and 17225 deletions

.gitignore

@@ -1,8 +1,31 @@
*.pyc
*~
tags
depend
ispc
ispc_test
ispc_ref
llvm/
objs
docs/doxygen
docs/ispc.html
docs/*.html
tests*/*cpp
tests*/*run
tests*/*.o
tests_ispcpp/*.h
tests_ispcpp/*pre*
logs/
notify_log.log
alloy_results_*
examples/*/*.png
examples/*/*.ppm
examples/*/objs/*
examples/*/ref
examples/*/test
*.swp
check_isa.exe
.vscode
configure
ispc.dSYM


@@ -1,4 +1,4 @@
Copyright (c) 2010-2011, Intel Corporation
Copyright (c) 2010-2016, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -77,7 +77,7 @@ covered by the following license:
University of Illinois/NCSA
Open Source License
Copyright (c) 2003-2010 University of Illinois at Urbana-Champaign.
Copyright (c) 2003-2014 University of Illinois at Urbana-Champaign.
All rights reserved.
Developed by:
@@ -141,3 +141,46 @@ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
---------------------------------------------------------------------------
The ptxtools use parts of the PTX parser code from GPU Ocelot project
(https://code.google.com/p/gpuocelot/), which is covered by the following
license:
Copyright 2011
GEORGIA TECH RESEARCH CORPORATION
ALL RIGHTS RESERVED
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimers.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimers in the
documentation and/or other materials provided with the
distribution.
* Neither the name of GEORGIA TECH RESEARCH CORPORATION nor the
names of its contributors may be used to endorse or promote
products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY GEORGIA TECH RESEARCH CORPORATION ''AS IS''
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL GEORGIA TECH RESEARCH
CORPORATION BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You agree that the Software will not be shipped, transferred, exported,
or re-exported directly into any country prohibited by the United States
Export Administration Act and the regulations thereunder nor will be
used for any purpose prohibited by the Act.

Makefile

@@ -1,7 +1,86 @@
#
# Copyright (c) 2010-2016, Intel Corporation
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# ispc Makefile
#
define newline
endef
define WARNING_BODY
============================== !!! WARNING !!! =============================== \n
The location of the LLVM files in your PATH differs from the path in the LLVM_HOME \n
variable (or LLVM_HOME is not set). Most likely this means that you are \n
using the default LLVM installation on your system, which is a very bad sign. \n
Note that ISPC uses the LLVM optimizer and is highly dependent on it. We recommend \n
using a *patched* version of LLVM 3.8. Patches are available in the \n
llvm_patches folder. You can build LLVM manually, or run our scripts, which \n
will do all the work for you. Do the following: \n
1. Create a folder where LLVM will reside and set the LLVM_HOME variable to its \n
path. \n
2. Set the ISPC_HOME variable to your ISPC location (probably the current folder). \n
3. Run the alloy.py tool to checkout and build LLVM: \n
alloy.py -b --version=3.8 \n
4. Add the $$LLVM_HOME/bin-3.8/bin path to your PATH. \n
==============================================================================
endef
# If you have your own special version of llvm and/or clang, change
# these variables to match.
LLVM_CONFIG=$(shell which llvm-config)
CLANG_INCLUDE=$(shell $(LLVM_CONFIG) --includedir)
RIGHT_LLVM = $(WARNING_BODY)
ifdef LLVM_HOME
ifeq ($(findstring $(LLVM_HOME), $(LLVM_CONFIG)), $(LLVM_HOME))
RIGHT_LLVM = LLVM from $$LLVM_HOME is used.
endif
endif
# Enable ARM by request
# To enable: make ARM_ENABLED=1
ARM_ENABLED=0
# Disable NVPTX by request
# To enable: make NVPTX_ENABLED=1
NVPTX_ENABLED=0
# Add llvm bin to the path so any scripts run will go to the right llvm-config
LLVM_BIN= $(shell $(LLVM_CONFIG) --bindir)
export PATH:=$(LLVM_BIN):$(PATH)
ARCH_OS = $(shell uname)
ifeq ($(ARCH_OS), Darwin)
ARCH_OS2 = "OSX"
@@ -10,29 +89,51 @@ else
endif
ARCH_TYPE = $(shell arch)
ifeq ($(shell llvm-config --version), 3.1svn)
LLVM_LIBS=-lLLVMAsmParser -lLLVMInstrumentation -lLLVMLinker \
-lLLVMArchive -lLLVMBitReader -lLLVMDebugInfo -lLLVMJIT -lLLVMipo \
-lLLVMBitWriter -lLLVMTableGen -lLLVMCBackendInfo \
-lLLVMX86Disassembler -lLLVMX86CodeGen -lLLVMSelectionDAG \
-lLLVMAsmPrinter -lLLVMX86AsmParser -lLLVMX86Desc -lLLVMX86Info \
-lLLVMX86AsmPrinter -lLLVMX86Utils -lLLVMMCDisassembler -lLLVMMCParser \
-lLLVMCodeGen -lLLVMScalarOpts -lLLVMInstCombine -lLLVMTransformUtils \
-lLLVMipa -lLLVMAnalysis -lLLVMMCJIT -lLLVMRuntimeDyld \
-lLLVMExecutionEngine -lLLVMTarget -lLLVMMC -lLLVMObject -lLLVMCore \
-lLLVMSupport
else
LLVM_LIBS=$(shell llvm-config --libs)
DNDEBUG_FLAG=$(shell $(LLVM_CONFIG) --cxxflags | grep -o "\-DNDEBUG")
LLVM_CXXFLAGS=$(shell $(LLVM_CONFIG) --cppflags) $(DNDEBUG_FLAG)
LLVM_VERSION=LLVM_$(shell $(LLVM_CONFIG) --version | sed -e 's/svn//' -e 's/\./_/' -e 's/\..*//')
LLVM_VERSION_DEF=-D$(LLVM_VERSION)
LLVM_COMPONENTS = engine ipo bitreader bitwriter instrumentation linker
# Component "option" was introduced in 3.3 and starting with 3.4 it is required for the link step.
# We check if it's available before adding it (to not break 3.2 and earlier).
ifeq ($(shell $(LLVM_CONFIG) --components |grep -c option), 1)
LLVM_COMPONENTS+=option
endif
ifneq ($(ARM_ENABLED), 0)
LLVM_COMPONENTS+=arm
endif
ifneq ($(NVPTX_ENABLED), 0)
LLVM_COMPONENTS+=nvptx
endif
LLVM_LIBS=$(shell $(LLVM_CONFIG) --libs $(LLVM_COMPONENTS))
CLANG=clang
CLANG_LIBS = -lclangFrontend -lclangDriver \
-lclangSerialization -lclangParse -lclangSema \
-lclangAnalysis -lclangAST -lclangLex -lclangBasic
-lclangAnalysis -lclangAST -lclangBasic \
-lclangEdit -lclangLex
ISPC_LIBS=$(shell llvm-config --ldflags) $(CLANG_LIBS) $(LLVM_LIBS) \
ISPC_LIBS=$(shell $(LLVM_CONFIG) --ldflags) $(CLANG_LIBS) $(LLVM_LIBS) \
-lpthread
ifeq ($(LLVM_VERSION),LLVM_3_4)
ISPC_LIBS += -lcurses
endif
# There is no logical OR in GNU make.
# This 'ifneq' acts like if( !($(LLVM_VERSION) == LLVM_3_2 || $(LLVM_VERSION) == LLVM_3_3 || $(LLVM_VERSION) == LLVM_3_4))
ifeq (,$(filter $(LLVM_VERSION), LLVM_3_2 LLVM_3_3 LLVM_3_4))
ISPC_LIBS += -lcurses -lz
# This is here because llvm-config fails to report the dependency on the tinfo library in some cases.
# This is described in LLVM bug 16902.
ifeq ($(ARCH_OS),Linux)
ifneq ($(shell ldconfig -p |grep -c tinfo), 0)
ISPC_LIBS += -ltinfo
endif
endif
endif
ifeq ($(ARCH_OS),Linux)
ISPC_LIBS += -ldl
endif
@@ -41,28 +142,52 @@ ifeq ($(ARCH_OS2),Msys)
ISPC_LIBS += -lshlwapi -limagehlp -lpsapi
endif
LLVM_CXXFLAGS=$(shell llvm-config --cppflags)
LLVM_VERSION=LLVM_$(shell llvm-config --version | sed s/\\./_/)
LLVM_VERSION_DEF=-D$(LLVM_VERSION)
# Define build time stamp and revision.
# For revision we use GIT or SVN info.
BUILD_DATE=$(shell date +%Y%m%d)
BUILD_VERSION=$(shell git log --abbrev-commit --abbrev=16 | head -1)
GIT_REVISION:=$(shell git log --abbrev-commit --abbrev=16 2>/dev/null | head -1)
ifeq (${GIT_REVISION},)
SVN_REVISION:=$(shell svn log -l 1 2>/dev/null | grep -o \^r[[:digit:]]\* )
ifeq (${SVN_REVISION},)
# Failed to get revision info
BUILD_VERSION:="no_version_info"
else
# SVN revision info
BUILD_VERSION:=$(SVN_REVISION)
endif
else
# GIT revision info
BUILD_VERSION:=$(GIT_REVISION)
endif
CXX=g++
CPP=cpp
OPT=-g3
CXXFLAGS=$(OPT) $(LLVM_CXXFLAGS) -I. -Iobjs/ -Wall $(LLVM_VERSION_DEF) \
-DBUILD_DATE="\"$(BUILD_DATE)\"" -DBUILD_VERSION="\"$(BUILD_VERSION)\""
CXX=clang++
OPT=-O2
CXXFLAGS=$(OPT) $(LLVM_CXXFLAGS) -I. -Iobjs/ -I$(CLANG_INCLUDE) \
$(LLVM_VERSION_DEF) \
-Wall \
-DBUILD_DATE="\"$(BUILD_DATE)\"" -DBUILD_VERSION="\"$(BUILD_VERSION)\"" \
-Wno-sign-compare -Wno-unused-function -Werror
# if( !($(LLVM_VERSION) == LLVM_3_2 || $(LLVM_VERSION) == LLVM_3_3 || $(LLVM_VERSION) == LLVM_3_4))
ifeq (,$(filter $(LLVM_VERSION), LLVM_3_2 LLVM_3_3 LLVM_3_4))
CXXFLAGS+=-std=c++11 -Wno-c99-extensions -Wno-deprecated-register -fno-rtti
endif
ifneq ($(ARM_ENABLED), 0)
CXXFLAGS+=-DISPC_ARM_ENABLED
endif
ifneq ($(NVPTX_ENABLED), 0)
CXXFLAGS+=-DISPC_NVPTX_ENABLED
endif
LDFLAGS=
ifeq ($(ARCH_OS),Linux)
# try to link everything statically under Linux (including libstdc++) so
# that the binaries we generate will be portable across distributions...
ifeq ($(ARCH_TYPE),x86_64)
LDFLAGS=-static -L/usr/lib/gcc/x86_64-linux-gnu/4.4
else
LDFLAGS=-L/usr/lib/gcc/i686-redhat-linux/4.6.0
endif
# LDFLAGS=-static
# Linking everything statically isn't easy (too many things are required),
# but linking libstdc++ and libgcc is necessary when building with relatively
# new gcc, when going to distribute to old systems.
# LDFLAGS=-static-libgcc -static-libstdc++
endif
LEX=flex
@@ -75,26 +200,39 @@ CXX_SRC=ast.cpp builtins.cpp cbackend.cpp ctx.cpp decl.cpp expr.cpp func.cpp \
type.cpp util.cpp
HEADERS=ast.h builtins.h ctx.h decl.h expr.h func.h ispc.h llvmutil.h module.h \
opt.h stmt.h sym.h type.h util.h
TARGETS=avx avx-x2 sse2 sse2-x2 sse4 sse4-x2 generic-4 generic-8 generic-16
BUILTINS_SRC=$(addprefix builtins/target-, $(addsuffix .ll, $(TARGETS))) \
builtins/dispatch.ll
BUILTINS_OBJS=$(addprefix builtins-, $(notdir $(BUILTINS_SRC:.ll=.o))) \
builtins-c-32.cpp builtins-c-64.cpp
TARGETS=avx2-i64x4 avx11-i64x4 avx1-i64x4 avx1 avx1-x2 avx11 avx11-x2 avx2 avx2-x2 \
sse2 sse2-x2 sse4-8 sse4-16 sse4 sse4-x2 \
generic-4 generic-8 generic-16 generic-32 generic-64 generic-1 knl skx
ifneq ($(ARM_ENABLED), 0)
TARGETS+=neon-32 neon-16 neon-8
endif
ifneq ($(NVPTX_ENABLED), 0)
TARGETS+=nvptx
endif
# These files need to be compiled in two versions - 32 and 64 bits.
BUILTINS_SRC_TARGET=$(addprefix builtins/target-, $(addsuffix .ll, $(TARGETS)))
# These are files to be compiled in single version.
BUILTINS_SRC_COMMON=builtins/dispatch.ll
BUILTINS_OBJS_32=$(addprefix builtins-, $(notdir $(BUILTINS_SRC_TARGET:.ll=-32bit.o)))
BUILTINS_OBJS_64=$(addprefix builtins-, $(notdir $(BUILTINS_SRC_TARGET:.ll=-64bit.o)))
BUILTINS_OBJS=$(addprefix builtins-, $(notdir $(BUILTINS_SRC_COMMON:.ll=.o))) \
$(BUILTINS_OBJS_32) $(BUILTINS_OBJS_64) \
builtins-c-32.cpp builtins-c-64.cpp
BISON_SRC=parse.yy
FLEX_SRC=lex.ll
OBJS=$(addprefix objs/, $(CXX_SRC:.cpp=.o) $(BUILTINS_OBJS) \
stdlib_generic_ispc.o stdlib_x86_ispc.o \
stdlib_mask1_ispc.o stdlib_mask8_ispc.o stdlib_mask16_ispc.o stdlib_mask32_ispc.o stdlib_mask64_ispc.o \
$(BISON_SRC:.yy=.o) $(FLEX_SRC:.ll=.o))
default: ispc
.PHONY: dirs clean depend doxygen print_llvm_src
.PHONY: dirs clean depend doxygen print_llvm_src llvm_check
.PRECIOUS: objs/builtins-%.cpp
depend: $(CXX_SRC) $(HEADERS)
depend: llvm_check $(CXX_SRC) $(HEADERS)
@echo Updating dependencies
@gcc -MM $(CXXFLAGS) $(CXX_SRC) | sed 's_^\([a-z]\)_objs/\1_g' > depend
@$(CXX) -MM $(CXXFLAGS) $(CXX_SRC) | sed 's_^\([a-z]\)_objs/\1_g' > depend
-include depend
@@ -102,8 +240,18 @@ dirs:
@echo Creating objs/ directory
@/bin/mkdir -p objs
print_llvm_src:
llvm_check:
@llvm-config --version > /dev/null || \
(echo; \
echo "******************************************"; \
echo "ERROR: llvm-config not found in your PATH"; \
echo "******************************************"; \
echo; exit 1)
@echo -e '$(subst $(newline), ,$(RIGHT_LLVM))'
print_llvm_src: llvm_check
@echo Using LLVM `llvm-config --version` from `llvm-config --libdir`
@echo Using compiler to build: `$(CXX) --version | head -1`
clean:
/bin/rm -rf objs ispc
@@ -114,7 +262,25 @@ doxygen:
ispc: print_llvm_src dirs $(OBJS)
@echo Creating ispc executable
@$(CXX) $(LDFLAGS) -o $@ $(OBJS) $(ISPC_LIBS)
@$(CXX) $(OPT) $(LDFLAGS) -o $@ $(OBJS) $(ISPC_LIBS)
# Use clang as the default compiler, instead of gcc.
# This is the default now.
clang: ispc
clang: CXX=clang++
# Use gcc as a default compiler
gcc: ispc
gcc: CXX=g++
# Build ispc with address sanitizer instrumentation using clang compiler
# Note that this is not a portable build
asan: clang
asan: OPT+=-fsanitize=address
# Do debug build, i.e. -O0 -g
debug: ispc
debug: OPT=-O0 -g
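# Usage sketch (invocations assumed; targets defined above):
#   make          : default clang++ build at -O2
#   make debug    : rebuild at -O0 -g
#   make asan     : clang build with -fsanitize=address instrumentation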
objs/%.o: %.cpp
@echo Compiling $<
@@ -124,6 +290,10 @@ objs/cbackend.o: cbackend.cpp
@echo Compiling $<
@$(CXX) -fno-rtti -fno-exceptions $(CXXFLAGS) -o $@ -c $<
objs/opt.o: opt.cpp
@echo Compiling $<
@$(CXX) -fno-rtti $(CXXFLAGS) -o $@ -c $<
objs/%.o: objs/%.cpp
@echo Compiling $<
@$(CXX) $(CXXFLAGS) -o $@ -c $<
@@ -144,24 +314,47 @@ objs/lex.o: objs/lex.cpp $(HEADERS) objs/parse.cc
@echo Compiling $<
@$(CXX) $(CXXFLAGS) -o $@ -c $<
objs/builtins-%.cpp: builtins/%.ll builtins/util.m4 $(wildcard builtins/*common.ll)
objs/builtins-dispatch.cpp: builtins/dispatch.ll builtins/util.m4 builtins/util-nvptx.m4 builtins/svml.m4 $(wildcard builtins/*common.ll)
@echo Creating C++ source from builtins definition file $<
@m4 -Ibuiltins/ -DLLVM_VERSION=$(LLVM_VERSION) $< | python bitcode2cpp.py $< > $@
@m4 -Ibuiltins/ -DLLVM_VERSION=$(LLVM_VERSION) -DBUILD_OS=UNIX $< | python bitcode2cpp.py $< > $@
objs/builtins-%-32bit.cpp: builtins/%.ll builtins/util.m4 builtins/util-nvptx.m4 builtins/svml.m4 $(wildcard builtins/*common.ll)
@echo Creating C++ source from builtins definition file $< \(32 bit version\)
@m4 -Ibuiltins/ -DLLVM_VERSION=$(LLVM_VERSION) -DBUILD_OS=UNIX -DRUNTIME=32 $< | python bitcode2cpp.py $< 32bit > $@
objs/builtins-%-64bit.cpp: builtins/%.ll builtins/util.m4 builtins/util-nvptx.m4 builtins/svml.m4 $(wildcard builtins/*common.ll)
@echo Creating C++ source from builtins definition file $< \(64 bit version\)
@m4 -Ibuiltins/ -DLLVM_VERSION=$(LLVM_VERSION) -DBUILD_OS=UNIX -DRUNTIME=64 $< | python bitcode2cpp.py $< 64bit > $@
objs/builtins-c-32.cpp: builtins/builtins.c
@echo Creating C++ source from builtins definition file $<
@$(CLANG) -m32 -emit-llvm -c $< -o - | llvm-dis - | python bitcode2cpp.py c-32 > $@
@$(CLANG) -m32 -emit-llvm -c $< -o - | llvm-dis - | python bitcode2cpp.py c 32 > $@
objs/builtins-c-64.cpp: builtins/builtins.c
@echo Creating C++ source from builtins definition file $<
@$(CLANG) -m64 -emit-llvm -c $< -o - | llvm-dis - | python bitcode2cpp.py c-64 > $@
@$(CLANG) -m64 -emit-llvm -c $< -o - | llvm-dis - | python bitcode2cpp.py c 64 > $@
objs/stdlib_generic_ispc.cpp: stdlib.ispc
@echo Creating C++ source from $< for generic
@$(CLANG) -E -x c -DISPC_TARGET_GENERIC=1 -DISPC=1 -DPI=3.1415926536 $< -o - | \
python stdlib2cpp.py generic > $@
objs/stdlib_mask1_ispc.cpp: stdlib.ispc
@echo Creating C++ source from $< for mask1
@$(CLANG) -E -x c -DISPC_MASK_BITS=1 -DISPC=1 -DPI=3.14159265358979 $< -o - | \
python stdlib2cpp.py mask1 > $@
objs/stdlib_x86_ispc.cpp: stdlib.ispc
@echo Creating C++ source from $< for x86
@$(CLANG) -E -x c -DISPC=1 -DPI=3.1415926536 $< -o - | \
python stdlib2cpp.py x86 > $@
objs/stdlib_mask8_ispc.cpp: stdlib.ispc
@echo Creating C++ source from $< for mask8
@$(CLANG) -E -x c -DISPC_MASK_BITS=8 -DISPC=1 -DPI=3.14159265358979 $< -o - | \
python stdlib2cpp.py mask8 > $@
objs/stdlib_mask16_ispc.cpp: stdlib.ispc
@echo Creating C++ source from $< for mask16
@$(CLANG) -E -x c -DISPC_MASK_BITS=16 -DISPC=1 -DPI=3.14159265358979 $< -o - | \
python stdlib2cpp.py mask16 > $@
objs/stdlib_mask32_ispc.cpp: stdlib.ispc
@echo Creating C++ source from $< for mask32
@$(CLANG) -E -x c -DISPC_MASK_BITS=32 -DISPC=1 -DPI=3.14159265358979 $< -o - | \
python stdlib2cpp.py mask32 > $@
objs/stdlib_mask64_ispc.cpp: stdlib.ispc
@echo Creating C++ source from $< for mask64
@$(CLANG) -E -x c -DISPC_MASK_BITS=64 -DISPC=1 -DPI=3.14159265358979 $< -o - | \
python stdlib2cpp.py mask64 > $@


@@ -47,7 +47,7 @@ remarkable `LLVM Compiler Infrastructure <http://llvm.org>`_ for back-end
code generation and optimization and is `hosted on
github <http://github.com/ispc/ispc/>`_. It supports Windows, Mac, and
Linux, with both x86 and x86-64 targets. It currently supports the SSE2,
SSE4, and AVX instruction sets.
SSE4, AVX1, and AVX2 instruction sets.
Features
--------

alloy.py (file diff suppressed because it is too large)

ast.cpp

@@ -1,5 +1,5 @@
/*
Copyright (c) 2011, Intel Corporation
Copyright (c) 2011-2015, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -28,12 +28,14 @@
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file ast.cpp
@brief
*/
@brief General functionality related to abstract syntax trees and
traversal of them.
*/
#include "ast.h"
#include "expr.h"
@@ -53,10 +55,17 @@ ASTNode::~ASTNode() {
// AST
void
AST::AddFunction(Symbol *sym, const std::vector<Symbol *> &args, Stmt *code) {
AST::AddFunction(Symbol *sym, Stmt *code) {
if (sym == NULL)
return;
functions.push_back(new Function(sym, args, code));
Function *f = new Function(sym, code);
if (f->IsPolyFunction()) {
FATAL("This is a good start, but implement me!");
} else {
functions.push_back(f);
}
}
@@ -83,79 +92,105 @@ WalkAST(ASTNode *node, ASTPreCallBackFunc preFunc, ASTPostCallBackFunc postFunc,
////////////////////////////////////////////////////////////////////////////
// Handle Statements
if (dynamic_cast<Stmt *>(node) != NULL) {
if (llvm::dyn_cast<Stmt>(node) != NULL) {
ExprStmt *es;
DeclStmt *ds;
IfStmt *is;
DoStmt *dos;
ForStmt *fs;
ForeachStmt *fes;
ForeachActiveStmt *fas;
ForeachUniqueStmt *fus;
CaseStmt *cs;
DefaultStmt *defs;
SwitchStmt *ss;
ReturnStmt *rs;
LabeledStmt *ls;
StmtList *sl;
PrintStmt *ps;
AssertStmt *as;
DeleteStmt *dels;
UnmaskedStmt *ums;
if ((es = dynamic_cast<ExprStmt *>(node)) != NULL)
if ((es = llvm::dyn_cast<ExprStmt>(node)) != NULL)
es->expr = (Expr *)WalkAST(es->expr, preFunc, postFunc, data);
else if ((ds = dynamic_cast<DeclStmt *>(node)) != NULL) {
else if ((ds = llvm::dyn_cast<DeclStmt>(node)) != NULL) {
for (unsigned int i = 0; i < ds->vars.size(); ++i)
ds->vars[i].init = (Expr *)WalkAST(ds->vars[i].init, preFunc,
postFunc, data);
}
else if ((is = dynamic_cast<IfStmt *>(node)) != NULL) {
else if ((is = llvm::dyn_cast<IfStmt>(node)) != NULL) {
is->test = (Expr *)WalkAST(is->test, preFunc, postFunc, data);
is->trueStmts = (Stmt *)WalkAST(is->trueStmts, preFunc,
postFunc, data);
is->falseStmts = (Stmt *)WalkAST(is->falseStmts, preFunc,
postFunc, data);
}
else if ((dos = dynamic_cast<DoStmt *>(node)) != NULL) {
dos->testExpr = (Expr *)WalkAST(dos->testExpr, preFunc,
else if ((dos = llvm::dyn_cast<DoStmt>(node)) != NULL) {
dos->testExpr = (Expr *)WalkAST(dos->testExpr, preFunc,
postFunc, data);
dos->bodyStmts = (Stmt *)WalkAST(dos->bodyStmts, preFunc,
postFunc, data);
}
else if ((fs = dynamic_cast<ForStmt *>(node)) != NULL) {
else if ((fs = llvm::dyn_cast<ForStmt>(node)) != NULL) {
fs->init = (Stmt *)WalkAST(fs->init, preFunc, postFunc, data);
fs->test = (Expr *)WalkAST(fs->test, preFunc, postFunc, data);
fs->step = (Stmt *)WalkAST(fs->step, preFunc, postFunc, data);
fs->stmts = (Stmt *)WalkAST(fs->stmts, preFunc, postFunc, data);
}
else if ((fes = dynamic_cast<ForeachStmt *>(node)) != NULL) {
else if ((fes = llvm::dyn_cast<ForeachStmt>(node)) != NULL) {
for (unsigned int i = 0; i < fes->startExprs.size(); ++i)
fes->startExprs[i] = (Expr *)WalkAST(fes->startExprs[i], preFunc,
postFunc, data);
for (unsigned int i = 0; i < fes->endExprs.size(); ++i)
fes->endExprs[i] = (Expr *)WalkAST(fes->endExprs[i], preFunc,
postFunc, data);
fes->stmts = (Stmt *)WalkAST(fes->stmts, preFunc, postFunc, data);
}
else if (dynamic_cast<BreakStmt *>(node) != NULL ||
dynamic_cast<ContinueStmt *>(node) != NULL ||
dynamic_cast<GotoStmt *>(node) != NULL) {
else if ((fas = llvm::dyn_cast<ForeachActiveStmt>(node)) != NULL) {
fas->stmts = (Stmt *)WalkAST(fas->stmts, preFunc, postFunc, data);
}
else if ((fus = llvm::dyn_cast<ForeachUniqueStmt>(node)) != NULL) {
fus->expr = (Expr *)WalkAST(fus->expr, preFunc, postFunc, data);
fus->stmts = (Stmt *)WalkAST(fus->stmts, preFunc, postFunc, data);
}
else if ((cs = llvm::dyn_cast<CaseStmt>(node)) != NULL)
cs->stmts = (Stmt *)WalkAST(cs->stmts, preFunc, postFunc, data);
else if ((defs = llvm::dyn_cast<DefaultStmt>(node)) != NULL)
defs->stmts = (Stmt *)WalkAST(defs->stmts, preFunc, postFunc, data);
else if ((ss = llvm::dyn_cast<SwitchStmt>(node)) != NULL) {
ss->expr = (Expr *)WalkAST(ss->expr, preFunc, postFunc, data);
ss->stmts = (Stmt *)WalkAST(ss->stmts, preFunc, postFunc, data);
}
else if (llvm::dyn_cast<BreakStmt>(node) != NULL ||
llvm::dyn_cast<ContinueStmt>(node) != NULL ||
llvm::dyn_cast<GotoStmt>(node) != NULL) {
// nothing
}
else if ((ls = dynamic_cast<LabeledStmt *>(node)) != NULL)
else if ((ls = llvm::dyn_cast<LabeledStmt>(node)) != NULL)
ls->stmt = (Stmt *)WalkAST(ls->stmt, preFunc, postFunc, data);
else if ((rs = dynamic_cast<ReturnStmt *>(node)) != NULL)
rs->val = (Expr *)WalkAST(rs->val, preFunc, postFunc, data);
else if ((sl = dynamic_cast<StmtList *>(node)) != NULL) {
else if ((rs = llvm::dyn_cast<ReturnStmt>(node)) != NULL)
rs->expr = (Expr *)WalkAST(rs->expr, preFunc, postFunc, data);
else if ((sl = llvm::dyn_cast<StmtList>(node)) != NULL) {
std::vector<Stmt *> &sls = sl->stmts;
for (unsigned int i = 0; i < sls.size(); ++i)
sls[i] = (Stmt *)WalkAST(sls[i], preFunc, postFunc, data);
}
else if ((ps = dynamic_cast<PrintStmt *>(node)) != NULL)
else if ((ps = llvm::dyn_cast<PrintStmt>(node)) != NULL)
ps->values = (Expr *)WalkAST(ps->values, preFunc, postFunc, data);
else if ((as = dynamic_cast<AssertStmt *>(node)) != NULL)
else if ((as = llvm::dyn_cast<AssertStmt>(node)) != NULL)
as->expr = (Expr *)WalkAST(as->expr, preFunc, postFunc, data);
else if ((dels = llvm::dyn_cast<DeleteStmt>(node)) != NULL)
dels->expr = (Expr *)WalkAST(dels->expr, preFunc, postFunc, data);
else if ((ums = llvm::dyn_cast<UnmaskedStmt>(node)) != NULL)
ums->stmts = (Stmt *)WalkAST(ums->stmts, preFunc, postFunc, data);
else
FATAL("Unhandled statement type in WalkAST()");
}
else {
///////////////////////////////////////////////////////////////////////////
// Handle expressions
Assert(dynamic_cast<Expr *>(node) != NULL);
Assert(llvm::dyn_cast<Expr>(node) != NULL);
UnaryExpr *ue;
BinaryExpr *be;
AssignExpr *ae;
@@ -166,60 +201,73 @@ WalkAST(ASTNode *node, ASTPreCallBackFunc preFunc, ASTPostCallBackFunc postFunc,
MemberExpr *me;
TypeCastExpr *tce;
ReferenceExpr *re;
DereferenceExpr *dre;
PtrDerefExpr *ptrderef;
RefDerefExpr *refderef;
SizeOfExpr *soe;
AddressOfExpr *aoe;
NewExpr *newe;
if ((ue = dynamic_cast<UnaryExpr *>(node)) != NULL)
if ((ue = llvm::dyn_cast<UnaryExpr>(node)) != NULL)
ue->expr = (Expr *)WalkAST(ue->expr, preFunc, postFunc, data);
else if ((be = dynamic_cast<BinaryExpr *>(node)) != NULL) {
else if ((be = llvm::dyn_cast<BinaryExpr>(node)) != NULL) {
be->arg0 = (Expr *)WalkAST(be->arg0, preFunc, postFunc, data);
be->arg1 = (Expr *)WalkAST(be->arg1, preFunc, postFunc, data);
}
else if ((ae = dynamic_cast<AssignExpr *>(node)) != NULL) {
else if ((ae = llvm::dyn_cast<AssignExpr>(node)) != NULL) {
ae->lvalue = (Expr *)WalkAST(ae->lvalue, preFunc, postFunc, data);
ae->rvalue = (Expr *)WalkAST(ae->rvalue, preFunc, postFunc, data);
}
else if ((se = dynamic_cast<SelectExpr *>(node)) != NULL) {
else if ((se = llvm::dyn_cast<SelectExpr>(node)) != NULL) {
se->test = (Expr *)WalkAST(se->test, preFunc, postFunc, data);
se->expr1 = (Expr *)WalkAST(se->expr1, preFunc, postFunc, data);
se->expr2 = (Expr *)WalkAST(se->expr2, preFunc, postFunc, data);
}
else if ((el = dynamic_cast<ExprList *>(node)) != NULL) {
else if ((el = llvm::dyn_cast<ExprList>(node)) != NULL) {
for (unsigned int i = 0; i < el->exprs.size(); ++i)
el->exprs[i] = (Expr *)WalkAST(el->exprs[i], preFunc,
postFunc, data);
}
else if ((fce = dynamic_cast<FunctionCallExpr *>(node)) != NULL) {
else if ((fce = llvm::dyn_cast<FunctionCallExpr>(node)) != NULL) {
fce->func = (Expr *)WalkAST(fce->func, preFunc, postFunc, data);
fce->args = (ExprList *)WalkAST(fce->args, preFunc, postFunc, data);
fce->launchCountExpr = (Expr *)WalkAST(fce->launchCountExpr, preFunc,
for (int k = 0; k < 3; k++)
fce->launchCountExpr[k] = (Expr *)WalkAST(fce->launchCountExpr[k], preFunc,
postFunc, data);
}
else if ((ie = dynamic_cast<IndexExpr *>(node)) != NULL) {
else if ((ie = llvm::dyn_cast<IndexExpr>(node)) != NULL) {
ie->baseExpr = (Expr *)WalkAST(ie->baseExpr, preFunc, postFunc, data);
ie->index = (Expr *)WalkAST(ie->index, preFunc, postFunc, data);
}
else if ((me = dynamic_cast<MemberExpr *>(node)) != NULL)
else if ((me = llvm::dyn_cast<MemberExpr>(node)) != NULL)
me->expr = (Expr *)WalkAST(me->expr, preFunc, postFunc, data);
else if ((tce = dynamic_cast<TypeCastExpr *>(node)) != NULL)
else if ((tce = llvm::dyn_cast<TypeCastExpr>(node)) != NULL)
tce->expr = (Expr *)WalkAST(tce->expr, preFunc, postFunc, data);
else if ((re = dynamic_cast<ReferenceExpr *>(node)) != NULL)
else if ((re = llvm::dyn_cast<ReferenceExpr>(node)) != NULL)
re->expr = (Expr *)WalkAST(re->expr, preFunc, postFunc, data);
else if ((dre = dynamic_cast<DereferenceExpr *>(node)) != NULL)
dre->expr = (Expr *)WalkAST(dre->expr, preFunc, postFunc, data);
else if ((soe = dynamic_cast<SizeOfExpr *>(node)) != NULL)
else if ((ptrderef = llvm::dyn_cast<PtrDerefExpr>(node)) != NULL)
ptrderef->expr = (Expr *)WalkAST(ptrderef->expr, preFunc, postFunc,
data);
else if ((refderef = llvm::dyn_cast<RefDerefExpr>(node)) != NULL)
refderef->expr = (Expr *)WalkAST(refderef->expr, preFunc, postFunc,
data);
else if ((soe = llvm::dyn_cast<SizeOfExpr>(node)) != NULL)
soe->expr = (Expr *)WalkAST(soe->expr, preFunc, postFunc, data);
else if ((aoe = dynamic_cast<AddressOfExpr *>(node)) != NULL)
else if ((aoe = llvm::dyn_cast<AddressOfExpr>(node)) != NULL)
aoe->expr = (Expr *)WalkAST(aoe->expr, preFunc, postFunc, data);
else if (dynamic_cast<SymbolExpr *>(node) != NULL ||
dynamic_cast<ConstExpr *>(node) != NULL ||
dynamic_cast<FunctionSymbolExpr *>(node) != NULL ||
dynamic_cast<SyncExpr *>(node) != NULL ||
dynamic_cast<NullPointerExpr *>(node) != NULL) {
// nothing to do
else if ((newe = llvm::dyn_cast<NewExpr>(node)) != NULL) {
newe->countExpr = (Expr *)WalkAST(newe->countExpr, preFunc,
postFunc, data);
newe->initExpr = (Expr *)WalkAST(newe->initExpr, preFunc,
postFunc, data);
}
else
else if (llvm::dyn_cast<SymbolExpr>(node) != NULL ||
llvm::dyn_cast<ConstExpr>(node) != NULL ||
llvm::dyn_cast<FunctionSymbolExpr>(node) != NULL ||
llvm::dyn_cast<SyncExpr>(node) != NULL ||
llvm::dyn_cast<NullPointerExpr>(node) != NULL) {
// nothing to do
}
else
FATAL("Unhandled expression type in WalkAST().");
}
@@ -279,18 +327,191 @@ TypeCheck(Stmt *stmt) {
}
struct CostData {
CostData() { cost = foreachDepth = 0; }
int cost;
int foreachDepth;
};
static bool
lCostCallback(ASTNode *node, void *c) {
int *cost = (int *)c;
*cost += node->EstimateCost();
lCostCallbackPre(ASTNode *node, void *d) {
CostData *data = (CostData *)d;
if (llvm::dyn_cast<ForeachStmt>(node) != NULL)
++data->foreachDepth;
if (data->foreachDepth == 0)
data->cost += node->EstimateCost();
return true;
}
static ASTNode *
lCostCallbackPost(ASTNode *node, void *d) {
CostData *data = (CostData *)d;
if (llvm::dyn_cast<ForeachStmt>(node) != NULL)
--data->foreachDepth;
return node;
}
int
EstimateCost(ASTNode *root) {
int cost = 0;
WalkAST(root, lCostCallback, NULL, &cost);
return cost;
CostData data;
WalkAST(root, lCostCallbackPre, lCostCallbackPost, &data);
return data.cost;
}
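As a hedged illustration of the pre/post pairing above (toy types, not ispc's real classes): the pre-callback increments foreachDepth on entering a ForeachStmt, only nodes at depth zero contribute to the total, and the post-callback undoes the increment on the way back up.
// Minimal self-contained sketch of the same walk; Node stands in for
// ASTNode and isForeach for the llvm::dyn_cast<ForeachStmt> check.
struct Node { bool isForeach; int cost; Node *left; Node *right; };
struct CostData2 { int cost = 0; int foreachDepth = 0; };
static void walkCost(Node *n, CostData2 &d) {
    if (n == nullptr)
        return;
    if (n->isForeach) ++d.foreachDepth;          // lCostCallbackPre
    if (d.foreachDepth == 0) d.cost += n->cost;  // count only outside foreach
    walkCost(n->left, d);
    walkCost(n->right, d);
    if (n->isForeach) --d.foreachDepth;          // lCostCallbackPost
}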
/** Given an AST node, check to see if it's safe if we happen to run the
code for that node with the execution mask all off.
*/
static bool
lCheckAllOffSafety(ASTNode *node, void *data) {
bool *okPtr = (bool *)data;
FunctionCallExpr *fce;
if ((fce = llvm::dyn_cast<FunctionCallExpr>(node)) != NULL) {
if (fce->func == NULL)
return false;
const Type *type = fce->func->GetType();
const PointerType *pt = CastType<PointerType>(type);
if (pt != NULL)
type = pt->GetBaseType();
const FunctionType *ftype = CastType<FunctionType>(type);
Assert(ftype != NULL);
if (ftype->isSafe == false) {
*okPtr = false;
return false;
}
}
if (llvm::dyn_cast<AssertStmt>(node) != NULL) {
// While it's fine to run the assert for varying tests, it's not
// desirable to check an assert on a uniform variable if all of the
// lanes are off.
*okPtr = false;
return false;
}
if (llvm::dyn_cast<PrintStmt>(node) != NULL) {
*okPtr = false;
return false;
}
if (llvm::dyn_cast<NewExpr>(node) != NULL ||
llvm::dyn_cast<DeleteStmt>(node) != NULL) {
// We definitely don't want to run the uniform variants of these if
// the mask is all off. It's also worth skipping the overhead of
// executing the varying versions of them in the all-off mask case.
*okPtr = false;
return false;
}
if (llvm::dyn_cast<ForeachStmt>(node) != NULL ||
llvm::dyn_cast<ForeachActiveStmt>(node) != NULL ||
llvm::dyn_cast<ForeachUniqueStmt>(node) != NULL ||
llvm::dyn_cast<UnmaskedStmt>(node) != NULL) {
// The various foreach statements also shouldn't be run with an
// all-off mask. Since they can re-establish an 'all on' mask,
// this would be pretty unintuitive. (More generally, it's
// possibly a little strange to allow foreach in the presence of
// any non-uniform control flow...)
//
// Similarly, the implementation of foreach_unique assumes as a
// precondition that the mask won't be all off going into it, so
// we'll enforce that here...
*okPtr = false;
return false;
}
if (llvm::dyn_cast<BinaryExpr>(node) != NULL) {
BinaryExpr* binaryExpr = llvm::dyn_cast<BinaryExpr>(node);
if (binaryExpr->op == BinaryExpr::Mod || binaryExpr->op == BinaryExpr::Div) {
*okPtr = false;
return false;
}
}
IndexExpr *ie;
if ((ie = llvm::dyn_cast<IndexExpr>(node)) != NULL && ie->baseExpr != NULL) {
const Type *type = ie->baseExpr->GetType();
if (type == NULL)
return true;
if (CastType<ReferenceType>(type) != NULL)
type = type->GetReferenceTarget();
ConstExpr *ce = llvm::dyn_cast<ConstExpr>(ie->index);
if (ce == NULL) {
// indexing with a variable... -> not safe
*okPtr = false;
return false;
}
const PointerType *pointerType = CastType<PointerType>(type);
if (pointerType != NULL) {
// pointer[index] -> can't be sure -> not safe
*okPtr = false;
return false;
}
const SequentialType *seqType = CastType<SequentialType>(type);
Assert(seqType != NULL);
int nElements = seqType->GetElementCount();
if (nElements == 0) {
// Unsized array, so we can't be sure -> not safe
*okPtr = false;
return false;
}
int32_t indices[ISPC_MAX_NVEC];
int count = ce->GetValues(indices);
for (int i = 0; i < count; ++i) {
if (indices[i] < 0 || indices[i] >= nElements) {
// Index is out of bounds -> not safe
*okPtr = false;
return false;
}
}
// All indices are in-bounds
return true;
}
MemberExpr *me;
if ((me = llvm::dyn_cast<MemberExpr>(node)) != NULL &&
me->dereferenceExpr) {
*okPtr = false;
return false;
}
if (llvm::dyn_cast<PtrDerefExpr>(node) != NULL) {
*okPtr = false;
return false;
}
/*
Don't allow turning if/else to straight-line-code if we
assign to a uniform.
*/
AssignExpr *ae;
if ((ae = llvm::dyn_cast<AssignExpr>(node)) != NULL) {
if (ae->GetType()) {
if (ae->GetType()->IsUniformType()) {
*okPtr = false;
return false;
}
}
}
return true;
}
bool
SafeToRunWithMaskAllOff(ASTNode *root) {
bool safe = true;
WalkAST(root, lCheckAllOffSafety, NULL, &safe);
return safe;
}
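A hedged sketch of how a caller might use this; the real call sites are in the statement emission code, which this diff does not show, and both Emit helpers below are invented for illustration.
// If the subtree is provably safe under an all-off mask, the code
// generator can skip emitting the "any lanes active?" guard around it.
if (SafeToRunWithMaskAllOff(stmts))
    EmitWithoutMaskGuard(stmts);      // hypothetical helper
else
    EmitWithMaskAnyGuard(stmts);      // hypothetical helper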

ast.h

@@ -1,5 +1,5 @@
/*
Copyright (c) 2011, Intel Corporation
Copyright (c) 2011-2013, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -28,11 +28,11 @@
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file ast.h
@brief
*/
#ifndef ISPC_AST_H
@@ -48,8 +48,9 @@
(Expr) and statements (Stmt) inherit from this class.
*/
class ASTNode {
const unsigned char SubclassID; // Subclass identifier (for isa/dyn_cast)
public:
ASTNode(SourcePos p) : pos(p) { }
ASTNode(SourcePos p, unsigned scid) : SubclassID(scid), pos(p) { }
virtual ~ASTNode();
/** The Optimize() method should perform any appropriate early-stage
@@ -74,18 +75,77 @@ public:
/** All AST nodes must track the file position where they are
defined. */
SourcePos pos;
/** An enumeration for keeping track of the concrete subclass of Value
that is actually instantiated.*/
enum ASTNodeTy {
/* For classes inherited from Expr */
AddressOfExprID,
AssignExprID,
BinaryExprID,
ConstExprID,
DerefExprID,
PtrDerefExprID,
RefDerefExprID,
ExprListID,
FunctionCallExprID,
FunctionSymbolExprID,
IndexExprID,
StructMemberExprID,
VectorMemberExprID,
NewExprID,
NullPointerExprID,
ReferenceExprID,
SelectExprID,
SizeOfExprID,
SymbolExprID,
SyncExprID,
TypeCastExprID,
UnaryExprID,
/* This is a convenience separator to shorten classof implementations */
MaxExprID,
/* For classes inherited from Stmt */
AssertStmtID,
BreakStmtID,
CaseStmtID,
ContinueStmtID,
DeclStmtID,
DefaultStmtID,
DeleteStmtID,
DoStmtID,
ExprStmtID,
ForeachActiveStmtID,
ForeachStmtID,
ForeachUniqueStmtID,
ForStmtID,
GotoStmtID,
IfStmtID,
LabeledStmtID,
PrintStmtID,
ReturnStmtID,
StmtListID,
SwitchStmtID,
UnmaskedStmtID
};
/** Return an ID for the concrete type of this object. This is used to
implement the classof checks. This should not be used for any
other purpose, as the values may change as ISPC evolves */
unsigned getValueID() const {
return SubclassID;
}
static inline bool classof(ASTNode const*) { return true; }
};
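For illustration of how the SubclassID/MaxExprID scheme lets llvm::dyn_cast replace dynamic_cast throughout WalkAST: classof becomes a plain integer compare. A hedged sketch (the real implementations live in expr.h/stmt.h, which are not part of this diff):
// Inside class Expr (hypothetical, mirroring the LLVM isa/dyn_cast style):
static inline bool classof(ASTNode const *n) {
    // every expression ID precedes the MaxExprID separator
    return n->getValueID() < ASTNode::MaxExprID;
}
// Inside a leaf class such as UnaryExpr:
static inline bool classof(ASTNode const *n) {
    return n->getValueID() == ASTNode::UnaryExprID;
}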
/** Simple representation of the abstract syntax trees for all of the
functions declared in a compilation unit.
*/
class AST {
public:
/** Add the AST for a function described by the given declaration
information and source code. */
void AddFunction(Symbol *sym, const std::vector<Symbol *> &args,
Stmt *code);
void AddFunction(Symbol *sym, Stmt *code);
/** Generate LLVM IR for all of the functions into the current
module. */
@@ -122,12 +182,12 @@ extern ASTNode *Optimize(ASTNode *root);
/** Convenience version of Optimize() for Expr *s that returns an Expr *
(rather than an ASTNode *, which would require the caller to cast back
to an Expr *). */
extern Expr *Optimize(Expr *);
/** Convenience version of Optimize() for Expr *s that returns an Stmt *
(rather than an ASTNode *, which would require the caller to cast back
to a Stmt *). */
extern Stmt *Optimize(Stmt *);
/** Perform type-checking on the given AST (or portion of one), returning a
@@ -144,4 +204,8 @@ extern Stmt *TypeCheck(Stmt *);
the given root. */
extern int EstimateCost(ASTNode *root);
/** Returns true if it would be safe to run the given code with an "all
off" mask. */
extern bool SafeToRunWithMaskAllOff(ASTNode *root);
#endif // ISPC_AST_H

bitcode2cpp.py

@@ -1,7 +1,6 @@
#!/usr/bin/python
import sys
import string
import re
import subprocess
import platform
@@ -10,6 +9,8 @@ import os
length=0
src=str(sys.argv[1])
if (len(sys.argv) > 2):
runtime=str(sys.argv[2])
target = re.sub("builtins/target-", "", src)
target = re.sub(r"builtins\\target-", "", target)
@@ -20,23 +21,30 @@ target = re.sub("\.c$", "", target)
target = re.sub("-", "_", target)
llvm_as="llvm-as"
if platform.system() == 'Windows' or string.find(platform.system(), "CYGWIN_NT") != -1:
if platform.system() == 'Windows' or platform.system().find("CYGWIN_NT") != -1:
llvm_as = os.getenv("LLVM_INSTALL_DIR").replace("\\", "/") + "/bin/" + llvm_as
try:
as_out=subprocess.Popen([llvm_as, "-", "-o", "-"], stdout=subprocess.PIPE)
except IOError:
print >> sys.stderr, "Couldn't open " + src
sys.stderr.write("Couldn't open " + src)
sys.exit(1)
print "unsigned char builtins_bitcode_" + target + "[] = {"
for line in as_out.stdout.readlines():
length = length + len(line)
for c in line:
print ord(c)
print ", "
print " 0 };\n\n"
print "int builtins_bitcode_" + target + "_length = " + str(length) + ";\n"
name = target
if (len(sys.argv) > 2):
name += "_" + runtime;
width = 16;
sys.stdout.write("unsigned char builtins_bitcode_" + name + "[] = {\n")
data = as_out.stdout.read()
for i in range(0, len(data), 1):
sys.stdout.write("0x%0.2X, " % ord(data[i:i+1]))
if i%width == (width-1):
sys.stdout.write("\n")
sys.stdout.write("0x00 };\n\n")
sys.stdout.write("int builtins_bitcode_" + name + "_length = " + str(len(data)) + ";\n")
as_out.wait()
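For reference, a hedged sketch of the C++ fragment the rewritten script now emits: sixteen hex bytes per row instead of one decimal value per line, with the target name and byte values below invented (the first four bytes are the LLVM bitcode magic 'BC' 0xC0 0xDE).
unsigned char builtins_bitcode_sse2[] = {
0x42, 0x43, 0xC0, 0xDE, 0x21, 0x0C, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x62, 0x0C, 0x30, 0x24,
0x00 };
int builtins_bitcode_sse2_length = 16;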


@@ -2,8 +2,8 @@
REM If LLVM_INSTALL_DIR isn't set globally in your environment,
REM it can be set here.
set LLVM_INSTALL_DIR=c:\users\mmp\llvm-dev
set LLVM_VERSION=3.1svn
REM set LLVM_INSTALL_DIR=c:\users\mmp\llvm-dev
REM set LLVM_VERSION=LLVM_3_2
REM Both the LLVM binaries and python need to be in the path
set path=%LLVM_INSTALL_DIR%\bin;%PATH%;c:\cygwin\bin


builtins.h

@@ -1,5 +1,5 @@
/*
Copyright (c) 2010-2011, Intel Corporation
Copyright (c) 2010-2015, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -28,11 +28,11 @@
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file builtins.h
@brief Declarations of functions related to builtins and the
standard library
*/
@@ -56,6 +56,7 @@ void DefineStdlib(SymbolTable *symbolTable, llvm::LLVMContext *ctx, llvm::Module
bool includeStdlib);
void AddBitcodeToModule(const unsigned char *bitcode, int length,
llvm::Module *module, SymbolTable *symbolTable = NULL);
llvm::Module *module, SymbolTable *symbolTable = NULL,
bool warn = true);
#endif // ISPC_STDLIB_H


@@ -0,0 +1,163 @@
/*
Copyright (c) 2014-2015, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include <cstdio>
#define PRINT_BUF_SIZE 4096
#define uint64_t unsigned long long
static __device__ size_t d_strlen(const char *str)
{
const char *s;
for (s = str; *s; ++s)
;
return (s - str);
}
static __device__ char* d_strncat(char *dest, const char *src, size_t n)
{
size_t dest_len = d_strlen(dest);
size_t i;
for (i = 0 ; i < n && src[i] != '\0' ; i++)
dest[dest_len + i] = src[i];
dest[dest_len + i] = '\0';
return dest;
}
#define APPEND(str) \
do { \
int offset = bufp - &printString[0]; \
*bufp = '\0'; \
d_strncat(bufp, str, PRINT_BUF_SIZE-offset); \
bufp += d_strlen(str); \
if (bufp >= &printString[PRINT_BUF_SIZE]) \
goto done; \
} while (0) /* eat semicolon */
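The trailing do { ... } while (0), the "eat semicolon" note, is the standard way to make a multi-statement macro expand to a single statement. A generic, self-contained illustration (names invented):
#include <cstdio>
#define LOG2(a, b) do { std::puts(a); std::puts(b); } while (0)
void demo(bool failed) {
    if (failed)
        LOG2("stage", "failed");  // expands to one statement, so the
    else                          // unbraced if/else still pairs correctly
        std::puts("ok");
}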
#define PRINT_SCALAR(fmt, type) \
sprintf(tmpBuf, fmt, *((type *)ptr)); \
APPEND(tmpBuf); \
break
#define PRINT_VECTOR(fmt, type) \
*bufp++ = '['; \
if (bufp == &printString[PRINT_BUF_SIZE]) break; \
for (int i = 0; i < width; ++i) { \
/* only print the value if the current lane is executing */ \
type val0 = *((type*)ptr); \
type val = val0; \
if (mask & (1ull<<i)) \
sprintf(tmpBuf, fmt, val); \
else \
sprintf(tmpBuf, "(( * )) "); \
APPEND(tmpBuf); \
*bufp++ = (i != width-1 ? ',' : ']'); \
} \
break
extern "C"
__device__ void __do_print_nvptx(const char *format, const char *types, int width, uint64_t mask,
void **args) {
char printString[PRINT_BUF_SIZE+1]; // +1 for trailing NUL
char *bufp = &printString[0];
char tmpBuf[256];
const char trueBuf[] = "true";
const char falseBuf[] = "false";
int argCount = 0;
while (*format && bufp < &printString[PRINT_BUF_SIZE]) {
// Format strings are just single percent signs.
if (*format != '%') {
*bufp++ = *format;
}
else {
if (*types) {
void *ptr = args[argCount++];
// Based on the encoding in the types string, cast the
// value appropriately and print it with a reasonable
// printf() formatting string.
switch (*types) {
case 'b': {
const char *tmpBuf1 = *((bool *)ptr) ? trueBuf : falseBuf;
APPEND(tmpBuf1);
break;
}
case 'B': {
*bufp++ = '[';
if (bufp == &printString[PRINT_BUF_SIZE])
break;
for (int i = 0; i < width; ++i) {
bool val0 = *((bool*)ptr);
bool val = val0;
if (mask & (1ull << i)) {
const char *tmpBuf1 = val ? trueBuf : falseBuf;
APPEND(tmpBuf1);
}
else
APPEND("_________");
*bufp++ = (i != width-1) ? ',' : ']';
}
break;
}
case 'i': PRINT_SCALAR("%d", int);
case 'I': PRINT_VECTOR("%d", int);
case 'u': PRINT_SCALAR("%u", unsigned int);
case 'U': PRINT_VECTOR("%u", unsigned int);
case 'f': PRINT_SCALAR("%f", float);
case 'F': PRINT_VECTOR("%f", float);
case 'l': PRINT_SCALAR("%lld", long long);
case 'L': PRINT_VECTOR("%lld", long long);
case 'v': PRINT_SCALAR("%llu", unsigned long long);
case 'V': PRINT_VECTOR("%llu", unsigned long long);
case 'd': PRINT_SCALAR("%f", double);
case 'D': PRINT_VECTOR("%f", double);
case 'p': PRINT_SCALAR("%p", void *);
case 'P': PRINT_VECTOR("%p", void *);
default:
APPEND("UNKNOWN TYPE ");
*bufp++ = *types;
}
++types;
}
}
++format;
}
done:
*bufp = '\n'; bufp++;
*bufp = '\0';
}

builtins-c.c

@@ -1,5 +1,5 @@
/*
Copyright (c) 2010-2011, Intel Corporation
Copyright (c) 2010-2013, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -28,7 +28,7 @@
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file builtins-c.c
@@ -50,6 +50,16 @@
available to ispc programs at compile time automatically.
*/
#ifdef _MSC_VER
// We do want old school sprintf and don't want secure Microsoft extensions.
// And we also don't want warnings about it, so the define.
#define _CRT_SECURE_NO_WARNINGS
#else
// Some versions of glibc has "fortification" feature, which expands sprintf
// to __builtin___sprintf_chk(..., __builtin_object_size(...), ...).
// We don't want this kind of expansion, as we don't support these intrinsics.
#define _FORTIFY_SOURCE 0
#endif
#ifndef _MSC_VER
#include <unistd.h>
@@ -59,22 +69,39 @@
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
typedef int Bool;
#define PRINT_SCALAR(fmt, type) \
printf(fmt, *((type *)ptr)); \
#define PRINT_BUF_SIZE 4096
#define APPEND(str) \
do { \
int offset = bufp - &printString[0]; \
*bufp = '\0'; \
strncat(bufp, str, PRINT_BUF_SIZE-offset); \
bufp += strlen(str); \
if (bufp >= &printString[PRINT_BUF_SIZE]) \
goto done; \
} while (0) /* eat semicolon */
#define PRINT_SCALAR(fmt, type) \
sprintf(tmpBuf, fmt, *((type *)ptr)); \
APPEND(tmpBuf); \
break
#define PRINT_VECTOR(fmt, type) \
putchar('['); \
*bufp++ = '['; \
if (bufp == &printString[PRINT_BUF_SIZE]) break; \
for (int i = 0; i < width; ++i) { \
/* only print the value if the current lane is executing */ \
if (mask & (1<<i)) \
printf(fmt, ((type *)ptr)[i]); \
if (mask & (1ull<<i)) \
sprintf(tmpBuf, fmt, ((type *)ptr)[i]); \
else \
printf("((" fmt "))", ((type *)ptr)[i]); \
putchar(i != width-1 ? ',' : ']'); \
sprintf(tmpBuf, "((" fmt "))", ((type *)ptr)[i]); \
APPEND(tmpBuf); \
*bufp++ = (i != width-1 ? ',' : ']'); \
} \
break
@@ -84,21 +111,23 @@ typedef int Bool;
@param format Print format string
@param types Encoded types of the values being printed.
(See lEncodeType()).
@param width Vector width of the compilation target
@param mask Current lane mask when the print statement is called
@param args Array of pointers to the values to be printed
*/
void __do_print(const char *format, const char *types, int width, int mask,
void __do_print(const char *format, const char *types, int width, uint64_t mask,
void **args) {
if (mask == 0)
return;
char printString[PRINT_BUF_SIZE+1]; // +1 for trailing NUL
char *bufp = &printString[0];
char tmpBuf[256];
int argCount = 0;
while (*format) {
while (*format && bufp < &printString[PRINT_BUF_SIZE]) {
// Format strings are just single percent signs.
if (*format != '%')
putchar(*format);
if (*format != '%') {
*bufp++ = *format;
}
else {
if (*types) {
void *ptr = args[argCount++];
@@ -107,17 +136,22 @@ void __do_print(const char *format, const char *types, int width, int mask,
// printf() formatting string.
switch (*types) {
case 'b': {
printf("%s", *((Bool *)ptr) ? "true" : "false");
sprintf(tmpBuf, "%s", *((Bool *)ptr) ? "true" : "false");
APPEND(tmpBuf);
break;
}
case 'B': {
putchar('[');
*bufp++ = '[';
if (bufp == &printString[PRINT_BUF_SIZE])
break;
for (int i = 0; i < width; ++i) {
if (mask & (1<<i))
printf("%s", ((Bool *)ptr)[i] ? "true" : "false");
if (mask & (1ull << i)) {
sprintf(tmpBuf, "%s", ((Bool *)ptr)[i] ? "true" : "false");
APPEND(tmpBuf);
}
else
printf("_________");
putchar(i != width-1 ? ',' : ']');
APPEND("_________");
*bufp++ = (i != width-1) ? ',' : ']';
}
break;
}
@@ -136,17 +170,96 @@ void __do_print(const char *format, const char *types, int width, int mask,
case 'p': PRINT_SCALAR("%p", void *);
case 'P': PRINT_VECTOR("%p", void *);
default:
printf("UNKNOWN TYPE ");
putchar(*types);
APPEND("UNKNOWN TYPE ");
*bufp++ = *types;
}
++types;
}
}
++format;
}
done:
*bufp = '\0';
fputs(printString, stdout);
fflush(stdout);
}
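For illustration, a hedged example of the kind of call the compiler-generated code presumably makes for a uniform int; the encoding character 'i' matches the switch above, while the width and mask values are invented.
int x = 42;
void *args[] = { &x };
__do_print("x = %\n", "i", 8, 0xFFull, args);  // prints "x = 42" via the new buffer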
/* this is print for PTX target only */
int __puts_nvptx(const char *);
void __do_print_nvptx(const char *format, const char *types, int width, uint64_t mask,
void **args) {
#if 0
char printString[PRINT_BUF_SIZE+1]; // +1 for trailing NUL
char *bufp = &printString[0];
char tmpBuf[256];
int argCount = 0;
while (*format && bufp < &printString[PRINT_BUF_SIZE]) {
// Format strings are just single percent signs.
if (*format != '%') {
*bufp++ = *format;
}
else {
if (*types) {
void *ptr = args[argCount++];
// Based on the encoding in the types string, cast the
// value appropriately and print it with a reasonable
// printf() formatting string.
switch (*types) {
case 'b': {
sprintf(tmpBuf, "%s", *((Bool *)ptr) ? "true" : "false");
APPEND(tmpBuf);
break;
}
case 'B': {
*bufp++ = '[';
if (bufp == &printString[PRINT_BUF_SIZE])
break;
for (int i = 0; i < width; ++i) {
if (mask & (1ull << i)) {
sprintf(tmpBuf, "%s", ((Bool *)ptr)[i] ? "true" : "false");
APPEND(tmpBuf);
}
else
APPEND("_________");
*bufp++ = (i != width-1) ? ',' : ']';
}
break;
}
case 'i': PRINT_SCALAR("%d", int);
case 'I': PRINT_VECTOR("%d", int);
case 'u': PRINT_SCALAR("%u", unsigned int);
case 'U': PRINT_VECTOR("%u", unsigned int);
case 'f': PRINT_SCALAR("%f", float);
case 'F': PRINT_VECTOR("%f", float);
case 'l': PRINT_SCALAR("%lld", long long);
case 'L': PRINT_VECTOR("%lld", long long);
case 'v': PRINT_SCALAR("%llu", unsigned long long);
case 'V': PRINT_VECTOR("%llu", unsigned long long);
case 'd': PRINT_SCALAR("%f", double);
case 'D': PRINT_VECTOR("%f", double);
case 'p': PRINT_SCALAR("%p", void *);
case 'P': PRINT_VECTOR("%p", void *);
default:
APPEND("UNKNOWN TYPE ");
*bufp++ = *types;
}
++types;
}
}
++format;
}
done:
*bufp = '\n'; bufp++;
*bufp = '\0';
__puts_nvptx(printString);
#else
__puts_nvptx("---nvptx printing is not support---\n");
#endif
}
int __num_cores() {
#if defined(_MSC_VER) || defined(__MINGW32__)


@@ -1,4 +1,4 @@
;; Copyright (c) 2011, Intel Corporation
;; Copyright (c) 2011-2016, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -41,30 +41,97 @@
@__system_best_isa = internal global i32 -1
declare void @abort() noreturn
;; The below is the result of running "clang -O2 -emit-llvm -c -o -" on the
;; following code... Specifically, __get_system_isa should return a value
;; corresponding to one of the Target::ISA enumerant values that gives the
;; most capable ISA that the current system can run.
;;
;; #ifdef _MSC_VER
;; extern void __stdcall __cpuid(int info[4], int infoType);
;; #else
;;
;; #include <stdint.h>
;; #include <stdlib.h>
;;
;; static void __cpuid(int info[4], int infoType) {
;; __asm__ __volatile__ ("cpuid"
;; : "=a" (info[0]), "=b" (info[1]), "=c" (info[2]), "=d" (info[3])
;; : "0" (infoType));
;; }
;; #endif
;;
;; // Save %ebx in case it's the PIC register.
;; static void __cpuid_count(int info[4], int level, int count) {
;; __asm__ __volatile__ ("xchg{l}\t{%%}ebx, %1\n\t"
;; "cpuid\n\t"
;; "xchg{l}\t{%%}ebx, %1\n\t"
;; : "=a" (info[0]), "=r" (info[1]), "=c" (info[2]), "=d" (info[3])
;; : "0" (level), "2" (count));
;; }
;;
;; static int __os_has_avx_support() {
;; // Check xgetbv; this uses a .byte sequence instead of the instruction
;; // directly because older assemblers do not include support for xgetbv and
;; // there is no easy way to conditionally compile based on the assembler used.
;; int rEAX, rEDX;
;; __asm__ __volatile__ (".byte 0x0f, 0x01, 0xd0" : "=a" (rEAX), "=d" (rEDX) : "c" (0));
;; return (rEAX & 6) == 6;
;; }
;;
;; static int __os_has_avx512_support() {
;; // Check if the OS saves the XMM, YMM and ZMM registers, i.e. it supports AVX2 and AVX512.
;; // See section 2.1 of software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
;; // Check xgetbv; this uses a .byte sequence instead of the instruction
;; // directly because older assemblers do not include support for xgetbv and
;; // there is no easy way to conditionally compile based on the assembler used.
;; int rEAX, rEDX;
;; __asm__ __volatile__ (".byte 0x0f, 0x01, 0xd0" : "=a" (rEAX), "=d" (rEDX) : "c" (0));
;; return (rEAX & 0xE6) == 0xE6;
;; }
;;
;; int32_t __get_system_isa() {
;; int info[4];
;; __cpuid(info, 1);
;; /* NOTE: the values returned below must be the same as the
;; corresponding enumerant values in Target::ISA. */
;; if ((info[2] & (1 << 28)) != 0)
;; return 2; // AVX
;;
;; // Call cpuid with eax=7, ecx=0
;; int info2[4];
;; __cpuid_count(info2, 7, 0);
;;
;; // NOTE: the values returned below must be the same as the
;; // corresponding enumerant values in Target::ISA.
;; if ((info[2] & (1 << 27)) != 0 && // OSXSAVE
;; (info2[1] & (1 << 5)) != 0 && // AVX2
;; (info2[1] & (1 << 16)) != 0 && // AVX512 F
;; __os_has_avx512_support()) {
;; // We need to verify that AVX2 is also available,
;; // as well as AVX512, because our targets are supposed
;; // to use both.
;;
;; if ((info2[1] & (1 << 17)) != 0 && // AVX512 DQ
;; (info2[1] & (1 << 28)) != 0 && // AVX512 CDI
;; (info2[1] & (1 << 30)) != 0 && // AVX512 BW
;; (info2[1] & (1 << 31)) != 0) { // AVX512 VL
;; return 6; // SKX
;; }
;; else if ((info2[1] & (1 << 26)) != 0 && // AVX512 PF
;; (info2[1] & (1 << 27)) != 0 && // AVX512 ER
;; (info2[1] & (1 << 28)) != 0) { // AVX512 CDI
;; return 5; // KNL_AVX512
;; }
;; // If it's unknown AVX512 target, fall through and use AVX2
;; // or whatever is available in the machine.
;; }
;;
;; if ((info[2] & (1 << 27)) != 0 && // OSXSAVE
;; (info[2] & (1 << 28)) != 0 &&
;; __os_has_avx_support()) {
;; if ((info[2] & (1 << 29)) != 0 && // F16C
;; (info[2] & (1 << 30)) != 0) { // RDRAND
;; // So far, so good. AVX2?
;; if ((info2[1] & (1 << 5)) != 0)
;; return 4;
;; else
;; return 3;
;; }
;; // Regular AVX
;; return 2;
;; }
;; else if ((info[2] & (1 << 19)) != 0)
;; return 1; // SSE4
;; else if ((info[3] & (1 << 26)) != 0)
@@ -73,42 +140,112 @@ declare void @abort() noreturn
;; abort();
;; }
%0 = type { i32, i32, i32, i32 }
define i32 @__get_system_isa() nounwind ssp {
%1 = tail call %0 asm sideeffect "cpuid", "={ax},={bx},={cx},={dx},0,~{dirflag},~{fpsr},~{flags}"(i32 1) nounwind
%2 = extractvalue %0 %1, 2
%3 = extractvalue %0 %1, 3
%4 = and i32 %2, 268435456
%5 = icmp eq i32 %4, 0
br i1 %5, label %6, label %13
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; <label>:6 ; preds = %0
%7 = and i32 %2, 524288
%8 = icmp eq i32 %7, 0
br i1 %8, label %9, label %13
;; LLVM has different IR for different versions since 3.7
; <label>:9 ; preds = %6
%10 = and i32 %3, 67108864
%11 = icmp eq i32 %10, 0
br i1 %11, label %12, label %13
define(`PTR_OP_ARGS',
ifelse(LLVM_VERSION, LLVM_3_7,
``$1 , $1 *'',
LLVM_VERSION, LLVM_3_8,
``$1 , $1 *'',
LLVM_VERSION, LLVM_3_9,
``$1 , $1 *'',
LLVM_VERSION, LLVM_4_0,
``$1 , $1 *'',
LLVM_VERSION, LLVM_5_0,
``$1 , $1 *'',
``$1 *''
)
)
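In other words, for LLVM 3.7 through 5.0 the macro rewrites a load operand list into the newer two-argument form, e.g. load i32, i32* @__system_best_isa, while older releases get the original load i32* @__system_best_isa; @__set_system_isa below shows the macro in use.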
; <label>:12 ; preds = %9
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
define i32 @__get_system_isa() nounwind uwtable {
entry:
%0 = tail call { i32, i32, i32, i32 } asm sideeffect "cpuid", "={ax},={bx},={cx},={dx},0,~{dirflag},~{fpsr},~{flags}"(i32 1) nounwind
%asmresult5.i = extractvalue { i32, i32, i32, i32 } %0, 2
%asmresult6.i = extractvalue { i32, i32, i32, i32 } %0, 3
%1 = tail call { i32, i32, i32, i32 } asm sideeffect "xchg$(l$)\09$(%$)ebx, $1\0A\09cpuid\0A\09xchg$(l$)\09$(%$)ebx, $1\0A\09", "={ax},=r,={cx},={dx},0,2,~{dirflag},~{fpsr},~{flags}"(i32 7, i32 0) nounwind
%asmresult4.i87 = extractvalue { i32, i32, i32, i32 } %1, 1
%and = and i32 %asmresult5.i, 134217728
%cmp = icmp eq i32 %and, 0
br i1 %cmp, label %if.else65, label %land.lhs.true
land.lhs.true: ; preds = %entry
%2 = and i32 %asmresult4.i87, 65568
%3 = icmp eq i32 %2, 65568
br i1 %3, label %land.lhs.true9, label %if.end39
land.lhs.true9: ; preds = %land.lhs.true
%4 = tail call { i32, i32 } asm sideeffect ".byte 0x0f, 0x01, 0xd0", "={ax},={dx},{cx},~{dirflag},~{fpsr},~{flags}"(i32 0) nounwind
%asmresult.i90 = extractvalue { i32, i32 } %4, 0
%and.i = and i32 %asmresult.i90, 230
%cmp.i = icmp eq i32 %and.i, 230
br i1 %cmp.i, label %if.then, label %if.end39
if.then: ; preds = %land.lhs.true9
%5 = and i32 %asmresult4.i87, -805175296
%6 = icmp eq i32 %5, -805175296
br i1 %6, label %return, label %if.else
if.else: ; preds = %if.then
%7 = and i32 %asmresult4.i87, 469762048
%8 = icmp eq i32 %7, 469762048
br i1 %8, label %return, label %if.end39
if.end39: ; preds = %if.else, %land.lhs.true9, %land.lhs.true
%9 = and i32 %asmresult5.i, 402653184
%10 = icmp eq i32 %9, 402653184
br i1 %10, label %land.lhs.true47, label %if.else65
land.lhs.true47: ; preds = %if.end39
%11 = tail call { i32, i32 } asm sideeffect ".byte 0x0f, 0x01, 0xd0", "={ax},={dx},{cx},~{dirflag},~{fpsr},~{flags}"(i32 0) nounwind
%asmresult.i91 = extractvalue { i32, i32 } %11, 0
%and.i92 = and i32 %asmresult.i91, 6
%cmp.i93 = icmp eq i32 %and.i92, 6
br i1 %cmp.i93, label %if.then50, label %if.else65
if.then50: ; preds = %land.lhs.true47
%12 = and i32 %asmresult5.i, 1610612736
%13 = icmp eq i32 %12, 1610612736
br i1 %13, label %if.then58, label %return
if.then58: ; preds = %if.then50
%and60 = lshr i32 %asmresult4.i87, 5
%14 = and i32 %and60, 1
%15 = add i32 %14, 3
br label %return
if.else65: ; preds = %land.lhs.true47, %if.end39, %entry
%and67 = and i32 %asmresult5.i, 524288
%cmp68 = icmp eq i32 %and67, 0
br i1 %cmp68, label %if.else70, label %return
if.else70: ; preds = %if.else65
%and72 = and i32 %asmresult6.i, 67108864
%cmp73 = icmp eq i32 %and72, 0
br i1 %cmp73, label %if.else75, label %return
if.else75: ; preds = %if.else70
tail call void @abort() noreturn nounwind
unreachable
; <label>:13 ; preds = %9, %6, %0
%.0 = phi i32 [ 2, %0 ], [ 1, %6 ], [ 0, %9 ]
ret i32 %.0
return: ; preds = %if.else70, %if.else65, %if.then58, %if.then50, %if.else, %if.then
%retval.0 = phi i32 [ 6, %if.then ], [ 5, %if.else ], [ %15, %if.then58 ], [ 2, %if.then50 ], [ 1, %if.else65 ], [ 0, %if.else70 ]
ret i32 %retval.0
}
declare void @abort() noreturn nounwind
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; This function is called by each of the dispatch functions we generate;
;; it sets @__system_best_isa if it is unset.
define void @__set_system_isa() {
entry:
%bi = load i32* @__system_best_isa
%bi = load PTR_OP_ARGS(`i32 ') @__system_best_isa
%unset = icmp eq i32 %bi, -1
br i1 %unset, label %set_system_isa, label %done

builtins/svml.m4 (new file)

@@ -0,0 +1,216 @@
;; Copyright (c) 2013-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;; svml macro
;; svml_stubs : stubs for svml calls
;; $1 - type ("float" or "double")
;; $2 - svml internal function suffix ("f" for float, "d" for double)
;; $3 - vector width
define(`svml_stubs',`
declare <$3 x $1> @__svml_sin$2(<$3 x $1>) nounwind readnone alwaysinline
declare <$3 x $1> @__svml_asin$2(<$3 x $1>) nounwind readnone alwaysinline
declare <$3 x $1> @__svml_cos$2(<$3 x $1>) nounwind readnone alwaysinline
declare void @__svml_sincos$2(<$3 x $1>, <$3 x $1> *, <$3 x $1> *) nounwind alwaysinline
declare <$3 x $1> @__svml_tan$2(<$3 x $1>) nounwind readnone alwaysinline
declare <$3 x $1> @__svml_atan$2(<$3 x $1>) nounwind readnone alwaysinline
declare <$3 x $1> @__svml_atan2$2(<$3 x $1>, <$3 x $1>) nounwind readnone alwaysinline
declare <$3 x $1> @__svml_exp$2(<$3 x $1>) nounwind readnone alwaysinline
declare <$3 x $1> @__svml_log$2(<$3 x $1>) nounwind readnone alwaysinline
declare <$3 x $1> @__svml_pow$2(<$3 x $1>, <$3 x $1>) nounwind readnone alwaysinline
')
;; svml_declare : declaration of __svml_* intrinsics
;; $1 - type ("float" or "double")
;; $2 - __svml_* intrinsic function suffix
;; float: "f4"(sse) "f8"(avx) "f16"(avx512)
;; double: "2"(sse) "4"(avx) "8"(avx512)
;; $3 - vector width
define(`svml_declare',`
declare <$3 x $1> @__svml_sin$2(<$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_asin$2(<$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_cos$2(<$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_sincos$2(<$3 x $1> *, <$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_tan$2(<$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_atan$2(<$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_atan2$2(<$3 x $1>, <$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_exp$2(<$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_log$2(<$3 x $1>) nounwind readnone
declare <$3 x $1> @__svml_pow$2(<$3 x $1>, <$3 x $1>) nounwind readnone
');
;; definition of __svml_* internal functions
;; $1 - type ("float" or "double")
;; $2 - __svml_* intrinsic function suffix
;; float: "f4"(sse) "f8"(avx) "f16"(avx512)
;; double: "2"(sse) "4"(avx) "8"(avx512)
;; $3 - vector width
;; $4 - svml internal function suffix ("f" for float, "d" for double)
define(`svml_define',`
define <$3 x $1> @__svml_sin$4(<$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_sin$2(<$3 x $1> %0)
ret <$3 x $1> %ret
}
define <$3 x $1> @__svml_asin$4(<$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_asin$2(<$3 x $1> %0)
ret <$3 x $1> %ret
}
define <$3 x $1> @__svml_cos$4(<$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_cos$2(<$3 x $1> %0)
ret <$3 x $1> %ret
}
define void @__svml_sincos$4(<$3 x $1>, <$3 x $1> *, <$3 x $1> *) nounwind alwaysinline {
%s = call <$3 x $1> @__svml_sincos$2(<$3 x $1> * %2, <$3 x $1> %0)
store <$3 x $1> %s, <$3 x $1> * %1
ret void
}
define <$3 x $1> @__svml_tan$4(<$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_tan$2(<$3 x $1> %0)
ret <$3 x $1> %ret
}
define <$3 x $1> @__svml_atan$4(<$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_atan$2(<$3 x $1> %0)
ret <$3 x $1> %ret
}
define <$3 x $1> @__svml_atan2$4(<$3 x $1>, <$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_atan2$2(<$3 x $1> %0, <$3 x $1> %1)
ret <$3 x $1> %ret
}
define <$3 x $1> @__svml_exp$4(<$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_exp$2(<$3 x $1> %0)
ret <$3 x $1> %ret
}
define <$3 x $1> @__svml_log$4(<$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_log$2(<$3 x $1> %0)
ret <$3 x $1> %ret
}
define <$3 x $1> @__svml_pow$4(<$3 x $1>, <$3 x $1>) nounwind readnone alwaysinline {
%ret = call <$3 x $1> @__svml_pow$2(<$3 x $1> %0, <$3 x $1> %1)
ret <$3 x $1> %ret
}
')
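For example, svml_define(float, f4, 4, f) instantiates @__svml_sinf (and the other nine entry points) as an alwaysinline wrapper that forwards its <4 x float> argument to the SSE intrinsic @__svml_sinf4.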
;; svml_define_x : definition of __svml_* internal functions operating on extended width
;; $1 - type ("float" or "double")
;; $2 - __svml_* intrinsic function suffix
;; float: "f4"(sse) "f8"(avx) "f16"(avx512)
;; double: "2"(sse) "4"(avx) "8"(avx512)
;; $3 - vector width
;; $4 - svml internal function suffix ("f" for float, "d" for double)
;; $5 - extended width, must be at least twice the native vector width
;; contingent on the existence of unary$3to$5 and binary$3to$5 macros
;; *todo*: in sincos, use the __svml_sincos[f][2,4,8,16] call, e.g.
;;define void @__svml_sincosf(<8 x float>, <8 x float> *,
;; <8 x float> *) nounwind alwaysinline {
;; ; call svml_sincosf4 two times with the two 4-wide sub-vectors
;; %a = shufflevector <8 x float> %0, <8 x float> undef,
;; <4 x i32> <i32 0, i32 1, i32 2, i32 3>
;; %b = shufflevector <8 x float> %0, <8 x float> undef,
;; <4 x i32> <i32 4, i32 5, i32 6, i32 7>
;;
;; %cospa = alloca <4 x float>
;; %sa = call <4 x float> @__svml_sincosf4(<4 x float> * %cospa, <4 x float> %a)
;;
;; %cospb = alloca <4 x float>
;; %sb = call <4 x float> @__svml_sincosf4(<4 x float> * %cospb, <4 x float> %b)
;;
;; %sin = shufflevector <4 x float> %sa, <4 x float> %sb,
;; <8 x i32> <i32 0, i32 1, i32 2, i32 3,
;; i32 4, i32 5, i32 6, i32 7>
;; store <8 x float> %sin, <8 x float> * %1
;;
;; %cosa = load <4 x float> * %cospa
;; %cosb = load <4 x float> * %cospb
;; %cos = shufflevector <4 x float> %cosa, <4 x float> %cosb,
;; <8 x i32> <i32 0, i32 1, i32 2, i32 3,
;; i32 4, i32 5, i32 6, i32 7>
;; store <8 x float> %cos, <8 x float> * %2
;;
;; ret void
;;}
define(`svml_define_x',`
define <$5 x $1> @__svml_sin$4(<$5 x $1>) nounwind readnone alwaysinline {
unary$3to$5(ret, $1, @__svml_sin$2, %0)
ret <$5 x $1> %ret
}
define <$5 x $1> @__svml_asin$4(<$5 x $1>) nounwind readnone alwaysinline {
unary$3to$5(ret, $1, @__svml_asin$2, %0)
ret <$5 x $1> %ret
}
define <$5 x $1> @__svml_cos$4(<$5 x $1>) nounwind readnone alwaysinline {
unary$3to$5(ret, $1, @__svml_cos$2, %0)
ret <$5 x $1> %ret
}
define void @__svml_sincos$4(<$5 x $1>,<$5 x $1>*,<$5 x $1>*) nounwind alwaysinline
{
%s = call <$5 x $1> @__svml_sin$4(<$5 x $1> %0)
%c = call <$5 x $1> @__svml_cos$4(<$5 x $1> %0)
store <$5 x $1> %s, <$5 x $1> * %1
store <$5 x $1> %c, <$5 x $1> * %2
ret void
}
define <$5 x $1> @__svml_tan$4(<$5 x $1>) nounwind readnone alwaysinline {
unary$3to$5(ret, $1, @__svml_tan$2, %0)
ret <$5 x $1> %ret
}
define <$5 x $1> @__svml_atan$4(<$5 x $1>) nounwind readnone alwaysinline {
unary$3to$5(ret, $1, @__svml_atan$2, %0)
ret <$5 x $1> %ret
}
define <$5 x $1> @__svml_atan2$4(<$5 x $1>,<$5 x $1>) nounwind readnone alwaysinline {
binary$3to$5(ret, $1, @__svml_atan2$2, %0, %1)
ret <$5 x $1> %ret
}
define <$5 x $1> @__svml_exp$4(<$5 x $1>) nounwind readnone alwaysinline {
unary$3to$5(ret, $1, @__svml_exp$2, %0)
ret <$5 x $1> %ret
}
define <$5 x $1> @__svml_log$4(<$5 x $1>) nounwind readnone alwaysinline {
unary$3to$5(ret, $1, @__svml_log$2, %0)
ret <$5 x $1> %ret
}
define <$5 x $1> @__svml_pow$4(<$5 x $1>,<$5 x $1>) nounwind readnone alwaysinline {
binary$3to$5(ret, $1, @__svml_pow$2, %0, %1)
ret <$5 x $1> %ret
}
')


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -31,30 +31,16 @@
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; AVX target implementation.
;;
;; Please note that this file uses SSE intrinsics, but LLVM generates AVX
;; instructions, so it doesn't make sense to change this implementation.
ctlztz()
define_prefetches()
define_shuffles()
aossoa()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
declare <4 x float> @llvm.x86.sse.rcp.ss(<4 x float>) nounwind readnone
define float @__rcp_uniform_float(float) nounwind readonly alwaysinline {
; uniform float iv = extract(__rcp_u(v), 0);
; return iv * (2. - v * iv);
%vecval = insertelement <4 x float> undef, float %0, i32 0
%call = call <4 x float> @llvm.x86.sse.rcp.ss(<4 x float> %vecval)
%scall = extractelement <4 x float> %call, i32 0
; do one N-R iteration
%v_iv = fmul float %0, %scall
%two_minus = fsub float 2., %v_iv
%iv_mul = fmul float %scall, %two_minus
ret float %iv_mul
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding floats
@@ -77,7 +63,8 @@ define float @__round_uniform_float(float) nounwind readonly alwaysinline {
; r3 = a3
;
; It doesn't matter what we pass as a, since we only need the r0 value
; here. So we pass the same register for both.
; here. So we pass the same register for both. Further, only the 0th
; element of the b parameter matters
%xi = insertelement <4 x float> undef, float %0, i32 0
%xr = call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %xi, <4 x float> %xi, i32 8)
%rs = extractelement <4 x float> %xr, i32 0
@@ -117,7 +104,7 @@ define double @__round_uniform_double(double) nounwind readonly alwaysinline {
define double @__floor_uniform_double(double) nounwind readonly alwaysinline {
; see above for round_ss intrinsic discussion...
%xi = insertelement <2 x double> undef, double %0, i32 0
; roundpd, round down 0b01 | don't signal precision exceptions 0b1001 = 9
; roundsd, round down 0b01 | don't signal precision exceptions 0b1001 = 9
%xr = call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %xi, <2 x double> %xi, i32 9)
%rs = extractelement <2 x double> %xr, i32 0
ret double %rs
@@ -126,12 +113,31 @@ define double @__floor_uniform_double(double) nounwind readonly alwaysinline {
define double @__ceil_uniform_double(double) nounwind readonly alwaysinline {
; see above for round_ss intrinsic discussion...
%xi = insertelement <2 x double> undef, double %0, i32 0
; roundpd, round up 0b10 | don't signal precision exceptions 0b1010 = 10
; roundsd, round up 0b10 | don't signal precision exceptions 0b1010 = 10
%xr = call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %xi, <2 x double> %xi, i32 10)
%rs = extractelement <2 x double> %xr, i32 0
ret double %rs
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
declare <4 x float> @llvm.x86.sse.rcp.ss(<4 x float>) nounwind readnone
define float @__rcp_uniform_float(float) nounwind readonly alwaysinline {
; do the rcpss call
; uniform float iv = extract(__rcp_u(v), 0);
; return iv * (2. - v * iv);
%vecval = insertelement <4 x float> undef, float %0, i32 0
%call = call <4 x float> @llvm.x86.sse.rcp.ss(<4 x float> %vecval)
%scall = extractelement <4 x float> %call, i32 0
; do one N-R iteration to improve precision, as above
%v_iv = fmul float %0, %scall
%two_minus = fsub float 2., %v_iv
%iv_mul = fmul float %scall, %two_minus
ret float %iv_mul
}
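The commented recurrence is one Newton-Raphson step for the reciprocal; in scalar C++ terms (a sketch of the math, not code from the repo), the rcp refinement and the analogous rsqrt one used just below are:
// Given v and a rough estimate iv of 1/v (rcpss supplies ~12 bits),
// one N-R step roughly doubles the number of accurate bits.
float refineRcp(float v, float iv)   { return iv * (2.0f - v * iv); }
// Same idea for 1/sqrt(v), matching the __rsqrt_uniform_float comment:
float refineRsqrt(float v, float is) { return 0.5f * is * (3.0f - (v * is) * is); }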
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rsqrt
@@ -144,6 +150,7 @@ define float @__rsqrt_uniform_float(float) nounwind readonly alwaysinline {
%vis = call <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float> %v)
%is = extractelement <4 x float> %vis, i32 0
; Newton-Raphson iteration to improve precision
; return 0.5 * is * (3. - (v * is) * is);
%v_is = fmul float %0, %is
%v_is_is = fmul float %v_is, %is
@@ -164,9 +171,18 @@ define float @__sqrt_uniform_float(float) nounwind readonly alwaysinline {
ret float %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>) nounwind readnone
define double @__sqrt_uniform_double(double) nounwind alwaysinline {
sse_unary_scalar(ret, 2, double, @llvm.x86.sse2.sqrt.sd, %0)
ret double %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; fastmath
;; fast math mode
declare void @llvm.x86.sse.stmxcsr(i8 *) nounwind
declare void @llvm.x86.sse.ldmxcsr(i8 *) nounwind
@@ -175,7 +191,7 @@ define void @__fastmath() nounwind alwaysinline {
%ptr = alloca i32
%ptr8 = bitcast i32 * %ptr to i8 *
call void @llvm.x86.sse.stmxcsr(i8 * %ptr8)
%oldval = load i32 *%ptr
%oldval = load PTR_OP_ARGS(`i32 ') %ptr
; turn on DAZ (64)/FTZ (32768) -> 32832
%update = or i32 %oldval, 32832
@@ -187,33 +203,51 @@ define void @__fastmath() nounwind alwaysinline {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float min/max
declare <4 x float> @llvm.x86.sse.max.ss(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @llvm.x86.sse.min.ss(<4 x float>, <4 x float>) nounwind readnone
define float @__max_uniform_float(float, float) nounwind readonly alwaysinline {
sse_binary_scalar(ret, 4, float, @llvm.x86.sse.max.ss, %0, %1)
%cmp = fcmp ogt float %1, %0
%ret = select i1 %cmp, float %1, float %0
ret float %ret
}
define float @__min_uniform_float(float, float) nounwind readonly alwaysinline {
sse_binary_scalar(ret, 4, float, @llvm.x86.sse.min.ss, %0, %1)
%cmp = fcmp ogt float %1, %0
%ret = select i1 %cmp, float %0, float %1
ret float %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
define double @__min_uniform_double(double, double) nounwind readnone alwaysinline {
%cmp = fcmp ogt double %1, %0
%ret = select i1 %cmp, double %0, double %1
ret double %ret
}
define double @__max_uniform_double(double, double) nounwind readnone alwaysinline {
%cmp = fcmp ogt double %1, %0
%ret = select i1 %cmp, double %1, double %0
ret double %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
declare <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.x86.sse41.pminud(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.x86.sse41.pmaxud(<4 x i32>, <4 x i32>) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
declare <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32>, <4 x i32>) nounwind readnone
define i32 @__min_uniform_int32(i32, i32) nounwind readonly alwaysinline {
sse_binary_scalar(ret, 4, i32, @llvm.x86.sse41.pminsd, %0, %1)
%cmp = icmp sgt i32 %1, %0
%ret = select i1 %cmp, i32 %0, i32 %1
ret i32 %ret
}
define i32 @__max_uniform_int32(i32, i32) nounwind readonly alwaysinline {
sse_binary_scalar(ret, 4, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
%cmp = icmp sgt i32 %1, %0
%ret = select i1 %cmp, i32 %1, i32 %0
ret i32 %ret
}
@@ -221,21 +255,20 @@ define i32 @__max_uniform_int32(i32, i32) nounwind readonly alwaysinline {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
declare <4 x i32> @llvm.x86.sse41.pminud(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.x86.sse41.pmaxud(<4 x i32>, <4 x i32>) nounwind readnone
define i32 @__min_uniform_uint32(i32, i32) nounwind readonly alwaysinline {
sse_binary_scalar(ret, 4, i32, @llvm.x86.sse41.pminud, %0, %1)
%cmp = icmp ugt i32 %1, %0
%ret = select i1 %cmp, i32 %0, i32 %1
ret i32 %ret
}
define i32 @__max_uniform_uint32(i32, i32) nounwind readonly alwaysinline {
sse_binary_scalar(ret, 4, i32, @llvm.x86.sse41.pmaxud, %0, %1)
%cmp = icmp ugt i32 %1, %0
%ret = select i1 %cmp, i32 %1, i32 %0
ret i32 %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; horizontal ops
;; horizontal ops / reductions
declare i32 @llvm.ctpop.i32(i32) nounwind readnone
@@ -251,29 +284,10 @@ define i64 @__popcnt_int64(i64) nounwind readonly alwaysinline {
ret i64 %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
declare <2 x double> @llvm.x86.sse.sqrt.sd(<2 x double>) nounwind readnone
define double @__sqrt_uniform_double(double) nounwind alwaysinline {
sse_unary_scalar(ret, 2, double, @llvm.x86.sse.sqrt.sd, %0)
ret double %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
;; int8/int16 builtins
declare <2 x double> @llvm.x86.sse2.max.sd(<2 x double>, <2 x double>) nounwind readnone
declare <2 x double> @llvm.x86.sse2.min.sd(<2 x double>, <2 x double>) nounwind readnone
define_avgs()
declare_nvptx()
define double @__min_uniform_double(double, double) nounwind readnone alwaysinline {
sse_binary_scalar(ret, 2, double, @llvm.x86.sse2.min.sd, %0, %1)
ret double %ret
}
define double @__max_uniform_double(double, double) nounwind readnone alwaysinline {
sse_binary_scalar(ret, 2, double, @llvm.x86.sse2.max.sd, %0, %1)
ret double %ret
}


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -40,6 +40,7 @@ stdlib_core()
packed_load_and_store()
scans()
int64minmax()
saturation_arithmetic()
include(`target-avx-common.ll')
@@ -137,19 +138,14 @@ define <16 x float> @__sqrt_varying_float(<16 x float>) nounwind readonly always
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; svml
; FIXME: need either to wire these up to the 8-wide SVML entrypoints,
; or use the macro to call the 4-wide ones 4x with our 16-wide
; vectors...
include(`svml.m4')
;; single precision
svml_declare(float,f8,8)
svml_define_x(float,f8,8,f,16)
declare <16 x float> @__svml_sin(<16 x float>)
declare <16 x float> @__svml_cos(<16 x float>)
declare void @__svml_sincos(<16 x float>, <16 x float> *, <16 x float> *)
declare <16 x float> @__svml_tan(<16 x float>)
declare <16 x float> @__svml_atan(<16 x float>)
declare <16 x float> @__svml_atan2(<16 x float>, <16 x float>)
declare <16 x float> @__svml_exp(<16 x float>)
declare <16 x float> @__svml_log(<16 x float>)
declare <16 x float> @__svml_pow(<16 x float>, <16 x float>)
;; double precision
svml_declare(double,4,4)
svml_define_x(double,4,4,d,16)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float min/max
@@ -158,51 +154,24 @@ declare <8 x float> @llvm.x86.avx.max.ps.256(<8 x float>, <8 x float>) nounwind
declare <8 x float> @llvm.x86.avx.min.ps.256(<8 x float>, <8 x float>) nounwind readnone
define <16 x float> @__max_varying_float(<16 x float>,
<16 x float>) nounwind readonly alwaysinline {
binary8to16(call, float, @llvm.x86.avx.max.ps.256, %0, %1)
ret <16 x float> %call
}
define <16 x float> @__min_varying_float(<16 x float>,
<16 x float>) nounwind readonly alwaysinline {
binary8to16(call, float, @llvm.x86.avx.min.ps.256, %0, %1)
ret <16 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
define <16 x i32> @__min_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pminsd, %0, %1)
ret <16 x i32> %ret
}
define <16 x i32> @__max_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
ret <16 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
define <16 x i32> @__min_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <16 x i32> %ret
}
define <16 x i32> @__max_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <16 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; horizontal ops
declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>) nounwind readnone
define i32 @__movmsk(<16 x i32>) nounwind readnone alwaysinline {
define i64 @__movmsk(<16 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <16 x i32> %0 to <16 x float>
%mask0 = shufflevector <16 x float> %floatmask, <16 x float> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -213,9 +182,57 @@ define i32 @__movmsk(<16 x i32>) nounwind readnone alwaysinline {
%v1shift = shl i32 %v1, 8
%v = or i32 %v1shift, %v0
ret i32 %v
%v64 = zext i32 %v to i64
ret i64 %v64
}
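; Editor's note: movmsk.ps.256 packs the sign bit of each of its 8 float lanes
; into the low byte of an i32; the two halves of the 16-wide mask are combined
; with a shift and an or, then zero-extended to match the i64 return type that
; __movmsk now uses.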
define i1 @__any(<16 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <16 x i32> %0 to <16 x float>
%mask0 = shufflevector <16 x float> %floatmask, <16 x float> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%v0 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %mask0) nounwind readnone
%mask1 = shufflevector <16 x float> %floatmask, <16 x float> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%v1 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %mask1) nounwind readnone
%v1shift = shl i32 %v1, 8
%v = or i32 %v1shift, %v0
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<16 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <16 x i32> %0 to <16 x float>
%mask0 = shufflevector <16 x float> %floatmask, <16 x float> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%v0 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %mask0) nounwind readnone
%mask1 = shufflevector <16 x float> %floatmask, <16 x float> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%v1 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %mask1) nounwind readnone
%v1shift = shl i32 %v1, 8
%v = or i32 %v1shift, %v0
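; 65535 = 0xffff: all 16 per-lane sign bits set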
%cmp = icmp eq i32 %v, 65535
ret i1 %cmp
}
define i1 @__none(<16 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <16 x i32> %0 to <16 x float>
%mask0 = shufflevector <16 x float> %floatmask, <16 x float> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%v0 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %mask0) nounwind readnone
%mask1 = shufflevector <16 x float> %floatmask, <16 x float> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%v1 = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %mask1) nounwind readnone
%v1shift = shl i32 %v1, 8
%v = or i32 %v1shift, %v0
%cmp = icmp eq i32 %v, 0
ret i1 %cmp
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal float ops
@@ -250,8 +267,35 @@ reduce_equal(16)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int32 ops
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<16 x i8>) nounwind readnone alwaysinline {
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %0,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
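; Editor's note: psad.bw sums the absolute differences of each 8-byte group
; against zero, i.e. the plain byte sums of the two vector halves, returned as
; two i64 partial sums; adding them and truncating gives the 16-bit total.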
define internal <16 x i16> @__add_varying_i16(<16 x i16>,
<16 x i16>) nounwind readnone alwaysinline {
%r = add <16 x i16> %0, %1
ret <16 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<16 x i16>) nounwind readnone alwaysinline {
reduce16(i16, @__add_varying_i16, @__add_uniform_i16)
}
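; Editor's note: reduce16 (like reduce8/reduce4 elsewhere) comes from util.m4.
; A minimal sketch of the shuffle-and-fold idea it is built on -- not the
; literal macro expansion -- shown as a hypothetical 4-wide add reduction:
;
;   define internal i16 @reduce_add_i16_sketch(<4 x i16> %v) {
;     %h1 = shufflevector <4 x i16> %v, <4 x i16> undef,
;             <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
;     %s1 = add <4 x i16> %v, %h1     ; lanes 0,1 hold {v0+v2, v1+v3}
;     %h2 = shufflevector <4 x i16> %s1, <4 x i16> undef,
;             <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
;     %s2 = add <4 x i16> %s1, %h2    ; lane 0 holds the full sum
;     %r  = extractelement <4 x i16> %s2, i32 0
;     ret i16 %r
;   }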
define <16 x i32> @__add_varying_int32(<16 x i32>,
<16 x i32>) nounwind readnone alwaysinline {
%s = add <16 x i32> %0, %1
ret <16 x i32> %s
}
@@ -279,11 +323,6 @@ define i32 @__reduce_max_int32(<16 x i32>) nounwind readnone alwaysinline {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; horizontal uint32 ops
define i32 @__reduce_add_uint32(<16 x i32> %v) nounwind readnone alwaysinline {
%r = call i32 @__reduce_add_int32(<16 x i32> %v)
ret i32 %r
}
define i32 @__reduce_min_uint32(<16 x i32>) nounwind readnone alwaysinline {
reduce16(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
@@ -361,11 +400,6 @@ define i64 @__reduce_max_int64(<16 x i64>) nounwind readnone alwaysinline {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; horizontal uint64 ops
define i64 @__reduce_add_uint64(<16 x i64> %v) nounwind readnone alwaysinline {
%r = call i64 @__reduce_add_int64(<16 x i64> %v)
ret i64 %r
}
define i64 @__reduce_min_uint64(<16 x i64>) nounwind readnone alwaysinline {
reduce16(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
@@ -379,27 +413,22 @@ define i64 @__reduce_max_uint64(<16 x i64>) nounwind readnone alwaysinline {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
load_and_broadcast(16, i8, 8)
load_and_broadcast(16, i16, 16)
load_and_broadcast(16, i32, 32)
load_and_broadcast(16, i64, 64)
; no masked load instruction for i8 and i16 types??
masked_load(16, i8, 8, 1)
masked_load(16, i16, 16, 2)
masked_load(i8, 1)
masked_load(i16, 2)
declare <8 x float> @llvm.x86.avx.maskload.ps.256(i8 *, <8 x float> %mask)
declare <4 x double> @llvm.x86.avx.maskload.pd.256(i8 *, <4 x double> %mask)
declare <8 x float> @llvm.x86.avx.maskload.ps.256(i8 *, <8 x MfORi32> %mask)
declare <4 x double> @llvm.x86.avx.maskload.pd.256(i8 *, <4 x MdORi64> %mask)
define <16 x i32> @__masked_load_32(i8 *, <16 x i32> %mask) nounwind alwaysinline {
%floatmask = bitcast <16 x i32> %mask to <16 x float>
%mask0 = shufflevector <16 x float> %floatmask, <16 x float> undef,
define <16 x i32> @__masked_load_i32(i8 *, <16 x i32> %mask) nounwind alwaysinline {
%floatmask = bitcast <16 x i32> %mask to <16 x MfORi32>
%mask0 = shufflevector <16 x MfORi32> %floatmask, <16 x MfORi32> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%val0 = call <8 x float> @llvm.x86.avx.maskload.ps.256(i8 * %0, <8 x float> %mask0)
%mask1 = shufflevector <16 x float> %floatmask, <16 x float> undef,
%val0 = call <8 x float> @llvm.x86.avx.maskload.ps.256(i8 * %0, <8 x MfORi32> %mask0)
%mask1 = shufflevector <16 x MfORi32> %floatmask, <16 x MfORi32> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%ptr1 = getelementptr i8 * %0, i32 32 ;; 8x4 bytes = 32
%val1 = call <8 x float> @llvm.x86.avx.maskload.ps.256(i8 * %ptr1, <8 x float> %mask1)
%ptr1 = getelementptr PTR_OP_ARGS(`i8') %0, i32 32 ;; 8x4 bytes = 32
%val1 = call <8 x float> @llvm.x86.avx.maskload.ps.256(i8 * %ptr1, <8 x MfORi32> %mask1)
%retval = shufflevector <8 x float> %val0, <8 x float> %val1,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
@@ -409,7 +438,7 @@ define <16 x i32> @__masked_load_32(i8 *, <16 x i32> %mask) nounwind alwaysinlin
}
define <16 x i64> @__masked_load_64(i8 *, <16 x i32> %mask) nounwind alwaysinline {
define <16 x i64> @__masked_load_i64(i8 *, <16 x i32> %mask) nounwind alwaysinline {
; double up masks, bitcast to doubles
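; (maskload.pd tests only the sign bit of each 64-bit lane, so each 32-bit
; mask value is replicated into both halves of its 64-bit lane first)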
%mask0 = shufflevector <16 x i32> %mask, <16 x i32> undef,
<8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>
@@ -419,18 +448,18 @@ define <16 x i64> @__masked_load_64(i8 *, <16 x i32> %mask) nounwind alwaysinlin
<8 x i32> <i32 8, i32 8, i32 9, i32 9, i32 10, i32 10, i32 11, i32 11>
%mask3 = shufflevector <16 x i32> %mask, <16 x i32> undef,
<8 x i32> <i32 12, i32 12, i32 13, i32 13, i32 14, i32 14, i32 15, i32 15>
%mask0d = bitcast <8 x i32> %mask0 to <4 x double>
%mask1d = bitcast <8 x i32> %mask1 to <4 x double>
%mask2d = bitcast <8 x i32> %mask2 to <4 x double>
%mask3d = bitcast <8 x i32> %mask3 to <4 x double>
%mask0d = bitcast <8 x i32> %mask0 to <4 x MdORi64>
%mask1d = bitcast <8 x i32> %mask1 to <4 x MdORi64>
%mask2d = bitcast <8 x i32> %mask2 to <4 x MdORi64>
%mask3d = bitcast <8 x i32> %mask3 to <4 x MdORi64>
%val0d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %0, <4 x double> %mask0d)
%ptr1 = getelementptr i8 * %0, i32 32
%val1d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %ptr1, <4 x double> %mask1d)
%ptr2 = getelementptr i8 * %0, i32 64
%val2d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %ptr2, <4 x double> %mask2d)
%ptr3 = getelementptr i8 * %0, i32 96
%val3d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %ptr3, <4 x double> %mask3d)
%val0d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %0, <4 x MdORi64> %mask0d)
%ptr1 = getelementptr PTR_OP_ARGS(`i8') %0, i32 32
%val1d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %ptr1, <4 x MdORi64> %mask1d)
%ptr2 = getelementptr PTR_OP_ARGS(`i8') %0, i32 64
%val2d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %ptr2, <4 x MdORi64> %mask2d)
%ptr3 = getelementptr PTR_OP_ARGS(`i8') %0, i32 96
%val3d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %ptr3, <4 x MdORi64> %mask3d)
%val01 = shufflevector <4 x double> %val0d, <4 x double> %val1d,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -443,6 +472,7 @@ define <16 x i64> @__masked_load_64(i8 *, <16 x i32> %mask) nounwind alwaysinlin
ret <16 x i64> %val
}
masked_load_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
@@ -450,38 +480,38 @@ define <16 x i64> @__masked_load_64(i8 *, <16 x i32> %mask) nounwind alwaysinlin
; FIXME: there is no AVX instruction for these, but we could be clever
; by packing the bits down and setting the last 3/4 or half, respectively,
; of the mask to zero... Not sure if this would be a win in the end
gen_masked_store(16, i8, 8)
gen_masked_store(16, i16, 16)
gen_masked_store(i8)
gen_masked_store(i16)
; note that mask is the 2nd parameter, not the 3rd one!!
declare void @llvm.x86.avx.maskstore.ps.256(i8 *, <8 x float>, <8 x float>)
declare void @llvm.x86.avx.maskstore.pd.256(i8 *, <4 x double>, <4 x double>)
declare void @llvm.x86.avx.maskstore.ps.256(i8 *, <8 x MfORi32>, <8 x float>)
declare void @llvm.x86.avx.maskstore.pd.256(i8 *, <4 x MdORi64>, <4 x double>)
define void @__masked_store_32(<16 x i32>* nocapture, <16 x i32>,
<16 x i32>) nounwind alwaysinline {
define void @__masked_store_i32(<16 x i32>* nocapture, <16 x i32>,
<16 x i32>) nounwind alwaysinline {
%ptr = bitcast <16 x i32> * %0 to i8 *
%val = bitcast <16 x i32> %1 to <16 x float>
%mask = bitcast <16 x i32> %2 to <16 x float>
%mask = bitcast <16 x i32> %2 to <16 x MfORi32>
%val0 = shufflevector <16 x float> %val, <16 x float> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%val1 = shufflevector <16 x float> %val, <16 x float> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%mask0 = shufflevector <16 x float> %mask, <16 x float> undef,
%mask0 = shufflevector <16 x MfORi32> %mask, <16 x MfORi32> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%mask1 = shufflevector <16 x float> %mask, <16 x float> undef,
%mask1 = shufflevector <16 x MfORi32> %mask, <16 x MfORi32> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
call void @llvm.x86.avx.maskstore.ps.256(i8 * %ptr, <8 x float> %mask0, <8 x float> %val0)
%ptr1 = getelementptr i8 * %ptr, i32 32
call void @llvm.x86.avx.maskstore.ps.256(i8 * %ptr1, <8 x float> %mask1, <8 x float> %val1)
call void @llvm.x86.avx.maskstore.ps.256(i8 * %ptr, <8 x MfORi32> %mask0, <8 x float> %val0)
%ptr1 = getelementptr PTR_OP_ARGS(`i8') %ptr, i32 32
call void @llvm.x86.avx.maskstore.ps.256(i8 * %ptr1, <8 x MfORi32> %mask1, <8 x float> %val1)
ret void
}
define void @__masked_store_64(<16 x i64>* nocapture, <16 x i64>,
<16 x i32> %mask) nounwind alwaysinline {
define void @__masked_store_i64(<16 x i64>* nocapture, <16 x i64>,
<16 x i32> %mask) nounwind alwaysinline {
%ptr = bitcast <16 x i64> * %0 to i8 *
%val = bitcast <16 x i64> %1 to <16 x double>
@@ -494,10 +524,10 @@ define void @__masked_store_64(<16 x i64>* nocapture, <16 x i64>,
<8 x i32> <i32 8, i32 8, i32 9, i32 9, i32 10, i32 10, i32 11, i32 11>
%mask3 = shufflevector <16 x i32> %mask, <16 x i32> undef,
<8 x i32> <i32 12, i32 12, i32 13, i32 13, i32 14, i32 14, i32 15, i32 15>
%mask0d = bitcast <8 x i32> %mask0 to <4 x double>
%mask1d = bitcast <8 x i32> %mask1 to <4 x double>
%mask2d = bitcast <8 x i32> %mask2 to <4 x double>
%mask3d = bitcast <8 x i32> %mask3 to <4 x double>
%mask0d = bitcast <8 x i32> %mask0 to <4 x MdORi64>
%mask1d = bitcast <8 x i32> %mask1 to <4 x MdORi64>
%mask2d = bitcast <8 x i32> %mask2 to <4 x MdORi64>
%mask3d = bitcast <8 x i32> %mask3 to <4 x MdORi64>
%val0 = shufflevector <16 x double> %val, <16 x double> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
@@ -508,27 +538,28 @@ define void @__masked_store_64(<16 x i64>* nocapture, <16 x i64>,
%val3 = shufflevector <16 x double> %val, <16 x double> undef,
<4 x i32> <i32 12, i32 13, i32 14, i32 15>
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr, <4 x double> %mask0d, <4 x double> %val0)
%ptr1 = getelementptr i8 * %ptr, i32 32
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr1, <4 x double> %mask1d, <4 x double> %val1)
%ptr2 = getelementptr i8 * %ptr, i32 64
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr2, <4 x double> %mask2d, <4 x double> %val2)
%ptr3 = getelementptr i8 * %ptr, i32 96
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr3, <4 x double> %mask3d, <4 x double> %val3)
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr, <4 x MdORi64> %mask0d, <4 x double> %val0)
%ptr1 = getelementptr PTR_OP_ARGS(`i8') %ptr, i32 32
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr1, <4 x MdORi64> %mask1d, <4 x double> %val1)
%ptr2 = getelementptr PTR_OP_ARGS(`i8') %ptr, i32 64
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr2, <4 x MdORi64> %mask2d, <4 x double> %val2)
%ptr3 = getelementptr PTR_OP_ARGS(`i8') %ptr, i32 96
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr3, <4 x MdORi64> %mask3d, <4 x double> %val3)
ret void
}
masked_store_float_double()
masked_store_blend_8_16_by_16()
declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>,
<8 x float>) nounwind readnone
define void @__masked_store_blend_32(<16 x i32>* nocapture, <16 x i32>,
<16 x i32>) nounwind alwaysinline {
define void @__masked_store_blend_i32(<16 x i32>* nocapture, <16 x i32>,
<16 x i32>) nounwind alwaysinline {
%maskAsFloat = bitcast <16 x i32> %2 to <16 x float>
%oldValue = load <16 x i32>* %0, align 4
%oldValue = load PTR_OP_ARGS(`<16 x i32>') %0, align 4
%oldAsFloat = bitcast <16 x i32> %oldValue to <16 x float>
%newAsFloat = bitcast <16 x i32> %1 to <16 x float>
@@ -563,9 +594,9 @@ define void @__masked_store_blend_32(<16 x i32>* nocapture, <16 x i32>,
declare <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>, <4 x double>,
<4 x double>) nounwind readnone
define void @__masked_store_blend_64(<16 x i64>* nocapture %ptr, <16 x i64> %newi64,
<16 x i32> %mask) nounwind alwaysinline {
%oldValue = load <16 x i64>* %ptr, align 8
define void @__masked_store_blend_i64(<16 x i64>* nocapture %ptr, <16 x i64> %newi64,
<16 x i32> %mask) nounwind alwaysinline {
%oldValue = load PTR_OP_ARGS(`<16 x i64>') %ptr, align 8
%old = bitcast <16 x i64> %oldValue to <16 x double>
%old0d = shufflevector <16 x double> %old, <16 x double> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
@@ -622,17 +653,14 @@ define void @__masked_store_blend_64(<16 x i64>* nocapture %ptr, <16 x i64> %new
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
;; scatter
gen_gather(16, i8)
gen_gather(16, i16)
gen_gather(16, i32)
gen_gather(16, i64)
gen_scatter(16, i8)
gen_scatter(16, i16)
gen_scatter(16, i32)
gen_scatter(16, i64)
gen_scatter(i8)
gen_scatter(i16)
gen_scatter(i32)
gen_scatter(float)
gen_scatter(i64)
gen_scatter(double)
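; Editor's note: AVX has no scatter instruction (scatters only arrive with
; AVX-512), so gen_scatter in util.m4 presumably emits a per-lane,
; mask-predicated emulation for each of these types.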
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
@@ -660,3 +688,12 @@ define <16 x double> @__max_varying_double(<16 x double>, <16 x double>) nounwin
binary4to16(ret, double, @llvm.x86.avx.max.pd.256, %0, %1)
ret <16 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -49,11 +49,10 @@ include(`target-avx-common.ll')
declare <8 x float> @llvm.x86.avx.rcp.ps.256(<8 x float>) nounwind readnone
define <8 x float> @__rcp_varying_float(<8 x float>) nounwind readonly alwaysinline {
; do one N-R iteration to improve precision
; float iv = __rcp_v(v);
; return iv * (2. - v * iv);
%call = call <8 x float> @llvm.x86.avx.rcp.ps.256(<8 x float> %0)
; do one N-R iteration
%v_iv = fmul <8 x float> %0, %call
%two_minus = fsub <8 x float> <float 2., float 2., float 2., float 2.,
float 2., float 2., float 2., float 2.>, %v_iv
@@ -61,6 +60,46 @@ define <8 x float> @__rcp_varying_float(<8 x float>) nounwind readonly alwaysinl
ret <8 x float> %iv_mul
}
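; Editor's note: the Newton-Raphson step above follows from x' = x*(2 - v*x);
; with relative error e = 1 - v*x the update gives e' = e^2, so one iteration
; roughly doubles the number of correct bits in the rcpps estimate.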
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rsqrt
declare <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float>) nounwind readnone
define <8 x float> @__rsqrt_varying_float(<8 x float> %v) nounwind readonly alwaysinline {
; float is = __rsqrt_v(v);
%is = call <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float> %v)
; Newton-Raphson iteration to improve precision
; return 0.5 * is * (3. - (v * is) * is);
%v_is = fmul <8 x float> %v, %is
%v_is_is = fmul <8 x float> %v_is, %is
%three_sub = fsub <8 x float> <float 3., float 3., float 3., float 3.,
float 3., float 3., float 3., float 3.>, %v_is_is
%is_mul = fmul <8 x float> %is, %three_sub
%half_scale = fmul <8 x float> <float 0.5, float 0.5, float 0.5, float 0.5,
float 0.5, float 0.5, float 0.5, float 0.5>, %is_mul
ret <8 x float> %half_scale
}
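; Editor's note: same idea for rsqrt: with x' = 0.5*x*(3 - v*x*x) the relative
; error is again squared, so one iteration roughly doubles the precision of
; the rsqrtps estimate.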
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; sqrt
declare <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float>) nounwind readnone
define <8 x float> @__sqrt_varying_float(<8 x float>) nounwind readonly alwaysinline {
%call = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %0)
ret <8 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
declare <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double>) nounwind readnone
define <8 x double> @__sqrt_varying_double(<8 x double>) nounwind alwaysinline {
unary4to8(ret, double, @llvm.x86.avx.sqrt.pd.256, %0)
ret <8 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding floats
@@ -94,63 +133,15 @@ define <8 x double> @__round_varying_double(<8 x double>) nounwind readonly alwa
}
define <8 x double> @__floor_varying_double(<8 x double>) nounwind readonly alwaysinline {
; roundpd, round down 0b01 | don't signal precision exceptions 0b1000 = 9
; roundpd, round down 0b01 | don't signal precision exceptions 0b1001 = 9
round4to8double(%0, 9)
}
define <8 x double> @__ceil_varying_double(<8 x double>) nounwind readonly alwaysinline {
; roundpd, round up 0b10 | don't signal precision exceptions 0b1000 = 10
; roundpd, round up 0b10 | don't signal precision exceptions 0b1010 = 10
round4to8double(%0, 10)
}
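; Editor's note: these immediates encode the SSE4.1/AVX rounding-control byte:
; bits 1:0 select the mode (00 nearest, 01 down, 10 up, 11 truncate) and
; bit 3 suppresses precision (inexact) exceptions, giving 8, 9 and 10.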
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rsqrt
declare <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float>) nounwind readnone
define <8 x float> @__rsqrt_varying_float(<8 x float> %v) nounwind readonly alwaysinline {
; float is = __rsqrt_v(v);
%is = call <8 x float> @llvm.x86.avx.rsqrt.ps.256(<8 x float> %v)
; return 0.5 * is * (3. - (v * is) * is);
%v_is = fmul <8 x float> %v, %is
%v_is_is = fmul <8 x float> %v_is, %is
%three_sub = fsub <8 x float> <float 3., float 3., float 3., float 3.,
float 3., float 3., float 3., float 3.>, %v_is_is
%is_mul = fmul <8 x float> %is, %three_sub
%half_scale = fmul <8 x float> <float 0.5, float 0.5, float 0.5, float 0.5,
float 0.5, float 0.5, float 0.5, float 0.5>, %is_mul
ret <8 x float> %half_scale
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; sqrt
declare <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float>) nounwind readnone
define <8 x float> @__sqrt_varying_float(<8 x float>) nounwind readonly alwaysinline {
%call = call <8 x float> @llvm.x86.avx.sqrt.ps.256(<8 x float> %0)
ret <8 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; svml
; FIXME: need either to wire these up to the 8-wide SVML entrypoints,
; or use the macro to call the 4-wide ones twice with our 8-wide
; vectors...
declare <8 x float> @__svml_sin(<8 x float>)
declare <8 x float> @__svml_cos(<8 x float>)
declare void @__svml_sincos(<8 x float>, <8 x float> *, <8 x float> *)
declare <8 x float> @__svml_tan(<8 x float>)
declare <8 x float> @__svml_atan(<8 x float>)
declare <8 x float> @__svml_atan2(<8 x float>, <8 x float>)
declare <8 x float> @__svml_exp(<8 x float>)
declare <8 x float> @__svml_log(<8 x float>)
declare <8 x float> @__svml_pow(<8 x float>, <8 x float>)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float min/max
@@ -158,56 +149,84 @@ declare <8 x float> @llvm.x86.avx.max.ps.256(<8 x float>, <8 x float>) nounwind
declare <8 x float> @llvm.x86.avx.min.ps.256(<8 x float>, <8 x float>) nounwind readnone
define <8 x float> @__max_varying_float(<8 x float>,
<8 x float>) nounwind readonly alwaysinline {
%call = call <8 x float> @llvm.x86.avx.max.ps.256(<8 x float> %0, <8 x float> %1)
ret <8 x float> %call
}
define <8 x float> @__min_varying_float(<8 x float>,
<8 x float>) nounwind readonly alwaysinline {
%call = call <8 x float> @llvm.x86.avx.min.ps.256(<8 x float> %0, <8 x float> %1)
ret <8 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
;; double precision min/max
define <8 x i32> @__min_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pminsd, %0, %1)
ret <8 x i32> %ret
declare <4 x double> @llvm.x86.avx.max.pd.256(<4 x double>, <4 x double>) nounwind readnone
declare <4 x double> @llvm.x86.avx.min.pd.256(<4 x double>, <4 x double>) nounwind readnone
define <8 x double> @__min_varying_double(<8 x double>, <8 x double>) nounwind readnone alwaysinline {
binary4to8(ret, double, @llvm.x86.avx.min.pd.256, %0, %1)
ret <8 x double> %ret
}
define <8 x i32> @__max_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
ret <8 x i32> %ret
define <8 x double> @__max_varying_double(<8 x double>, <8 x double>) nounwind readnone alwaysinline {
binary4to8(ret, double, @llvm.x86.avx.max.pd.256, %0, %1)
ret <8 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
;; svml
define <8 x i32> @__min_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <8 x i32> %ret
}
include(`svml.m4')
;; single precision
svml_declare(float,f8,8)
svml_define(float,f8,8,f)
;; double precision
svml_declare(double,4,4)
svml_define_x(double,4,4,d,8)
define <8 x i32> @__max_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <8 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; horizontal ops
;; mask handling
declare i32 @llvm.x86.avx.movmsk.ps.256(<8 x float>) nounwind readnone
define i32 @__movmsk(<8 x i32>) nounwind readnone alwaysinline {
define i64 @__movmsk(<8 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%v = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %floatmask) nounwind readnone
ret i32 %v
%v64 = zext i32 %v to i64
ret i64 %v64
}
define i1 @__any(<8 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%v = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %floatmask) nounwind readnone
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<8 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%v = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %floatmask) nounwind readnone
%cmp = icmp eq i32 %v, 255
ret i1 %cmp
}
define i1 @__none(<8 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%v = call i32 @llvm.x86.avx.movmsk.ps.256(<8 x float> %floatmask) nounwind readnone
%cmp = icmp eq i32 %v, 0
ret i1 %cmp
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal ops / reductions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal float ops
@@ -222,65 +241,14 @@ define float @__reduce_add_float(<8 x float>) nounwind readonly alwaysinline {
ret float %sum
}
define float @__reduce_min_float(<8 x float>) nounwind readnone alwaysinline {
reduce8(float, @__min_varying_float, @__min_uniform_float)
}
define float @__reduce_max_float(<8 x float>) nounwind readnone alwaysinline {
reduce8(float, @__max_varying_float, @__max_uniform_float)
}
reduce_equal(8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int32 ops
define <8 x i32> @__add_varying_int32(<8 x i32>,
<8 x i32>) nounwind readnone alwaysinline {
%s = add <8 x i32> %0, %1
ret <8 x i32> %s
}
define i32 @__add_uniform_int32(i32, i32) nounwind readnone alwaysinline {
%s = add i32 %0, %1
ret i32 %s
}
define i32 @__reduce_add_int32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__add_varying_int32, @__add_uniform_int32)
}
define i32 @__reduce_min_int32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__min_varying_int32, @__min_uniform_int32)
}
define i32 @__reduce_max_int32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__max_varying_int32, @__max_uniform_int32)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; horizontal uint32 ops
define i32 @__reduce_add_uint32(<8 x i32> %v) nounwind readnone alwaysinline {
%r = call i32 @__reduce_add_int32(<8 x i32> %v)
ret i32 %r
}
define i32 @__reduce_min_uint32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
define i32 @__reduce_max_uint32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__max_varying_uint32, @__max_uniform_uint32)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal double ops
@@ -304,17 +272,89 @@ define double @__reduce_min_double(<8 x double>) nounwind readnone alwaysinline
reduce8(double, @__min_varying_double, @__min_uniform_double)
}
define double @__reduce_max_double(<8 x double>) nounwind readnone alwaysinline {
reduce8(double, @__max_varying_double, @__max_uniform_double)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int8 ops
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<8 x i8>) nounwind readnone alwaysinline {
%wide8 = shufflevector <8 x i8> %0, <8 x i8> zeroinitializer,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %wide8,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int16 ops
define internal <8 x i16> @__add_varying_i16(<8 x i16>,
<8 x i16>) nounwind readnone alwaysinline {
%r = add <8 x i16> %0, %1
ret <8 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<8 x i16>) nounwind readnone alwaysinline {
reduce8(i16, @__add_varying_i16, @__add_uniform_i16)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int32 ops
;; helper functions
define <8 x i32> @__add_varying_int32(<8 x i32>,
<8 x i32>) nounwind readnone alwaysinline {
%s = add <8 x i32> %0, %1
ret <8 x i32> %s
}
define i32 @__add_uniform_int32(i32, i32) nounwind readnone alwaysinline {
%s = add i32 %0, %1
ret i32 %s
}
;; reduction functions
define i32 @__reduce_add_int32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__add_varying_int32, @__add_uniform_int32)
}
define i32 @__reduce_min_int32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__min_varying_int32, @__min_uniform_int32)
}
define i32 @__reduce_max_int32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__max_varying_int32, @__max_uniform_int32)
}
define i32 @__reduce_min_uint32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
define i32 @__reduce_max_uint32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__max_varying_uint32, @__max_uniform_uint32)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int64 ops
;; helper functions
define <8 x i64> @__add_varying_int64(<8 x i64>,
<8 x i64>) nounwind readnone alwaysinline {
%s = add <8 x i64> %0, %1
ret <8 x i64> %s
}
@@ -324,6 +364,7 @@ define i64 @__add_uniform_int64(i64, i64) nounwind readnone alwaysinline {
ret i64 %s
}
;; reduction functions
define i64 @__reduce_add_int64(<8 x i64>) nounwind readnone alwaysinline {
reduce8(i64, @__add_varying_int64, @__add_uniform_int64)
}
@@ -339,14 +380,6 @@ define i64 @__reduce_max_int64(<8 x i64>) nounwind readnone alwaysinline {
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; horizontal uint64 ops
define i64 @__reduce_add_uint64(<8 x i64> %v) nounwind readnone alwaysinline {
%r = call i64 @__reduce_add_int64(<8 x i64> %v)
ret i64 %r
}
define i64 @__reduce_min_uint64(<8 x i64>) nounwind readnone alwaysinline {
reduce8(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
@@ -356,42 +389,39 @@ define i64 @__reduce_max_uint64(<8 x i64>) nounwind readnone alwaysinline {
reduce8(i64, @__max_varying_uint64, @__max_uniform_uint64)
}
reduce_equal(8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
load_and_broadcast(8, i8, 8)
load_and_broadcast(8, i16, 16)
load_and_broadcast(8, i32, 32)
load_and_broadcast(8, i64, 64)
; no masked load instruction for i8 and i16 types??
masked_load(8, i8, 8, 1)
masked_load(8, i16, 16, 2)
masked_load(i8, 1)
masked_load(i16, 2)
declare <8 x float> @llvm.x86.avx.maskload.ps.256(i8 *, <8 x float> %mask)
declare <4 x double> @llvm.x86.avx.maskload.pd.256(i8 *, <4 x double> %mask)
declare <8 x float> @llvm.x86.avx.maskload.ps.256(i8 *, <8 x MfORi32> %mask)
declare <4 x double> @llvm.x86.avx.maskload.pd.256(i8 *, <4 x MdORi64> %mask)
define <8 x i32> @__masked_load_32(i8 *, <8 x i32> %mask) nounwind alwaysinline {
%floatmask = bitcast <8 x i32> %mask to <8 x float>
%floatval = call <8 x float> @llvm.x86.avx.maskload.ps.256(i8 * %0, <8 x float> %floatmask)
define <8 x i32> @__masked_load_i32(i8 *, <8 x i32> %mask) nounwind alwaysinline {
%floatmask = bitcast <8 x i32> %mask to <8 x MfORi32>
%floatval = call <8 x float> @llvm.x86.avx.maskload.ps.256(i8 * %0, <8 x MfORi32> %floatmask)
%retval = bitcast <8 x float> %floatval to <8 x i32>
ret <8 x i32> %retval
}
define <8 x i64> @__masked_load_64(i8 *, <8 x i32> %mask) nounwind alwaysinline {
define <8 x i64> @__masked_load_i64(i8 *, <8 x i32> %mask) nounwind alwaysinline {
; double up masks, bitcast to doubles
%mask0 = shufflevector <8 x i32> %mask, <8 x i32> undef,
<8 x i32> <i32 0, i32 0, i32 1, i32 1, i32 2, i32 2, i32 3, i32 3>
%mask1 = shufflevector <8 x i32> %mask, <8 x i32> undef,
<8 x i32> <i32 4, i32 4, i32 5, i32 5, i32 6, i32 6, i32 7, i32 7>
%mask0d = bitcast <8 x i32> %mask0 to <4 x double>
%mask1d = bitcast <8 x i32> %mask1 to <4 x double>
%mask0d = bitcast <8 x i32> %mask0 to <4 x MdORi64>
%mask1d = bitcast <8 x i32> %mask1 to <4 x MdORi64>
%val0d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %0, <4 x double> %mask0d)
%ptr1 = getelementptr i8 * %0, i32 32
%val1d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %ptr1, <4 x double> %mask1d)
%val0d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %0, <4 x MdORi64> %mask0d)
%ptr1 = getelementptr PTR_OP_ARGS(`i8') %0, i32 32
%val1d = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %ptr1, <4 x MdORi64> %mask1d)
%vald = shufflevector <4 x double> %val0d, <4 x double> %val1d,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
@@ -399,31 +429,29 @@ define <8 x i64> @__masked_load_64(i8 *, <8 x i32> %mask) nounwind alwaysinline
ret <8 x i64> %val
}
masked_load_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
; FIXME: there is no AVX instruction for these, but we could be clever
; by packing the bits down and setting the last 3/4 or half, respectively,
; of the mask to zero... Not sure if this would be a win in the end
gen_masked_store(8, i8, 8)
gen_masked_store(8, i16, 16)
gen_masked_store(i8)
gen_masked_store(i16)
; note that mask is the 2nd parameter, not the 3rd one!!
declare void @llvm.x86.avx.maskstore.ps.256(i8 *, <8 x float>, <8 x float>)
declare void @llvm.x86.avx.maskstore.pd.256(i8 *, <4 x double>, <4 x double>)
declare void @llvm.x86.avx.maskstore.ps.256(i8 *, <8 x MfORi32>, <8 x float>)
declare void @llvm.x86.avx.maskstore.pd.256(i8 *, <4 x MdORi64>, <4 x double>)
define void @__masked_store_32(<8 x i32>* nocapture, <8 x i32>,
<8 x i32>) nounwind alwaysinline {
define void @__masked_store_i32(<8 x i32>* nocapture, <8 x i32>,
<8 x i32>) nounwind alwaysinline {
%ptr = bitcast <8 x i32> * %0 to i8 *
%val = bitcast <8 x i32> %1 to <8 x float>
%mask = bitcast <8 x i32> %2 to <8 x float>
call void @llvm.x86.avx.maskstore.ps.256(i8 * %ptr, <8 x float> %mask, <8 x float> %val)
%mask = bitcast <8 x i32> %2 to <8 x MfORi32>
call void @llvm.x86.avx.maskstore.ps.256(i8 * %ptr, <8 x MfORi32> %mask, <8 x float> %val)
ret void
}
define void @__masked_store_64(<8 x i64>* nocapture, <8 x i64>,
<8 x i32> %mask) nounwind alwaysinline {
define void @__masked_store_i64(<8 x i64>* nocapture, <8 x i64>,
<8 x i32> %mask) nounwind alwaysinline {
%ptr = bitcast <8 x i64> * %0 to i8 *
%val = bitcast <8 x i64> %1 to <8 x double>
@@ -432,31 +460,34 @@ define void @__masked_store_64(<8 x i64>* nocapture, <8 x i64>,
%mask1 = shufflevector <8 x i32> %mask, <8 x i32> undef,
<8 x i32> <i32 4, i32 4, i32 5, i32 5, i32 6, i32 6, i32 7, i32 7>
%mask0d = bitcast <8 x i32> %mask0 to <4 x double>
%mask1d = bitcast <8 x i32> %mask1 to <4 x double>
%mask0d = bitcast <8 x i32> %mask0 to <4 x MdORi64>
%mask1d = bitcast <8 x i32> %mask1 to <4 x MdORi64>
%val0 = shufflevector <8 x double> %val, <8 x double> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%val1 = shufflevector <8 x double> %val, <8 x double> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr, <4 x double> %mask0d, <4 x double> %val0)
%ptr1 = getelementptr i8 * %ptr, i32 32
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr1, <4 x double> %mask1d, <4 x double> %val1)
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr, <4 x MdORi64> %mask0d, <4 x double> %val0)
%ptr1 = getelementptr PTR_OP_ARGS(`i8') %ptr, i32 32
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr1, <4 x MdORi64> %mask1d, <4 x double> %val1)
ret void
}
masked_store_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store blend
masked_store_blend_8_16_by_8()
declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>,
<8 x float>) nounwind readnone
define void @__masked_store_blend_32(<8 x i32>* nocapture, <8 x i32>,
<8 x i32>) nounwind alwaysinline {
define void @__masked_store_blend_i32(<8 x i32>* nocapture, <8 x i32>,
<8 x i32>) nounwind alwaysinline {
%mask_as_float = bitcast <8 x i32> %2 to <8 x float>
%oldValue = load <8 x i32>* %0, align 4
%oldValue = load PTR_OP_ARGS(`<8 x i32>') %0, align 4
%oldAsFloat = bitcast <8 x i32> %oldValue to <8 x float>
%newAsFloat = bitcast <8 x i32> %1 to <8 x float>
%blend = call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %oldAsFloat,
@@ -468,9 +499,9 @@ define void @__masked_store_blend_32(<8 x i32>* nocapture, <8 x i32>,
}
define void @__masked_store_blend_64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
<8 x i32> %i32mask) nounwind alwaysinline {
%oldValue = load <8 x i64>* %ptr, align 8
define void @__masked_store_blend_i64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
<8 x i32> %i32mask) nounwind alwaysinline {
%oldValue = load PTR_OP_ARGS(`<8 x i64>') %ptr, align 8
%mask = bitcast <8 x i32> %i32mask to <8 x float>
; Do 4x64-bit blends by doing two <8 x i32> blends, where the <8 x i32> values
@@ -518,44 +549,21 @@ define void @__masked_store_blend_64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
ret void
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; scatter
gen_scatter(i8)
gen_scatter(i16)
gen_scatter(i32)
gen_scatter(float)
gen_scatter(i64)
gen_scatter(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
;; reciprocals in double precision, if supported
gen_gather(8, i8)
gen_gather(8, i16)
gen_gather(8, i32)
gen_gather(8, i64)
gen_scatter(8, i8)
gen_scatter(8, i16)
gen_scatter(8, i32)
gen_scatter(8, i64)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
declare <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double>) nounwind readnone
define <8 x double> @__sqrt_varying_double(<8 x double>) nounwind alwaysinline {
unary4to8(ret, double, @llvm.x86.avx.sqrt.pd.256, %0)
ret <8 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
declare <4 x double> @llvm.x86.avx.max.pd.256(<4 x double>, <4 x double>) nounwind readnone
declare <4 x double> @llvm.x86.avx.min.pd.256(<4 x double>, <4 x double>) nounwind readnone
define <8 x double> @__min_varying_double(<8 x double>, <8 x double>) nounwind readnone alwaysinline {
binary4to8(ret, double, @llvm.x86.avx.min.pd.256, %0, %1)
ret <8 x double> %ret
}
define <8 x double> @__max_varying_double(<8 x double>, <8 x double>) nounwind readnone alwaysinline {
binary4to8(ret, double, @llvm.x86.avx.max.pd.256, %0, %1)
ret <8 x double> %ret
}
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()


@@ -0,0 +1,81 @@
;; Copyright (c) 2013, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
include(`target-avx1-i64x4base.ll')
rdrand_decls()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
define <4 x i32> @__min_varying_int32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%call = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %call
}
define <4 x i32> @__max_varying_int32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%call = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
define <4 x i32> @__min_varying_uint32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%call = call <4 x i32> @llvm.x86.sse41.pminud(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %call
}
define <4 x i32> @__max_varying_uint32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%call = call <4 x i32> @llvm.x86.sse41.pmaxud(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
ifelse(NO_HALF_DECLARES, `1', `', `
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
gen_gather_factored(i8)
gen_gather_factored(i16)
gen_gather_factored(i32)
gen_gather_factored(float)
gen_gather_factored(i64)
gen_gather_factored(double)


@@ -0,0 +1,519 @@
;; Copyright (c) 2013-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Basic 4-wide definitions
define(`WIDTH',`4')
define(`MASK',`i64')
include(`util.m4')
stdlib_core()
packed_load_and_store()
scans()
int64minmax()
saturation_arithmetic()
include(`target-avx-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
;; sse intrinsic
declare <4 x float> @llvm.x86.sse.rcp.ps(<4 x float>) nounwind readnone
define <4 x float> @__rcp_varying_float(<4 x float>) nounwind readonly alwaysinline {
; float iv = __rcp_v(v);
; return iv * (2. - v * iv);
%call = call <4 x float> @llvm.x86.sse.rcp.ps(<4 x float> %0)
; do one N-R iteration
%v_iv = fmul <4 x float> %0, %call
%two_minus = fsub <4 x float> <float 2., float 2., float 2., float 2.>, %v_iv
%iv_mul = fmul <4 x float> %call, %two_minus
ret <4 x float> %iv_mul
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding floats
;; sse intrinsic
declare <4 x float> @llvm.x86.sse41.round.ps(<4 x float>, i32) nounwind readnone
define <4 x float> @__round_varying_float(<4 x float>) nounwind readonly alwaysinline {
; roundps, round mode nearest 0b00 | don't signal precision exceptions 0b1000 = 8
%call = call <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %0, i32 8)
ret <4 x float> %call
}
define <4 x float> @__floor_varying_float(<4 x float>) nounwind readonly alwaysinline {
; roundps, round down 0b01 | don't signal precision exceptions 0b1001 = 9
%call = call <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %0, i32 9)
ret <4 x float> %call
}
define <4 x float> @__ceil_varying_float(<4 x float>) nounwind readonly alwaysinline {
; roundps, round up 0b10 | don't signal precision exceptions 0b1010 = 10
%call = call <4 x float> @llvm.x86.sse41.round.ps(<4 x float> %0, i32 10)
ret <4 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding doubles
;; avx intrinsic
declare <4 x double> @llvm.x86.avx.round.pd.256(<4 x double>, i32) nounwind readnone
define <4 x double> @__round_varying_double(<4 x double>) nounwind readonly alwaysinline {
%call = call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %0, i32 8)
ret <4 x double> %call
}
define <4 x double> @__floor_varying_double(<4 x double>) nounwind readonly alwaysinline {
; roundpd, round down 0b01 | don't signal precision exceptions 0b1001 = 9
%call = call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %0, i32 9)
ret <4 x double> %call
}
define <4 x double> @__ceil_varying_double(<4 x double>) nounwind readonly alwaysinline {
; roundpd, round up 0b10 | don't signal precision exceptions 0b1010 = 10
%call = call <4 x double> @llvm.x86.avx.round.pd.256(<4 x double> %0, i32 10)
ret <4 x double> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rsqrt
;; sse intrinsic
declare <4 x float> @llvm.x86.sse.rsqrt.ps(<4 x float>) nounwind readnone
define <4 x float> @__rsqrt_varying_float(<4 x float> %v) nounwind readonly alwaysinline {
; float is = __rsqrt_v(v);
%is = call <4 x float> @llvm.x86.sse.rsqrt.ps(<4 x float> %v)
; Newton-Raphson iteration to improve precision
; return 0.5 * is * (3. - (v * is) * is);
%v_is = fmul <4 x float> %v, %is
%v_is_is = fmul <4 x float> %v_is, %is
%three_sub = fsub <4 x float> <float 3., float 3., float 3., float 3.>, %v_is_is
%is_mul = fmul <4 x float> %is, %three_sub
%half_scale = fmul <4 x float> <float 0.5, float 0.5, float 0.5, float 0.5>, %is_mul
ret <4 x float> %half_scale
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; sqrt
;; sse intrinsic
declare <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float>) nounwind readnone
define <4 x float> @__sqrt_varying_float(<4 x float>) nounwind readonly alwaysinline {
%call = call <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float> %0)
ret <4 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
;; avx intrinsic
declare <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double>) nounwind readnone
define <4 x double> @__sqrt_varying_double(<4 x double>) nounwind alwaysinline {
%call = call <4 x double> @llvm.x86.avx.sqrt.pd.256(<4 x double> %0)
ret <4 x double> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; svml
include(`svml.m4')
;; single precision
svml_declare(float,f4,4)
svml_define(float,f4,4,f)
;; double precision
svml_declare(double,4,4)
svml_define(double,4,4,d)
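; Editor's note: svml_declare/svml_define come from svml.m4; they presumably
; declare the SVML entrypoints and define the __svml_* wrappers at the target
; width, replacing the hand-written declarations this diff removes
; (svml_define_x additionally splits a wider vector across narrower calls).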
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float min/max
;; sse intrinsics
declare <4 x float> @llvm.x86.sse.max.ps(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @llvm.x86.sse.min.ps(<4 x float>, <4 x float>) nounwind readnone
define <4 x float> @__max_varying_float(<4 x float>, <4 x float>) nounwind readonly alwaysinline {
%call = call <4 x float> @llvm.x86.sse.max.ps(<4 x float> %0, <4 x float> %1)
ret <4 x float> %call
}
define <4 x float> @__min_varying_float(<4 x float>, <4 x float>) nounwind readonly alwaysinline {
%call = call <4 x float> @llvm.x86.sse.min.ps(<4 x float> %0, <4 x float> %1)
ret <4 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; horizontal ops
;; sse intrinsic
declare i32 @llvm.x86.avx.movmsk.pd.256(<4 x double>) nounwind readnone
define i64 @__movmsk(<4 x i64>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i64> %0 to <4 x double>
%v = call i32 @llvm.x86.avx.movmsk.pd.256(<4 x double> %floatmask) nounwind readnone
%v64 = zext i32 %v to i64
ret i64 %v64
}
define i1 @__any(<4 x i64>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i64> %0 to <4 x double>
%v = call i32 @llvm.x86.avx.movmsk.pd.256(<4 x double> %floatmask) nounwind readnone
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<4 x i64>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i64> %0 to <4 x double>
%v = call i32 @llvm.x86.avx.movmsk.pd.256(<4 x double> %floatmask) nounwind readnone
%cmp = icmp eq i32 %v, 15
ret i1 %cmp
}
define i1 @__none(<4 x i64>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i64> %0 to <4 x double>
%v = call i32 @llvm.x86.avx.movmsk.pd.256(<4 x double> %floatmask) nounwind readnone
%cmp = icmp eq i32 %v, 0
ret i1 %cmp
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal float ops
;; sse intrinsic
declare <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float>, <4 x float>) nounwind readnone
define float @__reduce_add_float(<4 x float>) nounwind readonly alwaysinline {
%v1 = call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %0, <4 x float> %0)
%v2 = call <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float> %v1, <4 x float> %v1)
%scalar = extractelement <4 x float> %v2, i32 0
ret float %scalar
}
define float @__reduce_min_float(<4 x float>) nounwind readnone {
reduce4(float, @__min_varying_float, @__min_uniform_float)
}
define float @__reduce_max_float(<4 x float>) nounwind readnone {
reduce4(float, @__max_varying_float, @__max_uniform_float)
}
reduce_equal(4)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int8 ops
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<4 x i8>) nounwind readnone alwaysinline {
%wide8 = shufflevector <4 x i8> %0, <4 x i8> zeroinitializer,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 4, i32 4, i32 4,
i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %wide8,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int16 ops
define internal <4 x i16> @__add_varying_i16(<4 x i16>,
<4 x i16>) nounwind readnone alwaysinline {
%r = add <4 x i16> %0, %1
ret <4 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<4 x i16>) nounwind readnone alwaysinline {
reduce4(i16, @__add_varying_i16, @__add_uniform_i16)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int32 ops
define <4 x i32> @__add_varying_int32(<4 x i32>,
<4 x i32>) nounwind readnone alwaysinline {
%s = add <4 x i32> %0, %1
ret <4 x i32> %s
}
define i32 @__add_uniform_int32(i32, i32) nounwind readnone alwaysinline {
%s = add i32 %0, %1
ret i32 %s
}
define i32 @__reduce_add_int32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__add_varying_int32, @__add_uniform_int32)
}
define i32 @__reduce_min_int32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__min_varying_int32, @__min_uniform_int32)
}
define i32 @__reduce_max_int32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__max_varying_int32, @__max_uniform_int32)
}
define i32 @__reduce_min_uint32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
define i32 @__reduce_max_uint32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__max_varying_uint32, @__max_uniform_uint32)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal double ops
declare <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double>, <4 x double>) nounwind readnone
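;; haddpd adds pairs within each 128-bit half of its operands, so after two
;; rounds the two remaining partial sums live in elements 0 and 2.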
define double @__reduce_add_double(<4 x double>) nounwind readonly alwaysinline {
%v0 = shufflevector <4 x double> %0, <4 x double> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%v1 = shufflevector <4 x double> <double 0.,double 0.,double 0.,double 0.>, <4 x double> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
;; i.e. %v1 is simply the zero vector <double 0., double 0., double 0., double 0.>
%sum0 = call <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double> %v0, <4 x double> %v1)
%sum1 = call <4 x double> @llvm.x86.avx.hadd.pd.256(<4 x double> %sum0, <4 x double> %sum0)
%final0 = extractelement <4 x double> %sum1, i32 0
%final1 = extractelement <4 x double> %sum1, i32 2
%sum = fadd double %final0, %final1
ret double %sum
}
define double @__reduce_min_double(<4 x double>) nounwind readnone alwaysinline {
reduce4(double, @__min_varying_double, @__min_uniform_double)
}
define double @__reduce_max_double(<4 x double>) nounwind readnone alwaysinline {
reduce4(double, @__max_varying_double, @__max_uniform_double)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int64 ops
define <4 x i64> @__add_varying_int64(<4 x i64>,
<4 x i64>) nounwind readnone alwaysinline {
%s = add <4 x i64> %0, %1
ret <4 x i64> %s
}
define i64 @__add_uniform_int64(i64, i64) nounwind readnone alwaysinline {
%s = add i64 %0, %1
ret i64 %s
}
define i64 @__reduce_add_int64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__add_varying_int64, @__add_uniform_int64)
}
define i64 @__reduce_min_int64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__min_varying_int64, @__min_uniform_int64)
}
define i64 @__reduce_max_int64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__max_varying_int64, @__max_uniform_int64)
}
define i64 @__reduce_min_uint64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
define i64 @__reduce_max_uint64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__max_varying_uint64, @__max_uniform_uint64)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
; there are no masked load instructions for the i8 and i16 types, so use the
; generic implementations
masked_load(i8, 1)
masked_load(i16, 2)
;; avx intrinsics
declare <4 x float> @llvm.x86.avx.maskload.ps(i8 *, <4 x MfORi32> %mask)
declare <4 x double> @llvm.x86.avx.maskload.pd.256(i8 *, <4 x MdORi64> %mask)
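;; MfORi32 and MdORi64 are m4 placeholders for the mask element type of the
;; AVX maskload/maskstore intrinsics, which changed from float/double to
;; i32/i64 between LLVM versions.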
define <4 x i32> @__masked_load_i32(i8 *, <4 x i64> %mask64) nounwind alwaysinline {
%mask = trunc <4 x i64> %mask64 to <4 x i32>
%floatmask = bitcast <4 x i32> %mask to <4 x MfORi32>
%floatval = call <4 x float> @llvm.x86.avx.maskload.ps(i8 * %0, <4 x MfORi32> %floatmask)
%retval = bitcast <4 x float> %floatval to <4 x i32>
ret <4 x i32> %retval
}
define <4 x i64> @__masked_load_i64(i8 *, <4 x i64> %mask) nounwind alwaysinline {
%doublemask = bitcast <4 x i64> %mask to <4 x MdORi64>
%doubleval = call <4 x double> @llvm.x86.avx.maskload.pd.256(i8 * %0, <4 x MdORi64> %doublemask)
%retval = bitcast <4 x double> %doubleval to <4 x i64>
ret <4 x i64> %retval
}
masked_load_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
gen_masked_store(i8)
gen_masked_store(i16)
; note that the mask is the 2nd parameter, not the 3rd one!
;; avx intrinsics
declare void @llvm.x86.avx.maskstore.ps (i8 *, <4 x MfORi32>, <4 x float>)
declare void @llvm.x86.avx.maskstore.pd.256(i8 *, <4 x MdORi64>, <4 x double>)
define void @__masked_store_i32(<4 x i32>* nocapture, <4 x i32>,
<4 x i64>) nounwind alwaysinline {
%mask32 = trunc <4 x i64> %2 to <4 x i32>
%ptr = bitcast <4 x i32> * %0 to i8 *
%val = bitcast <4 x i32> %1 to <4 x float>
%mask = bitcast <4 x i32> %mask32 to <4 x MfORi32>
call void @llvm.x86.avx.maskstore.ps(i8 * %ptr, <4 x MfORi32> %mask, <4 x float> %val)
ret void
}
define void @__masked_store_i64(<4 x i64>* nocapture, <4 x i64>,
<4 x i64>) nounwind alwaysinline {
%ptr = bitcast <4 x i64> * %0 to i8 *
%val = bitcast <4 x i64> %1 to <4 x double>
%mask = bitcast <4 x i64> %2 to <4 x MdORi64>
call void @llvm.x86.avx.maskstore.pd.256(i8 * %ptr, <4 x MdORi64> %mask, <4 x double> %val)
ret void
}
masked_store_blend_8_16_by_4_mask64()
;; sse intrinsic
declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>,
<4 x float>) nounwind readnone
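;; blendvps selects per lane based on the sign bit of its third operand: a
;; set mask bit takes the new value, a clear one keeps the old value.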
define void @__masked_store_blend_i32(<4 x i32>* nocapture, <4 x i32>,
<4 x i64>) nounwind alwaysinline {
%mask = trunc <4 x i64> %2 to <4 x i32>
%mask_as_float = bitcast <4 x i32> %mask to <4 x float>
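; PTR_OP_ARGS is an m4 helper that emits the load's operand types with or
; without an explicit pointee type, depending on the LLVM version targeted.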
%oldValue = load PTR_OP_ARGS(` <4 x i32>') %0, align 4
%oldAsFloat = bitcast <4 x i32> %oldValue to <4 x float>
%newAsFloat = bitcast <4 x i32> %1 to <4 x float>
%blend = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %oldAsFloat,
<4 x float> %newAsFloat,
<4 x float> %mask_as_float)
%blendAsInt = bitcast <4 x float> %blend to <4 x i32>
store <4 x i32> %blendAsInt, <4 x i32>* %0, align 4
ret void
}
;; avx intrinsic
declare <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double>, <4 x double>,
<4 x double>) nounwind readnone
define void @__masked_store_blend_i64(<4 x i64>* nocapture , <4 x i64>,
<4 x i64>) nounwind alwaysinline {
%mask_as_double = bitcast <4 x i64> %2 to <4 x double>
%oldValue = load PTR_OP_ARGS(` <4 x i64>') %0, align 4
%oldAsDouble = bitcast <4 x i64> %oldValue to <4 x double>
%newAsDouble = bitcast <4 x i64> %1 to <4 x double>
%blend = call <4 x double> @llvm.x86.avx.blendv.pd.256(<4 x double> %oldAsDouble,
<4 x double> %newAsDouble,
<4 x double> %mask_as_double)
%blendAsInt = bitcast <4 x double> %blend to <4 x i64>
store <4 x i64> %blendAsInt, <4 x i64>* %0, align 4
ret void
}
masked_store_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; scatter
gen_scatter(i8)
gen_scatter(i16)
gen_scatter(i32)
gen_scatter(float)
gen_scatter(i64)
gen_scatter(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
declare <4 x double> @llvm.x86.avx.max.pd.256(<4 x double>, <4 x double>) nounwind readnone
declare <4 x double> @llvm.x86.avx.min.pd.256(<4 x double>, <4 x double>) nounwind readnone
define <4 x double> @__min_varying_double(<4 x double>, <4 x double>) nounwind readnone alwaysinline {
%call = call <4 x double> @llvm.x86.avx.min.pd.256(<4 x double> %0, <4 x double> %1)
ret <4 x double> %call
}
define <4 x double> @__max_varying_double(<4 x double>, <4 x double>) nounwind readnone alwaysinline {
%call = call <4 x double> @llvm.x86.avx.max.pd.256(<4 x double> %0, <4 x double> %1)
ret <4 x double> %call
}
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()

@@ -0,0 +1,81 @@
;; Copyright (c) 2010-2012, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
include(`target-avx-x2.ll')
rdrand_decls()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
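;; binary4to16 is an m4 helper: it splits the two 16-wide operands into
;; 4-wide quarters, applies the given 4-wide intrinsic to each pair, and
;; reassembles the results into %ret.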
define <16 x i32> @__min_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pminsd, %0, %1)
ret <16 x i32> %ret
}
define <16 x i32> @__max_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
ret <16 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
define <16 x i32> @__min_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <16 x i32> %ret
}
define <16 x i32> @__max_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <16 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
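;; unless the including target sets NO_HALF_DECLARES to 1, this ifelse emits
;; declarations for the half conversion helpers; with it set, the ifelse
;; expands to nothing and the target supplies definitions itself.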
ifelse(NO_HALF_DECLARES, `1', `', `
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
gen_gather_factored(i8)
gen_gather_factored(i16)
gen_gather_factored(i32)
gen_gather_factored(float)
gen_gather_factored(i64)
gen_gather_factored(double)

builtins/target-avx1.ll
@@ -0,0 +1,82 @@
;; Copyright (c) 2010-2013, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
include(`target-avx.ll')
rdrand_decls()
saturation_arithmetic()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
define <8 x i32> @__min_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pminsd, %0, %1)
ret <8 x i32> %ret
}
define <8 x i32> @__max_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
ret <8 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
define <8 x i32> @__min_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <8 x i32> %ret
}
define <8 x i32> @__max_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <8 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
ifelse(NO_HALF_DECLARES, `1', `', `
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
gen_gather_factored(i8)
gen_gather_factored(i16)
gen_gather_factored(i32)
gen_gather_factored(float)
gen_gather_factored(i64)
gen_gather_factored(double)

@@ -0,0 +1,119 @@
;; Copyright (c) 2013, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
include(`target-avx1-i64x4base.ll')
rdrand_definition()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
define <4 x i32> @__min_varying_int32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%m = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %m
}
define <4 x i32> @__max_varying_int32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%m = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %m
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
define <4 x i32> @__min_varying_uint32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%m = call <4 x i32> @llvm.x86.sse41.pminud(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %m
}
define <4 x i32> @__max_varying_uint32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%m = call <4 x i32> @llvm.x86.sse41.pmaxud(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %m
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
gen_gather(i8)
gen_gather(i16)
gen_gather(i32)
gen_gather(float)
gen_gather(i64)
gen_gather(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float/half conversions
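; $1: element type
; $2: input variable name
; $3: result variable name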
define(`expand_4to8', `
%$3 = shufflevector <4 x $1> %$2, <4 x $1> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
')
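; $1: element type
; $2: input variable name
; $3: result variable name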
define(`extract_4from8', `
%$3 = shufflevector <8 x $1> %$2, <8 x $1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
')
declare <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16>) nounwind readnone
; 0 is round nearest even
declare <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float>, i32) nounwind readnone
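;; the F16C conversion intrinsics are 8-wide, so this 4-wide target expands
;; its vectors to 8 lanes, converts, and extracts the low 4 results.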
define <4 x float> @__half_to_float_varying(<4 x i16> %v4) nounwind readnone {
expand_4to8(i16, v4, v)
%r = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %v)
extract_4from8(float, r, ret)
ret <4 x float> %ret
}
define <4 x i16> @__float_to_half_varying(<4 x float> %v4) nounwind readnone {
expand_4to8(float, v4, v)
%r = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %v, i32 0)
extract_4from8(i16, r, ret)
ret <4 x i16> %ret
}
define float @__half_to_float_uniform(i16 %v) nounwind readnone {
%v1 = bitcast i16 %v to <1 x i16>
%vv = shufflevector <1 x i16> %v1, <1 x i16> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%rv = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %vv)
%r = extractelement <8 x float> %rv, i32 0
ret float %r
}
define i16 @__float_to_half_uniform(float %v) nounwind readnone {
%v1 = bitcast float %v to <1 x float>
%vv = shufflevector <1 x float> %v1, <1 x float> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
; round to nearest even
%rv = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %vv, i32 0)
%r = extractelement <8 x i16> %rv, i32 0
ret i16 %r
}

builtins/target-avx11-x2.ll
@@ -0,0 +1,125 @@
;; Copyright (c) 2012-2013, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
include(`target-avx-x2.ll')
rdrand_definition()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
define <16 x i32> @__min_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pminsd, %0, %1)
ret <16 x i32> %ret
}
define <16 x i32> @__max_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
ret <16 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
define <16 x i32> @__min_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <16 x i32> %ret
}
define <16 x i32> @__max_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(ret, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <16 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
gen_gather(i8)
gen_gather(i16)
gen_gather(i32)
gen_gather(float)
gen_gather(i64)
gen_gather(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float/half conversions
declare <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16>) nounwind readnone
; 0 is round nearest even
declare <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float>, i32) nounwind readnone
define <16 x float> @__half_to_float_varying(<16 x i16> %v) nounwind readnone {
%r_0 = shufflevector <16 x i16> %v, <16 x i16> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%vr_0 = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %r_0)
%r_1 = shufflevector <16 x i16> %v, <16 x i16> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%vr_1 = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %r_1)
%r = shufflevector <8 x float> %vr_0, <8 x float> %vr_1,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x float> %r
}
define <16 x i16> @__float_to_half_varying(<16 x float> %v) nounwind readnone {
%r_0 = shufflevector <16 x float> %v, <16 x float> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%vr_0 = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %r_0, i32 0)
%r_1 = shufflevector <16 x float> %v, <16 x float> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%vr_1 = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %r_1, i32 0)
%r = shufflevector <8 x i16> %vr_0, <8 x i16> %vr_1,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i16> %r
}
define float @__half_to_float_uniform(i16 %v) nounwind readnone {
%v1 = bitcast i16 %v to <1 x i16>
%vv = shufflevector <1 x i16> %v1, <1 x i16> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%rv = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %vv)
%r = extractelement <8 x float> %rv, i32 0
ret float %r
}
define i16 @__float_to_half_uniform(float %v) nounwind readnone {
%v1 = bitcast float %v to <1 x float>
%vv = shufflevector <1 x float> %v1, <1 x float> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
; round to nearest even
%rv = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %vv, i32 0)
%r = extractelement <8 x i16> %rv, i32 0
ret i16 %r
}

builtins/target-avx11.ll
@@ -0,0 +1,110 @@
;; Copyright (c) 2012-2013, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
include(`target-avx.ll')
rdrand_definition()
saturation_arithmetic()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
define <8 x i32> @__min_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pminsd, %0, %1)
ret <8 x i32> %ret
}
define <8 x i32> @__max_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
ret <8 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
define <8 x i32> @__min_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <8 x i32> %ret
}
define <8 x i32> @__max_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(ret, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <8 x i32> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
gen_gather(i8)
gen_gather(i16)
gen_gather(i32)
gen_gather(float)
gen_gather(i64)
gen_gather(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float/half conversions
declare <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16>) nounwind readnone
; 0 is round nearest even
declare <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float>, i32) nounwind readnone
define <8 x float> @__half_to_float_varying(<8 x i16> %v) nounwind readnone {
%r = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %v)
ret <8 x float> %r
}
define <8 x i16> @__float_to_half_varying(<8 x float> %v) nounwind readnone {
%r = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %v, i32 0)
ret <8 x i16> %r
}
define float @__half_to_float_uniform(i16 %v) nounwind readnone {
%v1 = bitcast i16 %v to <1 x i16>
%vv = shufflevector <1 x i16> %v1, <1 x i16> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%rv = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %vv)
%r = extractelement <8 x float> %rv, i32 0
ret float %r
}
define i16 @__float_to_half_uniform(float %v) nounwind readnone {
%v1 = bitcast float %v to <1 x float>
%vv = shufflevector <1 x float> %v1, <1 x float> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
; round to nearest even
%rv = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %vv, i32 0)
%r = extractelement <8 x i16> %rv, i32 0
ret i16 %r
}

@@ -0,0 +1,342 @@
;; Copyright (c) 2013, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`HAVE_GATHER', `1')
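;; defining HAVE_GATHER before the include signals the base target file that
;; native gathers are provided below, so it should not emit generic ones.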
include(`target-avx1-i64x4base.ll')
rdrand_definition()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
;; declare <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32>, <4 x i32>) nounwind readnone
;; declare <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32>, <4 x i32>) nounwind readonly
define <4 x i32> @__min_varying_int32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%m = call <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %m
}
define <4 x i32> @__max_varying_int32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%m = call <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %m
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
;; declare <4 x i32> @llvm.x86.sse41.pminud(<4 x i32>, <4 x i32>) nounwind readonly
;; declare <4 x i32> @llvm.x86.sse41.pmaxud(<4 x i32>, <4 x i32>) nounwind readonly
define <4 x i32> @__min_varying_uint32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%m = call <4 x i32> @llvm.x86.sse41.pminud(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %m
}
define <4 x i32> @__max_varying_uint32(<4 x i32>, <4 x i32>) nounwind readonly alwaysinline {
%m = call <4 x i32> @llvm.x86.sse41.pmaxud(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %m
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float/half conversions
define(`expand_4to8', `
%$3 = shufflevector <4 x $1> %$2, <4 x $1> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
')
define(`extract_4from8', `
%$3 = shufflevector <8 x $1> %$2, <8 x $1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
')
declare <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16>) nounwind readnone
; 0 is round nearest even
declare <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float>, i32) nounwind readnone
define <4 x float> @__half_to_float_varying(<4 x i16> %v4) nounwind readnone {
expand_4to8(i16, v4, v)
%r = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %v)
extract_4from8(float, r, ret)
ret <4 x float> %ret
}
define <4 x i16> @__float_to_half_varying(<4 x float> %v4) nounwind readnone {
expand_4to8(float, v4, v)
%r = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %v, i32 0)
extract_4from8(i16, r, ret)
ret <4 x i16> %ret
}
define float @__half_to_float_uniform(i16 %v) nounwind readnone {
%v1 = bitcast i16 %v to <1 x i16>
%vv = shufflevector <1 x i16> %v1, <1 x i16> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%rv = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %vv)
%r = extractelement <8 x float> %rv, i32 0
ret float %r
}
define i16 @__float_to_half_uniform(float %v) nounwind readnone {
%v1 = bitcast float %v to <1 x float>
%vv = shufflevector <1 x float> %v1, <1 x float> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
; round to nearest even
%rv = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %vv, i32 0)
%r = extractelement <8 x i16> %rv, i32 0
ret i16 %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
declare void @llvm.trap() noreturn nounwind
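;; AVX2 has no gather instructions for 8- or 16-bit elements, so gen_gather
;; is used to emit generic per-lane implementations for these two types.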
gen_gather(i8)
gen_gather(i16)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int32 gathers
declare <4 x i32> @llvm.x86.avx2.gather.d.d(<4 x i32> %target, i8 * %ptr,
<4 x i32> %indices, <4 x i32> %mask, i8 %scale) readonly nounwind
declare <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> %target, i8 * %ptr,
<4 x i64> %indices, <4 x i32> %mask, i8 %scale) readonly nounwind
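;; in the intrinsic names, the first d/q letter is the index width (dword or
;; qword) and the second the element type; the i8 scale must be 1, 2, 4 or 8.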
define <4 x i32> @__gather_base_offsets32_i32(i8 * %ptr,
i32 %scale, <4 x i32> %offsets,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = trunc <4 x i64> %vecmask64 to <4 x i32>
%v = call <4 x i32> @llvm.x86.avx2.gather.d.d(<4 x i32> undef, i8 * %ptr,
<4 x i32> %offsets, <4 x i32> %vecmask, i8 %scale8)
ret <4 x i32> %v
}
define <4 x i32> @__gather_base_offsets64_i32(i8 * %ptr,
i32 %scale, <4 x i64> %offsets,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = trunc <4 x i64> %vecmask64 to <4 x i32>
%v = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * %ptr,
<4 x i64> %offsets, <4 x i32> %vecmask, i8 %scale8)
ret <4 x i32> %v
}
define <4 x i32> @__gather32_i32(<4 x i32> %ptrs,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%vecmask = trunc <4 x i64> %vecmask64 to <4 x i32>
%v = call <4 x i32> @llvm.x86.avx2.gather.d.d(<4 x i32> undef, i8 * null,
<4 x i32> %ptrs, <4 x i32> %vecmask, i8 1)
ret <4 x i32> %v
}
define <4 x i32> @__gather64_i32(<4 x i64> %ptrs,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%vecmask = trunc <4 x i64> %vecmask64 to <4 x i32>
%v = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * null,
<4 x i64> %ptrs, <4 x i32> %vecmask, i8 1)
ret <4 x i32> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float gathers
declare <4 x float> @llvm.x86.avx2.gather.d.ps(<4 x float> %target, i8 * %ptr,
<4 x i32> %indices, <4 x float> %mask, i8 %scale8) readonly nounwind
declare <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> %target, i8 * %ptr,
<4 x i64> %indices, <4 x float> %mask, i8 %scale8) readonly nounwind
define <4 x float> @__gather_base_offsets32_float(i8 * %ptr,
i32 %scale, <4 x i32> %offsets,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = trunc <4 x i64> %vecmask64 to <4 x i32>
%mask = bitcast <4 x i32> %vecmask to <4 x float>
%v = call <4 x float> @llvm.x86.avx2.gather.d.ps(<4 x float> undef, i8 * %ptr,
<4 x i32> %offsets, <4 x float> %mask, i8 %scale8)
ret <4 x float> %v
}
define <4 x float> @__gather_base_offsets64_float(i8 * %ptr,
i32 %scale, <4 x i64> %offsets,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = trunc <4 x i64> %vecmask64 to <4 x i32>
%mask = bitcast <4 x i32> %vecmask to <4 x float>
%v = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * %ptr,
<4 x i64> %offsets, <4 x float> %mask, i8 %scale8)
ret <4 x float> %v
}
define <4 x float> @__gather32_float(<4 x i32> %ptrs,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%vecmask = trunc <4 x i64> %vecmask64 to <4 x i32>
%mask = bitcast <4 x i32> %vecmask to <4 x float>
%v = call <4 x float> @llvm.x86.avx2.gather.d.ps(<4 x float> undef, i8 * null,
<4 x i32> %ptrs, <4 x float> %mask, i8 1)
ret <4 x float> %v
}
define <4 x float> @__gather64_float(<4 x i64> %ptrs,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%vecmask = trunc <4 x i64> %vecmask64 to <4 x i32>
%mask = bitcast <4 x i32> %vecmask to <4 x float>
%v = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * null,
<4 x i64> %ptrs, <4 x float> %mask, i8 1)
ret <4 x float> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int64 gathers
declare <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> %target, i8 * %ptr,
<4 x i32> %indices, <4 x i64> %mask, i8 %scale) readonly nounwind
declare <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> %target, i8 * %ptr,
<4 x i64> %indices, <4 x i64> %mask, i8 %scale) readonly nounwind
define <4 x i64> @__gather_base_offsets32_i64(i8 * %ptr,
i32 %scale, <4 x i32> %offsets,
<4 x i64> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%v = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i32> %offsets, <4 x i64> %vecmask, i8 %scale8)
ret <4 x i64> %v
}
define <4 x i64> @__gather_base_offsets64_i64(i8 * %ptr,
i32 %scale, <4 x i64> %offsets,
<4 x i64> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%v = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i64> %offsets, <4 x i64> %vecmask, i8 %scale8)
ret <4 x i64> %v
}
define <4 x i64> @__gather32_i64(<4 x i32> %ptrs,
<4 x i64> %vecmask) nounwind readonly alwaysinline {
%v = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * null,
<4 x i32> %ptrs, <4 x i64> %vecmask, i8 1)
ret <4 x i64> %v
}
define <4 x i64> @__gather64_i64(<4 x i64> %ptrs,
<4 x i64> %vecmask) nounwind readonly alwaysinline {
%v = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * null,
<4 x i64> %ptrs, <4 x i64> %vecmask, i8 1)
ret <4 x i64> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double gathers
declare <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> %target, i8 * %ptr,
<4 x i64> %indices, <4 x double> %mask, i8 %scale) readonly nounwind
declare <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> %target, i8 * %ptr,
<4 x i32> %indices, <4 x double> %mask, i8 %scale) readonly nounwind
define <4 x double> @__gather_base_offsets32_double(i8 * %ptr,
i32 %scale, <4 x i32> %offsets,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = bitcast <4 x i64> %vecmask64 to <4 x double>
%v = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i32> %offsets, <4 x double> %vecmask, i8 %scale8)
ret <4 x double> %v
}
define <4 x double> @__gather_base_offsets64_double(i8 * %ptr,
i32 %scale, <4 x i64> %offsets,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = bitcast <4 x i64> %vecmask64 to <4 x double>
%v = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i64> %offsets, <4 x double> %vecmask, i8 %scale8)
ret <4 x double> %v
}
define <4 x double> @__gather32_double(<4 x i32> %ptrs,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%vecmask = bitcast <4 x i64> %vecmask64 to <4 x double>
%v = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * null,
<4 x i32> %ptrs, <4 x double> %vecmask, i8 1)
ret <4 x double> %v
}
define <4 x double> @__gather64_double(<4 x i64> %ptrs,
<4 x i64> %vecmask64) nounwind readonly alwaysinline {
%vecmask = bitcast <4 x i64> %vecmask64 to <4 x double>
%v = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * null,
<4 x i64> %ptrs, <4 x double> %vecmask, i8 1)
ret <4 x double> %v
}

builtins/target-avx2-x2.ll
@@ -0,0 +1,538 @@
;; Copyright (c) 2010-2013, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`HAVE_GATHER', `1')
include(`target-avx-x2.ll')
rdrand_definition()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
declare <8 x i32> @llvm.x86.avx2.pmins.d(<8 x i32>, <8 x i32>) nounwind readonly
declare <8 x i32> @llvm.x86.avx2.pmaxs.d(<8 x i32>, <8 x i32>) nounwind readonly
define <16 x i32> @__min_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary8to16(m, i32, @llvm.x86.avx2.pmins.d, %0, %1)
ret <16 x i32> %m
}
define <16 x i32> @__max_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary8to16(m, i32, @llvm.x86.avx2.pmaxs.d, %0, %1)
ret <16 x i32> %m
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
declare <8 x i32> @llvm.x86.avx2.pminu.d(<8 x i32>, <8 x i32>) nounwind readonly
declare <8 x i32> @llvm.x86.avx2.pmaxu.d(<8 x i32>, <8 x i32>) nounwind readonly
define <16 x i32> @__min_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary8to16(m, i32, @llvm.x86.avx2.pminu.d, %0, %1)
ret <16 x i32> %m
}
define <16 x i32> @__max_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary8to16(m, i32, @llvm.x86.avx2.pmaxu.d, %0, %1)
ret <16 x i32> %m
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float/half conversions
declare <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16>) nounwind readnone
; 0 is round nearest even
declare <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float>, i32) nounwind readnone
define <16 x float> @__half_to_float_varying(<16 x i16> %v) nounwind readnone {
%r_0 = shufflevector <16 x i16> %v, <16 x i16> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%vr_0 = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %r_0)
%r_1 = shufflevector <16 x i16> %v, <16 x i16> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%vr_1 = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %r_1)
%r = shufflevector <8 x float> %vr_0, <8 x float> %vr_1,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x float> %r
}
define <16 x i16> @__float_to_half_varying(<16 x float> %v) nounwind readnone {
%r_0 = shufflevector <16 x float> %v, <16 x float> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%vr_0 = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %r_0, i32 0)
%r_1 = shufflevector <16 x float> %v, <16 x float> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
%vr_1 = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %r_1, i32 0)
%r = shufflevector <8 x i16> %vr_0, <8 x i16> %vr_1,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x i16> %r
}
define float @__half_to_float_uniform(i16 %v) nounwind readnone {
%v1 = bitcast i16 %v to <1 x i16>
%vv = shufflevector <1 x i16> %v1, <1 x i16> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%rv = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %vv)
%r = extractelement <8 x float> %rv, i32 0
ret float %r
}
define i16 @__float_to_half_uniform(float %v) nounwind readnone {
%v1 = bitcast float %v to <1 x float>
%vv = shufflevector <1 x float> %v1, <1 x float> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
; round to nearest even
%rv = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %vv, i32 0)
%r = extractelement <8 x i16> %rv, i32 0
ret i16 %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
declare void @llvm.trap() noreturn nounwind
; $1: type
; $2: var base name
define(`extract_4s', `
%$2_1 = shufflevector <16 x $1> %$2, <16 x $1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%$2_2 = shufflevector <16 x $1> %$2, <16 x $1> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
%$2_3 = shufflevector <16 x $1> %$2, <16 x $1> undef, <4 x i32> <i32 8, i32 9, i32 10, i32 11>
%$2_4 = shufflevector <16 x $1> %$2, <16 x $1> undef, <4 x i32> <i32 12, i32 13, i32 14, i32 15>
')
; $1: type
; $2: var base name
define(`extract_8s', `
%$2_1 = shufflevector <16 x $1> %$2, <16 x $1> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%$2_2 = shufflevector <16 x $1> %$2, <16 x $1> undef,
<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
')
; $1: element type
; $2: ret name
; $3: v1
; $4: v2
define(`assemble_8s', `
%$2 = shufflevector <8 x $1> %$3, <8 x $1> %$4,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
')
; $1: element type
; $2: ret name
; $3: v1
; $4: v2
; $5: v3
; $6: v4
define(`assemble_4s', `
%$2_1 = shufflevector <4 x $1> %$3, <4 x $1> %$4,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%$2_2 = shufflevector <4 x $1> %$5, <4 x $1> %$6,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
assemble_8s($1, $2, $2_1, $2_2)
')
gen_gather(i8)
gen_gather(i16)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int32 gathers
declare <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> %target, i8 * %ptr,
<8 x i32> %indices, <8 x i32> %mask, i8 %scale) readonly nounwind
declare <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> %target, i8 * %ptr,
<4 x i64> %indices, <4 x i32> %mask, i8 %scale) readonly nounwind
define <16 x i32> @__gather_base_offsets32_i32(i8 * %ptr, i32 %scale, <16 x i32> %offsets,
<16 x i32> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
extract_8s(i32, offsets)
extract_8s(i32, vecmask)
%v1 = call <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> undef, i8 * %ptr,
<8 x i32> %offsets_1, <8 x i32> %vecmask_1, i8 %scale8)
%v2 = call <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> undef, i8 * %ptr,
<8 x i32> %offsets_2, <8 x i32> %vecmask_2, i8 %scale8)
assemble_8s(i32, v, v1, v2)
ret <16 x i32> %v
}
define <16 x i32> @__gather_base_offsets64_i32(i8 * %ptr,
i32 %scale, <16 x i64> %offsets,
<16 x i32> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
extract_4s(i32, vecmask)
extract_4s(i64, offsets)
%v1 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * %ptr,
<4 x i64> %offsets_1, <4 x i32> %vecmask_1, i8 %scale8)
%v2 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * %ptr,
<4 x i64> %offsets_2, <4 x i32> %vecmask_2, i8 %scale8)
%v3 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * %ptr,
<4 x i64> %offsets_3, <4 x i32> %vecmask_3, i8 %scale8)
%v4 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * %ptr,
<4 x i64> %offsets_4, <4 x i32> %vecmask_4, i8 %scale8)
assemble_4s(i32, v, v1, v2, v3, v4)
ret <16 x i32> %v
}
define <16 x i32> @__gather32_i32(<16 x i32> %ptrs,
<16 x i32> %vecmask) nounwind readonly alwaysinline {
extract_8s(i32, ptrs)
extract_8s(i32, vecmask)
%v1 = call <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> undef, i8 * null,
<8 x i32> %ptrs_1, <8 x i32> %vecmask_1, i8 1)
%v2 = call <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> undef, i8 * null,
<8 x i32> %ptrs_2, <8 x i32> %vecmask_2, i8 1)
assemble_8s(i32, v, v1, v2)
ret <16 x i32> %v
}
define <16 x i32> @__gather64_i32(<16 x i64> %ptrs,
<16 x i32> %vecmask) nounwind readonly alwaysinline {
extract_4s(i64, ptrs)
extract_4s(i32, vecmask)
%v1 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * null,
<4 x i64> %ptrs_1, <4 x i32> %vecmask_1, i8 1)
%v2 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * null,
<4 x i64> %ptrs_2, <4 x i32> %vecmask_2, i8 1)
%v3 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * null,
<4 x i64> %ptrs_3, <4 x i32> %vecmask_3, i8 1)
%v4 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * null,
<4 x i64> %ptrs_4, <4 x i32> %vecmask_4, i8 1)
assemble_4s(i32, v, v1, v2, v3, v4)
ret <16 x i32> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float gathers
declare <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> %target, i8 * %ptr,
<8 x i32> %indices, <8 x float> %mask, i8 %scale8) readonly nounwind
declare <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> %target, i8 * %ptr,
<4 x i64> %indices, <4 x float> %mask, i8 %scale8) readonly nounwind
define <16 x float> @__gather_base_offsets32_float(i8 * %ptr,
i32 %scale, <16 x i32> %offsets,
<16 x i32> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%mask = bitcast <16 x i32> %vecmask to <16 x float>
extract_8s(i32, offsets)
extract_8s(float, mask)
%v1 = call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> undef, i8 * %ptr,
<8 x i32> %offsets_1, <8 x float> %mask_1, i8 %scale8)
%v2 = call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> undef, i8 * %ptr,
<8 x i32> %offsets_2, <8 x float> %mask_2, i8 %scale8)
assemble_8s(float, v, v1, v2)
ret <16 x float> %v
}
define <16 x float> @__gather_base_offsets64_float(i8 * %ptr,
i32 %scale, <16 x i64> %offsets,
<16 x i32> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%mask = bitcast <16 x i32> %vecmask to <16 x float>
extract_4s(i64, offsets)
extract_4s(float, mask)
%v1 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * %ptr,
<4 x i64> %offsets_1, <4 x float> %mask_1, i8 %scale8)
%v2 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * %ptr,
<4 x i64> %offsets_2, <4 x float> %mask_2, i8 %scale8)
%v3 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * %ptr,
<4 x i64> %offsets_3, <4 x float> %mask_3, i8 %scale8)
%v4 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * %ptr,
<4 x i64> %offsets_4, <4 x float> %mask_4, i8 %scale8)
assemble_4s(float, v, v1, v2, v3, v4)
ret <16 x float> %v
}
define <16 x float> @__gather32_float(<16 x i32> %ptrs,
<16 x i32> %vecmask) nounwind readonly alwaysinline {
%mask = bitcast <16 x i32> %vecmask to <16 x float>
extract_8s(float, mask)
extract_8s(i32, ptrs)
%v1 = call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> undef, i8 * null,
<8 x i32> %ptrs_1, <8 x float> %mask_1, i8 1)
%v2 = call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> undef, i8 * null,
<8 x i32> %ptrs_2, <8 x float> %mask_2, i8 1)
assemble_8s(float, v, v1, v2)
ret <16 x float> %v
}
define <16 x float> @__gather64_float(<16 x i64> %ptrs,
<16 x i32> %vecmask) nounwind readonly alwaysinline {
%mask = bitcast <16 x i32> %vecmask to <16 x float>
extract_4s(i64, ptrs)
extract_4s(float, mask)
%v1 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * null,
<4 x i64> %ptrs_1, <4 x float> %mask_1, i8 1)
%v2 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * null,
<4 x i64> %ptrs_2, <4 x float> %mask_2, i8 1)
%v3 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * null,
<4 x i64> %ptrs_3, <4 x float> %mask_3, i8 1)
%v4 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * null,
<4 x i64> %ptrs_4, <4 x float> %mask_4, i8 1)
assemble_4s(float, v, v1, v2, v3, v4)
ret <16 x float> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int64 gathers
declare <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> %target, i8 * %ptr,
<4 x i32> %indices, <4 x i64> %mask, i8 %scale) readonly nounwind
declare <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> %target, i8 * %ptr,
<4 x i64> %indices, <4 x i64> %mask, i8 %scale) readonly nounwind
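;; the 64-bit-element gathers take a 64-bit per-lane mask of which only the
;; sign bit matters, so the incoming 32-bit mask is sign-extended first.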
define <16 x i64> @__gather_base_offsets32_i64(i8 * %ptr,
i32 %scale, <16 x i32> %offsets,
<16 x i32> %mask32) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = sext <16 x i32> %mask32 to <16 x i64>
extract_4s(i32, offsets)
extract_4s(i64, vecmask)
%v1 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i32> %offsets_1, <4 x i64> %vecmask_1, i8 %scale8)
%v2 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i32> %offsets_2, <4 x i64> %vecmask_2, i8 %scale8)
%v3 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i32> %offsets_3, <4 x i64> %vecmask_3, i8 %scale8)
%v4 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i32> %offsets_4, <4 x i64> %vecmask_4, i8 %scale8)
assemble_4s(i64, v, v1, v2, v3, v4)
ret <16 x i64> %v
}
define <16 x i64> @__gather_base_offsets64_i64(i8 * %ptr,
i32 %scale, <16 x i64> %offsets,
<16 x i32> %mask32) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = sext <16 x i32> %mask32 to <16 x i64>
extract_4s(i64, offsets)
extract_4s(i64, vecmask)
%v1 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i64> %offsets_1, <4 x i64> %vecmask_1, i8 %scale8)
%v2 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i64> %offsets_2, <4 x i64> %vecmask_2, i8 %scale8)
%v3 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i64> %offsets_3, <4 x i64> %vecmask_3, i8 %scale8)
%v4 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i64> %offsets_4, <4 x i64> %vecmask_4, i8 %scale8)
assemble_4s(i64, v, v1, v2, v3, v4)
ret <16 x i64> %v
}
define <16 x i64> @__gather32_i64(<16 x i32> %ptrs,
<16 x i32> %mask32) nounwind readonly alwaysinline {
%vecmask = sext <16 x i32> %mask32 to <16 x i64>
extract_4s(i32, ptrs)
extract_4s(i64, vecmask)
%v1 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * null,
<4 x i32> %ptrs_1, <4 x i64> %vecmask_1, i8 1)
%v2 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * null,
<4 x i32> %ptrs_2, <4 x i64> %vecmask_2, i8 1)
%v3 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * null,
<4 x i32> %ptrs_3, <4 x i64> %vecmask_3, i8 1)
%v4 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * null,
<4 x i32> %ptrs_4, <4 x i64> %vecmask_4, i8 1)
assemble_4s(i64, v, v1, v2, v3, v4)
ret <16 x i64> %v
}
define <16 x i64> @__gather64_i64(<16 x i64> %ptrs,
<16 x i32> %mask32) nounwind readonly alwaysinline {
%vecmask = sext <16 x i32> %mask32 to <16 x i64>
extract_4s(i64, ptrs)
extract_4s(i64, vecmask)
%v1 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * null,
<4 x i64> %ptrs_1, <4 x i64> %vecmask_1, i8 1)
%v2 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * null,
<4 x i64> %ptrs_2, <4 x i64> %vecmask_2, i8 1)
%v3 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * null,
<4 x i64> %ptrs_3, <4 x i64> %vecmask_3, i8 1)
%v4 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * null,
<4 x i64> %ptrs_4, <4 x i64> %vecmask_4, i8 1)
assemble_4s(i64, v, v1, v2, v3, v4)
ret <16 x i64> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double gathers
declare <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> %target, i8 * %ptr,
<4 x i64> %indices, <4 x double> %mask, i8 %scale) readonly nounwind
declare <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> %target, i8 * %ptr,
<4 x i32> %indices, <4 x double> %mask, i8 %scale) readonly nounwind
define <16 x double> @__gather_base_offsets32_double(i8 * %ptr,
i32 %scale, <16 x i32> %offsets,
<16 x i32> %mask32) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask64 = sext <16 x i32> %mask32 to <16 x i64>
%vecmask = bitcast <16 x i64> %vecmask64 to <16 x double>
extract_4s(i32, offsets)
extract_4s(double, vecmask)
%v1 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i32> %offsets_1, <4 x double> %vecmask_1, i8 %scale8)
%v2 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i32> %offsets_2, <4 x double> %vecmask_2, i8 %scale8)
%v3 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i32> %offsets_3, <4 x double> %vecmask_3, i8 %scale8)
%v4 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i32> %offsets_4, <4 x double> %vecmask_4, i8 %scale8)
assemble_4s(double, v, v1, v2, v3, v4)
ret <16 x double> %v
}
define <16 x double> @__gather_base_offsets64_double(i8 * %ptr,
i32 %scale, <16 x i64> %offsets,
<16 x i32> %mask32) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask64 = sext <16 x i32> %mask32 to <16 x i64>
%vecmask = bitcast <16 x i64> %vecmask64 to <16 x double>
extract_4s(i64, offsets)
extract_4s(double, vecmask)
%v1 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i64> %offsets_1, <4 x double> %vecmask_1, i8 %scale8)
%v2 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i64> %offsets_2, <4 x double> %vecmask_2, i8 %scale8)
%v3 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i64> %offsets_3, <4 x double> %vecmask_3, i8 %scale8)
%v4 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i64> %offsets_4, <4 x double> %vecmask_4, i8 %scale8)
assemble_4s(double, v, v1, v2, v3, v4)
ret <16 x double> %v
}
define <16 x double> @__gather32_double(<16 x i32> %ptrs,
<16 x i32> %mask32) nounwind readonly alwaysinline {
%vecmask64 = sext <16 x i32> %mask32 to <16 x i64>
%vecmask = bitcast <16 x i64> %vecmask64 to <16 x double>
extract_4s(i32, ptrs)
extract_4s(double, vecmask)
%v1 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * null,
<4 x i32> %ptrs_1, <4 x double> %vecmask_1, i8 1)
%v2 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * null,
<4 x i32> %ptrs_2, <4 x double> %vecmask_2, i8 1)
%v3 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * null,
<4 x i32> %ptrs_3, <4 x double> %vecmask_3, i8 1)
%v4 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * null,
<4 x i32> %ptrs_4, <4 x double> %vecmask_4, i8 1)
assemble_4s(double, v, v1, v2, v3, v4)
ret <16 x double> %v
}
define <16 x double> @__gather64_double(<16 x i64> %ptrs,
<16 x i32> %mask32) nounwind readonly alwaysinline {
%vecmask64 = sext <16 x i32> %mask32 to <16 x i64>
%vecmask = bitcast <16 x i64> %vecmask64 to <16 x double>
extract_4s(i64, ptrs)
extract_4s(double, vecmask)
%v1 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * null,
<4 x i64> %ptrs_1, <4 x double> %vecmask_1, i8 1)
%v2 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * null,
<4 x i64> %ptrs_2, <4 x double> %vecmask_2, i8 1)
%v3 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * null,
<4 x i64> %ptrs_3, <4 x double> %vecmask_3, i8 1)
%v4 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * null,
<4 x i64> %ptrs_4, <4 x double> %vecmask_4, i8 1)
assemble_4s(double, v, v1, v2, v3, v4)
ret <16 x double> %v
}

409 builtins/target-avx2.ll Normal file
@@ -0,0 +1,409 @@
;; Copyright (c) 2010-2013, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`HAVE_GATHER', `1')
include(`target-avx.ll')
rdrand_definition()
saturation_arithmetic()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int min/max
declare <8 x i32> @llvm.x86.avx2.pmins.d(<8 x i32>, <8 x i32>) nounwind readonly
declare <8 x i32> @llvm.x86.avx2.pmaxs.d(<8 x i32>, <8 x i32>) nounwind readonly
define <8 x i32> @__min_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
%m = call <8 x i32> @llvm.x86.avx2.pmins.d(<8 x i32> %0, <8 x i32> %1)
ret <8 x i32> %m
}
define <8 x i32> @__max_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
%m = call <8 x i32> @llvm.x86.avx2.pmaxs.d(<8 x i32> %0, <8 x i32> %1)
ret <8 x i32> %m
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unsigned int min/max
declare <8 x i32> @llvm.x86.avx2.pminu.d(<8 x i32>, <8 x i32>) nounwind readonly
declare <8 x i32> @llvm.x86.avx2.pmaxu.d(<8 x i32>, <8 x i32>) nounwind readonly
define <8 x i32> @__min_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
%m = call <8 x i32> @llvm.x86.avx2.pminu.d(<8 x i32> %0, <8 x i32> %1)
ret <8 x i32> %m
}
define <8 x i32> @__max_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
%m = call <8 x i32> @llvm.x86.avx2.pmaxu.d(<8 x i32> %0, <8 x i32> %1)
ret <8 x i32> %m
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float/half conversions
declare <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16>) nounwind readnone
; 0 is round nearest even
declare <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float>, i32) nounwind readnone
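;; These lower to the F16C VCVTPH2PS/VCVTPS2PH instructions, which
;; AVX2-era CPUs provide, so half<->float conversion is done natively
;; here rather than through the generic fallback routines.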
define <8 x float> @__half_to_float_varying(<8 x i16> %v) nounwind readnone {
%r = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %v)
ret <8 x float> %r
}
define <8 x i16> @__float_to_half_varying(<8 x float> %v) nounwind readnone {
%r = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %v, i32 0)
ret <8 x i16> %r
}
define float @__half_to_float_uniform(i16 %v) nounwind readnone {
%v1 = bitcast i16 %v to <1 x i16>
%vv = shufflevector <1 x i16> %v1, <1 x i16> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%rv = call <8 x float> @llvm.x86.vcvtph2ps.256(<8 x i16> %vv)
%r = extractelement <8 x float> %rv, i32 0
ret float %r
}
define i16 @__float_to_half_uniform(float %v) nounwind readnone {
%v1 = bitcast float %v to <1 x float>
%vv = shufflevector <1 x float> %v1, <1 x float> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
; round to nearest even
%rv = call <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %vv, i32 0)
%r = extractelement <8 x i16> %rv, i32 0
ret i16 %r
}
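;; There is no scalar F16C intrinsic, so the uniform variants broadcast
;; the value into lane 0 of an 8-wide vector (the other lanes are
;; undef), convert with the vector instruction, and extract lane 0.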
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
declare void @llvm.trap() noreturn nounwind
define(`extract_4s', `
%$2_1 = shufflevector <8 x $1> %$2, <8 x $1> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%$2_2 = shufflevector <8 x $1> %$2, <8 x $1> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
')
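;; For example, extract_4s(i32, offsets) expands to:
;;   %offsets_1 = shufflevector <8 x i32> %offsets, <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
;;   %offsets_2 = shufflevector <8 x i32> %offsets, <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
;; binding the low and high four lanes of %offsets to new names.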
gen_gather(i8)
gen_gather(i16)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int32 gathers
declare <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> %target, i8 * %ptr,
<8 x i32> %indices, <8 x i32> %mask, i8 %scale) readonly nounwind
declare <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> %target, i8 * %ptr,
<4 x i64> %indices, <4 x i32> %mask, i8 %scale) readonly nounwind
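;; In the intrinsic names, "d"/"q" encode index and element width:
;; gather.d.d.256 is vpgatherdd (i32 indices, i32 data) and gather.q.d.256
;; is vpgatherqd (i64 indices, i32 data). Only four i64 indices fit in a
;; 256-bit vector, so the 64-bit-offset forms below split their operands
;; in half and gather twice.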
define <8 x i32> @__gather_base_offsets32_i32(i8 * %ptr,
i32 %scale, <8 x i32> %offsets,
<8 x i32> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%v = call <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> undef, i8 * %ptr,
<8 x i32> %offsets, <8 x i32> %vecmask, i8 %scale8)
ret <8 x i32> %v
}
define <8 x i32> @__gather_base_offsets64_i32(i8 * %ptr,
i32 %scale, <8 x i64> %offsets,
<8 x i32> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
extract_4s(i32, vecmask)
extract_4s(i64, offsets)
%v1 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * %ptr,
<4 x i64> %offsets_1, <4 x i32> %vecmask_1, i8 %scale8)
%v2 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * %ptr,
<4 x i64> %offsets_2, <4 x i32> %vecmask_2, i8 %scale8)
%v = shufflevector <4 x i32> %v1, <4 x i32> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i32> %v
}
define <8 x i32> @__gather32_i32(<8 x i32> %ptrs,
<8 x i32> %vecmask) nounwind readonly alwaysinline {
%v = call <8 x i32> @llvm.x86.avx2.gather.d.d.256(<8 x i32> undef, i8 * null,
<8 x i32> %ptrs, <8 x i32> %vecmask, i8 1)
ret <8 x i32> %v
}
define <8 x i32> @__gather64_i32(<8 x i64> %ptrs,
<8 x i32> %vecmask) nounwind readonly alwaysinline {
extract_4s(i64, ptrs)
extract_4s(i32, vecmask)
%v1 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * null,
<4 x i64> %ptrs_1, <4 x i32> %vecmask_1, i8 1)
%v2 = call <4 x i32> @llvm.x86.avx2.gather.q.d.256(<4 x i32> undef, i8 * null,
<4 x i64> %ptrs_2, <4 x i32> %vecmask_2, i8 1)
%v = shufflevector <4 x i32> %v1, <4 x i32> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i32> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float gathers
declare <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> %target, i8 * %ptr,
<8 x i32> %indices, <8 x float> %mask, i8 %scale8) readonly nounwind
declare <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> %target, i8 * %ptr,
<4 x i64> %indices, <4 x float> %mask, i8 %scale8) readonly nounwind
define <8 x float> @__gather_base_offsets32_float(i8 * %ptr,
i32 %scale, <8 x i32> %offsets,
<8 x i32> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%mask = bitcast <8 x i32> %vecmask to <8 x float>
%v = call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> undef, i8 * %ptr,
<8 x i32> %offsets, <8 x float> %mask, i8 %scale8)
ret <8 x float> %v
}
define <8 x float> @__gather_base_offsets64_float(i8 * %ptr,
i32 %scale, <8 x i64> %offsets,
<8 x i32> %vecmask) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%mask = bitcast <8 x i32> %vecmask to <8 x float>
extract_4s(i64, offsets)
extract_4s(float, mask)
%v1 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * %ptr,
<4 x i64> %offsets_1, <4 x float> %mask_1, i8 %scale8)
%v2 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * %ptr,
<4 x i64> %offsets_2, <4 x float> %mask_2, i8 %scale8)
%v = shufflevector <4 x float> %v1, <4 x float> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x float> %v
}
define <8 x float> @__gather32_float(<8 x i32> %ptrs,
<8 x i32> %vecmask) nounwind readonly alwaysinline {
%mask = bitcast <8 x i32> %vecmask to <8 x float>
%v = call <8 x float> @llvm.x86.avx2.gather.d.ps.256(<8 x float> undef, i8 * null,
<8 x i32> %ptrs, <8 x float> %mask, i8 1)
ret <8 x float> %v
}
define <8 x float> @__gather64_float(<8 x i64> %ptrs,
<8 x i32> %vecmask) nounwind readonly alwaysinline {
%mask = bitcast <8 x i32> %vecmask to <8 x float>
extract_4s(i64, ptrs)
extract_4s(float, mask)
%v1 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * null,
<4 x i64> %ptrs_1, <4 x float> %mask_1, i8 1)
%v2 = call <4 x float> @llvm.x86.avx2.gather.q.ps.256(<4 x float> undef, i8 * null,
<4 x i64> %ptrs_2, <4 x float> %mask_2, i8 1)
%v = shufflevector <4 x float> %v1, <4 x float> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x float> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int64 gathers
declare <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> %target, i8 * %ptr,
<4 x i32> %indices, <4 x i64> %mask, i8 %scale) readonly nounwind
declare <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> %target, i8 * %ptr,
<4 x i64> %indices, <4 x i64> %mask, i8 %scale) readonly nounwind
define <8 x i64> @__gather_base_offsets32_i64(i8 * %ptr,
i32 %scale, <8 x i32> %offsets,
<8 x i32> %mask32) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = sext <8 x i32> %mask32 to <8 x i64>
extract_4s(i32, offsets)
extract_4s(i64, vecmask)
%v1 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i32> %offsets_1, <4 x i64> %vecmask_1, i8 %scale8)
%v2 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i32> %offsets_2, <4 x i64> %vecmask_2, i8 %scale8)
%v = shufflevector <4 x i64> %v1, <4 x i64> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i64> %v
}
define <8 x i64> @__gather_base_offsets64_i64(i8 * %ptr,
i32 %scale, <8 x i64> %offsets,
<8 x i32> %mask32) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask = sext <8 x i32> %mask32 to <8 x i64>
extract_4s(i64, offsets)
extract_4s(i64, vecmask)
%v1 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i64> %offsets_1, <4 x i64> %vecmask_1, i8 %scale8)
%v2 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * %ptr,
<4 x i64> %offsets_2, <4 x i64> %vecmask_2, i8 %scale8)
%v = shufflevector <4 x i64> %v1, <4 x i64> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i64> %v
}
define <8 x i64> @__gather32_i64(<8 x i32> %ptrs,
<8 x i32> %mask32) nounwind readonly alwaysinline {
%vecmask = sext <8 x i32> %mask32 to <8 x i64>
extract_4s(i32, ptrs)
extract_4s(i64, vecmask)
%v1 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * null,
<4 x i32> %ptrs_1, <4 x i64> %vecmask_1, i8 1)
%v2 = call <4 x i64> @llvm.x86.avx2.gather.d.q.256(<4 x i64> undef, i8 * null,
<4 x i32> %ptrs_2, <4 x i64> %vecmask_2, i8 1)
%v = shufflevector <4 x i64> %v1, <4 x i64> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i64> %v
}
define <8 x i64> @__gather64_i64(<8 x i64> %ptrs,
<8 x i32> %mask32) nounwind readonly alwaysinline {
%vecmask = sext <8 x i32> %mask32 to <8 x i64>
extract_4s(i64, ptrs)
extract_4s(i64, vecmask)
%v1 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * null,
<4 x i64> %ptrs_1, <4 x i64> %vecmask_1, i8 1)
%v2 = call <4 x i64> @llvm.x86.avx2.gather.q.q.256(<4 x i64> undef, i8 * null,
<4 x i64> %ptrs_2, <4 x i64> %vecmask_2, i8 1)
%v = shufflevector <4 x i64> %v1, <4 x i64> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i64> %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double gathers
declare <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> %target, i8 * %ptr,
<4 x i64> %indices, <4 x double> %mask, i8 %scale) readonly nounwind
declare <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> %target, i8 * %ptr,
<4 x i32> %indices, <4 x double> %mask, i8 %scale) readonly nounwind
define <8 x double> @__gather_base_offsets32_double(i8 * %ptr,
i32 %scale, <8 x i32> %offsets,
<8 x i32> %mask32) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask64 = sext <8 x i32> %mask32 to <8 x i64>
%vecmask = bitcast <8 x i64> %vecmask64 to <8 x double>
extract_4s(i32, offsets)
extract_4s(double, vecmask)
%v1 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i32> %offsets_1, <4 x double> %vecmask_1, i8 %scale8)
%v2 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i32> %offsets_2, <4 x double> %vecmask_2, i8 %scale8)
%v = shufflevector <4 x double> %v1, <4 x double> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x double> %v
}
define <8 x double> @__gather_base_offsets64_double(i8 * %ptr,
i32 %scale, <8 x i64> %offsets,
<8 x i32> %mask32) nounwind readonly alwaysinline {
%scale8 = trunc i32 %scale to i8
%vecmask64 = sext <8 x i32> %mask32 to <8 x i64>
%vecmask = bitcast <8 x i64> %vecmask64 to <8 x double>
extract_4s(i64, offsets)
extract_4s(double, vecmask)
%v1 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i64> %offsets_1, <4 x double> %vecmask_1, i8 %scale8)
%v2 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * %ptr,
<4 x i64> %offsets_2, <4 x double> %vecmask_2, i8 %scale8)
%v = shufflevector <4 x double> %v1, <4 x double> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x double> %v
}
define <8 x double> @__gather32_double(<8 x i32> %ptrs,
<8 x i32> %mask32) nounwind readonly alwaysinline {
%vecmask64 = sext <8 x i32> %mask32 to <8 x i64>
%vecmask = bitcast <8 x i64> %vecmask64 to <8 x double>
extract_4s(i32, ptrs)
extract_4s(double, vecmask)
%v1 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * null,
<4 x i32> %ptrs_1, <4 x double> %vecmask_1, i8 1)
%v2 = call <4 x double> @llvm.x86.avx2.gather.d.pd.256(<4 x double> undef, i8 * null,
<4 x i32> %ptrs_2, <4 x double> %vecmask_2, i8 1)
%v = shufflevector <4 x double> %v1, <4 x double> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x double> %v
}
define <8 x double> @__gather64_double(<8 x i64> %ptrs,
<8 x i32> %mask32) nounwind readonly alwaysinline {
%vecmask64 = sext <8 x i32> %mask32 to <8 x i64>
%vecmask = bitcast <8 x i64> %vecmask64 to <8 x double>
extract_4s(i64, ptrs)
extract_4s(double, vecmask)
%v1 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * null,
<4 x i64> %ptrs_1, <4 x double> %vecmask_1, i8 1)
%v2 = call <4 x double> @llvm.x86.avx2.gather.q.pd.256(<4 x double> undef, i8 * null,
<4 x i64> %ptrs_2, <4 x double> %vecmask_2, i8 1)
%v = shufflevector <4 x double> %v1, <4 x double> %v2,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x double> %v
}

1036 builtins/target-generic-1.ll Normal file
File diff suppressed because it is too large

@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2014, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -31,4 +31,4 @@
define(`WIDTH',`16')
include(`target-generic-common.ll')
saturation_arithmetic_novec()


@@ -0,0 +1,34 @@
;; Copyright (c) 2010-2014, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`WIDTH',`32')
include(`target-generic-common.ll')
saturation_arithmetic_novec()


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2014, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -31,4 +31,4 @@
define(`WIDTH',`4')
include(`target-generic-common.ll')
saturation_arithmetic_novec()


@@ -0,0 +1,34 @@
;; Copyright (c) 2010-2014, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`WIDTH',`64')
include(`target-generic-common.ll')
saturation_arithmetic_novec()


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2014, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -31,4 +31,4 @@
define(`WIDTH',`8')
include(`target-generic-common.ll')
saturation_arithmetic_novec()


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -29,12 +29,18 @@
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128-v16:16:16-v32:32:32-v4:128:128";
define(`MASK',`i1')
define(`HAVE_GATHER',`1')
define(`HAVE_SCATTER',`1')
include(`util.m4')
stdlib_core()
scans()
reduce_equal(WIDTH)
rdrand_decls()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; broadcast/rotate/shuffle
@@ -46,6 +52,20 @@ declare <WIDTH x i16> @__smear_i16(i16) nounwind readnone
declare <WIDTH x i32> @__smear_i32(i32) nounwind readnone
declare <WIDTH x i64> @__smear_i64(i64) nounwind readnone
declare <WIDTH x float> @__setzero_float() nounwind readnone
declare <WIDTH x double> @__setzero_double() nounwind readnone
declare <WIDTH x i8> @__setzero_i8() nounwind readnone
declare <WIDTH x i16> @__setzero_i16() nounwind readnone
declare <WIDTH x i32> @__setzero_i32() nounwind readnone
declare <WIDTH x i64> @__setzero_i64() nounwind readnone
declare <WIDTH x float> @__undef_float() nounwind readnone
declare <WIDTH x double> @__undef_double() nounwind readnone
declare <WIDTH x i8> @__undef_i8() nounwind readnone
declare <WIDTH x i16> @__undef_i16() nounwind readnone
declare <WIDTH x i32> @__undef_i32() nounwind readnone
declare <WIDTH x i64> @__undef_i64() nounwind readnone
declare <WIDTH x float> @__broadcast_float(<WIDTH x float>, i32) nounwind readnone
declare <WIDTH x double> @__broadcast_double(<WIDTH x double>, i32) nounwind readnone
declare <WIDTH x i8> @__broadcast_i8(<WIDTH x i8>, i32) nounwind readnone
@@ -60,6 +80,13 @@ declare <WIDTH x i32> @__rotate_i32(<WIDTH x i32>, i32) nounwind readnone
declare <WIDTH x double> @__rotate_double(<WIDTH x double>, i32) nounwind readnone
declare <WIDTH x i64> @__rotate_i64(<WIDTH x i64>, i32) nounwind readnone
declare <WIDTH x i8> @__shift_i8(<WIDTH x i8>, i32) nounwind readnone
declare <WIDTH x i16> @__shift_i16(<WIDTH x i16>, i32) nounwind readnone
declare <WIDTH x float> @__shift_float(<WIDTH x float>, i32) nounwind readnone
declare <WIDTH x i32> @__shift_i32(<WIDTH x i32>, i32) nounwind readnone
declare <WIDTH x double> @__shift_double(<WIDTH x double>, i32) nounwind readnone
declare <WIDTH x i64> @__shift_i64(<WIDTH x i64>, i32) nounwind readnone
declare <WIDTH x i8> @__shuffle_i8(<WIDTH x i8>, <WIDTH x i32>) nounwind readnone
declare <WIDTH x i8> @__shuffle2_i8(<WIDTH x i8>, <WIDTH x i8>,
<WIDTH x i32>) nounwind readnone
@@ -98,6 +125,14 @@ declare void @__aos_to_soa4_float(float * noalias %p, <WIDTH x float> * noalias
<WIDTH x float> * noalias %out2,
<WIDTH x float> * noalias %out3) nounwind
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; math
@@ -159,6 +194,7 @@ declare float @__rcp_uniform_float(float) nounwind readnone
declare float @__sqrt_uniform_float(float) nounwind readnone
declare <WIDTH x float> @__rcp_varying_float(<WIDTH x float>) nounwind readnone
declare <WIDTH x float> @__rsqrt_varying_float(<WIDTH x float>) nounwind readnone
declare <WIDTH x float> @__sqrt_varying_float(<WIDTH x float>) nounwind readnone
declare double @__sqrt_uniform_double(double) nounwind readnone
@@ -174,36 +210,34 @@ declare i64 @__count_trailing_zeros_i64(i64) nounwind readnone
declare i32 @__count_leading_zeros_i32(i32) nounwind readnone
declare i64 @__count_leading_zeros_i64(i64) nounwind readnone
;; svml
; FIXME: need either to wire these up to the 8-wide SVML entrypoints,
; or, use the macro to call the 4-wide ones twice with our 8-wide
; vectors...
declare <WIDTH x float> @__svml_sin(<WIDTH x float>)
declare <WIDTH x float> @__svml_cos(<WIDTH x float>)
declare void @__svml_sincos(<WIDTH x float>, <WIDTH x float> *, <WIDTH x float> *)
declare <WIDTH x float> @__svml_tan(<WIDTH x float>)
declare <WIDTH x float> @__svml_atan(<WIDTH x float>)
declare <WIDTH x float> @__svml_atan2(<WIDTH x float>, <WIDTH x float>)
declare <WIDTH x float> @__svml_exp(<WIDTH x float>)
declare <WIDTH x float> @__svml_log(<WIDTH x float>)
declare <WIDTH x float> @__svml_pow(<WIDTH x float>, <WIDTH x float>)
;; svml
include(`svml.m4')
svml_stubs(float,f,WIDTH)
svml_stubs(double,d,WIDTH)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reductions
declare i32 @__movmsk(<WIDTH x i1>) nounwind readnone
declare i64 @__movmsk(<WIDTH x i1>) nounwind readnone
declare i1 @__any(<WIDTH x i1>) nounwind readnone
declare i1 @__all(<WIDTH x i1>) nounwind readnone
declare i1 @__none(<WIDTH x i1>) nounwind readnone
declare i16 @__reduce_add_int8(<WIDTH x i8>) nounwind readnone
declare i32 @__reduce_add_int16(<WIDTH x i16>) nounwind readnone
declare float @__reduce_add_float(<WIDTH x float>) nounwind readnone
declare float @__reduce_min_float(<WIDTH x float>) nounwind readnone
declare float @__reduce_max_float(<WIDTH x float>) nounwind readnone
declare i32 @__reduce_add_int32(<WIDTH x i32>) nounwind readnone
declare i64 @__reduce_add_int32(<WIDTH x i32>) nounwind readnone
declare i32 @__reduce_min_int32(<WIDTH x i32>) nounwind readnone
declare i32 @__reduce_max_int32(<WIDTH x i32>) nounwind readnone
declare i32 @__reduce_add_uint32(<WIDTH x i32>) nounwind readnone
declare i32 @__reduce_min_uint32(<WIDTH x i32>) nounwind readnone
declare i32 @__reduce_max_uint32(<WIDTH x i32>) nounwind readnone
@@ -214,82 +248,99 @@ declare double @__reduce_max_double(<WIDTH x double>) nounwind readnone
declare i64 @__reduce_add_int64(<WIDTH x i64>) nounwind readnone
declare i64 @__reduce_min_int64(<WIDTH x i64>) nounwind readnone
declare i64 @__reduce_max_int64(<WIDTH x i64>) nounwind readnone
declare i64 @__reduce_add_uint64(<WIDTH x i64>) nounwind readnone
declare i64 @__reduce_min_uint64(<WIDTH x i64>) nounwind readnone
declare i64 @__reduce_max_uint64(<WIDTH x i64>) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
load_and_broadcast(WIDTH, i8, 8)
load_and_broadcast(WIDTH, i16, 16)
load_and_broadcast(WIDTH, i32, 32)
load_and_broadcast(WIDTH, i64, 64)
declare <WIDTH x i8> @__masked_load_8(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x i16> @__masked_load_16(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x i32> @__masked_load_32(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x i64> @__masked_load_64(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x i8> @__masked_load_i8(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x i16> @__masked_load_i16(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x i32> @__masked_load_i32(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x float> @__masked_load_float(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x i64> @__masked_load_i64(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare <WIDTH x double> @__masked_load_double(i8 * nocapture, <WIDTH x i1> %mask) nounwind readonly
declare void @__masked_store_8(<WIDTH x i8>* nocapture, <WIDTH x i8>,
declare void @__masked_store_i8(<WIDTH x i8>* nocapture, <WIDTH x i8>,
<WIDTH x i1>) nounwind
declare void @__masked_store_16(<WIDTH x i16>* nocapture, <WIDTH x i16>,
<WIDTH x i1>) nounwind
declare void @__masked_store_32(<WIDTH x i32>* nocapture, <WIDTH x i32>,
<WIDTH x i1>) nounwind
declare void @__masked_store_64(<WIDTH x i64>* nocapture, <WIDTH x i64>,
<WIDTH x i1> %mask) nounwind
declare void @__masked_store_i16(<WIDTH x i16>* nocapture, <WIDTH x i16>,
<WIDTH x i1>) nounwind
declare void @__masked_store_i32(<WIDTH x i32>* nocapture, <WIDTH x i32>,
<WIDTH x i1>) nounwind
declare void @__masked_store_float(<WIDTH x float>* nocapture, <WIDTH x float>,
<WIDTH x i1>) nounwind
declare void @__masked_store_i64(<WIDTH x i64>* nocapture, <WIDTH x i64>,
<WIDTH x i1> %mask) nounwind
declare void @__masked_store_double(<WIDTH x double>* nocapture, <WIDTH x double>,
<WIDTH x i1> %mask) nounwind
define void @__masked_store_blend_8(<WIDTH x i8>* nocapture, <WIDTH x i8>,
<WIDTH x i1>) nounwind {
%v = load <WIDTH x i8> * %0
define void @__masked_store_blend_i8(<WIDTH x i8>* nocapture, <WIDTH x i8>,
<WIDTH x i1>) nounwind alwaysinline {
%v = load PTR_OP_ARGS(`<WIDTH x i8> ') %0
%v1 = select <WIDTH x i1> %2, <WIDTH x i8> %1, <WIDTH x i8> %v
store <WIDTH x i8> %v1, <WIDTH x i8> * %0
ret void
}
define void @__masked_store_blend_16(<WIDTH x i16>* nocapture, <WIDTH x i16>,
<WIDTH x i1>) nounwind {
%v = load <WIDTH x i16> * %0
define void @__masked_store_blend_i16(<WIDTH x i16>* nocapture, <WIDTH x i16>,
<WIDTH x i1>) nounwind alwaysinline {
%v = load PTR_OP_ARGS(`<WIDTH x i16> ') %0
%v1 = select <WIDTH x i1> %2, <WIDTH x i16> %1, <WIDTH x i16> %v
store <WIDTH x i16> %v1, <WIDTH x i16> * %0
ret void
}
define void @__masked_store_blend_32(<WIDTH x i32>* nocapture, <WIDTH x i32>,
<WIDTH x i1>) nounwind {
%v = load <WIDTH x i32> * %0
define void @__masked_store_blend_i32(<WIDTH x i32>* nocapture, <WIDTH x i32>,
<WIDTH x i1>) nounwind alwaysinline {
%v = load PTR_OP_ARGS(`<WIDTH x i32> ') %0
%v1 = select <WIDTH x i1> %2, <WIDTH x i32> %1, <WIDTH x i32> %v
store <WIDTH x i32> %v1, <WIDTH x i32> * %0
ret void
}
define void @__masked_store_blend_64(<WIDTH x i64>* nocapture,
<WIDTH x i64>, <WIDTH x i1>) nounwind {
%v = load <WIDTH x i64> * %0
define void @__masked_store_blend_float(<WIDTH x float>* nocapture, <WIDTH x float>,
<WIDTH x i1>) nounwind alwaysinline {
%v = load PTR_OP_ARGS(`<WIDTH x float> ') %0
%v1 = select <WIDTH x i1> %2, <WIDTH x float> %1, <WIDTH x float> %v
store <WIDTH x float> %v1, <WIDTH x float> * %0
ret void
}
define void @__masked_store_blend_i64(<WIDTH x i64>* nocapture,
<WIDTH x i64>, <WIDTH x i1>) nounwind alwaysinline {
%v = load PTR_OP_ARGS(`<WIDTH x i64> ') %0
%v1 = select <WIDTH x i1> %2, <WIDTH x i64> %1, <WIDTH x i64> %v
store <WIDTH x i64> %v1, <WIDTH x i64> * %0
ret void
}
define void @__masked_store_blend_double(<WIDTH x double>* nocapture,
<WIDTH x double>, <WIDTH x i1>) nounwind alwaysinline {
%v = load PTR_OP_ARGS(`<WIDTH x double> ') %0
%v1 = select <WIDTH x i1> %2, <WIDTH x double> %1, <WIDTH x double> %v
store <WIDTH x double> %v1, <WIDTH x double> * %0
ret void
}
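;; The blend stores above implement masked stores as load/select/store:
;; read the whole destination, merge the active lanes, write everything
;; back. That is only safe when the destination is known to be fully
;; accessible, which is presumably why the true masked stores stay
;; declarations here. PTR_OP_ARGS papers over the change in load syntax
;; in newer LLVM, where the pointee type must be spelled explicitly.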
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
define(`gather_scatter', `
declare <WIDTH x $1> @__gather_base_offsets32_$1(i8 * nocapture, <WIDTH x i32>,
i32, <WIDTH x i1>) nounwind readonly
declare <WIDTH x $1> @__gather_base_offsets64_$1(i8 * nocapture, <WIDTH x i64>,
i32, <WIDTH x i1>) nounwind readonly
declare <WIDTH x $1> @__gather_base_offsets32_$1(i8 * nocapture, i32, <WIDTH x i32>,
<WIDTH x i1>) nounwind readonly
declare <WIDTH x $1> @__gather_base_offsets64_$1(i8 * nocapture, i32, <WIDTH x i64>,
<WIDTH x i1>) nounwind readonly
declare <WIDTH x $1> @__gather32_$1(<WIDTH x i32>,
<WIDTH x i1>) nounwind readonly
declare <WIDTH x $1> @__gather64_$1(<WIDTH x i64>,
<WIDTH x i1>) nounwind readonly
declare void @__scatter_base_offsets32_$1(i8* nocapture, <WIDTH x i32>,
i32, <WIDTH x $1>, <WIDTH x i1>) nounwind
declare void @__scatter_base_offsets64_$1(i8* nocapture, <WIDTH x i64>,
i32, <WIDTH x $1>, <WIDTH x i1>) nounwind
declare void @__scatter_base_offsets32_$1(i8* nocapture, i32, <WIDTH x i32>,
<WIDTH x $1>, <WIDTH x i1>) nounwind
declare void @__scatter_base_offsets64_$1(i8* nocapture, i32, <WIDTH x i64>,
<WIDTH x $1>, <WIDTH x i1>) nounwind
declare void @__scatter32_$1(<WIDTH x i32>, <WIDTH x $1>,
<WIDTH x i1>) nounwind
declare void @__scatter64_$1(<WIDTH x i64>, <WIDTH x $1>,
@@ -299,12 +350,16 @@ declare void @__scatter64_$1(<WIDTH x i64>, <WIDTH x $1>,
gather_scatter(i8)
gather_scatter(i16)
gather_scatter(i32)
gather_scatter(float)
gather_scatter(i64)
gather_scatter(double)
declare i32 @__packed_load_active(i32 * nocapture, <WIDTH x i32> * nocapture,
<WIDTH x i1>) nounwind
declare i32 @__packed_store_active(i32 * nocapture, <WIDTH x i32> %vals,
<WIDTH x i1>) nounwind
declare i32 @__packed_store_active2(i32 * nocapture, <WIDTH x i32> %vals,
<WIDTH x i1>) nounwind
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -315,3 +370,25 @@ declare void @__prefetch_read_uniform_2(i8 * nocapture) nounwind
declare void @__prefetch_read_uniform_3(i8 * nocapture) nounwind
declare void @__prefetch_read_uniform_nt(i8 * nocapture) nounwind
declare void @__prefetch_read_varying_1(<WIDTH x i64> %addr, <WIDTH x MASK> %mask) nounwind
declare void @__prefetch_read_varying_1_native(i8 * %base, i32 %scale, <WIDTH x i32> %offsets, <WIDTH x MASK> %mask) nounwind
declare void @__prefetch_read_varying_2(<WIDTH x i64> %addr, <WIDTH x MASK> %mask) nounwind
declare void @__prefetch_read_varying_2_native(i8 * %base, i32 %scale, <WIDTH x i32> %offsets, <WIDTH x MASK> %mask) nounwind
declare void @__prefetch_read_varying_3(<WIDTH x i64> %addr, <WIDTH x MASK> %mask) nounwind
declare void @__prefetch_read_varying_3_native(i8 * %base, i32 %scale, <WIDTH x i32> %offsets, <WIDTH x MASK> %mask) nounwind
declare void @__prefetch_read_varying_nt(<WIDTH x i64> %addr, <WIDTH x MASK> %mask) nounwind
declare void @__prefetch_read_varying_nt_native(i8 * %base, i32 %scale, <WIDTH x i32> %offsets, <WIDTH x MASK> %mask) nounwind
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16 builtins
define_avgs()
declare_nvptx()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()

74 builtins/target-knl.ll Normal file

@@ -0,0 +1,74 @@
;; Copyright (c) 2015-2016, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`WIDTH',`16')
ifelse(LLVM_VERSION, LLVM_3_7,
`include(`target-avx512-common.ll')',
LLVM_VERSION, LLVM_3_8,
`include(`target-avx512-common.ll')',
LLVM_VERSION, LLVM_3_9,
`include(`target-avx512-common.ll')',
LLVM_VERSION, LLVM_4_0,
`include(`target-avx512-common.ll')',
LLVM_VERSION, LLVM_5_0,
`include(`target-avx512-common.ll')'
)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp, rsqrt
define(`rcp_rsqrt_varying_float_knl',`
declare <16 x float> @llvm.x86.avx512.rcp28.ps(<16 x float>, <16 x float>, i16, i32) nounwind readnone
define <16 x float> @__rcp_varying_float(<16 x float>) nounwind readonly alwaysinline {
%res = call <16 x float> @llvm.x86.avx512.rcp28.ps(<16 x float> %0, <16 x float> undef, i16 -1, i32 8)
ret <16 x float> %res
}
declare <16 x float> @llvm.x86.avx512.rsqrt28.ps(<16 x float>, <16 x float>, i16, i32) nounwind readnone
define <16 x float> @__rsqrt_varying_float(<16 x float> %v) nounwind readonly alwaysinline {
%res = call <16 x float> @llvm.x86.avx512.rsqrt28.ps(<16 x float> %v, <16 x float> undef, i16 -1, i32 8)
ret <16 x float> %res
}
')
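;; rcp28/rsqrt28 are AVX-512ER instructions (present on KNL) with a
;; guaranteed relative error of 2^-28, accurate enough to skip the usual
;; Newton-Raphson refinement. The i16 -1 argument is an all-lanes mask
;; and the trailing i32 is the SAE/rounding operand.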
ifelse(LLVM_VERSION, LLVM_3_7,
rcp_rsqrt_varying_float_knl(),
LLVM_VERSION, LLVM_3_8,
rcp_rsqrt_varying_float_knl(),
LLVM_VERSION, LLVM_3_9,
rcp_rsqrt_varying_float_knl(),
LLVM_VERSION, LLVM_4_0,
rcp_rsqrt_varying_float_knl(),
LLVM_VERSION, LLVM_5_0,
rcp_rsqrt_varying_float_knl()
)
;;saturation_arithmetic_novec()

527 builtins/target-neon-16.ll Normal file

@@ -0,0 +1,527 @@
;;
;; target-neon-16.ll
;;
;; Copyright(c) 2013-2015 Google, Inc.
;;
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Matt Pharr nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`WIDTH',`8')
define(`MASK',`i16')
include(`util.m4')
include(`target-neon-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
define <8 x float> @__half_to_float_varying(<8 x i16> %v) nounwind readnone alwaysinline {
unary4to8conv(r, i16, float, @llvm.arm.neon.vcvthf2fp, %v)
ret <8 x float> %r
}
define <8 x i16> @__float_to_half_varying(<8 x float> %v) nounwind readnone alwaysinline {
unary4to8conv(r, float, i16, @llvm.arm.neon.vcvtfp2hf, %v)
ret <8 x i16> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; math
;; round/floor/ceil
;; FIXME: grabbed these from the sse2 target, which does not have native
;; instructions for these. Is there a better approach for NEON?
define <8 x float> @__round_varying_float(<8 x float>) nounwind readonly alwaysinline {
%float_to_int_bitcast.i.i.i.i = bitcast <8 x float> %0 to <8 x i32>
%bitop.i.i = and <8 x i32> %float_to_int_bitcast.i.i.i.i,
<i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648,
i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>
%bitop.i = xor <8 x i32> %float_to_int_bitcast.i.i.i.i, %bitop.i.i
%int_to_float_bitcast.i.i40.i = bitcast <8 x i32> %bitop.i to <8 x float>
%binop.i = fadd <8 x float> %int_to_float_bitcast.i.i40.i,
<float 8.388608e+06, float 8.388608e+06, float 8.388608e+06, float 8.388608e+06,
float 8.388608e+06, float 8.388608e+06, float 8.388608e+06, float 8.388608e+06>
%binop21.i = fadd <8 x float> %binop.i,
<float -8.388608e+06, float -8.388608e+06, float -8.388608e+06, float -8.388608e+06,
float -8.388608e+06, float -8.388608e+06, float -8.388608e+06, float -8.388608e+06>
%float_to_int_bitcast.i.i.i = bitcast <8 x float> %binop21.i to <8 x i32>
%bitop31.i = xor <8 x i32> %float_to_int_bitcast.i.i.i, %bitop.i.i
%int_to_float_bitcast.i.i.i = bitcast <8 x i32> %bitop31.i to <8 x float>
ret <8 x float> %int_to_float_bitcast.i.i.i
}
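;; This rounds with the classic 2^23 trick: adding and then subtracting
;; 8.388608e+06 (2^23) forces the fractional bits to be rounded away,
;; since floats of that magnitude carry no sub-integer precision. The
;; and/xor pair strips and restores the sign bit so negative inputs
;; round correctly too.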
define <8 x float> @__floor_varying_float(<8 x float>) nounwind readonly alwaysinline {
%calltmp.i = tail call <8 x float> @__round_varying_float(<8 x float> %0) nounwind
%bincmp.i = fcmp ogt <8 x float> %calltmp.i, %0
%val_to_boolvec32.i = sext <8 x i1> %bincmp.i to <8 x i32>
%bitop.i = and <8 x i32> %val_to_boolvec32.i,
<i32 -1082130432, i32 -1082130432, i32 -1082130432, i32 -1082130432,
i32 -1082130432, i32 -1082130432, i32 -1082130432, i32 -1082130432>
%int_to_float_bitcast.i.i.i = bitcast <8 x i32> %bitop.i to <8 x float>
%binop.i = fadd <8 x float> %calltmp.i, %int_to_float_bitcast.i.i.i
ret <8 x float> %binop.i
}
define <8 x float> @__ceil_varying_float(<8 x float>) nounwind readonly alwaysinline {
%calltmp.i = tail call <8 x float> @__round_varying_float(<8 x float> %0) nounwind
%bincmp.i = fcmp olt <8 x float> %calltmp.i, %0
%val_to_boolvec32.i = sext <8 x i1> %bincmp.i to <8 x i32>
%bitop.i = and <8 x i32> %val_to_boolvec32.i,
<i32 1065353216, i32 1065353216, i32 1065353216, i32 1065353216,
i32 1065353216, i32 1065353216, i32 1065353216, i32 1065353216>
%int_to_float_bitcast.i.i.i = bitcast <8 x i32> %bitop.i to <8 x float>
%binop.i = fadd <8 x float> %calltmp.i, %int_to_float_bitcast.i.i.i
ret <8 x float> %binop.i
}
;; FIXME: rounding doubles and double vectors needs to be implemented
declare <WIDTH x double> @__round_varying_double(<WIDTH x double>) nounwind readnone
declare <WIDTH x double> @__floor_varying_double(<WIDTH x double>) nounwind readnone
declare <WIDTH x double> @__ceil_varying_double(<WIDTH x double>) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; min/max
declare <4 x float> @llvm.arm.neon.vmins.v4f32(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__max_varying_float(<WIDTH x float>,
<WIDTH x float>) nounwind readnone alwaysinline {
binary4to8(r, float, @llvm.arm.neon.vmaxs.v4f32, %0, %1)
ret <WIDTH x float> %r
}
define <WIDTH x float> @__min_varying_float(<WIDTH x float>,
<WIDTH x float>) nounwind readnone alwaysinline {
binary4to8(r, float, @llvm.arm.neon.vmins.v4f32, %0, %1)
ret <WIDTH x float> %r
}
declare <4 x i32> @llvm.arm.neon.vmins.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vminu.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vmaxs.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vmaxu.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
define <WIDTH x i32> @__min_varying_int32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
binary4to8(r, i32, @llvm.arm.neon.vmins.v4i32, %0, %1)
ret <WIDTH x i32> %r
}
define <WIDTH x i32> @__max_varying_int32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
binary4to8(r, i32, @llvm.arm.neon.vmaxs.v4i32, %0, %1)
ret <WIDTH x i32> %r
}
define <WIDTH x i32> @__min_varying_uint32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
binary4to8(r, i32, @llvm.arm.neon.vminu.v4i32, %0, %1)
ret <WIDTH x i32> %r
}
define <WIDTH x i32> @__max_varying_uint32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
binary4to8(r, i32, @llvm.arm.neon.vmaxu.v4i32, %0, %1)
ret <WIDTH x i32> %r
}
;; sqrt/rsqrt/rcp
declare <4 x float> @llvm.arm.neon.vrecpe.v4f32(<4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vrecps.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__rcp_varying_float(<WIDTH x float> %d) nounwind readnone alwaysinline {
unary4to8(x0, float, @llvm.arm.neon.vrecpe.v4f32, %d)
binary4to8(x0_nr, float, @llvm.arm.neon.vrecps.v4f32, %d, %x0)
%x1 = fmul <WIDTH x float> %x0, %x0_nr
binary4to8(x1_nr, float, @llvm.arm.neon.vrecps.v4f32, %d, %x1)
%x2 = fmul <WIDTH x float> %x1, %x1_nr
ret <WIDTH x float> %x2
}
declare <4 x float> @llvm.arm.neon.vrsqrte.v4f32(<4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vrsqrts.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__rsqrt_varying_float(<WIDTH x float> %d) nounwind readnone alwaysinline {
unary4to8(x0, float, @llvm.arm.neon.vrsqrte.v4f32, %d)
%x0_2 = fmul <WIDTH x float> %x0, %x0
binary4to8(x0_nr, float, @llvm.arm.neon.vrsqrts.v4f32, %d, %x0_2)
%x1 = fmul <WIDTH x float> %x0, %x0_nr
%x1_2 = fmul <WIDTH x float> %x1, %x1
binary4to8(x1_nr, float, @llvm.arm.neon.vrsqrts.v4f32, %d, %x1_2)
%x2 = fmul <WIDTH x float> %x1, %x1_nr
ret <WIDTH x float> %x2
}
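;; vrecpe/vrsqrte deliver only ~8 bits of precision, so both routines
;; above apply two Newton-Raphson steps: vrecps(d, x) computes 2 - d*x
;; and vrsqrts(d, x^2) computes (3 - d*x^2)/2, and multiplying the
;; current estimate by that factor roughly doubles the accurate bits
;; per step.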
define float @__rsqrt_uniform_float(float) nounwind readnone alwaysinline {
%v1 = bitcast float %0 to <1 x float>
%vs = shufflevector <1 x float> %v1, <1 x float> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%vr = call <8 x float> @__rsqrt_varying_float(<8 x float> %vs)
%r = extractelement <8 x float> %vr, i32 0
ret float %r
}
define float @__rcp_uniform_float(float) nounwind readnone alwaysinline {
%v1 = bitcast float %0 to <1 x float>
%vs = shufflevector <1 x float> %v1, <1 x float> undef,
<8 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%vr = call <8 x float> @__rcp_varying_float(<8 x float> %vs)
%r = extractelement <8 x float> %vr, i32 0
ret float %r
}
declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)
define <WIDTH x float> @__sqrt_varying_float(<WIDTH x float>) nounwind readnone alwaysinline {
unary4to8(result, float, @llvm.sqrt.v4f32, %0)
;; this returns nan for v=0, which is undesirable..
;; %rsqrt = call <WIDTH x float> @__rsqrt_varying_float(<WIDTH x float> %0)
;; %result = fmul <4 x float> %rsqrt, %0
ret <8 x float> %result
}
declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)
define <WIDTH x double> @__sqrt_varying_double(<WIDTH x double>) nounwind readnone alwaysinline {
unary4to8(r, double, @llvm.sqrt.v4f64, %0)
ret <WIDTH x double> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reductions
define i64 @__movmsk(<WIDTH x MASK>) nounwind readnone alwaysinline {
%and_mask = and <WIDTH x i16> %0,
<i16 1, i16 2, i16 4, i16 8, i16 16, i16 32, i16 64, i16 128>
%v4 = call <4 x i32> @llvm.arm.neon.vpaddlu.v4i32.v8i16(<8 x i16> %and_mask)
%v2 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %v4)
%va = extractelement <2 x i64> %v2, i32 0
%vb = extractelement <2 x i64> %v2, i32 1
%v = or i64 %va, %vb
ret i64 %v
}
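;; Each active lane contributes a distinct power of two (the and with
;; <1, 2, 4, ..., 128>), so the widening pairwise adds (vpaddlu) and the
;; final or collapse the lanes into a single i64 bitmask.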
define i1 @__any(<WIDTH x MASK>) nounwind readnone alwaysinline {
v8tov4(MASK, %0, %v0123, %v4567)
%vor = or <4 x MASK> %v0123, %v4567
%v0 = extractelement <4 x MASK> %vor, i32 0
%v1 = extractelement <4 x MASK> %vor, i32 1
%v2 = extractelement <4 x MASK> %vor, i32 2
%v3 = extractelement <4 x MASK> %vor, i32 3
%v01 = or MASK %v0, %v1
%v23 = or MASK %v2, %v3
%v = or MASK %v01, %v23
%cmp = icmp ne MASK %v, 0
ret i1 %cmp
}
define i1 @__all(<WIDTH x MASK>) nounwind readnone alwaysinline {
v8tov4(MASK, %0, %v0123, %v4567)
%vand = and <4 x MASK> %v0123, %v4567
%v0 = extractelement <4 x MASK> %vand, i32 0
%v1 = extractelement <4 x MASK> %vand, i32 1
%v2 = extractelement <4 x MASK> %vand, i32 2
%v3 = extractelement <4 x MASK> %vand, i32 3
%v01 = and MASK %v0, %v1
%v23 = and MASK %v2, %v3
%v = and MASK %v01, %v23
%cmp = icmp ne MASK %v, 0
ret i1 %cmp
}
define i1 @__none(<WIDTH x MASK>) nounwind readnone alwaysinline {
%any = call i1 @__any(<WIDTH x MASK> %0)
%none = icmp eq i1 %any, 0
ret i1 %none
}
;; $1: scalar type
;; $2: vector/vector reduce function (2 x <WIDTH x vec> -> <WIDTH x vec>)
;; $3: pairwise vector reduce function (2 x <2 x vec> -> <2 x vec>)
;; $4: scalar reduce function
define(`neon_reduce', `
v8tov4($1, %0, %v0123, %v4567)
%v0123_8 = shufflevector <4 x $1> %v0123, <4 x $1> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
%v4567_8 = shufflevector <4 x $1> %v4567, <4 x $1> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
%vfirst = call <8 x $1> $2(<8 x $1> %v0123_8, <8 x $1> %v4567_8)
%vfirst_4 = shufflevector <8 x $1> %vfirst, <8 x $1> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
v4tov2($1, %vfirst_4, %v0, %v1)
%vh = call <2 x $1> $3(<2 x $1> %v0, <2 x $1> %v1)
%vh0 = extractelement <2 x $1> %vh, i32 0
%vh1 = extractelement <2 x $1> %vh, i32 1
%r = call $1 $4($1 %vh0, $1 %vh1)
ret $1 %r
')
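;; neon_reduce narrows in three stages: the elementwise combine ($2)
;; folds the two 4-wide halves lane by lane, the pairwise NEON op ($3)
;; folds the surviving four lanes to two, and the scalar helper ($4)
;; combines the final pair.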
declare <2 x float> @llvm.arm.neon.vpadd.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @add_f32(float, float) nounwind readnone alwaysinline {
%r = fadd float %0, %1
ret float %r
}
define internal <WIDTH x float> @__add_varying_float(<WIDTH x float>, <WIDTH x float>) nounwind readnone alwaysinline {
%r = fadd <WIDTH x float> %0, %1
ret <WIDTH x float> %r
}
define float @__reduce_add_float(<WIDTH x float>) nounwind readnone alwaysinline {
neon_reduce(float, @__add_varying_float, @llvm.arm.neon.vpadd.v2f32, @add_f32)
}
declare <2 x float> @llvm.arm.neon.vpmins.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @min_f32(float, float) nounwind readnone alwaysinline {
%cmp = fcmp olt float %0, %1
%r = select i1 %cmp, float %0, float %1
ret float %r
}
define float @__reduce_min_float(<WIDTH x float>) nounwind readnone alwaysinline {
neon_reduce(float, @__min_varying_float, @llvm.arm.neon.vpmins.v2f32, @min_f32)
}
declare <2 x float> @llvm.arm.neon.vpmaxs.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @max_f32(float, float) nounwind readnone alwaysinline {
%cmp = fcmp ugt float %0, %1
%r = select i1 %cmp, float %0, float %1
ret float %r
}
define float @__reduce_max_float(<WIDTH x float>) nounwind readnone alwaysinline {
neon_reduce(float, @__max_varying_float, @llvm.arm.neon.vpmaxs.v2f32, @max_f32)
}
declare <4 x i16> @llvm.arm.neon.vpaddls.v4i16.v8i8(<8 x i8>) nounwind readnone
declare <2 x i32> @llvm.arm.neon.vpaddlu.v2i32.v4i16(<4 x i16>) nounwind readnone
define i16 @__reduce_add_int8(<WIDTH x i8>) nounwind readnone alwaysinline {
%a16 = call <4 x i16> @llvm.arm.neon.vpaddls.v4i16.v8i8(<8 x i8> %0)
%a32 = call <2 x i32> @llvm.arm.neon.vpaddlu.v2i32.v4i16(<4 x i16> %a16)
%a0 = extractelement <2 x i32> %a32, i32 0
%a1 = extractelement <2 x i32> %a32, i32 1
%r = add i32 %a0, %a1
%r16 = trunc i32 %r to i16
ret i16 %r16
}
declare <4 x i32> @llvm.arm.neon.vpaddlu.v4i32.v8i16(<WIDTH x i16>)
define i64 @__reduce_add_int16(<WIDTH x i16>) nounwind readnone alwaysinline {
%a1 = call <4 x i32> @llvm.arm.neon.vpaddlu.v4i32.v8i16(<WIDTH x i16> %0)
%a2 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %a1)
%aa = extractelement <2 x i64> %a2, i32 0
%ab = extractelement <2 x i64> %a2, i32 1
%r = add i64 %aa, %ab
ret i64 %r
}
declare <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32>) nounwind readnone
define i64 @__reduce_add_int32(<WIDTH x i32>) nounwind readnone alwaysinline {
v8tov4(i32, %0, %va, %vb)
%pa = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %va)
%pb = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %vb)
%psum = add <2 x i64> %pa, %pb
%a0 = extractelement <2 x i64> %psum, i32 0
%a1 = extractelement <2 x i64> %psum, i32 1
%r = add i64 %a0, %a1
ret i64 %r
}
declare <2 x i32> @llvm.arm.neon.vpmins.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @min_si32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp slt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_min_int32(<WIDTH x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @__min_varying_int32, @llvm.arm.neon.vpmins.v2i32, @min_si32)
}
declare <2 x i32> @llvm.arm.neon.vpmaxs.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @max_si32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp sgt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_max_int32(<WIDTH x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @__max_varying_int32, @llvm.arm.neon.vpmaxs.v2i32, @max_si32)
}
declare <2 x i32> @llvm.arm.neon.vpminu.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @min_ui32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp ult i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_min_uint32(<WIDTH x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @__min_varying_uint32, @llvm.arm.neon.vpmins.v2i32, @min_ui32)
}
declare <2 x i32> @llvm.arm.neon.vpmaxu.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @max_ui32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp ugt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_max_uint32(<WIDTH x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @__max_varying_uint32, @llvm.arm.neon.vpmaxs.v2i32, @max_ui32)
}
define double @__reduce_add_double(<WIDTH x double>) nounwind readnone alwaysinline {
v8tov2(double, %0, %v0, %v1, %v2, %v3)
%v01 = fadd <2 x double> %v0, %v1
%v23 = fadd <2 x double> %v2, %v3
%sum = fadd <2 x double> %v01, %v23
%e0 = extractelement <2 x double> %sum, i32 0
%e1 = extractelement <2 x double> %sum, i32 1
%m = fadd double %e0, %e1
ret double %m
}
define double @__reduce_min_double(<WIDTH x double>) nounwind readnone alwaysinline {
reduce8(double, @__min_varying_double, @__min_uniform_double)
}
define double @__reduce_max_double(<WIDTH x double>) nounwind readnone alwaysinline {
reduce8(double, @__max_varying_double, @__max_uniform_double)
}
define i64 @__reduce_add_int64(<WIDTH x i64>) nounwind readnone alwaysinline {
v8tov2(i64, %0, %v0, %v1, %v2, %v3)
%v01 = add <2 x i64> %v0, %v1
%v23 = add <2 x i64> %v2, %v3
%sum = add <2 x i64> %v01, %v23
%e0 = extractelement <2 x i64> %sum, i32 0
%e1 = extractelement <2 x i64> %sum, i32 1
%m = add i64 %e0, %e1
ret i64 %m
}
define i64 @__reduce_min_int64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce8(i64, @__min_varying_int64, @__min_uniform_int64)
}
define i64 @__reduce_max_int64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce8(i64, @__max_varying_int64, @__max_uniform_int64)
}
define i64 @__reduce_min_uint64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce8(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
define i64 @__reduce_max_uint64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce8(i64, @__max_varying_uint64, @__max_uniform_uint64)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16
declare <8 x i8> @llvm.arm.neon.vrhaddu.v8i8(<8 x i8>, <8 x i8>) nounwind readnone
define <8 x i8> @__avg_up_uint8(<8 x i8>, <8 x i8>) nounwind readnone alwaysinline {
%r = call <8 x i8> @llvm.arm.neon.vrhaddu.v8i8(<8 x i8> %0, <8 x i8> %1)
ret <8 x i8> %r
}
declare <8 x i8> @llvm.arm.neon.vrhadds.v8i8(<8 x i8>, <8 x i8>) nounwind readnone
define <8 x i8> @__avg_up_int8(<8 x i8>, <8 x i8>) nounwind readnone alwaysinline {
%r = call <8 x i8> @llvm.arm.neon.vrhadds.v8i8(<8 x i8> %0, <8 x i8> %1)
ret <8 x i8> %r
}
declare <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8>, <8 x i8>) nounwind readnone
define <8 x i8> @__avg_down_uint8(<8 x i8>, <8 x i8>) nounwind readnone alwaysinline {
%r = call <8 x i8> @llvm.arm.neon.vhaddu.v8i8(<8 x i8> %0, <8 x i8> %1)
ret <8 x i8> %r
}
declare <8 x i8> @llvm.arm.neon.vhadds.v8i8(<8 x i8>, <8 x i8>) nounwind readnone
define <8 x i8> @__avg_down_int8(<8 x i8>, <8 x i8>) nounwind readnone alwaysinline {
%r = call <8 x i8> @llvm.arm.neon.vhadds.v8i8(<8 x i8> %0, <8 x i8> %1)
ret <8 x i8> %r
}
declare <8 x i16> @llvm.arm.neon.vrhaddu.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
define <8 x i16> @__avg_up_uint16(<8 x i16>, <8 x i16>) nounwind readnone alwaysinline {
%r = call <8 x i16> @llvm.arm.neon.vrhaddu.v8i16(<8 x i16> %0, <8 x i16> %1)
ret <8 x i16> %r
}
declare <8 x i16> @llvm.arm.neon.vrhadds.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
define <8 x i16> @__avg_up_int16(<8 x i16>, <8 x i16>) nounwind readnone alwaysinline {
%r = call <8 x i16> @llvm.arm.neon.vrhadds.v8i16(<8 x i16> %0, <8 x i16> %1)
ret <8 x i16> %r
}
declare <8 x i16> @llvm.arm.neon.vhaddu.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
define <8 x i16> @__avg_down_uint16(<8 x i16>, <8 x i16>) nounwind readnone alwaysinline {
%r = call <8 x i16> @llvm.arm.neon.vhaddu.v8i16(<8 x i16> %0, <8 x i16> %1)
ret <8 x i16> %r
}
declare <8 x i16> @llvm.arm.neon.vhadds.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
define <8 x i16> @__avg_down_int16(<8 x i16>, <8 x i16>) nounwind readnone alwaysinline {
%r = call <8 x i16> @llvm.arm.neon.vhadds.v8i16(<8 x i16> %0, <8 x i16> %1)
ret <8 x i16> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()
saturation_arithmetic()

builtins/target-neon-32.ll (new file, 497 lines)
;;
;; target-neon-32.ll
;;
;; Copyright(c) 2012-2013 Matt Pharr
;; Copyright(c) 2013, 2015 Google, Inc.
;;
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Matt Pharr nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`WIDTH',`4')
define(`MASK',`i32')
include(`util.m4')
include(`target-neon-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
define <4 x float> @__half_to_float_varying(<4 x i16> %v) nounwind readnone alwaysinline {
%r = call <4 x float> @llvm.arm.neon.vcvthf2fp(<4 x i16> %v)
ret <4 x float> %r
}
define <4 x i16> @__float_to_half_varying(<4 x float> %v) nounwind readnone alwaysinline {
%r = call <4 x i16> @llvm.arm.neon.vcvtfp2hf(<4 x float> %v)
ret <4 x i16> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; math
;; round/floor/ceil
;; FIXME: grabbed these from the sse2 target, which does not have native
;; instructions for these. Is there a better approach for NEON?
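;; The trick below rounds by magic-number addition: for |x| < 2^23, adding
;; 8388608.0 (2^23) pushes the fractional bits out of the float mantissa,
;; so the FPU rounds them away, and subtracting 2^23 back leaves
;; round-to-nearest(x).  For example, 1.3 + 8388608.0 = 8388609.3 rounds
;; to 8388609.0, and 8388609.0 - 8388608.0 = 1.0.  The and/xor dance
;; strips the sign bit first and restores it afterwards so the same path
;; handles negative inputs.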
define <4 x float> @__round_varying_float(<4 x float>) nounwind readonly alwaysinline {
%float_to_int_bitcast.i.i.i.i = bitcast <4 x float> %0 to <4 x i32>
%bitop.i.i = and <4 x i32> %float_to_int_bitcast.i.i.i.i, <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>
%bitop.i = xor <4 x i32> %float_to_int_bitcast.i.i.i.i, %bitop.i.i
%int_to_float_bitcast.i.i40.i = bitcast <4 x i32> %bitop.i to <4 x float>
%binop.i = fadd <4 x float> %int_to_float_bitcast.i.i40.i, <float 8.388608e+06, float 8.388608e+06, float 8.388608e+06, float 8.388608e+06>
%binop21.i = fadd <4 x float> %binop.i, <float -8.388608e+06, float -8.388608e+06, float -8.388608e+06, float -8.388608e+06>
%float_to_int_bitcast.i.i.i = bitcast <4 x float> %binop21.i to <4 x i32>
%bitop31.i = xor <4 x i32> %float_to_int_bitcast.i.i.i, %bitop.i.i
%int_to_float_bitcast.i.i.i = bitcast <4 x i32> %bitop31.i to <4 x float>
ret <4 x float> %int_to_float_bitcast.i.i.i
}
define <4 x float> @__floor_varying_float(<4 x float>) nounwind readonly alwaysinline {
%calltmp.i = tail call <4 x float> @__round_varying_float(<4 x float> %0) nounwind
%bincmp.i = fcmp ogt <4 x float> %calltmp.i, %0
%val_to_boolvec32.i = sext <4 x i1> %bincmp.i to <4 x i32>
%bitop.i = and <4 x i32> %val_to_boolvec32.i, <i32 -1082130432, i32 -1082130432, i32 -1082130432, i32 -1082130432>
%int_to_float_bitcast.i.i.i = bitcast <4 x i32> %bitop.i to <4 x float>
%binop.i = fadd <4 x float> %calltmp.i, %int_to_float_bitcast.i.i.i
ret <4 x float> %binop.i
}
define <4 x float> @__ceil_varying_float(<4 x float>) nounwind readonly alwaysinline {
%calltmp.i = tail call <4 x float> @__round_varying_float(<4 x float> %0) nounwind
%bincmp.i = fcmp olt <4 x float> %calltmp.i, %0
%val_to_boolvec32.i = sext <4 x i1> %bincmp.i to <4 x i32>
%bitop.i = and <4 x i32> %val_to_boolvec32.i, <i32 1065353216, i32 1065353216, i32 1065353216, i32 1065353216>
%int_to_float_bitcast.i.i.i = bitcast <4 x i32> %bitop.i to <4 x float>
%binop.i = fadd <4 x float> %calltmp.i, %int_to_float_bitcast.i.i.i
ret <4 x float> %binop.i
}
;; FIXME: rounding doubles and double vectors needs to be implemented
declare <WIDTH x double> @__round_varying_double(<WIDTH x double>) nounwind readnone
declare <WIDTH x double> @__floor_varying_double(<WIDTH x double>) nounwind readnone
declare <WIDTH x double> @__ceil_varying_double(<WIDTH x double>) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; min/max
declare <4 x float> @llvm.arm.neon.vmins.v4f32(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__max_varying_float(<WIDTH x float>,
<WIDTH x float>) nounwind readnone alwaysinline {
%r = call <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float> %0, <4 x float> %1)
ret <WIDTH x float> %r
}
define <WIDTH x float> @__min_varying_float(<WIDTH x float>,
<WIDTH x float>) nounwind readnone alwaysinline {
%r = call <4 x float> @llvm.arm.neon.vmins.v4f32(<4 x float> %0, <4 x float> %1)
ret <WIDTH x float> %r
}
declare <4 x i32> @llvm.arm.neon.vmins.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vminu.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vmaxs.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vmaxu.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
define <WIDTH x i32> @__min_varying_int32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
%r = call <4 x i32> @llvm.arm.neon.vmins.v4i32(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %r
}
define <WIDTH x i32> @__max_varying_int32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
%r = call <4 x i32> @llvm.arm.neon.vmaxs.v4i32(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %r
}
define <WIDTH x i32> @__min_varying_uint32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
%r = call <4 x i32> @llvm.arm.neon.vminu.v4i32(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %r
}
define <WIDTH x i32> @__max_varying_uint32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
%r = call <4 x i32> @llvm.arm.neon.vmaxu.v4i32(<4 x i32> %0, <4 x i32> %1)
ret <4 x i32> %r
}
;; sqrt/rsqrt/rcp
declare <4 x float> @llvm.arm.neon.vrecpe.v4f32(<4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vrecps.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__rcp_varying_float(<WIDTH x float> %d) nounwind readnone alwaysinline {
%x0 = call <4 x float> @llvm.arm.neon.vrecpe.v4f32(<4 x float> %d)
%x0_nr = call <4 x float> @llvm.arm.neon.vrecps.v4f32(<4 x float> %d, <4 x float> %x0)
%x1 = fmul <4 x float> %x0, %x0_nr
%x1_nr = call <4 x float> @llvm.arm.neon.vrecps.v4f32(<4 x float> %d, <4 x float> %x1)
%x2 = fmul <4 x float> %x1, %x1_nr
ret <4 x float> %x2
}
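;; Note on the refinement above: vrecpe gives a rough (~8 bit) estimate x0
;; of 1/d, and vrecps(d, x) computes 2 - d*x, so each x_{n+1} = x_n * (2 - d*x_n)
;; is one Newton-Raphson step that roughly doubles the number of correct
;; bits; two steps reach near full single precision.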
declare <4 x float> @llvm.arm.neon.vrsqrte.v4f32(<4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vrsqrts.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__rsqrt_varying_float(<WIDTH x float> %d) nounwind readnone alwaysinline {
%x0 = call <4 x float> @llvm.arm.neon.vrsqrte.v4f32(<4 x float> %d)
%x0_2 = fmul <4 x float> %x0, %x0
%x0_nr = call <4 x float> @llvm.arm.neon.vrsqrts.v4f32(<4 x float> %d, <4 x float> %x0_2)
%x1 = fmul <4 x float> %x0, %x0_nr
%x1_2 = fmul <4 x float> %x1, %x1
%x1_nr = call <4 x float> @llvm.arm.neon.vrsqrts.v4f32(<4 x float> %d, <4 x float> %x1_2)
%x2 = fmul <4 x float> %x1, %x1_nr
ret <4 x float> %x2
}
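;; Similarly, vrsqrte estimates 1/sqrt(d) and vrsqrts(d, x^2) computes
;; (3 - d*x^2) / 2, so x_{n+1} = x_n * (3 - d*x_n^2) / 2 is the
;; Newton-Raphson step for the inverse square root.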
define float @__rsqrt_uniform_float(float) nounwind readnone alwaysinline {
%v1 = bitcast float %0 to <1 x float>
%vs = shufflevector <1 x float> %v1, <1 x float> undef,
<4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%vr = call <4 x float> @__rsqrt_varying_float(<4 x float> %vs)
%r = extractelement <4 x float> %vr, i32 0
ret float %r
}
define float @__rcp_uniform_float(float) nounwind readnone alwaysinline {
%v1 = bitcast float %0 to <1 x float>
%vs = shufflevector <1 x float> %v1, <1 x float> undef,
<4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
%vr = call <4 x float> @__rcp_varying_float(<4 x float> %vs)
%r = extractelement <4 x float> %vr, i32 0
ret float %r
}
declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)
define <WIDTH x float> @__sqrt_varying_float(<WIDTH x float>) nounwind readnone alwaysinline {
%result = call <4 x float> @llvm.sqrt.v4f32(<4 x float> %0)
;; the commented-out rsqrt-based alternative returns NaN for v = 0, which
;; is undesirable:
;; %rsqrt = call <WIDTH x float> @__rsqrt_varying_float(<WIDTH x float> %0)
;; %result = fmul <4 x float> %rsqrt, %0
ret <4 x float> %result
}
declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)
define <WIDTH x double> @__sqrt_varying_double(<WIDTH x double>) nounwind readnone alwaysinline {
%r = call <4 x double> @llvm.sqrt.v4f64(<4 x double> %0)
ret <4 x double> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reductions
define i64 @__movmsk(<4 x MASK>) nounwind readnone alwaysinline {
%and_mask = and <4 x MASK> %0, <MASK 1, MASK 2, MASK 4, MASK 8>
%v01 = shufflevector <4 x i32> %and_mask, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
%v23 = shufflevector <4 x i32> %and_mask, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%vor = or <2 x i32> %v01, %v23
%v0 = extractelement <2 x i32> %vor, i32 0
%v1 = extractelement <2 x i32> %vor, i32 1
%v = or i32 %v0, %v1
%mask64 = zext i32 %v to i64
ret i64 %mask64
}
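;; Each mask lane is all-ones or all-zeros, so and'ing lane i with 2^i
;; leaves exactly that lane's bit, and the or-tree collapses the four
;; lanes into a 4-bit scalar mask.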
define i1 @__any(<4 x i32>) nounwind readnone alwaysinline {
%v01 = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
%v23 = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%vor = or <2 x i32> %v01, %v23
%v0 = extractelement <2 x i32> %vor, i32 0
%v1 = extractelement <2 x i32> %vor, i32 1
%v = or i32 %v0, %v1
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<4 x i32>) nounwind readnone alwaysinline {
%v01 = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> <i32 0, i32 1>
%v23 = shufflevector <4 x i32> %0, <4 x i32> undef, <2 x i32> <i32 2, i32 3>
%vor = and <2 x i32> %v01, %v23
%v0 = extractelement <2 x i32> %vor, i32 0
%v1 = extractelement <2 x i32> %vor, i32 1
%v = and i32 %v0, %v1
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__none(<4 x i32>) nounwind readnone alwaysinline {
%any = call i1 @__any(<4 x i32> %0)
%none = icmp eq i1 %any, 0
ret i1 %none
}
;; $1: scalar type
;; $2: vector reduce function (2 x <2 x vec> -> <2 x vec>)
;; $3: scalar reduce function
define(`neon_reduce', `
%v0 = shufflevector <4 x $1> %0, <4 x $1> undef, <2 x i32> <i32 0, i32 1>
%v1 = shufflevector <4 x $1> %0, <4 x $1> undef, <2 x i32> <i32 2, i32 3>
%vh = call <2 x $1> $2(<2 x $1> %v0, <2 x $1> %v1)
%vh0 = extractelement <2 x $1> %vh, i32 0
%vh1 = extractelement <2 x $1> %vh, i32 1
%r = call $1 $3($1 %vh0, $1 %vh1)
ret $1 %r
')
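;; For instance, neon_reduce(float, @llvm.arm.neon.vpadd.v2f32, @add_f32)
;; expands to roughly:
;;   %v0 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 0, i32 1>
;;   %v1 = shufflevector <4 x float> %0, <4 x float> undef, <2 x i32> <i32 2, i32 3>
;;   %vh = call <2 x float> @llvm.arm.neon.vpadd.v2f32(<2 x float> %v0, <2 x float> %v1)
;; followed by extracting the two lanes of %vh and combining them with a
;; scalar @add_f32 call.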
declare <2 x float> @llvm.arm.neon.vpadd.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @add_f32(float, float) nounwind readnone alwaysinline {
%r = fadd float %0, %1
ret float %r
}
define float @__reduce_add_float(<4 x float>) nounwind readnone alwaysinline {
neon_reduce(float, @llvm.arm.neon.vpadd.v2f32, @add_f32)
}
declare <2 x float> @llvm.arm.neon.vpmins.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @min_f32(float, float) nounwind readnone alwaysinline {
%cmp = fcmp olt float %0, %1
%r = select i1 %cmp, float %0, float %1
ret float %r
}
define float @__reduce_min_float(<4 x float>) nounwind readnone alwaysinline {
neon_reduce(float, @llvm.arm.neon.vpmins.v2f32, @min_f32)
}
declare <2 x float> @llvm.arm.neon.vpmaxs.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @max_f32(float, float) nounwind readnone alwaysinline {
%cmp = fcmp ugt float %0, %1
%r = select i1 %cmp, float %0, float %1
ret float %r
}
define float @__reduce_max_float(<4 x float>) nounwind readnone alwaysinline {
neon_reduce(float, @llvm.arm.neon.vpmaxs.v2f32, @max_f32)
}
declare <4 x i16> @llvm.arm.neon.vpaddls.v4i16.v8i8(<8 x i8>) nounwind readnone
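;; The 4-wide input is widened to 8 lanes below, padded with zeros (shuffle
;; indices 4..4 pick element 0 of zeroinitializer), so the 8-wide pairwise
;; add can be reused; the zero lanes do not change the sum.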
define i16 @__reduce_add_int8(<WIDTH x i8>) nounwind readnone alwaysinline {
%v8 = shufflevector <4 x i8> %0, <4 x i8> zeroinitializer,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 4, i32 4, i32 4>
%a16 = call <4 x i16> @llvm.arm.neon.vpaddls.v4i16.v8i8(<8 x i8> %v8)
%a32 = call <2 x i32> @llvm.arm.neon.vpaddlu.v2i32.v4i16(<4 x i16> %a16)
%a0 = extractelement <2 x i32> %a32, i32 0
%a1 = extractelement <2 x i32> %a32, i32 1
%r = add i32 %a0, %a1
%r16 = trunc i32 %r to i16
ret i16 %r16
}
declare <2 x i32> @llvm.arm.neon.vpaddlu.v2i32.v4i16(<4 x i16>) nounwind readnone
define i32 @__reduce_add_int16(<WIDTH x i16>) nounwind readnone alwaysinline {
%a32 = call <2 x i32> @llvm.arm.neon.vpaddlu.v2i32.v4i16(<4 x i16> %0)
%a0 = extractelement <2 x i32> %a32, i32 0
%a1 = extractelement <2 x i32> %a32, i32 1
%r = add i32 %a0, %a1
ret i32 %r
}
declare <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32>) nounwind readnone
define i64 @__reduce_add_int32(<WIDTH x i32>) nounwind readnone alwaysinline {
%a64 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %0)
%a0 = extractelement <2 x i64> %a64, i32 0
%a1 = extractelement <2 x i64> %a64, i32 1
%r = add i64 %a0, %a1
ret i64 %r
}
declare <2 x i32> @llvm.arm.neon.vpmins.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @min_si32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp slt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_min_int32(<4 x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @llvm.arm.neon.vpmins.v2i32, @min_si32)
}
declare <2 x i32> @llvm.arm.neon.vpmaxs.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @max_si32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp sgt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_max_int32(<4 x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @llvm.arm.neon.vpmaxs.v2i32, @max_si32)
}
declare <2 x i32> @llvm.arm.neon.vpminu.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @min_ui32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp ult i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_min_uint32(<4 x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @llvm.arm.neon.vpmins.v2i32, @min_ui32)
}
declare <2 x i32> @llvm.arm.neon.vpmaxu.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @max_ui32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp ugt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_max_uint32(<4 x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @llvm.arm.neon.vpmaxs.v2i32, @max_ui32)
}
define double @__reduce_add_double(<4 x double>) nounwind readnone alwaysinline {
%v0 = shufflevector <4 x double> %0, <4 x double> undef,
<2 x i32> <i32 0, i32 1>
%v1 = shufflevector <4 x double> %0, <4 x double> undef,
<2 x i32> <i32 2, i32 3>
%sum = fadd <2 x double> %v0, %v1
%e0 = extractelement <2 x double> %sum, i32 0
%e1 = extractelement <2 x double> %sum, i32 1
%m = fadd double %e0, %e1
ret double %m
}
define double @__reduce_min_double(<4 x double>) nounwind readnone alwaysinline {
reduce4(double, @__min_varying_double, @__min_uniform_double)
}
define double @__reduce_max_double(<4 x double>) nounwind readnone alwaysinline {
reduce4(double, @__max_varying_double, @__max_uniform_double)
}
define i64 @__reduce_add_int64(<4 x i64>) nounwind readnone alwaysinline {
%v0 = shufflevector <4 x i64> %0, <4 x i64> undef,
<2 x i32> <i32 0, i32 1>
%v1 = shufflevector <4 x i64> %0, <4 x i64> undef,
<2 x i32> <i32 2, i32 3>
%sum = add <2 x i64> %v0, %v1
%e0 = extractelement <2 x i64> %sum, i32 0
%e1 = extractelement <2 x i64> %sum, i32 1
%m = add i64 %e0, %e1
ret i64 %m
}
define i64 @__reduce_min_int64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__min_varying_int64, @__min_uniform_int64)
}
define i64 @__reduce_max_int64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__max_varying_int64, @__max_uniform_int64)
}
define i64 @__reduce_min_uint64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
define i64 @__reduce_max_uint64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__max_varying_uint64, @__max_uniform_uint64)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16
declare <4 x i8> @llvm.arm.neon.vrhaddu.v4i8(<4 x i8>, <4 x i8>) nounwind readnone
define <4 x i8> @__avg_up_uint8(<4 x i8>, <4 x i8>) nounwind readnone alwaysinline {
%r = call <4 x i8> @llvm.arm.neon.vrhaddu.v4i8(<4 x i8> %0, <4 x i8> %1)
ret <4 x i8> %r
}
declare <4 x i8> @llvm.arm.neon.vrhadds.v4i8(<4 x i8>, <4 x i8>) nounwind readnone
define <4 x i8> @__avg_up_int8(<4 x i8>, <4 x i8>) nounwind readnone alwaysinline {
%r = call <4 x i8> @llvm.arm.neon.vrhadds.v4i8(<4 x i8> %0, <4 x i8> %1)
ret <4 x i8> %r
}
declare <4 x i8> @llvm.arm.neon.vhaddu.v4i8(<4 x i8>, <4 x i8>) nounwind readnone
define <4 x i8> @__avg_down_uint8(<4 x i8>, <4 x i8>) nounwind readnone alwaysinline {
%r = call <4 x i8> @llvm.arm.neon.vhaddu.v4i8(<4 x i8> %0, <4 x i8> %1)
ret <4 x i8> %r
}
declare <4 x i8> @llvm.arm.neon.vhadds.v4i8(<4 x i8>, <4 x i8>) nounwind readnone
define <4 x i8> @__avg_down_int8(<4 x i8>, <4 x i8>) nounwind readnone alwaysinline {
%r = call <4 x i8> @llvm.arm.neon.vhadds.v4i8(<4 x i8> %0, <4 x i8> %1)
ret <4 x i8> %r
}
declare <4 x i16> @llvm.arm.neon.vrhaddu.v4i16(<4 x i16>, <4 x i16>) nounwind readnone
define <4 x i16> @__avg_up_uint16(<4 x i16>, <4 x i16>) nounwind readnone alwaysinline {
%r = call <4 x i16> @llvm.arm.neon.vrhaddu.v4i16(<4 x i16> %0, <4 x i16> %1)
ret <4 x i16> %r
}
declare <4 x i16> @llvm.arm.neon.vrhadds.v4i16(<4 x i16>, <4 x i16>) nounwind readnone
define <4 x i16> @__avg_up_int16(<4 x i16>, <4 x i16>) nounwind readnone alwaysinline {
%r = call <4 x i16> @llvm.arm.neon.vrhadds.v4i16(<4 x i16> %0, <4 x i16> %1)
ret <4 x i16> %r
}
declare <4 x i16> @llvm.arm.neon.vhaddu.v4i16(<4 x i16>, <4 x i16>) nounwind readnone
define <4 x i16> @__avg_down_uint16(<4 x i16>, <4 x i16>) nounwind readnone alwaysinline {
%r = call <4 x i16> @llvm.arm.neon.vhaddu.v4i16(<4 x i16> %0, <4 x i16> %1)
ret <4 x i16> %r
}
declare <4 x i16> @llvm.arm.neon.vhadds.v4i16(<4 x i16>, <4 x i16>) nounwind readnone
define <4 x i16> @__avg_down_int16(<4 x i16>, <4 x i16>) nounwind readnone alwaysinline {
%r = call <4 x i16> @llvm.arm.neon.vhadds.v4i16(<4 x i16> %0, <4 x i16> %1)
ret <4 x i16> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()
saturation_arithmetic()

builtins/target-neon-8.ll (new file, 593 lines)
;;
;; target-neon-8.ll
;;
;; Copyright(c) 2013-2015 Google, Inc.
;;
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Matt Pharr nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`WIDTH',`16')
define(`MASK',`i8')
include(`util.m4')
include(`target-neon-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
define <16 x float> @__half_to_float_varying(<16 x i16> %v) nounwind readnone alwaysinline {
unary4to16conv(r, i16, float, @llvm.arm.neon.vcvthf2fp, %v)
ret <16 x float> %r
}
define <16 x i16> @__float_to_half_varying(<16 x float> %v) nounwind readnone alwaysinline {
unary4to16conv(r, float, i16, @llvm.arm.neon.vcvtfp2hf, %v)
ret <16 x i16> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; math
;; round/floor/ceil
;; FIXME: grabbed these from the sse2 target, which does not have native
;; instructions for these. Is there a better approach for NEON?
define <16 x float> @__round_varying_float(<16 x float>) nounwind readonly alwaysinline {
%float_to_int_bitcast.i.i.i.i = bitcast <16 x float> %0 to <16 x i32>
%bitop.i.i = and <16 x i32> %float_to_int_bitcast.i.i.i.i,
<i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648,
i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648,
i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648,
i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>
%bitop.i = xor <16 x i32> %float_to_int_bitcast.i.i.i.i, %bitop.i.i
%int_to_float_bitcast.i.i40.i = bitcast <16 x i32> %bitop.i to <16 x float>
%binop.i = fadd <16 x float> %int_to_float_bitcast.i.i40.i,
<float 8.388608e+06, float 8.388608e+06, float 8.388608e+06, float 8.388608e+06,
float 8.388608e+06, float 8.388608e+06, float 8.388608e+06, float 8.388608e+06,
float 8.388608e+06, float 8.388608e+06, float 8.388608e+06, float 8.388608e+06,
float 8.388608e+06, float 8.388608e+06, float 8.388608e+06, float 8.388608e+06>
%binop21.i = fadd <16 x float> %binop.i,
<float -8.388608e+06, float -8.388608e+06, float -8.388608e+06, float -8.388608e+06,
float -8.388608e+06, float -8.388608e+06, float -8.388608e+06, float -8.388608e+06,
float -8.388608e+06, float -8.388608e+06, float -8.388608e+06, float -8.388608e+06,
float -8.388608e+06, float -8.388608e+06, float -8.388608e+06, float -8.388608e+06>
%float_to_int_bitcast.i.i.i = bitcast <16 x float> %binop21.i to <16 x i32>
%bitop31.i = xor <16 x i32> %float_to_int_bitcast.i.i.i, %bitop.i.i
%int_to_float_bitcast.i.i.i = bitcast <16 x i32> %bitop31.i to <16 x float>
ret <16 x float> %int_to_float_bitcast.i.i.i
}
define <16 x float> @__floor_varying_float(<16 x float>) nounwind readonly alwaysinline {
%calltmp.i = tail call <16 x float> @__round_varying_float(<16 x float> %0) nounwind
%bincmp.i = fcmp ogt <16 x float> %calltmp.i, %0
%val_to_boolvec32.i = sext <16 x i1> %bincmp.i to <16 x i32>
%bitop.i = and <16 x i32> %val_to_boolvec32.i,
<i32 -1082130432, i32 -1082130432, i32 -1082130432, i32 -1082130432,
i32 -1082130432, i32 -1082130432, i32 -1082130432, i32 -1082130432,
i32 -1082130432, i32 -1082130432, i32 -1082130432, i32 -1082130432,
i32 -1082130432, i32 -1082130432, i32 -1082130432, i32 -1082130432>
%int_to_float_bitcast.i.i.i = bitcast <16 x i32> %bitop.i to <16 x float>
%binop.i = fadd <16 x float> %calltmp.i, %int_to_float_bitcast.i.i.i
ret <16 x float> %binop.i
}
define <16 x float> @__ceil_varying_float(<16 x float>) nounwind readonly alwaysinline {
%calltmp.i = tail call <16 x float> @__round_varying_float(<16 x float> %0) nounwind
%bincmp.i = fcmp olt <16 x float> %calltmp.i, %0
%val_to_boolvec32.i = sext <16 x i1> %bincmp.i to <16 x i32>
%bitop.i = and <16 x i32> %val_to_boolvec32.i,
<i32 1065353216, i32 1065353216, i32 1065353216, i32 1065353216,
i32 1065353216, i32 1065353216, i32 1065353216, i32 1065353216,
i32 1065353216, i32 1065353216, i32 1065353216, i32 1065353216,
i32 1065353216, i32 1065353216, i32 1065353216, i32 1065353216>
%int_to_float_bitcast.i.i.i = bitcast <16 x i32> %bitop.i to <16 x float>
%binop.i = fadd <16 x float> %calltmp.i, %int_to_float_bitcast.i.i.i
ret <16 x float> %binop.i
}
;; FIXME: rounding doubles and double vectors needs to be implemented
declare <WIDTH x double> @__round_varying_double(<WIDTH x double>) nounwind readnone
declare <WIDTH x double> @__floor_varying_double(<WIDTH x double>) nounwind readnone
declare <WIDTH x double> @__ceil_varying_double(<WIDTH x double>) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; min/max
declare <4 x float> @llvm.arm.neon.vmins.v4f32(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vmaxs.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__max_varying_float(<WIDTH x float>,
<WIDTH x float>) nounwind readnone alwaysinline {
binary4to16(r, float, @llvm.arm.neon.vmaxs.v4f32, %0, %1)
ret <WIDTH x float> %r
}
define <WIDTH x float> @__min_varying_float(<WIDTH x float>,
<WIDTH x float>) nounwind readnone alwaysinline {
binary4to16(r, float, @llvm.arm.neon.vmins.v4f32, %0, %1)
ret <WIDTH x float> %r
}
declare <4 x i32> @llvm.arm.neon.vmins.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vminu.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vmaxs.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vmaxu.v4i32(<4 x i32>, <4 x i32>) nounwind readnone
define <WIDTH x i32> @__min_varying_int32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
binary4to16(r, i32, @llvm.arm.neon.vmins.v4i32, %0, %1)
ret <WIDTH x i32> %r
}
define <WIDTH x i32> @__max_varying_int32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
binary4to16(r, i32, @llvm.arm.neon.vmaxs.v4i32, %0, %1)
ret <WIDTH x i32> %r
}
define <WIDTH x i32> @__min_varying_uint32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
binary4to16(r, i32, @llvm.arm.neon.vminu.v4i32, %0, %1)
ret <WIDTH x i32> %r
}
define <WIDTH x i32> @__max_varying_uint32(<WIDTH x i32>, <WIDTH x i32>) nounwind readnone alwaysinline {
binary4to16(r, i32, @llvm.arm.neon.vmaxu.v4i32, %0, %1)
ret <WIDTH x i32> %r
}
;; sqrt/rsqrt/rcp
declare <4 x float> @llvm.arm.neon.vrecpe.v4f32(<4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vrecps.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__rcp_varying_float(<WIDTH x float> %d) nounwind readnone alwaysinline {
unary4to16(x0, float, @llvm.arm.neon.vrecpe.v4f32, %d)
binary4to16(x0_nr, float, @llvm.arm.neon.vrecps.v4f32, %d, %x0)
%x1 = fmul <WIDTH x float> %x0, %x0_nr
binary4to16(x1_nr, float, @llvm.arm.neon.vrecps.v4f32, %d, %x1)
%x2 = fmul <WIDTH x float> %x1, %x1_nr
ret <WIDTH x float> %x2
}
declare <4 x float> @llvm.arm.neon.vrsqrte.v4f32(<4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vrsqrts.v4f32(<4 x float>, <4 x float>) nounwind readnone
define <WIDTH x float> @__rsqrt_varying_float(<WIDTH x float> %d) nounwind readnone alwaysinline {
unary4to16(x0, float, @llvm.arm.neon.vrsqrte.v4f32, %d)
%x0_2 = fmul <WIDTH x float> %x0, %x0
binary4to16(x0_nr, float, @llvm.arm.neon.vrsqrts.v4f32, %d, %x0_2)
%x1 = fmul <WIDTH x float> %x0, %x0_nr
%x1_2 = fmul <WIDTH x float> %x1, %x1
binary4to16(x1_nr, float, @llvm.arm.neon.vrsqrts.v4f32, %d, %x1_2)
%x2 = fmul <WIDTH x float> %x1, %x1_nr
ret <WIDTH x float> %x2
}
define float @__rsqrt_uniform_float(float) nounwind readnone alwaysinline {
%v1 = bitcast float %0 to <1 x float>
%vs = shufflevector <1 x float> %v1, <1 x float> undef,
<16 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%vr = call <16 x float> @__rsqrt_varying_float(<16 x float> %vs)
%r = extractelement <16 x float> %vr, i32 0
ret float %r
}
define float @__rcp_uniform_float(float) nounwind readnone alwaysinline {
%v1 = bitcast float %0 to <1 x float>
%vs = shufflevector <1 x float> %v1, <1 x float> undef,
<16 x i32> <i32 0, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%vr = call <16 x float> @__rcp_varying_float(<16 x float> %vs)
%r = extractelement <16 x float> %vr, i32 0
ret float %r
}
declare <4 x float> @llvm.sqrt.v4f32(<4 x float>)
define <WIDTH x float> @__sqrt_varying_float(<WIDTH x float>) nounwind readnone alwaysinline {
unary4to16(result, float, @llvm.sqrt.v4f32, %0)
;; the commented-out rsqrt-based alternative returns NaN for v = 0, which
;; is undesirable:
;; %rsqrt = call <WIDTH x float> @__rsqrt_varying_float(<WIDTH x float> %0)
;; %result = fmul <WIDTH x float> %rsqrt, %0
ret <16 x float> %result
}
declare <4 x double> @llvm.sqrt.v4f64(<4 x double>)
define <WIDTH x double> @__sqrt_varying_double(<WIDTH x double>) nounwind readnone alwaysinline {
unary4to16(r, double, @llvm.sqrt.v4f64, %0)
ret <WIDTH x double> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reductions
define i64 @__movmsk(<WIDTH x MASK>) nounwind readnone alwaysinline {
%and_mask = and <WIDTH x i8> %0,
<i8 1, i8 2, i8 4, i8 8, i8 16, i8 32, i8 64, i8 128,
i8 1, i8 2, i8 4, i8 8, i8 16, i8 32, i8 64, i8 128>
%v8 = call <8 x i16> @llvm.arm.neon.vpaddlu.v8i16.v16i8(<16 x i8> %and_mask)
%v4 = call <4 x i32> @llvm.arm.neon.vpaddlu.v4i32.v8i16(<8 x i16> %v8)
%v2 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %v4)
%va = extractelement <2 x i64> %v2, i32 0
%vb = extractelement <2 x i64> %v2, i32 1
%vbshift = shl i64 %vb, 8
%v = or i64 %va, %vbshift
ret i64 %v
}
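;; Lanes 0-7 and 8-15 each contribute bits 1..128; the pairwise widening
;; adds keep the two halves in separate i64 lanes, so shifting the high
;; half left by 8 and or'ing yields the full 16-bit mask.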
define i1 @__any(<WIDTH x MASK>) nounwind readnone alwaysinline {
v16tov8(MASK, %0, %v8a, %v8b)
%vor8 = or <8 x MASK> %v8a, %v8b
%v16 = sext <8 x i8> %vor8 to <8 x i16>
v8tov4(i16, %v16, %v16a, %v16b)
%vor16 = or <4 x i16> %v16a, %v16b
%v32 = sext <4 x i16> %vor16 to <4 x i32>
v4tov2(i32, %v32, %v32a, %v32b)
%vor32 = or <2 x i32> %v32a, %v32b
%v0 = extractelement <2 x i32> %vor32, i32 0
%v1 = extractelement <2 x i32> %vor32, i32 1
%v = or i32 %v0, %v1
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<WIDTH x MASK>) nounwind readnone alwaysinline {
v16tov8(MASK, %0, %v8a, %v8b)
%vand8 = and <8 x MASK> %v8a, %v8b
%v16 = sext <8 x i8> %vand8 to <8 x i16>
v8tov4(i16, %v16, %v16a, %v16b)
%vand16 = and <4 x i16> %v16a, %v16b
%v32 = sext <4 x i16> %vand16 to <4 x i32>
v4tov2(i32, %v32, %v32a, %v32b)
%vand32 = and <2 x i32> %v32a, %v32b
%v0 = extractelement <2 x i32> %vand32, i32 0
%v1 = extractelement <2 x i32> %vand32, i32 1
%v = and i32 %v0, %v1
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__none(<WIDTH x MASK>) nounwind readnone alwaysinline {
%any = call i1 @__any(<WIDTH x MASK> %0)
%none = icmp eq i1 %any, 0
ret i1 %none
}
;; $1: scalar type
;; $2: vector/vector reduce function (2 x <WIDTH x vec> -> <WIDTH x vec>)
;; $3: pairwise vector reduce function (2 x <2 x vec> -> <2 x vec>)
;; $4: scalar reduce function
define(`neon_reduce', `
v16tov8($1, %0, %va, %vb)
%va_16 = shufflevector <8 x $1> %va, <8 x $1> undef,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%vb_16 = shufflevector <8 x $1> %vb, <8 x $1> undef,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%v8 = call <16 x $1> $2(<16 x $1> %va_16, <16 x $1> %vb_16)
%v8a = shufflevector <16 x $1> %v8, <16 x $1> undef,
<16 x i32> <i32 0, i32 1, i32 2, i32 3,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%v8b = shufflevector <16 x $1> %v8, <16 x $1> undef,
<16 x i32> <i32 4, i32 5, i32 6, i32 7,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%v4 = call <16 x $1> $2(<16 x $1> %v8a, <16 x $1> %v8b)
%vfirst_4 = shufflevector <16 x $1> %v4, <16 x $1> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
v4tov2($1, %vfirst_4, %v0, %v1)
%vh = call <2 x $1> $3(<2 x $1> %v0, <2 x $1> %v1)
%vh0 = extractelement <2 x $1> %vh, i32 0
%vh1 = extractelement <2 x $1> %vh, i32 1
%r = call $1 $4($1 %vh0, $1 %vh1)
ret $1 %r
')
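;; The shuffles pad the 8- and 4-element halves back out to 16 lanes with
;; undef so that $2, which only accepts full <16 x ...> operands, can be
;; reused at every level of the halving tree; only the low lanes carry
;; meaningful data.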
declare <2 x float> @llvm.arm.neon.vpadd.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @add_f32(float, float) nounwind readnone alwaysinline {
%r = fadd float %0, %1
ret float %r
}
define internal <WIDTH x float> @__add_varying_float(<WIDTH x float>, <WIDTH x float>) nounwind readnone alwaysinline {
%r = fadd <WIDTH x float> %0, %1
ret <WIDTH x float> %r
}
define float @__reduce_add_float(<WIDTH x float>) nounwind readnone alwaysinline {
neon_reduce(float, @__add_varying_float, @llvm.arm.neon.vpadd.v2f32, @add_f32)
}
declare <2 x float> @llvm.arm.neon.vpmins.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @min_f32(float, float) nounwind readnone alwaysinline {
%cmp = fcmp olt float %0, %1
%r = select i1 %cmp, float %0, float %1
ret float %r
}
define float @__reduce_min_float(<WIDTH x float>) nounwind readnone alwaysinline {
neon_reduce(float, @__min_varying_float, @llvm.arm.neon.vpmins.v2f32, @min_f32)
}
declare <2 x float> @llvm.arm.neon.vpmaxs.v2f32(<2 x float>, <2 x float>) nounwind readnone
define internal float @max_f32(float, float) nounwind readnone alwaysinline {
%cmp = fcmp ugt float %0, %1
%r = select i1 %cmp, float %0, float %1
ret float %r
}
define float @__reduce_max_float(<WIDTH x float>) nounwind readnone alwaysinline {
neon_reduce(float, @__max_varying_float, @llvm.arm.neon.vpmaxs.v2f32, @max_f32)
}
declare <8 x i16> @llvm.arm.neon.vpaddlu.v8i16.v16i8(<16 x i8>) nounwind readnone
declare <4 x i32> @llvm.arm.neon.vpaddlu.v4i32.v8i16(<8 x i16>) nounwind readnone
declare <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32>) nounwind readnone
define i64 @__reduce_add_int8(<WIDTH x i8>) nounwind readnone alwaysinline {
%a16 = call <8 x i16> @llvm.arm.neon.vpaddlu.v8i16.v16i8(<16 x i8> %0)
%a32 = call <4 x i32> @llvm.arm.neon.vpaddlu.v4i32.v8i16(<8 x i16> %a16)
%a64 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %a32)
%a0 = extractelement <2 x i64> %a64, i32 0
%a1 = extractelement <2 x i64> %a64, i32 1
%r = add i64 %a0, %a1
ret i64 %r
}
define i64 @__reduce_add_int16(<WIDTH x i16>) nounwind readnone alwaysinline {
v16tov8(i16, %0, %va, %vb)
%a32 = call <4 x i32> @llvm.arm.neon.vpaddlu.v4i32.v8i16(<8 x i16> %va)
%b32 = call <4 x i32> @llvm.arm.neon.vpaddlu.v4i32.v8i16(<8 x i16> %vb)
%a64 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %a32)
%b64 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %b32)
%sum = add <2 x i64> %a64, %b64
%a0 = extractelement <2 x i64> %sum, i32 0
%a1 = extractelement <2 x i64> %sum, i32 1
%r = add i64 %a0, %a1
ret i64 %r
}
define i64 @__reduce_add_int32(<WIDTH x i32>) nounwind readnone alwaysinline {
v16tov4(i32, %0, %va, %vb, %vc, %vd)
%a64 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %va)
%b64 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %vb)
%c64 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %vc)
%d64 = call <2 x i64> @llvm.arm.neon.vpaddlu.v2i64.v4i32(<4 x i32> %vd)
%ab = add <2 x i64> %a64, %b64
%cd = add <2 x i64> %c64, %d64
%sum = add <2 x i64> %ab, %cd
%a0 = extractelement <2 x i64> %sum, i32 0
%a1 = extractelement <2 x i64> %sum, i32 1
%r = add i64 %a0, %a1
ret i64 %r
}
declare <2 x i32> @llvm.arm.neon.vpmins.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @min_si32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp slt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_min_int32(<WIDTH x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @__min_varying_int32, @llvm.arm.neon.vpmins.v2i32, @min_si32)
}
declare <2 x i32> @llvm.arm.neon.vpmaxs.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @max_si32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp sgt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_max_int32(<WIDTH x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @__max_varying_int32, @llvm.arm.neon.vpmaxs.v2i32, @max_si32)
}
declare <2 x i32> @llvm.arm.neon.vpminu.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @min_ui32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp ult i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_min_uint32(<WIDTH x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @__min_varying_uint32, @llvm.arm.neon.vpmins.v2i32, @min_ui32)
}
declare <2 x i32> @llvm.arm.neon.vpmaxu.v2i32(<2 x i32>, <2 x i32>) nounwind readnone
define internal i32 @max_ui32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp ugt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__reduce_max_uint32(<WIDTH x i32>) nounwind readnone alwaysinline {
neon_reduce(i32, @__max_varying_uint32, @llvm.arm.neon.vpmaxs.v2i32, @max_ui32)
}
define internal double @__add_uniform_double(double, double) nounwind readnone alwaysinline {
%r = fadd double %0, %1
ret double %r
}
define internal <WIDTH x double> @__add_varying_double(<WIDTH x double>, <WIDTH x double>) nounwind readnone alwaysinline {
%r = fadd <WIDTH x double> %0, %1
ret <WIDTH x double> %r
}
define double @__reduce_add_double(<WIDTH x double>) nounwind readnone alwaysinline {
reduce16(double, @__add_varying_double, @__add_uniform_double)
}
define double @__reduce_min_double(<WIDTH x double>) nounwind readnone alwaysinline {
reduce16(double, @__min_varying_double, @__min_uniform_double)
}
define double @__reduce_max_double(<WIDTH x double>) nounwind readnone alwaysinline {
reduce16(double, @__max_varying_double, @__max_uniform_double)
}
define internal i64 @__add_uniform_int64(i64, i64) nounwind readnone alwaysinline {
%r = add i64 %0, %1
ret i64 %r
}
define internal <WIDTH x i64> @__add_varying_int64(<WIDTH x i64>, <WIDTH x i64>) nounwind readnone alwaysinline {
%r = add <WIDTH x i64> %0, %1
ret <WIDTH x i64> %r
}
define i64 @__reduce_add_int64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce16(i64, @__add_varying_int64, @__add_uniform_int64)
}
define i64 @__reduce_min_int64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce16(i64, @__min_varying_int64, @__min_uniform_int64)
}
define i64 @__reduce_max_int64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce16(i64, @__max_varying_int64, @__max_uniform_int64)
}
define i64 @__reduce_min_uint64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce16(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
define i64 @__reduce_max_uint64(<WIDTH x i64>) nounwind readnone alwaysinline {
reduce16(i64, @__max_varying_uint64, @__max_uniform_uint64)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16 builtins
declare <16 x i8> @llvm.arm.neon.vrhaddu.v16i8(<16 x i8>, <16 x i8>) nounwind readnone
define <16 x i8> @__avg_up_uint8(<16 x i8>, <16 x i8>) nounwind readnone alwaysinline {
%r = call <16 x i8> @llvm.arm.neon.vrhaddu.v16i8(<16 x i8> %0, <16 x i8> %1)
ret <16 x i8> %r
}
declare <16 x i8> @llvm.arm.neon.vrhadds.v16i8(<16 x i8>, <16 x i8>) nounwind readnone
define <16 x i8> @__avg_up_int8(<16 x i8>, <16 x i8>) nounwind readnone alwaysinline {
%r = call <16 x i8> @llvm.arm.neon.vrhadds.v16i8(<16 x i8> %0, <16 x i8> %1)
ret <16 x i8> %r
}
declare <16 x i8> @llvm.arm.neon.vhaddu.v16i8(<16 x i8>, <16 x i8>) nounwind readnone
define <16 x i8> @__avg_down_uint8(<16 x i8>, <16 x i8>) nounwind readnone alwaysinline {
%r = call <16 x i8> @llvm.arm.neon.vhaddu.v16i8(<16 x i8> %0, <16 x i8> %1)
ret <16 x i8> %r
}
declare <16 x i8> @llvm.arm.neon.vhadds.v16i8(<16 x i8>, <16 x i8>) nounwind readnone
define <16 x i8> @__avg_down_int8(<16 x i8>, <16 x i8>) nounwind readnone alwaysinline {
%r = call <16 x i8> @llvm.arm.neon.vhadds.v16i8(<16 x i8> %0, <16 x i8> %1)
ret <16 x i8> %r
}
declare <8 x i16> @llvm.arm.neon.vrhaddu.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
define <16 x i16> @__avg_up_uint16(<16 x i16>, <16 x i16>) nounwind readnone alwaysinline {
v16tov8(i16, %0, %a0, %b0)
v16tov8(i16, %1, %a1, %b1)
%r0 = call <8 x i16> @llvm.arm.neon.vrhaddu.v8i16(<8 x i16> %a0, <8 x i16> %a1)
%r1 = call <8 x i16> @llvm.arm.neon.vrhaddu.v8i16(<8 x i16> %b0, <8 x i16> %b1)
v8tov16(i16, %r0, %r1, %r)
ret <16 x i16> %r
}
declare <8 x i16> @llvm.arm.neon.vrhadds.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
define <16 x i16> @__avg_up_int16(<16 x i16>, <16 x i16>) nounwind readnone alwaysinline {
v16tov8(i16, %0, %a0, %b0)
v16tov8(i16, %1, %a1, %b1)
%r0 = call <8 x i16> @llvm.arm.neon.vrhadds.v8i16(<8 x i16> %a0, <8 x i16> %a1)
%r1 = call <8 x i16> @llvm.arm.neon.vrhadds.v8i16(<8 x i16> %b0, <8 x i16> %b1)
v8tov16(i16, %r0, %r1, %r)
ret <16 x i16> %r
}
declare <8 x i16> @llvm.arm.neon.vhaddu.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
define <16 x i16> @__avg_down_uint16(<16 x i16>, <16 x i16>) nounwind readnone alwaysinline {
v16tov8(i16, %0, %a0, %b0)
v16tov8(i16, %1, %a1, %b1)
%r0 = call <8 x i16> @llvm.arm.neon.vhaddu.v8i16(<8 x i16> %a0, <8 x i16> %a1)
%r1 = call <8 x i16> @llvm.arm.neon.vhaddu.v8i16(<8 x i16> %b0, <8 x i16> %b1)
v8tov16(i16, %r0, %r1, %r)
ret <16 x i16> %r
}
declare <8 x i16> @llvm.arm.neon.vhadds.v8i16(<8 x i16>, <8 x i16>) nounwind readnone
define <16 x i16> @__avg_down_int16(<16 x i16>, <16 x i16>) nounwind readnone alwaysinline {
v16tov8(i16, %0, %a0, %b0)
v16tov8(i16, %1, %a1, %b1)
%r0 = call <8 x i16> @llvm.arm.neon.vhadds.v8i16(<8 x i16> %a0, <8 x i16> %a1)
%r1 = call <8 x i16> @llvm.arm.neon.vhadds.v8i16(<8 x i16> %b0, <8 x i16> %b1)
v8tov16(i16, %r0, %r1, %r)
ret <16 x i16> %r
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()
saturation_arithmetic()

builtins/target-neon-common.ll (new file, 354 lines)
;;
;; target-neon-common.ll
;;
;; Copyright(c) 2013-2015 Google, Inc.
;;
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Matt Pharr nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
target datalayout = "e-p:32:32:32-S32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f16:16:16-f32:32:32-f64:32:64-f128:128:128-v64:32:64-v128:32:128-a0:0:64-n32"
stdlib_core()
scans()
reduce_equal(WIDTH)
rdrand_decls()
define_shuffles()
aossoa()
ctlztz()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
declare <4 x i16> @llvm.arm.neon.vcvtfp2hf(<4 x float>) nounwind readnone
declare <4 x float> @llvm.arm.neon.vcvthf2fp(<4 x i16>) nounwind readnone
define float @__half_to_float_uniform(i16 %v) nounwind readnone alwaysinline {
%v1 = bitcast i16 %v to <1 x i16>
%vec = shufflevector <1 x i16> %v1, <1 x i16> undef,
<4 x i32> <i32 0, i32 0, i32 0, i32 0>
%h = call <4 x float> @llvm.arm.neon.vcvthf2fp(<4 x i16> %vec)
%r = extractelement <4 x float> %h, i32 0
ret float %r
}
define i16 @__float_to_half_uniform(float %v) nounwind readnone alwaysinline {
%v1 = bitcast float %v to <1 x float>
%vec = shufflevector <1 x float> %v1, <1 x float> undef,
<4 x i32> <i32 0, i32 0, i32 0, i32 0>
%h = call <4 x i16> @llvm.arm.neon.vcvtfp2hf(<4 x float> %vec)
%r = extractelement <4 x i16> %h, i32 0
ret i16 %r
}
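;; Both uniform conversions broadcast the scalar into a 4-wide vector so
;; the 4-wide NEON conversion intrinsic can be used, then extract lane 0
;; of the result.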
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; math
declare i32 @llvm.arm.get.fpscr() nounwind
declare void @llvm.arm.set.fpscr(i32) nounwind
define void @__fastmath() nounwind alwaysinline {
%x = call i32 @llvm.arm.get.fpscr()
; Turn on FTZ (bit 24) and default NaN (bit 25)
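; (50331648 = 0x03000000 = (1 << 24) | (1 << 25))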
%y = or i32 %x, 50331648
call void @llvm.arm.set.fpscr(i32 %y)
ret void
}
;; round/floor/ceil
;; FIXME: grabbed these from the sse2 target, which does not have native
;; instructions for these. Is there a better approach for NEON?
define float @__round_uniform_float(float) nounwind readonly alwaysinline {
%float_to_int_bitcast.i.i.i.i = bitcast float %0 to i32
%bitop.i.i = and i32 %float_to_int_bitcast.i.i.i.i, -2147483648
%bitop.i = xor i32 %bitop.i.i, %float_to_int_bitcast.i.i.i.i
%int_to_float_bitcast.i.i40.i = bitcast i32 %bitop.i to float
%binop.i = fadd float %int_to_float_bitcast.i.i40.i, 8.388608e+06
%binop21.i = fadd float %binop.i, -8.388608e+06
%float_to_int_bitcast.i.i.i = bitcast float %binop21.i to i32
%bitop31.i = xor i32 %float_to_int_bitcast.i.i.i, %bitop.i.i
%int_to_float_bitcast.i.i.i = bitcast i32 %bitop31.i to float
ret float %int_to_float_bitcast.i.i.i
}
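;; floor/ceil below are derived from round: -1082130432 and 1065353216 are
;; the bit patterns of -1.0f and +1.0f, so the sext'ed compare result
;; selects a correction of -1.0 (where round overshot) or +1.0 (where it
;; undershot) and adds it only where needed.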
define float @__floor_uniform_float(float) nounwind readonly alwaysinline {
%calltmp.i = tail call float @__round_uniform_float(float %0) nounwind
%bincmp.i = fcmp ogt float %calltmp.i, %0
%selectexpr.i = sext i1 %bincmp.i to i32
%bitop.i = and i32 %selectexpr.i, -1082130432
%int_to_float_bitcast.i.i.i = bitcast i32 %bitop.i to float
%binop.i = fadd float %calltmp.i, %int_to_float_bitcast.i.i.i
ret float %binop.i
}
define float @__ceil_uniform_float(float) nounwind readonly alwaysinline {
%calltmp.i = tail call float @__round_uniform_float(float %0) nounwind
%bincmp.i = fcmp olt float %calltmp.i, %0
%selectexpr.i = sext i1 %bincmp.i to i32
%bitop.i = and i32 %selectexpr.i, 1065353216
%int_to_float_bitcast.i.i.i = bitcast i32 %bitop.i to float
%binop.i = fadd float %calltmp.i, %int_to_float_bitcast.i.i.i
ret float %binop.i
}
;; FIXME: rounding doubles and double vectors needs to be implemented
declare double @__round_uniform_double(double) nounwind readnone
declare double @__floor_uniform_double(double) nounwind readnone
declare double @__ceil_uniform_double(double) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; min/max
define float @__max_uniform_float(float, float) nounwind readnone alwaysinline {
%cmp = fcmp ugt float %0, %1
%r = select i1 %cmp, float %0, float %1
ret float %r
}
define float @__min_uniform_float(float, float) nounwind readnone alwaysinline {
%cmp = fcmp ult float %0, %1
%r = select i1 %cmp, float %0, float %1
ret float %r
}
define i32 @__min_uniform_int32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp slt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__max_uniform_int32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp sgt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__min_uniform_uint32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp ult i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i32 @__max_uniform_uint32(i32, i32) nounwind readnone alwaysinline {
%cmp = icmp ugt i32 %0, %1
%r = select i1 %cmp, i32 %0, i32 %1
ret i32 %r
}
define i64 @__min_uniform_int64(i64, i64) nounwind readnone alwaysinline {
%cmp = icmp slt i64 %0, %1
%r = select i1 %cmp, i64 %0, i64 %1
ret i64 %r
}
define i64 @__max_uniform_int64(i64, i64) nounwind readnone alwaysinline {
%cmp = icmp sgt i64 %0, %1
%r = select i1 %cmp, i64 %0, i64 %1
ret i64 %r
}
define i64 @__min_uniform_uint64(i64, i64) nounwind readnone alwaysinline {
%cmp = icmp ult i64 %0, %1
%r = select i1 %cmp, i64 %0, i64 %1
ret i64 %r
}
define i64 @__max_uniform_uint64(i64, i64) nounwind readnone alwaysinline {
%cmp = icmp ugt i64 %0, %1
%r = select i1 %cmp, i64 %0, i64 %1
ret i64 %r
}
define double @__min_uniform_double(double, double) nounwind readnone alwaysinline {
%cmp = fcmp olt double %0, %1
%r = select i1 %cmp, double %0, double %1
ret double %r
}
define double @__max_uniform_double(double, double) nounwind readnone alwaysinline {
%cmp = fcmp ogt double %0, %1
%r = select i1 %cmp, double %0, double %1
ret double %r
}
define <WIDTH x i64> @__min_varying_int64(<WIDTH x i64>, <WIDTH x i64>) nounwind readnone alwaysinline {
%m = icmp slt <WIDTH x i64> %0, %1
%r = select <WIDTH x i1> %m, <WIDTH x i64> %0, <WIDTH x i64> %1
ret <WIDTH x i64> %r
}
define <WIDTH x i64> @__max_varying_int64(<WIDTH x i64>, <WIDTH x i64>) nounwind readnone alwaysinline {
%m = icmp sgt <WIDTH x i64> %0, %1
%r = select <WIDTH x i1> %m, <WIDTH x i64> %0, <WIDTH x i64> %1
ret <WIDTH x i64> %r
}
define <WIDTH x i64> @__min_varying_uint64(<WIDTH x i64>, <WIDTH x i64>) nounwind readnone alwaysinline {
%m = icmp ult <WIDTH x i64> %0, %1
%r = select <WIDTH x i1> %m, <WIDTH x i64> %0, <WIDTH x i64> %1
ret <WIDTH x i64> %r
}
define <WIDTH x i64> @__max_varying_uint64(<WIDTH x i64>, <WIDTH x i64>) nounwind readnone alwaysinline {
%m = icmp ugt <WIDTH x i64> %0, %1
%r = select <WIDTH x i1> %m, <WIDTH x i64> %0, <WIDTH x i64> %1
ret <WIDTH x i64> %r
}
define <WIDTH x double> @__min_varying_double(<WIDTH x double>,
<WIDTH x double>) nounwind readnone alwaysinline {
%m = fcmp olt <WIDTH x double> %0, %1
%r = select <WIDTH x i1> %m, <WIDTH x double> %0, <WIDTH x double> %1
ret <WIDTH x double> %r
}
define <WIDTH x double> @__max_varying_double(<WIDTH x double>,
<WIDTH x double>) nounwind readnone alwaysinline {
%m = fcmp ogt <WIDTH x double> %0, %1
%r = select <WIDTH x i1> %m, <WIDTH x double> %0, <WIDTH x double> %1
ret <WIDTH x double> %r
}
;; sqrt/rsqrt/rcp
declare float @llvm.sqrt.f32(float)
define float @__sqrt_uniform_float(float) nounwind readnone alwaysinline {
%r = call float @llvm.sqrt.f32(float %0)
ret float %r
}
declare double @llvm.sqrt.f64(double)
define double @__sqrt_uniform_double(double) nounwind readnone alwaysinline {
%r = call double @llvm.sqrt.f64(double %0)
ret double %r
}
;; bit ops
declare i32 @llvm.ctpop.i32(i32) nounwind readnone
declare i64 @llvm.ctpop.i64(i64) nounwind readnone
define i32 @__popcnt_int32(i32) nounwind readnone alwaysinline {
%v = call i32 @llvm.ctpop.i32(i32 %0)
ret i32 %v
}
define i64 @__popcnt_int64(i64) nounwind readnone alwaysinline {
%v = call i64 @llvm.ctpop.i64(i64 %0)
ret i64 %v
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
masked_load(i8, 1)
masked_load(i16, 2)
masked_load(i32, 4)
masked_load(float, 4)
masked_load(i64, 8)
masked_load(double, 8)
gen_masked_store(i8)
gen_masked_store(i16)
gen_masked_store(i32)
gen_masked_store(i64)
masked_store_float_double()
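;; The blend stores below are read-modify-write: load the old vector,
;; select the new lanes where the mask is set, and store the result.
;; Unlike a true masked store, this touches all WIDTH lanes of memory.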
define void @__masked_store_blend_i8(<WIDTH x i8>* nocapture %ptr, <WIDTH x i8> %new,
<WIDTH x MASK> %mask) nounwind alwaysinline {
%old = load PTR_OP_ARGS(`<WIDTH x i8> ') %ptr
%mask1 = trunc <WIDTH x MASK> %mask to <WIDTH x i1>
%result = select <WIDTH x i1> %mask1, <WIDTH x i8> %new, <WIDTH x i8> %old
store <WIDTH x i8> %result, <WIDTH x i8> * %ptr
ret void
}
define void @__masked_store_blend_i16(<WIDTH x i16>* nocapture %ptr, <WIDTH x i16> %new,
<WIDTH x MASK> %mask) nounwind alwaysinline {
%old = load PTR_OP_ARGS(`<WIDTH x i16> ') %ptr
%mask1 = trunc <WIDTH x MASK> %mask to <WIDTH x i1>
%result = select <WIDTH x i1> %mask1, <WIDTH x i16> %new, <WIDTH x i16> %old
store <WIDTH x i16> %result, <WIDTH x i16> * %ptr
ret void
}
define void @__masked_store_blend_i32(<WIDTH x i32>* nocapture %ptr, <WIDTH x i32> %new,
<WIDTH x MASK> %mask) nounwind alwaysinline {
%old = load PTR_OP_ARGS(`<WIDTH x i32> ') %ptr
%mask1 = trunc <WIDTH x MASK> %mask to <WIDTH x i1>
%result = select <WIDTH x i1> %mask1, <WIDTH x i32> %new, <WIDTH x i32> %old
store <WIDTH x i32> %result, <WIDTH x i32> * %ptr
ret void
}
define void @__masked_store_blend_i64(<WIDTH x i64>* nocapture %ptr,
<WIDTH x i64> %new, <WIDTH x MASK> %mask) nounwind alwaysinline {
%old = load PTR_OP_ARGS(`<WIDTH x i64> ') %ptr
%mask1 = trunc <WIDTH x MASK> %mask to <WIDTH x i1>
%result = select <WIDTH x i1> %mask1, <WIDTH x i64> %new, <WIDTH x i64> %old
store <WIDTH x i64> %result, <WIDTH x i64> * %ptr
ret void
}
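All four __masked_store_blend_* bodies above follow the same read-modify-write pattern: load the old vector, select per lane between new and old using the truncated mask, and store the whole vector back. A scalar C sketch of the resulting memory state (illustrative; WIDTH here is an assumed value):

#include <stdint.h>
#define WIDTH 8 /* assumption for the sketch */
/* Computes the same final memory contents as the vector load/select/store,
 * though the IR always stores all WIDTH lanes back, active or not. */
static void masked_store_blend_i64(int64_t *ptr, const int64_t *new_vals,
                                   const int64_t *mask) {
    for (int i = 0; i < WIDTH; ++i)
        if (mask[i] & 1)          /* trunc <WIDTH x MASK> to <WIDTH x i1> keeps the low bit */
            ptr[i] = new_vals[i]; /* inactive lanes keep their old values */
}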
;; yuck. We need declarations of these, even though we shouldn't ever
;; actually generate calls to them for the NEON target...
include(`svml.m4')
svml_stubs(float,f,WIDTH)
svml_stubs(double,d,WIDTH)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather
gen_gather_factored(i8)
gen_gather_factored(i16)
gen_gather_factored(i32)
gen_gather_factored(float)
gen_gather_factored(i64)
gen_gather_factored(double)
gen_scatter(i8)
gen_scatter(i16)
gen_scatter(i32)
gen_scatter(float)
gen_scatter(i64)
gen_scatter(double)
packed_load_and_store(4)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; prefetch
define_prefetches()
declare_nvptx()

builtins/target-nvptx.ll Normal file (2371 lines)
File diff suppressed because it is too large

builtins/target-skx.ll Normal file (94 lines)

@@ -0,0 +1,94 @@
;; Copyright (c) 2016, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Intel Corporation nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
define(`WIDTH',`16')
ifelse(LLVM_VERSION, LLVM_3_8,
`include(`target-avx512-common.ll')',
LLVM_VERSION, LLVM_3_9,
`include(`target-avx512-common.ll')',
LLVM_VERSION, LLVM_4_0,
`include(`target-avx512-common.ll')',
LLVM_VERSION, LLVM_5_0,
`include(`target-avx512-common.ll')'
)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp, rsqrt
define(`rcp_rsqrt_varying_float_skx',`
declare <16 x float> @llvm.x86.avx512.rcp14.ps.512(<16 x float>, <16 x float>, i16) nounwind readnone
define <16 x float> @__rcp_varying_float(<16 x float>) nounwind readonly alwaysinline {
%call = call <16 x float> @llvm.x86.avx512.rcp14.ps.512(<16 x float> %0, <16 x float> undef, i16 -1)
;; do one Newton-Raphson iteration to improve precision
;; float iv = __rcp_v(v);
;; return iv * (2. - v * iv);
%v_iv = fmul <16 x float> %0`,' %call
%two_minus = fsub <16 x float> <float 2.`,' float 2.`,' float 2.`,' float 2.`,'
float 2.`,' float 2.`,' float 2.`,' float 2.`,'
float 2.`,' float 2.`,' float 2.`,' float 2.`,'
float 2.`,' float 2.`,' float 2.`,' float 2.>`,' %v_iv
%iv_mul = fmul <16 x float> %call`,' %two_minus
ret <16 x float> %iv_mul
}
declare <16 x float> @llvm.x86.avx512.rsqrt14.ps.512(<16 x float>`,' <16 x float>`,' i16) nounwind readnone
define <16 x float> @__rsqrt_varying_float(<16 x float> %v) nounwind readonly alwaysinline {
%is = call <16 x float> @llvm.x86.avx512.rsqrt14.ps.512(<16 x float> %v`,' <16 x float> undef`,' i16 -1)
; Newton-Raphson iteration to improve precision
; float is = __rsqrt_v(v);
; return 0.5 * is * (3. - (v * is) * is);
%v_is = fmul <16 x float> %v`,' %is
%v_is_is = fmul <16 x float> %v_is`,' %is
%three_sub = fsub <16 x float> <float 3.`,' float 3.`,' float 3.`,' float 3.`,'
float 3.`,' float 3.`,' float 3.`,' float 3.`,'
float 3.`,' float 3.`,' float 3.`,' float 3.`,'
float 3.`,' float 3.`,' float 3.`,' float 3.>`,' %v_is_is
%is_mul = fmul <16 x float> %is`,' %three_sub
%half_scale = fmul <16 x float> <float 0.5`,' float 0.5`,' float 0.5`,' float 0.5`,'
float 0.5`,' float 0.5`,' float 0.5`,' float 0.5`,'
float 0.5`,' float 0.5`,' float 0.5`,' float 0.5`,'
float 0.5`,' float 0.5`,' float 0.5`,' float 0.5>`,' %is_mul
ret <16 x float> %half_scale
}
')
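The rcp14/rsqrt14 instructions return roughly 14 correct bits, and the Newton-Raphson step in each function above roughly doubles that. A scalar C model of the two refinement formulas (for reference only; the approx_* parameters stand in for the hardware estimate results):

/* One Newton-Raphson step for 1/v, starting from estimate r0. */
static float refine_rcp(float v, float r0) {
    return r0 * (2.0f - v * r0);
}
/* One Newton-Raphson step for 1/sqrt(v), starting from estimate s0. */
static float refine_rsqrt(float v, float s0) {
    return 0.5f * s0 * (3.0f - (v * s0) * s0);
}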
ifelse(LLVM_VERSION, LLVM_3_8,
rcp_rsqrt_varying_float_skx(),
LLVM_VERSION, LLVM_3_9,
rcp_rsqrt_varying_float_skx(),
LLVM_VERSION, LLVM_4_0,
rcp_rsqrt_varying_float_skx(),
LLVM_VERSION, LLVM_5_0,
rcp_rsqrt_varying_float_skx()
)
;;saturation_arithmetic_novec()


@@ -1,4 +1,4 @@
-;; Copyright (c) 2010-2011, Intel Corporation
+;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -33,6 +33,7 @@ ctlztz()
define_prefetches()
define_shuffles()
aossoa()
rdrand_decls()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
@@ -96,7 +97,7 @@ define void @__fastmath() nounwind alwaysinline {
%ptr = alloca i32
%ptr8 = bitcast i32 * %ptr to i8 *
call void @llvm.x86.sse.stmxcsr(i8 * %ptr8)
-%oldval = load i32 *%ptr
+%oldval = load PTR_OP_ARGS(`i32 ') %ptr
; turn on DAZ (64)/FTZ (32768) -> 32832
%update = or i32 %oldval, 32832
@@ -268,4 +269,9 @@ define i64 @__popcnt_int64(i64) nounwind readnone alwaysinline {
ret i64 %val
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16 builtins
define_avgs()
declare_nvptx()


@@ -1,4 +1,4 @@
-;; Copyright (c) 2010-2011, Intel Corporation
+;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -44,9 +44,18 @@ stdlib_core()
packed_load_and_store()
scans()
int64minmax()
saturation_arithmetic()
include(`target-sse2-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
@@ -97,87 +106,14 @@ define <8 x float> @__sqrt_varying_float(<8 x float>) nounwind readonly alwaysin
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; svml stuff
-declare <4 x float> @__svml_sinf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_cosf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_sincosf4(<4 x float> *, <4 x float>) nounwind readnone
-declare <4 x float> @__svml_tanf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_atanf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_atan2f4(<4 x float>, <4 x float>) nounwind readnone
-declare <4 x float> @__svml_expf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_logf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_powf4(<4 x float>, <4 x float>) nounwind readnone
+include(`svml.m4')
+;; single precision
+svml_declare(float,f4,4)
+svml_define_x(float,f4,4,f,8)
-define <8 x float> @__svml_sin(<8 x float>) nounwind readnone alwaysinline {
-unary4to8(ret, float, @__svml_sinf4, %0)
-ret <8 x float> %ret
-}
-define <8 x float> @__svml_cos(<8 x float>) nounwind readnone alwaysinline {
-unary4to8(ret, float, @__svml_cosf4, %0)
-ret <8 x float> %ret
-}
-define void @__svml_sincos(<8 x float>, <8 x float> *,
-<8 x float> *) nounwind readnone alwaysinline {
-; call svml_sincosf4 two times with the two 4-wide sub-vectors
-%a = shufflevector <8 x float> %0, <8 x float> undef,
-<4 x i32> <i32 0, i32 1, i32 2, i32 3>
-%b = shufflevector <8 x float> %0, <8 x float> undef,
-<4 x i32> <i32 4, i32 5, i32 6, i32 7>
-%cospa = alloca <4 x float>
-%sa = call <4 x float> @__svml_sincosf4(<4 x float> * %cospa, <4 x float> %a)
-%cospb = alloca <4 x float>
-%sb = call <4 x float> @__svml_sincosf4(<4 x float> * %cospb, <4 x float> %b)
-%sin = shufflevector <4 x float> %sa, <4 x float> %sb,
-<8 x i32> <i32 0, i32 1, i32 2, i32 3,
-i32 4, i32 5, i32 6, i32 7>
-store <8 x float> %sin, <8 x float> * %1
-%cosa = load <4 x float> * %cospa
-%cosb = load <4 x float> * %cospb
-%cos = shufflevector <4 x float> %cosa, <4 x float> %cosb,
-<8 x i32> <i32 0, i32 1, i32 2, i32 3,
-i32 4, i32 5, i32 6, i32 7>
-store <8 x float> %cos, <8 x float> * %2
-ret void
-}
-define <8 x float> @__svml_tan(<8 x float>) nounwind readnone alwaysinline {
-unary4to8(ret, float, @__svml_tanf4, %0)
-ret <8 x float> %ret
-}
-define <8 x float> @__svml_atan(<8 x float>) nounwind readnone alwaysinline {
-unary4to8(ret, float, @__svml_atanf4, %0)
-ret <8 x float> %ret
-}
-define <8 x float> @__svml_atan2(<8 x float>,
-<8 x float>) nounwind readnone alwaysinline {
-binary4to8(ret, float, @__svml_atan2f4, %0, %1)
-ret <8 x float> %ret
-}
-define <8 x float> @__svml_exp(<8 x float>) nounwind readnone alwaysinline {
-unary4to8(ret, float, @__svml_expf4, %0)
-ret <8 x float> %ret
-}
-define <8 x float> @__svml_log(<8 x float>) nounwind readnone alwaysinline {
-unary4to8(ret, float, @__svml_logf4, %0)
-ret <8 x float> %ret
-}
-define <8 x float> @__svml_pow(<8 x float>,
-<8 x float>) nounwind readnone alwaysinline {
-binary4to8(ret, float, @__svml_powf4, %0, %1)
-ret <8 x float> %ret
-}
+;; double precision
+svml_declare(double,2,2)
+svml_define_x(double,2,2,d,8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -287,7 +223,7 @@ define i32 @__max_uniform_uint32(i32, i32) nounwind readonly alwaysinline {
declare i32 @llvm.x86.sse.movmsk.ps(<4 x float>) nounwind readnone
-define i32 @__movmsk(<8 x i32>) nounwind readnone alwaysinline {
+define i64 @__movmsk(<8 x i32>) nounwind readnone alwaysinline {
; first do two 4-wide movmsk calls
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%m0 = shufflevector <8 x float> %floatmask, <8 x float> undef,
@@ -301,7 +237,92 @@ define i32 @__movmsk(<8 x i32>) nounwind readnone alwaysinline {
; of the second one
%v1s = shl i32 %v1, 4
%v = or i32 %v0, %v1s
-ret i32 %v
+%v64 = zext i32 %v to i64
+ret i64 %v64
}
define i1 @__any(<8 x i32>) nounwind readnone alwaysinline {
; first do two 4-wide movmsk calls
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%m0 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%v0 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m0) nounwind readnone
%m1 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
%v1 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m1) nounwind readnone
; and shift the first one over by 4 before ORing it with the value
; of the second one
%v1s = shl i32 %v1, 4
%v = or i32 %v0, %v1s
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<8 x i32>) nounwind readnone alwaysinline {
; first do two 4-wide movmsk calls
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%m0 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%v0 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m0) nounwind readnone
%m1 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
%v1 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m1) nounwind readnone
; and shift the first one over by 4 before ORing it with the value
; of the second one
%v1s = shl i32 %v1, 4
%v = or i32 %v0, %v1s
%cmp = icmp eq i32 %v, 255
ret i1 %cmp
}
define i1 @__none(<8 x i32>) nounwind readnone alwaysinline {
; first do two 4-wide movmsk calls
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%m0 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%v0 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m0) nounwind readnone
%m1 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
%v1 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m1) nounwind readnone
; and shift the first one over by 4 before ORing it with the value
; of the second one
%v1s = shl i32 %v1, 4
%v = or i32 %v0, %v1s
%cmp = icmp eq i32 %v, 0
ret i1 %cmp
}
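__movmsk, __any, __all and __none above all derive an 8-bit lane mask the same way: two 4-wide movmsk calls, with the result for the high half shifted left by 4 before the OR. A C sketch with SSE intrinsics (illustrative; lo/hi are the two <4 x float> halves of the bitcast mask):

#include <xmmintrin.h>
/* _mm_movemask_ps packs the 4 sign bits of each half into an int. */
static int movmsk8(__m128 lo, __m128 hi) {
    return _mm_movemask_ps(lo) | (_mm_movemask_ps(hi) << 4);
}
/* any: movmsk8(...) != 0; all: movmsk8(...) == 255; none: movmsk8(...) == 0 */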
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<8 x i8>) nounwind readnone alwaysinline {
%wide8 = shufflevector <8 x i8> %0, <8 x i8> zeroinitializer,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %wide8,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
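__reduce_add_int8 uses the psadbw trick: a sum of absolute differences against zero adds up the unsigned bytes of each 8-byte half into one 64-bit lane. A C sketch of the same reduction (illustrative only):

#include <emmintrin.h>
#include <stdint.h>
static uint16_t reduce_add_u8(__m128i bytes) {
    /* SAD against zero sums each 8-byte half into a 64-bit lane */
    __m128i sums = _mm_sad_epu8(bytes, _mm_setzero_si128());
    uint64_t lane[2];
    _mm_storeu_si128((__m128i *)lane, sums);
    return (uint16_t)(lane[0] + lane[1]);
}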
define internal <8 x i16> @__add_varying_i16(<8 x i16>,
<8 x i16>) nounwind readnone alwaysinline {
%r = add <8 x i16> %0, %1
ret <8 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<8 x i16>) nounwind readnone alwaysinline {
reduce8(i16, @__add_varying_i16, @__add_uniform_i16)
}
define <4 x float> @__vec4_add_float(<4 x float> %v0,
@@ -352,11 +373,6 @@ define i32 @__reduce_max_int32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__max_varying_int32, @__max_uniform_int32)
}
-define i32 @__reduce_add_uint32(<8 x i32> %v) nounwind readnone alwaysinline {
-%r = call i32 @__reduce_add_int32(<8 x i32> %v)
-ret i32 %r
-}
define i32 @__reduce_min_uint32(<8 x i32>) nounwind readnone alwaysinline {
reduce8(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
@@ -389,7 +405,7 @@ define double @__reduce_max_double(<8 x double>) nounwind readnone {
}
define <4 x i64> @__add_varying_int64(<4 x i64>,
<4 x i64>) nounwind readnone alwaysinline {
%r = add <4 x i64> %0, %1
ret <4 x i64> %r
}
@@ -424,28 +440,30 @@ reduce_equal(8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
-load_and_broadcast(8, i8, 8)
-load_and_broadcast(8, i16, 16)
-load_and_broadcast(8, i32, 32)
-load_and_broadcast(8, i64, 64)
-masked_load(8, i8, 8, 1)
-masked_load(8, i16, 16, 2)
-masked_load(8, i32, 32, 4)
-masked_load(8, i64, 64, 8)
+masked_load(i8, 1)
+masked_load(i16, 2)
+masked_load(i32, 4)
+masked_load(float, 4)
+masked_load(i64, 8)
+masked_load(double, 8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
-gen_gather(8, i8)
-gen_gather(8, i16)
-gen_gather(8, i32)
-gen_gather(8, i64)
+gen_gather_factored(i8)
+gen_gather_factored(i16)
+gen_gather_factored(i32)
+gen_gather_factored(float)
+gen_gather_factored(i64)
+gen_gather_factored(double)
-gen_scatter(8, i8)
-gen_scatter(8, i16)
-gen_scatter(8, i32)
-gen_scatter(8, i64)
+gen_scatter(i8)
+gen_scatter(i16)
+gen_scatter(i32)
+gen_scatter(float)
+gen_scatter(i64)
+gen_scatter(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float rounding
@@ -549,24 +567,24 @@ define <8 x double> @__ceil_varying_double(<8 x double>) nounwind readonly alway
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
-gen_masked_store(8, i8, 8)
-gen_masked_store(8, i16, 16)
-gen_masked_store(8, i32, 32)
-gen_masked_store(8, i64, 64)
+gen_masked_store(i8)
+gen_masked_store(i16)
+gen_masked_store(i32)
+gen_masked_store(i64)
masked_store_blend_8_16_by_8()
-define void @__masked_store_blend_32(<8 x i32>* nocapture, <8 x i32>,
-<8 x i32> %mask) nounwind alwaysinline {
-%val = load <8 x i32> * %0, align 4
+define void @__masked_store_blend_i32(<8 x i32>* nocapture, <8 x i32>,
+<8 x i32> %mask) nounwind alwaysinline {
+%val = load PTR_OP_ARGS(`<8 x i32> ') %0, align 4
%newval = call <8 x i32> @__vselect_i32(<8 x i32> %val, <8 x i32> %1, <8 x i32> %mask)
store <8 x i32> %newval, <8 x i32> * %0, align 4
ret void
}
-define void @__masked_store_blend_64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
-<8 x i32> %mask) nounwind alwaysinline {
-%oldValue = load <8 x i64>* %ptr, align 8
+define void @__masked_store_blend_i64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
+<8 x i32> %mask) nounwind alwaysinline {
+%oldValue = load PTR_OP_ARGS(`<8 x i64>') %ptr, align 8
; Do 8x64-bit blends by doing two <8 x i32> blends, where the <8 x i32> values
; are actually bitcast <2 x i64> values
@@ -608,6 +626,8 @@ define void @__masked_store_blend_64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
ret void
}
masked_store_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
@@ -633,3 +653,12 @@ define <8 x double> @__max_varying_double(<8 x double>, <8 x double>) nounwind r
binary2to8(ret, double, @llvm.x86.sse2.max.pd, %0, %1)
ret <8 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()


@@ -1,4 +1,4 @@
-;; Copyright (c) 2010-2011, Intel Corporation
+;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -41,9 +41,18 @@ stdlib_core()
packed_load_and_store()
scans()
int64minmax()
saturation_arithmetic()
include(`target-sse2-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding
;;
@@ -231,10 +240,62 @@ define i32 @__max_uniform_uint32(i32, i32) nounwind readonly alwaysinline {
declare i32 @llvm.x86.sse.movmsk.ps(<4 x float>) nounwind readnone
-define i32 @__movmsk(<4 x i32>) nounwind readnone alwaysinline {
+define i64 @__movmsk(<4 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i32> %0 to <4 x float>
%v = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %floatmask) nounwind readnone
-ret i32 %v
+%v64 = zext i32 %v to i64
+ret i64 %v64
}
define i1 @__any(<4 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i32> %0 to <4 x float>
%v = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %floatmask) nounwind readnone
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<4 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i32> %0 to <4 x float>
%v = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %floatmask) nounwind readnone
%cmp = icmp eq i32 %v, 15
ret i1 %cmp
}
define i1 @__none(<4 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i32> %0 to <4 x float>
%v = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %floatmask) nounwind readnone
%cmp = icmp eq i32 %v, 0
ret i1 %cmp
}
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<4 x i8>) nounwind readnone alwaysinline {
%wide8 = shufflevector <4 x i8> %0, <4 x i8> zeroinitializer,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 4, i32 4, i32 4,
i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %wide8,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
define internal <4 x i16> @__add_varying_i16(<4 x i16>,
<4 x i16>) nounwind readnone alwaysinline {
%r = add <4 x i16> %0, %1
ret <4 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<4 x i16>) nounwind readnone alwaysinline {
reduce4(i16, @__add_varying_i16, @__add_uniform_i16)
}
define float @__reduce_add_float(<4 x float> %v) nounwind readonly alwaysinline {
@@ -273,18 +334,13 @@ define i32 @__reduce_max_int32(<4 x i32>) nounwind readnone {
reduce4(i32, @__max_varying_int32, @__max_uniform_int32)
}
-define i32 @__reduce_add_uint32(<4 x i32> %v) nounwind readnone {
-%r = call i32 @__reduce_add_int32(<4 x i32> %v)
-ret i32 %r
-}
define i32 @__reduce_min_uint32(<4 x i32>) nounwind readnone {
reduce4(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
define i32 @__reduce_max_uint32(<4 x i32>) nounwind readnone {
reduce4(i32, @__max_varying_uint32, @__max_uniform_uint32)
}
define double @__reduce_add_double(<4 x double>) nounwind readnone {
@@ -341,17 +397,17 @@ reduce_equal(4)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
-define void @__masked_store_blend_32(<4 x i32>* nocapture, <4 x i32>,
-<4 x i32> %mask) nounwind alwaysinline {
-%val = load <4 x i32> * %0, align 4
+define void @__masked_store_blend_i32(<4 x i32>* nocapture, <4 x i32>,
+<4 x i32> %mask) nounwind alwaysinline {
+%val = load PTR_OP_ARGS(`<4 x i32> ') %0, align 4
%newval = call <4 x i32> @__vselect_i32(<4 x i32> %val, <4 x i32> %1, <4 x i32> %mask)
store <4 x i32> %newval, <4 x i32> * %0, align 4
ret void
}
-define void @__masked_store_blend_64(<4 x i64>* nocapture %ptr, <4 x i64> %new,
-<4 x i32> %mask) nounwind alwaysinline {
-%oldValue = load <4 x i64>* %ptr, align 8
+define void @__masked_store_blend_i64(<4 x i64>* nocapture %ptr, <4 x i64> %new,
+<4 x i32> %mask) nounwind alwaysinline {
+%oldValue = load PTR_OP_ARGS(`<4 x i64>') %ptr, align 8
; Do 4x64-bit blends by doing two <4 x i32> blends, where the <4 x i32> values
; are actually bitcast <2 x i64> values
@@ -392,6 +448,8 @@ define void @__masked_store_blend_64(<4 x i64>* nocapture %ptr, <4 x i64> %new,
}
masked_store_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
@@ -439,62 +497,15 @@ define <4 x float> @__sqrt_varying_float(<4 x float>) nounwind readonly alwaysin
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; svml stuff
-declare <4 x float> @__svml_sinf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_cosf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_sincosf4(<4 x float> *, <4 x float>) nounwind readnone
-declare <4 x float> @__svml_tanf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_atanf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_atan2f4(<4 x float>, <4 x float>) nounwind readnone
-declare <4 x float> @__svml_expf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_logf4(<4 x float>) nounwind readnone
-declare <4 x float> @__svml_powf4(<4 x float>, <4 x float>) nounwind readnone
+include(`svml.m4')
+;; single precision
+svml_declare(float,f4,4)
+svml_define(float,f4,4,f)
+;; double precision
+svml_declare(double,2,2)
+svml_define_x(double,2,2,d,4)
-define <4 x float> @__svml_sin(<4 x float>) nounwind readnone alwaysinline {
-%ret = call <4 x float> @__svml_sinf4(<4 x float> %0)
-ret <4 x float> %ret
-}
-define <4 x float> @__svml_cos(<4 x float>) nounwind readnone alwaysinline {
-%ret = call <4 x float> @__svml_cosf4(<4 x float> %0)
-ret <4 x float> %ret
-}
-define void @__svml_sincos(<4 x float>, <4 x float> *, <4 x float> *) nounwind readnone alwaysinline {
-%s = call <4 x float> @__svml_sincosf4(<4 x float> * %2, <4 x float> %0)
-store <4 x float> %s, <4 x float> * %1
-ret void
-}
-define <4 x float> @__svml_tan(<4 x float>) nounwind readnone alwaysinline {
-%ret = call <4 x float> @__svml_tanf4(<4 x float> %0)
-ret <4 x float> %ret
-}
-define <4 x float> @__svml_atan(<4 x float>) nounwind readnone alwaysinline {
-%ret = call <4 x float> @__svml_atanf4(<4 x float> %0)
-ret <4 x float> %ret
-}
-define <4 x float> @__svml_atan2(<4 x float>, <4 x float>) nounwind readnone alwaysinline {
-%ret = call <4 x float> @__svml_atan2f4(<4 x float> %0, <4 x float> %1)
-ret <4 x float> %ret
-}
-define <4 x float> @__svml_exp(<4 x float>) nounwind readnone alwaysinline {
-%ret = call <4 x float> @__svml_expf4(<4 x float> %0)
-ret <4 x float> %ret
-}
-define <4 x float> @__svml_log(<4 x float>) nounwind readnone alwaysinline {
-%ret = call <4 x float> @__svml_logf4(<4 x float> %0)
-ret <4 x float> %ret
-}
-define <4 x float> @__svml_pow(<4 x float>, <4 x float>) nounwind readnone alwaysinline {
-%ret = call <4 x float> @__svml_powf4(<4 x float> %0, <4 x float> %1)
-ret <4 x float> %ret
-}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float min/max
@@ -543,35 +554,46 @@ define <4 x double> @__max_varying_double(<4 x double>, <4 x double>) nounwind r
masked_store_blend_8_16_by_4()
-gen_masked_store(4, i8, 8)
-gen_masked_store(4, i16, 16)
-gen_masked_store(4, i32, 32)
-gen_masked_store(4, i64, 64)
+gen_masked_store(i8)
+gen_masked_store(i16)
+gen_masked_store(i32)
+gen_masked_store(i64)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
-load_and_broadcast(4, i8, 8)
-load_and_broadcast(4, i16, 16)
-load_and_broadcast(4, i32, 32)
-load_and_broadcast(4, i64, 64)
-masked_load(4, i8, 8, 1)
-masked_load(4, i16, 16, 2)
-masked_load(4, i32, 32, 4)
-masked_load(4, i64, 64, 8)
+masked_load(i8, 1)
+masked_load(i16, 2)
+masked_load(i32, 4)
+masked_load(float, 4)
+masked_load(i64, 8)
+masked_load(double, 8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
; define these with the macros from stdlib.m4
-gen_gather(4, i8)
-gen_gather(4, i16)
-gen_gather(4, i32)
-gen_gather(4, i64)
+gen_gather_factored(i8)
+gen_gather_factored(i16)
+gen_gather_factored(i32)
+gen_gather_factored(float)
+gen_gather_factored(i64)
+gen_gather_factored(double)
-gen_scatter(4, i8)
-gen_scatter(4, i16)
-gen_scatter(4, i32)
-gen_scatter(4, i64)
+gen_scatter(i8)
+gen_scatter(i16)
+gen_scatter(i32)
+gen_scatter(float)
+gen_scatter(i64)
+gen_scatter(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()

builtins/target-sse4-16.ll Normal file (500 lines)

@@ -0,0 +1,500 @@
;; Copyright (c) 2013, 2015, Google, Inc.
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Google, Inc. nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Define common 4-wide stuff
define(`WIDTH',`8')
define(`MASK',`i16')
include(`util.m4')
stdlib_core()
packed_load_and_store()
scans()
int64minmax()
saturation_arithmetic()
include(`target-sse4-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
declare <4 x float> @llvm.x86.sse.rcp.ps(<4 x float>) nounwind readnone
define <WIDTH x float> @__rcp_varying_float(<WIDTH x float>) nounwind readonly alwaysinline {
unary4to8(call, float, @llvm.x86.sse.rcp.ps, %0)
; do one N-R iteration to improve precision
; float iv = __rcp_v(v);
; return iv * (2. - v * iv);
%v_iv = fmul <8 x float> %0, %call
%two_minus = fsub <8 x float> <float 2., float 2., float 2., float 2.,
float 2., float 2., float 2., float 2.>, %v_iv
%iv_mul = fmul <8 x float> %call, %two_minus
ret <8 x float> %iv_mul
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; rsqrt
declare <4 x float> @llvm.x86.sse.rsqrt.ps(<4 x float>) nounwind readnone
define <WIDTH x float> @__rsqrt_varying_float(<WIDTH x float> %v) nounwind readonly alwaysinline {
; float is = __rsqrt_v(v);
unary4to8(is, float, @llvm.x86.sse.rsqrt.ps, %v)
; Newton-Raphson iteration to improve precision
; return 0.5 * is * (3. - (v * is) * is);
%v_is = fmul <8 x float> %v, %is
%v_is_is = fmul <8 x float> %v_is, %is
%three_sub = fsub <8 x float> <float 3., float 3., float 3., float 3.,
float 3., float 3., float 3., float 3.>, %v_is_is
%is_mul = fmul <8 x float> %is, %three_sub
%half_scale = fmul <8 x float> <float 0.5, float 0.5, float 0.5, float 0.5,
float 0.5, float 0.5, float 0.5, float 0.5>, %is_mul
ret <8 x float> %half_scale
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; sqrt
declare <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float>) nounwind readnone
define <8 x float> @__sqrt_varying_float(<8 x float>) nounwind readonly alwaysinline {
unary4to8(call, float, @llvm.x86.sse.sqrt.ps, %0)
ret <8 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>) nounwind readnone
define <8 x double> @__sqrt_varying_double(<8 x double>) nounwind
alwaysinline {
unary2to8(ret, double, @llvm.x86.sse2.sqrt.pd, %0)
ret <8 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding floats
declare <4 x float> @llvm.x86.sse41.round.ps(<4 x float>, i32) nounwind readnone
define <8 x float> @__round_varying_float(<8 x float>) nounwind readonly alwaysinline {
; roundps, round mode nearest 0b00 | don't signal precision exceptions 0b1000 = 8
round4to8(%0, 8)
}
define <8 x float> @__floor_varying_float(<8 x float>) nounwind readonly alwaysinline {
; roundps, round down 0b01 | don't signal precision exceptions 0b1001 = 9
round4to8(%0, 9)
}
define <8 x float> @__ceil_varying_float(<8 x float>) nounwind readonly alwaysinline {
; roundps, round up 0b10 | don't signal precision exceptions 0b1010 = 10
round4to8(%0, 10)
}
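The immediates 8, 9 and 10 passed to roundps above combine a rounding mode in the low two bits (nearest = 0, down = 1, up = 2) with bit 3, which suppresses precision exceptions. The same encodings expressed with the SSE4.1 intrinsics, for reference:

#include <smmintrin.h>
static __m128 round_nearest4(__m128 v) {
    return _mm_round_ps(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC); /* imm = 8 */
}
static __m128 floor4(__m128 v) {
    return _mm_round_ps(v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);     /* imm = 9 */
}
static __m128 ceil4(__m128 v) {
    return _mm_round_ps(v, _MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC);     /* imm = 10 */
}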
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding doubles
declare <2 x double> @llvm.x86.sse41.round.pd(<2 x double>, i32) nounwind readnone
define <8 x double> @__round_varying_double(<8 x double>) nounwind readonly alwaysinline {
round2to8double(%0, 8)
}
define <8 x double> @__floor_varying_double(<8 x double>) nounwind readonly alwaysinline {
round2to8double(%0, 9)
}
define <8 x double> @__ceil_varying_double(<8 x double>) nounwind readonly alwaysinline {
round2to8double(%0, 10)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float min/max
declare <4 x float> @llvm.x86.sse.max.ps(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @llvm.x86.sse.min.ps(<4 x float>, <4 x float>) nounwind readnone
define <8 x float> @__max_varying_float(<8 x float>, <8 x float>) nounwind readonly alwaysinline {
binary4to8(call, float, @llvm.x86.sse.max.ps, %0, %1)
ret <8 x float> %call
}
define <8 x float> @__min_varying_float(<8 x float>, <8 x float>) nounwind readonly alwaysinline {
binary4to8(call, float, @llvm.x86.sse.min.ps, %0, %1)
ret <8 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int32 min/max
define <8 x i32> @__min_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(call, i32, @llvm.x86.sse41.pminsd, %0, %1)
ret <8 x i32> %call
}
define <8 x i32> @__max_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(call, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
ret <8 x i32> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; unsigned int min/max
define <8 x i32> @__min_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(call, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <8 x i32> %call
}
define <8 x i32> @__max_varying_uint32(<8 x i32>, <8 x i32>) nounwind readonly alwaysinline {
binary4to8(call, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <8 x i32> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
declare <2 x double> @llvm.x86.sse2.max.pd(<2 x double>, <2 x double>) nounwind readnone
declare <2 x double> @llvm.x86.sse2.min.pd(<2 x double>, <2 x double>) nounwind readnone
define <8 x double> @__min_varying_double(<8 x double>, <8 x double>) nounwind readnone {
binary2to8(ret, double, @llvm.x86.sse2.min.pd, %0, %1)
ret <8 x double> %ret
}
define <8 x double> @__max_varying_double(<8 x double>, <8 x double>) nounwind readnone {
binary2to8(ret, double, @llvm.x86.sse2.max.pd, %0, %1)
ret <8 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; svml
; FIXME
include(`svml.m4')
svml_stubs(float,f,WIDTH)
svml_stubs(double,d,WIDTH)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; horizontal ops / reductions
declare i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8>) nounwind readnone
define i64 @__movmsk(<8 x MASK>) nounwind readnone alwaysinline {
%m8 = trunc <8 x MASK> %0 to <8 x i8>
%mask8 = shufflevector <8 x i8> %m8, <8 x i8> zeroinitializer,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
%m = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %mask8)
%m64 = zext i32 %m to i64
ret i64 %m64
}
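Because MASK is i16 on this target, __movmsk first narrows the 8 mask lanes to bytes, pads with zeros to a full 16-byte vector, and lets pmovmskb collect one sign bit per byte. A C sketch of the final step (illustrative; mask8 is assumed to already hold the 8 narrowed lanes in its low bytes and zeros above):

#include <emmintrin.h>
#include <stdint.h>
static uint64_t movmsk8_from_bytes(__m128i mask8) {
    /* bits 8..15 of the movemask are already zero, since the upper bytes are zero */
    return (uint64_t)(uint32_t)_mm_movemask_epi8(mask8);
}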
define i1 @__any(<8 x MASK>) nounwind readnone alwaysinline {
%m = call i64 @__movmsk(<8 x MASK> %0)
%mne = icmp ne i64 %m, 0
ret i1 %mne
}
define i1 @__all(<8 x MASK>) nounwind readnone alwaysinline {
%m = call i64 @__movmsk(<8 x MASK> %0)
%meq = icmp eq i64 %m, ALL_ON_MASK
ret i1 %meq
}
define i1 @__none(<8 x MASK>) nounwind readnone alwaysinline {
%m = call i64 @__movmsk(<8 x MASK> %0)
%meq = icmp eq i64 %m, 0
ret i1 %meq
}
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<8 x i8>) nounwind readnone alwaysinline {
%wide8 = shufflevector <8 x i8> %0, <8 x i8> zeroinitializer,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %wide8,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
define internal <8 x i16> @__add_varying_i16(<8 x i16>,
<8 x i16>) nounwind readnone alwaysinline {
%r = add <8 x i16> %0, %1
ret <8 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<8 x i16>) nounwind readnone alwaysinline {
reduce8(i16, @__add_varying_i16, @__add_uniform_i16)
}
define internal <8 x float> @__add_varying_float(<8 x float>, <8 x float>) {
%r = fadd <8 x float> %0, %1
ret <8 x float> %r
}
define internal float @__add_uniform_float(float, float) {
%r = fadd float %0, %1
ret float %r
}
define float @__reduce_add_float(<8 x float>) nounwind readonly alwaysinline {
reduce8(float, @__add_varying_float, @__add_uniform_float)
}
define float @__reduce_min_float(<8 x float>) nounwind readnone {
reduce8(float, @__min_varying_float, @__min_uniform_float)
}
define float @__reduce_max_float(<8 x float>) nounwind readnone {
reduce8(float, @__max_varying_float, @__max_uniform_float)
}
define internal <8 x i32> @__add_varying_int32(<8 x i32>, <8 x i32>) {
%r = add <8 x i32> %0, %1
ret <8 x i32> %r
}
define internal i32 @__add_uniform_int32(i32, i32) {
%r = add i32 %0, %1
ret i32 %r
}
define i32 @__reduce_add_int32(<8 x i32>) nounwind readnone {
reduce8(i32, @__add_varying_int32, @__add_uniform_int32)
}
define i32 @__reduce_min_int32(<8 x i32>) nounwind readnone {
reduce8(i32, @__min_varying_int32, @__min_uniform_int32)
}
define i32 @__reduce_max_int32(<8 x i32>) nounwind readnone {
reduce8(i32, @__max_varying_int32, @__max_uniform_int32)
}
define i32 @__reduce_min_uint32(<8 x i32>) nounwind readnone {
reduce8(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
define i32 @__reduce_max_uint32(<8 x i32>) nounwind readnone {
reduce8(i32, @__max_varying_uint32, @__max_uniform_uint32)
}
define internal <8 x double> @__add_varying_double(<8 x double>, <8 x double>) {
%r = fadd <8 x double> %0, %1
ret <8 x double> %r
}
define internal double @__add_uniform_double(double, double) {
%r = fadd double %0, %1
ret double %r
}
define double @__reduce_add_double(<8 x double>) nounwind readnone {
reduce8(double, @__add_varying_double, @__add_uniform_double)
}
define double @__reduce_min_double(<8 x double>) nounwind readnone {
reduce8(double, @__min_varying_double, @__min_uniform_double)
}
define double @__reduce_max_double(<8 x double>) nounwind readnone {
reduce8(double, @__max_varying_double, @__max_uniform_double)
}
define internal <8 x i64> @__add_varying_int64(<8 x i64>, <8 x i64>) {
%r = add <8 x i64> %0, %1
ret <8 x i64> %r
}
define internal i64 @__add_uniform_int64(i64, i64) {
%r = add i64 %0, %1
ret i64 %r
}
define i64 @__reduce_add_int64(<8 x i64>) nounwind readnone {
reduce8(i64, @__add_varying_int64, @__add_uniform_int64)
}
define i64 @__reduce_min_int64(<8 x i64>) nounwind readnone {
reduce8(i64, @__min_varying_int64, @__min_uniform_int64)
}
define i64 @__reduce_max_int64(<8 x i64>) nounwind readnone {
reduce8(i64, @__max_varying_int64, @__max_uniform_int64)
}
define i64 @__reduce_min_uint64(<8 x i64>) nounwind readnone {
reduce8(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
define i64 @__reduce_max_uint64(<8 x i64>) nounwind readnone {
reduce8(i64, @__max_varying_uint64, @__max_uniform_uint64)
}
reduce_equal(8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
define void @__masked_store_blend_i64(<8 x i64>* nocapture, <8 x i64>,
<8 x MASK> %mask) nounwind
alwaysinline {
%mask_as_i1 = trunc <8 x MASK> %mask to <8 x i1>
%old = load PTR_OP_ARGS(`<8 x i64>') %0, align 4
%blend = select <8 x i1> %mask_as_i1, <8 x i64> %1, <8 x i64> %old
store <8 x i64> %blend, <8 x i64>* %0, align 4
ret void
}
define void @__masked_store_blend_i32(<8 x i32>* nocapture, <8 x i32>,
<8 x MASK> %mask) nounwind alwaysinline {
%mask_as_i1 = trunc <8 x MASK> %mask to <8 x i1>
%old = load PTR_OP_ARGS(`<8 x i32>') %0, align 4
%blend = select <8 x i1> %mask_as_i1, <8 x i32> %1, <8 x i32> %old
store <8 x i32> %blend, <8 x i32>* %0, align 4
ret void
}
define void @__masked_store_blend_i16(<8 x i16>* nocapture, <8 x i16>,
<8 x MASK> %mask) nounwind alwaysinline {
%mask_as_i1 = trunc <8 x MASK> %mask to <8 x i1>
%old = load PTR_OP_ARGS(`<8 x i16>') %0, align 4
%blend = select <8 x i1> %mask_as_i1, <8 x i16> %1, <8 x i16> %old
store <8 x i16> %blend, <8 x i16>* %0, align 4
ret void
}
define void @__masked_store_blend_i8(<8 x i8>* nocapture, <8 x i8>,
<8 x MASK> %mask) nounwind alwaysinline {
%mask_as_i1 = trunc <8 x MASK> %mask to <8 x i1>
%old = load PTR_OP_ARGS(`<8 x i8>') %0, align 4
%blend = select <8 x i1> %mask_as_i1, <8 x i8> %1, <8 x i8> %old
store <8 x i8> %blend, <8 x i8>* %0, align 4
ret void
}
gen_masked_store(i8)
gen_masked_store(i16)
gen_masked_store(i32)
gen_masked_store(i64)
masked_store_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
masked_load(i8, 1)
masked_load(i16, 2)
masked_load(i32, 4)
masked_load(float, 4)
masked_load(i64, 8)
masked_load(double, 8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
; define these with the macros from stdlib.m4
gen_gather_factored(i8)
gen_gather_factored(i16)
gen_gather_factored(i32)
gen_gather_factored(float)
gen_gather_factored(i64)
gen_gather_factored(double)
gen_scatter(i8)
gen_scatter(i16)
gen_scatter(i32)
gen_scatter(float)
gen_scatter(i64)
gen_scatter(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16 builtins
declare <16 x i8> @llvm.x86.sse2.pavg.b(<16 x i8>, <16 x i8>) nounwind readnone
define <8 x i8> @__avg_up_uint8(<8 x i8>, <8 x i8>) {
%v0 = shufflevector <8 x i8> %0, <8 x i8> undef,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%v1 = shufflevector <8 x i8> %1, <8 x i8> undef,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef, i32 undef, i32 undef>
%r16 = call <16 x i8> @llvm.x86.sse2.pavg.b(<16 x i8> %v0, <16 x i8> %v1)
%r = shufflevector <16 x i8> %r16, <16 x i8> undef,
<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
ret <8 x i8> %r
}
declare <8 x i16> @llvm.x86.sse2.pavg.w(<8 x i16>, <8 x i16>) nounwind readnone
define <8 x i16> @__avg_up_uint16(<8 x i16>, <8 x i16>) {
%r = call <8 x i16> @llvm.x86.sse2.pavg.w(<8 x i16> %0, <8 x i16> %1)
ret <8 x i16> %r
}
define_avg_up_int8()
define_avg_up_int16()
define_down_avgs()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()

builtins/target-sse4-8.ll Normal file (498 lines)

@@ -0,0 +1,498 @@
;; Copyright (c) 2013, 2015, Google, Inc.
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
;; modification, are permitted provided that the following conditions are
;; met:
;;
;; * Redistributions of source code must retain the above copyright
;; notice, this list of conditions and the following disclaimer.
;;
;; * Redistributions in binary form must reproduce the above copyright
;; notice, this list of conditions and the following disclaimer in the
;; documentation and/or other materials provided with the distribution.
;;
;; * Neither the name of Google, Inc. nor the names of its
;; contributors may be used to endorse or promote products derived from
;; this software without specific prior written permission.
;;
;;
;; THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
;; IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
;; TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
;; PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
;; OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
;; EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
;; PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
;; PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
;; LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Define common 4-wide stuff
define(`WIDTH',`16')
define(`MASK',`i8')
include(`util.m4')
stdlib_core()
packed_load_and_store()
scans()
int64minmax()
saturation_arithmetic()
include(`target-sse4-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
declare <4 x float> @llvm.x86.sse.rcp.ps(<4 x float>) nounwind readnone
define <WIDTH x float> @__rcp_varying_float(<WIDTH x float>) nounwind readonly alwaysinline {
unary4to16(call, float, @llvm.x86.sse.rcp.ps, %0)
; do one N-R iteration to improve precision
; float iv = __rcp_v(v);
; return iv * (2. - v * iv);
%v_iv = fmul <16 x float> %0, %call
%two_minus = fsub <16 x float> <float 2., float 2., float 2., float 2.,
float 2., float 2., float 2., float 2.,
float 2., float 2., float 2., float 2.,
float 2., float 2., float 2., float 2.>, %v_iv
%iv_mul = fmul <16 x float> %call, %two_minus
ret <16 x float> %iv_mul
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; rsqrt
declare <4 x float> @llvm.x86.sse.rsqrt.ps(<4 x float>) nounwind readnone
define <16 x float> @__rsqrt_varying_float(<16 x float> %v) nounwind readonly alwaysinline {
; float is = __rsqrt_v(v);
unary4to16(is, float, @llvm.x86.sse.rsqrt.ps, %v)
; Newton-Raphson iteration to improve precision
; return 0.5 * is * (3. - (v * is) * is);
%v_is = fmul <16 x float> %v, %is
%v_is_is = fmul <16 x float> %v_is, %is
%three_sub = fsub <16 x float> <float 3., float 3., float 3., float 3.,
float 3., float 3., float 3., float 3.,
float 3., float 3., float 3., float 3.,
float 3., float 3., float 3., float 3.>, %v_is_is
%is_mul = fmul <16 x float> %is, %three_sub
%half_scale = fmul <16 x float> <float 0.5, float 0.5, float 0.5, float 0.5,
float 0.5, float 0.5, float 0.5, float 0.5,
float 0.5, float 0.5, float 0.5, float 0.5,
float 0.5, float 0.5, float 0.5, float 0.5>, %is_mul
ret <16 x float> %half_scale
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; sqrt
declare <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float>) nounwind readnone
define <16 x float> @__sqrt_varying_float(<16 x float>) nounwind readonly alwaysinline {
unary4to16(call, float, @llvm.x86.sse.sqrt.ps, %0)
ret <16 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
declare <2 x double> @llvm.x86.sse2.sqrt.pd(<2 x double>) nounwind readnone
define <16 x double> @__sqrt_varying_double(<16 x double>) nounwind
alwaysinline {
unary2to16(ret, double, @llvm.x86.sse2.sqrt.pd, %0)
ret <16 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding floats
declare <4 x float> @llvm.x86.sse41.round.ps(<4 x float>, i32) nounwind readnone
define <16 x float> @__round_varying_float(<16 x float>) nounwind readonly alwaysinline {
; roundps, round mode nearest 0b00 | don't signal precision exceptions 0b1000 = 8
round4to16(%0, 8)
}
define <16 x float> @__floor_varying_float(<16 x float>) nounwind readonly alwaysinline {
; roundps, round down 0b01 | don't signal precision exceptions 0b1001 = 9
round4to16(%0, 9)
}
define <16 x float> @__ceil_varying_float(<16 x float>) nounwind readonly alwaysinline {
; roundps, round up 0b10 | don't signal precision exceptions 0b1010 = 10
round4to16(%0, 10)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding doubles
declare <2 x double> @llvm.x86.sse41.round.pd(<2 x double>, i32) nounwind readnone
define <16 x double> @__round_varying_double(<16 x double>) nounwind readonly alwaysinline {
round2to16double(%0, 8)
}
define <16 x double> @__floor_varying_double(<16 x double>) nounwind readonly alwaysinline {
; roundpd, round down 0b01 | don't signal precision exceptions 0b1001 = 9
round2to16double(%0, 9)
}
define <16 x double> @__ceil_varying_double(<16 x double>) nounwind readonly alwaysinline {
; roundpd, round up 0b10 | don't signal precision exceptions 0b1010 = 10
round2to16double(%0, 10)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float min/max
declare <4 x float> @llvm.x86.sse.max.ps(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @llvm.x86.sse.min.ps(<4 x float>, <4 x float>) nounwind readnone
define <16 x float> @__max_varying_float(<16 x float>, <16 x float>) nounwind readonly alwaysinline {
binary4to16(call, float, @llvm.x86.sse.max.ps, %0, %1)
ret <16 x float> %call
}
define <16 x float> @__min_varying_float(<16 x float>, <16 x float>) nounwind readonly alwaysinline {
binary4to16(call, float, @llvm.x86.sse.min.ps, %0, %1)
ret <16 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int32 min/max
define <16 x i32> @__min_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(call, i32, @llvm.x86.sse41.pminsd, %0, %1)
ret <16 x i32> %call
}
define <16 x i32> @__max_varying_int32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(call, i32, @llvm.x86.sse41.pmaxsd, %0, %1)
ret <16 x i32> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; unsigned int min/max
define <16 x i32> @__min_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(call, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <16 x i32> %call
}
define <16 x i32> @__max_varying_uint32(<16 x i32>, <16 x i32>) nounwind readonly alwaysinline {
binary4to16(call, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <16 x i32> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
declare <2 x double> @llvm.x86.sse2.max.pd(<2 x double>, <2 x double>) nounwind readnone
declare <2 x double> @llvm.x86.sse2.min.pd(<2 x double>, <2 x double>) nounwind readnone
define <16 x double> @__min_varying_double(<16 x double>, <16 x double>) nounwind readnone {
binary2to16(ret, double, @llvm.x86.sse2.min.pd, %0, %1)
ret <16 x double> %ret
}
define <16 x double> @__max_varying_double(<16 x double>, <16 x double>) nounwind readnone {
binary2to16(ret, double, @llvm.x86.sse2.max.pd, %0, %1)
ret <16 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; svml
; FIXME
include(`svml.m4')
svml_stubs(float,f,WIDTH)
svml_stubs(double,d,WIDTH)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; horizontal ops / reductions
declare i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8>) nounwind readnone
define i64 @__movmsk(<16 x i8>) nounwind readnone alwaysinline {
%m = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %0)
%m64 = zext i32 %m to i64
ret i64 %m64
}
define i1 @__any(<16 x i8>) nounwind readnone alwaysinline {
%m = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %0)
%mne = icmp ne i32 %m, 0
ret i1 %mne
}
define i1 @__all(<16 x i8>) nounwind readnone alwaysinline {
%m = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %0)
%meq = icmp eq i32 %m, ALL_ON_MASK
ret i1 %meq
}
define i1 @__none(<16 x i8>) nounwind readnone alwaysinline {
%m = call i32 @llvm.x86.sse2.pmovmskb.128(<16 x i8> %0)
%meq = icmp eq i32 %m, 0
ret i1 %meq
}
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<16 x i8>) nounwind readnone alwaysinline {
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %0,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
define internal <16 x i16> @__add_varying_i16(<16 x i16>,
<16 x i16>) nounwind readnone alwaysinline {
%r = add <16 x i16> %0, %1
ret <16 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<16 x i16>) nounwind readnone alwaysinline {
reduce16(i16, @__add_varying_i16, @__add_uniform_i16)
}
define internal <16 x float> @__add_varying_float(<16 x float>, <16 x float>) {
%r = fadd <16 x float> %0, %1
ret <16 x float> %r
}
define internal float @__add_uniform_float(float, float) {
%r = fadd float %0, %1
ret float %r
}
define float @__reduce_add_float(<16 x float>) nounwind readonly alwaysinline {
reduce16(float, @__add_varying_float, @__add_uniform_float)
}
define float @__reduce_min_float(<16 x float>) nounwind readnone {
reduce16(float, @__min_varying_float, @__min_uniform_float)
}
define float @__reduce_max_float(<16 x float>) nounwind readnone {
reduce16(float, @__max_varying_float, @__max_uniform_float)
}
define internal <16 x i32> @__add_varying_int32(<16 x i32>, <16 x i32>) {
%r = add <16 x i32> %0, %1
ret <16 x i32> %r
}
define internal i32 @__add_uniform_int32(i32, i32) {
%r = add i32 %0, %1
ret i32 %r
}
define i32 @__reduce_add_int32(<16 x i32>) nounwind readnone {
reduce16(i32, @__add_varying_int32, @__add_uniform_int32)
}
define i32 @__reduce_min_int32(<16 x i32>) nounwind readnone {
reduce16(i32, @__min_varying_int32, @__min_uniform_int32)
}
define i32 @__reduce_max_int32(<16 x i32>) nounwind readnone {
reduce16(i32, @__max_varying_int32, @__max_uniform_int32)
}
define i32 @__reduce_min_uint32(<16 x i32>) nounwind readnone {
reduce16(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
define i32 @__reduce_max_uint32(<16 x i32>) nounwind readnone {
reduce16(i32, @__max_varying_uint32, @__max_uniform_uint32)
}
define internal <16 x double> @__add_varying_double(<16 x double>, <16 x double>) {
%r = fadd <16 x double> %0, %1
ret <16 x double> %r
}
define internal double @__add_uniform_double(double, double) {
%r = fadd double %0, %1
ret double %r
}
define double @__reduce_add_double(<16 x double>) nounwind readnone {
reduce16(double, @__add_varying_double, @__add_uniform_double)
}
define double @__reduce_min_double(<16 x double>) nounwind readnone {
reduce16(double, @__min_varying_double, @__min_uniform_double)
}
define double @__reduce_max_double(<16 x double>) nounwind readnone {
reduce16(double, @__max_varying_double, @__max_uniform_double)
}
define internal <16 x i64> @__add_varying_int64(<16 x i64>, <16 x i64>) {
%r = add <16 x i64> %0, %1
ret <16 x i64> %r
}
define internal i64 @__add_uniform_int64(i64, i64) {
%r = add i64 %0, %1
ret i64 %r
}
define i64 @__reduce_add_int64(<16 x i64>) nounwind readnone {
reduce16(i64, @__add_varying_int64, @__add_uniform_int64)
}
define i64 @__reduce_min_int64(<16 x i64>) nounwind readnone {
reduce16(i64, @__min_varying_int64, @__min_uniform_int64)
}
define i64 @__reduce_max_int64(<16 x i64>) nounwind readnone {
reduce16(i64, @__max_varying_int64, @__max_uniform_int64)
}
define i64 @__reduce_min_uint64(<16 x i64>) nounwind readnone {
reduce16(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
define i64 @__reduce_max_uint64(<16 x i64>) nounwind readnone {
reduce16(i64, @__max_varying_uint64, @__max_uniform_uint64)
}
reduce_equal(16)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
define void @__masked_store_blend_i64(<16 x i64>* nocapture, <16 x i64>,
<16 x i8> %mask) nounwind
alwaysinline {
%mask_as_i1 = trunc <16 x MASK> %mask to <16 x i1>
%old = load PTR_OP_ARGS(`<16 x i64>') %0, align 4
%blend = select <16 x i1> %mask_as_i1, <16 x i64> %1, <16 x i64> %old
store <16 x i64> %blend, <16 x i64>* %0, align 4
ret void
}
define void @__masked_store_blend_i32(<16 x i32>* nocapture, <16 x i32>,
<16 x MASK> %mask) nounwind alwaysinline {
%mask_as_i1 = trunc <16 x MASK> %mask to <16 x i1>
%old = load PTR_OP_ARGS(`<16 x i32>') %0, align 4
%blend = select <16 x i1> %mask_as_i1, <16 x i32> %1, <16 x i32> %old
store <16 x i32> %blend, <16 x i32>* %0, align 4
ret void
}
define void @__masked_store_blend_i16(<16 x i16>* nocapture, <16 x i16>,
<16 x MASK> %mask) nounwind alwaysinline {
%mask_as_i1 = trunc <16 x MASK> %mask to <16 x i1>
%old = load PTR_OP_ARGS(`<16 x i16>') %0, align 4
%blend = select <16 x i1> %mask_as_i1, <16 x i16> %1, <16 x i16> %old
store <16 x i16> %blend, <16 x i16>* %0, align 4
ret void
}
declare <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8>, <16 x i8>, <16 x i8>) nounwind readnone
define void @__masked_store_blend_i8(<16 x i8>* nocapture, <16 x i8>,
<16 x MASK> %mask) nounwind alwaysinline {
%old = load PTR_OP_ARGS(`<16 x i8>') %0, align 4
%blend = call <16 x i8> @llvm.x86.sse41.pblendvb(<16 x i8> %old, <16 x i8> %1,
<16 x i8> %mask)
store <16 x i8> %blend, <16 x i8>* %0, align 4
ret void
}
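All of the blend-based masked stores above compute the same per-lane result; a minimal Python sketch of that semantics (names are made up):

def masked_store_blend(memory, new, mask):
    # lanes with a set mask bit take the new value; the rest keep memory
    for i, keep_new in enumerate(mask):
        if keep_new:
            memory[i] = new[i]

buf = [0, 0, 0, 0]
masked_store_blend(buf, [1, 2, 3, 4], [True, False, True, False])
assert buf == [1, 0, 3, 0]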
gen_masked_store(i8)
gen_masked_store(i16)
gen_masked_store(i32)
gen_masked_store(i64)
masked_store_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
masked_load(i8, 1)
masked_load(i16, 2)
masked_load(i32, 4)
masked_load(float, 4)
masked_load(i64, 8)
masked_load(double, 8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
; define these with the macros from stdlib.m4
gen_gather_factored(i8)
gen_gather_factored(i16)
gen_gather_factored(i32)
gen_gather_factored(float)
gen_gather_factored(i64)
gen_gather_factored(double)
gen_scatter(i8)
gen_scatter(i16)
gen_scatter(i32)
gen_scatter(float)
gen_scatter(i64)
gen_scatter(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16 builtins
declare <16 x i8> @llvm.x86.sse2.pavg.b(<16 x i8>, <16 x i8>) nounwind readnone
define <16 x i8> @__avg_up_uint8(<16 x i8>, <16 x i8>) nounwind readnone {
%r = call <16 x i8> @llvm.x86.sse2.pavg.b(<16 x i8> %0, <16 x i8> %1)
ret <16 x i8> %r
}
declare <8 x i16> @llvm.x86.sse2.pavg.w(<8 x i16>, <8 x i16>) nounwind readnone
define <16 x i16> @__avg_up_uint16(<16 x i16>, <16 x i16>) nounwind readnone {
v16tov8(i16, %0, %a0, %b0)
v16tov8(i16, %1, %a1, %b1)
%r0 = call <8 x i16> @llvm.x86.sse2.pavg.w(<8 x i16> %a0, <8 x i16> %a1)
%r1 = call <8 x i16> @llvm.x86.sse2.pavg.w(<8 x i16> %b0, <8 x i16> %b1)
v8tov16(i16, %r0, %r1, %r)
ret <16 x i16> %r
}
define_avg_up_int8()
define_avg_up_int16()
define_down_avgs()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -29,10 +29,14 @@
;; NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
;; SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; SSE4 target implementation.
ctlztz()
define_prefetches()
define_shuffles()
aossoa()
rdrand_decls()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rounding floats
@@ -66,7 +70,7 @@ define float @__round_uniform_float(float) nounwind readonly alwaysinline {
define float @__floor_uniform_float(float) nounwind readonly alwaysinline {
; see above for round_ss intrinsic discussion...
%xi = insertelement <4 x float> undef, float %0, i32 0
; roundps, round down 0b01 | don't signal precision exceptions 0b1010 = 9
; roundps, round down 0b01 | don't signal precision exceptions 0b1001 = 9
%xr = call <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %xi, <4 x float> %xi, i32 9)
%rs = extractelement <4 x float> %xr, i32 0
ret float %rs
@@ -96,7 +100,7 @@ define double @__round_uniform_double(double) nounwind readonly alwaysinline {
define double @__floor_uniform_double(double) nounwind readonly alwaysinline {
; see above for round_ss intrinsic discussion...
%xi = insertelement <2 x double> undef, double %0, i32 0
; roundpd, round down 0b01 | don't signal precision exceptions 0b1001 = 9
; roundsd, round down 0b01 | don't signal precision exceptions 0b1001 = 9
%xr = call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %xi, <2 x double> %xi, i32 9)
%rs = extractelement <2 x double> %xr, i32 0
ret double %rs
@@ -105,7 +109,7 @@ define double @__floor_uniform_double(double) nounwind readonly alwaysinline {
define double @__ceil_uniform_double(double) nounwind readonly alwaysinline {
; see above for round_ss intrinsic discussion...
%xi = insertelement <2 x double> undef, double %0, i32 0
; roundps, round up 0b10 | don't signal precision exceptions 0b1010 = 10
; roundsd, round up 0b10 | don't signal precision exceptions 0b1010 = 10
%xr = call <2 x double> @llvm.x86.sse41.round.sd(<2 x double> %xi, <2 x double> %xi, i32 10)
%rs = extractelement <2 x double> %xr, i32 0
ret double %rs
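The immediates 9 and 10 passed to the round intrinsics follow the SSE4.1 imm8 encoding that the comments spell out: bits 1:0 select the rounding mode and bit 3 suppresses the precision exception. A small Python check:

ROUND_DOWN, ROUND_UP = 0b01, 0b10
NO_PRECISION_EXC = 0b1000

assert NO_PRECISION_EXC | ROUND_DOWN == 9    # floor
assert NO_PRECISION_EXC | ROUND_UP == 10     # ceil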
@@ -118,6 +122,8 @@ declare <4 x float> @llvm.x86.sse.rcp.ss(<4 x float>) nounwind readnone
define float @__rcp_uniform_float(float) nounwind readonly alwaysinline {
; do the rcpss call
; uniform float iv = extract(__rcp_u(v), 0);
; return iv * (2. - v * iv);
%vecval = insertelement <4 x float> undef, float %0, i32 0
%call = call <4 x float> @llvm.x86.sse.rcp.ss(<4 x float> %vecval)
%scall = extractelement <4 x float> %call, i32 0
@@ -129,9 +135,8 @@ define float @__rcp_uniform_float(float) nounwind readonly alwaysinline {
ret float %iv_mul
}
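A plain-Python sketch of the Newton-Raphson step described in the comment; rough stands in for the low-precision rcpss estimate:

def refine_rcp(v, iv):
    return iv * (2.0 - v * iv)   # one Newton-Raphson iteration for 1/v

rough = 1.0 / 3.0 + 1e-3
assert abs(refine_rcp(3.0, rough) - 1.0 / 3.0) < abs(rough - 1.0 / 3.0)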
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; rsqrt
;; rsqrt
declare <4 x float> @llvm.x86.sse.rsqrt.ss(<4 x float>) nounwind readnone
@@ -153,7 +158,7 @@ define float @__rsqrt_uniform_float(float) nounwind readonly alwaysinline {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; sqrt
;; sqrt
declare <4 x float> @llvm.x86.sse.sqrt.ss(<4 x float>) nounwind readnone
@@ -162,6 +167,16 @@ define float @__sqrt_uniform_float(float) nounwind readonly alwaysinline {
ret float %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>) nounwind readnone
define double @__sqrt_uniform_double(double) nounwind alwaysinline {
sse_unary_scalar(ret, 2, double, @llvm.x86.sse2.sqrt.sd, %0)
ret double %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; fast math mode
@@ -172,7 +187,7 @@ define void @__fastmath() nounwind alwaysinline {
%ptr = alloca i32
%ptr8 = bitcast i32 * %ptr to i8 *
call void @llvm.x86.sse.stmxcsr(i8 * %ptr8)
%oldval = load i32 *%ptr
%oldval = load PTR_OP_ARGS(`i32 ') %ptr
; turn on DAZ (64)/FTZ (32768) -> 32832
%update = or i32 %oldval, 32832
@@ -197,36 +212,25 @@ define float @__min_uniform_float(float, float) nounwind readonly alwaysinline {
ret float %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>) nounwind readnone
define double @__sqrt_uniform_double(double) nounwind alwaysinline {
sse_unary_scalar(ret, 2, double, @llvm.x86.sse2.sqrt.sd, %0)
ret double %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
declare <2 x double> @llvm.x86.sse2.max.sd(<2 x double>, <2 x double>) nounwind readnone
declare <2 x double> @llvm.x86.sse2.min.sd(<2 x double>, <2 x double>) nounwind readnone
define double @__min_uniform_double(double, double) nounwind readnone {
define double @__min_uniform_double(double, double) nounwind readnone alwaysinline {
sse_binary_scalar(ret, 2, double, @llvm.x86.sse2.min.sd, %0, %1)
ret double %ret
}
define double @__max_uniform_double(double, double) nounwind readnone {
define double @__max_uniform_double(double, double) nounwind readnone alwaysinline {
sse_binary_scalar(ret, 2, double, @llvm.x86.sse2.max.sd, %0, %1)
ret double %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int32 min/max
;; int min/max
declare <4 x i32> @llvm.x86.sse41.pminsd(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.x86.sse41.pmaxsd(<4 x i32>, <4 x i32>) nounwind readnone
@@ -241,8 +245,9 @@ define i32 @__max_uniform_int32(i32, i32) nounwind readonly alwaysinline {
ret i32 %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; unsigned int min/max
;; unsigned int min/max
declare <4 x i32> @llvm.x86.sse41.pminud(<4 x i32>, <4 x i32>) nounwind readnone
declare <4 x i32> @llvm.x86.sse41.pmaxud(<4 x i32>, <4 x i32>) nounwind readnone
@@ -257,9 +262,8 @@ define i32 @__max_uniform_uint32(i32, i32) nounwind readonly alwaysinline {
ret i32 %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; horizontal ops / reductions
;; horizontal ops / reductions
declare i32 @llvm.ctpop.i32(i32) nounwind readnone
@@ -274,3 +278,5 @@ define i64 @__popcnt_int64(i64) nounwind readonly alwaysinline {
%call = call i64 @llvm.ctpop.i64(i64 %0)
ret i64 %call
}
declare_nvptx()


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -44,9 +44,18 @@ stdlib_core()
packed_load_and_store()
scans()
int64minmax()
saturation_arithmetic()
include(`target-sse4-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
@@ -97,87 +106,14 @@ define <8 x float> @__sqrt_varying_float(<8 x float>) nounwind readonly alwaysin
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; svml stuff
declare <4 x float> @__svml_sinf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_cosf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_sincosf4(<4 x float> *, <4 x float>) nounwind readnone
declare <4 x float> @__svml_tanf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_atanf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_atan2f4(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @__svml_expf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_logf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_powf4(<4 x float>, <4 x float>) nounwind readnone
include(`svml.m4')
;; single precision
svml_declare(float,f4,4)
svml_define_x(float,f4,4,f,8)
define <8 x float> @__svml_sin(<8 x float>) nounwind readnone alwaysinline {
unary4to8(ret, float, @__svml_sinf4, %0)
ret <8 x float> %ret
}
define <8 x float> @__svml_cos(<8 x float>) nounwind readnone alwaysinline {
unary4to8(ret, float, @__svml_cosf4, %0)
ret <8 x float> %ret
}
define void @__svml_sincos(<8 x float>, <8 x float> *,
<8 x float> *) nounwind readnone alwaysinline {
; call svml_sincosf4 two times with the two 4-wide sub-vectors
%a = shufflevector <8 x float> %0, <8 x float> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%b = shufflevector <8 x float> %0, <8 x float> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
%cospa = alloca <4 x float>
%sa = call <4 x float> @__svml_sincosf4(<4 x float> * %cospa, <4 x float> %a)
%cospb = alloca <4 x float>
%sb = call <4 x float> @__svml_sincosf4(<4 x float> * %cospb, <4 x float> %b)
%sin = shufflevector <4 x float> %sa, <4 x float> %sb,
<8 x i32> <i32 0, i32 1, i32 2, i32 3,
i32 4, i32 5, i32 6, i32 7>
store <8 x float> %sin, <8 x float> * %1
%cosa = load <4 x float> * %cospa
%cosb = load <4 x float> * %cospb
%cos = shufflevector <4 x float> %cosa, <4 x float> %cosb,
<8 x i32> <i32 0, i32 1, i32 2, i32 3,
i32 4, i32 5, i32 6, i32 7>
store <8 x float> %cos, <8 x float> * %2
ret void
}
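A sketch of the 4-to-8 widening pattern these __svml_* wrappers rely on, assuming the unary4to8()/binary4to8() macros simply split the input, call the 4-wide routine twice, and concatenate:

def unary4to8(op4, v8):
    # illustrative model of the macro, not its actual expansion
    assert len(v8) == 8
    return op4(v8[:4]) + op4(v8[4:])

assert unary4to8(lambda v: [2 * x for x in v], list(range(8))) == [0, 2, 4, 6, 8, 10, 12, 14]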
define <8 x float> @__svml_tan(<8 x float>) nounwind readnone alwaysinline {
unary4to8(ret, float, @__svml_tanf4, %0)
ret <8 x float> %ret
}
define <8 x float> @__svml_atan(<8 x float>) nounwind readnone alwaysinline {
unary4to8(ret, float, @__svml_atanf4, %0)
ret <8 x float> %ret
}
define <8 x float> @__svml_atan2(<8 x float>,
<8 x float>) nounwind readnone alwaysinline {
binary4to8(ret, float, @__svml_atan2f4, %0, %1)
ret <8 x float> %ret
}
define <8 x float> @__svml_exp(<8 x float>) nounwind readnone alwaysinline {
unary4to8(ret, float, @__svml_expf4, %0)
ret <8 x float> %ret
}
define <8 x float> @__svml_log(<8 x float>) nounwind readnone alwaysinline {
unary4to8(ret, float, @__svml_logf4, %0)
ret <8 x float> %ret
}
define <8 x float> @__svml_pow(<8 x float>,
<8 x float>) nounwind readnone alwaysinline {
binary4to8(ret, float, @__svml_powf4, %0, %1)
ret <8 x float> %ret
}
;; double precision
svml_declare(double,2,2)
svml_define_x(double,2,2,d,8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
@@ -213,13 +149,13 @@ define <8 x i32> @__max_varying_int32(<8 x i32>, <8 x i32>) nounwind readonly al
; unsigned int min/max
define <8 x i32> @__min_varying_uint32(<8 x i32>,
<8 x i32>) nounwind readonly alwaysinline {
<8 x i32>) nounwind readonly alwaysinline {
binary4to8(call, i32, @llvm.x86.sse41.pminud, %0, %1)
ret <8 x i32> %call
}
define <8 x i32> @__max_varying_uint32(<8 x i32>,
<8 x i32>) nounwind readonly alwaysinline {
<8 x i32>) nounwind readonly alwaysinline {
binary4to8(call, i32, @llvm.x86.sse41.pmaxud, %0, %1)
ret <8 x i32> %call
}
@@ -229,7 +165,7 @@ define <8 x i32> @__max_varying_uint32(<8 x i32>,
declare i32 @llvm.x86.sse.movmsk.ps(<4 x float>) nounwind readnone
define i32 @__movmsk(<8 x i32>) nounwind readnone alwaysinline {
define i64 @__movmsk(<8 x i32>) nounwind readnone alwaysinline {
; first do two 4-wide movmsk calls
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%m0 = shufflevector <8 x float> %floatmask, <8 x float> undef,
@@ -243,7 +179,92 @@ define i32 @__movmsk(<8 x i32>) nounwind readnone alwaysinline {
; of the second one
%v1s = shl i32 %v1, 4
%v = or i32 %v0, %v1s
ret i32 %v
%v64 = zext i32 %v to i64
ret i64 %v64
}
define i1 @__any(<8 x i32>) nounwind readnone alwaysinline {
; first do two 4-wide movmsk calls
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%m0 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%v0 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m0) nounwind readnone
%m1 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
%v1 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m1) nounwind readnone
; and shift the first one over by 4 before ORing it with the value
; of the second one
%v1s = shl i32 %v1, 4
%v = or i32 %v0, %v1s
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<8 x i32>) nounwind readnone alwaysinline {
; first do two 4-wide movmsk calls
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%m0 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%v0 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m0) nounwind readnone
%m1 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
%v1 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m1) nounwind readnone
; and shift the first one over by 4 before ORing it with the value
; of the second one
%v1s = shl i32 %v1, 4
%v = or i32 %v0, %v1s
%cmp = icmp eq i32 %v, 255
ret i1 %cmp
}
define i1 @__none(<8 x i32>) nounwind readnone alwaysinline {
; first do two 4-wide movmsk calls
%floatmask = bitcast <8 x i32> %0 to <8 x float>
%m0 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%v0 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m0) nounwind readnone
%m1 = shufflevector <8 x float> %floatmask, <8 x float> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
%v1 = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %m1) nounwind readnone
; and shift the first one over by 4 before ORing it with the value
; of the second one
%v1s = shl i32 %v1, 4
%v = or i32 %v0, %v1s
%cmp = icmp eq i32 %v, 0
ret i1 %cmp
}
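__movmsk, __any, __all and __none above all build the same 8-bit mask by shifting the high half's movmsk result up by 4 and ORing; a Python sketch (movmsk8 is a made-up name, lanes are per-lane mask booleans):

def movmsk8(lanes):
    lo = sum(1 << i for i in range(4) if lanes[i])       # low 4-wide movmsk
    hi = sum(1 << i for i in range(4) if lanes[4 + i])   # high 4-wide movmsk
    return lo | (hi << 4)

assert movmsk8([True] * 8) == 255    # __all compares against 255
assert movmsk8([False] * 8) == 0     # __none compares against 0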
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<8 x i8>) nounwind readnone alwaysinline {
%wide8 = shufflevector <8 x i8> %0, <8 x i8> zeroinitializer,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8, i32 8>
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %wide8,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
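A Python model of the psad.bw trick above: padding the 8 lanes with zeros makes the sum of absolute differences against zero a plain horizontal add, split into two 64-bit partial sums:

def reduce_add_int8(lanes8):
    wide = lanes8 + [0] * 8
    lo = sum(abs(x) for x in wide[:8])   # low 64-bit psadbw result
    hi = sum(abs(x) for x in wide[8:])   # high 64-bit psadbw result
    return (lo + hi) & 0xFFFF            # trunc i64 -> i16

assert reduce_add_int8([1, 2, 3, 4, 5, 6, 7, 8]) == 36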
define internal <8 x i16> @__add_varying_i16(<8 x i16>,
<8 x i16>) nounwind readnone alwaysinline {
%r = add <8 x i16> %0, %1
ret <8 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<8 x i16>) nounwind readnone alwaysinline {
reduce8(i16, @__add_varying_i16, @__add_uniform_i16)
}
define float @__reduce_min_float(<8 x float>) nounwind readnone alwaysinline {
@@ -279,11 +300,6 @@ define i32 @__reduce_max_int32(<8 x i32>) nounwind readnone alwaysinline {
reduce8by4(i32, @llvm.x86.sse41.pmaxsd, @__max_uniform_int32)
}
define i32 @__reduce_add_uint32(<8 x i32> %v) nounwind readnone alwaysinline {
%r = call i32 @__reduce_add_int32(<8 x i32> %v)
ret i32 %r
}
define i32 @__reduce_min_uint32(<8 x i32>) nounwind readnone alwaysinline {
reduce8by4(i32, @llvm.x86.sse41.pminud, @__min_uniform_uint32)
}
@@ -316,7 +332,7 @@ define double @__reduce_max_double(<8 x double>) nounwind readnone {
}
define <4 x i64> @__add_varying_int64(<4 x i64>,
<4 x i64>) nounwind readnone alwaysinline {
<4 x i64>) nounwind readnone alwaysinline {
%r = add <4 x i64> %0, %1
ret <4 x i64> %r
}
@@ -351,28 +367,30 @@ reduce_equal(8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
load_and_broadcast(8, i8, 8)
load_and_broadcast(8, i16, 16)
load_and_broadcast(8, i32, 32)
load_and_broadcast(8, i64, 64)
masked_load(8, i8, 8, 1)
masked_load(8, i16, 16, 2)
masked_load(8, i32, 32, 4)
masked_load(8, i64, 64, 8)
masked_load(i8, 1)
masked_load(i16, 2)
masked_load(i32, 4)
masked_load(float, 4)
masked_load(i64, 8)
masked_load(double, 8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
gen_gather(8, i8)
gen_gather(8, i16)
gen_gather(8, i32)
gen_gather(8, i64)
gen_gather_factored(i8)
gen_gather_factored(i16)
gen_gather_factored(i32)
gen_gather_factored(float)
gen_gather_factored(i64)
gen_gather_factored(double)
gen_scatter(8, i8)
gen_scatter(8, i16)
gen_scatter(8, i32)
gen_scatter(8, i64)
gen_scatter(i8)
gen_scatter(i16)
gen_scatter(i32)
gen_scatter(float)
gen_scatter(i64)
gen_scatter(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; float rounding
@@ -435,25 +453,25 @@ define float @__reduce_add_float(<8 x float>) nounwind readonly alwaysinline {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
gen_masked_store(8, i8, 8)
gen_masked_store(8, i16, 16)
gen_masked_store(8, i32, 32)
gen_masked_store(8, i64, 64)
gen_masked_store(i8)
gen_masked_store(i16)
gen_masked_store(i32)
gen_masked_store(i64)
masked_store_blend_8_16_by_8()
declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>,
<4 x float>) nounwind readnone
define void @__masked_store_blend_32(<8 x i32>* nocapture, <8 x i32>,
<8 x i32> %mask) nounwind alwaysinline {
define void @__masked_store_blend_i32(<8 x i32>* nocapture, <8 x i32>,
<8 x i32> %mask) nounwind alwaysinline {
; do two 4-wide blends with blendvps
%mask_as_float = bitcast <8 x i32> %mask to <8 x float>
%mask_a = shufflevector <8 x float> %mask_as_float, <8 x float> undef,
<4 x i32> <i32 0, i32 1, i32 2, i32 3>
%mask_b = shufflevector <8 x float> %mask_as_float, <8 x float> undef,
<4 x i32> <i32 4, i32 5, i32 6, i32 7>
%oldValue = load <8 x i32>* %0, align 4
%oldValue = load PTR_OP_ARGS(`<8 x i32>') %0, align 4
%oldAsFloat = bitcast <8 x i32> %oldValue to <8 x float>
%newAsFloat = bitcast <8 x i32> %1 to <8 x float>
%old_a = shufflevector <8 x float> %oldAsFloat, <8 x float> undef,
@@ -475,14 +493,14 @@ define void @__masked_store_blend_32(<8 x i32>* nocapture, <8 x i32>,
ret void
}
define void @__masked_store_blend_64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
<8 x i32> %mask) nounwind alwaysinline {
define void @__masked_store_blend_i64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
<8 x i32> %mask) nounwind alwaysinline {
; implement this as 4 blends of <4 x i32>s, which are actually bitcast
; <2 x i64>s...
%mask_as_float = bitcast <8 x i32> %mask to <8 x float>
%old = load <8 x i64>* %ptr, align 8
%old = load PTR_OP_ARGS(`<8 x i64>') %ptr, align 8
; set up the first two 64-bit values
%old01 = shufflevector <8 x i64> %old, <8 x i64> undef, <2 x i32> <i32 0, i32 1>
@@ -542,6 +560,7 @@ define void @__masked_store_blend_64(<8 x i64>* nocapture %ptr, <8 x i64> %new,
ret void
}
masked_store_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision sqrt
@@ -568,3 +587,17 @@ define <8 x double> @__max_varying_double(<8 x double>, <8 x double>) nounwind r
binary2to8(ret, double, @llvm.x86.sse2.max.pd, %0, %1)
ret <8 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16 builtins
define_avgs()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()


@@ -1,4 +1,4 @@
;; Copyright (c) 2010-2011, Intel Corporation
;; Copyright (c) 2010-2015, Intel Corporation
;; All rights reserved.
;;
;; Redistribution and use in source and binary forms, with or without
@@ -41,19 +41,28 @@ stdlib_core()
packed_load_and_store()
scans()
int64minmax()
saturation_arithmetic()
include(`target-sse4-common.ll')
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; half conversion routines
declare float @__half_to_float_uniform(i16 %v) nounwind readnone
declare <WIDTH x float> @__half_to_float_varying(<WIDTH x i16> %v) nounwind readnone
declare i16 @__float_to_half_uniform(float %v) nounwind readnone
declare <WIDTH x i16> @__float_to_half_varying(<WIDTH x float> %v) nounwind readnone
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; rcp
declare <4 x float> @llvm.x86.sse.rcp.ps(<4 x float>) nounwind readnone
define <4 x float> @__rcp_varying_float(<4 x float>) nounwind readonly alwaysinline {
%call = call <4 x float> @llvm.x86.sse.rcp.ps(<4 x float> %0)
; do one N-R iteration to improve precision
; float iv = __rcp_v(v);
; return iv * (2. - v * iv);
%call = call <4 x float> @llvm.x86.sse.rcp.ps(<4 x float> %0)
%v_iv = fmul <4 x float> %0, %call
%two_minus = fsub <4 x float> <float 2., float 2., float 2., float 2.>, %v_iv
%iv_mul = fmul <4 x float> %call, %two_minus
@@ -79,7 +88,7 @@ define <4 x float> @__rsqrt_varying_float(<4 x float> %v) nounwind readonly alwa
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; sqrt
;; sqrt
declare <4 x float> @llvm.x86.sse.sqrt.ps(<4 x float>) nounwind readnone
@@ -146,16 +155,34 @@ define <4 x double> @__ceil_varying_double(<4 x double>) nounwind readonly alway
declare <4 x float> @llvm.x86.sse.max.ps(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @llvm.x86.sse.min.ps(<4 x float>, <4 x float>) nounwind readnone
define <4 x float> @__max_varying_float(<4 x float>, <4 x float>) nounwind readonly alwaysinline {
define <4 x float> @__max_varying_float(<4 x float>,
<4 x float>) nounwind readonly alwaysinline {
%call = call <4 x float> @llvm.x86.sse.max.ps(<4 x float> %0, <4 x float> %1)
ret <4 x float> %call
}
define <4 x float> @__min_varying_float(<4 x float>, <4 x float>) nounwind readonly alwaysinline {
define <4 x float> @__min_varying_float(<4 x float>,
<4 x float>) nounwind readonly alwaysinline {
%call = call <4 x float> @llvm.x86.sse.min.ps(<4 x float> %0, <4 x float> %1)
ret <4 x float> %call
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
declare <2 x double> @llvm.x86.sse2.max.pd(<2 x double>, <2 x double>) nounwind readnone
declare <2 x double> @llvm.x86.sse2.min.pd(<2 x double>, <2 x double>) nounwind readnone
define <4 x double> @__min_varying_double(<4 x double>, <4 x double>) nounwind readnone {
binary2to4(ret, double, @llvm.x86.sse2.min.pd, %0, %1)
ret <4 x double> %ret
}
define <4 x double> @__max_varying_double(<4 x double>, <4 x double>) nounwind readnone {
binary2to4(ret, double, @llvm.x86.sse2.max.pd, %0, %1)
ret <4 x double> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int32 min/max
@@ -183,92 +210,56 @@ define <4 x i32> @__max_varying_uint32(<4 x i32>, <4 x i32>) nounwind readonly a
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; double precision min/max
;; svml stuff
declare <2 x double> @llvm.x86.sse2.max.pd(<2 x double>, <2 x double>) nounwind readnone
declare <2 x double> @llvm.x86.sse2.min.pd(<2 x double>, <2 x double>) nounwind readnone
include(`svml.m4')
;; single precision
svml_declare(float,f4,4)
svml_define(float,f4,4,f)
define <4 x double> @__min_varying_double(<4 x double>, <4 x double>) nounwind readnone {
binary2to4(ret, double, @llvm.x86.sse2.min.pd, %0, %1)
ret <4 x double> %ret
}
define <4 x double> @__max_varying_double(<4 x double>, <4 x double>) nounwind readnone {
binary2to4(ret, double, @llvm.x86.sse2.max.pd, %0, %1)
ret <4 x double> %ret
}
;; double precision
svml_declare(double,2,2)
svml_define_x(double,2,2,d,4)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; svml stuff
declare <4 x float> @__svml_sinf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_cosf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_sincosf4(<4 x float> *, <4 x float>) nounwind readnone
declare <4 x float> @__svml_tanf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_atanf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_atan2f4(<4 x float>, <4 x float>) nounwind readnone
declare <4 x float> @__svml_expf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_logf4(<4 x float>) nounwind readnone
declare <4 x float> @__svml_powf4(<4 x float>, <4 x float>) nounwind readnone
define <4 x float> @__svml_sin(<4 x float>) nounwind readnone alwaysinline {
%ret = call <4 x float> @__svml_sinf4(<4 x float> %0)
ret <4 x float> %ret
}
define <4 x float> @__svml_cos(<4 x float>) nounwind readnone alwaysinline {
%ret = call <4 x float> @__svml_cosf4(<4 x float> %0)
ret <4 x float> %ret
}
define void @__svml_sincos(<4 x float>, <4 x float> *, <4 x float> *) nounwind readnone alwaysinline {
%s = call <4 x float> @__svml_sincosf4(<4 x float> * %2, <4 x float> %0)
store <4 x float> %s, <4 x float> * %1
ret void
}
define <4 x float> @__svml_tan(<4 x float>) nounwind readnone alwaysinline {
%ret = call <4 x float> @__svml_tanf4(<4 x float> %0)
ret <4 x float> %ret
}
define <4 x float> @__svml_atan(<4 x float>) nounwind readnone alwaysinline {
%ret = call <4 x float> @__svml_atanf4(<4 x float> %0)
ret <4 x float> %ret
}
define <4 x float> @__svml_atan2(<4 x float>, <4 x float>) nounwind readnone alwaysinline {
%ret = call <4 x float> @__svml_atan2f4(<4 x float> %0, <4 x float> %1)
ret <4 x float> %ret
}
define <4 x float> @__svml_exp(<4 x float>) nounwind readnone alwaysinline {
%ret = call <4 x float> @__svml_expf4(<4 x float> %0)
ret <4 x float> %ret
}
define <4 x float> @__svml_log(<4 x float>) nounwind readnone alwaysinline {
%ret = call <4 x float> @__svml_logf4(<4 x float> %0)
ret <4 x float> %ret
}
define <4 x float> @__svml_pow(<4 x float>, <4 x float>) nounwind readnone alwaysinline {
%ret = call <4 x float> @__svml_powf4(<4 x float> %0, <4 x float> %1)
ret <4 x float> %ret
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; horizontal ops / reductions
;; mask handling
declare i32 @llvm.x86.sse.movmsk.ps(<4 x float>) nounwind readnone
define i32 @__movmsk(<4 x i32>) nounwind readnone alwaysinline {
define i64 @__movmsk(<4 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i32> %0 to <4 x float>
%v = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %floatmask) nounwind readnone
ret i32 %v
%v64 = zext i32 %v to i64
ret i64 %v64
}
define i1 @__any(<4 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i32> %0 to <4 x float>
%v = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %floatmask) nounwind readnone
%cmp = icmp ne i32 %v, 0
ret i1 %cmp
}
define i1 @__all(<4 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i32> %0 to <4 x float>
%v = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %floatmask) nounwind readnone
%cmp = icmp eq i32 %v, 15
ret i1 %cmp
}
define i1 @__none(<4 x i32>) nounwind readnone alwaysinline {
%floatmask = bitcast <4 x i32> %0 to <4 x float>
%v = call i32 @llvm.x86.sse.movmsk.ps(<4 x float> %floatmask) nounwind readnone
%cmp = icmp eq i32 %v, 0
ret i1 %cmp
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal ops / reductions
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal float ops
declare <4 x float> @llvm.x86.sse3.hadd.ps(<4 x float>, <4 x float>) nounwind readnone
define float @__reduce_add_float(<4 x float>) nounwind readonly alwaysinline {
@@ -278,47 +269,18 @@ define float @__reduce_add_float(<4 x float>) nounwind readonly alwaysinline {
ret float %scalar
}
define float @__reduce_min_float(<4 x float>) nounwind readnone {
define float @__reduce_min_float(<4 x float>) nounwind readnone alwaysinline {
reduce4(float, @__min_varying_float, @__min_uniform_float)
}
define float @__reduce_max_float(<4 x float>) nounwind readnone {
define float @__reduce_max_float(<4 x float>) nounwind readnone alwaysinline {
reduce4(float, @__max_varying_float, @__max_uniform_float)
}
define i32 @__reduce_add_int32(<4 x i32> %v) nounwind readnone {
%v1 = shufflevector <4 x i32> %v, <4 x i32> undef,
<4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
%m1 = add <4 x i32> %v1, %v
%m1a = extractelement <4 x i32> %m1, i32 0
%m1b = extractelement <4 x i32> %m1, i32 1
%sum = add i32 %m1a, %m1b
ret i32 %sum
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal double ops
define i32 @__reduce_min_int32(<4 x i32>) nounwind readnone {
reduce4(i32, @__min_varying_int32, @__min_uniform_int32)
}
define i32 @__reduce_max_int32(<4 x i32>) nounwind readnone {
reduce4(i32, @__max_varying_int32, @__max_uniform_int32)
}
define i32 @__reduce_add_uint32(<4 x i32> %v) nounwind readnone {
%r = call i32 @__reduce_add_int32(<4 x i32> %v)
ret i32 %r
}
define i32 @__reduce_min_uint32(<4 x i32>) nounwind readnone {
reduce4(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
define i32 @__reduce_max_uint32(<4 x i32>) nounwind readnone {
reduce4(i32, @__max_varying_uint32, @__max_uniform_uint32)
}
define double @__reduce_add_double(<4 x double>) nounwind readnone {
define double @__reduce_add_double(<4 x double>) nounwind readnone alwaysinline {
%v0 = shufflevector <4 x double> %0, <4 x double> undef,
<2 x i32> <i32 0, i32 1>
%v1 = shufflevector <4 x double> %0, <4 x double> undef,
@@ -330,15 +292,85 @@ define double @__reduce_add_double(<4 x double>) nounwind readnone {
ret double %m
}
define double @__reduce_min_double(<4 x double>) nounwind readnone {
define double @__reduce_min_double(<4 x double>) nounwind readnone alwaysinline {
reduce4(double, @__min_varying_double, @__min_uniform_double)
}
define double @__reduce_max_double(<4 x double>) nounwind readnone {
define double @__reduce_max_double(<4 x double>) nounwind readnone alwaysinline {
reduce4(double, @__max_varying_double, @__max_uniform_double)
}
define i64 @__reduce_add_int64(<4 x i64>) nounwind readnone {
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int8 ops
declare <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8>, <16 x i8>) nounwind readnone
define i16 @__reduce_add_int8(<4 x i8>) nounwind readnone alwaysinline {
%wide8 = shufflevector <4 x i8> %0, <4 x i8> zeroinitializer,
<16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 4, i32 4, i32 4,
i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
%rv = call <2 x i64> @llvm.x86.sse2.psad.bw(<16 x i8> %wide8,
<16 x i8> zeroinitializer)
%r0 = extractelement <2 x i64> %rv, i32 0
%r1 = extractelement <2 x i64> %rv, i32 1
%r = add i64 %r0, %r1
%r16 = trunc i64 %r to i16
ret i16 %r16
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int16 ops
define internal <4 x i16> @__add_varying_i16(<4 x i16>,
<4 x i16>) nounwind readnone alwaysinline {
%r = add <4 x i16> %0, %1
ret <4 x i16> %r
}
define internal i16 @__add_uniform_i16(i16, i16) nounwind readnone alwaysinline {
%r = add i16 %0, %1
ret i16 %r
}
define i16 @__reduce_add_int16(<4 x i16>) nounwind readnone alwaysinline {
reduce4(i16, @__add_varying_i16, @__add_uniform_i16)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int32 ops
;; reduction functions
define i32 @__reduce_add_int32(<4 x i32> %v) nounwind readnone alwaysinline {
%v1 = shufflevector <4 x i32> %v, <4 x i32> undef,
<4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
%m1 = add <4 x i32> %v1, %v
%m1a = extractelement <4 x i32> %m1, i32 0
%m1b = extractelement <4 x i32> %m1, i32 1
%sum = add i32 %m1a, %m1b
ret i32 %sum
}
define i32 @__reduce_min_int32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__min_varying_int32, @__min_uniform_int32)
}
define i32 @__reduce_max_int32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__max_varying_int32, @__max_uniform_int32)
}
define i32 @__reduce_min_uint32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__min_varying_uint32, @__min_uniform_uint32)
}
define i32 @__reduce_max_uint32(<4 x i32>) nounwind readnone alwaysinline {
reduce4(i32, @__max_varying_uint32, @__max_uniform_uint32)
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; horizontal int64 ops
;; reduction functions
define i64 @__reduce_add_int64(<4 x i64>) nounwind readnone alwaysinline {
%v0 = shufflevector <4 x i64> %0, <4 x i64> undef,
<2 x i32> <i32 0, i32 1>
%v1 = shufflevector <4 x i64> %0, <4 x i64> undef,
@@ -350,35 +382,58 @@ define i64 @__reduce_add_int64(<4 x i64>) nounwind readnone {
ret i64 %m
}
define i64 @__reduce_min_int64(<4 x i64>) nounwind readnone {
define i64 @__reduce_min_int64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__min_varying_int64, @__min_uniform_int64)
}
define i64 @__reduce_max_int64(<4 x i64>) nounwind readnone {
define i64 @__reduce_max_int64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__max_varying_int64, @__max_uniform_int64)
}
define i64 @__reduce_min_uint64(<4 x i64>) nounwind readnone {
define i64 @__reduce_min_uint64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__min_varying_uint64, @__min_uniform_uint64)
}
define i64 @__reduce_max_uint64(<4 x i64>) nounwind readnone {
define i64 @__reduce_max_uint64(<4 x i64>) nounwind readnone alwaysinline {
reduce4(i64, @__max_varying_uint64, @__max_uniform_uint64)
}
reduce_equal(4)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
masked_load(i8, 1)
masked_load(i16, 2)
masked_load(i32, 4)
masked_load(float, 4)
masked_load(i64, 8)
masked_load(double, 8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
gen_masked_store(i8)
gen_masked_store(i16)
gen_masked_store(i32)
gen_masked_store(i64)
masked_store_float_double()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store blend
masked_store_blend_8_16_by_4()
declare <4 x float> @llvm.x86.sse41.blendvps(<4 x float>, <4 x float>,
<4 x float>) nounwind readnone
define void @__masked_store_blend_32(<4 x i32>* nocapture, <4 x i32>,
<4 x i32> %mask) nounwind alwaysinline {
define void @__masked_store_blend_i32(<4 x i32>* nocapture, <4 x i32>,
<4 x i32> %mask) nounwind alwaysinline {
%mask_as_float = bitcast <4 x i32> %mask to <4 x float>
%oldValue = load <4 x i32>* %0, align 4
%oldValue = load PTR_OP_ARGS(`<4 x i32>') %0, align 4
%oldAsFloat = bitcast <4 x i32> %oldValue to <4 x float>
%newAsFloat = bitcast <4 x i32> %1 to <4 x float>
%blend = call <4 x float> @llvm.x86.sse41.blendvps(<4 x float> %oldAsFloat,
@@ -390,9 +445,9 @@ define void @__masked_store_blend_32(<4 x i32>* nocapture, <4 x i32>,
}
define void @__masked_store_blend_64(<4 x i64>* nocapture %ptr, <4 x i64> %new,
<4 x i32> %i32mask) nounwind alwaysinline {
%oldValue = load <4 x i64>* %ptr, align 8
define void @__masked_store_blend_i64(<4 x i64>* nocapture %ptr, <4 x i64> %new,
<4 x i32> %i32mask) nounwind alwaysinline {
%oldValue = load PTR_OP_ARGS(`<4 x i64>') %ptr, align 8
%mask = bitcast <4 x i32> %i32mask to <4 x float>
; Do 4x64-bit blends by doing two <4 x i32> blends, where the <4 x i32> values
@@ -437,40 +492,35 @@ define void @__masked_store_blend_64(<4 x i64>* nocapture %ptr, <4 x i64> %new,
ret void
}
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; masked store
masked_store_blend_8_16_by_4()
gen_masked_store(4, i8, 8)
gen_masked_store(4, i16, 16)
gen_masked_store(4, i32, 32)
gen_masked_store(4, i64, 64)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; unaligned loads/loads+broadcasts
load_and_broadcast(4, i8, 8)
load_and_broadcast(4, i16, 16)
load_and_broadcast(4, i32, 32)
load_and_broadcast(4, i64, 64)
masked_load(4, i8, 8, 1)
masked_load(4, i16, 16, 2)
masked_load(4, i32, 32, 4)
masked_load(4, i64, 64, 8)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; gather/scatter
; define these with the macros from stdlib.m4
gen_gather(4, i8)
gen_gather(4, i16)
gen_gather(4, i32)
gen_gather(4, i64)
gen_gather_factored(i8)
gen_gather_factored(i16)
gen_gather_factored(i32)
gen_gather_factored(float)
gen_gather_factored(i64)
gen_gather_factored(double)
gen_scatter(4, i8)
gen_scatter(4, i16)
gen_scatter(4, i32)
gen_scatter(4, i64)
gen_scatter(i8)
gen_scatter(i16)
gen_scatter(i32)
gen_scatter(float)
gen_scatter(i64)
gen_scatter(double)
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; int8/int16 builtins
define_avgs()
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; reciprocals in double precision, if supported
rsqrtd_decl()
rcpd_decl()
transcendetals_decl()
trigonometry_decl()

builtins/util-nvptx.m4 Normal file
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large

check_env.py Executable file

@@ -0,0 +1,102 @@
#!/usr/bin/python
#
# Copyright (c) 2013, Intel Corporation
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# // Author: Filippov Ilia
import common
import sys
import os
import string
print_debug = common.print_debug
error = common.error
take_lines = common.take_lines
exists = [False, False, False, False, False, False, False, False]
names = ["m4", "bison", "flex", "sde", "ispc", "clang", "gcc", "icc"]
PATH_dir = string.split(os.getenv("PATH"), os.pathsep)
for counter in PATH_dir:
for i in range(0,8):
if os.path.exists(counter + os.sep + names[i]):
exists[i] = True
print_debug("=== in PATH: ===\n", False, "")
print_debug("Tools:\n", False, "")
for i in range(0,3):
if exists[i]:
print_debug(take_lines(names[i] + " --version", "first"), False, "")
else:
error("you don't have " + names[i], 0)
if exists[0] and exists[1] and exists[2]:
if common.check_tools(2):
print_debug("Tools' versions are ok\n", False, "")
print_debug("\nSDE:\n", False, "")
if exists[3]:
print_debug(take_lines(names[3] + " --version", "first"), False, "")
else:
error("you don't have " + names[3], 2)
print_debug("\nISPC:\n", False, "")
if exists[4]:
print_debug(take_lines(names[4] + " --version", "first"), False, "")
else:
error("you don't have " + names[4], 2)
print_debug("\nC/C++ compilers:\n", False, "")
for i in range(5,8):
if exists[i]:
print_debug(take_lines(names[i] + " --version", "first"), False, "")
else:
error("you don't have " + names[i], 2)
print_debug("\n=== in ISPC specific environment variables: ===\n", False, "")
if os.environ.get("LLVM_HOME") == None:
error("you have no LLVM_HOME", 2)
else:
print_debug("Your LLVM_HOME:" + os.environ.get("LLVM_HOME") + "\n", False, "")
if os.environ.get("ISPC_HOME") == None:
error("you have no ISPC_HOME", 2)
else:
print_debug("Your ISPC_HOME:" + os.environ.get("ISPC_HOME") + "\n", False, "")
if os.path.exists(os.environ.get("ISPC_HOME") + os.sep + "ispc"):
print_debug("You have ISPC in your ISPC_HOME: " +
take_lines(os.environ.get("ISPC_HOME") + os.sep + "ispc" + " --version", "first"), False, "")
else:
error("you don't have ISPC in your ISPC_HOME", 2)
if os.environ.get("SDE_HOME") == None:
error("You have no SDE_HOME", 2)
else:
print_debug("Your SDE_HOME:" + os.environ.get("SDE_HOME") + "\n", False, "")
if os.path.exists(os.environ.get("SDE_HOME") + os.sep + "sde"):
print_debug("You have sde in your SDE_HOME: " +
take_lines(os.environ.get("SDE_HOME") + os.sep + "sde" + " --version", "first"), False, "")
else:
error("you don't have any SDE in your ISPC_HOME", 2)

check_isa.cpp Normal file

@@ -0,0 +1,170 @@
/*
Copyright (c) 2013-2015, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
///////////////////////////////////////////////////////////////////////////////
// //
// This file is a standalone program, which detects the best supported ISA. //
// //
///////////////////////////////////////////////////////////////////////////////
#include <stdio.h>
#if defined(_WIN32) || defined(_WIN64)
#define ISPC_IS_WINDOWS
#include <intrin.h>
#endif
#if !defined (__arm__)
#if !defined(ISPC_IS_WINDOWS)
static void __cpuid(int info[4], int infoType) {
__asm__ __volatile__ ("cpuid"
: "=a" (info[0]), "=b" (info[1]), "=c" (info[2]), "=d" (info[3])
: "0" (infoType));
}
/* Save %ebx in case it's the PIC register */
static void __cpuidex(int info[4], int level, int count) {
__asm__ __volatile__ ("xchg{l}\t{%%}ebx, %1\n\t"
"cpuid\n\t"
"xchg{l}\t{%%}ebx, %1\n\t"
: "=a" (info[0]), "=r" (info[1]), "=c" (info[2]), "=d" (info[3])
: "0" (level), "2" (count));
}
#endif // !ISPC_IS_WINDOWS
static bool __os_has_avx_support() {
#if defined(ISPC_IS_WINDOWS)
// Check if the OS will save the YMM registers
unsigned long long xcrFeatureMask = _xgetbv(_XCR_XFEATURE_ENABLED_MASK);
return (xcrFeatureMask & 6) == 6;
#else // !defined(ISPC_IS_WINDOWS)
// Check xgetbv; this uses a .byte sequence instead of the instruction
// directly because older assemblers do not include support for xgetbv and
// there is no easy way to conditionally compile based on the assembler used.
int rEAX, rEDX;
__asm__ __volatile__ (".byte 0x0f, 0x01, 0xd0" : "=a" (rEAX), "=d" (rEDX) : "c" (0));
return (rEAX & 6) == 6;
#endif // !defined(ISPC_IS_WINDOWS)
}
static bool __os_has_avx512_support() {
#if defined(ISPC_IS_WINDOWS)
// Check if the OS saves the XMM, YMM and ZMM registers, i.e. it supports AVX2 and AVX512.
// See section 2.1 of software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
unsigned long long xcrFeatureMask = _xgetbv(_XCR_XFEATURE_ENABLED_MASK);
return (xcrFeatureMask & 0xE6) == 0xE6;
#else // !defined(ISPC_IS_WINDOWS)
// Check xgetbv; this uses a .byte sequence instead of the instruction
// directly because older assemblers do not include support for xgetbv and
// there is no easy way to conditionally compile based on the assembler used.
int rEAX, rEDX;
__asm__ __volatile__ (".byte 0x0f, 0x01, 0xd0" : "=a" (rEAX), "=d" (rEDX) : "c" (0));
return (rEAX & 0xE6) == 0xE6;
#endif // !defined(ISPC_IS_WINDOWS)
}
#endif // !__arm__
static const char *
lGetSystemISA() {
#ifdef __arm__
return "ARM NEON";
#else
int info[4];
__cpuid(info, 1);
int info2[4];
// Call cpuid with eax=7, ecx=0
__cpuidex(info2, 7, 0);
if ((info[2] & (1 << 27)) != 0 && // OSXSAVE
(info2[1] & (1 << 5)) != 0 && // AVX2
(info2[1] & (1 << 16)) != 0 && // AVX512 F
__os_has_avx512_support()) {
// We need to verify that AVX2 is also available,
// as well as AVX512, because our targets are supposed
// to use both.
if ((info2[1] & (1 << 17)) != 0 && // AVX512 DQ
(info2[1] & (1 << 28)) != 0 && // AVX512 CDI
(info2[1] & (1 << 30)) != 0 && // AVX512 BW
(info2[1] & (1 << 31)) != 0) { // AVX512 VL
return "SKX";
}
else if ((info2[1] & (1 << 26)) != 0 && // AVX512 PF
(info2[1] & (1 << 27)) != 0 && // AVX512 ER
(info2[1] & (1 << 28)) != 0) { // AVX512 CDI
return "KNL";
}
// If it's unknown AVX512 target, fall through and use AVX2
// or whatever is available in the machine.
}
if ((info[2] & (1 << 27)) != 0 && // OSXSAVE
(info[2] & (1 << 28)) != 0 &&
__os_has_avx_support()) { // AVX
// AVX1 for sure....
// Ivy Bridge?
if ((info[2] & (1 << 29)) != 0 && // F16C
(info[2] & (1 << 30)) != 0) { // RDRAND
// So far, so good. AVX2?
if ((info2[1] & (1 << 5)) != 0) {
return "AVX2 (codename Haswell)";
}
else {
return "AVX1.1 (codename Ivy Bridge)";
}
}
// Regular AVX
return "AVX (codename Sandy Bridge)";
}
else if ((info[2] & (1 << 19)) != 0) {
return "SSE4";
}
else if ((info[3] & (1 << 26)) != 0) {
return "SSE2";
}
else {
return "Error";
}
#endif
}
int main () {
const char* isa = lGetSystemISA();
printf("ISA: %s\n", isa);
return 0;
}
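For illustration, a hypothetical Python mirror of the AVX portion of lGetSystemISA()'s decision ladder (the register values passed in are made up; bit positions follow the comments in the code):

OSXSAVE, AVX, F16C, RDRAND = 1 << 27, 1 << 28, 1 << 29, 1 << 30
AVX2 = 1 << 5   # EBX of CPUID leaf 7

def classify(ecx1, ebx7, os_saves_ymm):
    if ecx1 & OSXSAVE and ecx1 & AVX and os_saves_ymm:
        if ecx1 & F16C and ecx1 & RDRAND:
            return "AVX2" if ebx7 & AVX2 else "AVX1.1"
        return "AVX"
    return "pre-AVX"

assert classify(OSXSAVE | AVX | F16C | RDRAND, AVX2, True) == "AVX2"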

common.py Executable file

@@ -0,0 +1,502 @@
#!/usr/bin/python
#
# Copyright (c) 2013, Intel Corporation
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are
# met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# * Neither the name of Intel Corporation nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
# IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
# TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
# SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# // Author: Filippov Ilia, Anton Mitrokhin, Vsevolod Livinskiy
import sys
import os
import errno
import shutil
# generic empty class
class EmptyClass(object): pass
# load/save almost any object to a file (useful for reproducing bugs)
def dump(fname, obj):
import pickle
with open(fname, 'w') as fp:
pickle.dump(obj, fp)
def undump(fname):
import pickle
with open(fname, 'r') as fp:
obj = pickle.load(fp)
return obj
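# Hypothetical round-trip with the two helpers above (the file name is made up):
dump("snapshot.pkl", {"rev": "r100"})
assert undump("snapshot.pkl") == {"rev": "r100"}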
# retrieve the host name
def get_host_name():
import socket
return socket.gethostname()
def write_to_file(filename, line):
f = open(filename, 'a')
f.writelines(line)
f.close()
# remove file if it exists
def remove_if_exists(filename):
if os.path.exists(filename):
if os.path.isdir(filename):
shutil.rmtree(filename)
else:
os.remove(filename)
def make_sure_dir_exists(path):
try:
os.makedirs(path)
except OSError as exception:
if exception.errno != errno.EEXIST:
raise
# capture the output a command prints (used to detect tool versions)
def take_lines(command, which):
os.system(command + " > " + "temp_detect_version")
version = open("temp_detect_version")
if which == "first":
answer = version.readline()
if which == "all":
answer = version.readlines()
version.close()
remove_if_exists("temp_detect_version")
return answer
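# Hypothetical usage (assumes m4 is on PATH; not part of the original file):
first_line = take_lines("m4 --version", "first")   # first line of the tool's output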
# print versions of compilers
def print_version(ispc_test, ispc_ref, ref_compiler, s, perf_log, is_windows):
print_debug("\nUsing test compiler: " + take_lines(ispc_test + " --version", "first"), s, perf_log)
if ispc_ref != "":
print_debug("Using ref compiler: " + take_lines(ispc_ref + " --version", "first"), s, perf_log)
if is_windows == False:
temp1 = take_lines(ref_compiler + " --version", "first")
else:
os.system(ref_compiler + " 2>&1" + " 2> temp_detect_version > temp_detect_version1" )
version = open("temp_detect_version")
temp1 = version.readline()
version.close()
remove_if_exists("temp_detect_version")
remove_if_exists("temp_detect_version1")
print_debug("Using C/C++ compiler: " + temp1 + "\n", s, perf_log)
# print everything from the scripts other than errors
def print_debug(line, silent, filename):
if silent == False:
sys.stdout.write(line)
sys.stdout.flush()
if os.environ.get("ISPC_HOME") != None:
if os.path.exists(os.environ.get("ISPC_HOME")):
write_to_file(os.environ["ISPC_HOME"] + os.sep + "notify_log.log", line)
if filename != "":
write_to_file(filename, line)
# print errors from scripts
# type 1 for error in environment
# type 2 for warning
# type 0 for an error of the compiler or a test which isn't the goal of the script
def error(line, error_type):
line = line + "\n"
if error_type == 1:
sys.stderr.write("Fatal error: " + line)
sys.exit(1)
if error_type == 2:
sys.stderr.write("Warning: " + line)
if error_type == 0:
print_debug("FIND ERROR: " + line, False, "")
def check_tools(m):
input_tools=[[[1,4],"m4 --version", "bad m4 version"],
[[2,4],"bison --version", "bad bison version"],
[[2,5], "flex --version", "bad flex version"]]
ret = 1
for t in range(0,len(input_tools)):
t1 = ((take_lines(input_tools[t][1], "first"))[:-1].split(" "))
for i in range(0,len(t1)):
t11 = t1[i].split(".")
f = True
for j in range(0,len(t11)):
if not t11[j].isdigit():
f = False
if f == True:
for j in range(0,len(t11)):
if j < len(input_tools[t][0]):
if int(t11[j])<input_tools[t][0][j]:
error(input_tools[t][2], m)
ret = 0
break
if int(t11[j])>input_tools[t][0][j]:
break
return ret
# regression testing functionality
class TestResult(object):
"""
this class stores basically two integers which stand for the result
of the test: (runfail{0/1}, compfail{0/1}). other values are
deemed invalid. the __cmp__ function of this class is used to
define what a test regression actually is.
"""
def __init__(self, runfailed, compfailed):
self.runfailed, self.compfailed = (runfailed, compfailed)
def __cmp__(self, other):
if isinstance(other, TestResult):
if self.runfailed == other.runfailed and \
self.compfailed == other.compfailed:
return 0
elif self.compfailed > other.compfailed:
return 1
elif self.runfailed > other.runfailed and \
self.compfailed == other.compfailed:
return 1
else:
return -1
raise RuntimeError("Wrong type for comparioson")
return NotImplemented
def __repr__(self):
if (self.runfailed < 0 or self.compfailed < 0):
return "(Undefined)"
return "(r%d c%d)" % (self.runfailed, self.compfailed)
class TestCase(object):
"""
the TestCase() is a combination of parameters the test was run with:
the architecture (x86, x86-64 ...), compiler optimization (-O0, -O2 ...)
and target (sse, avx ...). we also store the result of the test here.
"""
def __init__(self, arch, opt, target):
self.arch, self.opt, self.target = (arch, opt, target)
self.result = TestResult(-1, -1)
def __repr__(self):
string = "%s %s %s: " % (self.arch, self.opt, self.target)
string = string + repr(self.result)
return string
def __hash__(self):
return hash(self.arch + self.opt + self.target)
def __ne__(self, other):
if isinstance(other, TestCase):
if hash(self.arch + self.opt + self.target) != hash(other):
return True
return False
raise RuntimeError("Wrong type for comparioson")
return NotImplemented
def __eq__(self, other):
if isinstance(other, TestCase):
return not self.__ne__(other)
raise RuntimeError("Wrong type for comparioson")
return NotImplemented
class Test(object):
"""
Test() stores all TestCase() objects for a given test file name
i.e. all archs/opts/targets/ and corresponding testing results.
"""
def __init__(self, name):
self.name = name
self.test_cases = []
def add_result(self, test_case):
if test_case in self.test_cases:
raise RuntimeError("This test case is already in the list: " + repr(test_case))
return
self.test_cases.append(test_case)
def __repr__(self):
string = self.name + '\n'
string = string.rjust(20)
for test_case in self.test_cases:
string += repr(test_case).rjust(60) + '\n'
return string
def __hash__(self):
return hash(self.name)
def __ne__(self, other):
if isinstance(other, Test):
if hash(self) != hash(other):
return True
return False
return NotImplemented
def __eq__(self, other):
if isinstance(other, Test):
return not self.__ne__(other)
return NotImplemented
class RegressionInfo(object):
    """
    Service class which provides some statistics on a given regression.
    The regressed test names and cases are given in the form of Test()
    objects with empty (-1, -1) results.
    """
    def __init__(self, revision_old, revision_new, tests):
        self.revision_old, self.revision_new = (revision_old, revision_new)
        self.tests = tests
        self.archfailes = {}
        self.optfails = {}
        self.targetfails = {}
        self.testfails = {}
        self.archs = []
        self.opts = []
        self.targets = []
        for test in tests:
            for test_case in test.test_cases:
                self.inc_dictionary(self.testfails, test.name)
                self.inc_dictionary(self.archfailes, test_case.arch)
                self.inc_dictionary(self.optfails, test_case.opt)
                self.inc_dictionary(self.targetfails, test_case.target)
        self.archs = self.archfailes.keys()
        self.opts = self.optfails.keys()
        self.targets = self.targetfails.keys()

    def inc_dictionary(self, dictionary, key):
        if key not in dictionary:
            dictionary[key] = 0
        dictionary[key] += 1

    def __repr__(self):
        string = "Regression of LLVM revision %s in comparison to %s\n" % (self.revision_new, self.revision_old)
        string += repr(self.tests) + '\n'
        string += str(self.testfails) + '\n'
        string += str(self.archfailes) + '\n'
        string += str(self.optfails) + '\n'
        string += str(self.targetfails) + '\n'
        return string
class TestTable(object):
    """
    The table which stores a list of Test() objects per revision and has
    some convenience methods for dealing with them.
    """
    def __init__(self):
        """ This dictionary contains {rev: [test1, test2, ...], ...}, where 'rev' is a string
        (revision name) and 'test#' is a Test() object instance """
        self.table = {}

    def add_result(self, revision_name, test_name, arch, opt, target, runfailed, compfailed):
        revision_name = str(revision_name)
        if revision_name not in self.table:
            self.table[revision_name] = []
        test_case = TestCase(arch, opt, target)
        test_case.result = TestResult(runfailed, compfailed)
        for test in self.table[revision_name]:
            if test.name == test_name:
                test.add_result(test_case)
                return
        test = Test(test_name)
        test.add_result(test_case)
        self.table[revision_name].append(test)

    def test_intersection(self, test1, test2):
        """ Return the test cases common to test1 and test2. If the test names differ,
        then there is nothing in common """
        if test1.name != test2.name:
            return []
        return list(set(test1.test_cases) & set(test2.test_cases))

    def test_regression(self, test1, test2):
        """ Return a list of empty (i.e. with undefined results) TestCase() objects
        corresponding to regressions in test2 compared to test1 """
        if test1.name != test2.name:
            return []
        regressed = []
        for tc1 in test1.test_cases:
            for tc2 in test2.test_cases:
                # If the test cases are equal (same arch, opt and target) but
                # tc2 has more runfails or compfails, it is a regression.
                if tc1 == tc2 and tc1.result < tc2.result:
                    regressed.append(TestCase(tc1.arch, tc1.opt, tc1.target))
        return regressed

    def regression(self, revision_old, revision_new):
        """ Return a RegressionInfo() object wrapping the Test() objects whose
        TestCase() objects show a regression between the given revisions """
        revision_old, revision_new = (str(revision_old), str(revision_new))
        if revision_new not in self.table:
            raise RuntimeError("This revision is not in the database: " + str(revision_new) + " (" + str(self.table.keys()) + ")")
        if revision_old not in self.table:
            raise RuntimeError("This revision is not in the database: " + str(revision_old) + " (" + str(self.table.keys()) + ")")
        regressed = []
        for test_old in self.table[revision_old]:
            for test_new in self.table[revision_new]:
                tr = self.test_regression(test_old, test_new)
                if len(tr) == 0:
                    continue
                test = Test(test_new.name)
                for test_case in tr:
                    test.add_result(test_case)
                regressed.append(test)
        return RegressionInfo(revision_old, revision_new, regressed)

    def __repr__(self):
        string = ""
        for rev in self.table.keys():
            string += "[" + rev + "]:\n"
            for test in self.table[rev]:
                string += repr(test) + '\n'
        return string
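# Added sketch (made-up revision names and parameters, not part of the
# original script): how TestTable flags a regression between two revisions.
#   tt = TestTable()
#   tt.add_result("r1", "foo.ispc", "x86-64", "-O2", "avx2", 0, 0)
#   tt.add_result("r2", "foo.ispc", "x86-64", "-O2", "avx2", 0, 1)  # compfail appeared
#   print tt.regression("r1", "r2")  # reports foo.ispc as regressed in r2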
class RevisionInfo(object):
    """
    This class is intended to store some relevant information about the
    current LLVM revision.
    """
    def __init__(self, hostname, revision):
        self.hostname, self.revision = hostname, revision
        self.archs = []
        self.opts = []
        self.targets = []
        self.succeed = 0
        self.runfailed = 0
        self.compfailed = 0
        self.skipped = 0
        self.testall = 0
        self.regressions = {}

    def register_test(self, arch, opt, target, succeed, runfailed, compfailed, skipped):
        if arch not in self.archs:
            self.archs.append(arch)
        if opt not in self.opts:
            self.opts.append(opt)
        if target not in self.targets:
            self.targets.append(target)
        self.runfailed += runfailed
        self.compfailed += compfailed
        self.skipped += skipped
        self.succeed += succeed

    def add_regression(self, revision, regression_info):
        """ The input is intended to come from 'TestTable.regression(..)': 'regression_info'
        is a RegressionInfo() object (regression.py) and 'revision' is the tested
        (not current) LLVM revision name """
        if revision == self.revision:
            raise RuntimeError("No regression can be found along the same LLVM revision!")
        if revision in self.regressions:
            raise RuntimeError("This revision regression info is already in self.regressions!")
        self.regressions[revision] = regression_info

    def __repr__(self):
        string = "%s: LLVM(%s)\n" % (self.hostname, self.revision)
        string += "archs  : %s\n" % (str(self.archs))
        string += "opts   : %s\n" % (str(self.opts))
        string += "targets: %s\n" % (str(self.targets))
        string += "runfails:  %d/%d\n" % (self.runfailed, self.testall)
        string += "compfails: %d/%d\n" % (self.compfailed, self.testall)
        string += "skipped:   %d/%d\n" % (self.skipped, self.testall)
        string += "succeed:   %d/%d\n" % (self.succeed, self.testall)
        return string
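# Added sketch (made-up values): a RevisionInfo accumulates per-run counters
# for the revision being tested, and regression info against *other*
# revisions is attached by name:
#   ri = RevisionInfo("myhost", "r2")
#   ri.register_test("x86-64", "-O2", "avx2", 10, 1, 0, 2)  # 10 ok, 1 runfail, 0 compfails, 2 skipped
#   ri.add_regression("r1", tt.regression("r1", "r2"))      # tt as in the TestTable sketch above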
class ExecutionStateGatherer(object):
    def __init__(self):
        self.hostname = self.get_host_name()
        self.revision = ""
        self.rinf = []
        self.tt = TestTable()
        self.switch_revision("undefined")

    def switch_revision(self, revision):
        self.revision = revision
        self.rinf.append(RevisionInfo(self.hostname, self.revision))

    def current_rinf(self):
        if len(self.rinf) == 0:
            raise RuntimeError("self.rinf is empty. Apparently you've never invoked switch_revision")
        return self.rinf[len(self.rinf) - 1]

    def add_to_tt(self, test_name, arch, opt, target, runfailed, compfailed):
        if len(self.rinf) == 0:
            raise RuntimeError("self.rinf is empty. Apparently you've never invoked switch_revision")
        self.tt.add_result(self.revision, test_name, arch, opt, target, runfailed, compfailed)

    def add_to_rinf(self, arch, opt, target, succeed, runfailed, compfailed, skipped):
        self.current_rinf().register_test(arch, opt, target, succeed, runfailed, compfailed, skipped)

    def add_to_rinf_testall(self, tried_to_test):
        self.current_rinf().testall += tried_to_test

    def load_from_tt(self, tt):
        # TODO: fill in the self.rinf field!
        self.tt = tt
        revisions = tt.table.keys()
        self.revision = ""
        if len(revisions) != 0:
            self.revision = revisions[0]
        print "ESG: loaded from 'TestTable()' with revisions", revisions

    def dump(self, fname, obj):
        import pickle
        with open(fname, 'w') as fp:
            pickle.dump(obj, fp)

    def undump(self, fname):
        import pickle
        with open(fname, 'r') as fp:
            obj = pickle.load(fp)
        return obj

    def get_host_name(self):
        import socket
        return socket.gethostname()

    def __repr__(self):
        string = "Hostname: %s\n" % (self.hostname)
        string += "Current LLVM Revision = %s\n\n" % (self.revision)
        for rev_info in self.rinf:
            string += repr(rev_info) + '\n'
        return string

# This class instance is intended to gather and store all information
# regarding the testing process.
ex_state = ExecutionStateGatherer()
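# Hypothetical driver sketch (assumed calls; the real driver lives elsewhere
# in this repository):
#   ex_state.switch_revision("r2")
#   ex_state.add_to_tt("foo.ispc", "x86-64", "-O2", "avx2", 1, 0)
#   ex_state.add_to_rinf("x86-64", "-O2", "avx2", 0, 1, 0, 0)
#   ex_state.dump("state.dat", ex_state.tt)   # pickle the result table...
#   tt = ex_state.undump("state.dat")         # ...and read it back later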

contrib/ispc.vim

@@ -1,7 +1,7 @@
" Vim syntax file
" Language: ISPC
" Maintainer: Andreas Wendleder <andreas.wendleder@gmail.com>
" Last Change: 2011 Aug 3
" Last Change: 2016 May 04
" Quit when a syntax file was already loaded
if exists("b:current_syntax")
@@ -13,11 +13,19 @@ runtime! syntax/c.vim
unlet b:current_syntax
" New keywords
syn keyword ispcStatement cbreak ccontinue creturn launch print reference soa sync task
syn keyword ispcStatement cbreak ccontinue creturn launch print reference soa sync
syn keyword ispcConditional cif
syn keyword ispcRepeat cdo cfor cwhile
syn keyword ispcBuiltin programCount programIndex
syn keyword ispcType export int8 int16 int32 int64
syn keyword ispcRepeat cdo cfor cwhile foreach foreach_tiled foreach_unique foreach_active
syn keyword ispcBuiltin programCount programIndex taskCount taskCount0 taskCount1 taskCount3 taskIndex taskIndex0 taskIndex1 taskIndex2
syn keyword ispcType export uniform varying int8 int16 int32 int64 task new delete
syn keyword ispcOperator operator
"double precision floating point number, with dot, optional exponent
syn match cFloat display contained "\d\+\.\d*d[-+]\=\d*\>"
"double precision floating point number, starting with dot, optional exponent
syn match cFloat display contained ".\d*d[-+]\=\d*\>"
"double precision floating point number, without dot, with exponent
syn match cFloat display contained "\d\+d[-+]\=\d\+\>"
" Default highlighting
command -nargs=+ HiLink hi def link <args>
@@ -26,6 +34,7 @@ HiLink ispcConditional Conditional
HiLink ispcRepeat Repeat
HiLink ispcBuiltin Statement
HiLink ispcType Type
HiLink ispcOperator Operator
delcommand HiLink
let b:current_syntax = "ispc"

contrib/ispc.vim.README (new file, 8 changed lines)

@@ -0,0 +1,8 @@
To install vim syntax highlighting for ispc files:
1) Copy ispc.vim into ~/.vim/syntax/ispc.vim (create if necessary)
2) Create a filetype for ispc files to correspond to that syntax file
To do this, create and append the following line to ~/.vim/ftdetect/ispc.vim
au BufRead,BufNewFile *.ispc set filetype=ispc

ctx.cpp (3107 changed lines; diff suppressed because it is too large)

ctx.h (313 changed lines)

@@ -1,5 +1,5 @@
/*
Copyright (c) 2010-2011, Intel Corporation
Copyright (c) 2010-2015, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -28,11 +28,11 @@
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file ctx.h
@brief Declaration of the FunctionEmitContext class
@brief %Declaration of the FunctionEmitContext class
*/
#ifndef ISPC_CTX_H
@@ -40,10 +40,20 @@
#include "ispc.h"
#include <map>
#include <llvm/InstrTypes.h>
#include <llvm/Instructions.h>
#include <llvm/Analysis/DIBuilder.h>
#include <llvm/Analysis/DebugInfo.h>
#if ISPC_LLVM_VERSION == ISPC_LLVM_3_2
#include <llvm/InstrTypes.h>
#include <llvm/Instructions.h>
#else // 3.3+
#include <llvm/IR/InstrTypes.h>
#include <llvm/IR/Instructions.h>
#endif
#if ISPC_LLVM_VERSION <= ISPC_LLVM_3_4
#include <llvm/DebugInfo.h>
#include <llvm/DIBuilder.h>
#else // 3.5+
#include <llvm/IR/DebugInfo.h>
#include <llvm/IR/DIBuilder.h>
#endif
struct CFInfo;
@@ -65,7 +75,7 @@ public:
@param firstStmtPos Source file position of the first statement in the
function
*/
FunctionEmitContext(Function *function, Symbol *funSym,
FunctionEmitContext(Function *function, Symbol *funSym,
llvm::Function *llvmFunction,
SourcePos firstStmtPos);
~FunctionEmitContext();
@@ -77,9 +87,9 @@ public:
/** @name Current basic block management
@{
*/
/** Returns the current basic block pointer */
/** Returns the current basic block pointer */
llvm::BasicBlock *GetCurrentBasicBlock();
/** Set the given llvm::BasicBlock to be the basic block to emit
forthcoming instructions into. */
void SetCurrentBasicBlock(llvm::BasicBlock *bblock);
@@ -87,7 +97,7 @@ public:
/** @name Mask management
@{
*/
/** Returns the mask value at entry to the current function. */
/** Returns the mask value at entry to the current function. */
llvm::Value *GetFunctionMask();
/** Returns the mask value corresponding to "varying" control flow
@@ -96,7 +106,7 @@ public:
llvm::Value *GetInternalMask();
/** Returns the complete current mask value--i.e. the logical AND of
the function entry mask and the internal mask. */
the function entry mask and the internal mask. */
llvm::Value *GetFullMask();
/** Returns a pointer to storage in memory that stores the current full
@@ -149,22 +159,21 @@ public:
'continue' statements should jump to (if all running lanes want to
break or continue), uniformControlFlow indicates whether the loop
condition is 'uniform'. */
void StartLoop(llvm::BasicBlock *breakTarget, llvm::BasicBlock *continueTarget,
void StartLoop(llvm::BasicBlock *breakTarget, llvm::BasicBlock *continueTarget,
bool uniformControlFlow);
/** Informs FunctionEmitContext of the value of the mask at the start
of a loop body. */
void SetLoopMask(llvm::Value *mask);
of a loop body or switch statement. */
void SetBlockEntryMask(llvm::Value *mask);
/** Informs FunctionEmitContext that code generation for a loop is
finished. */
void EndLoop();
/** Indicates that code generation for a 'foreach' or 'foreach_tiled'
loop is about to start. The provided basic block pointer indicates
where control flow should go if a 'continue' statement is executed
in the loop. */
void StartForeach(llvm::BasicBlock *continueTarget);
/** Indicates that code generation for a 'foreach', 'foreach_tiled',
'foreach_active', or 'foreach_unique' loop is about to start. */
enum ForeachType { FOREACH_REGULAR, FOREACH_ACTIVE, FOREACH_UNIQUE };
void StartForeach(ForeachType ft);
void EndForeach();
/** Emit code for a 'break' statement in a loop. If doCoherenceCheck
@@ -186,6 +195,52 @@ public:
'continue' statement when going through the loop body in the
previous iteration. */
void RestoreContinuedLanes();
/** This method is called by code emitting IR for a loop. It clears
any lanes that contained a break since the mask has been updated to take
them into account. This is necessary as all the bail out checks for
breaks are meant to only deal with lanes breaking on the current iteration.
*/
void ClearBreakLanes();
/** Indicates that code generation for a "switch" statement is about to
start. isUniform indicates whether the "switch" value is uniform,
and bbAfterSwitch gives the basic block immediately following the
"switch" statement. (For example, if the switch condition is
uniform, we jump here upon executing a "break" statement.) */
void StartSwitch(bool isUniform, llvm::BasicBlock *bbAfterSwitch);
/** Indicates the end of code generation for a "switch" statement. */
void EndSwitch();
/** Emits code for a "switch" statement in the program.
@param expr Gives the value of the expression after the "switch"
@param defaultBlock Basic block to execute for the "default" case. This
should be NULL if there is no "default" label inside
the switch.
@param caseBlocks vector that stores the mapping from label values
after "case" statements to basic blocks corresponding
to the "case" labels.
@param nextBlocks For each basic block for a "case" or "default"
label, this gives the basic block for the
immediately-following "case" or "default" label (or
the basic block after the "switch" statement for the
last label.)
*/
void SwitchInst(llvm::Value *expr, llvm::BasicBlock *defaultBlock,
const std::vector<std::pair<int, llvm::BasicBlock *> > &caseBlocks,
const std::map<llvm::BasicBlock *, llvm::BasicBlock *> &nextBlocks);
/** Generates code for a "default" label after a "switch" statement.
The checkMask parameter indicates whether additional code should be
generated to check to see if the execution mask is all off after
the default label (in which case a jump to the following label will
be issued. */
void EmitDefaultLabel(bool checkMask, SourcePos pos);
/** Generates code for a "case" label after a "switch" statement. See
the documentation for EmitDefaultLabel() for discussion of the
checkMask parameter. */
void EmitCaseLabel(int value, bool checkMask, SourcePos pos);
/** Returns the current number of nested levels of 'varying' control
flow */
@@ -193,6 +248,15 @@ public:
bool InForeachLoop() const;
/** Temporarily disables emission of performance warnings from gathers
and scatters from subsequent code. */
void DisableGatherScatterWarnings();
/** Reenables emission of gather/scatter performance warnings. */
void EnableGatherScatterWarnings();
void SetContinueTarget(llvm::BasicBlock *bb) { continueTarget = bb; }
/** Step through the code and find label statements; create a basic
block for each one, so that subsequent calls to
GetLabeledBasicBlock() return the corresponding basic block. */
@@ -202,6 +266,10 @@ public:
new basic block that it starts. */
llvm::BasicBlock *GetLabeledBasicBlock(const std::string &label);
/** Returns a vector of all labels in the context. This is
simply the key set of the labelMap */
std::vector<std::string> GetLabels();
/** Called to generate code for 'return' statement; value is the
expression in the return statement (if non-NULL), and
doCoherenceCheck indicates whether instructions should be generated
@@ -211,7 +279,7 @@ public:
/** @} */
/** @name Small helper/utility routines
@{
@{
*/
/** Given a boolean mask value of type LLVMTypes::MaskType, return an
i1 value that indicates if any of the mask lanes are on. */
@@ -222,7 +290,11 @@ public:
llvm::Value *All(llvm::Value *mask);
/** Given a boolean mask value of type LLVMTypes::MaskType, return an
i32 value wherein the i'th bit is on if and only if the i'th lane
i1 value that indicates if all of the mask lanes are off. */
llvm::Value *None(llvm::Value *mask);
/** Given a boolean mask value of type LLVMTypes::MaskType, return an
i64 value wherein the i'th bit is on if and only if the i'th lane
of the mask is on. */
llvm::Value *LaneMask(llvm::Value *mask);
@@ -230,6 +302,18 @@ public:
that indicates whether the two masks are equal. */
llvm::Value *MasksAllEqual(llvm::Value *mask1, llvm::Value *mask2);
/** generate constantvector, which contains programindex, i.e.
< i32 0, i32 1, i32 2, i32 3> */
llvm::Value *ProgramIndexVector(bool is32bits = true);
#ifdef ISPC_NVPTX_ENABLED
llvm::Value *ProgramIndexVectorPTX(bool is32bits = true);
/** Issues a call to __insert_int8/int16/int32/int64/float/double */
llvm::Value* Insert(llvm::Value *vector, llvm::Value *lane, llvm::Value *scalar);
/** Issues a call to __extract_int8/int16/int32/int64/float/double */
llvm::Value* Extract(llvm::Value *vector, llvm::Value *lane);
#endif
/** Given a string, create an anonymous global variable to hold its
value and return the pointer to the string. */
llvm::Value *GetStringPtr(const std::string &str);
@@ -267,8 +351,13 @@ public:
llvm::Instruction for convenience; in calling code we often have
Instructions stored using Value pointers; the code here returns
silently if it's not actually given an instruction. */
void AddDebugPos(llvm::Value *instruction, const SourcePos *pos = NULL,
void AddDebugPos(llvm::Value *instruction, const SourcePos *pos = NULL,
#if ISPC_LLVM_VERSION <= ISPC_LLVM_3_6
llvm::DIScope *scope = NULL);
#else /* LLVM 3.7+ */
llvm::DIScope *scope = NULL);
//llvm::MDScope *scope = NULL );
#endif
/** Inform the debugging information generation code that a new scope
is starting in the source program. */
@@ -280,7 +369,11 @@ public:
/** Returns the llvm::DIScope corresponding to the current program
scope. */
#if ISPC_LLVM_VERSION <= ISPC_LLVM_3_6
llvm::DIScope GetDIScope() const;
#else // LLVM 3.7++
llvm::DIScope *GetDIScope() const;
#endif
/** Emits debugging information for the variable represented by
sym. */
@@ -288,7 +381,7 @@ public:
/** Emits debugging information for the function parameter represented
by sym. */
void EmitFunctionParameterDebugInfo(Symbol *sym);
void EmitFunctionParameterDebugInfo(Symbol *sym, int parameterNum);
/** @} */
/** @name IR instruction emission
@@ -296,7 +389,7 @@ public:
instructions. See the LLVM assembly language reference manual
(http://llvm.org/docs/LangRef.html) and the LLVM doxygen documentaion
(http://llvm.org/doxygen) for more information. Here we will only
document significant generalizations to the functionality of the
document significant generalizations to the functionality of the
corresponding basic LLVM instructions.
Beyond actually emitting the instruction, the implementations of
@@ -312,7 +405,7 @@ public:
this also handles applying the given operation to the vector
elements. */
llvm::Value *BinaryOperator(llvm::Instruction::BinaryOps inst,
llvm::Value *v0, llvm::Value *v1,
llvm::Value *v0, llvm::Value *v1,
const char *name = NULL);
/** Emit the "not" operator. Like BinaryOperator(), this also handles
@@ -322,7 +415,7 @@ public:
/** Emit a comparison instruction. If the operands are VectorTypes,
then a value for the corresponding boolean VectorType is
returned. */
llvm::Value *CmpInst(llvm::Instruction::OtherOps inst,
llvm::Value *CmpInst(llvm::Instruction::OtherOps inst,
llvm::CmpInst::Predicate pred,
llvm::Value *v0, llvm::Value *v1, const char *name = NULL);
@@ -330,25 +423,35 @@ public:
array, for pointer types). */
llvm::Value *SmearUniform(llvm::Value *value, const char *name = NULL);
llvm::Value *BitCastInst(llvm::Value *value, LLVM_TYPE_CONST llvm::Type *type,
llvm::Value *BitCastInst(llvm::Value *value, llvm::Type *type,
const char *name = NULL);
llvm::Value *PtrToIntInst(llvm::Value *value, const char *name = NULL);
llvm::Value *PtrToIntInst(llvm::Value *value, LLVM_TYPE_CONST llvm::Type *type,
llvm::Value *PtrToIntInst(llvm::Value *value, llvm::Type *type,
const char *name = NULL);
llvm::Value *IntToPtrInst(llvm::Value *value, LLVM_TYPE_CONST llvm::Type *type,
llvm::Value *IntToPtrInst(llvm::Value *value, llvm::Type *type,
const char *name = NULL);
llvm::Instruction *TruncInst(llvm::Value *value, LLVM_TYPE_CONST llvm::Type *type,
llvm::Instruction *TruncInst(llvm::Value *value, llvm::Type *type,
const char *name = NULL);
llvm::Instruction *CastInst(llvm::Instruction::CastOps op, llvm::Value *value,
LLVM_TYPE_CONST llvm::Type *type, const char *name = NULL);
llvm::Instruction *FPCastInst(llvm::Value *value, LLVM_TYPE_CONST llvm::Type *type,
llvm::Type *type, const char *name = NULL);
llvm::Instruction *FPCastInst(llvm::Value *value, llvm::Type *type,
const char *name = NULL);
llvm::Instruction *SExtInst(llvm::Value *value, LLVM_TYPE_CONST llvm::Type *type,
llvm::Instruction *SExtInst(llvm::Value *value, llvm::Type *type,
const char *name = NULL);
llvm::Instruction *ZExtInst(llvm::Value *value, LLVM_TYPE_CONST llvm::Type *type,
llvm::Instruction *ZExtInst(llvm::Value *value, llvm::Type *type,
const char *name = NULL);
/** Given two integer-typed values (but possibly one vector and the
other not, and or of possibly-different bit-widths), update their
values as needed so that the two have the same (more general)
type. */
void MatchIntegerTypes(llvm::Value **v0, llvm::Value **v1);
/** Create a new slice pointer out of the given pointer to an soa type
and an integer offset to a slice within that type. */
llvm::Value *MakeSlicePointer(llvm::Value *ptr, llvm::Value *offset);
/** These GEP methods are generalizations of the standard ones in LLVM;
they support both uniform and varying basePtr values as well as
uniform and varying index values (arrays of indices). Varying base
@@ -369,7 +472,8 @@ public:
the type of the pointer, though it may be NULL if the base pointer
is uniform. */
llvm::Value *AddElementOffset(llvm::Value *basePtr, int elementNum,
const Type *ptrType, const char *name = NULL);
const Type *ptrType, const char *name = NULL,
const PointerType **resultPtrType = NULL);
/** Load from the memory location(s) given by lvalue, using the given
mask. The lvalue may be varying, in which case this corresponds to
@@ -377,7 +481,8 @@ public:
pointer values given by the lvalue. If the lvalue is not varying,
then both the mask pointer and the type pointer may be NULL. */
llvm::Value *LoadInst(llvm::Value *ptr, llvm::Value *mask,
const Type *ptrType, const char *name = NULL);
const Type *ptrType, const char *name = NULL,
bool one_elem = false);
llvm::Value *LoadInst(llvm::Value *ptr, const char *name = NULL);
@@ -386,9 +491,9 @@ public:
allocated at the given alignment. By default, the alloca
instruction is added at the start of the function in the entry
basic block; if it should be added to the current basic block, then
the atEntryBlock parameter should be false. */
llvm::Value *AllocaInst(LLVM_TYPE_CONST llvm::Type *llvmType,
const char *name = NULL, int align = 0,
the atEntryBlock parameter should be false. */
llvm::Value *AllocaInst(llvm::Type *llvmType,
const char *name = NULL, int align = 0,
bool atEntryBlock = true);
/** Standard store instruction; for this variant, the lvalue must be a
@@ -400,7 +505,14 @@ public:
varying, the given storeMask is used to mask the stores so that
they only execute for the active program instances. */
void StoreInst(llvm::Value *value, llvm::Value *ptr,
llvm::Value *storeMask, const Type *ptrType);
llvm::Value *storeMask, const Type *valueType,
const Type *ptrType);
/** Copy count bytes of memory from the location pointed to by src to
the location pointed to by dest. (src and dest must not be
overlapping.) */
void MemcpyInst(llvm::Value *dest, llvm::Value *src, llvm::Value *count,
llvm::Value *align = NULL);
void BranchInst(llvm::BasicBlock *block);
void BranchInst(llvm::BasicBlock *trueBlock, llvm::BasicBlock *falseBlock,
@@ -414,10 +526,20 @@ public:
/** This convenience method maps to an llvm::InsertElementInst if the
given value is a llvm::VectorType, and to an llvm::InsertValueInst
otherwise. */
llvm::Value *InsertInst(llvm::Value *v, llvm::Value *eltVal, int elt,
llvm::Value *InsertInst(llvm::Value *v, llvm::Value *eltVal, int elt,
const char *name = NULL);
llvm::PHINode *PhiNode(LLVM_TYPE_CONST llvm::Type *type, int count,
/** This convenience method maps to an llvm::ShuffleVectorInst. */
llvm::Value *ShuffleInst(llvm::Value *v1, llvm::Value *v2, llvm::Value *mask,
const char *name = NULL);
/** This convenience method to generate broadcast pattern. It takes a value
and a vector type. Type of the value must match element type of the
vector. */
llvm::Value *BroadcastValue(llvm::Value *v, llvm::Type *vecType,
const char *name = NULL);
llvm::PHINode *PhiNode(llvm::Type *type, int count,
const char *name = NULL);
llvm::Instruction *SelectInst(llvm::Value *test, llvm::Value *val0,
llvm::Value *val1, const char *name = NULL);
@@ -443,9 +565,9 @@ public:
/** Launch an asynchronous task to run the given function, passing it
he given argument values. */
llvm::Value *LaunchInst(llvm::Value *callee,
llvm::Value *LaunchInst(llvm::Value *callee,
std::vector<llvm::Value *> &argVals,
llvm::Value *launchCount);
llvm::Value *launchCount[3]);
void SyncInst();
@@ -488,14 +610,14 @@ private:
for error messages and debugging symbols. */
SourcePos funcStartPos;
/** If currently in a loop body, the value of the mask at the start of
the loop. */
llvm::Value *loopMask;
/** If currently in a loop body or switch statement, the value of the
mask at the start of it. */
llvm::Value *blockEntryMask;
/** If currently in a loop body, this is a pointer to memory to store a
mask value that represents which of the lanes have executed a
'break' statement. If we're not in a loop body, this should be
NULL. */
/** If currently in a loop body or switch statement, this is a pointer
to memory to store a mask value that represents which of the lanes
have executed a 'break' statement. If we're not in a loop body or
switch, this should be NULL. */
llvm::Value *breakLanesPtr;
/** Similar to breakLanesPtr, if we're inside a loop, this is a pointer
@@ -503,16 +625,49 @@ private:
'continue' statement. */
llvm::Value *continueLanesPtr;
/** If we're inside a loop, this gives the basic block immediately
after the current loop, which we will jump to if all of the lanes
have executed a break statement or are otherwise done with the
loop. */
/** If we're inside a loop or switch statement, this gives the basic
block immediately after the current loop or switch, which we will
jump to if all of the lanes have executed a break statement or are
otherwise done with it. */
llvm::BasicBlock *breakTarget;
/** If we're inside a loop, this gives the block to jump to if all of
the running lanes have executed a 'continue' statement. */
llvm::BasicBlock *continueTarget;
/** @name Switch statement state
These variables store various state that's active when we're
generating code for a switch statement. They should all be NULL
outside of a switch.
@{
*/
/** The value of the expression used to determine which case in the
statements after the switch to execute. */
llvm::Value *switchExpr;
/** Map from case label numbers to the basic block that will hold code
for that case. */
const std::vector<std::pair<int, llvm::BasicBlock *> > *caseBlocks;
/** The basic block of code to run for the "default" label in the
switch statement. */
llvm::BasicBlock *defaultBlock;
/** For each basic block for the code for cases (and the default label,
if present), this map gives the basic block for the immediately
following case/default label. */
const std::map<llvm::BasicBlock *, llvm::BasicBlock *> *nextBlocks;
/** Records whether the switch condition was uniform; this is a
distinct notion from whether the switch represents uniform or
varying control flow; we may have varying control flow from a
uniform switch condition if there is a 'break' inside the switch
that's under varying control flow. */
bool switchConditionWasUniform;
/** @} */
/** A pointer to memory that records which of the program instances
have executed a 'return' statement (and are thus really truly done
running any more instructions in this functions. */
@@ -530,17 +685,31 @@ private:
emitted. */
std::vector<CFInfo *> controlFlowInfo;
#if ISPC_LLVM_VERSION <= ISPC_LLVM_3_6
/** DIFile object corresponding to the source file where the current
function was defined (used for debugging info0. */
function was defined (used for debugging info). */
llvm::DIFile diFile;
/** DISubprogram corresponding to this function (used for debugging
info). */
llvm::DISubprogram diFunction;
llvm::DISubprogram diSubprogram;
/** These correspond to the current set of nested scopes in the
function. */
std::vector<llvm::DILexicalBlock> debugScopes;
#else // LLVM 3.7++
/** DIFile object corresponding to the source file where the current
function was defined (used for debugging info). */
llvm::DIFile *diFile;
/** DISubprogram corresponding to this function (used for debugging
info). */
llvm::DISubprogram *diSubprogram;
/** These correspond to the current set of nested scopes in the
function. */
std::vector<llvm::DIScope *> debugScopes;
#endif
/** True if a 'launch' statement has been encountered in the function. */
bool launchedTasks;
@@ -550,27 +719,43 @@ private:
tasks launched from the current function. */
llvm::Value *launchGroupHandlePtr;
/** Nesting count of the number of times calling code has disabled (and
not yet reenabled) gather/scatter performance warnings. */
int disableGSWarningCount;
std::map<std::string, llvm::BasicBlock *> labelMap;
static bool initLabelBBlocks(ASTNode *node, void *data);
llvm::Value *pointerVectorToVoidPointers(llvm::Value *value);
static void addGSMetadata(llvm::Value *inst, SourcePos pos);
bool ifsInLoopAllUniform() const;
bool ifsInCFAllUniform(int cfType) const;
void jumpIfAllLoopLanesAreDone(llvm::BasicBlock *target);
llvm::Value *emitGatherCallback(llvm::Value *lvalue, llvm::Value *retPtr);
llvm::Value *applyVaryingGEP(llvm::Value *basePtr, llvm::Value *index,
llvm::Value *applyVaryingGEP(llvm::Value *basePtr, llvm::Value *index,
const Type *ptrType);
void restoreMaskGivenReturns(llvm::Value *oldMask);
void addSwitchMaskCheck(llvm::Value *mask);
bool inSwitchStatement() const;
llvm::Value *getMaskAtSwitchEntry();
void scatter(llvm::Value *value, llvm::Value *ptr, const Type *ptrType,
llvm::Value *mask);
CFInfo *popCFState();
void scatter(llvm::Value *value, llvm::Value *ptr, const Type *valueType,
const Type *ptrType, llvm::Value *mask);
void maskedStore(llvm::Value *value, llvm::Value *ptr, const Type *ptrType,
llvm::Value *mask);
llvm::Value *gather(llvm::Value *ptr, const Type *ptrType, llvm::Value *mask,
const char *name);
void storeUniformToSOA(llvm::Value *value, llvm::Value *ptr,
llvm::Value *mask, const Type *valueType,
const PointerType *ptrType);
llvm::Value *loadUniformFromSOA(llvm::Value *ptr, llvm::Value *mask,
const PointerType *ptrType, const char *name);
llvm::Value *gather(llvm::Value *ptr, const PointerType *ptrType,
llvm::Value *mask, const char *name);
llvm::Value *addVaryingOffsetsIfNeeded(llvm::Value *ptr, const Type *ptrType);
};

decl.cpp (638 changed lines)

@@ -1,5 +1,5 @@
/*
Copyright (c) 2010-2011, Intel Corporation
Copyright (c) 2010-2013, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -28,12 +28,12 @@
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file decl.cpp
@brief Implementations of classes related to turning declarations into
symbols and types.
@brief Implementations of classes related to turning declarations into
symbol names and types.
*/
#include "decl.h"
@@ -44,6 +44,7 @@
#include "stmt.h"
#include "expr.h"
#include <stdio.h>
#include <string.h>
#include <set>
static void
@@ -55,26 +56,45 @@ lPrintTypeQualifiers(int typeQualifiers) {
if (typeQualifiers & TYPEQUAL_TASK) printf("task ");
if (typeQualifiers & TYPEQUAL_SIGNED) printf("signed ");
if (typeQualifiers & TYPEQUAL_UNSIGNED) printf("unsigned ");
if (typeQualifiers & TYPEQUAL_EXPORT) printf("export ");
if (typeQualifiers & TYPEQUAL_UNMASKED) printf("unmasked ");
}
/** Given a Type and a set of type qualifiers, apply the type qualifiers to
the type, returning the type that is the result.
the type, returning the type that is the result.
*/
static const Type *
lApplyTypeQualifiers(int typeQualifiers, const Type *type, SourcePos pos) {
if (type == NULL)
return NULL;
if ((typeQualifiers & TYPEQUAL_CONST) != 0)
if ((typeQualifiers & TYPEQUAL_CONST) != 0) {
type = type->GetAsConstType();
}
if ((typeQualifiers & TYPEQUAL_UNIFORM) != 0)
type = type->GetAsUniformType();
else if ((typeQualifiers & TYPEQUAL_VARYING) != 0)
type = type->GetAsVaryingType();
else
type = type->GetAsUnboundVariabilityType();
if ( ((typeQualifiers & TYPEQUAL_UNIFORM) != 0)
&& ((typeQualifiers & TYPEQUAL_VARYING) != 0) ) {
Error(pos, "Type \"%s\" cannot be qualified with both uniform and varying.",
type->GetString().c_str());
}
if ((typeQualifiers & TYPEQUAL_UNIFORM) != 0) {
if (type->IsVoidType())
Error(pos, "\"uniform\" qualifier is illegal with \"void\" type.");
else
type = type->GetAsUniformType();
}
else if ((typeQualifiers & TYPEQUAL_VARYING) != 0) {
if (type->IsVoidType())
Error(pos, "\"varying\" qualifier is illegal with \"void\" type.");
else
type = type->GetAsVaryingType();
}
else {
if (type->IsVoidType() == false)
type = type->GetAsUnboundVariabilityType();
}
if ((typeQualifiers & TYPEQUAL_UNSIGNED) != 0) {
if ((typeQualifiers & TYPEQUAL_SIGNED) != 0)
@@ -84,15 +104,20 @@ lApplyTypeQualifiers(int typeQualifiers, const Type *type, SourcePos pos) {
const Type *unsignedType = type->GetAsUnsignedType();
if (unsignedType != NULL)
type = unsignedType;
else
else {
const Type *resolvedType =
type->ResolveUnboundVariability(Variability::Varying);
Error(pos, "\"unsigned\" qualifier is illegal with \"%s\" type.",
type->ResolveUnboundVariability(Type::Varying)->GetString().c_str());
resolvedType->GetString().c_str());
}
}
if ((typeQualifiers & TYPEQUAL_SIGNED) != 0 && type->IsIntType() == false)
if ((typeQualifiers & TYPEQUAL_SIGNED) != 0 && type->IsIntType() == false) {
const Type *resolvedType =
type->ResolveUnboundVariability(Variability::Varying);
Error(pos, "\"signed\" qualifier is illegal with non-integer type "
"\"%s\".",
type->ResolveUnboundVariability(Type::Varying)->GetString().c_str());
"\"%s\".", resolvedType->GetString().c_str());
}
return type;
}
@@ -107,23 +132,84 @@ DeclSpecs::DeclSpecs(const Type *t, StorageClass sc, int tq) {
typeQualifiers = tq;
soaWidth = 0;
vectorSize = 0;
if (t != NULL) {
if (m->symbolTable->ContainsType(t)) {
// Typedefs might have uniform/varying qualifiers inside.
if (t->IsVaryingType()) {
typeQualifiers |= TYPEQUAL_VARYING;
}
else if (t->IsUniformType()) {
typeQualifiers |= TYPEQUAL_UNIFORM;
}
}
}
}
const Type *
DeclSpecs::GetBaseType(SourcePos pos) const {
const Type *bt = baseType;
const Type *retType = baseType;
if (retType == NULL) {
Warning(pos, "No type specified in declaration. Assuming int32.");
retType = AtomicType::UniformInt32->GetAsUnboundVariabilityType();
}
if (vectorSize > 0) {
const AtomicType *atomicType = dynamic_cast<const AtomicType *>(bt);
const AtomicType *atomicType = CastType<AtomicType>(retType);
if (atomicType == NULL) {
Error(pos, "Only atomic types (int, float, ...) are legal for vector "
"types.");
return NULL;
}
bt = new VectorType(atomicType, vectorSize);
retType = new VectorType(atomicType, vectorSize);
}
return lApplyTypeQualifiers(typeQualifiers, bt, pos);
retType = lApplyTypeQualifiers(typeQualifiers, retType, pos);
if (soaWidth > 0) {
#ifdef ISPC_NVPTX_ENABLED
#if 0 /* see stmt.cpp in DeclStmt::EmitCode for work-around of SOAType Declaration */
if (g->target->getISA() == Target::NVPTX)
{
Error(pos, "\"soa\" data types are currently not supported with \"nvptx\" target.");
return NULL;
}
#endif
#endif /* ISPC_NVPTX_ENABLED */
const StructType *st = CastType<StructType>(retType);
if (st == NULL) {
Error(pos, "Illegal to provide soa<%d> qualifier with non-struct "
"type \"%s\".", soaWidth, retType->GetString().c_str());
return NULL;
}
else if (soaWidth <= 0 || (soaWidth & (soaWidth - 1)) != 0) {
Error(pos, "soa<%d> width illegal. Value must be positive power "
"of two.", soaWidth);
return NULL;
}
if (st->IsUniformType()) {
Error(pos, "\"uniform\" qualifier and \"soa<%d>\" qualifier can't "
"both be used in a type declaration.", soaWidth);
return NULL;
}
else if (st->IsVaryingType()) {
Error(pos, "\"varying\" qualifier and \"soa<%d>\" qualifier can't "
"both be used in a type declaration.", soaWidth);
return NULL;
}
else
retType = st->GetAsSOAType(soaWidth);
if (soaWidth < g->target->getVectorWidth())
PerformanceWarning(pos, "soa<%d> width smaller than gang size %d "
"currently leads to inefficient code to access "
"soa types.", soaWidth, g->target->getVectorWidth());
}
return retType;
}
@@ -133,7 +219,6 @@ lGetStorageClassName(StorageClass storageClass) {
case SC_NONE: return "";
case SC_EXTERN: return "extern";
case SC_EXTERN_C: return "extern \"C\"";
case SC_EXPORT: return "export";
case SC_STATIC: return "static";
case SC_TYPEDEF: return "typedef";
default: FATAL("Unhandled storage class in lGetStorageClassName");
@@ -158,35 +243,35 @@ DeclSpecs::Print() const {
///////////////////////////////////////////////////////////////////////////
// Declarator
Declarator::Declarator(DeclaratorKind dk, SourcePos p)
: pos(p), kind(dk) {
Declarator::Declarator(DeclaratorKind dk, SourcePos p)
: pos(p), kind(dk) {
child = NULL;
typeQualifiers = 0;
storageClass = SC_NONE;
arraySize = -1;
sym = NULL;
type = NULL;
initExpr = NULL;
}
void
Declarator::InitFromDeclSpecs(DeclSpecs *ds) {
const Type *t = GetType(ds);
Symbol *sym = GetSymbol();
if (sym != NULL) {
sym->type = t;
sym->storageClass = ds->storageClass;
const Type *baseType = ds->GetBaseType(pos);
InitFromType(baseType, ds);
if (type == NULL) {
AssertPos(pos, m->errorCount > 0);
return;
}
}
storageClass = ds->storageClass;
Symbol *
Declarator::GetSymbol() const {
// The symbol lives at the last child in the chain, so walk down there
// and return the one there.
const Declarator *d = this;
while (d->child != NULL)
d = d->child;
return d->sym;
if (ds->declSpecList.size() > 0 &&
CastType<FunctionType>(type) == NULL) {
Error(pos, "__declspec specifiers for non-function type \"%s\" are "
"not used.", type->GetString().c_str());
}
}
@@ -196,11 +281,11 @@ Declarator::Print(int indent) const {
pos.Print();
lPrintTypeQualifiers(typeQualifiers);
Symbol *sym = GetSymbol();
if (sym != NULL)
printf("%s", sym->name.c_str());
printf("%s ", lGetStorageClassName(storageClass));
if (name.size() > 0)
printf("%s", name.c_str());
else
printf("(null symbol)");
printf("(unnamed)");
printf(", array size = %d", arraySize);
@@ -234,115 +319,121 @@ Declarator::Print(int indent) const {
}
Symbol *
Declarator::GetFunctionInfo(DeclSpecs *ds, std::vector<Symbol *> *funArgs) {
const FunctionType *type =
dynamic_cast<const FunctionType *>(GetType(ds));
if (type == NULL)
return NULL;
Symbol *declSym = GetSymbol();
Assert(declSym != NULL);
// Get the symbol for the function from the symbol table. (It should
// already have been added to the symbol table by AddGlobal() by the
// time we get here.)
Symbol *funSym = m->symbolTable->LookupFunction(declSym->name.c_str(), type);
if (funSym != NULL)
// May be NULL due to error earlier in compilation
funSym->pos = pos;
// Walk down to the declarator for the function. (We have to get past
// the stuff that specifies the function's return type before we get to
// the function's declarator.)
Declarator *d = this;
while (d != NULL && d->kind != DK_FUNCTION)
d = d->child;
Assert(d != NULL);
for (unsigned int i = 0; i < d->functionParams.size(); ++i) {
Symbol *sym = d->GetSymbolForFunctionParameter(i);
sym->type = sym->type->ResolveUnboundVariability(Type::Varying);
funArgs->push_back(sym);
}
funSym->type = funSym->type->ResolveUnboundVariability(Type::Varying);
return funSym;
}
const Type *
Declarator::GetType(const Type *base, DeclSpecs *ds) const {
void
Declarator::InitFromType(const Type *baseType, DeclSpecs *ds) {
bool hasUniformQual = ((typeQualifiers & TYPEQUAL_UNIFORM) != 0);
bool hasVaryingQual = ((typeQualifiers & TYPEQUAL_VARYING) != 0);
bool isTask = ((typeQualifiers & TYPEQUAL_TASK) != 0);
bool isExported = ((typeQualifiers & TYPEQUAL_EXPORT) != 0);
bool isConst = ((typeQualifiers & TYPEQUAL_CONST) != 0);
bool isUnmasked = ((typeQualifiers & TYPEQUAL_UNMASKED) != 0);
if (hasUniformQual && hasVaryingQual) {
Error(pos, "Can't provide both \"uniform\" and \"varying\" qualifiers.");
return NULL;
return;
}
if (kind != DK_FUNCTION && isTask)
if (kind != DK_FUNCTION && isTask) {
Error(pos, "\"task\" qualifier illegal in variable declaration.");
return;
}
if (kind != DK_FUNCTION && isUnmasked) {
Error(pos, "\"unmasked\" qualifier illegal in variable declaration.");
return;
}
if (kind != DK_FUNCTION && isExported) {
Error(pos, "\"export\" qualifier illegal in variable declaration.");
return;
}
Type::Variability variability = Type::Unbound;
Variability variability(Variability::Unbound);
if (hasUniformQual)
variability = Type::Uniform;
variability = Variability::Uniform;
else if (hasVaryingQual)
variability = Type::Varying;
variability = Variability::Varying;
const Type *type = base;
switch (kind) {
case DK_BASE:
if (kind == DK_BASE) {
// All of the type qualifiers should be in the DeclSpecs for the
// base declarator
Assert(typeQualifiers == 0);
Assert(child == NULL);
return type;
case DK_POINTER:
type = new PointerType(type, variability, isConst);
if (child != NULL)
return child->GetType(type, ds);
AssertPos(pos, typeQualifiers == 0);
AssertPos(pos, child == NULL);
type = baseType;
}
else if (kind == DK_POINTER) {
/* For now, any pointer to an SOA type gets the slice property; if
we add the capability to declare pointers as slices or not,
we'll want to set this based on a type qualifier here. */
const Type *ptrType = new PointerType(baseType, variability, isConst,
baseType->IsSOAType());
if (child != NULL) {
child->InitFromType(ptrType, ds);
type = child->type;
name = child->name;
}
else
return type;
break;
case DK_REFERENCE:
if (hasUniformQual)
type = ptrType;
}
else if (kind == DK_REFERENCE) {
if (hasUniformQual) {
Error(pos, "\"uniform\" qualifier is illegal to apply to references.");
if (hasVaryingQual)
return;
}
if (hasVaryingQual) {
Error(pos, "\"varying\" qualifier is illegal to apply to references.");
if (isConst)
return;
}
if (isConst) {
Error(pos, "\"const\" qualifier is to illegal apply to references.");
return;
}
// The parser should disallow this already, but double check.
if (dynamic_cast<const ReferenceType *>(type) != NULL) {
if (CastType<ReferenceType>(baseType) != NULL) {
Error(pos, "References to references are illegal.");
return NULL;
return;
}
type = new ReferenceType(type);
if (child != NULL)
return child->GetType(type, ds);
const Type *refType = new ReferenceType(baseType);
if (child != NULL) {
child->InitFromType(refType, ds);
type = child->type;
name = child->name;
}
else
return type;
break;
type = refType;
}
else if (kind == DK_ARRAY) {
if (baseType->IsVoidType()) {
Error(pos, "Arrays of \"void\" type are illegal.");
return;
}
if (CastType<ReferenceType>(baseType)) {
Error(pos, "Arrays of references (type \"%s\") are illegal.",
baseType->GetString().c_str());
return;
}
case DK_ARRAY:
type = new ArrayType(type, arraySize);
if (child)
return child->GetType(type, ds);
#ifdef ISPC_NVPTX_ENABLED
#if 0 /* NVPTX */
if (baseType->IsUniformType())
{
fprintf(stderr, " detected uniform array of size= %d array= %s\n" ,arraySize,
baseType->IsArrayType() ? " true " : " false ");
}
#endif
#endif /* ISPC_NVPTX_ENABLED */
const Type *arrayType = new ArrayType(baseType, arraySize);
if (child != NULL) {
child->InitFromType(arrayType, ds);
type = child->type;
name = child->name;
}
else
return type;
break;
case DK_FUNCTION: {
std::vector<const Type *> args;
std::vector<std::string> argNames;
std::vector<ConstExpr *> argDefaults;
std::vector<SourcePos> argPos;
type = arrayType;
}
else if (kind == DK_FUNCTION) {
llvm::SmallVector<const Type *, 8> args;
llvm::SmallVector<std::string, 8> argNames;
llvm::SmallVector<Expr *, 8> argDefaults;
llvm::SmallVector<SourcePos, 8> argPos;
// Loop over the function arguments and store the names, types,
// default values (if any), and source file positions each one in
@@ -350,15 +441,44 @@ Declarator::GetType(const Type *base, DeclSpecs *ds) const {
for (unsigned int i = 0; i < functionParams.size(); ++i) {
Declaration *d = functionParams[i];
Symbol *sym = GetSymbolForFunctionParameter(i);
if (d == NULL) {
AssertPos(pos, m->errorCount > 0);
continue;
}
if (d->declarators.size() == 0) {
// function declaration like foo(float), w/o a name for the
// parameter; wire up a placeholder Declarator for it
d->declarators.push_back(new Declarator(DK_BASE, pos));
d->declarators[0]->InitFromDeclSpecs(d->declSpecs);
}
AssertPos(pos, d->declarators.size() == 1);
Declarator *decl = d->declarators[0];
if (decl == NULL || decl->type == NULL) {
AssertPos(pos, m->errorCount > 0);
continue;
}
if (decl->name == "") {
// Give a name to any anonymous parameter declarations
char buf[32];
sprintf(buf, "__anon_parameter_%d", i);
decl->name = buf;
}
decl->type = decl->type->ResolveUnboundVariability(Variability::Varying);
if (d->declSpecs->storageClass != SC_NONE)
Error(sym->pos, "Storage class \"%s\" is illegal in "
"function parameter declaration for parameter \"%s\".",
Error(decl->pos, "Storage class \"%s\" is illegal in "
"function parameter declaration for parameter \"%s\".",
lGetStorageClassName(d->declSpecs->storageClass),
sym->name.c_str());
decl->name.c_str());
if (decl->type->IsVoidType()) {
Error(decl->pos, "Parameter with type \"void\" illegal in function "
"parameter list.");
decl->type = NULL;
}
const ArrayType *at = dynamic_cast<const ArrayType *>(sym->type);
const ArrayType *at = CastType<ArrayType>(decl->type);
if (at != NULL) {
// As in C, arrays are passed to functions as pointers to
// their element type. We'll just immediately make this
@@ -368,144 +488,124 @@ Declarator::GetType(const Type *base, DeclSpecs *ds) const {
// report this differently than it was originally declared
// in the function, but it's not clear that this is a
// significant problem.)
sym->type = PointerType::GetUniform(at->GetElementType());
const Type *targetType = at->GetElementType();
if (targetType == NULL) {
AssertPos(pos, m->errorCount > 0);
return;
}
decl->type = PointerType::GetUniform(targetType, at->IsSOAType());
// Make sure there are no unsized arrays (other than the
// first dimension) in function parameter lists.
at = dynamic_cast<const ArrayType *>(at->GetElementType());
at = CastType<ArrayType>(targetType);
while (at != NULL) {
if (at->GetElementCount() == 0)
Error(sym->pos, "Arrays with unsized dimensions in "
Error(decl->pos, "Arrays with unsized dimensions in "
"dimensions after the first one are illegal in "
"function parameter lists.");
at = dynamic_cast<const ArrayType *>(at->GetElementType());
at = CastType<ArrayType>(at->GetElementType());
}
}
args.push_back(sym->type);
argNames.push_back(sym->name);
argPos.push_back(sym->pos);
args.push_back(decl->type);
argNames.push_back(decl->name);
argPos.push_back(decl->pos);
ConstExpr *init = NULL;
if (d->declarators.size()) {
// Try to find an initializer expression; if there is one,
// it lives down to the base declarator.
Declarator *decl = d->declarators[0];
while (decl->child != NULL) {
Assert(decl->initExpr == NULL);
Expr *init = NULL;
// Try to find an initializer expression.
while (decl != NULL) {
if (decl->initExpr != NULL) {
decl->initExpr = TypeCheck(decl->initExpr);
decl->initExpr = Optimize(decl->initExpr);
if (decl->initExpr != NULL) {
init = llvm::dyn_cast<ConstExpr>(decl->initExpr);
if (init == NULL)
init = llvm::dyn_cast<NullPointerExpr>(decl->initExpr);
if (init == NULL)
Error(decl->initExpr->pos, "Default value for parameter "
"\"%s\" must be a compile-time constant.",
decl->name.c_str());
}
break;
}
else
decl = decl->child;
}
if (decl->initExpr != NULL &&
(decl->initExpr = TypeCheck(decl->initExpr)) != NULL &&
(decl->initExpr = Optimize(decl->initExpr)) != NULL &&
(init = dynamic_cast<ConstExpr *>(decl->initExpr)) == NULL) {
Error(decl->initExpr->pos, "Default value for parameter "
"\"%s\" must be a compile-time constant.",
sym->name.c_str());
}
}
argDefaults.push_back(init);
}
const Type *returnType = type;
const Type *returnType = baseType;
if (returnType == NULL) {
Error(pos, "No return type provided in function declaration.");
return NULL;
return;
}
bool isExported = ds && (ds->storageClass == SC_EXPORT);
if (CastType<FunctionType>(returnType) != NULL) {
Error(pos, "Illegal to return function type from function.");
return;
}
returnType = returnType->ResolveUnboundVariability(Variability::Varying);
bool isExternC = ds && (ds->storageClass == SC_EXTERN_C);
bool isExported = ds && ((ds->typeQualifiers & TYPEQUAL_EXPORT) != 0);
bool isTask = ds && ((ds->typeQualifiers & TYPEQUAL_TASK) != 0);
bool isUnmasked = ds && ((ds->typeQualifiers & TYPEQUAL_UNMASKED) != 0);
if (isExported && isTask) {
Error(pos, "Function can't have both \"task\" and \"export\" "
"qualifiers");
return NULL;
return;
}
if (isExternC && isTask) {
Error(pos, "Function can't have both \"extern \"C\"\" and \"task\" "
"qualifiers");
return NULL;
return;
}
if (isExternC && isExported) {
Error(pos, "Function can't have both \"extern \"C\"\" and \"export\" "
"qualifiers");
return NULL;
return;
}
if (isUnmasked && isExported)
Warning(pos, "\"unmasked\" qualifier is redundant for exported "
"functions.");
if (child == NULL) {
AssertPos(pos, m->errorCount > 0);
return;
}
const Type *functionType =
const FunctionType *functionType =
new FunctionType(returnType, args, argNames, argDefaults,
argPos, isTask, isExported, isExternC);
functionType = functionType->ResolveUnboundVariability(Type::Varying);
return child->GetType(functionType, ds);
}
default:
FATAL("Unexpected decl kind");
return NULL;
}
argPos, isTask, isExported, isExternC, isUnmasked);
#if 0
// Make sure we actually have an array of structs ..
const StructType *childStructType =
dynamic_cast<const StructType *>(childType);
if (childStructType == NULL) {
Error(pos, "Illegal to provide soa<%d> qualifier with non-struct "
"type \"%s\".", soaWidth, childType->GetString().c_str());
return new ArrayType(childType, arraySize == -1 ? 0 : arraySize);
// handle any explicit __declspecs on the function
if (ds != NULL) {
for (int i = 0; i < (int)ds->declSpecList.size(); ++i) {
std::string str = ds->declSpecList[i].first;
SourcePos pos = ds->declSpecList[i].second;
if (str == "safe")
(const_cast<FunctionType *>(functionType))->isSafe = true;
else if (!strncmp(str.c_str(), "cost", 4)) {
int cost = atoi(str.c_str() + 4);
if (cost < 0)
Error(pos, "Negative function cost %d is illegal.",
cost);
(const_cast<FunctionType *>(functionType))->costOverride = cost;
}
else
Error(pos, "__declspec parameter \"%s\" unknown.", str.c_str());
}
else if ((soaWidth & (soaWidth - 1)) != 0) {
Error(pos, "soa<%d> width illegal. Value must be power of two.",
soaWidth);
return NULL;
}
else if (arraySize != -1 && (arraySize % soaWidth) != 0) {
Error(pos, "soa<%d> width must evenly divide array size %d.",
soaWidth, arraySize);
return NULL;
}
return new SOAArrayType(childStructType, arraySize == -1 ? 0 : arraySize,
soaWidth);
#endif
}
const Type *
Declarator::GetType(DeclSpecs *ds) const {
const Type *baseType = ds->GetBaseType(pos);
const Type *type = GetType(baseType, ds);
return type;
}
Symbol *
Declarator::GetSymbolForFunctionParameter(int paramNum) const {
Assert(paramNum < (int)functionParams.size());
Declaration *d = functionParams[paramNum];
char buf[32];
Symbol *sym;
if (d->declarators.size() == 0) {
// function declaration like foo(float), w/o a name for
// the parameter
sprintf(buf, "__anon_parameter_%d", paramNum);
sym = new Symbol(buf, pos);
sym->type = d->declSpecs->GetBaseType(pos);
}
else {
Assert(d->declarators.size() == 1);
sym = d->declarators[0]->GetSymbol();
if (sym == NULL) {
// Handle more complex anonymous declarations like
// float (float **).
sprintf(buf, "__anon_parameter_%d", paramNum);
sym = new Symbol(buf, d->declarators[0]->pos);
sym->type = d->declarators[0]->GetType(d->declSpecs);
}
}
return sym;
}
child->InitFromType(functionType, ds);
type = child->type;
name = child->name;
}
}
///////////////////////////////////////////////////////////////////////////
// Declaration
@@ -529,6 +629,7 @@ Declaration::Declaration(DeclSpecs *ds, Declarator *d) {
}
std::vector<VariableDeclaration>
Declaration::GetVariableDeclarations() const {
Assert(declSpecs->storageClass != SC_TYPEDEF);
@@ -536,18 +637,23 @@ Declaration::GetVariableDeclarations() const {
for (unsigned int i = 0; i < declarators.size(); ++i) {
Declarator *decl = declarators[i];
if (decl == NULL)
if (decl == NULL || decl->type == NULL) {
// Ignore earlier errors
Assert(m->errorCount > 0);
continue;
}
Symbol *sym = decl->GetSymbol();
sym->type = sym->type->ResolveUnboundVariability(Type::Varying);
if (dynamic_cast<const FunctionType *>(sym->type) == NULL) {
if (decl->type->IsVoidType())
Error(decl->pos, "\"void\" type variable illegal in declaration.");
else if (CastType<FunctionType>(decl->type) == NULL) {
decl->type = decl->type->ResolveUnboundVariability(Variability::Varying);
Symbol *sym = new Symbol(decl->name, decl->pos, decl->type,
decl->storageClass);
m->symbolTable->AddVariable(sym);
vars.push_back(VariableDeclaration(sym, decl->initExpr));
}
}
return vars;
}
@@ -558,18 +664,19 @@ Declaration::DeclareFunctions() {
for (unsigned int i = 0; i < declarators.size(); ++i) {
Declarator *decl = declarators[i];
if (decl == NULL)
if (decl == NULL || decl->type == NULL) {
// Ignore earlier errors
Assert(m->errorCount > 0);
continue;
}
Symbol *sym = decl->GetSymbol();
sym->type = sym->type->ResolveUnboundVariability(Type::Varying);
if (dynamic_cast<const FunctionType *>(sym->type) == NULL)
const FunctionType *ftype = CastType<FunctionType>(decl->type);
if (ftype == NULL)
continue;
bool isInline = (declSpecs->typeQualifiers & TYPEQUAL_INLINE);
m->AddFunctionDeclaration(sym, isInline);
m->AddFunctionDeclaration(decl->name, ftype, decl->storageClass,
isInline, decl->pos);
}
}
@@ -583,13 +690,14 @@ Declaration::Print(int indent) const {
declarators[i]->Print(indent+4);
}
///////////////////////////////////////////////////////////////////////////
void
GetStructTypesNamesPositions(const std::vector<StructDeclaration *> &sd,
std::vector<const Type *> *elementTypes,
std::vector<std::string> *elementNames,
std::vector<SourcePos> *elementPositions) {
llvm::SmallVector<const Type *, 8> *elementTypes,
llvm::SmallVector<std::string, 8> *elementNames,
llvm::SmallVector<SourcePos, 8> *elementPositions) {
std::set<std::string> seenNames;
for (unsigned int i = 0; i < sd.size(); ++i) {
const Type *type = sd[i]->type;
@@ -599,35 +707,41 @@ GetStructTypesNamesPositions(const std::vector<StructDeclaration *> &sd,
// FIXME: making this fake little DeclSpecs here is really
// disgusting
DeclSpecs ds(type);
if (type->IsUniformType())
ds.typeQualifiers |= TYPEQUAL_UNIFORM;
else if (type->IsVaryingType())
ds.typeQualifiers |= TYPEQUAL_VARYING;
if (type->IsVoidType() == false) {
if (type->IsUniformType())
ds.typeQualifiers |= TYPEQUAL_UNIFORM;
else if (type->IsVaryingType())
ds.typeQualifiers |= TYPEQUAL_VARYING;
else if (type->GetSOAWidth() != 0)
ds.soaWidth = type->GetSOAWidth();
// FIXME: ds.vectorSize?
}
for (unsigned int j = 0; j < sd[i]->declarators->size(); ++j) {
Declarator *d = (*sd[i]->declarators)[j];
d->InitFromDeclSpecs(&ds);
Symbol *sym = d->GetSymbol();
if (d->type->IsVoidType())
Error(d->pos, "\"void\" type illegal for struct member.");
const ArrayType *arrayType =
dynamic_cast<const ArrayType *>(sym->type);
if (arrayType != NULL && arrayType->GetElementCount() == 0) {
Error(d->pos, "Unsized arrays aren't allowed in struct "
"definitions.");
elementTypes->push_back(NULL);
}
else
elementTypes->push_back(sym->type);
elementTypes->push_back(d->type);
if (seenNames.find(sym->name) != seenNames.end())
if (seenNames.find(d->name) != seenNames.end())
Error(d->pos, "Struct member \"%s\" has same name as a "
"previously-declared member.", sym->name.c_str());
"previously-declared member.", d->name.c_str());
else
seenNames.insert(sym->name);
seenNames.insert(d->name);
elementNames->push_back(sym->name);
elementPositions->push_back(sym->pos);
elementNames->push_back(d->name);
elementPositions->push_back(d->pos);
}
}
for (int i = 0; i < (int)elementTypes->size() - 1; ++i) {
const ArrayType *arrayType = CastType<ArrayType>((*elementTypes)[i]);
if (arrayType != NULL && arrayType->GetElementCount() == 0)
Error((*elementPositions)[i], "Unsized arrays aren't allowed except "
"for the last member in a struct definition.");
}
}

decl.h

@@ -1,5 +1,5 @@
/*
Copyright (c) 2010-2011, Intel Corporation
Copyright (c) 2010-2013, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
@@ -28,7 +28,7 @@
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/** @file decl.h
@@ -47,30 +47,21 @@
variables--here, that the declaration has the 'static' and 'uniform'
qualifiers, and that its basic type is 'int'. Then for each variable
declaration, the Declaration class holds an instance of a Declarator,
which in turn records the per-variable information like the symbol
name, array size (if any), initializer expression, etc.
which in turn records the per-variable information like the name, array
size (if any), initializer expression, etc.
*/
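/* An illustrative sketch (not from the source): given the ispc
   declaration

       static uniform int x = 1, y[8];

   a single DeclSpecs records the shared pieces (storage class
   SC_STATIC, the TYPEQUAL_UNIFORM qualifier, base type 'int'),
   while two Declarators record the per-variable pieces: one for 'x'
   holding the initializer expression '1', and one for 'y' with
   arraySize == 8. */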
#ifndef ISPC_DECL_H
#define ISPC_DECL_H
#include "ispc.h"
#include <llvm/ADT/SmallVector.h>
struct VariableDeclaration;
class Declaration;
class Declarator;
enum StorageClass {
SC_NONE,
SC_EXTERN,
SC_EXPORT,
SC_STATIC,
SC_TYPEDEF,
SC_EXTERN_C
};
/* Multiple qualifiers can be provided with types in declarations;
therefore, they are set up so that they can be ANDed together into an
int. */
@@ -82,6 +73,8 @@ enum StorageClass {
#define TYPEQUAL_SIGNED (1<<4)
#define TYPEQUAL_UNSIGNED (1<<5)
#define TYPEQUAL_INLINE (1<<6)
#define TYPEQUAL_EXPORT (1<<7)
#define TYPEQUAL_UNMASKED (1<<8)
/** @brief Representation of the declaration specifiers in a declaration.
@@ -90,7 +83,8 @@ enum StorageClass {
*/
class DeclSpecs {
public:
DeclSpecs(const Type *t = NULL, StorageClass sc = SC_NONE, int tq = TYPEQUAL_NONE);
DeclSpecs(const Type *t = NULL, StorageClass sc = SC_NONE,
int tq = TYPEQUAL_NONE);
void Print() const;
@@ -117,6 +111,8 @@ public:
SOA width specified. Otherwise this is zero.
*/
int soaWidth;
std::vector<std::pair<std::string, SourcePos> > declSpecList;
};
@@ -128,7 +124,7 @@ enum DeclaratorKind {
DK_FUNCTION
};
/** @brief Representation of the declaration of a single variable.
/** @brief Representation of the declaration of a single variable.
In conjunction with an instance of the DeclSpecs, this gives us
everything we need for a full variable declaration.
@@ -138,25 +134,11 @@ public:
Declarator(DeclaratorKind dk, SourcePos p);
/** Once a DeclSpecs instance is available, this method completes the
initialization of the Symbol, setting its Type accordingly.
initialization of the type member.
*/
void InitFromDeclSpecs(DeclSpecs *ds);
/** Get the actual type of the combination of Declarator and the given
DeclSpecs. If an explicit base type is provided, the declarator is
applied to that type; otherwise the base type from the DeclSpecs is
used. */
const Type *GetType(DeclSpecs *ds) const;
const Type *GetType(const Type *base, DeclSpecs *ds) const;
/** Returns the symbol corresponding to the function declared by this
declarator and symbols for its arguments in *args. */
Symbol *GetFunctionInfo(DeclSpecs *ds, std::vector<Symbol *> *args);
Symbol *GetSymbolForFunctionParameter(int paramNum) const;
/** Returns the symbol associated with the declarator. */
Symbol *GetSymbol() const;
void InitFromType(const Type *base, DeclSpecs *ds);
void Print(int indent) const;
@@ -177,18 +159,24 @@ public:
/** Type qualifiers provided with the declarator. */
int typeQualifiers;
StorageClass storageClass;
/** For array declarators, this gives the declared size of the array.
Unsized arrays have arraySize == 0. */
Unsized arrays have arraySize == 0. */
int arraySize;
/** Symbol associated with the declarator. */
Symbol *sym;
/** Name associated with the declarator. */
std::string name;
/** Initialization expression for the variable. May be NULL. */
Expr *initExpr;
/** Type of the declarator. This is NULL until InitFromDeclSpecs() or
InitFromType() is called. */
const Type *type;
/** For function declarations, this holds the Declaration *s for the
funciton's parameters. */
function's parameters. */
std::vector<Declaration *> functionParams;
};
@@ -233,8 +221,8 @@ struct StructDeclaration {
/** Given a set of StructDeclaration instances, this returns the types of
the elements of the corresponding struct and their names. */
extern void GetStructTypesNamesPositions(const std::vector<StructDeclaration *> &sd,
std::vector<const Type *> *elementTypes,
std::vector<std::string> *elementNames,
std::vector<SourcePos> *elementPositions);
llvm::SmallVector<const Type *, 8> *elementTypes,
llvm::SmallVector<std::string, 8> *elementNames,
llvm::SmallVector<SourcePos, 8> *elementPositions);
#endif // ISPC_DECL_H


@@ -1,3 +1,604 @@
=== v1.9.1 === (8 July 2016)
An ISPC update with a new native AVX512 target for future Xeon CPUs and
improvements for debugging, including a new switch, --dwarf-version, to support
debugging on old systems.
The release is based on a patched LLVM 3.8.
=== v1.9.0 === (12 Feb 2016)
An ISPC release with AVX512 (KNL flavor) support and a number of bug fixes,
based on a fresh LLVM 3.8 backend.
For AVX512, two modes are supported: generic and native. For instructions on how
to use them, please refer to the wiki. Going forward we assume that native mode
is the primary way to get AVX512 support and that generic mode will be deprecated.
If you observe significantly better performance in generic mode, please report
it via github issues.
Starting with this release, we are shipping two versions on Windows:
(1) for VS2013 and earlier releases
(2) for VS2015 and newer releases
The reason for doing this is the redesigned C run-time library in VS.
The implementation of the "print" ISPC standard library function relies on the C
runtime library, which has changed. If you are not using the "print" function in
your code, you are safe to use either version.
A new option was introduced to improve debugging: --no-omit-frame-pointer.
=== v1.8.2 === (29 May 2015)
An ISPC update with several important stability fixes and an experimental
AVX512 support.
The current level of AVX512 support targets the new generation of Xeon Phi,
codename Knights Landing. It's implemented in two different ways: as generic and
native targets. The generic target is similar to KNC support, requires the Intel
C/C++ Compiler (15.0 and newer), and is available in the regular ISPC build, which
is based on LLVM 3.6.1. For the native AVX512 target, we have a separate ISPC
build, which is based on LLVM trunk (3.7). This build is less stable and has
several known issues. Nevertheless, if you are interested in AVX512 support for
your code, we encourage you to try it and report the bugs. We are actively working
with LLVM maintainers to fix all AVX512 bugs, so your feedback is important for
us and will ensure that bugs affecting your code are fixed by the LLVM 3.7 release.
Other notable changes and fixes include:
* Broadwell support via --cpu=broadwell.
* Changed cpu naming to accept cpu codenames. Check help for more details.
* --cpu switch disallowed in multi-target mode.
* Alignment of structure fields (in generated header files) is changed to be
more consistent regardless of the C/C++ compiler used.
* --dllexport switch is added on Windows to mark non-static functions as DLL
exports.
* --print-target switch is added to dump details of the LLVM target machine.
This may help you debug issues with code generation for an incorrect target
(or, more likely, to ensure that code generation is done right).
* A bug was fixed that triggered uniform statements to be executed with an
all-off mask under some circumstances.
* The restriction on using some uniform types as a return type in multi-target
mode with targets of different widths was relaxed.
Also, if you are using ISPC for code generation for the current generation of
Xeon Phi (Knights Corner), the following changes are for you:
* A bunch of stability fixes for KNC.
* A bug was fixed that affected projects with multiple ISPC source files compiled
with the generic target. As a side effect, you may see multiple warnings about
unused static functions - add the "-wd177" switch to ICC when compiling generic output files.
The release includes LLVM 3.6.1 binaries for Linux, MacOS, Windows, and a Windows-based
cross-compiler for Sony PlayStation4. An LLVM 3.5 based experimental Linux binary with
NVPTX support (now also supporting the K80) is included as well.
Native AVX512 support is available in the set of less stable LLVM 3.7 based binaries
for Linux, MacOS and Windows.
=== v1.8.1 === (31 December 2014)
A minor update of ``ispc`` with several important stability fixes, namely:
* The auto-dispatch mechanism is fixed in pre-built Linux binaries (it used to
select too conservative a target).
* Compile crash with "-O2 -g" is fixed.
Also KNC (Xeon Phi) support is further improved.
The release includes an experimental build for the Sony PlayStation4 target (Windows
cross compiler), as well as experimental NVPTX support (64-bit Linux binaries
only). Note that NVPTX compilation might fail with CUDA 7.0.
Similar to 1.8.0 all binaries are based on LLVM 3.5. MacOS binaries are built
for MacOS 10.9 Mavericks. Linux binaries are compatible with kernel 2.6.32
(ok for RHEL6) and later.
=== v1.8.0 === (16 October 2014)
A major new version of ISPC, which introduces experimental support for NVPTX
target, brings numerous improvements to our KNC (Xeon Phi) support, introduces
debugging support on Windows and fixes several bugs. We also ship experimental
build for Sony PlayStation4 target in this release. Binaries for all platforms
are based on LLVM 3.5.
Note that MacOS binaries are built for MacOS 10.9 Mavericks. Linux binaries are
compatible with kernel 2.6.32 (ok for RHEL6) and later.
More details:
* Experimental NVPTX support is available for users of our binary distribution
on Linux only at the moment. MacOS and Windows users willing to experiment
with this target are welcome to build it from source. Note that the GPU imposes
some limitations on the ISPC language, which are discussed in the corresponding
section of the ISPC User's Guide. Implementation of NVPTX support was done by our
contributor Evghenii Gaburov.
* KNC support was greatly extended in the knc.h header file. Beyond new features
there are stability fixes and changes for icc 15.0 compatibility. Stdlib
prefetch functions were improved to map to KNC vector prefetches.
* The experimental PS4 build is a Windows-to-PS4 cross compiler, which disables arch
and cpu selection (these are preset to the PS4 hardware).
* Debug info support on Windows (compatible with VS2010, VS2012 and VS2013).
* A critical bug was fixed that, under some conditions, caused code generation
for an incorrect target despite explicit target switches.
* A stability fix for a bug that caused the print() function to execute under an
all-off mask under some conditions.
=== v1.7.0 === (18 April 2014)
A major new version of ISPC with several language and library extensions and
fixes in debug info support. Binaries for all platforms are based on a patched
version of LLVM 3.4. There are also performance improvements beyond the
switchover to LLVM 3.4.
The list of language and library changes:
* Support for varying types in exported functions was added. See documentation
for more details.
* The get_programCount() function was moved from stdlib.ispc to
examples/util/util.isph, which needs to be included somewhere in your
project if you want to use it.
* Library functions for saturated arithmetic were added. add/sub/mul/div
operations are supported for signed and unsigned 8/16/32/64 integer types
(both uniform and varying).
* The algorithm for selecting an overloaded function was extended to cover more
types of overloading. Handling of reference types in overloaded functions was
fixed. The rules for selecting the best match were changed to match C++,
which requires the function to be the best match for all parameters. In
ambiguous cases, a warning is issued, but it will be converted to an error
in the next release.
* Explicit typecasts between any two reference types were allowed.
* Implicit cast of pointer to const type to void* was disallowed.
The list of other notable changes is:
* A number of fixes for better debug info support.
* A memory corruption bug was fixed, which caused rare, non-reproducible
compile-time failures.
* Alias analysis was enabled (more aggressive optimizations are expected).
* A bug involving inaccurate handling of the "const" qualifier was fixed. As a
result, more "const" qualifiers may appear in .h files, which may cause
compilation errors.
=== v1.6.0 === (19 December 2013)
A major new version of ISPC with major improvements in performance and
stability. Linux and MacOS binaries are based on patched version of LLVM 3.3,
while Windows version is based on LLVM 3.4rc3. LLVM 3.4 significantly improves
stability on Win32 platform, so we've decided not to wait for official LLVM 3.4
release.
The list of the most significant changes is:
* A new avx1-i32x4 target was added. It may work well for you if you are focused
on integer computations or if the FP unit in your hardware is 128 bits wide.
* Support for calculations in double precision was extended with two new
targets avx1.1-i64x4 and avx2-i64x4.
* Language support for overloaded operators was added (a brief sketch follows
after this list).
* A new library shift() function was added, which is similar to rotate(), but is
non-circular.
* The language was extended to accept 3-dimensional tasking - syntactic sugar
that may facilitate programming of some tasks.
* A regression that broke --opt=force-aligned-memory was fixed.
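An illustrative sketch of operator overloading (assumed C++-style syntax; this
example is not from the release notes):

    struct Vec2 { float x, y; };

    Vec2 operator+(Vec2 a, Vec2 b) {
        Vec2 r;
        r.x = a.x + b.x;  // element-wise add across the gang
        r.y = a.y + b.y;
        return r;
    }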
If you are not using pre-built binaries, you may notice the following changes:
* VS2012/VS2013 are supported.
* alloy.py (with -b switch) can build LLVM for you on any platform now
(except MacOS 10.9, but we know about the problem and are working on it).
This is a preferred way to build LLVM for ISPC, as all required patches for
better performance and stability will automatically apply.
* LLVM 3.5 (current trunk) is supported.
There are also multiple fixes for better performance and stability, most
notable are:
* Fixed performance problem for x2 targets.
* Fixed a problem with incorrect vzeroupper insertion on AVX target on Win32.
=== v1.5.0 === (27 September 2013)
A major new version of ISPC with several new targets and important bug fixes.
Here's a list of the most important changes, if you are using pre-built
binaries (which are based on a patched version of LLVM 3.3):
* The naming of targets was changed to explicitly include the data type width and
the number of threads in the gang. For example, avx2-i32x8 is an avx2 target,
which uses 32-bit types as a base and has 8 threads in a gang. The old naming
scheme is still supported, but deprecated.
* New SSE4 targets for calculations based on 8 bit and 16 bit data types:
sse4-i8x16 and sse4-i16x8.
* New AVX1 target for calculations based on 64 bit data types: avx1-i64x4.
* SVML support was extended and improved.
* The behavior of the -g switch was changed to not affect the optimization level.
* The ISPC debug infrastructure was redesigned. See --help-dev for more info and
enjoy the capabilities of the new --debug-phase=<value> and --off-phase=<value>
switches.
* Fixed an auto-dispatch bug, which caused AVX code execution when the OS doesn't
support AVX (but the hardware does).
* Fixed a bug, which discarded uniform/varying keyword in typedefs.
* Several performance regressions were fixed.
If you are building ISPC yourself, then the following changes are also available
to you:
* --cpu=slm for targeting Intel Atom codename Silvermont (if LLVM 3.4 is used).
* ARM NEON targets are available (if enabled in build system).
* --debug-ir=<value> is available to generate debug information based on LLVM
IR (if LLVM 3.4 is used). In the debugger you'll see LLVM IR instead of source
code.
* A redesigned and improved test and configuration management system is
available to facilitate the process of building LLVM and testing the ISPC
compiler.
Standard library changes/fixes:
* The __pause() function was removed from the standard library.
* Fixed reduce_[min|max]_[float|double] intrinsics, which were producing
incorrect code under some conditions.
Language changes:
* By default, a floating-point constant without a suffix is a single-precision
constant (32 bit). A new suffix "d" was introduced to allow a double-precision
constant (64 bit). Please refer to tests/double-consts.ispc for syntax
examples, and see the brief sketch below.
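For example (a minimal sketch of the new suffix):

    uniform float  f = 3.14;    // single-precision constant by default
    uniform double d = 3.14d;   // "d" suffix gives a double-precision constant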
=== v1.4.4 === (19 July 2013)
A minor version update with several stability fixes requested by customers.
=== v1.4.3 === (25 June 2013)
A minor version update with several stability improvements:
* Two bugs were fixed (including a bug in LLVM) to improve stability on 32 bit
platforms.
* A bug affecting several examples was fixed.
* --instrument switch is fixed.
All tests and examples now properly compile and execute on native targets on
Unix platforms (Linux and MacOS).
=== v1.4.2 === (11 June 2013)
A minor version update with a few important changes:
* Stability fix for the AVX2 target (Haswell) - the fix for a problem with gather
instructions was released in LLVM 3.4; if you build with LLVM 3.2 or 3.3, it's
available in our repository (llvm_patches/r183327-AVX2-GATHER.patch) and needs to
be applied manually.
* Stability fix for a widespread issue on the Win32 platform (#503).
* Performance improvements for Xeon Phi related to mask representation.
Also LLVM 3.3 has been released and now it's the recommended version for building ISPC.
Precompiled binaries are also built with LLVM 3.3.
=== v1.4.1 === (28 May 2013)
A major new version of ispc has been released with stability and performance
improvements on all supported platforms (Windows, Linux and MacOS).
This version supports LLVM 3.1, 3.2, 3.3 and 3.4. The released binaries are built with 3.2.
New compiler features:
* ISPC memory allocation returns memory aligned to the platform's natural
vector-register alignment by default. Alignment can also be managed via
--force-alignment=<value>.
Important bug fixes/changes:
* ISPC was fixed to be fully functional when built by GCC 4.7.
* Major cleanup of build and test scripts on Windows.
* Gather/scatter performance improvements on Xeon Phi.
* FMA instructions are enabled for AVX2 instruction set.
* Support for the RDRAND instruction, when available, via the library function
rdrand (Ivy Bridge).
The release also contains numerous bug fixes and minor improvements.
=== v1.3.0 === (29 June 2012)
This is a major new release of ispc, with support for more compilation
targets and a number of additions to the language. As usual, the quality
of generated code has also been improved in a number of cases and a number
of small bugs have been fixed.
New targets:
* This release provides "beta" support for compiling to Intel® Xeon
Phi™ processor, code named Knights Corner, the first processor in
the Intel® Many Integrated Core Architecture. See
http://ispc.github.com/ispc.html#compiling-for-the-intel-xeon-phi-architecture
for more details on this support.
* This release also has an "avx1.1" target, which provides support for the
new instructions in the Intel Ivy Bridge microarchitecture.
New language features:
* The foreach_active statement allows iteration over the active program
instances in a gang. (See
http://ispc.github.com/ispc.html#iteration-over-active-program-instances-foreach-active)
* foreach_unique allows iterating over subsets of program instances in a
gang that share the same value of a variable. (See
http://ispc.github.com/ispc.html#iteration-over-unique-elements-foreach-unique)
* An "unmasked" function qualifier and statement in the language allow
re-activating execution of all program instances in a gang. (See
http://ispc.github.com/ispc.html#re-establishing-the-execution-mask
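An illustrative sketch of these iteration constructs (not from the release
notes):

    int x = programIndex / 2;   // example varying value with duplicates
    // The body runs once per active program instance; 'i' is that
    // instance's programIndex value.
    foreach_active (i) {
        // per-instance serial work
    }
    // The body runs once per distinct value of x across the gang;
    // 'ux' is uniform within each iteration.
    foreach_unique (ux in x) {
        // work shared by all instances where x == ux
    }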
Standard library updates:
* The seed_rng() function has been modified to take a "varying" seed value
when a varying RNGState is being initialized.
* An isnan() function has been added, to check for floating-point "not a
number" values.
* The float_to_srgb8() routine does high performance conversion of
floating-point color values to SRGB8 format.
Other changes:
* A number of bugfixes have been made for compiler crashes with malformed
programs.
* Floating-point comparisons are now "unordered", so that any comparison
where one of the operands is a "not a number" value returns false. (This
matches standard IEEE floating-point behavior.)
* The code generated for 'break' statements in "varying" loops has been
improved for some common cases.
* Compile time and compiler memory use have both been improved,
particularly for large input programs.
* A number of bugs have been fixed in the debugging information generated
by the compiler when the "-g" command-line flag is used.
=== v1.2.2 === (20 April 2012)
This release includes a number of small additions to functionality and a
number of bugfixes. New functionality includes:
* It's now possible to forward declare structures as in C/C++: "struct
Foo;". After such a declaration, structs with pointers to "Foo" and
functions that take pointers or references to Foo structs can be declared
without the entire definition of Foo being available (see the sketch after
this list).
* New built-in types size_t, ptrdiff_t, and [u]intptr_t are now available,
corresponding to the equivalent types in C.
* The standard library now provides atomic_swap*() and
atomic_compare_exchange*() functions for void * types.
* The C++ backend has seen a number of improvements to the quality and
readability of generated code.
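A minimal sketch of the forward-declaration feature (illustrative, not from
the release notes):

    struct Foo;                     // forward declaration
    struct Node { Foo *payload; };  // pointer to Foo is allowed here
    void consume(Foo &f);           // as is a reference parameter
    struct Foo { int x; };          // the full definition can come later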
A number of bugs have been fixed in this release as well. The most
significant are:
* Fixed a bug where nested loops could cause a compiler crash in some
circumstances (issues #240 and #229)
* Gathers could access invalid memory (and cause the program to crash) in
some circumstances (#235)
* References to temporary values are now handled properly when passed to a
function that takes a reference typed parameter.
* A case where incorrect code could be generated for compile-time-constant
initializers has been fixed (#234).
=== v1.2.1 === (6 April 2012)
This release contains only minor new functionality; it mostly provides many
small bugfixes and improvements to error handling and error reporting.
The new functionality that is present is:
* Significantly more efficient versions of the float / half conversion
routines are now available in the standard library, thanks to Fabian
Giesen.
* The last member of a struct can now be a zero-length array; this allows
the trick of dynamically allocating enough storage for the struct and
some number of array elements at the end of it.
Significant bugs fixed include:
* Issue #205: When a target ISA isn't specified, use the host system's
capabilities to choose a target for which it will be able to run the
generated code.
* Issues #215 and #217: Don't allocate storage for global variables that
are declared "extern".
* Issue #197: Allow NULL as a default argument value in a function
declaration.
* Issue #223: Fix bugs where taking the address of a function wouldn't work
as expected.
* Issue #224: When there are overloaded variants of a function that take
both reference and const reference parameters, give the non-const
reference preference when matching values of that underlying type.
* Issue #225: An error is issued when a varying lvalue is assigned to a
reference type (rather than crashing).
* Issue #193: Permit conversions from array types to void *, not just the
pointer type of the underlying array element.
* Issue #199: Still evaluate expressions that are cast to (void).
The documentation has also been improved, with FAQs added to clarify some
aspects of the ispc pointer model.
=== v1.2.0 === (20 March 2012)
This is a major new release of ispc, with a number of significant
improvements to functionality, performance, and compiler robustness. It
does, however, include three small changes to language syntax and semantics
that may require changes to existing programs:
* Syntax for the "launch" keyword has been cleaned up; it's now no longer
necessary to bracket the launched function call with angle brackets.
(In other words, now use "launch foo();", rather than "launch < foo() >;".)
* When using pointers, the pointed-to data type is now "uniform" by
default. Use the varying keyword to specify varying pointed-to types when
needed. (i.e. "float *ptr" is a varying pointer to uniform float data,
whereas previously it was a varying pointer to varying float values.)
Use "varying float *" to specify a varying pointer to varying float data,
and so forth.
* The details of "uniform" and "varying" and how they interact with struct
types have been cleaned up. Now, when a struct type is declared, if the
struct elements don't have explicit "uniform" or "varying" qualifiers,
they are said to have "unbound" variability. When a struct type is
instantiated, any unbound variability elements inherit the variability of
the parent struct type. See http://ispc.github.com/ispc.html#struct-types
for more details.
ispc has a new language feature that makes it much easier to use the
efficient "(array of) structure of arrays" (AoSoA, or SoA) memory layout of
data. A new "soa<n>" qualifier can be applied to structure types to
specify an n-wide SoA version of the corresponding type. Array indexing
and pointer operations with arrays of SoA types automatically handle the
two-stage indexing calculation to access the data. See
http://ispc.github.com/ispc.html#structure-of-array-types for more details.
For more efficient access of data that is still in "array of structures"
(AoS) format, ispc has a new "memory coalescing" optimization that
automatically detects series of strided loads and/or gathers that can be
transformed into a more efficient set of vector loads and shuffles. A
diagnostic is emitted when this optimization is successfully applied.
Smaller changes in this release:
* The standard library now provides memcpy(), memmove() and memset()
functions, as well as single-precision asin() and acos() functions.
* -I can now be specified on the command-line to specify a search path for
#include files.
* A number of improvements have been made to error reporting from the
parser, and a number of cases where malformed programs could cause the
compiler to crash have been fixed.
* A number of small improvements to the quality and performance of generated
code have been made, including finding more cases where 32-bit addressing
calculations can be safely done on 64-bit systems and generating better
code for initializer expressions.
=== v1.1.4 === (4 February 2012)
There are two major bugfixes for Windows in this release. First, a number
of failures in AVX code generation on Windows have been fixed; AVX on
Windows now has no known issues. Second, a longstanding bug in parsing 64-bit
integer constants on Windows has been fixed.
This release features a new experimental scalar target, contributed by Gabe
Weisz <gweisz@cs.cmu.edu>. This target ("--target=generic-1") compiles
gangs of single program instances (i.e. programCount == 1); it can be
useful for debugging ispc programs.
The compiler now supports dynamic memory allocation in ispc programs (with
"new" and "delete" operators based on C++). See
http://ispc.github.com/ispc.html#dynamic-memory-allocation in the
documentation for more information.
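A minimal sketch of the new allocation operators (illustrative, not from the
release notes):

    // each active program instance gets its own allocation
    int *p = new int;
    *p = programIndex;
    delete p;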
ispc now performs "short circuit" evaluation of the || and && logical
operators and the ? : selection operator. (This represents the correction
of a major incompatibility with C.) Code like "(index < arraySize &&
array[index] == 1)" thus now executes as in C, where "array[index]" won't
be evaluated unless "index" is less than "arraySize".
The standard library now provides "local" atomic operations, which are
atomic across the gang of program instances (but not across other gangs or
other hardware threads). See the updated documentation on atomics for more
information:
http://ispc.github.com/ispc.html#atomic-operations-and-memory-fences.
The standard library now offers a clock() function, which returns a uniform
int64 value that counts processor cycles; it can be used for
fine-resolution timing measurements.
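For instance (a minimal sketch of cycle-count timing with clock()):

    uniform int64 start = clock();
    // ... code being measured ...
    uniform int64 elapsed = clock() - start;  // elapsed processor cycles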
Finally (of limited interest now): ispc now supports the forthcoming AVX2
instruction set, due with Haswell-generation CPUs. All tests and examples
compile and execute correctly with AVX2. (Thanks specifically to Craig
Topper and Nadav Rotem for work on AVX2 support in LLVM, which made this
possible.)
=== v1.1.3 === (20 January 2012)
With this release, the language now supports "switch" statements, with the
same semantics and syntax as in C.
This release includes fixes for two important performance related issues:
the quality of code generated for "foreach" statements has been
substantially improved (https://github.com/ispc/ispc/issues/151), and a
performance regression with code for "gathers" that was introduced in
v1.1.2 has been fixed in this release.
A number of other small bugs were fixed in this release as well, including
one where invalid memory would sometimes be incorrectly accessed
(https://github.com/ispc/ispc/issues/160).
Thanks to Jean-Luc Duprat for a number of patches that improve support for
building on various platforms, and to Pierre-Antoine Lacaze for patches so
that ispc builds under MinGW.
=== v1.1.2 === (9 January 2012)
The major new feature in this release is support for "generic" C++


@@ -1,11 +1,16 @@
#!/bin/bash
rst2html=rst2html.py
for i in ispc perfguide faq; do
rst2html.py --template=template.txt --link-stylesheet \
$rst2html --template=template.txt --link-stylesheet \
--stylesheet-path=css/style.css $i.rst > $i.html
done
rst2html.py --template=template-perf.txt --link-stylesheet \
$rst2html --template=template-news.txt --link-stylesheet \
--stylesheet-path=css/style.css news.rst > news.html
$rst2html --template=template-perf.txt --link-stylesheet \
--stylesheet-path=css/style.css perf.rst > perf.html
#rst2latex --section-numbering --documentclass=article --documentoptions=DIV=9,10pt,letterpaper ispc.txt > ispc.tex


@@ -1,10 +1,10 @@
=============================================================
Intel® SPMD Program Compiler Frequently Asked Questions (FAQ)
=============================================================
=====================================
Frequently Asked Questions About ispc
=====================================
This document includes a number of frequently (and not frequently) asked
questions about ispc, the Intel® SPMD Program Compiler. The source to this
document is in the file ``docs/faq.txt`` in the ``ispc`` source
document is in the file ``docs/faq.rst`` in the ``ispc`` source
distribution.
* Understanding ispc's Output
@@ -14,11 +14,24 @@ distribution.
+ `Why are there multiple versions of exported ispc functions in the assembly output?`_
+ `How can I more easily see gathers and scatters in generated assembly?`_
* Running The Compiler
+ `Why is it required to use one of the "generic" targets with C++ output?`_
+ `Why won't the compiler generate an object file or assembly output with the "generic" targets?`_
* Language Details
+ `What is the difference between "int *foo" and "int foo[]"?`_
+ `Why are pointed-to types "uniform" by default?`_
+ `Why am I getting an error about assigning a varying lvalue to a reference type?`_
* Interoperability
+ `How can I supply an initial execution mask in the call from the application?`_
+ `How can I generate a single binary executable with support for multiple instruction sets?`_
+ `How can I determine at run-time which vector instruction set's instructions were selected to execute?`_
+ `Is it possible to inline ispc functions in C/C++ code?`_
+ `Why is it illegal to pass "varying" values from C/C++ to ispc functions?`_
* Programming Techniques
@@ -26,6 +39,8 @@ distribution.
+ `How can a gang of program instances generate variable amounts of output efficiently?`_
+ `Is it possible to use ispc for explicit vector programming?`_
+ `How can I debug my ispc programs using Valgrind?`_
+ `foreach statements generate more complex assembly than I'd expect; what's going on?`_
+ `How do I launch an individual task for each active program instance?`_
Understanding ispc's Output
===========================
@@ -212,6 +227,174 @@ easier to understand:
jmp ___pseudo_scatter_base_offsets32_32 ## TAILCALL
Running The Compiler
====================
Why is it required to use one of the "generic" targets with C++ output?
-----------------------------------------------------------------------
The C++ output option transforms the provided ``ispc`` program source into
C++ code where each basic operation in the program (addition, comparison,
etc.) is represented as a function call to an as-yet-undefined function,
chaining the results of these calls together to perform the required
computations. It is then expected that the user will provide the
implementation of these functions via a header file with ``inline``
functions defined for each of these functions and then use a C++ compiler
to generate a final object file. (Examples of these headers include
``examples/intrinsics/sse4.h`` and ``examples/intrinsics/knc.h`` in the
``ispc`` distribution.)
If a target other than one of the "generic" ones is used with C++ output,
then the compiler will transform certain operations into particular code
sequences that may not be desired for the actual final target; for example,
SSE targets that don't have hardware "gather" instructions will transform a
gather into a sequence of scalar load instructions. When this in turn is
transformed to C++ code, the fact that the loads were originally a gather
is lost, and the header file of function definitions wouldn't have a chance
to map the "gather" to a target-specific operation, as the ``knc.h`` header
does, for example. Thus, the "generic" targets exist to provide basic
targets of various vector widths, without imposing any limitations on the
final target's capabilities.
Why won't the compiler generate an object file or assembly output with the "generic" targets?
---------------------------------------------------------------------------------------------
As described in the above FAQ entry, when compiling to the "generic"
targets, ``ispc`` generates vector code for the source program that
transforms every basic operation in the program (addition, comparison,
etc.) into a separate function call.
While there is no fundamental reason that the compiler couldn't generate
target-specific object code with a function call to an undefined function
for each primitive operation, doing so wouldn't actually be useful in
practice--providing definitions of these functions in a separate object
file and actually performing function calls for each of them (versus
turning them into inline function calls) would be a highly inefficient way
to run the program.
Therefore, in the interests of encouraging efficient use of the system,
these types of output are disallowed.
Language Details
================
What is the difference between "int \*foo" and "int foo[]"?
-----------------------------------------------------------
In C and C++, declaring a function to take a parameter ``int *foo`` and
``int foo[]`` results in the same type for the parameter. Both are
pointers to integers. In ``ispc``, these are different types. The first
one is a varying pointer to a uniform integer value in memory, while the
second results in a uniform pointer to the start of an array of varying
integer values in memory.
To understand why the first is a varying pointer to a uniform integer,
first recall that types without explicit rate qualifiers (``uniform``,
``varying``, or ``soa<>``) are ``varying`` by default. Second, recall from
the `discussion of pointer types in the ispc User's Guide`_ that pointed-to
types without rate qualifiers are ``uniform`` by default. (This second
rule is discussed further below, in `Why are pointed-to types "uniform" by
default?`_.) The type of ``int *foo`` follows from these.
.. _discussion of pointer types in the ispc User's Guide: ispc.html#pointer-types
Conversely, in a function body, ``int foo[10]`` represents a declaration of
a 10-element array of varying ``int`` values. Since we'd certainly like
to be able to pass such an array to a function that takes an ``int []``
parameter, the natural type for an ``int []`` parameter is a uniform
pointer to varying integer values.
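Summarizing the two parameter types (an illustrative sketch):

::

    void f(int *p);    // p: varying pointer to uniform int
    void g(int a[]);   // a: uniform pointer to varying int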
In terms of compatibility with C/C++, it's unfortunate that this
distinction exists, though any other set of rules seems to introduce more
awkwardness than this one. (Though we're interested to hear ideas for
improving these rules!)
Why are pointed-to types "uniform" by default?
----------------------------------------------
In ``ispc``, types without rate qualifiers are "varying" by default, but
types pointed to by pointers without rate qualifiers are "uniform" by
default. Why this difference?
::
int foo; // no rate qualifier, "varying int".
uniform int *foo; // pointer type has no rate qualifier, pointed-to does.
// "varying pointer to uniform int".
int *foo; // neither pointer type nor pointed-to type ("int") have
// rate qualifiers. Pointer type is varying by default,
// pointed-to is uniform. "varying pointer to uniform int".
varying int *foo; // varying pointer to varying int
The first rule, having types without rate qualifiers be varying by default,
is a default that keeps the number of "uniform" or "varying" qualifiers in
``ispc`` programs low. Most ``ispc`` programs use mostly "varying"
variables, so this rule allows most variables to be declared without also
requiring rate qualifiers.
On a related note, this rule allows many C/C++ functions to be used to
define equivalent functions in the SPMD execution model that ``ispc``
provides with little or no modification:
::
// scalar add in C/C++, SPMD/vector add in ispc
int add(int a, int b) { return a + b; }
This motivation also explains why ``uniform int *foo`` represents a varying
pointer; having pointers be varying by default if they don't have rate
qualifiers similarly helps with porting code from C/C++ to ``ispc``.
The trickier issue is why pointed-to types are "uniform" by default. In our
experience, data in memory that is accessed via pointers is most often
uniform; this generally includes all data that has been allocated and
initialized by the C/C++ application code. In practice, "varying" types are
more generally (but not exclusively) used for local data in ``ispc``
functions. Thus, making the pointed-to type uniform by default leads to
more concise code for the most common cases.
Why am I getting an error about assigning a varying lvalue to a reference type?
--------------------------------------------------------------------------------
Given code like the following:
::
uniform float a[...];
int index = ...;
float &r = a[index];
``ispc`` issues the error "Initializer for reference-type variable "r" must
have a uniform lvalue type.". The underlying issue stems from how
references are represented in the code generated by ``ispc``. Recall that
``ispc`` supports both uniform and varying pointer types--a uniform pointer
points to the same location in memory for all program instances in the
gang, while a varying pointer allows each program instance to have its own
pointer value.
References are represented as a pointer in the code generated by ``ispc``,
though this is generally opaque to the user; in ``ispc``, they are
specifically uniform pointers. This design decision was made so that given
code like this:
::
extern void func(float &val);
float foo = ...;
func(foo);
the reference is handled efficiently as a single pointer, rather
than unnecessarily being turned into a gang's worth of pointers.
However, an implication of this decision is that it's not possible for
references to refer to completely different things for each of the program
instances (hence the error that is issued). In cases where a unique
per-program-instance pointer is needed, a varying pointer should be used
instead of a reference.
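For example, the snippet above can be rewritten with a varying pointer (a
minimal sketch):

::

    uniform float a[64];
    int index = programIndex;
    uniform float *p = &a[index];  // varying pointer to uniform float
    *p += 1;                       // each instance updates its own element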
Interoperability
================
@@ -346,6 +529,92 @@ In a similar fashion, it's possible to find out at run-time the value of
export uniform int width() { return programCount; }
Is it possible to inline ispc functions in C/C++ code?
------------------------------------------------------
If you're willing to use the ``clang`` C/C++ compiler that's part of the
LLVM tool suite, then it is possible to inline ``ispc`` code with C/C++
(and conversely, to inline C/C++ calls in ``ispc``). Doing so can provide
performance advantages when calling out to short functions written in the
"other" language. Note that you don't need to use ``clang`` to compile all
of your C/C++ code, but only for the files where you want to be able to
inline. In order to do this, you must have a full installation of LLVM
version 3.0 or later, including the ``clang`` compiler.
The basic approach is to have the various compilers emit LLVM intermediate
representation (IR) code and to then use tools from LLVM to link together
the IR from the compilers and then re-optimize it, which gives the LLVM
optimizer the opportunity to do additional inlining and cross-function
optimizations. If you have source files ``foo.ispc`` and ``foo.cpp``,
first emit LLVM IR:
::
ispc --emit-llvm -o foo_ispc.bc foo.ispc
clang -O2 -c -emit-llvm -o foo_cpp.bc foo.cpp
Next, link the two IR files into a single file and run the LLVM optimizer
on the result:
::
llvm-link foo_ispc.bc foo_cpp.bc -o - | opt -O3 -o foo_opt.bc
And finally, generate a native object file:
::
llc -filetype=obj foo_opt.bc -o foo.o
This file can in turn be linked in with the rest of your object files when
linking your application.
(Note that if you're using the AVX instruction set, you must provide the
``-mattr=+avx`` flag to ``llc``.)
Why is it illegal to pass "varying" values from C/C++ to ispc functions?
------------------------------------------------------------------------
If any of the types in the parameter list to an exported function is
"varying" (including recursively, and members of structure types, etc.),
then ``ispc`` will issue an error and refuse to compile the function:
::
% echo "export int add(int x) { return ++x; }" | ispc
<stdin>:1:12: Error: Illegal to return a "varying" type from exported function "add"
<stdin>:1:20: Error: Varying parameter "x" is illegal in an exported function.
While there's no fundamental reason why this isn't possible, recall the
definition of "varying" variables: they have one value for each program
instance in the gang. As such, the number of values and amount of storage
required to represent a varying variable depends on the gang size
(i.e. ``programCount``), which can have different values depending on the
compilation target.
``ispc`` therefore prohibits passing "varying" values between the
application and the ``ispc`` program in order to prevent the
application-side code from depending on a particular gang size, and to
encourage portability to different gang sizes (a generally desirable
programming practice).
For cases where the size of data is actually fixed from the application
side, the value can be passed via a pointer to a short ``uniform`` array,
as follows:
::
export void add4(uniform int ptr[4]) {
foreach (i = 0 ... 4)
ptr[i]++;
}
On the 4-wide SSE instruction set, this compiles to a single vector add
instruction (and associated move instructions), while it still also
efficiently computes the correct result on 8-wide AVX targets.
Programming Techniques
======================
@@ -480,3 +749,131 @@ you can use ``--target=sse4`` when compiling to run with ``valgrind``.
Note that ``valgrind`` does not yet support programs that use the AVX
instruction set.
foreach statements generate more complex assembly than I'd expect; what's going on?
-----------------------------------------------------------------------------------
Given a simple ``foreach`` loop like the following:
::
void foo(uniform float a[], uniform int count) {
foreach (i = 0 ... count)
a[i] *= 2;
}
the ``ispc`` compiler generates approximately 40 instructions--why isn't
the generated code simpler?
There are two main components to the code: one handles
``programCount``-sized chunks of elements of the array, and the other
handles any excess elements at the end of the array that don't completely
fill a gang. The code for the main loop is essentially what one would
expect: a vector of values is loaded from the array, the multiply is done,
and the result is stored.
::
LBB0_2: ## %foreach_full_body
movslq %edx, %rdx
vmovups (%rdi,%rdx), %ymm1
vmulps %ymm0, %ymm1, %ymm1
vmovups %ymm1, (%rdi,%rdx)
addl $32, %edx
addl $8, %eax
cmpl %ecx, %eax
jl LBB0_2
Then, there is a sequence of instructions that handles any additional
elements at the end of the array. (These instructions don't execute if
there aren't any left-over values to process, but they do lengthen the
amount of generated code.)
::
## BB#4: ## %partial_inner_only
vmovd %eax, %xmm0
vinsertf128 $1, %xmm0, %ymm0, %ymm0
vpermilps $0, %ymm0, %ymm0 ## ymm0 = ymm0[0,0,0,0,4,4,4,4]
vextractf128 $1, %ymm0, %xmm3
vmovd %esi, %xmm2
vmovaps LCPI0_1(%rip), %ymm1
vextractf128 $1, %ymm1, %xmm4
vpaddd %xmm4, %xmm3, %xmm3
# ....
vmulps LCPI0_0(%rip), %ymm1, %ymm1
vmaskmovps %ymm1, %ymm0, (%rdi,%rax)
If you know that the number of elements to be processed will always be an
exact multiple of 8, 16, etc., then adding a simple assignment to
``count`` like the one below gives the compiler enough information to be
able to eliminate the code for the additional array elements.
::
void foo(uniform float a[], uniform int count) {
// This assignment doesn't change the value of count
// if it's a multiple of 16, but it gives the compiler
// insight into this fact, allowing for simpler code to
// be generated for the foreach loop.
count = (count & ~(16-1));
foreach (i = 0 ... count)
a[i] *= 2;
}
With this new version of ``foo()``, only the code for the first loop above
is generated.
How do I launch an individual task for each active program instance?
--------------------------------------------------------------------
Recall from the `discussion of "launch" in the ispc User's Guide`_ that a
``launch`` statement launches a single task corresponding to a single gang
of executing program instances, where the indices of the active program
instances are the same as were active when the ``launch`` statement
executed.
.. _discussion of "launch" in the ispc User's Guide: ispc.html#task-parallelism-launch-and-sync-statements
In some situations, it's desirable to be able to launch an individual task
for each executing program instance. For example, we might be performing
an iterative computation where a subset of the program instances determine
that an item they are responsible for requires additional processing.
::
bool itemNeedsMoreProcessing(int);
int itemNum = ...;
if (itemNeedsMoreProcessing(itemNum)) {
// do additional work
}
For performance reasons, it may be desirable to apply an entire gang's
worth of computation to each item that needs additional processing;
there may be available parallelism in this computation such that we'd like
to process each of the items with SPMD computation.
In this case, the ``foreach_active`` and ``unmasked`` constructs can be
applied together to accomplish this goal.
::
// do additional work
task void doWork(uniform int index);
foreach_active (index) {
unmasked {
launch doWork(extract(itemNum, index));
}
}
Recall that the body of the ``foreach_active`` loop runs once for each
active program instance, with each active program instance's
``programIndex`` value available in ``index`` in the above. In the loop,
we can re-establish an "all on" execution mask, enabling execution in all
of the program instances in the gang, such that execution in ``doWork()``
starts with all instances running. (Alternatively, the ``unmasked`` block
could be in the definition of ``doWork()``.)

File diff suppressed because it is too large

docs/news.rst

@@ -0,0 +1,179 @@
=========
ispc News
=========
ispc 1.9.1 is Released
----------------------
An ``ispc`` release with a new native AVX512 target for future Xeon CPUs and
improvements for debugging. The release is based on a patched LLVM 3.8 backend.
For more details, please check `Release Notes`_.
.. _Release Notes: https://github.com/ispc/ispc/blob/master/docs/ReleaseNotes.txt
ispc 1.9.0 is Released
----------------------
An ``ispc`` release with AVX512 (KNL flavor) support and a number of bug fixes,
based on a fresh LLVM 3.8 backend.
For more details, please check `Release Notes`_.
.. _Release Notes: https://github.com/ispc/ispc/blob/master/docs/ReleaseNotes.txt
ispc 1.8.2 is Released
----------------------
An update of ``ispc`` with several important stability fixes and experimental
AVX512 support has been released. Binaries are based on LLVM 3.6.1. Binaries with
native AVX512 support are based on LLVM 3.7 (r238198).
For more details, please check `Release Notes`_.
.. _Release Notes: https://github.com/ispc/ispc/blob/master/docs/ReleaseNotes.txt
ispc 1.8.1 is Released
----------------------
A minor update of ``ispc`` with several important stability fixes has been
released. The problem with auto-dispatch on Linux is fixed (it affects only
pre-built binaries), and the problem with -O2 -g is also fixed. There are several
improvements in Xeon Phi support. As with 1.8.0, all binaries are based on
LLVM 3.5.
ispc 1.8.0 is Released
----------------------
A major new version of ``ispc``, which introduces experimental support for NVPTX
target, brings numerous improvements to our KNC (Xeon Phi) support, introduces
debugging support on Windows and fixes several bugs. We also ship experimental
build for Sony PlayStation4 target in this release. Binaries for all platforms
are based on LLVM 3.5.
ispc 1.7.0 is Released
----------------------
A major new version of ``ispc`` with several language and library extensions and
fixes in debug info support. Binaries for all platforms are based on a patched
version of LLVM 3.4. There are also performance improvements beyond the
switchover to LLVM 3.4.
ispc 1.6.0 is Released
----------------------
A major update of ``ispc`` has been released. The main focus is on improved
performance and stability. Several new targets were added. There are also
a number of language and library extensions. Released binaries are based on
patched LLVM 3.3 on Linux and MacOS and LLVM 3.4rc3 on Windows. Please refer
to the Release Notes for the complete set of changes.
ispc 1.5.0 is Released
----------------------
A major update of ``ispc`` has been released with several new targets available
and a bunch of performance and stability fixes. The released binaries are built
with a patched version of LLVM 3.3. Please refer to the Release Notes for the
complete set of changes.
ispc 1.4.4 is Released
----------------------
A minor update of ``ispc`` has been released with several stability improvements.
The released binaries are built with a patched version of LLVM 3.3. Since this
release we also distribute 32-bit Linux binaries.
ispc 1.4.3 is Released
----------------------
A minor update of ``ispc`` has been released with several stability improvements.
All tests and examples now properly compile and execute on native targets on
Unix platforms (Linux and MacOS).
The released binaries are built with a patched version of LLVM 3.3.
ispc 1.4.2 is Released
----------------------
A minor update of ``ispc`` has been released with a stability fix for AVX2
(Haswell), a fix for the Win32 platform, and performance improvements on Xeon Phi.
As usual, it's available on all supported platforms (Windows, Linux and MacOS).
This version supports LLVM 3.1, 3.2, 3.3 and 3.4, but we now recommend
avoiding 3.1, as it's known to contain a number of stability problems, and we are
planning to deprecate its support soon.
The released binaries are built with 3.3.
ispc 1.4.1 is Released
----------------------
A major new version of ``ispc`` has been released with stability and
performance improvements on all supported platforms (Windows, Linux and MacOS).
This version supports LLVM 3.1, 3.2, 3.3 and 3.4. The released binaries are
built with 3.2.
ispc 1.3.0 is Released
----------------------
A major new version of ``ispc`` has been released. In addition to a number
of new language features, this release notably features initial support for
compiling to the Intel Xeon Phi (Many Integrated Core) architecture.
ispc 1.2.1 is Released
----------------------
This is a bugfix release, fixing approximately 20 bugs in the system and
improving error handling and error reporting. New functionality includes
very efficient float/half conversion routines thanks to Fabian
Giesen. See the `1.2.1 release notes`_ for details.
.. _1.2.1 release notes: https://github.com/ispc/ispc/tree/master/docs/ReleaseNotes.txt
ispc 1.2.0 is Released
-----------------------
A new major release was posted on March 20, 2012. This release includes
significant new functionality for cleanly handling "structure of arrays"
(SoA) data layout and a new model for how uniform and varying are handled
with structure types.
Paper on ispc To Appear in InPar 2012
-------------------------------------
A technical paper on ``ispc``, `ispc: A SPMD Compiler for High-Performance
CPU Programming`_, by Matt Pharr and William R. Mark, has been accepted to
the `InPar 2012`_ conference. This paper describes a number of the design
features and key characteristics of the ``ispc`` implementation.
(© 2012 IEEE. Personal use of this material is permitted. Permission from
IEEE must be obtained for all other uses, in any current or future media,
including reprinting/republishing this material for advertising or
promotional purposes, creating new collective works, for resale or
redistribution to servers or lists, or reuse of any copyrighted component
of this work in other works.).
.. _ispc\: A SPMD Compiler for High-Performance CPU Programming: https://github.com/downloads/ispc/ispc/ispc_inpar_2012.pdf
.. _InPar 2012: http://innovativeparallel.org/
ispc 1.1.4 is Released
----------------------
On February 4, 2012, the 1.1.4 release of ``ispc`` was posted; new features
include ``new`` and ``delete`` for dynamic memory allocation in ``ispc``
programs, "local" atomic operations in the standard library, and a new
scalar compilation target. See the `1.1.4 release notes`_ for details.
.. _1.1.4 release notes: https://github.com/ispc/ispc/tree/master/docs/ReleaseNotes.txt
ispc 1.1.3 is Released
----------------------
With this release, the language now supports "switch" statements, with the same semantics and syntax as in C.
This release includes fixes for two important performance related issues:
the quality of code generated for "foreach" statements has been
substantially improved, and a performance regression with code for "gathers"
that was introduced in v1.1.2 has been fixed in this release.
Thanks to Jean-Luc Duprat for a number of patches that improve support for
building on various platforms, and to Pierre-Antoine Lacaze for patches so
that ispc builds under MinGW.

View File

@@ -13,6 +13,7 @@ the most out of ``ispc`` in practice.
+ `Improving Control Flow Coherence With "foreach_tiled"`_
+ `Using Coherent Control Flow Constructs`_
+ `Use "uniform" Whenever Appropriate`_
+ `Use "Structure of Arrays" Layout When Possible`_
* `Tips and Techniques`_
@@ -20,6 +21,7 @@ the most out of ``ispc`` in practice.
+ `Avoid 64-bit Addressing Calculations When Possible`_
+ `Avoid Computation With 8 and 16-bit Integer Types`_
+ `Implementing Reductions Efficiently`_
+ `Using "foreach_active" Effectively`_
+ `Using Low-level Vector Tricks`_
+ `The "Fast math" Option`_
+ `"inline" Aggressively`_
@@ -247,6 +249,76 @@ but it's always best to provide the compiler with as much help as possible
to understand the actual form of your computation.
Use "Structure of Arrays" Layout When Possible
----------------------------------------------
In general, memory access performance (for both reads and writes) is best
when the running program instances access a contiguous region of memory; in
this case efficient vector load and store instructions can often be used
rather than gathers and scatters. As an example of this issue, consider an
array of a simple point datatype laid out and accessed in conventional
"array of structures" (AOS) layout:
::
    struct Point { float x, y, z; };
    uniform Point pts[...];
    float v = pts[programIndex].x;
In the above code, the expression ``pts[programIndex].x`` accesses
non-sequential memory locations, due to the ``y`` and ``z`` values that lie
between the desired ``x`` values in memory. A "gather" is required to load
the value of ``v``, with a corresponding decrease in performance.
If ``Point`` is instead defined as a "structure of arrays" (SOA) type, the
access can be much more efficient:
::
    struct Point8 { float x[8], y[8], z[8]; };
    uniform Point8 pts8[...];
    int majorIndex = programIndex / 8;
    int minorIndex = programIndex % 8;
    float v = pts8[majorIndex].x[minorIndex];
In this case, each ``Point8`` has 8 ``x`` values contiguous in memory
before 8 ``y`` values and then 8 ``z`` values. If the gang size is 8 or
less, the access for ``v`` will have the same value of ``majorIndex`` for
all program instances and will access consecutive elements of the ``x[8]``
array with a vector load. (For example, with an 8-wide gang, program
instances 0 through 7 all compute ``majorIndex == 0`` and load ``x[0]``
through ``x[7]`` with a single vector instruction. For larger gang sizes,
two 8-wide vector loads would be issued, which is also quite efficient.)
However, the syntax in the above code is messy; accessing SOA data in this
fashion is much less elegant than the corresponding code for accessing the
data in AOS layout. The ``soa`` qualifier in ``ispc`` can be used to apply
this transformation to the ``Point`` type automatically, while preserving
the clean data-access syntax that comes with AOS layout:
::
    soa<8> Point pts[...];
    float v = pts[programIndex].x;
Because SOA layout is a first-class concept in the language's type system,
it's easy to write functions that convert data between the two layouts.
For example, the ``aos_to_soa`` function below converts ``count``
elements of the given ``Point`` type from AOS to 8-wide SOA layout. (It
assumes that the caller has pre-allocated sufficient space in the
``pts_soa`` output array.)
::
    void aos_to_soa(uniform Point pts_aos[], uniform int count,
                    soa<8> Point pts_soa[]) {
        foreach (i = 0 ... count)
            pts_soa[i] = pts_aos[i];
    }
Analogously, a function could be written to convert back from SOA to AOS if
needed.
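As a minimal sketch (mirroring ``aos_to_soa`` above, and again assuming
that the caller has allocated sufficient space in the output array), the
reverse conversion is the same assignment with the parameters swapped:

::

    void soa_to_aos(soa<8> Point pts_soa[], uniform int count,
                    uniform Point pts_aos[]) {
        foreach (i = 0 ... count)
            pts_aos[i] = pts_soa[i];
    }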
Tips and Techniques
===================
@@ -339,6 +411,12 @@ based on the index, it can be worth doing. See the example
``examples/volume_rendering`` in the ``ispc`` distribution for the use of
this technique in an instance where it is beneficial to performance.
Understanding Memory Read Coalescing
------------------------------------
XXXX todo
Avoid 64-bit Addressing Calculations When Possible
--------------------------------------------------
@@ -433,6 +511,43 @@ values--very efficient code in the end.
return reduce_add(sum);
}
Using "foreach_active" Effectively
----------------------------------
For high-performance code, it is worth helping the compiler generate
efficient code for the work serialized inside ``foreach_active``
statements. For example, consider this segment of code, from the
introduction of
``foreach_active`` in the ispc User's Guide:
::
    uniform float array[...] = { ... };
    int index = ...;
    foreach_active (i) {
        ++array[index];
    }
Here, ``index`` is assumed to possibly have the same value for multiple
program instances, so the updates to ``array[index]`` are serialized by the
``foreach_active`` statement to avoid undefined results when ``index``
values do collide.
The code the compiler generates for this case can be improved by making it
clear that ``array[index]`` accesses only a single element of the array,
and thus that a general gather or scatter isn't required. Specifically,
more efficient code is generated by using the ``extract()`` function from
the standard library to extract the current program instance's value of
``index`` into a ``uniform`` variable and then using that to index into
``array``, as below.
::
    foreach_active (instanceNum) {
        uniform int unifIndex = extract(index, instanceNum);
        ++array[unifIndex];
    }
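The same idiom extends naturally to read-modify-write updates. As a
hypothetical sketch (assuming a varying ``scale`` value alongside the
``index`` and ``array`` above), both the load and the store become scalar
memory operations:

::

    foreach_active (instanceNum) {
        uniform int unifIndex = extract(index, instanceNum);
        // Scalar load and store; no gather or scatter is needed.
        array[unifIndex] *= extract(scale, instanceNum);
    }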
Using Low-level Vector Tricks
-----------------------------
@@ -547,7 +662,7 @@ gathers happen.)
extern "C" {
void ISPCInstrument(const char *fn, const char *note,
int line, int mask);
int line, uint64_t mask);
}
This function is passed the file name of the ``ispc`` file running, a short
@@ -560,7 +675,7 @@ as follows:
::
ISPCInstrument("foo.ispc", "function entry", 55, 0xf);
ISPCInstrument("foo.ispc", "function entry", 55, 0xfull);
This call indicates that the currently executing program has just
entered the function defined at line 55 of the file ``foo.ispc``, with a
@@ -671,7 +786,7 @@ countries.
* Other names and brands may be claimed as the property of others.
Copyright(C) 2011, Intel Corporation. All rights reserved.
Copyright(C) 2011-2016, Intel Corporation. All rights reserved.
Optimization Notice

docs/template-news.txt (new file)

@@ -0,0 +1,66 @@
%(head_prefix)s
%(head)s
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-1486404-4']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
%(stylesheet)s
%(body_prefix)s
<div id="wrap">
<div id="wrap2">
<div id="header">
<h1 id="logo">Intel SPMD Program Compiler</h1>
<div id="slogan">An open-source compiler for high-performance SIMD programming on
the CPU</div>
</div>
<div id="nav">
<div id="nbar">
<ul>
<li><a href="index.html">Overview</a></li>
<li id="selected"><a href="news.html">News</a></li>
<li><a href="features.html">Features</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="documentation.html">Documentation</a></li>
<li><a href="perf.html">Performance</a></li>
<li><a href="contrib.html">Contributors</a></li>
</ul>
</div>
</div>
<div id="content-wrap">
<div id="sidebar">
<div class="widgetspace">
<h1>Resources</h1>
<ul class="menu">
<li><a href="http://github.com/ispc/ispc/">ispc page on github</a></li>
<li><a href="http://groups.google.com/group/ispc-users/">ispc
users mailing list</a></li>
<li><a href="http://groups.google.com/group/ispc-dev/">ispc
developers mailing list</a></li>
<li><a href="http://github.com/ispc/ispc/wiki/">Wiki</a></li>
<li><a href="http://github.com/ispc/ispc/issues/">Bug tracking</a></li>
<li><a href="doxygen/index.html">Doxygen</a></li>
</ul>
</div>
</div>
%(body_pre_docinfo)s
%(docinfo)s
<div id="content">
%(body)s
</div>
<div class="clearfix"></div>
<div id="footer"> &copy; 2011-2016 <strong>Intel Corporation</strong> | Valid <a href="http://validator.w3.org/check?uri=referer">XHTML</a> | <a href="http://jigsaw.w3.org/css-validator/check/referer">CSS</a> | ClearBlue by: <a href="http://www.themebin.com/">ThemeBin</a>
<!-- Please Do Not remove this link, thank u -->
</div>
</div>
</div>
</div>
%(body_suffix)s


@@ -26,10 +26,12 @@
<div id="nbar">
<ul>
<li><a href="index.html">Overview</a></li>
<li><a href="news.html">News</a></li>
<li><a href="features.html">Features</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li><a href="documentation.html">Documentation</a></li>
<li id="selected"><a href="perf.html">Performance</a></li>
<li><a href="contrib.html">Contributors</a></li>
</ul>
</div>
</div>
@@ -45,8 +47,7 @@
developers mailing list</a></li>
<li><a href="http://github.com/ispc/ispc/wiki/">Wiki</a></li>
<li><a href="http://github.com/ispc/ispc/issues/">Bug tracking</a></li>
<li><a href="doxygen/index.html">Doxygen documentation of
<tt>ispc</tt> source code</a></li>
<li><a href="doxygen/index.html">Doxygen</a></li>
</ul>
</div>
</div>
@@ -56,7 +57,7 @@
%(body)s
</div>
<div class="clearfix"></div>
<div id="footer"> &copy; 2011 <strong>Intel Corporation</strong> | Valid <a href="http://validator.w3.org/check?uri=referer">XHTML</a> | <a href="http://jigsaw.w3.org/css-validator/check/referer">CSS</a> | ClearBlue by: <a href="http://www.themebin.com/">ThemeBin</a>
<div id="footer"> &copy; 2011-2016 <strong>Intel Corporation</strong> | Valid <a href="http://validator.w3.org/check?uri=referer">XHTML</a> | <a href="http://jigsaw.w3.org/css-validator/check/referer">CSS</a> | ClearBlue by: <a href="http://www.themebin.com/">ThemeBin</a>
<!-- Please Do Not remove this link, thank u -->
</div>
</div>


@@ -26,10 +26,12 @@
<div id="nbar">
<ul>
<li><a href="index.html">Overview</a></li>
<li><a href="news.html">News</a></li>
<li><a href="features.html">Features</a></li>
<li><a href="downloads.html">Downloads</a></li>
<li id="selected"><a href="documentation.html">Documentation</a></li>
<li><a href="perf.html">Performance</a></li>
<li><a href="contrib.html">Contributors</a></li>
</ul>
</div>
</div>
@@ -45,8 +47,7 @@
developers mailing list</a></li>
<li><a href="http://github.com/ispc/ispc/wiki/">Wiki</a></li>
<li><a href="http://github.com/ispc/ispc/issues/">Bug tracking</a></li>
<li><a href="doxygen/index.html">Doxygen documentation of
<tt>ispc</tt> source code</a></li>
<li><a href="doxygen/index.html">Doxygen</a></li>
</ul>
</div>
</div>
@@ -56,7 +57,7 @@
%(body)s
</div>
<div class="clearfix"></div>
<div id="footer"> &copy; 2011 <strong>Intel Corporation</strong> | Valid <a href="http://validator.w3.org/check?uri=referer">XHTML</a> | <a href="http://jigsaw.w3.org/css-validator/check/referer">CSS</a> | ClearBlue by: <a href="http://www.themebin.com/">ThemeBin</a>
<div id="footer"> &copy; 2011-2016 <strong>Intel Corporation</strong> | Valid <a href="http://validator.w3.org/check?uri=referer">XHTML</a> | <a href="http://jigsaw.w3.org/css-validator/check/referer">CSS</a> | ClearBlue by: <a href="http://www.themebin.com/">ThemeBin</a>
<!-- Please Do Not remove this link, thank u -->
</div>
</div>


@@ -31,7 +31,7 @@ PROJECT_NAME = "Intel SPMD Program Compiler"
# This could be handy for archiving the generated documentation or
# if some version control system is used.
PROJECT_NUMBER = 1.1.2
PROJECT_NUMBER = 1.9.2dev
# The OUTPUT_DIRECTORY tag is used to specify the (relative or absolute)
# base path where the generated documentation will be put.
@@ -581,10 +581,12 @@ WARN_LOGFILE =
# directories like "/usr/src/myproject". Separate the files or directories
# with spaces.
INPUT = builtins.h \
INPUT = ast.h \
builtins.h \
ctx.h \
decl.h \
expr.h \
func.h \
ispc.h \
llvmutil.h \
module.h \
@@ -593,10 +595,13 @@ INPUT = builtins.h \
sym.h \
type.h \
util.h \
ast.cpp \
builtins.cpp \
cbackend.cpp \
ctx.cpp \
decl.cpp \
expr.cpp \
func.cpp \
ispc.cpp \
llvmutil.cpp \
main.cpp \
@@ -608,7 +613,7 @@ INPUT = builtins.h \
util.cpp \
parse.yy \
lex.ll \
builtins-c.c
builtins/builtins.c
# This tag can be used to specify the character encoding of the source files
# that doxygen parses. Internally doxygen uses the UTF-8 encoding, which is


@@ -39,9 +39,6 @@ example implementation of this function that counts the number of times the
callback is made and records some statistics about control flow coherence
is provided in the instrument.cpp file.
*** Note: on Linux, this example currently hits an assertion in LLVM during
*** compilation
Deferred
========
@@ -76,6 +73,14 @@ This directory includes three implementations of the algorithm:
light culling and shading.
GMRES
=====
An implementation of the generalized minimal residual method for solving
sparse matrix equations.
(http://en.wikipedia.org/wiki/Generalized_minimal_residual_method)
Mandelbrot
==========
@@ -110,6 +115,13 @@ This program implements both the Black-Scholes and Binomial options pricing
models in both ispc and regular serial C++ code.
Perfbench
=========
This runs a number of microbenchmarks to measure system performance and
code generation quality.
RT
==
@@ -134,6 +146,11 @@ This is a simple "hello world" type program that shows a ~10 line
application program calling out to a ~5 line ispc program to do a simple
computation.
Sort
====
This is a bucket sort of 32-bit unsigned integers. By default, 1,000,000
random elements are sorted; run ./sort N to sort N elements instead.
Volume
======


@@ -2,6 +2,7 @@
EXAMPLE=ao
CPP_SRC=ao.cpp ao_serial.cpp
ISPC_SRC=ao.ispc
ISPC_TARGETS=sse2,sse4,avx
ISPC_IA_TARGETS=sse2-i32x4,sse4-i32x4,avx1-i32x8,avx2-i32x8,avx512knl-i32x16,avx512skx-i32x16
ISPC_ARM_TARGETS=neon
include ../common.mk


@@ -60,7 +60,7 @@ using namespace ispc;
extern void ao_serial(int w, int h, int nsubsamples, float image[]);
static unsigned int test_iterations;
static unsigned int test_iterations[] = {3, 7, 1};
static unsigned int width, height;
static unsigned char *img;
static float *fimg;
@@ -106,16 +106,20 @@ savePPM(const char *fname, int w, int h)
int main(int argc, char **argv)
{
if (argc != 4) {
if (argc < 3) {
printf ("%s\n", argv[0]);
printf ("Usage: ao [num test iterations] [width] [height]\n");
printf ("Usage: ao [width] [height] [ispc iterations] [tasks iterations] [serial iterations]\n");
getchar();
exit(-1);
}
else {
test_iterations = atoi(argv[1]);
width = atoi (argv[2]);
height = atoi (argv[3]);
if (argc == 6) {
for (int i = 0; i < 3; i++) {
test_iterations[i] = atoi(argv[3 + i]);
}
}
width = atoi (argv[1]);
height = atoi (argv[2]);
}
// Allocate space for output images
@@ -127,18 +131,19 @@ int main(int argc, char **argv)
// time for any of them.
//
double minTimeISPC = 1e30;
for (unsigned int i = 0; i < test_iterations; i++) {
for (unsigned int i = 0; i < test_iterations[0]; i++) {
memset((void *)fimg, 0, sizeof(float) * width * height * 3);
assert(NSUBSAMPLES == 2);
reset_and_start_timer();
ao_ispc(width, height, NSUBSAMPLES, fimg);
double t = get_elapsed_mcycles();
printf("@time of ISPC run:\t\t\t[%.3f] million cycles\n", t);
minTimeISPC = std::min(minTimeISPC, t);
}
// Report results and save image
printf("[aobench ispc]:\t\t\t[%.3f] M cycles (%d x %d image)\n",
printf("[aobench ispc]:\t\t\t[%.3f] million cycles (%d x %d image)\n",
minTimeISPC, width, height);
savePPM("ao-ispc.ppm", width, height);
@@ -147,18 +152,19 @@ int main(int argc, char **argv)
// minimum time for any of them.
//
double minTimeISPCTasks = 1e30;
for (unsigned int i = 0; i < test_iterations; i++) {
for (unsigned int i = 0; i < test_iterations[1]; i++) {
memset((void *)fimg, 0, sizeof(float) * width * height * 3);
assert(NSUBSAMPLES == 2);
reset_and_start_timer();
ao_ispc_tasks(width, height, NSUBSAMPLES, fimg);
double t = get_elapsed_mcycles();
printf("@time of ISPC + TASKS run:\t\t\t[%.3f] million cycles\n", t);
minTimeISPCTasks = std::min(minTimeISPCTasks, t);
}
// Report results and save image
printf("[aobench ispc + tasks]:\t\t[%.3f] M cycles (%d x %d image)\n",
printf("[aobench ispc + tasks]:\t\t[%.3f] million cycles (%d x %d image)\n",
minTimeISPCTasks, width, height);
savePPM("ao-ispc-tasks.ppm", width, height);
@@ -167,16 +173,17 @@ int main(int argc, char **argv)
// minimum time.
//
double minTimeSerial = 1e30;
for (unsigned int i = 0; i < test_iterations; i++) {
for (unsigned int i = 0; i < test_iterations[2]; i++) {
memset((void *)fimg, 0, sizeof(float) * width * height * 3);
reset_and_start_timer();
ao_serial(width, height, NSUBSAMPLES, fimg);
double t = get_elapsed_mcycles();
printf("@time of serial run:\t\t\t\t[%.3f] million cycles\n", t);
minTimeSerial = std::min(minTimeSerial, t);
}
// Report more results, save another image...
printf("[aobench serial]:\t\t[%.3f] M cycles (%d x %d image)\n", minTimeSerial,
printf("[aobench serial]:\t\t[%.3f] million cycles (%d x %d image)\n", minTimeSerial,
width, height);
printf("\t\t\t\t(%.2fx speedup from ISPC, %.2fx speedup from ISPC + tasks)\n",
minTimeSerial / minTimeISPC, minTimeSerial / minTimeISPCTasks);


@@ -50,7 +50,6 @@ struct Isect {
struct Sphere {
vec center;
float radius;
};
struct Plane {
@@ -82,8 +81,8 @@ static inline void vnormalize(vec &v) {
}
static inline void
ray_plane_intersect(Isect &isect, Ray &ray, Plane &plane) {
static void
ray_plane_intersect(Isect &isect, Ray &ray, uniform Plane &plane) {
float d = -dot(plane.p, plane.n);
float v = dot(ray.dir, plane.n);
@@ -103,7 +102,7 @@ ray_plane_intersect(Isect &isect, Ray &ray, Plane &plane) {
static inline void
ray_sphere_intersect(Isect &isect, Ray &ray, Sphere &sphere) {
ray_sphere_intersect(Isect &isect, Ray &ray, uniform Sphere &sphere) {
vec rs = ray.org - sphere.center;
float B = dot(rs, ray.dir);
@@ -124,7 +123,7 @@ ray_sphere_intersect(Isect &isect, Ray &ray, Sphere &sphere) {
}
static inline void
static void
orthoBasis(vec basis[3], vec n) {
basis[2] = n;
basis[1].x = 0.0; basis[1].y = 0.0; basis[1].z = 0.0;
@@ -147,8 +146,8 @@ orthoBasis(vec basis[3], vec n) {
}
static inline float
ambient_occlusion(Isect &isect, Plane &plane, Sphere spheres[3],
static float
ambient_occlusion(Isect &isect, uniform Plane &plane, uniform Sphere spheres[3],
RNGState &rngstate) {
float eps = 0.0001f;
vec p, n;
@@ -204,112 +203,52 @@ ambient_occlusion(Isect &isect, Plane &plane, Sphere spheres[3],
static void ao_scanlines(uniform int y0, uniform int y1, uniform int w,
uniform int h, uniform int nsubsamples,
uniform float image[]) {
static Plane plane = { { 0.0f, -0.5f, 0.0f }, { 0.f, 1.f, 0.f } };
static Sphere spheres[3] = {
static uniform Plane plane = { { 0.0f, -0.5f, 0.0f }, { 0.f, 1.f, 0.f } };
static uniform Sphere spheres[3] = {
{ { -2.0f, 0.0f, -3.5f }, 0.5f },
{ { -0.5f, 0.0f, -3.0f }, 0.5f },
{ { 1.0f, 0.0f, -2.2f }, 0.5f } };
RNGState rngstate;
seed_rng(&rngstate, y0);
seed_rng(&rngstate, programIndex + (y0 << (programIndex & 15)));
float invSamples = 1.f / nsubsamples;
// Compute the mapping between the 'programCount'-wide program
// instances running in parallel and samples in the image.
//
// For now, we'll always take four samples per pixel, so start by
// initializing du and dv with offsets into subpixel samples. We'll
// take care of further updating du and dv for the case where we're
// doing more than 4 program instances in parallel shortly.
uniform float uSteps[4] = { 0, 1, 0, 1 };
uniform float vSteps[4] = { 0, 0, 1, 1 };
float du = uSteps[programIndex % 4] / nsubsamples;
float dv = vSteps[programIndex % 4] / nsubsamples;
foreach_tiled(y = y0 ... y1, x = 0 ... w,
u = 0 ... nsubsamples, v = 0 ... nsubsamples) {
float du = (float)u * invSamples, dv = (float)v * invSamples;
// Now handle the case where we are able to do more than one pixel's
// worth of work at once. nx records the number of pixels in the x
// direction we do per iteration and ny the number in y.
uniform int nx = 1, ny = 1;
// Figure out x,y pixel in NDC
float px = (x + du - (w / 2.0f)) / (w / 2.0f);
float py = -(y + dv - (h / 2.0f)) / (h / 2.0f);
float ret = 0.f;
Ray ray;
Isect isect;
// FIXME: We actually need ny to be 1 regardless of the decomposition,
// since the task decomposition is one scanline high.
ray.org = 0.f;
if (programCount == 8) {
// Do two pixels at once in the x direction
nx = 2;
if (programIndex >= 4)
// And shift the offsets for the second pixel's worth of work
++du;
}
else if (programCount == 16) {
nx = 4;
ny = 1;
if (programIndex >= 4 && programIndex < 8)
++du;
if (programIndex >= 8 && programIndex < 12)
du += 2;
if (programIndex >= 12)
du += 3;
}
// Poor man's perspective projection
ray.dir.x = px;
ray.dir.y = py;
ray.dir.z = -1.0;
vnormalize(ray.dir);
// Now loop over all of the pixels, stepping in x and y as calculated
// above. (Assumes that ny divides y and nx divides x...)
for (uniform int y = y0; y < y1; y += ny) {
for (uniform int x = 0; x < w; x += nx) {
// Figure out x,y pixel in NDC
float px = (x + du - (w / 2.0f)) / (w / 2.0f);
float py = -(y + dv - (h / 2.0f)) / (h / 2.0f);
float ret = 0.f;
Ray ray;
Isect isect;
isect.t = 1.0e+17;
isect.hit = 0;
ray.org = 0.f;
for (uniform int snum = 0; snum < 3; ++snum)
ray_sphere_intersect(isect, ray, spheres[snum]);
ray_plane_intersect(isect, ray, plane);
// Poor man's perspective projection
ray.dir.x = px;
ray.dir.y = py;
ray.dir.z = -1.0;
vnormalize(ray.dir);
// Note use of 'coherent' if statement; the set of rays we
// trace will often all hit or all miss the scene
cif (isect.hit) {
ret = ambient_occlusion(isect, plane, spheres, rngstate);
ret *= invSamples * invSamples;
isect.t = 1.0e+17;
isect.hit = 0;
for (uniform int snum = 0; snum < 3; ++snum)
ray_sphere_intersect(isect, ray, spheres[snum]);
ray_plane_intersect(isect, ray, plane);
// Note use of 'coherent' if statement; the set of rays we
// trace will often all hit or all miss the scene
cif (isect.hit)
ret = ambient_occlusion(isect, plane, spheres, rngstate);
// This is a little grungy; we have results for
// programCount-worth of values. Because we're doing 2x2
// subsamples, we need to peel them off in groups of four,
// average the four values for each pixel, and update the
// output image.
//
// Store the varying value to a uniform array of the same size.
// See the discussion about communication among program
// instances in the ispc user's manual for more discussion on
// this idiom.
uniform float retArray[programCount];
retArray[programIndex] = ret;
// offset to the first pixel in the image
uniform int offset = 3 * (y * w + x);
for (uniform int p = 0; p < programCount; p += 4, offset += 3) {
// Get the four sample values for this pixel
uniform float sumret = retArray[p] + retArray[p+1] + retArray[p+2] +
retArray[p+3];
// Normalize by number of samples taken
sumret /= nsubsamples * nsubsamples;
// Store result in the image
image[offset+0] = sumret;
image[offset+1] = sumret;
image[offset+2] = sumret;
}
int offset = 3 * (y * w + x);
atomic_add_local(&image[offset], ret);
atomic_add_local(&image[offset+1], ret);
atomic_add_local(&image[offset+2], ret);
}
}
}
@@ -329,5 +268,5 @@ static void task ao_task(uniform int width, uniform int height,
export void ao_ispc_tasks(uniform int w, uniform int h, uniform int nsubsamples,
uniform float image[]) {
launch[h] < ao_task(w, h, nsubsamples, image) >;
launch[h] ao_task(w, h, nsubsamples, image);
}


@@ -18,159 +18,17 @@
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<ProjectGuid>{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>aobench</RootNamespace>
<ISPC_file>ao</ISPC_file>
<default_targets>sse2,sse4,avx1-i32x8</default_targets>
</PropertyGroup>
<Import Project="..\common.props" />
<ItemGroup>
<ClCompile Include="ao.cpp" />
<ClCompile Include="ao_serial.cpp" />
<ClCompile Include="../tasksys.cpp" />
</ItemGroup>
<ItemGroup>
<CustomBuild Include="ao.ispc">
<FileType>Document</FileType>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4,avx
</Command>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4,avx
</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4,avx
</Command>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4,avx
</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
</CustomBuild>
</ItemGroup>
<PropertyGroup Label="Globals">
<ProjectGuid>{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>aobench</RootNamespace>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(ExecutablePath);$(ProjectDir)..\..</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<IntrinsicFunctions>true</IntrinsicFunctions>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<IntrinsicFunctions>true</IntrinsicFunctions>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>
</Project>


@@ -1,5 +1,5 @@
CXX=g++ -m64
CXX=clang++ -m64
CXXFLAGS=-Iobjs/ -g3 -Wall
ISPC=ispc
ISPCFLAGS=-O2 --instrument --arch=x86-64 --target=sse2
@@ -14,13 +14,13 @@ dirs:
clean:
/bin/rm -rf objs *~ ao
ao: objs/ao.o objs/instrument.o objs/ao_ispc.o ../tasksys.cpp
ao: objs/ao.o objs/instrument.o objs/ao_instrumented_ispc.o ../tasksys.cpp
$(CXX) $(CXXFLAGS) -o $@ $^ -lm -lpthread
objs/%.o: %.cpp dirs
$(CXX) $< $(CXXFLAGS) -c -o $@
objs/ao.o: objs/ao_ispc.h
objs/ao.o: objs/ao_instrumented_ispc.h
objs/%_ispc.h objs/%_ispc.o: %.ispc dirs
$(ISPC) $(ISPCFLAGS) $< -o objs/$*_ispc.o -h objs/$*_instrumented_ispc.h
$(ISPC) $(ISPCFLAGS) $< -o objs/$*_ispc.o -h objs/$*_ispc.h


@@ -35,6 +35,8 @@
#define NOMINMAX
#pragma warning (disable: 4244)
#pragma warning (disable: 4305)
// preventing MSVC fopen() deprecation complaints
#define _CRT_SECURE_NO_DEPRECATE
#endif
#include <stdio.h>


@@ -211,7 +211,7 @@ static void ao_scanlines(uniform int y0, uniform int y1, uniform int w,
{ { 1.0f, 0.0f, -2.2f }, 0.5f } };
RNGState rngstate;
seed_rng(&rngstate, y0);
seed_rng(&rngstate, programIndex + (y0 << (programIndex & 15)));
// Compute the mapping between the 'programCount'-wide program
// instances running in parallel and samples in the image.
@@ -329,5 +329,5 @@ static void task ao_task(uniform int width, uniform int height,
export void ao_ispc_tasks(uniform int w, uniform int h, uniform int nsubsamples,
uniform float image[]) {
launch[h] < ao_task(w, h, nsubsamples, image) >;
launch[h] ao_task(w, h, nsubsamples, image);
}


@@ -18,157 +18,18 @@
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<PropertyGroup Label="Globals">
<ProjectGuid>{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>aobench_instrumented</RootNamespace>
<ISPC_file>ao_instrumented</ISPC_file>
<default_targets>sse2</default_targets>
<flags>--instrument</flags>
</PropertyGroup>
<Import Project="..\common.props" />
<ItemGroup>
<ClCompile Include="ao.cpp" />
<ClCompile Include="instrument.cpp" />
<ClCompile Include="../tasksys.cpp" />
</ItemGroup>
<ItemGroup>
<CustomBuild Include="ao.ispc">
<FileType>Document</FileType>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename)_instrumented.obj -h $(TargetDir)%(Filename)_instrumented_ispc.h --arch=x86 --instrument --target=sse2
</Command>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename)_instrumented.obj -h $(TargetDir)%(Filename)_instrumented_ispc.h --instrument --target=sse2
</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(TargetDir)%(Filename)_instrumented.obj;$(TargetDir)%(Filename)_instrumented_ispc.h</Outputs>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(TargetDir)%(Filename)_instrumented.obj;$(TargetDir)%(Filename)_instrumented_ispc.h</Outputs>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename)_instrumented.obj -h $(TargetDir)%(Filename)_instrumented_ispc.h --arch=x86 --instrument --target=sse2
</Command>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename)_instrumented.obj -h $(TargetDir)%(Filename)_instrumented_ispc.h --instrument --target=sse2
</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(TargetDir)%(Filename)_instrumented.obj;$(TargetDir)%(Filename)_instrumented_ispc.h</Outputs>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(TargetDir)%(Filename)_instrumented.obj;$(TargetDir)%(Filename)_instrumented_ispc.h</Outputs>
</CustomBuild>
</ItemGroup>
<PropertyGroup Label="Globals">
<ProjectGuid>{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>aobench_instrumented</RootNamespace>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
<PreBuildEventUseInBuild>true</PreBuildEventUseInBuild>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
<PreBuildEventUseInBuild>true</PreBuildEventUseInBuild>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
<PreBuildEventUseInBuild>true</PreBuildEventUseInBuild>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
<PreBuildEventUseInBuild>true</PreBuildEventUseInBuild>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>
</Project>


@@ -34,6 +34,8 @@
#include "instrument.h"
#include <stdio.h>
#include <assert.h>
#include <iomanip>
#include <sstream>
#include <string>
#include <map>
@@ -46,7 +48,7 @@ struct CallInfo {
static std::map<std::string, CallInfo> callInfo;
int countbits(int i) {
int countbits(uint64_t i) {
int ret = 0;
while (i) {
if (i & 0x1)
@@ -60,14 +62,13 @@ int countbits(int i) {
// Callback function that ispc compiler emits calls to when --instrument
// command-line flag is given while compiling.
void
ISPCInstrument(const char *fn, const char *note, int line, int mask) {
char sline[16];
sprintf(sline, "%04d", line);
std::string s = std::string(fn) + std::string("(") + std::string(sline) +
std::string(") - ") + std::string(note);
ISPCInstrument(const char *fn, const char *note, int line, uint64_t mask) {
std::stringstream s;
s << fn << "(" << std::setfill('0') << std::setw(4) << line << ") - "
<< note;
// Find or create a CallInfo instance for this callsite.
CallInfo &ci = callInfo[s];
CallInfo &ci = callInfo[s.str()];
// And update its statistics...
++ci.count;


@@ -28,7 +28,7 @@
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef INSTRUMENT_H
@@ -36,8 +36,8 @@
#include <stdint.h>
extern "C" {
void ISPCInstrument(const char *fn, const char *note, int line, int mask);
extern "C" {
void ISPCInstrument(const char *fn, const char *note, int line, uint64_t mask);
}
void ISPCPrintInstrument();


@@ -1,20 +1,72 @@
TASK_CXX=../tasksys.cpp
TASK_LIB=-lpthread
TASK_OBJ=tasksys.o
TASK_OBJ=objs/tasksys.o
CXX=clang++
CXXFLAGS+=-Iobjs/ -O2
CC=clang
CCFLAGS+=-Iobjs/ -O2
CXX=g++
CXXFLAGS=-Iobjs/ -O2 -m64
LIBS=-lm $(TASK_LIB) -lstdc++
ISPC=ispc -O2 --arch=x86-64 $(ISPC_FLAGS)
ISPC_OBJS=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc.o $(ISPC_SRC:.ispc=)_ispc_sse2.o \
$(ISPC_SRC:.ispc=)_ispc_sse4.o $(ISPC_SRC:.ispc=)_ispc_avx.o)
ISPC=ispc
ISPC_FLAGS+=-O2
ISPC_HEADER=objs/$(ISPC_SRC:.ispc=_ispc.h)
CPP_OBJS=$(addprefix objs/, $(CPP_SRC:.cpp=.o) $(TASK_OBJ))
ARCH:=$(shell uname -m | sed -e s/x86_64/x86/ -e s/i686/x86/ -e s/arm.*/arm/ -e s/sa110/arm/)
ifeq ($(ARCH),x86)
ISPC_OBJS=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc.o)
COMMA=,
ifneq (,$(findstring $(COMMA),$(ISPC_IA_TARGETS)))
#$(info multi-target detected: $(ISPC_IA_TARGETS))
ifneq (,$(findstring sse2,$(ISPC_IA_TARGETS)))
ISPC_OBJS+=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc_sse2.o)
endif
ifneq (,$(findstring sse4,$(ISPC_IA_TARGETS)))
ISPC_OBJS+=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc_sse4.o)
endif
ifneq (,$(findstring avx1-,$(ISPC_IA_TARGETS)))
ISPC_OBJS+=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc_avx.o)
endif
ifneq (,$(findstring avx1.1,$(ISPC_IA_TARGETS)))
ISPC_OBJS+=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc_avx11.o)
endif
ifneq (,$(findstring avx2,$(ISPC_IA_TARGETS)))
ISPC_OBJS+=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc_avx2.o)
endif
ifneq (,$(findstring avx512knl,$(ISPC_IA_TARGETS)))
ISPC_OBJS+=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc_avx512knl.o)
endif
ifneq (,$(findstring avx512skx,$(ISPC_IA_TARGETS)))
ISPC_OBJS+=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc_avx512skx.o)
endif
endif
ISPC_TARGETS=$(ISPC_IA_TARGETS)
ARCH_BIT:=$(shell getconf LONG_BIT)
ifeq ($(ARCH_BIT),32)
ISPC_FLAGS += --arch=x86
CXXFLAGS += -m32
CCFLAGS += -m32
else
ISPC_FLAGS += --arch=x86-64
CXXFLAGS += -m64
CCFLAGS += -m64
endif
else ifeq ($(ARCH),arm)
ISPC_OBJS=$(addprefix objs/, $(ISPC_SRC:.ispc=_ispc.o))
ISPC_TARGETS=$(ISPC_ARM_TARGETS)
else
$(error Unknown architecture $(ARCH) from uname -m)
endif
CPP_OBJS=$(addprefix objs/, $(CPP_SRC:.cpp=.o))
CC_OBJS=$(addprefix objs/, $(CC_SRC:.c=.o))
OBJS=$(CPP_OBJS) $(CC_OBJS) $(TASK_OBJ) $(ISPC_OBJS)
default: $(EXAMPLE)
all: $(EXAMPLE) $(EXAMPLE)-sse4 $(EXAMPLE)-generic16
all: $(EXAMPLE) $(EXAMPLE)-sse4 $(EXAMPLE)-generic16 $(EXAMPLE)-scalar
.PHONY: dirs clean
@@ -24,24 +76,27 @@ dirs:
objs/%.cpp objs/%.o objs/%.h: dirs
clean:
/bin/rm -rf objs *~ $(EXAMPLE) $(EXAMPLE)-sse4 $(EXAMPLE)-generic16
/bin/rm -rf objs *~ $(EXAMPLE) $(EXAMPLE)-sse4 $(EXAMPLE)-generic16 ref test
$(EXAMPLE): $(CPP_OBJS) $(ISPC_OBJS)
$(EXAMPLE): $(OBJS)
$(CXX) $(CXXFLAGS) -o $@ $^ $(LIBS)
objs/%.o: %.cpp dirs $(ISPC_HEADER)
$(CXX) $< $(CXXFLAGS) -c -o $@
objs/%.o: %.c dirs $(ISPC_HEADER)
$(CC) $< $(CCFLAGS) -c -o $@
objs/%.o: ../%.cpp dirs
$(CXX) $< $(CXXFLAGS) -c -o $@
objs/$(EXAMPLE).o: objs/$(EXAMPLE)_ispc.h
objs/$(EXAMPLE).o: objs/$(EXAMPLE)_ispc.h dirs
objs/%_ispc.h objs/%_ispc.o objs/%_ispc_sse2.o objs/%_ispc_sse4.o objs/%_ispc_avx.o: %.ispc
$(ISPC) --target=$(ISPC_TARGETS) $< -o objs/$*_ispc.o -h objs/$*_ispc.h
objs/%_ispc.h objs/%_ispc.o objs/%_ispc_sse2.o objs/%_ispc_sse4.o objs/%_ispc_avx.o objs/%_ispc_avx11.o objs/%_ispc_avx2.o objs/%_ispc_avx512knl.o objs/%_ispc_avx512skx.o : %.ispc dirs
$(ISPC) $(ISPC_FLAGS) --target=$(ISPC_TARGETS) $< -o objs/$*_ispc.o -h objs/$*_ispc.h
objs/$(ISPC_SRC:.ispc=)_sse4.cpp: $(ISPC_SRC)
$(ISPC) $< -o $@ --target=generic-4 --emit-c++ --c++-include-file=sse4.h
$(ISPC) $(ISPC_FLAGS) $< -o $@ --target=generic-4 --emit-c++ --c++-include-file=sse4.h
objs/$(ISPC_SRC:.ispc=)_sse4.o: objs/$(ISPC_SRC:.ispc=)_sse4.cpp
$(CXX) -I../intrinsics -msse4.2 $< $(CXXFLAGS) -c -o $@
@@ -50,10 +105,16 @@ $(EXAMPLE)-sse4: $(CPP_OBJS) objs/$(ISPC_SRC:.ispc=)_sse4.o
$(CXX) $(CXXFLAGS) -o $@ $^ $(LIBS)
objs/$(ISPC_SRC:.ispc=)_generic16.cpp: $(ISPC_SRC)
$(ISPC) $< -o $@ --target=generic-16 --emit-c++ --c++-include-file=generic-16.h
$(ISPC) $(ISPC_FLAGS) $< -o $@ --target=generic-16 --emit-c++ --c++-include-file=generic-16.h
objs/$(ISPC_SRC:.ispc=)_generic16.o: objs/$(ISPC_SRC:.ispc=)_generic16.cpp
$(CXX) -I../intrinsics $< $(CXXFLAGS) -c -o $@
$(EXAMPLE)-generic16: $(CPP_OBJS) objs/$(ISPC_SRC:.ispc=)_generic16.o
$(CXX) $(CXXFLAGS) -o $@ $^ $(LIBS)
objs/$(ISPC_SRC:.ispc=)_scalar.o: $(ISPC_SRC)
$(ISPC) $(ISPC_FLAGS) $< -o $@ --target=generic-1
$(EXAMPLE)-scalar: $(CPP_OBJS) objs/$(ISPC_SRC:.ispc=)_scalar.o
$(CXX) $(CXXFLAGS) -o $@ $^ $(LIBS)

examples/common.props (new file)

@@ -0,0 +1,176 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|Win32">
<Configuration>Debug</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Debug|x64">
<Configuration>Debug</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|Win32">
<Configuration>Release</Configuration>
<Platform>Win32</Platform>
</ProjectConfiguration>
<ProjectConfiguration Include="Release|x64">
<Configuration>Release</Configuration>
<Platform>x64</Platform>
</ProjectConfiguration>
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<CharacterSet>Unicode</CharacterSet>
<PlatformToolset>v110</PlatformToolset>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<CharacterSet>Unicode</CharacterSet>
<PlatformToolset>v110</PlatformToolset>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
<PlatformToolset>v110</PlatformToolset>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
<PlatformToolset>v110</PlatformToolset>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<IntrinsicFunctions>true</IntrinsicFunctions>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<IntrinsicFunctions>true</IntrinsicFunctions>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<PropertyGroup Label="User">
<ISPC_compiler Condition=" '$(ISPC_compiler)' == '' ">ispc</ISPC_compiler>
<Target_str Condition=" '$(Target_str)' == '' ">$(default_targets)</Target_str>
<Target_out>$(ISPC_file).obj</Target_out>
<Target_out Condition="($(Target_str.Contains(',')) And $(Target_str.Contains('sse2')))">$(Target_out);$(ISPC_file)_sse2.obj</Target_out>
<Target_out Condition="($(Target_str.Contains(',')) And $(Target_str.Contains('sse4')))">$(Target_out);$(ISPC_file)_sse4.obj</Target_out>
<Target_out Condition="($(Target_str.Contains(',')) And $(Target_str.Contains('avx1-')))">$(Target_out);$(ISPC_file)_avx.obj</Target_out>
<Target_out Condition="($(Target_str.Contains(',')) And $(Target_str.Contains('avx1.1')))">$(Target_out);$(ISPC_file)_avx11.obj</Target_out>
<Target_out Condition="($(Target_str.Contains(',')) And $(Target_str.Contains('avx2')))">$(Target_out);$(ISPC_file)_avx2.obj</Target_out>
</PropertyGroup>
<ItemGroup>
<CustomBuild Include='$(ISPC_file).ispc'>
<FileType>Document</FileType>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(ISPC_compiler) -O0 %(Filename).ispc -o %(Filename).obj -h %(Filename)_ispc.h --arch=x86 --target=$(Target_str) -g $(flags)</Command>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(ISPC_compiler) -O0 %(Filename).ispc -o %(Filename).obj -h %(Filename)_ispc.h --target=$(Target_str) -g $(flags)</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(Target_out)</Outputs>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(Target_out)</Outputs>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(ISPC_compiler) -O2 %(Filename).ispc -o %(Filename).obj -h %(Filename)_ispc.h --arch=x86 --target=$(Target_str) $(flags)</Command>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(ISPC_compiler) -O2 %(Filename).ispc -o %(Filename).obj -h %(Filename)_ispc.h --target=$(Target_str) $(flags)</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(Target_out)</Outputs>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(Target_out)</Outputs>
</CustomBuild>
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>


@@ -2,7 +2,8 @@
EXAMPLE=deferred_shading
CPP_SRC=common.cpp main.cpp dynamic_c.cpp dynamic_cilk.cpp
ISPC_SRC=kernels.ispc
ISPC_TARGETS=sse2,sse4-x2,avx-x2
ISPC_IA_TARGETS=sse2-i32x4,sse4-i32x8,avx1-i32x16,avx2-i32x16,avx512knl-i32x16,avx512skx-i32x16
ISPC_ARM_TARGETS=neon
ISPC_FLAGS=--opt=fast-math
include ../common.mk


@@ -204,6 +204,7 @@ void WriteFrame(const char *filename, const InputData *input,
fprintf(out, "P6 %d %d 255\n", input->header.framebufferWidth,
input->header.framebufferHeight);
fwrite(framebufferAOS, imageBytes, 1, out);
fclose(out);
lAlignedFree(framebufferAOS);
}


@@ -1,4 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<ItemGroup Label="ProjectConfigurations">
<ProjectConfiguration Include="Debug|Win32">
@@ -21,133 +21,11 @@
<PropertyGroup Label="Globals">
<ProjectGuid>{87f53c53-957e-4e91-878a-bc27828fb9eb}</ProjectGuid>
<Keyword>Win32Proj</Keyword>
<RootNamespace>mandelbrot</RootNamespace>
<RootNamespace>deferred</RootNamespace>
<ISPC_file>kernels</ISPC_file>
<default_targets>sse2,sse4-x2,avx1-x2</default_targets>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>Unicode</CharacterSet>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
<ImportGroup Label="ExtensionSettings">
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
</ImportGroup>
<PropertyGroup Label="UserMacros" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<LinkIncremental>true</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<LinkIncremental>false</LinkIncremental>
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
</PropertyGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<IntrinsicFunctions>true</IntrinsicFunctions>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
<ClCompile>
<PrecompiledHeader>
</PrecompiledHeader>
<WarningLevel>Level3</WarningLevel>
<Optimization>Disabled</Optimization>
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<IntrinsicFunctions>true</IntrinsicFunctions>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
<ClCompile>
<WarningLevel>Level3</WarningLevel>
<PrecompiledHeader>
</PrecompiledHeader>
<Optimization>MaxSpeed</Optimization>
<FunctionLevelLinking>true</FunctionLevelLinking>
<IntrinsicFunctions>true</IntrinsicFunctions>
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
<FloatingPointModel>Fast</FloatingPointModel>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
<GenerateDebugInformation>true</GenerateDebugInformation>
<EnableCOMDATFolding>true</EnableCOMDATFolding>
<OptimizeReferences>true</OptimizeReferences>
</Link>
</ItemDefinitionGroup>
<Import Project="..\common.props" />
<ItemGroup>
<ClCompile Include="common.cpp" />
<ClCompile Include="dynamic_c.cpp" />
@@ -155,24 +33,4 @@
<ClCompile Include="main.cpp" />
<ClCompile Include="../tasksys.cpp" />
</ItemGroup>
<ItemGroup>
<CustomBuild Include="kernels.ispc">
<FileType>Document</FileType>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4-x2,avx-x2
</Command>
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4-x2,avx-x2
</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4-x2,avx-x2
</Command>
<Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4-x2,avx-x2
</Command>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
</CustomBuild>
</ItemGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
<ImportGroup Label="ExtensionTargets">
</ImportGroup>
</Project>


@@ -35,35 +35,35 @@
struct InputDataArrays
{
uniform float * uniform zBuffer;
uniform unsigned int16 * uniform normalEncoded_x; // half float
uniform unsigned int16 * uniform normalEncoded_y; // half float
uniform unsigned int16 * uniform specularAmount; // half float
uniform unsigned int16 * uniform specularPower; // half float
uniform unsigned int8 * uniform albedo_x; // unorm8
uniform unsigned int8 * uniform albedo_y; // unorm8
uniform unsigned int8 * uniform albedo_z; // unorm8
uniform float * uniform lightPositionView_x;
uniform float * uniform lightPositionView_y;
uniform float * uniform lightPositionView_z;
uniform float * uniform lightAttenuationBegin;
uniform float * uniform lightColor_x;
uniform float * uniform lightColor_y;
uniform float * uniform lightColor_z;
uniform float * uniform lightAttenuationEnd;
float *zBuffer;
unsigned int16 *normalEncoded_x; // half float
unsigned int16 *normalEncoded_y; // half float
unsigned int16 *specularAmount; // half float
unsigned int16 *specularPower; // half float
unsigned int8 *albedo_x; // unorm8
unsigned int8 *albedo_y; // unorm8
unsigned int8 *albedo_z; // unorm8
float *lightPositionView_x;
float *lightPositionView_y;
float *lightPositionView_z;
float *lightAttenuationBegin;
float *lightColor_x;
float *lightColor_y;
float *lightColor_z;
float *lightAttenuationEnd;
};
struct InputHeader
{
uniform float cameraProj[4][4];
uniform float cameraNear;
uniform float cameraFar;
float cameraProj[4][4];
float cameraNear;
float cameraFar;
uniform int32 framebufferWidth;
uniform int32 framebufferHeight;
uniform int32 numLights;
uniform int32 inputDataChunkSize;
uniform int32 inputDataArrayOffsets[idaNum];
int32 framebufferWidth;
int32 framebufferHeight;
int32 numLights;
int32 inputDataChunkSize;
int32 inputDataArrayOffsets[idaNum];
};
@@ -158,38 +158,22 @@ IntersectLightsWithTileMinMax(
uniform float gBufferScale_x = 0.5f * (float)gBufferWidth;
uniform float gBufferScale_y = 0.5f * (float)gBufferHeight;
// Parallelize across frustum planes.
// We really only have four side planes here, but write the code to
// handle programCount > 4 robustly
uniform float frustumPlanes_xy[programCount];
uniform float frustumPlanes_z[programCount];
uniform float frustumPlanes_xy[4] = {
-(cameraProj_11 * gBufferScale_x),
(cameraProj_11 * gBufferScale_x),
(cameraProj_22 * gBufferScale_y),
-(cameraProj_22 * gBufferScale_y) };
uniform float frustumPlanes_z[4] = {
tileEndX - gBufferScale_x,
-tileStartX + gBufferScale_x,
tileEndY - gBufferScale_y,
-tileStartY + gBufferScale_y };
// TODO: If programIndex < 4 here? Don't care about masking off the
// rest but if interleaving ("x2" modes) the other lanes should ideally
// not be emitted...
{
// This one is totally constant over the whole screen... worth pulling it up at all?
float frustumPlanes_xy_v;
frustumPlanes_xy_v = insert(frustumPlanes_xy_v, 0, -(cameraProj_11 * gBufferScale_x));
frustumPlanes_xy_v = insert(frustumPlanes_xy_v, 1, (cameraProj_11 * gBufferScale_x));
frustumPlanes_xy_v = insert(frustumPlanes_xy_v, 2, (cameraProj_22 * gBufferScale_y));
frustumPlanes_xy_v = insert(frustumPlanes_xy_v, 3, -(cameraProj_22 * gBufferScale_y));
float frustumPlanes_z_v;
frustumPlanes_z_v = insert(frustumPlanes_z_v, 0, tileEndX - gBufferScale_x);
frustumPlanes_z_v = insert(frustumPlanes_z_v, 1, -tileStartX + gBufferScale_x);
frustumPlanes_z_v = insert(frustumPlanes_z_v, 2, tileEndY - gBufferScale_y);
frustumPlanes_z_v = insert(frustumPlanes_z_v, 3, -tileStartY + gBufferScale_y);
// Normalize
float norm = rsqrt(frustumPlanes_xy_v * frustumPlanes_xy_v +
frustumPlanes_z_v * frustumPlanes_z_v);
frustumPlanes_xy_v *= norm;
frustumPlanes_z_v *= norm;
// Save out for uniform use later
frustumPlanes_xy[programIndex] = frustumPlanes_xy_v;
frustumPlanes_z[programIndex] = frustumPlanes_z_v;
for (uniform int i = 0; i < 4; ++i) {
uniform float norm = rsqrt(frustumPlanes_xy[i] * frustumPlanes_xy[i] +
frustumPlanes_z[i] * frustumPlanes_z[i]);
frustumPlanes_xy[i] *= norm;
frustumPlanes_z[i] *= norm;
}
uniform int32 tileNumLights = 0;
@@ -343,8 +327,8 @@ ShadeTile(
// Reconstruct normal from G-buffer
float surface_normal_x, surface_normal_y, surface_normal_z;
float normal_x = half_to_float_fast(inputData.normalEncoded_x[gBufferOffset]);
float normal_y = half_to_float_fast(inputData.normalEncoded_y[gBufferOffset]);
float normal_x = half_to_float(inputData.normalEncoded_x[gBufferOffset]);
float normal_y = half_to_float(inputData.normalEncoded_y[gBufferOffset]);
float f = (normal_x - normal_x * normal_x) + (normal_y - normal_y * normal_y);
float m = sqrt(4.0f * f - 1.0f);
@@ -355,9 +339,9 @@ ShadeTile(
// Load other G-buffer parameters
float surface_specularAmount =
half_to_float_fast(inputData.specularAmount[gBufferOffset]);
half_to_float(inputData.specularAmount[gBufferOffset]);
float surface_specularPower =
half_to_float_fast(inputData.specularPower[gBufferOffset]);
half_to_float(inputData.specularPower[gBufferOffset]);
float surface_albedo_x = Unorm8ToFloat32(inputData.albedo_x[gBufferOffset]);
float surface_albedo_y = Unorm8ToFloat32(inputData.albedo_y[gBufferOffset]);
float surface_albedo_z = Unorm8ToFloat32(inputData.albedo_z[gBufferOffset]);
@@ -530,9 +514,9 @@ RenderStatic(uniform InputHeader &inputHeader,
// Launch a task to render each tile, each of which is MIN_TILE_WIDTH
// by MIN_TILE_HEIGHT pixels.
launch[num_groups] < RenderTile(num_groups_x, num_groups_y,
inputHeader, inputData, visualizeLightCount,
framebuffer_r, framebuffer_g, framebuffer_b) >;
launch[num_groups] RenderTile(num_groups_x, num_groups_y,
inputHeader, inputData, visualizeLightCount,
framebuffer_r, framebuffer_g, framebuffer_b);
}
@@ -591,8 +575,6 @@ SplitTileMinMax(
uniform float light_positionView_z_array[],
uniform float light_attenuationEnd_array[],
// Outputs
// TODO: ISPC doesn't currently like multidimensional arrays so we'll do the
// indexing math ourselves
uniform int32 subtileIndices[],
uniform int32 subtileIndicesPitch,
uniform int32 subtileNumLights[]
@@ -601,30 +583,20 @@ SplitTileMinMax(
uniform float gBufferScale_x = 0.5f * (float)gBufferWidth;
uniform float gBufferScale_y = 0.5f * (float)gBufferHeight;
// Parallelize across frustum planes
// Only have 2 frustum split planes here so may not be worth it, but
// we'll do it for now for consistency
uniform float frustumPlanes_xy[programCount];
uniform float frustumPlanes_z[programCount];
// This one is totally constant over the whole screen... worth pulling it up at all?
float frustumPlanes_xy_v;
frustumPlanes_xy_v = insert(frustumPlanes_xy_v, 0, -(cameraProj_11 * gBufferScale_x));
frustumPlanes_xy_v = insert(frustumPlanes_xy_v, 1, (cameraProj_22 * gBufferScale_y));
float frustumPlanes_z_v;
frustumPlanes_z_v = insert(frustumPlanes_z_v, 0, tileMidX - gBufferScale_x);
frustumPlanes_z_v = insert(frustumPlanes_z_v, 1, tileMidY - gBufferScale_y);
uniform float frustumPlanes_xy[2] = { -(cameraProj_11 * gBufferScale_x),
(cameraProj_22 * gBufferScale_y) };
uniform float frustumPlanes_z[2] = { tileMidX - gBufferScale_x,
tileMidY - gBufferScale_y };
// Normalize
float norm = rsqrt(frustumPlanes_xy_v * frustumPlanes_xy_v +
frustumPlanes_z_v * frustumPlanes_z_v);
frustumPlanes_xy_v *= norm;
frustumPlanes_z_v *= norm;
// Save out for uniform use later
frustumPlanes_xy[programIndex] = frustumPlanes_xy_v;
frustumPlanes_z[programIndex] = frustumPlanes_z_v;
uniform float norm[2] = { rsqrt(frustumPlanes_xy[0] * frustumPlanes_xy[0] +
frustumPlanes_z[0] * frustumPlanes_z[0]),
rsqrt(frustumPlanes_xy[1] * frustumPlanes_xy[1] +
frustumPlanes_z[1] * frustumPlanes_z[1]) };
frustumPlanes_xy[0] *= norm[0];
frustumPlanes_xy[1] *= norm[1];
frustumPlanes_z[0] *= norm[0];
frustumPlanes_z[1] *= norm[1];
// Initialize
uniform int32 subtileLightOffset[4];


@@ -62,10 +62,16 @@
///////////////////////////////////////////////////////////////////////////
int main(int argc, char** argv) {
if (argc != 2) {
printf("usage: deferred_shading <input_file (e.g. data/pp1280x720.bin)>\n");
if (argc < 2) {
printf("usage: deferred_shading <input_file (e.g. data/pp1280x720.bin)> [tasks iterations] [serial iterations]\n");
return 1;
}
static unsigned int test_iterations[] = {5, 3, 500}; // last value is for nframes; it acts as a scale factor.
if (argc == 5) {
for (int i = 0; i < 3; i++) {
test_iterations[i] = atoi(argv[2 + i]);
}
}
InputData *input = CreateInputDataFromFile(argv[1]);
if (!input) {
@@ -81,16 +87,17 @@ int main(int argc, char** argv) {
InitDynamicCilk(input);
#endif // __cilk
int nframes = 5;
int nframes = test_iterations[2];
double ispcCycles = 1e30;
for (int i = 0; i < 5; ++i) {
for (unsigned int i = 0; i < test_iterations[0]; ++i) {
framebuffer.clear();
reset_and_start_timer();
for (int j = 0; j < nframes; ++j)
ispc::RenderStatic(&input->header, &input->arrays,
ispc::RenderStatic(input->header, input->arrays,
VISUALIZE_LIGHT_COUNT,
framebuffer.r, framebuffer.g, framebuffer.b);
double mcycles = get_elapsed_mcycles() / nframes;
printf("@time of ISPC + TASKS run:\t\t\t[%.3f] million cycles\n", mcycles);
ispcCycles = std::min(ispcCycles, mcycles);
}
printf("[ispc static + tasks]:\t\t[%.3f] million cycles to render "
@@ -98,14 +105,16 @@ int main(int argc, char** argv) {
input->header.framebufferWidth, input->header.framebufferHeight);
WriteFrame("deferred-ispc-static.ppm", input, framebuffer);
nframes = 3;
#ifdef __cilk
double dynamicCilkCycles = 1e30;
for (int i = 0; i < 5; ++i) {
for (int i = 0; i < test_iterations[1]; ++i) {
framebuffer.clear();
reset_and_start_timer();
for (int j = 0; j < nframes; ++j)
DispatchDynamicCilk(input, &framebuffer);
double mcycles = get_elapsed_mcycles() / nframes;
printf("@time of serial run:\t\t\t[%.3f] million cycles\n", mcycles);
dynamicCilkCycles = std::min(dynamicCilkCycles, mcycles);
}
printf("[ispc + Cilk dynamic]:\t\t[%.3f] million cycles to render image\n",
@@ -114,12 +123,13 @@ int main(int argc, char** argv) {
#endif // __cilk
double serialCycles = 1e30;
for (int i = 0; i < 5; ++i) {
for (unsigned int i = 0; i < test_iterations[1]; ++i) {
framebuffer.clear();
reset_and_start_timer();
for (int j = 0; j < nframes; ++j)
DispatchDynamicC(input, &framebuffer);
double mcycles = get_elapsed_mcycles() / nframes;
printf("@time of serial run:\t\t\t[%.3f] million cycles\n", mcycles);
serialCycles = std::min(serialCycles, mcycles);
}
printf("[C++ serial dynamic, 1 core]:\t[%.3f] million cycles to render image\n",
@@ -130,7 +140,7 @@ int main(int argc, char** argv) {
printf("\t\t\t\t(%.2fx speedup from static ISPC, %.2fx from Cilk+ISPC)\n",
serialCycles/ispcCycles, serialCycles/dynamicCilkCycles);
#else
printf("\t\t\t\t(%.2fx speedup from ISPC)\n", serialCycles/ispcCycles);
printf("\t\t\t\t(%.2fx speedup from ISPC + tasks)\n", serialCycles/ispcCycles);
#endif // __cilk
DeleteInputData(input);


@@ -1,6 +1,6 @@

Microsoft Visual Studio Solution File, Format Version 11.00
# Visual Studio 2010
Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio 2012
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "simple", "simple\simple.vcxproj", "{947C5311-8B78-4D05-BEE4-BCF342D4B367}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "rt", "rt\rt.vcxproj", "{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}"
@@ -23,6 +23,10 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "stencil", "stencil\stencil.
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "deferred_shading", "deferred\deferred_shading.vcxproj", "{87F53C53-957E-4E91-878A-BC27828FB9EB}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "perfbench", "perfbench\perfbench.vcxproj", "{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "sort", "sort\sort.vcxproj", "{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Win32 = Debug|Win32
@@ -119,6 +123,22 @@ Global
{87F53C53-957E-4E91-878A-BC27828FB9EB}.Release|Win32.Build.0 = Release|Win32
{87F53C53-957E-4E91-878A-BC27828FB9EB}.Release|x64.ActiveCfg = Release|x64
{87F53C53-957E-4E91-878A-BC27828FB9EB}.Release|x64.Build.0 = Release|x64
{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Debug|Win32.ActiveCfg = Debug|Win32
{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Debug|Win32.Build.0 = Debug|Win32
{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Debug|x64.ActiveCfg = Debug|x64
{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Debug|x64.Build.0 = Debug|x64
{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Release|Win32.ActiveCfg = Release|Win32
{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Release|Win32.Build.0 = Release|Win32
{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Release|x64.ActiveCfg = Release|x64
{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Release|x64.Build.0 = Release|x64
{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}.Debug|Win32.ActiveCfg = Debug|Win32
{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}.Debug|Win32.Build.0 = Debug|Win32
{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}.Debug|x64.ActiveCfg = Debug|x64
{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}.Debug|x64.Build.0 = Debug|x64
{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}.Release|Win32.ActiveCfg = Release|Win32
{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}.Release|Win32.Build.0 = Release|Win32
{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}.Release|x64.ActiveCfg = Release|x64
{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C2}.Release|x64.Build.0 = Release|x64
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE

examples/gmres/Makefile

@@ -0,0 +1,9 @@
EXAMPLE=gmres
CPP_SRC=algorithm.cpp main.cpp matrix.cpp
CC_SRC=mmio.c
ISPC_SRC=matrix.ispc
ISPC_IA_TARGETS=sse2-i32x4,sse4-i32x8,avx1-i32x16,avx2-i32x16,avx512knl-i32x16,avx512skx-i32x16
ISPC_ARM_TARGETS=neon
include ../common.mk


@@ -0,0 +1,231 @@
/*
Copyright (c) 2012, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/*===========================================================================*\
|* Includes
\*===========================================================================*/
#include "algorithm.h"
#include "stdio.h"
#include "debug.h"
/*===========================================================================*\
|* GMRES
\*===========================================================================*/
/* upper_triangular_right_solve:
* ----------------------------
* Given upper triangular matrix R and rhs vector b, solve for
* x. This "solve" ignores the rows, columns of R that are greater than the
* dimensions of x.
*/
void upper_triangular_right_solve (const DenseMatrix &R, const Vector &b, Vector &x)
{
// Dimensionality check
ASSERT(R.rows() >= b.size());
ASSERT(R.cols() >= x.size());
ASSERT(b.size() >= x.size());
int max_row = x.size() - 1;
// first solve step:
x[max_row] = b[max_row] / R(max_row, max_row);
for (int row = max_row - 1; row >= 0; row--) {
double xi = b[row];
for (int col = max_row; col > row; col--)
xi -= x[col] * R(row, col);
x[row] = xi / R(row, row);
}
}
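/* Illustration with hypothetical values: for
 *   R = [ 2 1 ]   b = ( 7 )
 *       [ 0 3 ],      ( 6 ),
 * back-substitution gives x[1] = 6 / 3 = 2, then
 * x[0] = (7 - 1 * 2) / 2 = 2.5: each unknown is solved once the
 * unknowns to its right are known. */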
/* create_rotation (used in gmres):
* -------------------------------
* Construct a Givens rotation to zero out the lowest non-zero entry in a partially
* factored Hessenberg matrix. Note that the previous Givens rotations should be
* applied to this column before creating a new rotation.
*/
void create_rotation (const DenseMatrix &H, size_t col, Vector &Cn, Vector &Sn)
{
double a = H(col, col);
double b = H(col + 1, col);
double r;
if (b == 0) {
Cn[col] = copysign(1, a);
Sn[col] = 0;
}
else if (a == 0) {
Cn[col] = 0;
Sn[col] = copysign(1, b);
}
else {
r = sqrt(a*a + b*b);
Sn[col] = -b / r;
Cn[col] = a / r;
}
}
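/* Numeric check with hypothetical values: for a = 3 and b = 4, r = 5,
 * so Cn[col] = 0.6 and Sn[col] = -0.8. Applying the rotation then maps
 * H(col + 1, col) to s * 3 + c * 4 = -2.4 + 2.4 = 0, which is exactly
 * the subdiagonal entry the rotation was built to eliminate. */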
/* Applies the 'col'th Givens rotation stored in vectors Sn and Cn to the 'col'th
* column of the DenseMatrix H. (Previous columns don't need the rotation applied because,
* presumably, the first col-1 columns are already upper triangular, and so their
* entries in the col and col+1 rows are 0.)
*/
void apply_rotation (DenseMatrix &H, size_t col, Vector &Cn, Vector &Sn)
{
double c = Cn[col];
double s = Sn[col];
double tmp = c * H(col, col) - s * H(col+1, col);
H(col+1, col) = s * H(col, col) + c * H(col+1, col);
H(col, col) = tmp;
}
/* Applies the 'col'th Givens rotation to the vector.
*/
void apply_rotation (Vector &v, size_t col, Vector &Cn, Vector &Sn)
{
double a = v[col];
double b = v[col + 1];
double c = Cn[col];
double s = Sn[col];
v[col] = c * a - s * b;
v[col + 1] = s * a + c * b;
}
/* Applies the first 'col' Givens rotations to the newly-created column
* of H. (Leaves other columns alone.)
*/
void update_column (DenseMatrix &H, size_t col, Vector &Cn, Vector &Sn)
{
for (int i = 0; i < col; i++) {
double c = Cn[i];
double s = Sn[i];
double t = c * H(i,col) - s * H(i+1,col);
H(i+1, col) = s * H(i,col) + c * H(i+1,col);
H(i, col) = t;
}
}
/* After a new column has been added to the Hessenberg matrix, factor it back into
* an upper-triangular matrix by:
* - applying the previous Givens rotations to the new column
* - computing the new Givens rotation to make the column upper triangular
* - applying the new Givens rotation to the column, and
* - applying the new Givens rotation to the solution vector
*/
void update_qr_decomp (DenseMatrix &H, Vector &s, size_t col, Vector &Cn, Vector &Sn)
{
update_column( H, col, Cn, Sn);
create_rotation(H, col, Cn, Sn);
apply_rotation( H, col, Cn, Sn);
apply_rotation( s, col, Cn, Sn);
}
void gmres (const Matrix &A, const Vector &b, Vector &x, int num_iters, double max_err)
{
DEBUG_PRINT("gmres starting!\n");
x.zero();
ASSERT(A.rows() == A.cols());
DenseMatrix Qstar(num_iters + 1, A.rows());
DenseMatrix H(num_iters + 1, num_iters);
// arrays for storing parameters of givens rotations
Vector Sn(num_iters);
Vector Cn(num_iters);
// array for storing the rhs projected onto the Hessenberg matrix's column space
Vector G(num_iters+1);
G.zero();
double beta = b.norm();
G[0] = beta;
// temp vector, stores Aqi
Vector w(A.rows());
w.copy(b);
w.normalize();
Qstar.set_row(0, w);
int iter = 0;
Vector temp(A.rows(), false);
double rel_err;
while (iter < num_iters)
{
// w = Aqi
Qstar.row(iter, temp);
A.multiply(temp, w);
// construct ith column of H, i+1th row of Qstar:
for (int row = 0; row <= iter; row++) {
Qstar.row(row, temp);
H(row, iter) = temp.dot(w);
w.add_ax(-H(row, iter), temp);
}
H(iter+1, iter) = w.norm();
w.divide(H(iter+1, iter));
Qstar.set_row(iter+1, w);
update_qr_decomp (H, G, iter, Cn, Sn);
rel_err = fabs(G[iter+1] / beta);
if (rel_err < max_err)
break;
if (iter % 100 == 0)
DEBUG_PRINT("Iter %d: %f err\n", iter, rel_err);
iter++;
}
if (iter == num_iters) {
fprintf(stderr, "Error: gmres failed to converge in %d iterations (relative err: %f)\n", num_iters, rel_err);
exit(-1);
}
// We've reached an acceptable solution (?):
DEBUG_PRINT("gmres completed in %d iterations (rel. resid. %f, max %f)\n", num_iters, rel_err, max_err);
Vector y(iter+1);
upper_triangular_right_solve(H, G, y);
for (int i = 0; i < iter + 1; i++) {
Qstar.row(i, temp);
x.add_ax(y[i], temp);
}
}
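
A minimal caller for the routine above might look like the following sketch; the Vector constructor and Matrix accessor are inferred from their uses in this file, and solve_example itself is hypothetical.

#include "algorithm.h"

// Hypothetical driver: solve A x = b with up to 1000 GMRES iterations
// and a relative-error tolerance of 1e-6. Vector(n) and Matrix::cols()
// are assumed from their uses above; gmres() zeroes x itself.
void solve_example(const Matrix &A, const Vector &b)
{
    Vector x(A.cols());
    gmres(A, b, x, 1000, 1e-6);
}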


@@ -0,0 +1,50 @@
/*
Copyright (c) 2012, Intel Corporation
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#ifndef __ALGORITHM_H__
#define __ALGORITHM_H__
#include "matrix.h"
/* Generalized Minimal Residual Method:
* -----------------------------------
* Takes a square matrix and an rhs and uses GMRES to find an estimate for x.
* The specified error is relative.
*/
void gmres (const Matrix &A, const Vector &b, Vector &x, int num_iters, double err);
#endif

(Five more file diffs were suppressed because they are too large, and some files were not shown because too many files changed in this diff.)