If a flag along the lines of "--target=sse4,avx-x2" is provided on the command-line,
then the program will be compiled for each of the given targets, with a separate
output file generated for each one. Further, an output file with dispatch functions
that check the current system's CPU and then chooses the best available variant
is also created.
Issue #11.
Specifically, launch all of the tasks in one statement, rather than
still looping over spans in y and launching a collection of tasks
across x for each span. This seems to give a few percent better
performance.
Don't print more than 3 lines of source file context with errors.
(Any more than that is almost certainly not the Right Thing to do.)
Make some parsing error messages more clear.
Within each function that launches tasks, we now can easily track which
tasks that function launched, so that the sync at the end of the function
can just sync on the tasks launched by that function (not all tasks
launched by all functions.)
Implementing this led to a rework of the task system API that ispc generates
code to call; the example task systems in examples/tasksys.cpp have been
updated to conform to this API. (The updated API is also documented in
the ispc user's guide.)
As part of this, "launch[n]" syntax was added to launch a number of tasks
in a single launch statement, rather than requiring a loop over 'n' to
launch n tasks.
This commit thus fixes issue #84 (enhancement to launch multiple tasks from
a single launch statement) as well as issue #105 (recursive task launches
were broken).
The intent is that the code in stdlib.ispc that is calling out to the built-ins
should match argument types exactly (using explicit casts as needed), just
for maximal clarity/safety.
- Don't suggest matches when given an empty string or a single, non-alpha
character.
- Also fixed the parser to be a bit less confusing when it encounters an
unexpected EOF.
Go back to running both sides of 'if' statements with masking and without
branching if we can determine that the code is relatively simple (as per
the simple cost model), and is safe to run even if the mask is 'all off'.
This gives a bit of a performance improvement for some of the examples
(most notably, the ray tracer), and is the code that one wants generated
in this case anyhow.
This is currently only used to decide whether it's worth doing an
"are all lanes running" check at the start of functions--for small
functions, it's not worth the overhead.
The cost is estimated relatively early in compilation (e.g. before
we know if an array access is a scatter/gather or not, before
constant folding, etc.), so there are many known shortcomings.
For the case where we have a regular (i.e. non-'cif') 'if' statement,
the generated code just simply checks to see if any program instance
is running before running the corresponding statements. This is a
lighter-weight check than IfStmt::emitMaskMixed() was performing.