added cuda examples
167
examples_cuda/README.txt
Normal file
@@ -0,0 +1,167 @@
====================
ISPC Examples README
====================

This directory has a number of sample ispc programs. Before building them
(on any system), install the appropriate ispc compiler binary into a
directory in your path. Then, if you're running Windows, open the
"examples.sln" file and build from there. For building under Linux/OSX,
there are makefiles in each directory that build the examples individually.

Almost all of them benchmark ispc implementations of the given computation
against regular serial C++ implementations, printing out a comparison of
the runtimes and the speedup delivered by ispc. It may be instructive to
do a side-by-side diff of the C++ and ispc implementations of these
algorithms to learn more about writing ispc code.


AOBench
=======

This is an ISPC implementation of the "AO bench" benchmark
(http://syoyo.wordpress.com/2009/01/26/ao-bench-is-evolving/). The command
line arguments are:

    ao (num iterations) (xres) (yres)

It executes the program for the given number of iterations, rendering an
(xres x yres) image each time and measuring the computation time with both
serial and ispc implementations.


AOBench_Instrumented
====================

This version of AO Bench is compiled with the --instrument ispc compiler
flag. This causes the compiler to emit calls to a (user-supplied)
ISPCInstrument() function at interesting places in the compiled code. An
example implementation of this function that counts the number of times the
callback is made and records some statistics about control flow coherence
is provided in the instrument.cpp file.


Deferred
========

This is an extensive example of using ispc for efficient
deferred shading of scenes with thousands of lights; it's an implementation
of the algorithm that Johan Andersson described at SIGGRAPH 2009,
implemented by Andrew Lauritzen and Jefferson Montgomery. The basic idea
is that a pre-rendered G-buffer is partitioned into tiles, and in each
tile, the set of lights that contribute to the tile is first computed.
Then, the pixels in the tile are shaded using just those light
sources. (See slides 19-29 of
http://s09.idav.ucdavis.edu/talks/04-JAndersson-ParallelFrostbite-Siggraph09.pdf
for more details on the algorithm.)

This directory includes three implementations of the algorithm:

- An ispc implementation that first does a static partitioning of the
  screen into tiles to parallelize across the CPU cores. Within each tile
  ispc kernels provide highly efficient implementations of the light
  culling and shading calculations.
- A "best practices" serial C++ implementation. This implementation does a
  dynamic partitioning of the screen, refining tiles with significant Z
  depth complexity (these tiles often have a large number of lights that
  affect them). Within each final tile, the pixels are shaded using
  regular C++ code.
- If the Cilk extensions are available in your compiler, an ispc
  implementation that uses Cilk will also be built.
  (See http://software.intel.com/en-us/articles/intel-cilk-plus/). Like
  the "best practices" serial implementation, this version does dynamic
  tile partitioning for better load balancing and then uses ispc for the
  light culling and shading.
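
The per-tile light culling step described above can be sketched roughly as
follows. This is only an illustration and is not taken from the example's
sources: the Light and Tile types and the cullLights() function are
hypothetical, and the test is reduced to a 2D circle-versus-rectangle overlap
in screen space, whereas a real implementation would also use the depth range
of the tile.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified types (not from the example's sources).
struct Light { float x, y, radius; };  // screen-space center and radius of influence, in pixels
struct Tile  { int x0, y0, x1, y1; };  // pixel bounds [x0,x1) x [y0,y1)

// Return the indices of the lights whose circle of influence overlaps the tile.
std::vector<int> cullLights(const Tile &tile, const std::vector<Light> &lights) {
    assert(tile.x0 < tile.x1 && tile.y0 < tile.y1);
    std::vector<int> visible;
    for (int i = 0; i < (int)lights.size(); ++i) {
        const Light &l = lights[i];
        // Closest point of the tile rectangle to the light center.
        float cx = std::max((float)tile.x0, std::min(l.x, (float)tile.x1));
        float cy = std::max((float)tile.y0, std::min(l.y, (float)tile.y1));
        float dx = l.x - cx, dy = l.y - cy;
        // The light contributes only if that point lies within its radius.
        if (dx * dx + dy * dy <= l.radius * l.radius)
            visible.push_back(i);
    }
    return visible;
}
```

The shading pass for a tile would then loop over only the returned indices
rather than over every light in the scene.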


GMRES
=====

An implementation of the generalized minimal residual method for solving
sparse matrix equations.
(http://en.wikipedia.org/wiki/Generalized_minimal_residual_method)


Mandelbrot
==========

Mandelbrot set generation. This example is extensively documented at the
http://ispc.github.com/example.html page.
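
For orientation, the per-point computation behind Mandelbrot set generation is
the usual escape-time iteration z -> z*z + c. A minimal serial C++ sketch
follows; the function name mandel() and its parameters are chosen here for
illustration and may differ from the example's actual sources.

```cpp
#include <cassert>

// Count how many iterations of z -> z*z + c (starting from z = c) it takes
// for |z| to exceed 2, capped at maxIterations.
int mandel(float cRe, float cIm, int maxIterations) {
    assert(maxIterations >= 0);
    float zRe = cRe, zIm = cIm;
    int i;
    for (i = 0; i < maxIterations; ++i) {
        // |z|^2 > 4 means the point has escaped.
        if (zRe * zRe + zIm * zIm > 4.f)
            break;
        float newRe = zRe * zRe - zIm * zIm;
        float newIm = 2.f * zRe * zIm;
        zRe = cRe + newRe;
        zIm = cIm + newIm;
    }
    return i;
}
```

Points inside the set never escape and return maxIterations; points far
outside return after very few iterations, which is why neighboring pixels can
need very different amounts of work.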


Mandelbrot_tasks
================

Implementation of Mandelbrot set generation that also parallelizes across
cores using tasks. Under Windows, a simple task system built on
Microsoft's Concurrency Runtime is used (see tasks_concrt.cpp). On OSX, a
task system based on Grand Central Dispatch is used (tasks_gcd.cpp), and on
Linux, a pthreads-based task system is used (tasks_pthreads.cpp). When
using tasks with ispc, no task system is mandated; the user is free to plug
in any task system they want, for ease of interoperating with existing task
systems.


Noise
=====

This example has an implementation of Ken Perlin's procedural "noise"
function, as described in his 2002 "Improving Noise" SIGGRAPH paper.
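
One ingredient of the improved noise function is the quintic fade curve
6t^5 - 15t^4 + 10t^3 that is used to blend between lattice values. The sketch
below shows only that piece; the helper names fade(), mix() and smoothBlend()
are chosen here for illustration and are not taken from the example's sources.

```cpp
#include <cassert>

// Quintic fade curve 6t^5 - 15t^4 + 10t^3 for t in [0,1]. Its first and
// second derivatives are zero at both ends of the interval.
inline float fade(float t) {
    assert(t >= 0.f && t <= 1.f);
    return t * t * t * (t * (t * 6.f - 15.f) + 10.f);
}

// Linear interpolation between a (t = 0) and b (t = 1).
inline float mix(float a, float b, float t) { return a + t * (b - a); }

// Blend two neighboring lattice values using the faded parameter, so the
// result changes smoothly across lattice cell boundaries.
inline float smoothBlend(float a, float b, float t) { return mix(a, b, fade(t)); }
```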


Options
=======

This program implements both the Black-Scholes and Binomial options pricing
models in both ispc and regular serial C++ code.
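
As a rough guide to what the two pricing models compute, here is a small
serial C++ sketch for a European call option. It is written for this README
and is not the code from the example: the function names, the parameter order
and the use of a Cox-Ross-Rubinstein tree for the binomial model are all
assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Standard normal cumulative distribution function.
inline double normCdf(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

// Black-Scholes price of a European call.
// S: spot price, X: strike, T: years to expiry, r: risk-free rate, v: volatility.
double blackScholesCall(double S, double X, double T, double r, double v) {
    assert(S > 0 && X > 0 && T > 0 && v > 0);
    double d1 = (std::log(S / X) + (r + 0.5 * v * v) * T) / (v * std::sqrt(T));
    double d2 = d1 - v * std::sqrt(T);
    return S * normCdf(d1) - X * std::exp(-r * T) * normCdf(d2);
}

// Binomial (Cox-Ross-Rubinstein) price of the same option using `steps` time steps.
double binomialCall(double S, double X, double T, double r, double v, int steps) {
    assert(steps > 0);
    double dt = T / steps;
    double u = std::exp(v * std::sqrt(dt));       // up factor
    double d = 1.0 / u;                           // down factor
    double disc = std::exp(-r * dt);              // one-step discount factor
    double p = (std::exp(r * dt) - d) / (u - d);  // risk-neutral up probability
    // Payoffs at expiry: node j has seen j up moves and (steps - j) down moves.
    std::vector<double> value(steps + 1);
    for (int j = 0; j <= steps; ++j)
        value[j] = std::max(0.0, S * std::pow(u, 2 * j - steps) - X);
    // Roll the tree back to the root, discounting the expected value.
    for (int i = steps - 1; i >= 0; --i)
        for (int j = 0; j <= i; ++j)
            value[j] = disc * (p * value[j + 1] + (1.0 - p) * value[j]);
    return value[0];
}
```

With enough time steps the binomial price approaches the Black-Scholes price,
which makes the pair a convenient cross-check.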


Perfbench
=========

This runs a number of microbenchmarks to measure system performance and
code generation quality.


RT
==

This is a simple ray tracer; it reads in camera parameters and a bounding
volume hierarchy and renders the scene from the given viewpoint. The
command line arguments are:

    rt <scene base name>

Where <scene base name> is one of "cornell", "teapot", or "sponza".

The implementation originally derives from the bounding volume hierarchy
and triangle intersection code from pbrt; see the pbrt source code and/or
"Physically Based Rendering" book for more about the basic algorithmic
details.


Simple
======

This is a simple "hello world" type program that shows a ~10 line
application program calling out to a ~5 line ispc program to do a simple
computation.

Sort
====
This is a bucket sort of 32-bit unsigned integers.
By default 1000000 random elements get sorted.
Call ./sort N in order to sort N elements instead.
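
One common way to bucket-sort 32-bit unsigned keys is to make four passes,
each distributing the keys into 256 buckets by one byte, starting from the
least significant byte. The serial C++ sketch below illustrates that idea
only; the function name bucketSort() is hypothetical, and the example's
actual bucketing scheme may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sort 32-bit unsigned keys in place, one byte (256 buckets) per pass.
void bucketSort(std::vector<uint32_t> &keys) {
    std::vector<uint32_t> tmp(keys.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // Histogram of the current byte, stored one slot to the right.
        std::size_t count[257] = {0};
        for (uint32_t k : keys)
            ++count[((k >> shift) & 0xFF) + 1];
        // Prefix sum: count[b] becomes the start offset of bucket b.
        for (int b = 0; b < 256; ++b)
            count[b + 1] += count[b];
        // Stable scatter of every key into its bucket.
        for (uint32_t k : keys)
            tmp[count[(k >> shift) & 0xFF]++] = k;
        keys.swap(tmp);
    }
    // After an even number of swaps the sorted data is back in `keys`.
    assert(std::is_sorted(keys.begin(), keys.end()));
}
```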

Volume
======

Ray-marching volume rendering, with a single-scattering lighting model. To
run it, specify a camera parameter file and a volume density file, e.g.:

    volume camera.dat density_highres.vol

(See, e.g. Chapters 11 and 16 of "Physically Based Rendering" for
information about the algorithm implemented here.) The volume data set
included here was generated by the example implementation of the "Wavelet
Turbulence for Fluid Simulation" SIGGRAPH 2008 paper by Kim et
al. (http://www.cs.cornell.edu/~tedkim/WTURB/)
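
The core loop of a single-scattering ray marcher can be sketched as below.
This is a simplified, one-dimensional illustration written for this README
and not the example's code: the MarchResult type, the raymarch() function and
the two callbacks (density along the ray, and the fraction of light reaching a
point) are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct MarchResult {
    float radiance;       // light scattered toward the eye along the ray
    float transmittance;  // fraction of background light that survives
};

// March from t0 to t1 in nSteps equal steps.
// density(t):            volume density at ray parameter t
// lightTransmittance(t): fraction of the light source's energy reaching t
// sigmaT, sigmaS:        extinction and scattering coefficients
MarchResult raymarch(const std::function<float(float)> &density,
                     const std::function<float(float)> &lightTransmittance,
                     float t0, float t1, float sigmaT, float sigmaS, int nSteps) {
    assert(nSteps > 0 && t1 >= t0);
    float dt = (t1 - t0) / nSteps;
    float T = 1.f;  // transmittance from the eye to the current sample
    float L = 0.f;  // accumulated in-scattered radiance
    for (int i = 0; i < nSteps; ++i) {
        float t = t0 + (i + 0.5f) * dt;  // sample at the step midpoint
        float d = density(t);
        // Light scattered once at this sample, attenuated on its way to the eye.
        L += T * sigmaS * d * lightTransmittance(t) * dt;
        // Attenuate everything behind this sample.
        T *= std::exp(-sigmaT * d * dt);
    }
    return MarchResult{L, T};
}
```

For a uniform density the returned transmittance matches the closed form
exp(-sigmaT * density * (t1 - t0)), which is a handy sanity check.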
2
examples_cuda/aobench/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
ao
*.ppm
8
examples_cuda/aobench/Makefile
Normal file
@@ -0,0 +1,8 @@

EXAMPLE=ao
CPP_SRC=ao.cpp ao_serial.cpp
ISPC_SRC=ao.ispc
ISPC_IA_TARGETS=sse2,sse4,avx
ISPC_ARM_TARGETS=neon

include ../common.mk
186
examples_cuda/aobench/ao.cpp
Normal file
@@ -0,0 +1,186 @@
/*
  Copyright (c) 2010-2011, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
   IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
   TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
   PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef _MSC_VER
#define _CRT_SECURE_NO_WARNINGS
#define NOMINMAX
#pragma warning (disable: 4244)
#pragma warning (disable: 4305)
#endif

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#ifdef __linux__
#include <malloc.h>
#endif
#include <math.h>
#include <map>
#include <string>
#include <algorithm>
#include <sys/types.h>

#include "ao_ispc.h"
using namespace ispc;

#include "../timing.h"

#define NSUBSAMPLES 2

extern void ao_serial(int w, int h, int nsubsamples, float image[]);

static unsigned int test_iterations;
static unsigned int width, height;
static unsigned char *img;
static float *fimg;


static unsigned char
clamp(float f)
{
    int i = (int)(f * 255.5);

    if (i < 0) i = 0;
    if (i > 255) i = 255;

    return (unsigned char)i;
}


static void
savePPM(const char *fname, int w, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            img[3 * (y * w + x) + 0] = clamp(fimg[3 *(y * w + x) + 0]);
            img[3 * (y * w + x) + 1] = clamp(fimg[3 *(y * w + x) + 1]);
            img[3 * (y * w + x) + 2] = clamp(fimg[3 *(y * w + x) + 2]);
        }
    }

    FILE *fp = fopen(fname, "wb");
    if (!fp) {
        perror(fname);
        exit(1);
    }

    fprintf(fp, "P6\n");
    fprintf(fp, "%d %d\n", w, h);
    fprintf(fp, "255\n");
    fwrite(img, w * h * 3, 1, fp);
    fclose(fp);
    printf("Wrote image file %s\n", fname);
}


int main(int argc, char **argv)
{
    if (argc != 4) {
        printf ("%s\n", argv[0]);
        printf ("Usage: ao [num test iterations] [width] [height]\n");
        getchar();
        exit(-1);
    }
    else {
        test_iterations = atoi(argv[1]);
        width = atoi (argv[2]);
        height = atoi (argv[3]);
    }

    // Allocate space for output images
    img = new unsigned char[width * height * 3];
    fimg = new float[width * height * 3];

    //
    // Run the ispc path, test_iterations times, and report the minimum
    // time for any of them.
    //
    double minTimeISPC = 1e30;
    for (unsigned int i = 0; i < test_iterations; i++) {
        memset((void *)fimg, 0, sizeof(float) * width * height * 3);
        assert(NSUBSAMPLES == 2);

        reset_and_start_timer();
        ao_ispc(width, height, NSUBSAMPLES, fimg);
        double t = get_elapsed_mcycles();
        minTimeISPC = std::min(minTimeISPC, t);
    }

    // Report results and save image
    printf("[aobench ispc]:\t\t\t[%.3f] million cycles (%d x %d image)\n",
           minTimeISPC, width, height);
    savePPM("ao-ispc.ppm", width, height);

    //
    // Run the ispc + tasks path, test_iterations times, and report the
    // minimum time for any of them.
    //
    double minTimeISPCTasks = 1e30;
    for (unsigned int i = 0; i < test_iterations; i++) {
        memset((void *)fimg, 0, sizeof(float) * width * height * 3);
        assert(NSUBSAMPLES == 2);

        reset_and_start_timer();
        ao_ispc_tasks(width, height, NSUBSAMPLES, fimg);
        double t = get_elapsed_mcycles();
        minTimeISPCTasks = std::min(minTimeISPCTasks, t);
    }

    // Report results and save image
    printf("[aobench ispc + tasks]:\t\t[%.3f] million cycles (%d x %d image)\n",
           minTimeISPCTasks, width, height);
    savePPM("ao-ispc-tasks.ppm", width, height);

    //
    // Run the serial path, again test_iteration times, and report the
    // minimum time.
    //
    double minTimeSerial = 1e30;
    for (unsigned int i = 0; i < test_iterations; i++) {
        memset((void *)fimg, 0, sizeof(float) * width * height * 3);
        reset_and_start_timer();
        ao_serial(width, height, NSUBSAMPLES, fimg);
        double t = get_elapsed_mcycles();
        minTimeSerial = std::min(minTimeSerial, t);
    }

    // Report more results, save another image...
    printf("[aobench serial]:\t\t[%.3f] million cycles (%d x %d image)\n", minTimeSerial,
           width, height);
    printf("\t\t\t\t(%.2fx speedup from ISPC, %.2fx speedup from ISPC + tasks)\n",
           minTimeSerial / minTimeISPC, minTimeSerial / minTimeISPCTasks);
    savePPM("ao-serial.ppm", width, height);

    return 0;
}
272
examples_cuda/aobench/ao.ispc
Normal file
@@ -0,0 +1,272 @@
// -*- mode: c++ -*-
/*
  Copyright (c) 2010-2011, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
   IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
   TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
   PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/*
  Based on Syoyo Fujita's aobench: http://code.google.com/p/aobench
*/

#define NAO_SAMPLES 8
#define M_PI 3.1415926535f

typedef float<3> vec;

struct Isect {
    float t;
    vec p;
    vec n;
    int hit;
};

struct Sphere {
    vec center;
    float radius;
};

struct Plane {
    vec p;
    vec n;
};

struct Ray {
    vec org;
    vec dir;
};

static inline float dot(vec a, vec b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static inline vec vcross(vec v0, vec v1) {
    vec ret;
    ret.x = v0.y * v1.z - v0.z * v1.y;
    ret.y = v0.z * v1.x - v0.x * v1.z;
    ret.z = v0.x * v1.y - v0.y * v1.x;
    return ret;
}

static inline void vnormalize(vec &v) {
    float len2 = dot(v, v);
    float invlen = rsqrt(len2);
    v *= invlen;
}


static void
ray_plane_intersect(Isect &isect, Ray &ray, uniform Plane &plane) {
    float d = -dot(plane.p, plane.n);
    float v = dot(ray.dir, plane.n);

    cif (abs(v) < 1.0e-17)
        return;
    else {
        float t = -(dot(ray.org, plane.n) + d) / v;

        cif ((t > 0.0) && (t < isect.t)) {
            isect.t = t;
            isect.hit = 1;
            isect.p = ray.org + ray.dir * t;
            isect.n = plane.n;
        }
    }
}


static inline void
ray_sphere_intersect(Isect &isect, Ray &ray, uniform Sphere &sphere) {
    vec rs = ray.org - sphere.center;

    float B = dot(rs, ray.dir);
    float C = dot(rs, rs) - sphere.radius * sphere.radius;
    float D = B * B - C;

    cif (D > 0.) {
        float t = -B - sqrt(D);

        cif ((t > 0.0) && (t < isect.t)) {
            isect.t = t;
            isect.hit = 1;
            isect.p = ray.org + t * ray.dir;
            isect.n = isect.p - sphere.center;
            vnormalize(isect.n);
        }
    }
}


static void
orthoBasis(vec basis[3], vec n) {
    basis[2] = n;
    basis[1].x = 0.0; basis[1].y = 0.0; basis[1].z = 0.0;

    if ((n.x < 0.6) && (n.x > -0.6)) {
        basis[1].x = 1.0;
    } else if ((n.y < 0.6) && (n.y > -0.6)) {
        basis[1].y = 1.0;
    } else if ((n.z < 0.6) && (n.z > -0.6)) {
        basis[1].z = 1.0;
    } else {
        basis[1].x = 1.0;
    }

    basis[0] = vcross(basis[1], basis[2]);
    vnormalize(basis[0]);

    basis[1] = vcross(basis[2], basis[0]);
    vnormalize(basis[1]);
}


static float
ambient_occlusion(Isect &isect, uniform Plane &plane, uniform Sphere spheres[3],
                  RNGState &rngstate) {
    float eps = 0.0001f;
    vec p, n;
    vec basis[3];
    float occlusion = 0.0;

    p = isect.p + eps * isect.n;

    orthoBasis(basis, isect.n);

    static const uniform int ntheta = NAO_SAMPLES;
    static const uniform int nphi = NAO_SAMPLES;
    for (uniform int j = 0; j < ntheta; j++) {
        for (uniform int i = 0; i < nphi; i++) {
            Ray ray;
            Isect occIsect;

            float theta = sqrt(frandom(&rngstate));
            float phi = 2.0f * M_PI * frandom(&rngstate);
            float x = cos(phi) * theta;
            float y = sin(phi) * theta;
            float z = sqrt(1.0 - theta * theta);

            // local . global
            float rx = x * basis[0].x + y * basis[1].x + z * basis[2].x;
            float ry = x * basis[0].y + y * basis[1].y + z * basis[2].y;
            float rz = x * basis[0].z + y * basis[1].z + z * basis[2].z;

            ray.org = p;
            ray.dir.x = rx;
            ray.dir.y = ry;
            ray.dir.z = rz;

            occIsect.t = 1.0e+17;
            occIsect.hit = 0;

            for (uniform int snum = 0; snum < 3; ++snum)
                ray_sphere_intersect(occIsect, ray, spheres[snum]);
            ray_plane_intersect (occIsect, ray, plane);

            if (occIsect.hit) occlusion += 1.0;
        }
    }

    occlusion = (ntheta * nphi - occlusion) / (float)(ntheta * nphi);
    return occlusion;
}


/* Compute the image for the scanlines from [y0,y1), for an overall image
   of width w and height h.
 */
static void ao_scanlines(uniform int y0, uniform int y1, uniform int w,
                         uniform int h, uniform int nsubsamples,
                         uniform float image[]) {
    static uniform Plane plane = { { 0.0f, -0.5f, 0.0f }, { 0.f, 1.f, 0.f } };
    static uniform Sphere spheres[3] = {
        { { -2.0f, 0.0f, -3.5f }, 0.5f },
        { { -0.5f, 0.0f, -3.0f }, 0.5f },
        { { 1.0f, 0.0f, -2.2f }, 0.5f } };
    RNGState rngstate;

    seed_rng(&rngstate, programIndex + (y0 << (programIndex & 15)));
    float invSamples = 1.f / nsubsamples;

    foreach_tiled(y = y0 ... y1, x = 0 ... w,
                  u = 0 ... nsubsamples, v = 0 ... nsubsamples) {
        float du = (float)u * invSamples, dv = (float)v * invSamples;

        // Figure out x,y pixel in NDC
        float px = (x + du - (w / 2.0f)) / (w / 2.0f);
        float py = -(y + dv - (h / 2.0f)) / (h / 2.0f);
        float ret = 0.f;
        Ray ray;
        Isect isect;

        ray.org = 0.f;

        // Poor man's perspective projection
        ray.dir.x = px;
        ray.dir.y = py;
        ray.dir.z = -1.0;
        vnormalize(ray.dir);

        isect.t = 1.0e+17;
        isect.hit = 0;

        for (uniform int snum = 0; snum < 3; ++snum)
            ray_sphere_intersect(isect, ray, spheres[snum]);
        ray_plane_intersect(isect, ray, plane);

        // Note use of 'coherent' if statement; the set of rays we
        // trace will often all hit or all miss the scene
        cif (isect.hit) {
            ret = ambient_occlusion(isect, plane, spheres, rngstate);
            ret *= invSamples * invSamples;

            int offset = 3 * (y * w + x);
            atomic_add_local(&image[offset], ret);
            atomic_add_local(&image[offset+1], ret);
            atomic_add_local(&image[offset+2], ret);
        }
    }
}


export void ao_ispc(uniform int w, uniform int h, uniform int nsubsamples,
                    uniform float image[]) {
    ao_scanlines(0, h, w, h, nsubsamples, image);
}


static void task ao_task(uniform int width, uniform int height,
                         uniform int nsubsamples, uniform float image[]) {
    ao_scanlines(taskIndex, taskIndex+1, width, height, nsubsamples, image);
}


export void ao_ispc_tasks(uniform int w, uniform int h, uniform int nsubsamples,
                          uniform float image[]) {
    launch[h] ao_task(w, h, nsubsamples, image);
}
314
examples_cuda/aobench/ao_serial.cpp
Normal file
@@ -0,0 +1,314 @@
// -*- mode: c++ -*-
/*
  Copyright (c) 2010-2011, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
   IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
   TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
   PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
/*
  Based on Syoyo Fujita's aobench: http://code.google.com/p/aobench
*/

#ifdef _MSC_VER
#define _CRT_SECURE_NO_WARNINGS
#define NOMINMAX
#pragma warning (disable: 4244)
#pragma warning (disable: 4305)
#endif

#include <stdlib.h>
#include <math.h>

#ifdef _MSC_VER
static long long drand48_x = 0x1234ABCD330E;

static inline void srand48(int x) {
    drand48_x = x ^ (x << 16);
}

static inline double drand48() {
    drand48_x = drand48_x * 0x5DEECE66D + 0xB;
    return (drand48_x & 0xFFFFFFFFFFFF) * (1.0 / 281474976710656.0);
}
#endif // _MSC_VER

#ifdef _MSC_VER
__declspec(align(16))
#endif
struct vec {
    vec() { x=y=z=pad=0.; }
    vec(float xx, float yy, float zz) { x = xx; y = yy; z = zz; }

    vec operator*(float f) const { return vec(x*f, y*f, z*f); }
    vec operator+(const vec &f2) const {
        return vec(x+f2.x, y+f2.y, z+f2.z);
    }
    vec operator-(const vec &f2) const {
        return vec(x-f2.x, y-f2.y, z-f2.z);
    }
    vec operator*(const vec &f2) const {
        return vec(x*f2.x, y*f2.y, z*f2.z);
    }
    float x, y, z;
    float pad;
}
#ifndef _MSC_VER
__attribute__ ((aligned(16)))
#endif
;
inline vec operator*(float f, const vec &v) { return vec(f*v.x, f*v.y, f*v.z); }


#define NAO_SAMPLES 8

#ifdef M_PI
#undef M_PI
#endif
#define M_PI 3.1415926535f

struct Isect {
    float t;
    vec p;
    vec n;
    int hit;
};

struct Sphere {
    vec center;
    float radius;

};

struct Plane {
    vec p;
    vec n;
};

struct Ray {
    vec org;
    vec dir;
};

static inline float dot(const vec &a, const vec &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static inline vec vcross(const vec &v0, const vec &v1) {
    vec ret;
    ret.x = v0.y * v1.z - v0.z * v1.y;
    ret.y = v0.z * v1.x - v0.x * v1.z;
    ret.z = v0.x * v1.y - v0.y * v1.x;
    return ret;
}

static inline void vnormalize(vec &v) {
    float len2 = dot(v, v);
    float invlen = 1.f / sqrtf(len2);
    v = v * invlen;
}


static inline void
ray_plane_intersect(Isect &isect, Ray &ray,
                    Plane &plane) {
    float d = -dot(plane.p, plane.n);
    float v = dot(ray.dir, plane.n);

    if (fabsf(v) < 1.0e-17f)
        return;
    else {
        float t = -(dot(ray.org, plane.n) + d) / v;

        if ((t > 0.0) && (t < isect.t)) {
            isect.t = t;
            isect.hit = 1;
            isect.p = ray.org + ray.dir * t;
            isect.n = plane.n;
        }
    }
}


static inline void
ray_sphere_intersect(Isect &isect, Ray &ray,
                     Sphere &sphere) {
    vec rs = ray.org - sphere.center;

    float B = dot(rs, ray.dir);
    float C = dot(rs, rs) - sphere.radius * sphere.radius;
    float D = B * B - C;

    if (D > 0.) {
        float t = -B - sqrtf(D);

        if ((t > 0.0) && (t < isect.t)) {
            isect.t = t;
            isect.hit = 1;
            isect.p = ray.org + t * ray.dir;
            isect.n = isect.p - sphere.center;
            vnormalize(isect.n);
        }
    }
}


static inline void
orthoBasis(vec basis[3], const vec &n) {
    basis[2] = n;
    basis[1].x = 0.0; basis[1].y = 0.0; basis[1].z = 0.0;

    if ((n.x < 0.6f) && (n.x > -0.6f)) {
        basis[1].x = 1.0;
    } else if ((n.y < 0.6f) && (n.y > -0.6f)) {
        basis[1].y = 1.0;
    } else if ((n.z < 0.6f) && (n.z > -0.6f)) {
        basis[1].z = 1.0;
    } else {
        basis[1].x = 1.0;
    }

    basis[0] = vcross(basis[1], basis[2]);
    vnormalize(basis[0]);

    basis[1] = vcross(basis[2], basis[0]);
    vnormalize(basis[1]);
}
|
||||||
|
|
||||||
|
|
||||||
|
static float
|
||||||
|
ambient_occlusion(Isect &isect, Plane &plane,
|
||||||
|
Sphere spheres[3]) {
|
||||||
|
float eps = 0.0001f;
|
||||||
|
vec p, n;
|
||||||
|
vec basis[3];
|
||||||
|
float occlusion = 0.0;
|
||||||
|
|
||||||
|
p = isect.p + eps * isect.n;
|
||||||
|
|
||||||
|
orthoBasis(basis, isect.n);
|
||||||
|
|
||||||
|
static const int ntheta = NAO_SAMPLES;
|
||||||
|
static const int nphi = NAO_SAMPLES;
|
||||||
|
for (int j = 0; j < ntheta; j++) {
|
||||||
|
for (int i = 0; i < nphi; i++) {
|
||||||
|
Ray ray;
|
||||||
|
Isect occIsect;
|
||||||
|
|
||||||
|
float theta = sqrtf(drand48());
|
||||||
|
float phi = 2.0f * M_PI * drand48();
|
||||||
|
float x = cosf(phi) * theta;
|
||||||
|
float y = sinf(phi) * theta;
|
||||||
|
float z = sqrtf(1.0f - theta * theta);
|
||||||
|
|
||||||
|
// local -> global (express the sample direction in world coordinates)
|
||||||
|
float rx = x * basis[0].x + y * basis[1].x + z * basis[2].x;
|
||||||
|
float ry = x * basis[0].y + y * basis[1].y + z * basis[2].y;
|
||||||
|
float rz = x * basis[0].z + y * basis[1].z + z * basis[2].z;
|
||||||
|
|
||||||
|
ray.org = p;
|
||||||
|
ray.dir.x = rx;
|
||||||
|
ray.dir.y = ry;
|
||||||
|
ray.dir.z = rz;
|
||||||
|
|
||||||
|
occIsect.t = 1.0e+17f;
|
||||||
|
occIsect.hit = 0;
|
||||||
|
|
||||||
|
for (int snum = 0; snum < 3; ++snum)
|
||||||
|
ray_sphere_intersect(occIsect, ray, spheres[snum]);
|
||||||
|
ray_plane_intersect (occIsect, ray, plane);
|
||||||
|
|
||||||
|
if (occIsect.hit) occlusion += 1.f;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
occlusion = (ntheta * nphi - occlusion) / (float)(ntheta * nphi);
|
||||||
|
return occlusion;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/* Compute the image for the scanlines from [y0,y1), for an overall image
|
||||||
|
of width w and height h.
|
||||||
|
*/
|
||||||
|
static void ao_scanlines(int y0, int y1, int w, int h, int nsubsamples,
|
||||||
|
float image[]) {
|
||||||
|
static Plane plane = { vec(0.0f, -0.5f, 0.0f), vec(0.f, 1.f, 0.f) };
|
||||||
|
static Sphere spheres[3] = {
|
||||||
|
{ vec(-2.0f, 0.0f, -3.5f), 0.5f },
|
||||||
|
{ vec(-0.5f, 0.0f, -3.0f), 0.5f },
|
||||||
|
{ vec(1.0f, 0.0f, -2.2f), 0.5f } };
|
||||||
|
|
||||||
|
srand48(y0);
|
||||||
|
|
||||||
|
for (int y = y0; y < y1; ++y) {
|
||||||
|
for (int x = 0; x < w; ++x) {
|
||||||
|
int offset = 3 * (y * w + x);
|
||||||
|
for (int u = 0; u < nsubsamples; ++u) {
|
||||||
|
for (int v = 0; v < nsubsamples; ++v) {
|
||||||
|
float px = (x + (u / (float)nsubsamples) - (w / 2.0f)) / (w / 2.0f);
|
||||||
|
float py = -(y + (v / (float)nsubsamples) - (h / 2.0f)) / (h / 2.0f);
|
||||||
|
float ret = 0.f;
|
||||||
|
Ray ray;
|
||||||
|
Isect isect;
|
||||||
|
|
||||||
|
ray.org = vec(0.f, 0.f, 0.f);
|
||||||
|
|
||||||
|
ray.dir.x = px;
|
||||||
|
ray.dir.y = py;
|
||||||
|
ray.dir.z = -1.0f;
|
||||||
|
vnormalize(ray.dir);
|
||||||
|
|
||||||
|
isect.t = 1.0e+17f;
|
||||||
|
isect.hit = 0;
|
||||||
|
|
||||||
|
for (int snum = 0; snum < 3; ++snum)
|
||||||
|
ray_sphere_intersect(isect, ray, spheres[snum]);
|
||||||
|
ray_plane_intersect(isect, ray, plane);
|
||||||
|
|
||||||
|
if (isect.hit)
|
||||||
|
ret = ambient_occlusion(isect, plane, spheres);
|
||||||
|
|
||||||
|
// Update image for AO for this ray
|
||||||
|
image[offset+0] += ret;
|
||||||
|
image[offset+1] += ret;
|
||||||
|
image[offset+2] += ret;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
// Normalize image pixels by number of samples taken per pixel
|
||||||
|
image[offset+0] /= nsubsamples * nsubsamples;
|
||||||
|
image[offset+1] /= nsubsamples * nsubsamples;
|
||||||
|
image[offset+2] /= nsubsamples * nsubsamples;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
void ao_serial(int w, int h, int nsubsamples,
|
||||||
|
float image[]) {
|
||||||
|
ao_scanlines(0, h, w, h, nsubsamples, image);
|
||||||
|
}
|
||||||
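The quadratic used by ray_sphere_intersect() in the serial file above can be checked by hand. The following is a minimal sketch, assuming a unit-length ray direction as the example does; Vec3, dot3, and sphere_hit_distance are hypothetical names introduced for illustration and are not part of this commit.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the example's vec type (not the original definition).
struct Vec3 { float x, y, z; };

static float dot3(const Vec3 &a, const Vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Distance to the nearest intersection along a unit-length direction.
// A result <= 0 means there is no hit in front of the ray origin.
static float sphere_hit_distance(const Vec3 &org, const Vec3 &dir,
                                 const Vec3 &center, float radius) {
    assert(radius > 0.0f);
    Vec3 rs = { org.x - center.x, org.y - center.y, org.z - center.z };
    float B = dot3(rs, dir);                   // half of the linear coefficient
    float C = dot3(rs, rs) - radius * radius;  // constant term
    float D = B * B - C;                       // reduced discriminant
    if (D <= 0.0f)
        return -1.0f;                          // ray misses (or only grazes) the sphere
    return -B - std::sqrt(D);                  // nearer of the two roots
}
```

For a ray from the origin along -z and a sphere of radius 0.5 centered at (0, 0, -3), this gives B = -3, C = 8.75, D = 0.25, and a hit distance of 2.5, which is the front surface of the sphere.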
180
examples_cuda/aobench/aobench.vcxproj
Normal file
@@ -0,0 +1,180 @@
|
|||||||
|
<?xml version="1.0" encoding="utf-8"?>
|
||||||
|
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
|
||||||
|
<ItemGroup Label="ProjectConfigurations">
|
||||||
|
<ProjectConfiguration Include="Debug|Win32">
|
||||||
|
<Configuration>Debug</Configuration>
|
||||||
|
<Platform>Win32</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Debug|x64">
|
||||||
|
<Configuration>Debug</Configuration>
|
||||||
|
<Platform>x64</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Release|Win32">
|
||||||
|
<Configuration>Release</Configuration>
|
||||||
|
<Platform>Win32</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Release|x64">
|
||||||
|
<Configuration>Release</Configuration>
|
||||||
|
<Platform>x64</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
</ItemGroup>
|
||||||
|
<ItemGroup>
|
||||||
|
<ClCompile Include="ao.cpp" />
|
||||||
|
<ClCompile Include="ao_serial.cpp" />
|
||||||
|
<ClCompile Include="../tasksys.cpp" />
|
||||||
|
</ItemGroup>
|
||||||
|
<ItemGroup>
|
||||||
|
<CustomBuild Include="ao.ispc">
|
||||||
|
<FileType>Document</FileType>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4,avx
|
||||||
|
</Command>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4,avx
|
||||||
|
</Command>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4,avx
|
||||||
|
</Command>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4,avx
|
||||||
|
</Command>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
</CustomBuild>
|
||||||
|
</ItemGroup>
|
||||||
|
<PropertyGroup Label="Globals">
|
||||||
|
<ProjectGuid>{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}</ProjectGuid>
|
||||||
|
<Keyword>Win32Proj</Keyword>
|
||||||
|
<RootNamespace>aobench</RootNamespace>
|
||||||
|
</PropertyGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>true</UseDebugLibraries>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>true</UseDebugLibraries>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>false</UseDebugLibraries>
|
||||||
|
<WholeProgramOptimization>true</WholeProgramOptimization>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>false</UseDebugLibraries>
|
||||||
|
<WholeProgramOptimization>true</WholeProgramOptimization>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
|
||||||
|
<ImportGroup Label="ExtensionSettings">
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<PropertyGroup Label="UserMacros" />
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<LinkIncremental>true</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<TargetName>ao</TargetName>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
|
||||||
|
<LinkIncremental>true</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ExecutablePath);$(ProjectDir)..\..</ExecutablePath>
|
||||||
|
<TargetName>ao</TargetName>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<LinkIncremental>false</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<TargetName>ao</TargetName>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
|
||||||
|
<LinkIncremental>false</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<TargetName>ao</TargetName>
|
||||||
|
</PropertyGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<ClCompile>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<Optimization>Disabled</Optimization>
|
||||||
|
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
|
||||||
|
<ClCompile>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<Optimization>Disabled</Optimization>
|
||||||
|
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<ClCompile>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<Optimization>MaxSpeed</Optimization>
|
||||||
|
<FunctionLevelLinking>true</FunctionLevelLinking>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
<EnableCOMDATFolding>true</EnableCOMDATFolding>
|
||||||
|
<OptimizeReferences>true</OptimizeReferences>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
|
||||||
|
<ClCompile>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<Optimization>MaxSpeed</Optimization>
|
||||||
|
<FunctionLevelLinking>true</FunctionLevelLinking>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
<EnableCOMDATFolding>true</EnableCOMDATFolding>
|
||||||
|
<OptimizeReferences>true</OptimizeReferences>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
|
||||||
|
<ImportGroup Label="ExtensionTargets">
|
||||||
|
</ImportGroup>
|
||||||
|
</Project>
|
||||||
2
examples_cuda/aobench_instrumented/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
|
|||||||
|
ao
|
||||||
|
*.ppm
|
||||||
26
examples_cuda/aobench_instrumented/Makefile
Normal file
@@ -0,0 +1,26 @@
|
|||||||
|
|
||||||
|
CXX=clang++ -m64
|
||||||
|
CXXFLAGS=-Iobjs/ -g3 -Wall
|
||||||
|
ISPC=ispc
|
||||||
|
ISPCFLAGS=-O2 --instrument --arch=x86-64 --target=sse2
|
||||||
|
|
||||||
|
default: ao
|
||||||
|
|
||||||
|
.PHONY: dirs clean
|
||||||
|
|
||||||
|
dirs:
|
||||||
|
/bin/mkdir -p objs/
|
||||||
|
|
||||||
|
clean:
|
||||||
|
/bin/rm -rf objs *~ ao
|
||||||
|
|
||||||
|
ao: objs/ao.o objs/instrument.o objs/ao_ispc.o ../tasksys.cpp
|
||||||
|
$(CXX) $(CXXFLAGS) -o $@ $^ -lm -lpthread
|
||||||
|
|
||||||
|
objs/%.o: %.cpp dirs
|
||||||
|
$(CXX) $< $(CXXFLAGS) -c -o $@
|
||||||
|
|
||||||
|
objs/ao.o: objs/ao_instrumented_ispc.h
|
||||||
|
|
||||||
|
objs/%_instrumented_ispc.h objs/%_ispc.o: %.ispc dirs
|
||||||
|
$(ISPC) $(ISPCFLAGS) $< -o objs/$*_ispc.o -h objs/$*_instrumented_ispc.h
|
||||||
131
examples_cuda/aobench_instrumented/ao.cpp
Normal file
@@ -0,0 +1,131 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2010-2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifdef _MSC_VER
|
||||||
|
#define NOMINMAX
|
||||||
|
#pragma warning (disable: 4244)
|
||||||
|
#pragma warning (disable: 4305)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <assert.h>
|
||||||
|
#ifdef __linux__
|
||||||
|
#include <malloc.h>
|
||||||
|
#endif
|
||||||
|
#include <math.h>
|
||||||
|
#include <map>
|
||||||
|
#include <string>
|
||||||
|
#include <algorithm>
|
||||||
|
#include <sys/types.h>
|
||||||
|
|
||||||
|
#include "ao_instrumented_ispc.h"
|
||||||
|
using namespace ispc;
|
||||||
|
|
||||||
|
#include "instrument.h"
|
||||||
|
#include "../timing.h"
|
||||||
|
|
||||||
|
#define NSUBSAMPLES 2
|
||||||
|
|
||||||
|
static unsigned int test_iterations;
|
||||||
|
static unsigned int width, height;
|
||||||
|
static unsigned char *img;
|
||||||
|
static float *fimg;
|
||||||
|
|
||||||
|
|
||||||
|
static unsigned char
|
||||||
|
clamp(float f)
|
||||||
|
{
|
||||||
|
int i = (int)(f * 255.5);
|
||||||
|
|
||||||
|
if (i < 0) i = 0;
|
||||||
|
if (i > 255) i = 255;
|
||||||
|
|
||||||
|
return (unsigned char)i;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static void
|
||||||
|
savePPM(const char *fname, int w, int h)
|
||||||
|
{
|
||||||
|
for (int y = 0; y < h; y++) {
|
||||||
|
for (int x = 0; x < w; x++) {
|
||||||
|
img[3 * (y * w + x) + 0] = clamp(fimg[3 *(y * w + x) + 0]);
|
||||||
|
img[3 * (y * w + x) + 1] = clamp(fimg[3 *(y * w + x) + 1]);
|
||||||
|
img[3 * (y * w + x) + 2] = clamp(fimg[3 *(y * w + x) + 2]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
FILE *fp = fopen(fname, "wb");
|
||||||
|
if (!fp) {
|
||||||
|
perror(fname);
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
fprintf(fp, "P6\n");
|
||||||
|
fprintf(fp, "%d %d\n", w, h);
|
||||||
|
fprintf(fp, "255\n");
|
||||||
|
fwrite(img, w * h * 3, 1, fp);
|
||||||
|
fclose(fp);
|
||||||
|
printf("Wrote image file %s\n", fname);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
int main(int argc, char **argv)
|
||||||
|
{
|
||||||
|
if (argc != 4) {
|
||||||
|
printf ("%s\n", argv[0]);
|
||||||
|
printf ("Usage: ao [num test iterations] [width] [height]\n");
|
||||||
|
getchar();
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
test_iterations = atoi(argv[1]);
|
||||||
|
width = atoi (argv[2]);
|
||||||
|
height = atoi (argv[3]);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Allocate space for output images
|
||||||
|
img = new unsigned char[width * height * 3];
|
||||||
|
fimg = new float[width * height * 3];
|
||||||
|
|
||||||
|
ao_ispc(width, height, NSUBSAMPLES, fimg);
|
||||||
|
|
||||||
|
savePPM("ao-ispc.ppm", width, height);
|
||||||
|
|
||||||
|
ISPCPrintInstrument();
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
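The float-to-byte conversion that savePPM() applies per channel can be tried in isolation. This is a small sketch under the assumption that pixel values are nominally in [0, 1]; to_byte and to_rgb8 are hypothetical names that mirror clamp() and the conversion loop in ao.cpp, and are not part of this commit.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors clamp() in ao.cpp: scale by 255.5, truncate, then saturate to [0, 255].
static unsigned char to_byte(float f) {
    int i = static_cast<int>(f * 255.5);  // conversion truncates toward zero
    if (i < 0) i = 0;
    if (i > 255) i = 255;
    return static_cast<unsigned char>(i);
}

// Converts n float channel values into n bytes, as savePPM() does before fwrite().
static void to_rgb8(const float *src, unsigned char *dst, std::size_t n) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = to_byte(src[i]);
}
```

With this scaling, 0.5 maps to 127 and 1.0 maps to 255; values outside [0, 1] saturate rather than wrap.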
333
examples_cuda/aobench_instrumented/ao.ispc
Normal file
@@ -0,0 +1,333 @@
|
|||||||
|
// -*- mode: c++ -*-
|
||||||
|
/*
|
||||||
|
Copyright (c) 2010-2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
/*
|
||||||
|
Based on Syoyo Fujita's aobench: http://code.google.com/p/aobench
|
||||||
|
*/
|
||||||
|
|
||||||
|
#define NAO_SAMPLES 8
|
||||||
|
#define M_PI 3.1415926535f
|
||||||
|
|
||||||
|
typedef float<3> vec;
|
||||||
|
|
||||||
|
struct Isect {
|
||||||
|
float t;
|
||||||
|
vec p;
|
||||||
|
vec n;
|
||||||
|
int hit;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct Sphere {
|
||||||
|
vec center;
|
||||||
|
float radius;
|
||||||
|
|
||||||
|
};
|
||||||
|
|
||||||
|
struct Plane {
|
||||||
|
vec p;
|
||||||
|
vec n;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct Ray {
|
||||||
|
vec org;
|
||||||
|
vec dir;
|
||||||
|
};
|
||||||
|
|
||||||
|
static inline float dot(vec a, vec b) {
|
||||||
|
return a.x * b.x + a.y * b.y + a.z * b.z;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline vec vcross(vec v0, vec v1) {
|
||||||
|
vec ret;
|
||||||
|
ret.x = v0.y * v1.z - v0.z * v1.y;
|
||||||
|
ret.y = v0.z * v1.x - v0.x * v1.z;
|
||||||
|
ret.z = v0.x * v1.y - v0.y * v1.x;
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
static inline void vnormalize(vec &v) {
|
||||||
|
float len2 = dot(v, v);
|
||||||
|
float invlen = rsqrt(len2);
|
||||||
|
v *= invlen;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static inline void
|
||||||
|
ray_plane_intersect(Isect &isect, Ray &ray, Plane &plane) {
|
||||||
|
float d = -dot(plane.p, plane.n);
|
||||||
|
float v = dot(ray.dir, plane.n);
|
||||||
|
|
||||||
|
cif (abs(v) < 1.0e-17)
|
||||||
|
return;
|
||||||
|
else {
|
||||||
|
float t = -(dot(ray.org, plane.n) + d) / v;
|
||||||
|
|
||||||
|
cif ((t > 0.0) && (t < isect.t)) {
|
||||||
|
isect.t = t;
|
||||||
|
isect.hit = 1;
|
||||||
|
isect.p = ray.org + ray.dir * t;
|
||||||
|
isect.n = plane.n;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static inline void
|
||||||
|
ray_sphere_intersect(Isect &isect, Ray &ray, Sphere &sphere) {
|
||||||
|
vec rs = ray.org - sphere.center;
|
||||||
|
|
||||||
|
float B = dot(rs, ray.dir);
|
||||||
|
float C = dot(rs, rs) - sphere.radius * sphere.radius;
|
||||||
|
float D = B * B - C;
|
||||||
|
|
||||||
|
cif (D > 0.) {
|
||||||
|
float t = -B - sqrt(D);
|
||||||
|
|
||||||
|
cif ((t > 0.0) && (t < isect.t)) {
|
||||||
|
isect.t = t;
|
||||||
|
isect.hit = 1;
|
||||||
|
isect.p = ray.org + t * ray.dir;
|
||||||
|
isect.n = isect.p - sphere.center;
|
||||||
|
vnormalize(isect.n);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static inline void
|
||||||
|
orthoBasis(vec basis[3], vec n) {
|
||||||
|
basis[2] = n;
|
||||||
|
basis[1].x = 0.0; basis[1].y = 0.0; basis[1].z = 0.0;
|
||||||
|
|
||||||
|
if ((n.x < 0.6) && (n.x > -0.6)) {
|
||||||
|
basis[1].x = 1.0;
|
||||||
|
} else if ((n.y < 0.6) && (n.y > -0.6)) {
|
||||||
|
basis[1].y = 1.0;
|
||||||
|
} else if ((n.z < 0.6) && (n.z > -0.6)) {
|
||||||
|
basis[1].z = 1.0;
|
||||||
|
} else {
|
||||||
|
basis[1].x = 1.0;
|
||||||
|
}
|
||||||
|
|
||||||
|
basis[0] = vcross(basis[1], basis[2]);
|
||||||
|
vnormalize(basis[0]);
|
||||||
|
|
||||||
|
basis[1] = vcross(basis[2], basis[0]);
|
||||||
|
vnormalize(basis[1]);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static inline float
|
||||||
|
ambient_occlusion(Isect &isect, Plane &plane, Sphere spheres[3],
|
||||||
|
RNGState &rngstate) {
|
||||||
|
float eps = 0.0001f;
|
||||||
|
vec p, n;
|
||||||
|
vec basis[3];
|
||||||
|
float occlusion = 0.0;
|
||||||
|
|
||||||
|
p = isect.p + eps * isect.n;
|
||||||
|
|
||||||
|
orthoBasis(basis, isect.n);
|
||||||
|
|
||||||
|
static const uniform int ntheta = NAO_SAMPLES;
|
||||||
|
static const uniform int nphi = NAO_SAMPLES;
|
||||||
|
for (uniform int j = 0; j < ntheta; j++) {
|
||||||
|
for (uniform int i = 0; i < nphi; i++) {
|
||||||
|
Ray ray;
|
||||||
|
Isect occIsect;
|
||||||
|
|
||||||
|
float theta = sqrt(frandom(&rngstate));
|
||||||
|
float phi = 2.0f * M_PI * frandom(&rngstate);
|
||||||
|
float x = cos(phi) * theta;
|
||||||
|
float y = sin(phi) * theta;
|
||||||
|
float z = sqrt(1.0 - theta * theta);
|
||||||
|
|
||||||
|
// local -> global (express the sample direction in world coordinates)
|
||||||
|
float rx = x * basis[0].x + y * basis[1].x + z * basis[2].x;
|
||||||
|
float ry = x * basis[0].y + y * basis[1].y + z * basis[2].y;
|
||||||
|
float rz = x * basis[0].z + y * basis[1].z + z * basis[2].z;
|
||||||
|
|
||||||
|
ray.org = p;
|
||||||
|
ray.dir.x = rx;
|
||||||
|
ray.dir.y = ry;
|
||||||
|
ray.dir.z = rz;
|
||||||
|
|
||||||
|
occIsect.t = 1.0e+17;
|
||||||
|
occIsect.hit = 0;
|
||||||
|
|
||||||
|
for (uniform int snum = 0; snum < 3; ++snum)
|
||||||
|
ray_sphere_intersect(occIsect, ray, spheres[snum]);
|
||||||
|
ray_plane_intersect (occIsect, ray, plane);
|
||||||
|
|
||||||
|
if (occIsect.hit) occlusion += 1.0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
occlusion = (ntheta * nphi - occlusion) / (float)(ntheta * nphi);
|
||||||
|
return occlusion;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/* Compute the image for the scanlines from [y0,y1), for an overall image
|
||||||
|
of width w and height h.
|
||||||
|
*/
|
||||||
|
static void ao_scanlines(uniform int y0, uniform int y1, uniform int w,
|
||||||
|
uniform int h, uniform int nsubsamples,
|
||||||
|
uniform float image[]) {
|
||||||
|
static Plane plane = { { 0.0f, -0.5f, 0.0f }, { 0.f, 1.f, 0.f } };
|
||||||
|
static Sphere spheres[3] = {
|
||||||
|
{ { -2.0f, 0.0f, -3.5f }, 0.5f },
|
||||||
|
{ { -0.5f, 0.0f, -3.0f }, 0.5f },
|
||||||
|
{ { 1.0f, 0.0f, -2.2f }, 0.5f } };
|
||||||
|
RNGState rngstate;
|
||||||
|
|
||||||
|
seed_rng(&rngstate, programIndex + (y0 << (programIndex & 15)));
|
||||||
|
|
||||||
|
// Compute the mapping between the 'programCount'-wide program
|
||||||
|
// instances running in parallel and samples in the image.
|
||||||
|
//
|
||||||
|
// For now, we'll always take four samples per pixel, so start by
|
||||||
|
// initializing du and dv with offsets into subpixel samples. We'll
|
||||||
|
// take care of further updating du and dv for the case where we're
|
||||||
|
// doing more than 4 program instances in parallel shortly.
|
||||||
|
uniform float uSteps[4] = { 0, 1, 0, 1 };
|
||||||
|
uniform float vSteps[4] = { 0, 0, 1, 1 };
|
||||||
|
float du = uSteps[programIndex % 4] / nsubsamples;
|
||||||
|
float dv = vSteps[programIndex % 4] / nsubsamples;
|
||||||
|
|
||||||
|
// Now handle the case where we are able to do more than one pixel's
|
||||||
|
// worth of work at once. nx records the number of pixels in the x
|
||||||
|
// direction we do per iteration and ny the number in y.
|
||||||
|
uniform int nx = 1, ny = 1;
|
||||||
|
|
||||||
|
// FIXME: We actually need ny to be 1 regardless of the decomposition,
|
||||||
|
// since the task decomposition is one scanline high.
|
||||||
|
|
||||||
|
if (programCount == 8) {
|
||||||
|
// Do two pixels at once in the x direction
|
||||||
|
nx = 2;
|
||||||
|
if (programIndex >= 4)
|
||||||
|
// And shift the offsets for the second pixel's worth of work
|
||||||
|
++du;
|
||||||
|
}
|
||||||
|
else if (programCount == 16) {
|
||||||
|
nx = 4;
|
||||||
|
ny = 1;
|
||||||
|
if (programIndex >= 4 && programIndex < 8)
|
||||||
|
++du;
|
||||||
|
if (programIndex >= 8 && programIndex < 12)
|
||||||
|
du += 2;
|
||||||
|
if (programIndex >= 12)
|
||||||
|
du += 3;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Now loop over all of the pixels, stepping in x and y as calculated
|
||||||
|
// above. (Assumes that ny divides y1 - y0 and nx divides w.)
|
||||||
|
for (uniform int y = y0; y < y1; y += ny) {
|
||||||
|
for (uniform int x = 0; x < w; x += nx) {
|
||||||
|
// Figure out x,y pixel in NDC
|
||||||
|
float px = (x + du - (w / 2.0f)) / (w / 2.0f);
|
||||||
|
float py = -(y + dv - (h / 2.0f)) / (h / 2.0f);
|
||||||
|
float ret = 0.f;
|
||||||
|
Ray ray;
|
||||||
|
Isect isect;
|
||||||
|
|
||||||
|
ray.org = 0.f;
|
||||||
|
|
||||||
|
// Poor man's perspective projection
|
||||||
|
ray.dir.x = px;
|
||||||
|
ray.dir.y = py;
|
||||||
|
ray.dir.z = -1.0;
|
||||||
|
vnormalize(ray.dir);
|
||||||
|
|
||||||
|
isect.t = 1.0e+17;
|
||||||
|
isect.hit = 0;
|
||||||
|
|
||||||
|
for (uniform int snum = 0; snum < 3; ++snum)
|
||||||
|
ray_sphere_intersect(isect, ray, spheres[snum]);
|
||||||
|
ray_plane_intersect(isect, ray, plane);
|
||||||
|
|
||||||
|
// Note use of 'coherent' if statement; the set of rays we
|
||||||
|
// trace will often all hit or all miss the scene
|
||||||
|
cif (isect.hit)
|
||||||
|
ret = ambient_occlusion(isect, plane, spheres, rngstate);
|
||||||
|
|
||||||
|
// This is a little grungy; we have results for
|
||||||
|
// programCount-worth of values. Because we're doing 2x2
|
||||||
|
// subsamples, we need to peel them off in groups of four,
|
||||||
|
// average the four values for each pixel, and update the
|
||||||
|
// output image.
|
||||||
|
//
|
||||||
|
// Store the varying value to a uniform array of the same size.
|
||||||
|
// See the discussion about communication among program
|
||||||
|
// instances in the ispc user's manual for more discussion on
|
||||||
|
// this idiom.
|
||||||
|
uniform float retArray[programCount];
|
||||||
|
retArray[programIndex] = ret;
|
||||||
|
|
||||||
|
// offset to the first pixel in the image
|
||||||
|
uniform int offset = 3 * (y * w + x);
|
||||||
|
for (uniform int p = 0; p < programCount; p += 4, offset += 3) {
|
||||||
|
// Get the four sample values for this pixel
|
||||||
|
uniform float sumret = retArray[p] + retArray[p+1] + retArray[p+2] +
|
||||||
|
retArray[p+3];
|
||||||
|
|
||||||
|
// Normalize by number of samples taken
|
||||||
|
sumret /= nsubsamples * nsubsamples;
|
||||||
|
|
||||||
|
// Store result in the image
|
||||||
|
image[offset+0] = sumret;
|
||||||
|
image[offset+1] = sumret;
|
||||||
|
image[offset+2] = sumret;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
export void ao_ispc(uniform int w, uniform int h, uniform int nsubsamples,
|
||||||
|
uniform float image[]) {
|
||||||
|
ao_scanlines(0, h, w, h, nsubsamples, image);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static void task ao_task(uniform int width, uniform int height,
|
||||||
|
uniform int nsubsamples, uniform float image[]) {
|
||||||
|
ao_scanlines(taskIndex, taskIndex+1, width, height, nsubsamples, image);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
export void ao_ispc_tasks(uniform int w, uniform int h, uniform int nsubsamples,
|
||||||
|
uniform float image[]) {
|
||||||
|
launch[h] ao_task(w, h, nsubsamples, image);
|
||||||
|
}
|
||||||
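The "peel off groups of four" step at the end of ao_scanlines can be sketched in plain C++ as below. This is only an illustration under stated assumptions: storeAveragedSamples, laneResults, samplesPerPixel and firstPixel are hypothetical names that do not appear in ao.ispc, and the per-lane results are modeled as an ordinary std::vector rather than a varying value copied into a uniform array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ao.ispc): average each group of
// samplesPerPixel lane results and write the average to all three
// channels of consecutive pixels, starting at pixel index firstPixel.
void storeAveragedSamples(const std::vector<float> &laneResults,
                          int samplesPerPixel, int firstPixel,
                          std::vector<float> &image) {
    assert(samplesPerPixel > 0 && firstPixel >= 0);
    const std::size_t step = static_cast<std::size_t>(samplesPerPixel);
    assert(laneResults.size() % step == 0);
    const std::size_t nPixels = laneResults.size() / step;
    assert(image.size() >= 3 * (static_cast<std::size_t>(firstPixel) + nPixels));

    std::size_t offset = 3 * static_cast<std::size_t>(firstPixel);
    for (std::size_t p = 0; p < laneResults.size(); p += step, offset += 3) {
        // Sum the subsamples that belong to this pixel.
        float sum = 0.f;
        for (std::size_t s = 0; s < step; ++s)
            sum += laneResults[p + s];
        // Normalize by the number of samples taken.
        const float avg = sum / static_cast<float>(samplesPerPixel);
        // Grayscale result: the same value goes to R, G and B.
        image[offset + 0] = avg;
        image[offset + 1] = avg;
        image[offset + 2] = avg;
    }
}
```

With eight lane results and samplesPerPixel = 4, a single call fills two adjacent pixels, which mirrors the programCount == 8 case in the ispc source.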
174
examples_cuda/aobench_instrumented/aobench_instrumented.vcxproj
Normal file
@@ -0,0 +1,174 @@
|
|||||||
|
<?xml version="1.0" encoding="utf-8"?>
|
||||||
|
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
|
||||||
|
<ItemGroup Label="ProjectConfigurations">
|
||||||
|
<ProjectConfiguration Include="Debug|Win32">
|
||||||
|
<Configuration>Debug</Configuration>
|
||||||
|
<Platform>Win32</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Debug|x64">
|
||||||
|
<Configuration>Debug</Configuration>
|
||||||
|
<Platform>x64</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Release|Win32">
|
||||||
|
<Configuration>Release</Configuration>
|
||||||
|
<Platform>Win32</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Release|x64">
|
||||||
|
<Configuration>Release</Configuration>
|
||||||
|
<Platform>x64</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
</ItemGroup>
|
||||||
|
<ItemGroup>
|
||||||
|
<ClCompile Include="ao.cpp" />
|
||||||
|
<ClCompile Include="instrument.cpp" />
|
||||||
|
<ClCompile Include="../tasksys.cpp" />
|
||||||
|
</ItemGroup>
|
||||||
|
<ItemGroup>
|
||||||
|
<CustomBuild Include="ao.ispc">
|
||||||
|
<FileType>Document</FileType>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename)_instrumented.obj -h $(TargetDir)%(Filename)_instrumented_ispc.h --arch=x86 --instrument --target=sse2
|
||||||
|
</Command>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename)_instrumented.obj -h $(TargetDir)%(Filename)_instrumented_ispc.h --instrument --target=sse2
|
||||||
|
</Command>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(TargetDir)%(Filename)_instrumented.obj;$(TargetDir)%(Filename)_instrumented_ispc.h</Outputs>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(TargetDir)%(Filename)_instrumented.obj;$(TargetDir)%(Filename)_instrumented_ispc.h</Outputs>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename)_instrumented.obj -h $(TargetDir)%(Filename)_instrumented_ispc.h --arch=x86 --instrument --target=sse2
|
||||||
|
</Command>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename)_instrumented.obj -h $(TargetDir)%(Filename)_instrumented_ispc.h --instrument --target=sse2
|
||||||
|
</Command>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(TargetDir)%(Filename)_instrumented.obj;$(TargetDir)%(Filename)_instrumented_ispc.h</Outputs>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(TargetDir)%(Filename)_instrumented.obj;$(TargetDir)%(Filename)_instrumented_ispc.h</Outputs>
|
||||||
|
</CustomBuild>
|
||||||
|
</ItemGroup>
|
||||||
|
<PropertyGroup Label="Globals">
|
||||||
|
<ProjectGuid>{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}</ProjectGuid>
|
||||||
|
<Keyword>Win32Proj</Keyword>
|
||||||
|
<RootNamespace>aobench_instrumented</RootNamespace>
|
||||||
|
</PropertyGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>true</UseDebugLibraries>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>true</UseDebugLibraries>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>false</UseDebugLibraries>
|
||||||
|
<WholeProgramOptimization>true</WholeProgramOptimization>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>false</UseDebugLibraries>
|
||||||
|
<WholeProgramOptimization>true</WholeProgramOptimization>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
|
||||||
|
<ImportGroup Label="ExtensionSettings">
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<PropertyGroup Label="UserMacros" />
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<LinkIncremental>true</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<PreBuildEventUseInBuild>true</PreBuildEventUseInBuild>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
|
||||||
|
<LinkIncremental>true</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<PreBuildEventUseInBuild>true</PreBuildEventUseInBuild>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<LinkIncremental>false</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<PreBuildEventUseInBuild>true</PreBuildEventUseInBuild>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
|
||||||
|
<LinkIncremental>false</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<PreBuildEventUseInBuild>true</PreBuildEventUseInBuild>
|
||||||
|
</PropertyGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<ClCompile>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<Optimization>Disabled</Optimization>
|
||||||
|
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
|
||||||
|
<ClCompile>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<Optimization>Disabled</Optimization>
|
||||||
|
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<ClCompile>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<Optimization>MaxSpeed</Optimization>
|
||||||
|
<FunctionLevelLinking>true</FunctionLevelLinking>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
<EnableCOMDATFolding>true</EnableCOMDATFolding>
|
||||||
|
<OptimizeReferences>true</OptimizeReferences>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
|
||||||
|
<ClCompile>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<Optimization>MaxSpeed</Optimization>
|
||||||
|
<FunctionLevelLinking>true</FunctionLevelLinking>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
<EnableCOMDATFolding>true</EnableCOMDATFolding>
|
||||||
|
<OptimizeReferences>true</OptimizeReferences>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
|
||||||
|
<ImportGroup Label="ExtensionTargets">
|
||||||
|
</ImportGroup>
|
||||||
|
</Project>
|
||||||
94
examples_cuda/aobench_instrumented/instrument.cpp
Normal file
@@ -0,0 +1,94 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2010-2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "instrument.h"
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <assert.h>
|
||||||
|
#include <string>
|
||||||
|
#include <map>
|
||||||
|
|
||||||
|
struct CallInfo {
|
||||||
|
CallInfo() { count = laneCount = allOff = 0; }
|
||||||
|
int count;
|
||||||
|
int laneCount;
|
||||||
|
int allOff;
|
||||||
|
};
|
||||||
|
|
||||||
|
static std::map<std::string, CallInfo> callInfo;
|
||||||
|
|
||||||
|
int countbits(uint64_t i) {
|
||||||
|
int ret = 0;
|
||||||
|
while (i) {
|
||||||
|
if (i & 0x1)
|
||||||
|
++ret;
|
||||||
|
i >>= 1;
|
||||||
|
}
|
||||||
|
return ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// Callback function that ispc compiler emits calls to when --instrument
|
||||||
|
// command-line flag is given while compiling.
|
||||||
|
void
|
||||||
|
ISPCInstrument(const char *fn, const char *note, int line, uint64_t mask) {
|
||||||
|
char sline[16];
|
||||||
|
sprintf(sline, "%04d", line);
|
||||||
|
std::string s = std::string(fn) + std::string("(") + std::string(sline) +
|
||||||
|
std::string(") - ") + std::string(note);
|
||||||
|
|
||||||
|
// Find or create a CallInfo instance for this callsite.
|
||||||
|
CallInfo &ci = callInfo[s];
|
||||||
|
|
||||||
|
// And update its statistics...
|
||||||
|
++ci.count;
|
||||||
|
if (mask == 0)
|
||||||
|
++ci.allOff;
|
||||||
|
ci.laneCount += countbits(mask);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
void
|
||||||
|
ISPCPrintInstrument() {
|
||||||
|
// When program execution is done, go through the stats and print them
|
||||||
|
// out. (This function is called by ao.cpp).
|
||||||
|
std::map<std::string, CallInfo>::iterator citer = callInfo.begin();
|
||||||
|
while (citer != callInfo.end()) {
|
||||||
|
CallInfo &ci = citer->second;
|
||||||
|
// Assumes a 4-wide target (this example is built with --target=sse2).
float activePct = 100.f * ci.laneCount / (4.f * ci.count);
|
||||||
|
float allOffPct = 100.f * ci.allOff / ci.count;
|
||||||
|
printf("%s: %d calls (%d / %.2f%% all off!), %.2f%% active lanes\n",
|
||||||
|
citer->first.c_str(), ci.count, ci.allOff, allOffPct,
|
||||||
|
activePct);
|
||||||
|
++citer;
|
||||||
|
}
|
||||||
|
}
|
||||||
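The statistics kept per call site come down to counting the set bits of the execution mask and dividing by the gang width. A minimal sketch follows, assuming the full 64-bit mask is significant and treating the gang width as a parameter rather than a fixed 4; countActiveLanes and activeLanePercent are hypothetical names, not functions from instrument.cpp.

```cpp
#include <cstdint>

// Count the set bits of a 64-bit lane mask. Each iteration clears the
// lowest set bit, so the loop runs once per active lane.
int countActiveLanes(uint64_t mask) {
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;
        ++n;
    }
    return n;
}

// Percentage of lanes that were active over `calls` invocations of a
// call site, for a gang that is `gangWidth` lanes wide (hypothetical
// parameter; 4 would match an sse2 build).
double activeLanePercent(uint64_t totalActiveLanes, uint64_t calls,
                         int gangWidth) {
    if (calls == 0 || gangWidth <= 0)
        return 0.0;
    return 100.0 * static_cast<double>(totalActiveLanes) /
           (static_cast<double>(calls) * gangWidth);
}
```

Taking the mask as an unsigned 64-bit value avoids both truncation and the sign-extending right shift that a signed argument could introduce.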
45
examples_cuda/aobench_instrumented/instrument.h
Normal file
@@ -0,0 +1,45 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2010-2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef INSTRUMENT_H
|
||||||
|
#define INSTRUMENT_H 1
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
|
||||||
|
extern "C" {
|
||||||
|
void ISPCInstrument(const char *fn, const char *note, int line, uint64_t mask);
|
||||||
|
}
|
||||||
|
|
||||||
|
void ISPCPrintInstrument();
|
||||||
|
|
||||||
|
#endif // INSTRUMENT_H
|
||||||
96
examples_cuda/common.mk
Normal file
@@ -0,0 +1,96 @@
|
|||||||
|
|
||||||
|
TASK_CXX=../tasksys.cpp
|
||||||
|
TASK_LIB=-lpthread
|
||||||
|
TASK_OBJ=objs/tasksys.o
|
||||||
|
|
||||||
|
CXX=clang++
|
||||||
|
CXXFLAGS+=-Iobjs/ -O2
|
||||||
|
CC=clang
|
||||||
|
CCFLAGS+=-Iobjs/ -O2
|
||||||
|
|
||||||
|
LIBS=-lm $(TASK_LIB) -lstdc++
|
||||||
|
ISPC=ispc
|
||||||
|
ISPC_FLAGS+=-O2
|
||||||
|
ISPC_HEADER=objs/$(ISPC_SRC:.ispc=_ispc.h)
|
||||||
|
|
||||||
|
ARCH:=$(shell uname -m | sed -e s/x86_64/x86/ -e s/i686/x86/ -e s/arm.*/arm/ -e s/sa110/arm/)
|
||||||
|
|
||||||
|
ifeq ($(ARCH),x86)
|
||||||
|
ISPC_OBJS=$(addprefix objs/, $(ISPC_SRC:.ispc=)_ispc.o $(ISPC_SRC:.ispc=)_ispc_sse2.o \
|
||||||
|
$(ISPC_SRC:.ispc=)_ispc_sse4.o $(ISPC_SRC:.ispc=)_ispc_avx.o)
|
||||||
|
ISPC_TARGETS=$(ISPC_IA_TARGETS)
|
||||||
|
ARCH_BIT:=$(shell getconf LONG_BIT)
|
||||||
|
ifeq ($(ARCH_BIT),32)
|
||||||
|
ISPC_FLAGS += --arch=x86
|
||||||
|
CXXFLAGS += -m32
|
||||||
|
CCFLAGS += -m32
|
||||||
|
else
|
||||||
|
ISPC_FLAGS += --arch=x86-64
|
||||||
|
CXXFLAGS += -m64
|
||||||
|
CCFLAGS += -m64
|
||||||
|
endif
|
||||||
|
else ifeq ($(ARCH),arm)
|
||||||
|
ISPC_OBJS=$(addprefix objs/, $(ISPC_SRC:.ispc=_ispc.o))
|
||||||
|
ISPC_TARGETS=$(ISPC_ARM_TARGETS)
|
||||||
|
else
|
||||||
|
$(error Unknown architecture $(ARCH) from uname -m)
|
||||||
|
endif
|
||||||
|
|
||||||
|
CPP_OBJS=$(addprefix objs/, $(CPP_SRC:.cpp=.o))
|
||||||
|
CC_OBJS=$(addprefix objs/, $(CC_SRC:.c=.o))
|
||||||
|
OBJS=$(CPP_OBJS) $(CC_OBJS) $(TASK_OBJ) $(ISPC_OBJS)
|
||||||
|
|
||||||
|
default: $(EXAMPLE)
|
||||||
|
|
||||||
|
all: $(EXAMPLE) $(EXAMPLE)-sse4 $(EXAMPLE)-generic16 $(EXAMPLE)-scalar
|
||||||
|
|
||||||
|
.PHONY: dirs clean
|
||||||
|
|
||||||
|
dirs:
|
||||||
|
/bin/mkdir -p objs/
|
||||||
|
|
||||||
|
objs/%.cpp objs/%.o objs/%.h: dirs
|
||||||
|
|
||||||
|
clean:
|
||||||
|
/bin/rm -rf objs *~ $(EXAMPLE) $(EXAMPLE)-sse4 $(EXAMPLE)-generic16 $(EXAMPLE)-scalar ref test
|
||||||
|
|
||||||
|
$(EXAMPLE): $(OBJS)
|
||||||
|
$(CXX) $(CXXFLAGS) -o $@ $^ $(LIBS)
|
||||||
|
|
||||||
|
objs/%.o: %.cpp dirs $(ISPC_HEADER)
|
||||||
|
$(CXX) $< $(CXXFLAGS) -c -o $@
|
||||||
|
|
||||||
|
objs/%.o: %.c dirs $(ISPC_HEADER)
|
||||||
|
$(CC) $< $(CCFLAGS) -c -o $@
|
||||||
|
|
||||||
|
objs/%.o: ../%.cpp dirs
|
||||||
|
$(CXX) $< $(CXXFLAGS) -c -o $@
|
||||||
|
|
||||||
|
objs/$(EXAMPLE).o: objs/$(EXAMPLE)_ispc.h
|
||||||
|
|
||||||
|
objs/%_ispc.h objs/%_ispc.o objs/%_ispc_sse2.o objs/%_ispc_sse4.o objs/%_ispc_avx.o: %.ispc
|
||||||
|
$(ISPC) $(ISPC_FLAGS) --target=$(ISPC_TARGETS) $< -o objs/$*_ispc.o -h objs/$*_ispc.h
|
||||||
|
|
||||||
|
objs/$(ISPC_SRC:.ispc=)_sse4.cpp: $(ISPC_SRC)
|
||||||
|
$(ISPC) $(ISPC_FLAGS) $< -o $@ --target=generic-4 --emit-c++ --c++-include-file=sse4.h
|
||||||
|
|
||||||
|
objs/$(ISPC_SRC:.ispc=)_sse4.o: objs/$(ISPC_SRC:.ispc=)_sse4.cpp
|
||||||
|
$(CXX) -I../intrinsics -msse4.2 $< $(CXXFLAGS) -c -o $@
|
||||||
|
|
||||||
|
$(EXAMPLE)-sse4: $(CPP_OBJS) objs/$(ISPC_SRC:.ispc=)_sse4.o
|
||||||
|
$(CXX) $(CXXFLAGS) -o $@ $^ $(LIBS)
|
||||||
|
|
||||||
|
objs/$(ISPC_SRC:.ispc=)_generic16.cpp: $(ISPC_SRC)
|
||||||
|
$(ISPC) $(ISPC_FLAGS) $< -o $@ --target=generic-16 --emit-c++ --c++-include-file=generic-16.h
|
||||||
|
|
||||||
|
objs/$(ISPC_SRC:.ispc=)_generic16.o: objs/$(ISPC_SRC:.ispc=)_generic16.cpp
|
||||||
|
$(CXX) -I../intrinsics $< $(CXXFLAGS) -c -o $@
|
||||||
|
|
||||||
|
$(EXAMPLE)-generic16: $(CPP_OBJS) objs/$(ISPC_SRC:.ispc=)_generic16.o
|
||||||
|
$(CXX) $(CXXFLAGS) -o $@ $^ $(LIBS)
|
||||||
|
|
||||||
|
objs/$(ISPC_SRC:.ispc=)_scalar.o: $(ISPC_SRC)
|
||||||
|
$(ISPC) $(ISPC_FLAGS) $< -o $@ --target=generic-1
|
||||||
|
|
||||||
|
$(EXAMPLE)-scalar: $(CPP_OBJS) objs/$(ISPC_SRC:.ispc=)_scalar.o
|
||||||
|
$(CXX) $(CXXFLAGS) -o $@ $^ $(LIBS)
|
||||||
8
examples_cuda/deferred/Makefile
Normal file
@@ -0,0 +1,8 @@
|
|||||||
|
|
||||||
|
EXAMPLE=deferred_shading
|
||||||
|
CPP_SRC=common.cpp main.cpp dynamic_c.cpp dynamic_cilk.cpp
|
||||||
|
ISPC_SRC=kernels.ispc
|
||||||
|
ISPC_IA_TARGETS=avx
|
||||||
|
ISPC_FLAGS=--opt=fast-math
|
||||||
|
|
||||||
|
include ../common.mk
|
||||||
210
examples_cuda/deferred/common.cpp
Normal file
@@ -0,0 +1,210 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifdef _MSC_VER
|
||||||
|
#define _CRT_SECURE_NO_WARNINGS
|
||||||
|
#define ISPC_IS_WINDOWS
|
||||||
|
#elif defined(__linux__)
|
||||||
|
#define ISPC_IS_LINUX
|
||||||
|
#elif defined(__APPLE__)
|
||||||
|
#define ISPC_IS_APPLE
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <float.h>
|
||||||
|
#include <math.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <sys/types.h>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <algorithm>
|
||||||
|
#include <assert.h>
|
||||||
|
#include <vector>
|
||||||
|
#ifdef ISPC_IS_WINDOWS
|
||||||
|
#define WIN32_LEAN_AND_MEAN
|
||||||
|
#include <windows.h>
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_LINUX
|
||||||
|
#include <malloc.h>
|
||||||
|
#endif
|
||||||
|
#include "deferred.h"
|
||||||
|
#include "../timing.h"
|
||||||
|
|
||||||
|
///////////////////////////////////////////////////////////////////////////
|
||||||
|
|
||||||
|
static void *
|
||||||
|
lAlignedMalloc(size_t size, int32_t alignment) {
|
||||||
|
#ifdef ISPC_IS_WINDOWS
|
||||||
|
return _aligned_malloc(size, alignment);
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_LINUX
|
||||||
|
return memalign(alignment, size);
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_APPLE
|
||||||
|
void *mem = malloc(size + alignment + sizeof(void*));
|
||||||
|
char *amem = ((char*)mem) + sizeof(void*);
|
||||||
|
amem = amem + uint32_t(alignment - (reinterpret_cast<uint64_t>(amem) &
|
||||||
|
(alignment - 1)));
|
||||||
|
((void**)amem)[-1] = mem;
|
||||||
|
return amem;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static void
|
||||||
|
lAlignedFree(void *ptr) {
|
||||||
|
#ifdef ISPC_IS_WINDOWS
|
||||||
|
_aligned_free(ptr);
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_LINUX
|
||||||
|
free(ptr);
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_APPLE
|
||||||
|
free(((void**)ptr)[-1]);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
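The fallback path of lAlignedMalloc over-allocates, rounds the address up, and stashes the original pointer just below the aligned block so that the free routine can recover it. A portable sketch of that technique is shown here; it assumes a power-of-two alignment, rounds up with a mask instead of the addition used above, and uses hypothetical names (alignedMallocSketch, alignedFreeSketch) that are not part of common.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: returns `size` bytes aligned to `alignment`
// (which must be a power of two), or nullptr on failure.
void *alignedMallocSketch(std::size_t size, std::size_t alignment) {
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return nullptr;
    // Reserve room for the worst-case padding plus the stashed pointer.
    void *raw = std::malloc(size + (alignment - 1) + sizeof(void *));
    if (raw == nullptr)
        return nullptr;
    // Leave space for the stashed pointer, then round up to the alignment.
    std::uintptr_t start =
        reinterpret_cast<std::uintptr_t>(raw) + sizeof(void *);
    std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);
    // Remember the pointer malloc returned, just below the aligned block.
    reinterpret_cast<void **>(aligned)[-1] = raw;
    return reinterpret_cast<void *>(aligned);
}

// Releases a block obtained from alignedMallocSketch.
void alignedFreeSketch(void *ptr) {
    if (ptr != nullptr)
        std::free(reinterpret_cast<void **>(ptr)[-1]);
}
```

Because the rounding never moves an already aligned address, size + (alignment - 1) + sizeof(void *) bytes are always enough in this variant.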
|
|
||||||
|
|
||||||
|
Framebuffer::Framebuffer(int width, int height) {
|
||||||
|
nPixels = width*height;
|
||||||
|
r = (uint8_t *)lAlignedMalloc(nPixels, ALIGNMENT_BYTES);
|
||||||
|
g = (uint8_t *)lAlignedMalloc(nPixels, ALIGNMENT_BYTES);
|
||||||
|
b = (uint8_t *)lAlignedMalloc(nPixels, ALIGNMENT_BYTES);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
Framebuffer::~Framebuffer() {
|
||||||
|
lAlignedFree(r);
|
||||||
|
lAlignedFree(g);
|
||||||
|
lAlignedFree(b);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
void
|
||||||
|
Framebuffer::clear() {
|
||||||
|
memset(r, 0, nPixels);
|
||||||
|
memset(g, 0, nPixels);
|
||||||
|
memset(b, 0, nPixels);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
InputData *
|
||||||
|
CreateInputDataFromFile(const char *path) {
|
||||||
|
FILE *in = fopen(path, "rb");
|
||||||
|
if (!in) return 0;
|
||||||
|
|
||||||
|
InputData *input = new InputData;
|
||||||
|
|
||||||
|
// Load header
|
||||||
|
if (fread(&input->header, sizeof(ispc::InputHeader), 1, in) != 1) {
|
||||||
|
fprintf(stderr, "Premature EOF reading file \"%s\"\n", path);
|
||||||
|
fclose(in);
delete input;
return NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Load data chunk and update pointers
|
||||||
|
input->chunk = (uint8_t *)lAlignedMalloc(input->header.inputDataChunkSize,
|
||||||
|
ALIGNMENT_BYTES);
|
||||||
|
if (fread(input->chunk, input->header.inputDataChunkSize, 1, in) != 1) {
|
||||||
|
fprintf(stderr, "Premature EOF reading file \"%s\"\n", path);
|
||||||
|
fclose(in);
lAlignedFree(input->chunk);
delete input;
return NULL;
|
||||||
|
}
|
||||||
|
|
||||||
|
input->arrays.zBuffer =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaZBuffer]];
|
||||||
|
input->arrays.normalEncoded_x =
|
||||||
|
(uint16_t *)&input->chunk[input->header.inputDataArrayOffsets[idaNormalEncoded_x]];
|
||||||
|
input->arrays.normalEncoded_y =
|
||||||
|
(uint16_t *)&input->chunk[input->header.inputDataArrayOffsets[idaNormalEncoded_y]];
|
||||||
|
input->arrays.specularAmount =
|
||||||
|
(uint16_t *)&input->chunk[input->header.inputDataArrayOffsets[idaSpecularAmount]];
|
||||||
|
input->arrays.specularPower =
|
||||||
|
(uint16_t *)&input->chunk[input->header.inputDataArrayOffsets[idaSpecularPower]];
|
||||||
|
input->arrays.albedo_x =
|
||||||
|
(uint8_t *)&input->chunk[input->header.inputDataArrayOffsets[idaAlbedo_x]];
|
||||||
|
input->arrays.albedo_y =
|
||||||
|
(uint8_t *)&input->chunk[input->header.inputDataArrayOffsets[idaAlbedo_y]];
|
||||||
|
input->arrays.albedo_z =
|
||||||
|
(uint8_t *)&input->chunk[input->header.inputDataArrayOffsets[idaAlbedo_z]];
|
||||||
|
input->arrays.lightPositionView_x =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaLightPositionView_x]];
|
||||||
|
input->arrays.lightPositionView_y =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaLightPositionView_y]];
|
||||||
|
input->arrays.lightPositionView_z =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaLightPositionView_z]];
|
||||||
|
input->arrays.lightAttenuationBegin =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaLightAttenuationBegin]];
|
||||||
|
input->arrays.lightColor_x =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaLightColor_x]];
|
||||||
|
input->arrays.lightColor_y =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaLightColor_y]];
|
||||||
|
input->arrays.lightColor_z =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaLightColor_z]];
|
||||||
|
input->arrays.lightAttenuationEnd =
|
||||||
|
(float *)&input->chunk[input->header.inputDataArrayOffsets[idaLightAttenuationEnd]];
|
||||||
|
|
||||||
|
fclose(in);
|
||||||
|
return input;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
void DeleteInputData(InputData *input) {
|
||||||
|
lAlignedFree(input->chunk);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
void WriteFrame(const char *filename, const InputData *input,
|
||||||
|
const Framebuffer &framebuffer) {
|
||||||
|
// Deswizzle and copy to RGB output
|
||||||
|
// Doesn't need to be fast... only happens once
|
||||||
|
size_t imageBytes = 3 * input->header.framebufferWidth *
|
||||||
|
input->header.framebufferHeight;
|
||||||
|
uint8_t* framebufferAOS = (uint8_t *)lAlignedMalloc(imageBytes, ALIGNMENT_BYTES);
|
||||||
|
memset(framebufferAOS, 0, imageBytes);
|
||||||
|
|
||||||
|
for (int i = 0; i < input->header.framebufferWidth *
|
||||||
|
input->header.framebufferHeight; ++i) {
|
||||||
|
framebufferAOS[3 * i + 0] = framebuffer.r[i];
|
||||||
|
framebufferAOS[3 * i + 1] = framebuffer.g[i];
|
||||||
|
framebufferAOS[3 * i + 2] = framebuffer.b[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
// Write out simple PPM file
|
||||||
|
FILE *out = fopen(filename, "wb");
|
||||||
|
fprintf(out, "P6 %d %d 255\n", input->header.framebufferWidth,
|
||||||
|
input->header.framebufferHeight);
|
||||||
|
fwrite(framebufferAOS, imageBytes, 1, out);
|
||||||
|
fclose(out);
|
||||||
|
|
||||||
|
lAlignedFree(framebufferAOS);
|
||||||
|
}
|
||||||
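The deswizzle-and-write step in WriteFrame above can be sketched in isolation as follows. This is a minimal illustration, not part of the example sources: the helper names `InterleaveRGB` and `WritePPM` and the use of `std::vector` are assumptions made for the sketch. It shows planar r/g/b channels being packed into RGB triples and then written as a binary PPM ("P6") file, with the open and write results checked.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: pack three planar 8-bit channels into one
// interleaved RGB buffer (r0 g0 b0 r1 g1 b1 ...).
std::vector<uint8_t> InterleaveRGB(const uint8_t *r, const uint8_t *g,
                                   const uint8_t *b, size_t nPixels) {
    std::vector<uint8_t> aos(3 * nPixels);
    for (size_t i = 0; i < nPixels; ++i) {
        aos[3 * i + 0] = r[i];
        aos[3 * i + 1] = g[i];
        aos[3 * i + 2] = b[i];
    }
    return aos;
}

// Hypothetical helper: write a binary PPM. Returns false if the file
// cannot be opened or the pixel data cannot be written completely.
bool WritePPM(const char *filename, int width, int height,
              const std::vector<uint8_t> &rgb) {
    FILE *out = fopen(filename, "wb");
    if (!out)
        return false;
    fprintf(out, "P6 %d %d 255\n", width, height);
    bool ok = fwrite(rgb.data(), 1, rgb.size(), out) == rgb.size();
    fclose(out);
    return ok;
}
```

A caller would pass the framebuffer's three channel pointers and the pixel count to `InterleaveRGB`, then hand the result to `WritePPM` together with the image dimensions.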
BIN
examples_cuda/deferred/data/pp1280x720.bin
Normal file
Binary file not shown.
BIN
examples_cuda/deferred/data/pp1920x1200.bin
Normal file
Binary file not shown.
BIN
examples_cuda/deferred/deferred-ispc-static.ppm
Normal file
Binary file not shown.
BIN
examples_cuda/deferred/deferred-serial-dynamic.ppm
Normal file
Binary file not shown.
108
examples_cuda/deferred/deferred.h
Normal file
@@ -0,0 +1,108 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef DEFERRED_H
|
||||||
|
#define DEFERRED_H
|
||||||
|
|
||||||
|
// Currently tile widths must be a multiple of the SIMD width (e.g. 8 for the ispc sse4-x2 target)!
|
||||||
|
#define MIN_TILE_WIDTH 16
|
||||||
|
#define MIN_TILE_HEIGHT 16
|
||||||
|
#define MAX_LIGHTS 1024
|
||||||
|
|
||||||
|
enum InputDataArraysEnum {
|
||||||
|
idaZBuffer = 0,
|
||||||
|
idaNormalEncoded_x,
|
||||||
|
idaNormalEncoded_y,
|
||||||
|
idaSpecularAmount,
|
||||||
|
idaSpecularPower,
|
||||||
|
idaAlbedo_x,
|
||||||
|
idaAlbedo_y,
|
||||||
|
idaAlbedo_z,
|
||||||
|
idaLightPositionView_x,
|
||||||
|
idaLightPositionView_y,
|
||||||
|
idaLightPositionView_z,
|
||||||
|
idaLightAttenuationBegin,
|
||||||
|
idaLightColor_x,
|
||||||
|
idaLightColor_y,
|
||||||
|
idaLightColor_z,
|
||||||
|
idaLightAttenuationEnd,
|
||||||
|
|
||||||
|
idaNum
|
||||||
|
};
|
||||||
|
|
||||||
|
#ifndef ISPC
|
||||||
|
|
||||||
|
#include <stdint.h>
|
||||||
|
#include "kernels_ispc.h"
|
||||||
|
|
||||||
|
#define ALIGNMENT_BYTES 64
|
||||||
|
|
||||||
|
#define MAX_LIGHTS 1024
|
||||||
|
|
||||||
|
#define VISUALIZE_LIGHT_COUNT 0
|
||||||
|
|
||||||
|
struct InputData
|
||||||
|
{
|
||||||
|
ispc::InputHeader header;
|
||||||
|
ispc::InputDataArrays arrays;
|
||||||
|
uint8_t *chunk;
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
struct Framebuffer {
|
||||||
|
Framebuffer(int width, int height);
|
||||||
|
~Framebuffer();
|
||||||
|
|
||||||
|
void clear();
|
||||||
|
|
||||||
|
uint8_t *r, *g, *b;
|
||||||
|
|
||||||
|
private:
|
||||||
|
int nPixels;
|
||||||
|
Framebuffer(const Framebuffer &);
|
||||||
|
Framebuffer &operator=(const Framebuffer &);
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
InputData *CreateInputDataFromFile(const char *path);
|
||||||
|
void DeleteInputData(InputData *input);
|
||||||
|
void WriteFrame(const char *filename, const InputData *input,
|
||||||
|
const Framebuffer &framebuffer);
|
||||||
|
void InitDynamicC(InputData *input);
|
||||||
|
void InitDynamicCilk(InputData *input);
|
||||||
|
void DispatchDynamicC(InputData *input, Framebuffer *framebuffer);
|
||||||
|
void DispatchDynamicCilk(InputData *input, Framebuffer *framebuffer);
|
||||||
|
|
||||||
|
#endif // !ISPC
|
||||||
|
|
||||||
|
#endif // DEFERRED_H
|
||||||
BIN
examples_cuda/deferred/deferred_shading
Executable file
Binary file not shown.
178
examples_cuda/deferred/deferred_shading.vcxproj
Executable file
@@ -0,0 +1,178 @@
|
|||||||
|
<?xml version="1.0" encoding="utf-8"?>
|
||||||
|
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
|
||||||
|
<ItemGroup Label="ProjectConfigurations">
|
||||||
|
<ProjectConfiguration Include="Debug|Win32">
|
||||||
|
<Configuration>Debug</Configuration>
|
||||||
|
<Platform>Win32</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Debug|x64">
|
||||||
|
<Configuration>Debug</Configuration>
|
||||||
|
<Platform>x64</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Release|Win32">
|
||||||
|
<Configuration>Release</Configuration>
|
||||||
|
<Platform>Win32</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Release|x64">
|
||||||
|
<Configuration>Release</Configuration>
|
||||||
|
<Platform>x64</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
</ItemGroup>
|
||||||
|
<PropertyGroup Label="Globals">
|
||||||
|
<ProjectGuid>{87f53c53-957e-4e91-878a-bc27828fb9eb}</ProjectGuid>
|
||||||
|
<Keyword>Win32Proj</Keyword>
|
||||||
|
<RootNamespace>deferred_shading</RootNamespace>
|
||||||
|
</PropertyGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>true</UseDebugLibraries>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>true</UseDebugLibraries>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>false</UseDebugLibraries>
|
||||||
|
<WholeProgramOptimization>true</WholeProgramOptimization>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>false</UseDebugLibraries>
|
||||||
|
<WholeProgramOptimization>true</WholeProgramOptimization>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
|
||||||
|
<ImportGroup Label="ExtensionSettings">
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<PropertyGroup Label="UserMacros" />
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<LinkIncremental>true</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
|
||||||
|
<LinkIncremental>true</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<LinkIncremental>false</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
|
||||||
|
<LinkIncremental>false</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
</PropertyGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<ClCompile>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<Optimization>Disabled</Optimization>
|
||||||
|
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
|
||||||
|
<ClCompile>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<Optimization>Disabled</Optimization>
|
||||||
|
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<ClCompile>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<Optimization>MaxSpeed</Optimization>
|
||||||
|
<FunctionLevelLinking>true</FunctionLevelLinking>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
<EnableCOMDATFolding>true</EnableCOMDATFolding>
|
||||||
|
<OptimizeReferences>true</OptimizeReferences>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
|
||||||
|
<ClCompile>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<Optimization>MaxSpeed</Optimization>
|
||||||
|
<FunctionLevelLinking>true</FunctionLevelLinking>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
<EnableCOMDATFolding>true</EnableCOMDATFolding>
|
||||||
|
<OptimizeReferences>true</OptimizeReferences>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemGroup>
|
||||||
|
<ClCompile Include="common.cpp" />
|
||||||
|
<ClCompile Include="dynamic_c.cpp" />
|
||||||
|
<ClCompile Include="dynamic_cilk.cpp" />
|
||||||
|
<ClCompile Include="main.cpp" />
|
||||||
|
<ClCompile Include="../tasksys.cpp" />
|
||||||
|
</ItemGroup>
|
||||||
|
<ItemGroup>
|
||||||
|
<CustomBuild Include="kernels.ispc">
|
||||||
|
<FileType>Document</FileType>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4-x2,avx-x2
|
||||||
|
</Command>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4-x2,avx-x2
|
||||||
|
</Command>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4-x2,avx-x2
|
||||||
|
</Command>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4-x2,avx-x2
|
||||||
|
</Command>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
</CustomBuild>
|
||||||
|
</ItemGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
|
||||||
|
<ImportGroup Label="ExtensionTargets">
|
||||||
|
</ImportGroup>
|
||||||
|
</Project>
|
||||||
870
examples_cuda/deferred/dynamic_c.cpp
Normal file
@@ -0,0 +1,870 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "deferred.h"
|
||||||
|
#include "kernels_ispc.h"
|
||||||
|
#include <algorithm>
|
||||||
|
#include <stdint.h>
|
||||||
|
#include <assert.h>
|
||||||
|
#include <math.h>
|
||||||
|
|
||||||
|
#ifdef _MSC_VER
|
||||||
|
#define ISPC_IS_WINDOWS
|
||||||
|
#elif defined(__linux__)
|
||||||
|
#define ISPC_IS_LINUX
|
||||||
|
#elif defined(__APPLE__)
|
||||||
|
#define ISPC_IS_APPLE
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#ifdef ISPC_IS_LINUX
|
||||||
|
#include <malloc.h>
|
||||||
|
#endif // ISPC_IS_LINUX
|
||||||
|
|
||||||
|
// Currently tile widths must be a multiple of the SIMD width (e.g. 8 for the ispc sse4-x2 target)!
|
||||||
|
#define MIN_TILE_WIDTH 16
|
||||||
|
#define MIN_TILE_HEIGHT 16
|
||||||
|
|
||||||
|
|
||||||
|
#define DYNAMIC_TREE_LEVELS 5
|
||||||
|
// If this is set to 1 then the result will be identical to the static version
|
||||||
|
#define DYNAMIC_MIN_LIGHTS_TO_SUBDIVIDE 1
|
||||||
|
|
||||||
|
static void *
|
||||||
|
lAlignedMalloc(size_t size, int32_t alignment) {
|
||||||
|
#ifdef ISPC_IS_WINDOWS
|
||||||
|
return _aligned_malloc(size, alignment);
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_LINUX
|
||||||
|
return memalign(alignment, size);
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_APPLE
|
||||||
|
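// Reserve a full 'alignment' bytes of padding: the adjustment below adds
// exactly 'alignment' bytes when amem is already aligned.
void *mem = malloc(size + alignment + sizeof(void*));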
|
||||||
|
char *amem = ((char*)mem) + sizeof(void*);
|
||||||
|
amem = amem + uint32_t(alignment - (reinterpret_cast<uint64_t>(amem) &
|
||||||
|
(alignment - 1)));
|
||||||
|
((void**)amem)[-1] = mem;
|
||||||
|
return amem;
|
||||||
|
#endif
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static void
|
||||||
|
lAlignedFree(void *ptr) {
|
||||||
|
#ifdef ISPC_IS_WINDOWS
|
||||||
|
_aligned_free(ptr);
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_LINUX
|
||||||
|
free(ptr);
|
||||||
|
#endif
|
||||||
|
#ifdef ISPC_IS_APPLE
|
||||||
|
free(((void**)ptr)[-1]);
|
||||||
|
#endif
|
||||||
|
}
|
||||||
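The OS X branch of lAlignedMalloc/lAlignedFree above relies on over-allocating and stashing the original pointer just below the aligned address. A minimal standalone sketch of that technique follows. The names `AlignedMallocSketch` and `AlignedFreeSketch` are hypothetical, and the sketch assumes that `alignment` is a power of two and at least `sizeof(void *)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: returns a pointer aligned to 'alignment' bytes.
// Assumes 'alignment' is a power of two and >= sizeof(void *).
void *AlignedMallocSketch(size_t size, size_t alignment) {
    // Room for the payload, worst-case padding, and the stashed pointer.
    void *raw = std::malloc(size + alignment + sizeof(void *));
    if (!raw)
        return nullptr;
    // Leave space for the stashed pointer, then round up to the alignment.
    uintptr_t start = reinterpret_cast<uintptr_t>(raw) + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    // Remember the pointer that malloc returned, just below the payload.
    reinterpret_cast<void **>(aligned)[-1] = raw;
    return reinterpret_cast<void *>(aligned);
}

// Hypothetical sketch: frees a pointer obtained from AlignedMallocSketch.
void AlignedFreeSketch(void *ptr) {
    if (ptr)
        std::free(reinterpret_cast<void **>(ptr)[-1]);
}
```

Rounding up with a mask adds no padding when the address is already aligned, so the payload always fits inside the allocation.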
|
|
||||||
|
|
||||||
|
static void
|
||||||
|
ComputeZBounds(int tileStartX, int tileEndX,
|
||||||
|
int tileStartY, int tileEndY,
|
||||||
|
// G-buffer data
|
||||||
|
float zBuffer[],
|
||||||
|
int gBufferWidth,
|
||||||
|
// Camera data
|
||||||
|
float cameraProj_33, float cameraProj_43,
|
||||||
|
float cameraNear, float cameraFar,
|
||||||
|
// Output
|
||||||
|
float *minZ, float *maxZ)
|
||||||
|
{
|
||||||
|
// Find Z bounds
|
||||||
|
float laneMinZ = cameraFar;
|
||||||
|
float laneMaxZ = cameraNear;
|
||||||
|
for (int y = tileStartY; y < tileEndY; ++y) {
|
||||||
|
for (int x = tileStartX; x < tileEndX; ++x) {
|
||||||
|
// Unproject depth buffer Z value into view space
|
||||||
|
float z = zBuffer[(y * gBufferWidth + x)];
|
||||||
|
float viewSpaceZ = cameraProj_43 / (z - cameraProj_33);
|
||||||
|
|
||||||
|
// Work out Z bounds for our samples
|
||||||
|
// Avoid considering skybox/background or otherwise invalid pixels
|
||||||
|
if ((viewSpaceZ < cameraFar) && (viewSpaceZ >= cameraNear)) {
|
||||||
|
laneMinZ = std::min(laneMinZ, viewSpaceZ);
|
||||||
|
laneMaxZ = std::max(laneMaxZ, viewSpaceZ);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
*minZ = laneMinZ;
|
||||||
|
*maxZ = laneMaxZ;
|
||||||
|
}
|
||||||
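The core of ComputeZBounds above is a depth unprojection followed by a min/max reduction over valid pixels. The sketch below restates that logic for a flat list of depth samples. The `ZBounds` struct and the `TileZBounds` name are assumptions for illustration; the formula `proj43 / (z - proj33)` and the validity test are taken from the code above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical result type for the sketch.
struct ZBounds {
    float minZ;
    float maxZ;
};

// Unproject each depth-buffer value into view space and track the min/max
// over samples that lie inside [zNear, zFar). Background pixels are skipped.
ZBounds TileZBounds(const std::vector<float> &depth, float proj33,
                    float proj43, float zNear, float zFar) {
    // Start "inverted" so that any valid sample tightens the bounds.
    ZBounds b = {zFar, zNear};
    for (float z : depth) {
        float viewZ = proj43 / (z - proj33);
        if (viewZ < zFar && viewZ >= zNear) {
            b.minZ = std::min(b.minZ, viewZ);
            b.maxZ = std::max(b.maxZ, viewZ);
        }
    }
    return b;
}
```

Note that a tile with no valid samples keeps the inverted initial bounds (minZ equal to the far plane, maxZ equal to the near plane), which is also how the original function behaves.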
|
|
||||||
|
|
||||||
|
static void
|
||||||
|
ComputeZBoundsRow(int tileY, int tileWidth, int tileHeight,
|
||||||
|
int numTilesX, int numTilesY,
|
||||||
|
// G-buffer data
|
||||||
|
float zBuffer[],
|
||||||
|
int gBufferWidth,
|
||||||
|
// Camera data
|
||||||
|
float cameraProj_33, float cameraProj_43,
|
||||||
|
float cameraNear, float cameraFar,
|
||||||
|
// Output
|
||||||
|
float minZArray[],
|
||||||
|
float maxZArray[])
|
||||||
|
{
|
||||||
|
for (int tileX = 0; tileX < numTilesX; ++tileX) {
|
||||||
|
float minZ, maxZ;
|
||||||
|
ComputeZBounds(tileX * tileWidth, tileX * tileWidth + tileWidth,
|
||||||
|
tileY * tileHeight, tileY * tileHeight + tileHeight,
|
||||||
|
zBuffer, gBufferWidth, cameraProj_33, cameraProj_43,
|
||||||
|
cameraNear, cameraFar, &minZ, &maxZ);
|
||||||
|
minZArray[tileX] = minZ;
|
||||||
|
maxZArray[tileX] = maxZ;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class MinMaxZTree
|
||||||
|
{
|
||||||
|
public:
|
||||||
|
// Currently (min) tile dimensions must divide gBuffer dimensions evenly
|
||||||
|
// Levels must be small enough that neither dimension goes below one tile
|
||||||
|
MinMaxZTree(
|
||||||
|
int tileWidth, int tileHeight, int levels,
|
||||||
|
int gBufferWidth, int gBufferHeight)
|
||||||
|
: mTileWidth(tileWidth), mTileHeight(tileHeight), mLevels(levels)
|
||||||
|
{
|
||||||
|
mNumTilesX = gBufferWidth / mTileWidth;
|
||||||
|
mNumTilesY = gBufferHeight / mTileHeight;
|
||||||
|
|
||||||
|
// Allocate arrays
|
||||||
|
mMinZArrays = (float **)lAlignedMalloc(sizeof(float *) * mLevels, 16);
|
||||||
|
mMaxZArrays = (float **)lAlignedMalloc(sizeof(float *) * mLevels, 16);
|
||||||
|
for (int i = 0; i < mLevels; ++i) {
|
||||||
|
int x = NumTilesX(i);
|
||||||
|
int y = NumTilesY(i);
|
||||||
|
assert(x > 0);
|
||||||
|
assert(y > 0);
|
||||||
|
// NOTE: If the following two asserts fire it probably means that
|
||||||
|
// the base tile dimensions do not evenly divide the G-buffer dimensions
|
||||||
|
assert(x * (mTileWidth << i) >= gBufferWidth);
|
||||||
|
assert(y * (mTileHeight << i) >= gBufferHeight);
|
||||||
|
mMinZArrays[i] = (float *)lAlignedMalloc(sizeof(float) * x * y, 16);
|
||||||
|
mMaxZArrays[i] = (float *)lAlignedMalloc(sizeof(float) * x * y, 16);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void Update(float *zBuffer, int gBufferPitchInElements,
|
||||||
|
float cameraProj_33, float cameraProj_43,
|
||||||
|
float cameraNear, float cameraFar)
|
||||||
|
{
|
||||||
|
for (int tileY = 0; tileY < mNumTilesY; ++tileY) {
|
||||||
|
ComputeZBoundsRow(tileY, mTileWidth, mTileHeight, mNumTilesX, mNumTilesY,
|
||||||
|
zBuffer, gBufferPitchInElements,
|
||||||
|
cameraProj_33, cameraProj_43, cameraNear, cameraFar,
|
||||||
|
mMinZArrays[0] + (tileY * mNumTilesX),
|
||||||
|
mMaxZArrays[0] + (tileY * mNumTilesX));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Generate other levels
|
||||||
|
for (int level = 1; level < mLevels; ++level) {
|
||||||
|
int destTilesX = NumTilesX(level);
|
||||||
|
int destTilesY = NumTilesY(level);
|
||||||
|
int srcLevel = level - 1;
|
||||||
|
int srcTilesX = NumTilesX(srcLevel);
|
||||||
|
int srcTilesY = NumTilesY(srcLevel);
|
||||||
|
for (int y = 0; y < destTilesY; ++y) {
|
||||||
|
for (int x = 0; x < destTilesX; ++x) {
|
||||||
|
int srcX = x << 1;
|
||||||
|
int srcY = y << 1;
|
||||||
|
// NOTE: Ugly branches to deal with non-multiple dimensions at some levels
|
||||||
|
// TODO: SSE branchless min/max is probably better...
|
||||||
|
float minZ = mMinZArrays[srcLevel][(srcY) * srcTilesX + (srcX)];
|
||||||
|
float maxZ = mMaxZArrays[srcLevel][(srcY) * srcTilesX + (srcX)];
|
||||||
|
if (srcX + 1 < srcTilesX) {
|
||||||
|
minZ = std::min(minZ, mMinZArrays[srcLevel][(srcY) * srcTilesX +
|
||||||
|
(srcX + 1)]);
|
||||||
|
maxZ = std::max(maxZ, mMaxZArrays[srcLevel][(srcY) * srcTilesX +
|
||||||
|
(srcX + 1)]);
|
||||||
|
if (srcY + 1 < srcTilesY) {
|
||||||
|
minZ = std::min(minZ, mMinZArrays[srcLevel][(srcY + 1) * srcTilesX +
|
||||||
|
(srcX + 1)]);
|
||||||
|
maxZ = std::max(maxZ, mMaxZArrays[srcLevel][(srcY + 1) * srcTilesX +
|
||||||
|
(srcX + 1)]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (srcY + 1 < srcTilesY) {
|
||||||
|
minZ = std::min(minZ, mMinZArrays[srcLevel][(srcY + 1) * srcTilesX +
|
||||||
|
(srcX )]);
|
||||||
|
maxZ = std::max(maxZ, mMaxZArrays[srcLevel][(srcY + 1) * srcTilesX +
|
||||||
|
(srcX )]);
|
||||||
|
}
|
||||||
|
mMinZArrays[level][y * destTilesX + x] = minZ;
|
||||||
|
mMaxZArrays[level][y * destTilesX + x] = maxZ;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
~MinMaxZTree() {
|
||||||
|
for (int i = 0; i < mLevels; ++i) {
|
||||||
|
lAlignedFree(mMinZArrays[i]);
|
||||||
|
lAlignedFree(mMaxZArrays[i]);
|
||||||
|
}
|
||||||
|
lAlignedFree(mMinZArrays);
|
||||||
|
lAlignedFree(mMaxZArrays);
|
||||||
|
}
|
||||||
|
|
||||||
|
int Levels() const { return mLevels; }
|
||||||
|
|
||||||
|
// These round UP, so beware that the last tile for a given level may not be completely full
|
||||||
|
// TODO: Verify this...
|
||||||
|
int NumTilesX(int level = 0) const { return (mNumTilesX + (1 << level) - 1) >> level; }
|
||||||
|
int NumTilesY(int level = 0) const { return (mNumTilesY + (1 << level) - 1) >> level; }
|
||||||
|
int TileWidth(int level = 0) const { return (mTileWidth << level); }
|
||||||
|
int TileHeight(int level = 0) const { return (mTileHeight << level); }
|
||||||
|
|
||||||
|
float MinZ(int level, int tileX, int tileY) const {
|
||||||
|
return mMinZArrays[level][tileY * NumTilesX(level) + tileX];
|
||||||
|
}
|
||||||
|
float MaxZ(int level, int tileX, int tileY) const {
|
||||||
|
return mMaxZArrays[level][tileY * NumTilesX(level) + tileX];
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
int mTileWidth;
|
||||||
|
int mTileHeight;
|
||||||
|
int mLevels;
|
||||||
|
int mNumTilesX;
|
||||||
|
int mNumTilesY;
|
||||||
|
|
||||||
|
// One array for each "level" in the tree
|
||||||
|
float **mMinZArrays;
|
||||||
|
float **mMaxZArrays;
|
||||||
|
};
|
||||||
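Two ideas in MinMaxZTree above can be illustrated in isolation: the round-up tile count per level, and the 2x2 reduction that builds each coarser level from the previous one. The following is a minimal sketch with hypothetical helper names (`TilesAtLevel`, `ReduceMin`). It uses a small inner loop rather than the explicit branches in the original, and it only handles the minimum; the maximum is analogous.

```cpp
#include <algorithm>
#include <vector>

// Number of tiles along one axis at a given level, rounded up, matching
// the NumTilesX/NumTilesY expression above.
inline int TilesAtLevel(int baseTiles, int level) {
    return (baseTiles + (1 << level) - 1) >> level;
}

// Hypothetical sketch: reduce a srcW x srcH grid of per-tile minima to the
// next coarser level. Each destination tile covers up to 2x2 source tiles;
// source tiles beyond the edge are skipped when the size is odd.
std::vector<float> ReduceMin(const std::vector<float> &src, int srcW, int srcH) {
    int dstW = (srcW + 1) / 2;
    int dstH = (srcH + 1) / 2;
    std::vector<float> dst(dstW * dstH);
    for (int y = 0; y < dstH; ++y) {
        for (int x = 0; x < dstW; ++x) {
            float m = src[(2 * y) * srcW + (2 * x)];
            for (int dy = 0; dy < 2; ++dy) {
                for (int dx = 0; dx < 2; ++dx) {
                    int sx = 2 * x + dx;
                    int sy = 2 * y + dy;
                    if (sx < srcW && sy < srcH)
                        m = std::min(m, src[sy * srcW + sx]);
                }
            }
            dst[y * dstW + x] = m;
        }
    }
    return dst;
}
```

For example, 45 base tiles along one axis give 23 tiles at level 1 and 3 tiles at level 4, so the last tile at a coarse level may cover fewer source tiles than the others.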
|
|
||||||
|
static MinMaxZTree *gMinMaxZTree = 0;
|
||||||
|
|
||||||
|
void InitDynamicC(InputData *input) {
|
||||||
|
gMinMaxZTree =
|
||||||
|
new MinMaxZTree(MIN_TILE_WIDTH, MIN_TILE_HEIGHT, DYNAMIC_TREE_LEVELS,
|
||||||
|
input->header.framebufferWidth,
|
||||||
|
input->header.framebufferHeight);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/* We're going to split a tile into 4 sub-tiles. This function
|
||||||
|
reclassifies the tile's lights with respect to the sub-tiles. */
|
||||||
|
static void
|
||||||
|
SplitTileMinMax(
|
||||||
|
int tileMidX, int tileMidY,
|
||||||
|
// Subtile data (00, 10, 01, 11)
|
||||||
|
float subtileMinZ[],
|
||||||
|
float subtileMaxZ[],
|
||||||
|
// G-buffer data
|
||||||
|
int gBufferWidth, int gBufferHeight,
|
||||||
|
// Camera data
|
||||||
|
float cameraProj_11, float cameraProj_22,
|
||||||
|
// Light Data
|
||||||
|
int lightIndices[],
|
||||||
|
int numLights,
|
||||||
|
float light_positionView_x_array[],
|
||||||
|
float light_positionView_y_array[],
|
||||||
|
float light_positionView_z_array[],
|
||||||
|
float light_attenuationEnd_array[],
|
||||||
|
// Outputs
|
||||||
|
int subtileIndices[],
|
||||||
|
int subtileIndicesPitch,
|
||||||
|
int subtileNumLights[]
|
||||||
|
)
|
||||||
|
{
|
||||||
|
float gBufferScale_x = 0.5f * (float)gBufferWidth;
|
||||||
|
float gBufferScale_y = 0.5f * (float)gBufferHeight;
|
||||||
|
|
||||||
|
float frustumPlanes_xy[2] = { -(cameraProj_11 * gBufferScale_x),
|
||||||
|
(cameraProj_22 * gBufferScale_y) };
|
||||||
|
float frustumPlanes_z[2] = { tileMidX - gBufferScale_x,
|
||||||
|
tileMidY - gBufferScale_y };
|
||||||
|
|
||||||
|
for (int i = 0; i < 2; ++i) {
|
||||||
|
// Normalize
|
||||||
|
float norm = 1.f / sqrtf(frustumPlanes_xy[i] * frustumPlanes_xy[i] +
|
||||||
|
frustumPlanes_z[i] * frustumPlanes_z[i]);
|
||||||
|
frustumPlanes_xy[i] *= norm;
|
||||||
|
frustumPlanes_z[i] *= norm;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Initialize
|
||||||
|
int subtileLightOffset[4];
|
||||||
|
subtileLightOffset[0] = 0 * subtileIndicesPitch;
|
||||||
|
subtileLightOffset[1] = 1 * subtileIndicesPitch;
|
||||||
|
subtileLightOffset[2] = 2 * subtileIndicesPitch;
|
||||||
|
subtileLightOffset[3] = 3 * subtileIndicesPitch;
|
||||||
|
|
||||||
|
for (int i = 0; i < numLights; ++i) {
|
||||||
|
        int lightIndex = lightIndices[i];

        float light_positionView_x = light_positionView_x_array[lightIndex];
        float light_positionView_y = light_positionView_y_array[lightIndex];
        float light_positionView_z = light_positionView_z_array[lightIndex];
        float light_attenuationEnd = light_attenuationEnd_array[lightIndex];
        float light_attenuationEndNeg = -light_attenuationEnd;

        // Test lights again against subtile z bounds
        bool inFrustum[4];
        inFrustum[0] = (light_positionView_z - subtileMinZ[0] >= light_attenuationEndNeg) &&
                       (subtileMaxZ[0] - light_positionView_z >= light_attenuationEndNeg);
        inFrustum[1] = (light_positionView_z - subtileMinZ[1] >= light_attenuationEndNeg) &&
                       (subtileMaxZ[1] - light_positionView_z >= light_attenuationEndNeg);
        inFrustum[2] = (light_positionView_z - subtileMinZ[2] >= light_attenuationEndNeg) &&
                       (subtileMaxZ[2] - light_positionView_z >= light_attenuationEndNeg);
        inFrustum[3] = (light_positionView_z - subtileMinZ[3] >= light_attenuationEndNeg) &&
                       (subtileMaxZ[3] - light_positionView_z >= light_attenuationEndNeg);

        float dx = light_positionView_z * frustumPlanes_z[0] +
                   light_positionView_x * frustumPlanes_xy[0];
        float dy = light_positionView_z * frustumPlanes_z[1] +
                   light_positionView_y * frustumPlanes_xy[1];

        if (fabsf(dx) > light_attenuationEnd) {
            bool positiveX = dx > 0.0f;
            inFrustum[0] = inFrustum[0] && positiveX;   // 00 subtile
            inFrustum[1] = inFrustum[1] && !positiveX;  // 10 subtile
            inFrustum[2] = inFrustum[2] && positiveX;   // 01 subtile
            inFrustum[3] = inFrustum[3] && !positiveX;  // 11 subtile
        }
        if (fabsf(dy) > light_attenuationEnd) {
            bool positiveY = dy > 0.0f;
            inFrustum[0] = inFrustum[0] && positiveY;   // 00 subtile
            inFrustum[1] = inFrustum[1] && positiveY;   // 10 subtile
            inFrustum[2] = inFrustum[2] && !positiveY;  // 01 subtile
            inFrustum[3] = inFrustum[3] && !positiveY;  // 11 subtile
        }

        if (inFrustum[0])
            subtileIndices[subtileLightOffset[0]++] = lightIndex;
        if (inFrustum[1])
            subtileIndices[subtileLightOffset[1]++] = lightIndex;
        if (inFrustum[2])
            subtileIndices[subtileLightOffset[2]++] = lightIndex;
        if (inFrustum[3])
            subtileIndices[subtileLightOffset[3]++] = lightIndex;
    }

    subtileNumLights[0] = subtileLightOffset[0] - 0 * subtileIndicesPitch;
    subtileNumLights[1] = subtileLightOffset[1] - 1 * subtileIndicesPitch;
    subtileNumLights[2] = subtileLightOffset[2] - 2 * subtileIndicesPitch;
    subtileNumLights[3] = subtileLightOffset[3] - 3 * subtileIndicesPitch;
}


static inline float
dot3(float x, float y, float z, float a, float b, float c) {
    return (x*a + y*b + z*c);
}


static inline void
normalize3(float x, float y, float z, float &ox, float &oy, float &oz) {
    float n = 1.f / sqrtf(x*x + y*y + z*z);
    ox = x * n;
    oy = y * n;
    oz = z * n;
}


static inline float
Unorm8ToFloat32(uint8_t u) {
    return (float)u * (1.0f / 255.0f);
}


static inline uint8_t
Float32ToUnorm8(float f) {
    return (uint8_t)(f * 255.0f);
}


static inline float
half_to_float_fast(uint16_t h) {
    uint32_t hs = h & (int32_t)0x8000u;  // Pick off sign bit
    uint32_t he = h & (int32_t)0x7C00u;  // Pick off exponent bits
    uint32_t hm = h & (int32_t)0x03FFu;  // Pick off mantissa bits

    // sign
    uint32_t xs = ((uint32_t) hs) << 16;
    // Exponent: unbias the halfp, then bias the single
    int32_t xes = ((int32_t) (he >> 10)) - 15 + 127;
    // Exponent
    uint32_t xe = (uint32_t) (xes << 23);
    // Mantissa
    uint32_t xm = ((uint32_t) hm) << 13;

    uint32_t bits = (xs | xe | xm);
    float *fp = reinterpret_cast<float *>(&bits);
    return *fp;
}


static void
ShadeTileC(
    int32_t tileStartX, int32_t tileEndX,
    int32_t tileStartY, int32_t tileEndY,
    int32_t gBufferWidth, int32_t gBufferHeight,
    const ispc::InputDataArrays &inputData,
    // Camera data
    float cameraProj_11, float cameraProj_22,
    float cameraProj_33, float cameraProj_43,
    // Light list
    int32_t tileLightIndices[],
    int32_t tileNumLights,
    // UI
    bool visualizeLightCount,
    // Output
    uint8_t framebuffer_r[],
    uint8_t framebuffer_g[],
    uint8_t framebuffer_b[]
    )
{
    if (tileNumLights == 0 || visualizeLightCount) {
        uint8_t c = (uint8_t)(std::min(tileNumLights << 2, 255));
        for (int32_t y = tileStartY; y < tileEndY; ++y) {
            for (int32_t x = tileStartX; x < tileEndX; ++x) {
                int32_t framebufferIndex = (y * gBufferWidth + x);
                framebuffer_r[framebufferIndex] = c;
                framebuffer_g[framebufferIndex] = c;
                framebuffer_b[framebufferIndex] = c;
            }
        }
    } else {
        float twoOverGBufferWidth = 2.0f / gBufferWidth;
        float twoOverGBufferHeight = 2.0f / gBufferHeight;

        for (int32_t y = tileStartY; y < tileEndY; ++y) {
            float positionScreen_y = -(((0.5f + y) * twoOverGBufferHeight) - 1.f);

            for (int32_t x = tileStartX; x < tileEndX; ++x) {
                int32_t gBufferOffset = y * gBufferWidth + x;

                // Reconstruct position and (negative) view vector from G-buffer
                float surface_positionView_x, surface_positionView_y, surface_positionView_z;
                float Vneg_x, Vneg_y, Vneg_z;

                float z = inputData.zBuffer[gBufferOffset];

                // Compute screen/clip-space position
                // NOTE: Mind DX11 viewport transform and pixel center!
                float positionScreen_x = (0.5f + (float)(x)) *
                    twoOverGBufferWidth - 1.0f;

                // Unproject depth buffer Z value into view space
                surface_positionView_z = cameraProj_43 / (z - cameraProj_33);
                surface_positionView_x = positionScreen_x * surface_positionView_z /
                    cameraProj_11;
                surface_positionView_y = positionScreen_y * surface_positionView_z /
                    cameraProj_22;

                // We actually end up with a vector pointing *at* the
                // surface (i.e. the negative view vector)
                normalize3(surface_positionView_x, surface_positionView_y,
                           surface_positionView_z, Vneg_x, Vneg_y, Vneg_z);

                // Reconstruct normal from G-buffer
                float surface_normal_x, surface_normal_y, surface_normal_z;
                float normal_x = half_to_float_fast(inputData.normalEncoded_x[gBufferOffset]);
                float normal_y = half_to_float_fast(inputData.normalEncoded_y[gBufferOffset]);

                float f = (normal_x - normal_x * normal_x) + (normal_y - normal_y * normal_y);
                float m = sqrtf(4.0f * f - 1.0f);

                surface_normal_x = m * (4.0f * normal_x - 2.0f);
                surface_normal_y = m * (4.0f * normal_y - 2.0f);
                surface_normal_z = 3.0f - 8.0f * f;

                // Load other G-buffer parameters
                float surface_specularAmount =
                    half_to_float_fast(inputData.specularAmount[gBufferOffset]);
                float surface_specularPower =
                    half_to_float_fast(inputData.specularPower[gBufferOffset]);
                float surface_albedo_x = Unorm8ToFloat32(inputData.albedo_x[gBufferOffset]);
                float surface_albedo_y = Unorm8ToFloat32(inputData.albedo_y[gBufferOffset]);
                float surface_albedo_z = Unorm8ToFloat32(inputData.albedo_z[gBufferOffset]);

                float lit_x = 0.0f;
                float lit_y = 0.0f;
                float lit_z = 0.0f;
                for (int32_t tileLightIndex = 0; tileLightIndex < tileNumLights;
                     ++tileLightIndex) {
                    int32_t lightIndex = tileLightIndices[tileLightIndex];

                    // Gather light data relevant to initial culling
                    float light_positionView_x =
                        inputData.lightPositionView_x[lightIndex];
                    float light_positionView_y =
                        inputData.lightPositionView_y[lightIndex];
                    float light_positionView_z =
                        inputData.lightPositionView_z[lightIndex];
                    float light_attenuationEnd =
                        inputData.lightAttenuationEnd[lightIndex];

                    // Compute light vector
                    float L_x = light_positionView_x - surface_positionView_x;
                    float L_y = light_positionView_y - surface_positionView_y;
                    float L_z = light_positionView_z - surface_positionView_z;

                    float distanceToLight2 = dot3(L_x, L_y, L_z, L_x, L_y, L_z);

                    // Clip at end of attenuation
                    float light_attenutaionEnd2 = light_attenuationEnd * light_attenuationEnd;

                    if (distanceToLight2 < light_attenutaionEnd2) {
                        float distanceToLight = sqrtf(distanceToLight2);

                        float distanceToLightRcp = 1.f / distanceToLight;
                        L_x *= distanceToLightRcp;
                        L_y *= distanceToLightRcp;
                        L_z *= distanceToLightRcp;

                        // Start computing brdf
                        float NdotL = dot3(surface_normal_x, surface_normal_y,
                                           surface_normal_z, L_x, L_y, L_z);

                        // Clip back facing
                        if (NdotL > 0.0f) {
                            float light_attenuationBegin =
                                inputData.lightAttenuationBegin[lightIndex];

                            // Light distance attenuation (linstep)
                            float lightRange = (light_attenuationEnd - light_attenuationBegin);
                            float falloffPosition = (light_attenuationEnd - distanceToLight);
                            float attenuation = std::min(falloffPosition / lightRange, 1.0f);

                            float H_x = (L_x - Vneg_x);
                            float H_y = (L_y - Vneg_y);
                            float H_z = (L_z - Vneg_z);
                            normalize3(H_x, H_y, H_z, H_x, H_y, H_z);

                            float NdotH = dot3(surface_normal_x, surface_normal_y,
                                               surface_normal_z, H_x, H_y, H_z);
                            NdotH = std::max(NdotH, 0.0f);

                            float specular = powf(NdotH, surface_specularPower);
                            float specularNorm = (surface_specularPower + 2.0f) *
                                (1.0f / 8.0f);
                            float specularContrib = surface_specularAmount *
                                specularNorm * specular;

                            float k = attenuation * NdotL * (1.0f + specularContrib);

                            float light_color_x = inputData.lightColor_x[lightIndex];
                            float light_color_y = inputData.lightColor_y[lightIndex];
                            float light_color_z = inputData.lightColor_z[lightIndex];

                            float lightContrib_x = surface_albedo_x * light_color_x;
                            float lightContrib_y = surface_albedo_y * light_color_y;
                            float lightContrib_z = surface_albedo_z * light_color_z;

                            lit_x += lightContrib_x * k;
                            lit_y += lightContrib_y * k;
                            lit_z += lightContrib_z * k;
                        }
                    }
                }

                // Gamma correct
                float gamma = 1.0 / 2.2f;
                lit_x = powf(std::min(std::max(lit_x, 0.0f), 1.0f), gamma);
                lit_y = powf(std::min(std::max(lit_y, 0.0f), 1.0f), gamma);
                lit_z = powf(std::min(std::max(lit_z, 0.0f), 1.0f), gamma);

                framebuffer_r[gBufferOffset] = Float32ToUnorm8(lit_x);
                framebuffer_g[gBufferOffset] = Float32ToUnorm8(lit_y);
                framebuffer_b[gBufferOffset] = Float32ToUnorm8(lit_z);
            }
        }
    }
}


void
ShadeDynamicTileRecurse(InputData *input, int level, int tileX, int tileY,
                        int *lightIndices, int numLights,
                        Framebuffer *framebuffer) {
    const MinMaxZTree *minMaxZTree = gMinMaxZTree;

    // If we have few enough lights or this is the base case (last level), shade
    // this full tile directly
    if (level == 0 || numLights < DYNAMIC_MIN_LIGHTS_TO_SUBDIVIDE) {
        int width = minMaxZTree->TileWidth(level);
        int height = minMaxZTree->TileHeight(level);
        int startX = tileX * width;
        int startY = tileY * height;
        int endX = std::min(input->header.framebufferWidth, startX + width);
        int endY = std::min(input->header.framebufferHeight, startY + height);

        // Skip entirely offscreen tiles
        if (endX > startX && endY > startY) {
            ShadeTileC(startX, endX, startY, endY,
                       input->header.framebufferWidth, input->header.framebufferHeight,
                       input->arrays,
                       input->header.cameraProj[0][0], input->header.cameraProj[1][1],
                       input->header.cameraProj[2][2], input->header.cameraProj[3][2],
                       lightIndices, numLights, VISUALIZE_LIGHT_COUNT,
                       framebuffer->r, framebuffer->g, framebuffer->b);
        }
    }
    else {
        // Otherwise, subdivide and 4-way recurse using X and Y splitting planes
        // Move down a level in the tree
        --level;
        tileX <<= 1;
        tileY <<= 1;
        int width = minMaxZTree->TileWidth(level);
        int height = minMaxZTree->TileHeight(level);

        // Work out splitting coords
        int midX = (tileX + 1) * width;
        int midY = (tileY + 1) * height;

        // Read subtile min/max data
        // NOTE: We must be sure to handle out-of-bounds access here since
        // sometimes we'll only have 1 or 2 subtiles for non-pow-2
        // framebuffer sizes.
        bool rightTileExists = (tileX + 1 < minMaxZTree->NumTilesX(level));
        bool bottomTileExists = (tileY + 1 < minMaxZTree->NumTilesY(level));

        // NOTE: Order is 00, 10, 01, 11
        // Set defaults up to cull all lights if the tile doesn't exist (offscreen)
        float minZ[4] = {input->header.cameraFar, input->header.cameraFar,
                         input->header.cameraFar, input->header.cameraFar};
        float maxZ[4] = {input->header.cameraNear, input->header.cameraNear,
                         input->header.cameraNear, input->header.cameraNear};

        minZ[0] = minMaxZTree->MinZ(level, tileX, tileY);
        maxZ[0] = minMaxZTree->MaxZ(level, tileX, tileY);
        if (rightTileExists) {
            minZ[1] = minMaxZTree->MinZ(level, tileX + 1, tileY);
            maxZ[1] = minMaxZTree->MaxZ(level, tileX + 1, tileY);
            if (bottomTileExists) {
                minZ[3] = minMaxZTree->MinZ(level, tileX + 1, tileY + 1);
                maxZ[3] = minMaxZTree->MaxZ(level, tileX + 1, tileY + 1);
            }
        }
        if (bottomTileExists) {
            minZ[2] = minMaxZTree->MinZ(level, tileX, tileY + 1);
            maxZ[2] = minMaxZTree->MaxZ(level, tileX, tileY + 1);
        }

        // Cull lights into subtile lists
#ifdef ISPC_IS_WINDOWS
        __declspec(align(ALIGNMENT_BYTES))
#endif
        int subtileLightIndices[4][MAX_LIGHTS]
#ifndef ISPC_IS_WINDOWS
            __attribute__ ((aligned(ALIGNMENT_BYTES)))
#endif
            ;
        int subtileNumLights[4];
        SplitTileMinMax(midX, midY, minZ, maxZ,
            input->header.framebufferWidth, input->header.framebufferHeight,
            input->header.cameraProj[0][0], input->header.cameraProj[1][1],
            lightIndices, numLights, input->arrays.lightPositionView_x,
            input->arrays.lightPositionView_y, input->arrays.lightPositionView_z,
            input->arrays.lightAttenuationEnd,
            subtileLightIndices[0], MAX_LIGHTS, subtileNumLights);

        // Recurse into subtiles
        ShadeDynamicTileRecurse(input, level, tileX , tileY,
                                subtileLightIndices[0], subtileNumLights[0],
                                framebuffer);
        ShadeDynamicTileRecurse(input, level, tileX + 1, tileY,
                                subtileLightIndices[1], subtileNumLights[1],
                                framebuffer);
        ShadeDynamicTileRecurse(input, level, tileX , tileY + 1,
                                subtileLightIndices[2], subtileNumLights[2],
                                framebuffer);
        ShadeDynamicTileRecurse(input, level, tileX + 1, tileY + 1,
                                subtileLightIndices[3], subtileNumLights[3],
                                framebuffer);
    }
}


static int
IntersectLightsWithTileMinMax(
    int tileStartX, int tileEndX,
    int tileStartY, int tileEndY,
    // Tile data
    float minZ,
    float maxZ,
    // G-buffer data
    int gBufferWidth, int gBufferHeight,
    // Camera data
    float cameraProj_11, float cameraProj_22,
    // Light Data
    int numLights,
    float light_positionView_x_array[],
    float light_positionView_y_array[],
    float light_positionView_z_array[],
    float light_attenuationEnd_array[],
    // Output
    int tileLightIndices[]
    )
{
    float gBufferScale_x = 0.5f * (float)gBufferWidth;
    float gBufferScale_y = 0.5f * (float)gBufferHeight;

    float frustumPlanes_xy[4];
    float frustumPlanes_z[4];

    // This one is totally constant over the whole screen... worth pulling it up at all?
    float frustumPlanes_xy_v[4] = { -(cameraProj_11 * gBufferScale_x),
                                    (cameraProj_11 * gBufferScale_x),
                                    (cameraProj_22 * gBufferScale_y),
                                    -(cameraProj_22 * gBufferScale_y) };

    float frustumPlanes_z_v[4] = { tileEndX - gBufferScale_x,
                                   -tileStartX + gBufferScale_x,
                                   tileEndY - gBufferScale_y,
                                   -tileStartY + gBufferScale_y };

    for (int i = 0; i < 4; ++i) {
        float norm = 1.f / sqrtf(frustumPlanes_xy_v[i] * frustumPlanes_xy_v[i] +
                                 frustumPlanes_z_v[i] * frustumPlanes_z_v[i]);
        frustumPlanes_xy_v[i] *= norm;
        frustumPlanes_z_v[i] *= norm;

        frustumPlanes_xy[i] = frustumPlanes_xy_v[i];
        frustumPlanes_z[i] = frustumPlanes_z_v[i];
    }

    int tileNumLights = 0;

    for (int lightIndex = 0; lightIndex < numLights; ++lightIndex) {
        float light_positionView_z = light_positionView_z_array[lightIndex];
        float light_attenuationEnd = light_attenuationEnd_array[lightIndex];
        float light_attenuationEndNeg = -light_attenuationEnd;

        float d = light_positionView_z - minZ;
        bool inFrustum = (d >= light_attenuationEndNeg);

        d = maxZ - light_positionView_z;
        inFrustum = inFrustum && (d >= light_attenuationEndNeg);

        if (!inFrustum)
            continue;

        float light_positionView_x = light_positionView_x_array[lightIndex];
        float light_positionView_y = light_positionView_y_array[lightIndex];

        d = light_positionView_z * frustumPlanes_z[0] +
            light_positionView_x * frustumPlanes_xy[0];
        inFrustum = inFrustum && (d >= light_attenuationEndNeg);

        d = light_positionView_z * frustumPlanes_z[1] +
            light_positionView_x * frustumPlanes_xy[1];
        inFrustum = inFrustum && (d >= light_attenuationEndNeg);

        d = light_positionView_z * frustumPlanes_z[2] +
            light_positionView_y * frustumPlanes_xy[2];
        inFrustum = inFrustum && (d >= light_attenuationEndNeg);

        d = light_positionView_z * frustumPlanes_z[3] +
            light_positionView_y * frustumPlanes_xy[3];
        inFrustum = inFrustum && (d >= light_attenuationEndNeg);

        // Pack and store intersecting lights
        if (inFrustum)
            tileLightIndices[tileNumLights++] = lightIndex;
    }

    return tileNumLights;
}


void
ShadeDynamicTile(InputData *input, int level, int tileX, int tileY,
                 Framebuffer *framebuffer) {
    const MinMaxZTree *minMaxZTree = gMinMaxZTree;

    // Get Z min/max for this tile
    int width = minMaxZTree->TileWidth(level);
    int height = minMaxZTree->TileHeight(level);
    float minZ = minMaxZTree->MinZ(level, tileX, tileY);
    float maxZ = minMaxZTree->MaxZ(level, tileX, tileY);

    int startX = tileX * width;
    int startY = tileY * height;
    int endX = std::min(input->header.framebufferWidth, startX + width);
    int endY = std::min(input->header.framebufferHeight, startY + height);

    // This is a root tile, so first do a full 6-plane cull
#ifdef ISPC_IS_WINDOWS
    __declspec(align(ALIGNMENT_BYTES))
#endif
    int lightIndices[MAX_LIGHTS]
#ifndef ISPC_IS_WINDOWS
        __attribute__ ((aligned(ALIGNMENT_BYTES)))
#endif
        ;
    int numLights = IntersectLightsWithTileMinMax(
        startX, endX, startY, endY, minZ, maxZ,
        input->header.framebufferWidth, input->header.framebufferHeight,
        input->header.cameraProj[0][0], input->header.cameraProj[1][1],
        MAX_LIGHTS, input->arrays.lightPositionView_x,
        input->arrays.lightPositionView_y, input->arrays.lightPositionView_z,
        input->arrays.lightAttenuationEnd, lightIndices);

    // Now kick off the recursive process for this tile
    ShadeDynamicTileRecurse(input, level, tileX, tileY, lightIndices,
                            numLights, framebuffer);
}


void
DispatchDynamicC(InputData *input, Framebuffer *framebuffer)
{
    MinMaxZTree *minMaxZTree = gMinMaxZTree;

    // Update min/max Z tree
    minMaxZTree->Update(input->arrays.zBuffer, input->header.framebufferWidth,
                        input->header.cameraProj[2][2], input->header.cameraProj[3][2],
                        input->header.cameraNear, input->header.cameraFar);

    int rootLevel = minMaxZTree->Levels() - 1;
    int rootTilesX = minMaxZTree->NumTilesX(rootLevel);
    int rootTilesY = minMaxZTree->NumTilesY(rootLevel);
    int rootTiles = rootTilesX * rootTilesY;
    for (int g = 0; g < rootTiles; ++g) {
        uint32_t tileY = g / rootTilesX;
        uint32_t tileX = g % rootTilesX;
        ShadeDynamicTile(input, rootLevel, tileX, tileY, framebuffer);
    }
}
398
examples_cuda/deferred/dynamic_cilk.cpp
Normal file
@@ -0,0 +1,398 @@
/*
  Copyright (c) 2011, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
   IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
   TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
   PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef __cilk

#include "deferred.h"
#include "kernels_ispc.h"
#include <algorithm>
#include <assert.h>

#ifdef _MSC_VER
#define ISPC_IS_WINDOWS
#elif defined(__linux__)
#define ISPC_IS_LINUX
#elif defined(__APPLE__)
#define ISPC_IS_APPLE
#endif

#ifdef ISPC_IS_LINUX
#include <malloc.h>
#endif // ISPC_IS_LINUX

// Currently tile widths must be a multiple of SIMD width (i.e. 8 for ispc sse4x2)!
#define MIN_TILE_WIDTH 16
#define MIN_TILE_HEIGHT 16


#define DYNAMIC_TREE_LEVELS 5
// If this is set to 1 then the result will be identical to the static version
#define DYNAMIC_MIN_LIGHTS_TO_SUBDIVIDE 1

static void *
lAlignedMalloc(size_t size, int32_t alignment) {
#ifdef ISPC_IS_WINDOWS
    return _aligned_malloc(size, alignment);
#endif
#ifdef ISPC_IS_LINUX
    return memalign(alignment, size);
#endif
#ifdef ISPC_IS_APPLE
    void *mem = malloc(size + (alignment-1) + sizeof(void*));
    char *amem = ((char*)mem) + sizeof(void*);
    amem = amem + uint32_t(alignment - (reinterpret_cast<uint64_t>(amem) &
                                        (alignment - 1)));
    ((void**)amem)[-1] = mem;
    return amem;
#endif
}


static void
lAlignedFree(void *ptr) {
#ifdef ISPC_IS_WINDOWS
    _aligned_free(ptr);
#endif
#ifdef ISPC_IS_LINUX
    free(ptr);
#endif
#ifdef ISPC_IS_APPLE
    free(((void**)ptr)[-1]);
#endif
}


class MinMaxZTreeCilk
{
public:
    // Currently (min) tile dimensions must divide gBuffer dimensions evenly
    // Levels must be small enough that neither dimension goes below one tile
    MinMaxZTreeCilk(
        int tileWidth, int tileHeight, int levels,
        int gBufferWidth, int gBufferHeight)
        : mTileWidth(tileWidth), mTileHeight(tileHeight), mLevels(levels)
    {
        mNumTilesX = gBufferWidth / mTileWidth;
        mNumTilesY = gBufferHeight / mTileHeight;

        // Allocate arrays
        mMinZArrays = (float **)lAlignedMalloc(sizeof(float *) * mLevels, 16);
        mMaxZArrays = (float **)lAlignedMalloc(sizeof(float *) * mLevels, 16);
        for (int i = 0; i < mLevels; ++i) {
            int x = NumTilesX(i);
            int y = NumTilesY(i);
            assert(x > 0);
            assert(y > 0);
            // NOTE: If the following two asserts fire it probably means that
            // the base tile dimensions do not evenly divide the G-buffer dimensions
            assert(x * (mTileWidth << i) >= gBufferWidth);
            assert(y * (mTileHeight << i) >= gBufferHeight);
            mMinZArrays[i] = (float *)lAlignedMalloc(sizeof(float) * x * y, 16);
            mMaxZArrays[i] = (float *)lAlignedMalloc(sizeof(float) * x * y, 16);
        }
    }

    void Update(float *zBuffer, int gBufferPitchInElements,
                float cameraProj_33, float cameraProj_43,
                float cameraNear, float cameraFar)
    {
        // Compute level 0 in parallel. The outer loop is here since we use Cilk
        _Cilk_for (int tileY = 0; tileY < mNumTilesY; ++tileY) {
            ispc::ComputeZBoundsRow(tileY,
                mTileWidth, mTileHeight, mNumTilesX, mNumTilesY,
                zBuffer, gBufferPitchInElements,
                cameraProj_33, cameraProj_43, cameraNear, cameraFar,
                mMinZArrays[0] + (tileY * mNumTilesX),
                mMaxZArrays[0] + (tileY * mNumTilesX));
        }

        // Generate other levels
        // NOTE: We currently don't use ispc here since it's sort of an
        // awkward gather-based reduction. Using SSE odd pack/unpack
        // instructions might actually work here when we need to optimize
        for (int level = 1; level < mLevels; ++level) {
            int destTilesX = NumTilesX(level);
            int destTilesY = NumTilesY(level);
            int srcLevel = level - 1;
            int srcTilesX = NumTilesX(srcLevel);
            int srcTilesY = NumTilesY(srcLevel);
            _Cilk_for (int y = 0; y < destTilesY; ++y) {
                for (int x = 0; x < destTilesX; ++x) {
                    int srcX = x << 1;
                    int srcY = y << 1;
                    // NOTE: Ugly branches to deal with non-multiple dimensions at some levels
                    // TODO: SSE branchless min/max is probably better...
                    float minZ = mMinZArrays[srcLevel][(srcY) * srcTilesX + (srcX)];
                    float maxZ = mMaxZArrays[srcLevel][(srcY) * srcTilesX + (srcX)];
                    if (srcX + 1 < srcTilesX) {
                        minZ = std::min(minZ, mMinZArrays[srcLevel][(srcY) * srcTilesX +
                                                                    (srcX + 1)]);
                        maxZ = std::max(maxZ, mMaxZArrays[srcLevel][(srcY) * srcTilesX +
                                                                    (srcX + 1)]);
                        if (srcY + 1 < srcTilesY) {
                            minZ = std::min(minZ, mMinZArrays[srcLevel][(srcY + 1) * srcTilesX +
                                                                        (srcX + 1)]);
                            maxZ = std::max(maxZ, mMaxZArrays[srcLevel][(srcY + 1) * srcTilesX +
                                                                        (srcX + 1)]);
                        }
                    }
                    if (srcY + 1 < srcTilesY) {
                        minZ = std::min(minZ, mMinZArrays[srcLevel][(srcY + 1) * srcTilesX +
                                                                    (srcX )]);
                        maxZ = std::max(maxZ, mMaxZArrays[srcLevel][(srcY + 1) * srcTilesX +
                                                                    (srcX )]);
                    }
                    mMinZArrays[level][y * destTilesX + x] = minZ;
                    mMaxZArrays[level][y * destTilesX + x] = maxZ;
                }
            }
        }
    }

    ~MinMaxZTreeCilk() {
        for (int i = 0; i < mLevels; ++i) {
            lAlignedFree(mMinZArrays[i]);
            lAlignedFree(mMaxZArrays[i]);
        }
        lAlignedFree(mMinZArrays);
        lAlignedFree(mMaxZArrays);
    }

    int Levels() const { return mLevels; }

    // These round UP, so beware that the last tile for a given level may not be completely full
    // TODO: Verify this...
    int NumTilesX(int level = 0) const { return (mNumTilesX + (1 << level) - 1) >> level; }
    int NumTilesY(int level = 0) const { return (mNumTilesY + (1 << level) - 1) >> level; }
    int TileWidth(int level = 0) const { return (mTileWidth << level); }
    int TileHeight(int level = 0) const { return (mTileHeight << level); }

    float MinZ(int level, int tileX, int tileY) const {
        return mMinZArrays[level][tileY * NumTilesX(level) + tileX];
    }
    float MaxZ(int level, int tileX, int tileY) const {
        return mMaxZArrays[level][tileY * NumTilesX(level) + tileX];
    }

private:
    int mTileWidth;
    int mTileHeight;
    int mLevels;
    int mNumTilesX;
    int mNumTilesY;

    // One array for each "level" in the tree
    float **mMinZArrays;
    float **mMaxZArrays;
};

static MinMaxZTreeCilk *gMinMaxZTreeCilk = 0;

void InitDynamicCilk(InputData *input) {
    gMinMaxZTreeCilk =
        new MinMaxZTreeCilk(MIN_TILE_WIDTH, MIN_TILE_HEIGHT, DYNAMIC_TREE_LEVELS,
                            input->header.framebufferWidth,
                            input->header.framebufferHeight);
}


static void
ShadeDynamicTileRecurse(InputData *input, int level, int tileX, int tileY,
                        int *lightIndices, int numLights,
                        Framebuffer *framebuffer) {
    const MinMaxZTreeCilk *minMaxZTree = gMinMaxZTreeCilk;

    // If we have few enough lights or this is the base case (last level), shade
    // this full tile directly
    if (level == 0 || numLights < DYNAMIC_MIN_LIGHTS_TO_SUBDIVIDE) {
        int width = minMaxZTree->TileWidth(level);
        int height = minMaxZTree->TileHeight(level);
        int startX = tileX * width;
        int startY = tileY * height;
        int endX = std::min(input->header.framebufferWidth, startX + width);
        int endY = std::min(input->header.framebufferHeight, startY + height);

        // Skip entirely offscreen tiles
        if (endX > startX && endY > startY) {
            ispc::ShadeTile(
                startX, endX, startY, endY,
                input->header.framebufferWidth, input->header.framebufferHeight,
                &input->arrays,
                input->header.cameraProj[0][0], input->header.cameraProj[1][1],
                input->header.cameraProj[2][2], input->header.cameraProj[3][2],
                lightIndices, numLights, VISUALIZE_LIGHT_COUNT,
                framebuffer->r, framebuffer->g, framebuffer->b);
        }
    }
else {
|
||||||
|
// Otherwise, subdivide and 4-way recurse using X and Y splitting planes
|
||||||
|
// Move down a level in the tree
|
||||||
|
--level;
|
||||||
|
tileX <<= 1;
|
||||||
|
tileY <<= 1;
|
||||||
|
int width = minMaxZTree->TileWidth(level);
|
||||||
|
int height = minMaxZTree->TileHeight(level);
|
||||||
|
|
||||||
|
// Work out splitting coords
|
||||||
|
int midX = (tileX + 1) * width;
|
||||||
|
int midY = (tileY + 1) * height;
|
||||||
|
|
||||||
|
// Read subtile min/max data
|
||||||
|
// NOTE: We must be sure to handle out-of-bounds access here since
|
||||||
|
// sometimes we'll only have 1 or 2 subtiles for non-pow-2
|
||||||
|
// framebuffer sizes.
|
||||||
|
bool rightTileExists = (tileX + 1 < minMaxZTree->NumTilesX(level));
|
||||||
|
bool bottomTileExists = (tileY + 1 < minMaxZTree->NumTilesY(level));
|
||||||
|
|
||||||
|
// NOTE: Order is 00, 10, 01, 11
|
||||||
|
// Set defaults up to cull all lights if the tile doesn't exist (offscreen)
|
||||||
|
float minZ[4] = {input->header.cameraFar, input->header.cameraFar,
|
||||||
|
input->header.cameraFar, input->header.cameraFar};
|
||||||
|
float maxZ[4] = {input->header.cameraNear, input->header.cameraNear,
|
||||||
|
input->header.cameraNear, input->header.cameraNear};
|
||||||
|
|
||||||
|
minZ[0] = minMaxZTree->MinZ(level, tileX, tileY);
|
||||||
|
maxZ[0] = minMaxZTree->MaxZ(level, tileX, tileY);
|
||||||
|
if (rightTileExists) {
|
||||||
|
minZ[1] = minMaxZTree->MinZ(level, tileX + 1, tileY);
|
||||||
|
maxZ[1] = minMaxZTree->MaxZ(level, tileX + 1, tileY);
|
||||||
|
if (bottomTileExists) {
|
||||||
|
minZ[3] = minMaxZTree->MinZ(level, tileX + 1, tileY + 1);
|
||||||
|
maxZ[3] = minMaxZTree->MaxZ(level, tileX + 1, tileY + 1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (bottomTileExists) {
|
||||||
|
minZ[2] = minMaxZTree->MinZ(level, tileX, tileY + 1);
|
||||||
|
maxZ[2] = minMaxZTree->MaxZ(level, tileX, tileY + 1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cull lights into subtile lists
|
||||||
|
#ifdef ISPC_IS_WINDOWS
|
||||||
|
__declspec(align(ALIGNMENT_BYTES))
|
||||||
|
#endif
|
||||||
|
int subtileLightIndices[4][MAX_LIGHTS]
|
||||||
|
#ifndef ISPC_IS_WINDOWS
|
||||||
|
__attribute__ ((aligned(ALIGNMENT_BYTES)))
|
||||||
|
#endif
|
||||||
|
;
|
||||||
|
int subtileNumLights[4];
|
||||||
|
ispc::SplitTileMinMax(midX, midY, minZ, maxZ,
|
||||||
|
input->header.framebufferWidth, input->header.framebufferHeight,
|
||||||
|
input->header.cameraProj[0][0], input->header.cameraProj[1][1],
|
||||||
|
lightIndices, numLights, input->arrays.lightPositionView_x,
|
||||||
|
input->arrays.lightPositionView_y, input->arrays.lightPositionView_z,
|
||||||
|
input->arrays.lightAttenuationEnd,
|
||||||
|
subtileLightIndices[0], MAX_LIGHTS, subtileNumLights);
|
||||||
|
|
||||||
|
// Recurse into subtiles
|
||||||
|
_Cilk_spawn ShadeDynamicTileRecurse(input, level, tileX , tileY,
|
||||||
|
subtileLightIndices[0], subtileNumLights[0],
|
||||||
|
framebuffer);
|
||||||
|
_Cilk_spawn ShadeDynamicTileRecurse(input, level, tileX + 1, tileY,
|
||||||
|
subtileLightIndices[1], subtileNumLights[1],
|
||||||
|
framebuffer);
|
||||||
|
_Cilk_spawn ShadeDynamicTileRecurse(input, level, tileX , tileY + 1,
|
||||||
|
subtileLightIndices[2], subtileNumLights[2],
|
||||||
|
framebuffer);
|
||||||
|
ShadeDynamicTileRecurse(input, level, tileX + 1, tileY + 1,
|
||||||
|
subtileLightIndices[3], subtileNumLights[3],
|
||||||
|
framebuffer);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static void
|
||||||
|
ShadeDynamicTile(InputData *input, int level, int tileX, int tileY,
|
||||||
|
Framebuffer *framebuffer) {
|
||||||
|
const MinMaxZTreeCilk *minMaxZTree = gMinMaxZTreeCilk;
|
||||||
|
|
||||||
|
// Get Z min/max for this tile
|
||||||
|
int width = minMaxZTree->TileWidth(level);
|
||||||
|
int height = minMaxZTree->TileHeight(level);
|
||||||
|
float minZ = minMaxZTree->MinZ(level, tileX, tileY);
|
||||||
|
float maxZ = minMaxZTree->MaxZ(level, tileX, tileY);
|
||||||
|
|
||||||
|
int startX = tileX * width;
|
||||||
|
int startY = tileY * height;
|
||||||
|
int endX = std::min(input->header.framebufferWidth, startX + width);
|
||||||
|
int endY = std::min(input->header.framebufferHeight, startY + height);
|
||||||
|
|
||||||
|
// This is a root tile, so first do a full 6-plane cull
|
||||||
|
#ifdef ISPC_IS_WINDOWS
|
||||||
|
__declspec(align(ALIGNMENT_BYTES))
|
||||||
|
#endif
|
||||||
|
int lightIndices[MAX_LIGHTS]
|
||||||
|
#ifndef ISPC_IS_WINDOWS
|
||||||
|
__attribute__ ((aligned(ALIGNMENT_BYTES)))
|
||||||
|
#endif
|
||||||
|
;
|
||||||
|
int numLights = ispc::IntersectLightsWithTileMinMax(
|
||||||
|
startX, endX, startY, endY, minZ, maxZ,
|
||||||
|
input->header.framebufferWidth, input->header.framebufferHeight,
|
||||||
|
input->header.cameraProj[0][0], input->header.cameraProj[1][1],
|
||||||
|
MAX_LIGHTS, input->arrays.lightPositionView_x,
|
||||||
|
input->arrays.lightPositionView_y, input->arrays.lightPositionView_z,
|
||||||
|
input->arrays.lightAttenuationEnd, lightIndices);
|
||||||
|
|
||||||
|
// Now kick off the recursive process for this tile
|
||||||
|
ShadeDynamicTileRecurse(input, level, tileX, tileY, lightIndices,
|
||||||
|
numLights, framebuffer);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
void
|
||||||
|
DispatchDynamicCilk(InputData *input, Framebuffer *framebuffer)
|
||||||
|
{
|
||||||
|
MinMaxZTreeCilk *minMaxZTree = gMinMaxZTreeCilk;
|
||||||
|
|
||||||
|
// Update min/max Z tree
|
||||||
|
minMaxZTree->Update(input->arrays.zBuffer, input->header.framebufferWidth,
|
||||||
|
input->header.cameraProj[2][2], input->header.cameraProj[3][2],
|
||||||
|
input->header.cameraNear, input->header.cameraFar);
|
||||||
|
|
||||||
|
// Launch the "root" tiles. Ideally these should at least fill the
|
||||||
|
// machine... at the moment we have a static number of "levels" to the
|
||||||
|
// mip tree but it might make sense to compute it based on the width of
|
||||||
|
// the machine.
|
||||||
|
int rootLevel = minMaxZTree->Levels() - 1;
|
||||||
|
int rootTilesX = minMaxZTree->NumTilesX(rootLevel);
|
||||||
|
int rootTilesY = minMaxZTree->NumTilesY(rootLevel);
|
||||||
|
int rootTiles = rootTilesX * rootTilesY;
|
||||||
|
_Cilk_for (int g = 0; g < rootTiles; ++g) {
|
||||||
|
uint32_t tileY = g / rootTilesX;
|
||||||
|
uint32_t tileX = g % rootTilesX;
|
||||||
|
ShadeDynamicTile(input, rootLevel, tileX, tileY, framebuffer);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif // __cilk
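// The tile arithmetic used by the recursive subdivision above can be sketched
// in isolation. This is a minimal illustration, not part of the example: the
// names TilePyramid, SplitX and SplitTileX are hypothetical, only the X axis
// is shown (Y is handled the same way), and the formulas mirror NumTilesX(),
// TileWidth() and the midX / rightTileExists computations in
// ShadeDynamicTileRecurse().

```cpp
#include <cassert>

// Hypothetical stand-in for the per-level tile bookkeeping.
struct TilePyramid {
    int tileWidth;   // width in pixels of a level-0 tile
    int numTilesX;   // number of level-0 tiles across the framebuffer

    // Tiles per row at `level`: ceil(numTilesX / 2^level).
    int NumTilesX(int level) const {
        assert(level >= 0);
        return (numTilesX + (1 << level) - 1) >> level;
    }

    // Tile width in pixels at `level`: doubles with every level.
    int TileWidth(int level) const {
        assert(level >= 0);
        return tileWidth << level;
    }
};

// Result of splitting one parent tile into its children one level down.
struct SplitX {
    int childTileX;    // index of the left child at (level - 1)
    int midX;          // pixel column of the X splitting plane
    bool rightExists;  // false when the right child falls off the tile grid
};

inline SplitX SplitTileX(const TilePyramid &p, int level, int tileX) {
    assert(level > 0);                    // level 0 is the base case, never split
    const int childLevel = level - 1;     // move down a level in the tree
    const int childTileX = tileX << 1;    // left child index
    SplitX s;
    s.childTileX = childTileX;
    s.midX = (childTileX + 1) * p.TileWidth(childLevel);
    s.rightExists = (childTileX + 1) < p.NumTilesX(childLevel);
    return s;
}
```

// With 5 level-0 tiles of 16 pixels, level 1 has 3 tiles, so the last level-1
// tile has only a left child; this is the case the out-of-bounds NOTE in
// ShadeDynamicTileRecurse() guards against.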
|
||||||
672
examples_cuda/deferred/kernels.ispc
Normal file
@@ -0,0 +1,672 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2010-2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#include "deferred.h"
|
||||||
|
|
||||||
|
struct InputDataArrays
|
||||||
|
{
|
||||||
|
float *zBuffer;
|
||||||
|
unsigned int16 *normalEncoded_x; // half float
|
||||||
|
unsigned int16 *normalEncoded_y; // half float
|
||||||
|
unsigned int16 *specularAmount; // half float
|
||||||
|
unsigned int16 *specularPower; // half float
|
||||||
|
unsigned int8 *albedo_x; // unorm8
|
||||||
|
unsigned int8 *albedo_y; // unorm8
|
||||||
|
unsigned int8 *albedo_z; // unorm8
|
||||||
|
float *lightPositionView_x;
|
||||||
|
float *lightPositionView_y;
|
||||||
|
float *lightPositionView_z;
|
||||||
|
float *lightAttenuationBegin;
|
||||||
|
float *lightColor_x;
|
||||||
|
float *lightColor_y;
|
||||||
|
float *lightColor_z;
|
||||||
|
float *lightAttenuationEnd;
|
||||||
|
};
|
||||||
|
|
||||||
|
struct InputHeader
|
||||||
|
{
|
||||||
|
float cameraProj[4][4];
|
||||||
|
float cameraNear;
|
||||||
|
float cameraFar;
|
||||||
|
|
||||||
|
int32 framebufferWidth;
|
||||||
|
int32 framebufferHeight;
|
||||||
|
int32 numLights;
|
||||||
|
int32 inputDataChunkSize;
|
||||||
|
int32 inputDataArrayOffsets[idaNum];
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
///////////////////////////////////////////////////////////////////////////
|
||||||
|
// Common utility routines
|
||||||
|
|
||||||
|
static inline float
|
||||||
|
dot3(float x, float y, float z, float a, float b, float c) {
|
||||||
|
return (x*a + y*b + z*c);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static inline void
|
||||||
|
normalize3(float x, float y, float z, float &ox, float &oy, float &oz) {
|
||||||
|
float n = rsqrt(x*x + y*y + z*z);
|
||||||
|
ox = x * n;
|
||||||
|
oy = y * n;
|
||||||
|
oz = z * n;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static inline float
|
||||||
|
Unorm8ToFloat32(unsigned int8 u) {
|
||||||
|
return (float)u * (1.0f / 255.0f);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static inline unsigned int8
|
||||||
|
Float32ToUnorm8(float f) {
|
||||||
|
return (unsigned int8)(f * 255.0f);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static void
|
||||||
|
ComputeZBounds(
|
||||||
|
uniform int32 tileStartX, uniform int32 tileEndX,
|
||||||
|
uniform int32 tileStartY, uniform int32 tileEndY,
|
||||||
|
// G-buffer data
|
||||||
|
uniform float zBuffer[],
|
||||||
|
uniform int32 gBufferWidth,
|
||||||
|
// Camera data
|
||||||
|
uniform float cameraProj_33, uniform float cameraProj_43,
|
||||||
|
uniform float cameraNear, uniform float cameraFar,
|
||||||
|
// Output
|
||||||
|
uniform float &minZ,
|
||||||
|
uniform float &maxZ
|
||||||
|
)
|
||||||
|
{
|
||||||
|
// Find Z bounds
|
||||||
|
float laneMinZ = cameraFar;
|
||||||
|
float laneMaxZ = cameraNear;
|
||||||
|
for (uniform int32 y = tileStartY; y < tileEndY; ++y) {
|
||||||
|
foreach (x = tileStartX ... tileEndX) {
|
||||||
|
// Unproject depth buffer Z value into view space
|
||||||
|
float z = zBuffer[y * gBufferWidth + x];
|
||||||
|
float viewSpaceZ = cameraProj_43 / (z - cameraProj_33);
|
||||||
|
|
||||||
|
// Work out Z bounds for our samples
|
||||||
|
// Avoid considering skybox/background or otherwise invalid pixels
|
||||||
|
if ((viewSpaceZ < cameraFar) && (viewSpaceZ >= cameraNear)) {
|
||||||
|
laneMinZ = min(laneMinZ, viewSpaceZ);
|
||||||
|
laneMaxZ = max(laneMaxZ, viewSpaceZ);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
minZ = reduce_min(laneMinZ);
|
||||||
|
maxZ = reduce_max(laneMaxZ);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
export uniform int32
|
||||||
|
IntersectLightsWithTileMinMax(
|
||||||
|
uniform int32 tileStartX, uniform int32 tileEndX,
|
||||||
|
uniform int32 tileStartY, uniform int32 tileEndY,
|
||||||
|
// Tile data
|
||||||
|
uniform float minZ,
|
||||||
|
uniform float maxZ,
|
||||||
|
// G-buffer data
|
||||||
|
uniform int32 gBufferWidth, uniform int32 gBufferHeight,
|
||||||
|
// Camera data
|
||||||
|
uniform float cameraProj_11, uniform float cameraProj_22,
|
||||||
|
// Light Data
|
||||||
|
uniform int32 numLights,
|
||||||
|
uniform float light_positionView_x_array[],
|
||||||
|
uniform float light_positionView_y_array[],
|
||||||
|
uniform float light_positionView_z_array[],
|
||||||
|
uniform float light_attenuationEnd_array[],
|
||||||
|
// Output
|
||||||
|
uniform int32 tileLightIndices[]
|
||||||
|
)
|
||||||
|
{
|
||||||
|
uniform float gBufferScale_x = 0.5f * (float)gBufferWidth;
|
||||||
|
uniform float gBufferScale_y = 0.5f * (float)gBufferHeight;
|
||||||
|
|
||||||
|
uniform float frustumPlanes_xy[4] = {
|
||||||
|
-(cameraProj_11 * gBufferScale_x),
|
||||||
|
(cameraProj_11 * gBufferScale_x),
|
||||||
|
(cameraProj_22 * gBufferScale_y),
|
||||||
|
-(cameraProj_22 * gBufferScale_y) };
|
||||||
|
uniform float frustumPlanes_z[4] = {
|
||||||
|
tileEndX - gBufferScale_x,
|
||||||
|
-tileStartX + gBufferScale_x,
|
||||||
|
tileEndY - gBufferScale_y,
|
||||||
|
-tileStartY + gBufferScale_y };
|
||||||
|
|
||||||
|
for (uniform int i = 0; i < 4; ++i) {
|
||||||
|
uniform float norm = rsqrt(frustumPlanes_xy[i] * frustumPlanes_xy[i] +
|
||||||
|
frustumPlanes_z[i] * frustumPlanes_z[i]);
|
||||||
|
frustumPlanes_xy[i] *= norm;
|
||||||
|
frustumPlanes_z[i] *= norm;
|
||||||
|
}
|
||||||
|
|
||||||
|
uniform int32 tileNumLights = 0;
|
||||||
|
|
||||||
|
foreach (lightIndex = 0 ... numLights) {
|
||||||
|
float light_positionView_z = light_positionView_z_array[lightIndex];
|
||||||
|
float light_attenuationEnd = light_attenuationEnd_array[lightIndex];
|
||||||
|
float light_attenuationEndNeg = -light_attenuationEnd;
|
||||||
|
|
||||||
|
float d = light_positionView_z - minZ;
|
||||||
|
bool inFrustum = (d >= light_attenuationEndNeg);
|
||||||
|
|
||||||
|
d = maxZ - light_positionView_z;
|
||||||
|
inFrustum = inFrustum && (d >= light_attenuationEndNeg);
|
||||||
|
|
||||||
|
// This seems better than cif(!inFrustum) ccontinue; here since we
|
||||||
|
// don't actually need to mask the rest of this function - this is
|
||||||
|
// just a greedy early-out. Could also structure all of this as
|
||||||
|
// nested if() statements, but this is a bit easier to read
|
||||||
|
if (any(inFrustum)) {
|
||||||
|
float light_positionView_x = light_positionView_x_array[lightIndex];
|
||||||
|
float light_positionView_y = light_positionView_y_array[lightIndex];
|
||||||
|
|
||||||
|
d = light_positionView_z * frustumPlanes_z[0] +
|
||||||
|
light_positionView_x * frustumPlanes_xy[0];
|
||||||
|
inFrustum = inFrustum && (d >= light_attenuationEndNeg);
|
||||||
|
|
||||||
|
d = light_positionView_z * frustumPlanes_z[1] +
|
||||||
|
light_positionView_x * frustumPlanes_xy[1];
|
||||||
|
inFrustum = inFrustum && (d >= light_attenuationEndNeg);
|
||||||
|
|
||||||
|
d = light_positionView_z * frustumPlanes_z[2] +
|
||||||
|
light_positionView_y * frustumPlanes_xy[2];
|
||||||
|
inFrustum = inFrustum && (d >= light_attenuationEndNeg);
|
||||||
|
|
||||||
|
d = light_positionView_z * frustumPlanes_z[3] +
|
||||||
|
light_positionView_y * frustumPlanes_xy[3];
|
||||||
|
inFrustum = inFrustum && (d >= light_attenuationEndNeg);
|
||||||
|
|
||||||
|
// Pack and store intersecting lights
|
||||||
|
cif (inFrustum) {
|
||||||
|
tileNumLights += packed_store_active(&tileLightIndices[tileNumLights],
|
||||||
|
lightIndex);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return tileNumLights;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static uniform int32
|
||||||
|
IntersectLightsWithTile(
|
||||||
|
uniform int32 tileStartX, uniform int32 tileEndX,
|
||||||
|
uniform int32 tileStartY, uniform int32 tileEndY,
|
||||||
|
uniform int32 gBufferWidth, uniform int32 gBufferHeight,
|
||||||
|
// G-buffer data
|
||||||
|
uniform float zBuffer[],
|
||||||
|
// Camera data
|
||||||
|
uniform float cameraProj_11, uniform float cameraProj_22,
|
||||||
|
uniform float cameraProj_33, uniform float cameraProj_43,
|
||||||
|
uniform float cameraNear, uniform float cameraFar,
|
||||||
|
// Light Data
|
||||||
|
uniform int32 numLights,
|
||||||
|
uniform float light_positionView_x_array[],
|
||||||
|
uniform float light_positionView_y_array[],
|
||||||
|
uniform float light_positionView_z_array[],
|
||||||
|
uniform float light_attenuationEnd_array[],
|
||||||
|
// Output
|
||||||
|
uniform int32 tileLightIndices[]
|
||||||
|
)
|
||||||
|
{
|
||||||
|
uniform float minZ, maxZ;
|
||||||
|
ComputeZBounds(tileStartX, tileEndX, tileStartY, tileEndY,
|
||||||
|
zBuffer, gBufferWidth, cameraProj_33, cameraProj_43, cameraNear, cameraFar,
|
||||||
|
minZ, maxZ);
|
||||||
|
|
||||||
|
uniform int32 tileNumLights = IntersectLightsWithTileMinMax(
|
||||||
|
tileStartX, tileEndX, tileStartY, tileEndY, minZ, maxZ,
|
||||||
|
gBufferWidth, gBufferHeight, cameraProj_11, cameraProj_22,
|
||||||
|
numLights, light_positionView_x_array, light_positionView_y_array,
|
||||||
|
light_positionView_z_array, light_attenuationEnd_array,
|
||||||
|
tileLightIndices);
|
||||||
|
|
||||||
|
return tileNumLights;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
export void
|
||||||
|
ShadeTile(
|
||||||
|
uniform int32 tileStartX, uniform int32 tileEndX,
|
||||||
|
uniform int32 tileStartY, uniform int32 tileEndY,
|
||||||
|
uniform int32 gBufferWidth, uniform int32 gBufferHeight,
|
||||||
|
uniform InputDataArrays &inputData,
|
||||||
|
// Camera data
|
||||||
|
uniform float cameraProj_11, uniform float cameraProj_22,
|
||||||
|
uniform float cameraProj_33, uniform float cameraProj_43,
|
||||||
|
// Light list
|
||||||
|
uniform int32 tileLightIndices[],
|
||||||
|
uniform int32 tileNumLights,
|
||||||
|
// UI
|
||||||
|
uniform bool visualizeLightCount,
|
||||||
|
// Output
|
||||||
|
uniform unsigned int8 framebuffer_r[],
|
||||||
|
uniform unsigned int8 framebuffer_g[],
|
||||||
|
uniform unsigned int8 framebuffer_b[]
|
||||||
|
)
|
||||||
|
{
|
||||||
|
if (tileNumLights == 0 || visualizeLightCount) {
|
||||||
|
uniform unsigned int8 c = (unsigned int8)(min(tileNumLights << 2, 255));
|
||||||
|
for (uniform int32 y = tileStartY; y < tileEndY; ++y) {
|
||||||
|
foreach (x = tileStartX ... tileEndX) {
|
||||||
|
int32 framebufferIndex = (y * gBufferWidth + x);
|
||||||
|
framebuffer_r[framebufferIndex] = c;
|
||||||
|
framebuffer_g[framebufferIndex] = c;
|
||||||
|
framebuffer_b[framebufferIndex] = c;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
uniform float twoOverGBufferWidth = 2.0f / gBufferWidth;
|
||||||
|
uniform float twoOverGBufferHeight = 2.0f / gBufferHeight;
|
||||||
|
|
||||||
|
for (uniform int32 y = tileStartY; y < tileEndY; ++y) {
|
||||||
|
uniform float positionScreen_y = -(((0.5f + y) * twoOverGBufferHeight) - 1.f);
|
||||||
|
|
||||||
|
foreach (x = tileStartX ... tileEndX) {
|
||||||
|
int32 gBufferOffset = y * gBufferWidth + x;
|
||||||
|
|
||||||
|
// Reconstruct position and (negative) view vector from G-buffer
|
||||||
|
float surface_positionView_x, surface_positionView_y, surface_positionView_z;
|
||||||
|
float Vneg_x, Vneg_y, Vneg_z;
|
||||||
|
|
||||||
|
float z = inputData.zBuffer[gBufferOffset];
|
||||||
|
|
||||||
|
// Compute screen/clip-space position
|
||||||
|
// NOTE: Mind DX11 viewport transform and pixel center!
|
||||||
|
float positionScreen_x = (0.5f + (float)(x)) *
|
||||||
|
twoOverGBufferWidth - 1.0f;
|
||||||
|
|
||||||
|
// Unproject depth buffer Z value into view space
|
||||||
|
surface_positionView_z = cameraProj_43 / (z - cameraProj_33);
|
||||||
|
surface_positionView_x = positionScreen_x * surface_positionView_z /
|
||||||
|
cameraProj_11;
|
||||||
|
surface_positionView_y = positionScreen_y * surface_positionView_z /
|
||||||
|
cameraProj_22;
|
||||||
|
|
||||||
|
// We actually end up with a vector pointing *at* the
|
||||||
|
// surface (i.e. the negative view vector)
|
||||||
|
normalize3(surface_positionView_x, surface_positionView_y,
|
||||||
|
surface_positionView_z, Vneg_x, Vneg_y, Vneg_z);
|
||||||
|
|
||||||
|
// Reconstruct normal from G-buffer
|
||||||
|
float surface_normal_x, surface_normal_y, surface_normal_z;
|
||||||
|
float normal_x = half_to_float(inputData.normalEncoded_x[gBufferOffset]);
|
||||||
|
float normal_y = half_to_float(inputData.normalEncoded_y[gBufferOffset]);
|
||||||
|
|
||||||
|
float f = (normal_x - normal_x * normal_x) + (normal_y - normal_y * normal_y);
|
||||||
|
float m = sqrt(4.0f * f - 1.0f);
|
||||||
|
|
||||||
|
surface_normal_x = m * (4.0f * normal_x - 2.0f);
|
||||||
|
surface_normal_y = m * (4.0f * normal_y - 2.0f);
|
||||||
|
surface_normal_z = 3.0f - 8.0f * f;
|
||||||
|
|
||||||
|
// Load other G-buffer parameters
|
||||||
|
float surface_specularAmount =
|
||||||
|
half_to_float(inputData.specularAmount[gBufferOffset]);
|
||||||
|
float surface_specularPower =
|
||||||
|
half_to_float(inputData.specularPower[gBufferOffset]);
|
||||||
|
float surface_albedo_x = Unorm8ToFloat32(inputData.albedo_x[gBufferOffset]);
|
||||||
|
float surface_albedo_y = Unorm8ToFloat32(inputData.albedo_y[gBufferOffset]);
|
||||||
|
float surface_albedo_z = Unorm8ToFloat32(inputData.albedo_z[gBufferOffset]);
|
||||||
|
|
||||||
|
float lit_x = 0.0f;
|
||||||
|
float lit_y = 0.0f;
|
||||||
|
float lit_z = 0.0f;
|
||||||
|
for (uniform int32 tileLightIndex = 0; tileLightIndex < tileNumLights;
|
||||||
|
++tileLightIndex) {
|
||||||
|
uniform int32 lightIndex = tileLightIndices[tileLightIndex];
|
||||||
|
|
||||||
|
// Gather light data relevant to initial culling
|
||||||
|
uniform float light_positionView_x =
|
||||||
|
inputData.lightPositionView_x[lightIndex];
|
||||||
|
uniform float light_positionView_y =
|
||||||
|
inputData.lightPositionView_y[lightIndex];
|
||||||
|
uniform float light_positionView_z =
|
||||||
|
inputData.lightPositionView_z[lightIndex];
|
||||||
|
uniform float light_attenuationEnd =
|
||||||
|
inputData.lightAttenuationEnd[lightIndex];
|
||||||
|
|
||||||
|
// Compute light vector
|
||||||
|
float L_x = light_positionView_x - surface_positionView_x;
|
||||||
|
float L_y = light_positionView_y - surface_positionView_y;
|
||||||
|
float L_z = light_positionView_z - surface_positionView_z;
|
||||||
|
|
||||||
|
float distanceToLight2 = dot3(L_x, L_y, L_z, L_x, L_y, L_z);
|
||||||
|
|
||||||
|
// Clip at end of attenuation
|
||||||
|
float light_attenutaionEnd2 = light_attenuationEnd * light_attenuationEnd;
|
||||||
|
|
||||||
|
cif (distanceToLight2 < light_attenutaionEnd2) {
|
||||||
|
float distanceToLight = sqrt(distanceToLight2);
|
||||||
|
|
||||||
|
// HLSL "rcp" is allowed to be fairly inaccurate
|
||||||
|
float distanceToLightRcp = rcp(distanceToLight);
|
||||||
|
L_x *= distanceToLightRcp;
|
||||||
|
L_y *= distanceToLightRcp;
|
||||||
|
L_z *= distanceToLightRcp;
|
||||||
|
|
||||||
|
// Start computing brdf
|
||||||
|
float NdotL = dot3(surface_normal_x, surface_normal_y,
|
||||||
|
surface_normal_z, L_x, L_y, L_z);
|
||||||
|
|
||||||
|
// Clip back facing
|
||||||
|
cif (NdotL > 0.0f) {
|
||||||
|
uniform float light_attenuationBegin =
|
||||||
|
inputData.lightAttenuationBegin[lightIndex];
|
||||||
|
|
||||||
|
// Light distance attenuation (linstep)
|
||||||
|
float lightRange = (light_attenuationEnd - light_attenuationBegin);
|
||||||
|
float falloffPosition = (light_attenuationEnd - distanceToLight);
|
||||||
|
float attenuation = min(falloffPosition / lightRange, 1.0f);
|
||||||
|
|
||||||
|
float H_x = (L_x - Vneg_x);
|
||||||
|
float H_y = (L_y - Vneg_y);
|
||||||
|
float H_z = (L_z - Vneg_z);
|
||||||
|
normalize3(H_x, H_y, H_z, H_x, H_y, H_z);
|
||||||
|
|
||||||
|
float NdotH = dot3(surface_normal_x, surface_normal_y,
|
||||||
|
surface_normal_z, H_x, H_y, H_z);
|
||||||
|
NdotH = max(NdotH, 0.0f);
|
||||||
|
|
||||||
|
float specular = pow(NdotH, surface_specularPower);
|
||||||
|
float specularNorm = (surface_specularPower + 2.0f) *
|
||||||
|
(1.0f / 8.0f);
|
||||||
|
float specularContrib = surface_specularAmount *
|
||||||
|
specularNorm * specular;
|
||||||
|
|
||||||
|
float k = attenuation * NdotL * (1.0f + specularContrib);
|
||||||
|
|
||||||
|
uniform float light_color_x = inputData.lightColor_x[lightIndex];
|
||||||
|
uniform float light_color_y = inputData.lightColor_y[lightIndex];
|
||||||
|
uniform float light_color_z = inputData.lightColor_z[lightIndex];
|
||||||
|
|
||||||
|
float lightContrib_x = surface_albedo_x * light_color_x;
|
||||||
|
float lightContrib_y = surface_albedo_y * light_color_y;
|
||||||
|
float lightContrib_z = surface_albedo_z * light_color_z;
|
||||||
|
|
||||||
|
lit_x += lightContrib_x * k;
|
||||||
|
lit_y += lightContrib_y * k;
|
||||||
|
lit_z += lightContrib_z * k;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Gamma correct
|
||||||
|
// These pows are pretty slow right now, but we can do
|
||||||
|
// something faster if really necessary to squeeze every
|
||||||
|
// last bit of performance out of it
|
||||||
|
float gamma = 1.0f / 2.2f;
|
||||||
|
lit_x = pow(clamp(lit_x, 0.0f, 1.0f), gamma);
|
||||||
|
lit_y = pow(clamp(lit_y, 0.0f, 1.0f), gamma);
|
||||||
|
lit_z = pow(clamp(lit_z, 0.0f, 1.0f), gamma);
|
||||||
|
|
||||||
|
framebuffer_r[gBufferOffset] = Float32ToUnorm8(lit_x);
|
||||||
|
framebuffer_g[gBufferOffset] = Float32ToUnorm8(lit_y);
|
||||||
|
framebuffer_b[gBufferOffset] = Float32ToUnorm8(lit_z);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
///////////////////////////////////////////////////////////////////////////
|
||||||
|
// Static decomposition
|
||||||
|
|
||||||
|
task void
|
||||||
|
RenderTile(uniform int num_groups_x, uniform int num_groups_y,
|
||||||
|
uniform InputHeader &inputHeader,
|
||||||
|
uniform InputDataArrays &inputData,
|
||||||
|
uniform int visualizeLightCount,
|
||||||
|
// Output
|
||||||
|
uniform unsigned int8 framebuffer_r[],
|
||||||
|
uniform unsigned int8 framebuffer_g[],
|
||||||
|
uniform unsigned int8 framebuffer_b[]) {
|
||||||
|
uniform int32 group_y = taskIndex / num_groups_x;
|
||||||
|
uniform int32 group_x = taskIndex % num_groups_x;
|
||||||
|
uniform int32 tile_start_x = group_x * MIN_TILE_WIDTH;
|
||||||
|
uniform int32 tile_start_y = group_y * MIN_TILE_HEIGHT;
|
||||||
|
uniform int32 tile_end_x = min(tile_start_x + MIN_TILE_WIDTH, inputHeader.framebufferWidth);
|
||||||
|
uniform int32 tile_end_y = min(tile_start_y + MIN_TILE_HEIGHT, inputHeader.framebufferHeight);
|
||||||
|
|
||||||
|
uniform int framebufferWidth = inputHeader.framebufferWidth;
|
||||||
|
uniform int framebufferHeight = inputHeader.framebufferHeight;
|
||||||
|
uniform float cameraProj_00 = inputHeader.cameraProj[0][0];
|
||||||
|
uniform float cameraProj_11 = inputHeader.cameraProj[1][1];
|
||||||
|
uniform float cameraProj_22 = inputHeader.cameraProj[2][2];
|
||||||
|
uniform float cameraProj_32 = inputHeader.cameraProj[3][2];
|
||||||
|
|
||||||
|
// Light intersection: figure out which lights illuminate this tile.
|
||||||
|
uniform int tileLightIndices[MAX_LIGHTS]; // Light list for the tile
|
||||||
|
uniform int numTileLights =
|
||||||
|
IntersectLightsWithTile(tile_start_x, tile_end_x,
|
||||||
|
tile_start_y, tile_end_y,
|
||||||
|
framebufferWidth, framebufferHeight,
|
||||||
|
inputData.zBuffer,
|
||||||
|
cameraProj_00, cameraProj_11,
|
||||||
|
cameraProj_22, cameraProj_32,
|
||||||
|
inputHeader.cameraNear, inputHeader.cameraFar,
|
||||||
|
MAX_LIGHTS,
|
||||||
|
inputData.lightPositionView_x,
|
||||||
|
inputData.lightPositionView_y,
|
||||||
|
inputData.lightPositionView_z,
|
||||||
|
inputData.lightAttenuationEnd,
|
||||||
|
tileLightIndices);
|
||||||
|
|
||||||
|
// And now shade the tile, using the lights in tileLightIndices
|
||||||
|
ShadeTile(tile_start_x, tile_end_x, tile_start_y, tile_end_y,
|
||||||
|
framebufferWidth, framebufferHeight, inputData,
|
||||||
|
cameraProj_00, cameraProj_11, cameraProj_22, cameraProj_32,
|
||||||
|
tileLightIndices, numTileLights, visualizeLightCount,
|
||||||
|
framebuffer_r, framebuffer_g, framebuffer_b);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
export void
|
||||||
|
RenderStatic(uniform InputHeader &inputHeader,
|
||||||
|
uniform InputDataArrays &inputData,
|
||||||
|
uniform int visualizeLightCount,
|
||||||
|
// Output
|
||||||
|
uniform unsigned int8 framebuffer_r[],
|
||||||
|
uniform unsigned int8 framebuffer_g[],
|
||||||
|
uniform unsigned int8 framebuffer_b[]) {
|
||||||
|
uniform int num_groups_x = (inputHeader.framebufferWidth +
|
||||||
|
MIN_TILE_WIDTH - 1) / MIN_TILE_WIDTH;
|
||||||
|
uniform int num_groups_y = (inputHeader.framebufferHeight +
|
||||||
|
MIN_TILE_HEIGHT - 1) / MIN_TILE_HEIGHT;
|
||||||
|
uniform int num_groups = num_groups_x * num_groups_y;
|
||||||
|
|
||||||
|
// Launch a task to render each tile, each of which is MIN_TILE_WIDTH
|
||||||
|
// by MIN_TILE_HEIGHT pixels.
|
||||||
|
launch[num_groups] RenderTile(num_groups_x, num_groups_y,
|
||||||
|
inputHeader, inputData, visualizeLightCount,
|
||||||
|
framebuffer_r, framebuffer_g, framebuffer_b);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
///////////////////////////////////////////////////////////////////////////
// Routines for dynamic decomposition path

// This computes the z min/max range for a whole row worth of tiles.
export void
ComputeZBoundsRow(
    uniform int32 tileY,
    uniform int32 tileWidth, uniform int32 tileHeight,
    uniform int32 numTilesX, uniform int32 numTilesY,
    // G-buffer data
    uniform float zBuffer[],
    uniform int32 gBufferWidth,
    // Camera data
    uniform float cameraProj_33, uniform float cameraProj_43,
    uniform float cameraNear, uniform float cameraFar,
    // Output
    uniform float minZArray[],
    uniform float maxZArray[]
    )
{
    for (uniform int32 tileX = 0; tileX < numTilesX; ++tileX) {
        uniform float minZ, maxZ;
        ComputeZBounds(
            tileX * tileWidth, tileX * tileWidth + tileWidth,
            tileY * tileHeight, tileY * tileHeight + tileHeight,
            zBuffer, gBufferWidth,
            cameraProj_33, cameraProj_43, cameraNear, cameraFar,
            minZ, maxZ);
        minZArray[tileX] = minZ;
        maxZArray[tileX] = maxZ;
    }
}


// Reclassifies the lights with respect to four sub-tiles when we refine a tile.
// numLights need not be a multiple of programCount here, but the input and output arrays
// should be able to handle programCount-sized load/stores.
export void
SplitTileMinMax(
    uniform int32 tileMidX, uniform int32 tileMidY,
    // Subtile data (00, 10, 01, 11)
    uniform float subtileMinZ[],
    uniform float subtileMaxZ[],
    // G-buffer data
    uniform int32 gBufferWidth, uniform int32 gBufferHeight,
    // Camera data
    uniform float cameraProj_11, uniform float cameraProj_22,
    // Light Data
    uniform int32 lightIndices[],
    uniform int32 numLights,
    uniform float light_positionView_x_array[],
    uniform float light_positionView_y_array[],
    uniform float light_positionView_z_array[],
    uniform float light_attenuationEnd_array[],
    // Outputs
    uniform int32 subtileIndices[],
    uniform int32 subtileIndicesPitch,
    uniform int32 subtileNumLights[]
    )
{
    uniform float gBufferScale_x = 0.5f * (float)gBufferWidth;
    uniform float gBufferScale_y = 0.5f * (float)gBufferHeight;

    uniform float frustumPlanes_xy[2] = { -(cameraProj_11 * gBufferScale_x),
                                           (cameraProj_22 * gBufferScale_y) };
    uniform float frustumPlanes_z[2] = { tileMidX - gBufferScale_x,
                                         tileMidY - gBufferScale_y };

    // Normalize
    uniform float norm[2] = { rsqrt(frustumPlanes_xy[0] * frustumPlanes_xy[0] +
                                    frustumPlanes_z[0] * frustumPlanes_z[0]),
                              rsqrt(frustumPlanes_xy[1] * frustumPlanes_xy[1] +
                                    frustumPlanes_z[1] * frustumPlanes_z[1]) };
    frustumPlanes_xy[0] *= norm[0];
    frustumPlanes_xy[1] *= norm[1];
    frustumPlanes_z[0] *= norm[0];
    frustumPlanes_z[1] *= norm[1];

    // Initialize
    uniform int32 subtileLightOffset[4];
    subtileLightOffset[0] = 0 * subtileIndicesPitch;
    subtileLightOffset[1] = 1 * subtileIndicesPitch;
    subtileLightOffset[2] = 2 * subtileIndicesPitch;
    subtileLightOffset[3] = 3 * subtileIndicesPitch;

    foreach (i = 0 ... numLights) {
        int32 lightIndex = lightIndices[i];

        float light_positionView_x = light_positionView_x_array[lightIndex];
        float light_positionView_y = light_positionView_y_array[lightIndex];
        float light_positionView_z = light_positionView_z_array[lightIndex];
        float light_attenuationEnd = light_attenuationEnd_array[lightIndex];
        float light_attenuationEndNeg = -light_attenuationEnd;

        // Test lights against subtile z bounds
        bool inFrustum[4];
        inFrustum[0] = (light_positionView_z - subtileMinZ[0] >= light_attenuationEndNeg) &&
                       (subtileMaxZ[0] - light_positionView_z >= light_attenuationEndNeg);
        inFrustum[1] = (light_positionView_z - subtileMinZ[1] >= light_attenuationEndNeg) &&
                       (subtileMaxZ[1] - light_positionView_z >= light_attenuationEndNeg);
        inFrustum[2] = (light_positionView_z - subtileMinZ[2] >= light_attenuationEndNeg) &&
                       (subtileMaxZ[2] - light_positionView_z >= light_attenuationEndNeg);
        inFrustum[3] = (light_positionView_z - subtileMinZ[3] >= light_attenuationEndNeg) &&
                       (subtileMaxZ[3] - light_positionView_z >= light_attenuationEndNeg);

        float dx = light_positionView_z * frustumPlanes_z[0] +
                   light_positionView_x * frustumPlanes_xy[0];
        float dy = light_positionView_z * frustumPlanes_z[1] +
                   light_positionView_y * frustumPlanes_xy[1];

        cif (abs(dx) > light_attenuationEnd) {
            bool positiveX = dx > 0.0f;
            inFrustum[0] = inFrustum[0] && positiveX;   // 00 subtile
            inFrustum[1] = inFrustum[1] && !positiveX;  // 10 subtile
            inFrustum[2] = inFrustum[2] && positiveX;   // 01 subtile
            inFrustum[3] = inFrustum[3] && !positiveX;  // 11 subtile
        }
        cif (abs(dy) > light_attenuationEnd) {
            bool positiveY = dy > 0.0f;
            inFrustum[0] = inFrustum[0] && positiveY;   // 00 subtile
            inFrustum[1] = inFrustum[1] && positiveY;   // 10 subtile
            inFrustum[2] = inFrustum[2] && !positiveY;  // 01 subtile
            inFrustum[3] = inFrustum[3] && !positiveY;  // 11 subtile
        }

        // Pack and store intersecting lights
        // TODO: Experiment with a loop here instead
        cif (inFrustum[0])
            subtileLightOffset[0] +=
                packed_store_active(&subtileIndices[subtileLightOffset[0]],
                                    lightIndex);
        cif (inFrustum[1])
            subtileLightOffset[1] +=
                packed_store_active(&subtileIndices[subtileLightOffset[1]],
                                    lightIndex);
        cif (inFrustum[2])
            subtileLightOffset[2] +=
                packed_store_active(&subtileIndices[subtileLightOffset[2]],
                                    lightIndex);
        cif (inFrustum[3])
            subtileLightOffset[3] +=
                packed_store_active(&subtileIndices[subtileLightOffset[3]],
                                    lightIndex);
    }

    subtileNumLights[0] = subtileLightOffset[0] - 0 * subtileIndicesPitch;
    subtileNumLights[1] = subtileLightOffset[1] - 1 * subtileIndicesPitch;
    subtileNumLights[2] = subtileLightOffset[2] - 2 * subtileIndicesPitch;
    subtileNumLights[3] = subtileLightOffset[3] - 3 * subtileIndicesPitch;
}
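// Illustrative sketch only, not part of the commit: a scalar C++ stand-in for
// two steps of SplitTileMinMax above. It assumes that packed_store_active
// writes the values of the active lanes contiguously and returns how many it
// wrote, which is how the kernel uses its return value to advance each
// subtile offset. The names packed_store_active_scalar and
// light_overlaps_z_range are hypothetical helpers introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar emulation of one gang-wide packed store: copy values[lane] to
// out[offset...] for every lane whose mask entry is true, keeping lane order,
// and return the number of values written.
inline int packed_store_active_scalar(std::vector<int> &out, std::size_t offset,
                                      const std::vector<int> &values,
                                      const std::vector<bool> &active) {
    assert(values.size() == active.size());
    std::size_t written = 0;
    for (std::size_t lane = 0; lane < values.size(); ++lane) {
        if (active[lane]) {
            assert(offset + written < out.size());
            out[offset + written] = values[lane];
            ++written;
        }
    }
    return static_cast<int>(written);
}

// Conservative sphere-versus-slab test with the same shape as the
// inFrustum[] initialisation: the light is kept if its attenuation sphere
// reaches into the [minZ, maxZ] range of the subtile.
inline bool light_overlaps_z_range(float lightZ, float attenuationEnd,
                                   float minZ, float maxZ) {
    return (lightZ - minZ >= -attenuationEnd) &&
           (maxZ - lightZ >= -attenuationEnd);
}
```

// The caller would add the returned count to its running offset, the way the
// kernel updates subtileLightOffset[k] after each store.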
139
examples_cuda/deferred/main.cpp
Normal file
@@ -0,0 +1,139 @@
/*
  Copyright (c) 2011, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
   IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
   TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
   PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef _MSC_VER
#define ISPC_IS_WINDOWS
#define NOMINMAX
#elif defined(__linux__)
#define ISPC_IS_LINUX
#elif defined(__APPLE__)
#define ISPC_IS_APPLE
#endif

#include <fcntl.h>
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <stdint.h>
#include <algorithm>
#include <assert.h>
#include <vector>
#ifdef ISPC_IS_WINDOWS
  #define WIN32_LEAN_AND_MEAN
  #include <windows.h>
#endif
#include "deferred.h"
#include "kernels_ispc.h"
#include "../timing.h"

///////////////////////////////////////////////////////////////////////////

int main(int argc, char** argv) {
    if (argc != 2) {
        printf("usage: deferred_shading <input_file (e.g. data/pp1280x720.bin)>\n");
        return 1;
    }

    InputData *input = CreateInputDataFromFile(argv[1]);
    if (!input) {
        printf("Failed to load input file \"%s\"!\n", argv[1]);
        return 1;
    }

    Framebuffer framebuffer(input->header.framebufferWidth,
                            input->header.framebufferHeight);

    InitDynamicC(input);
#ifdef __cilk
    InitDynamicCilk(input);
#endif // __cilk

    int nframes = 5;
    double ispcCycles = 1e30;
    for (int i = 0; i < 5; ++i) {
        framebuffer.clear();
        reset_and_start_timer();
        for (int j = 0; j < nframes; ++j)
            ispc::RenderStatic(input->header, input->arrays,
                               VISUALIZE_LIGHT_COUNT,
                               framebuffer.r, framebuffer.g, framebuffer.b);
        double mcycles = get_elapsed_mcycles() / nframes;
        ispcCycles = std::min(ispcCycles, mcycles);
    }
    printf("[ispc static + tasks]:\t\t[%.3f] million cycles to render "
           "%d x %d image\n", ispcCycles,
           input->header.framebufferWidth, input->header.framebufferHeight);
    WriteFrame("deferred-ispc-static.ppm", input, framebuffer);

#ifdef __cilk
    double dynamicCilkCycles = 1e30;
    for (int i = 0; i < 5; ++i) {
        framebuffer.clear();
        reset_and_start_timer();
        for (int j = 0; j < nframes; ++j)
            DispatchDynamicCilk(input, &framebuffer);
        double mcycles = get_elapsed_mcycles() / nframes;
        dynamicCilkCycles = std::min(dynamicCilkCycles, mcycles);
    }
    printf("[ispc + Cilk dynamic]:\t\t[%.3f] million cycles to render image\n",
           dynamicCilkCycles);
    WriteFrame("deferred-ispc-dynamic.ppm", input, framebuffer);
#endif // __cilk

    double serialCycles = 1e30;
    for (int i = 0; i < 5; ++i) {
        framebuffer.clear();
        reset_and_start_timer();
        for (int j = 0; j < nframes; ++j)
            DispatchDynamicC(input, &framebuffer);
        double mcycles = get_elapsed_mcycles() / nframes;
        serialCycles = std::min(serialCycles, mcycles);
    }
    printf("[C++ serial dynamic, 1 core]:\t[%.3f] million cycles to render image\n",
           serialCycles);
    WriteFrame("deferred-serial-dynamic.ppm", input, framebuffer);

#ifdef __cilk
    printf("\t\t\t\t(%.2fx speedup from static ISPC, %.2fx from Cilk+ISPC)\n",
           serialCycles/ispcCycles, serialCycles/dynamicCilkCycles);
#else
    printf("\t\t\t\t(%.2fx speedup from ISPC + tasks)\n", serialCycles/ispcCycles);
#endif // __cilk

    DeleteInputData(input);

    return 0;
}
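// Illustrative sketch only, not part of the commit: the benchmark loops in
// main() above repeat a batch of frames several times and keep the minimum
// per-frame cost. The same "best of N runs" pattern can be written with
// std::chrono when ../timing.h is not available. best_of_runs_ms is a
// hypothetical helper, and it reports wall-clock milliseconds rather than
// the million-cycle counts printed by the example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Runs `work` framesPerRun times per run, for `runs` runs, and returns the
// smallest average time per call (in milliseconds) seen across the runs.
template <typename Fn>
double best_of_runs_ms(Fn &&work, int runs, int framesPerRun) {
    assert(runs > 0 && framesPerRun > 0);
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        for (int j = 0; j < framesPerRun; ++j)
            work();
        const std::chrono::duration<double, std::milli> elapsed =
            std::chrono::steady_clock::now() - start;
        best = std::min(best, elapsed.count() / framesPerRun);
    }
    return best;
}
```

// Taking the minimum rather than the mean discards runs disturbed by other
// activity on the machine, which is presumably why the example keeps the
// smallest value as well.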
139
examples_cuda/deferred/main_cu.cpp
Normal file
@@ -0,0 +1,139 @@
/*
  Copyright (c) 2011, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
   IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
   TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
   PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef _MSC_VER
#define ISPC_IS_WINDOWS
#define NOMINMAX
#elif defined(__linux__)
#define ISPC_IS_LINUX
#elif defined(__APPLE__)
#define ISPC_IS_APPLE
#endif

#include <fcntl.h>
#include <float.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <stdint.h>
#include <algorithm>
#include <assert.h>
#include <vector>
#ifdef ISPC_IS_WINDOWS
  #define WIN32_LEAN_AND_MEAN
  #include <windows.h>
#endif
#include "deferred.h"
#include "kernels_ispc.h"
#include "../timing.h"

///////////////////////////////////////////////////////////////////////////

int main(int argc, char** argv) {
    if (argc != 2) {
        printf("usage: deferred_shading <input_file (e.g. data/pp1280x720.bin)>\n");
        return 1;
    }

    InputData *input = CreateInputDataFromFile(argv[1]);
    if (!input) {
        printf("Failed to load input file \"%s\"!\n", argv[1]);
        return 1;
    }

    Framebuffer framebuffer(input->header.framebufferWidth,
                            input->header.framebufferHeight);

    InitDynamicC(input);
#ifdef __cilk
    InitDynamicCilk(input);
#endif // __cilk

    int nframes = 5;
    double ispcCycles = 1e30;
    for (int i = 0; i < 5; ++i) {
        framebuffer.clear();
        reset_and_start_timer();
        for (int j = 0; j < nframes; ++j)
            ispc::RenderStatic(input->header, input->arrays,
                               VISUALIZE_LIGHT_COUNT,
                               framebuffer.r, framebuffer.g, framebuffer.b);
        double mcycles = get_elapsed_mcycles() / nframes;
        ispcCycles = std::min(ispcCycles, mcycles);
    }
    printf("[ispc static + tasks]:\t\t[%.3f] million cycles to render "
           "%d x %d image\n", ispcCycles,
           input->header.framebufferWidth, input->header.framebufferHeight);
    WriteFrame("deferred-ispc-static.ppm", input, framebuffer);

#ifdef __cilk
    double dynamicCilkCycles = 1e30;
    for (int i = 0; i < 5; ++i) {
        framebuffer.clear();
        reset_and_start_timer();
        for (int j = 0; j < nframes; ++j)
            DispatchDynamicCilk(input, &framebuffer);
        double mcycles = get_elapsed_mcycles() / nframes;
        dynamicCilkCycles = std::min(dynamicCilkCycles, mcycles);
    }
    printf("[ispc + Cilk dynamic]:\t\t[%.3f] million cycles to render image\n",
           dynamicCilkCycles);
    WriteFrame("deferred-ispc-dynamic.ppm", input, framebuffer);
#endif // __cilk

    double serialCycles = 1e30;
    for (int i = 0; i < 5; ++i) {
        framebuffer.clear();
        reset_and_start_timer();
        for (int j = 0; j < nframes; ++j)
            DispatchDynamicC(input, &framebuffer);
        double mcycles = get_elapsed_mcycles() / nframes;
        serialCycles = std::min(serialCycles, mcycles);
    }
    printf("[C++ serial dynamic, 1 core]:\t[%.3f] million cycles to render image\n",
           serialCycles);
    WriteFrame("deferred-serial-dynamic.ppm", input, framebuffer);

#ifdef __cilk
    printf("\t\t\t\t(%.2fx speedup from static ISPC, %.2fx from Cilk+ISPC)\n",
           serialCycles/ispcCycles, serialCycles/dynamicCilkCycles);
#else
    printf("\t\t\t\t(%.2fx speedup from ISPC + tasks)\n", serialCycles/ispcCycles);
#endif // __cilk

    DeleteInputData(input);

    return 0;
}
136
examples_cuda/examples.sln
Executable file
@@ -0,0 +1,136 @@

Microsoft Visual Studio Solution File, Format Version 11.00
# Visual Studio 2010
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "simple", "simple\simple.vcxproj", "{947C5311-8B78-4D05-BEE4-BCF342D4B367}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "rt", "rt\rt.vcxproj", "{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "aobench", "aobench\aobench.vcxproj", "{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "mandelbrot", "mandelbrot\mandelbrot.vcxproj", "{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "options", "options\options.vcxproj", "{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "mandelbrot_tasks", "mandelbrot_tasks\mandelbrot_tasks.vcxproj", "{E80DA7D4-AB22-4648-A068-327307156BE6}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "aobench_instrumented", "aobench_instrumented\aobench_instrumented.vcxproj", "{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "noise", "noise\noise.vcxproj", "{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "volume", "volume_rendering\volume.vcxproj", "{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "stencil", "stencil\stencil.vcxproj", "{2EF070A1-F62F-4E6A-944B-88D140945C3C}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "deferred_shading", "deferred\deferred_shading.vcxproj", "{87F53C53-957E-4E91-878A-BC27828FB9EB}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "perfbench", "perfbench\perfbench.vcxproj", "{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}"
EndProject
Global
	GlobalSection(SolutionConfigurationPlatforms) = preSolution
		Debug|Win32 = Debug|Win32
		Debug|x64 = Debug|x64
		Release|Win32 = Release|Win32
		Release|x64 = Release|x64
	EndGlobalSection
	GlobalSection(ProjectConfigurationPlatforms) = postSolution
		{947C5311-8B78-4D05-BEE4-BCF342D4B367}.Debug|Win32.ActiveCfg = Debug|Win32
		{947C5311-8B78-4D05-BEE4-BCF342D4B367}.Debug|Win32.Build.0 = Debug|Win32
		{947C5311-8B78-4D05-BEE4-BCF342D4B367}.Debug|x64.ActiveCfg = Debug|x64
		{947C5311-8B78-4D05-BEE4-BCF342D4B367}.Debug|x64.Build.0 = Debug|x64
		{947C5311-8B78-4D05-BEE4-BCF342D4B367}.Release|Win32.ActiveCfg = Release|Win32
		{947C5311-8B78-4D05-BEE4-BCF342D4B367}.Release|Win32.Build.0 = Release|Win32
		{947C5311-8B78-4D05-BEE4-BCF342D4B367}.Release|x64.ActiveCfg = Release|x64
		{947C5311-8B78-4D05-BEE4-BCF342D4B367}.Release|x64.Build.0 = Release|x64
		{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}.Debug|Win32.ActiveCfg = Debug|Win32
		{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}.Debug|Win32.Build.0 = Debug|Win32
		{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}.Debug|x64.ActiveCfg = Debug|x64
		{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}.Debug|x64.Build.0 = Debug|x64
		{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}.Release|Win32.ActiveCfg = Release|Win32
		{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}.Release|Win32.Build.0 = Release|Win32
		{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}.Release|x64.ActiveCfg = Release|x64
		{E787BC3F-2D2E-425E-A64D-4721E2FF3DC9}.Release|x64.Build.0 = Release|x64
		{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}.Debug|Win32.ActiveCfg = Debug|Win32
		{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}.Debug|Win32.Build.0 = Debug|Win32
		{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}.Debug|x64.ActiveCfg = Debug|x64
		{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}.Debug|x64.Build.0 = Debug|x64
		{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}.Release|Win32.ActiveCfg = Release|Win32
		{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}.Release|Win32.Build.0 = Release|Win32
		{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}.Release|x64.ActiveCfg = Release|x64
		{F29204CA-19DF-4F3C-87D5-03F4EEDAAFEB}.Release|x64.Build.0 = Release|x64
		{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}.Debug|Win32.ActiveCfg = Debug|Win32
		{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}.Debug|Win32.Build.0 = Debug|Win32
		{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}.Debug|x64.ActiveCfg = Debug|x64
		{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}.Debug|x64.Build.0 = Debug|x64
		{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}.Release|Win32.ActiveCfg = Release|Win32
		{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}.Release|Win32.Build.0 = Release|Win32
		{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}.Release|x64.ActiveCfg = Release|x64
		{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}.Release|x64.Build.0 = Release|x64
		{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}.Debug|Win32.ActiveCfg = Debug|Win32
		{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}.Debug|Win32.Build.0 = Debug|Win32
		{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}.Debug|x64.ActiveCfg = Debug|x64
		{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}.Debug|x64.Build.0 = Debug|x64
		{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}.Release|Win32.ActiveCfg = Release|Win32
		{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}.Release|Win32.Build.0 = Release|Win32
		{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}.Release|x64.ActiveCfg = Release|x64
		{8C7B5D29-1E76-44E6-BBB8-09830E5DEEAE}.Release|x64.Build.0 = Release|x64
		{E80DA7D4-AB22-4648-A068-327307156BE6}.Debug|Win32.ActiveCfg = Debug|Win32
		{E80DA7D4-AB22-4648-A068-327307156BE6}.Debug|Win32.Build.0 = Debug|Win32
		{E80DA7D4-AB22-4648-A068-327307156BE6}.Debug|x64.ActiveCfg = Debug|x64
		{E80DA7D4-AB22-4648-A068-327307156BE6}.Debug|x64.Build.0 = Debug|x64
		{E80DA7D4-AB22-4648-A068-327307156BE6}.Release|Win32.ActiveCfg = Release|Win32
		{E80DA7D4-AB22-4648-A068-327307156BE6}.Release|Win32.Build.0 = Release|Win32
		{E80DA7D4-AB22-4648-A068-327307156BE6}.Release|x64.ActiveCfg = Release|x64
		{E80DA7D4-AB22-4648-A068-327307156BE6}.Release|x64.Build.0 = Release|x64
		{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}.Debug|Win32.ActiveCfg = Debug|Win32
		{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}.Debug|Win32.Build.0 = Debug|Win32
		{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}.Debug|x64.ActiveCfg = Debug|x64
		{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}.Debug|x64.Build.0 = Debug|x64
		{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}.Release|Win32.ActiveCfg = Release|Win32
		{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}.Release|Win32.Build.0 = Release|Win32
		{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}.Release|x64.ActiveCfg = Release|x64
		{B3B4AE3D-6D5A-4CF9-AF5B-43CF2131B958}.Release|x64.Build.0 = Release|x64
		{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}.Debug|Win32.ActiveCfg = Debug|Win32
		{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}.Debug|Win32.Build.0 = Debug|Win32
		{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}.Debug|x64.ActiveCfg = Debug|x64
		{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}.Debug|x64.Build.0 = Debug|x64
		{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}.Release|Win32.ActiveCfg = Release|Win32
		{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}.Release|Win32.Build.0 = Release|Win32
		{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}.Release|x64.ActiveCfg = Release|x64
		{0E0886D8-8B5E-4EAF-9A21-91E63DAF81FD}.Release|x64.Build.0 = Release|x64
		{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}.Debug|Win32.ActiveCfg = Debug|Win32
		{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}.Debug|Win32.Build.0 = Debug|Win32
		{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}.Debug|x64.ActiveCfg = Debug|x64
		{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}.Debug|x64.Build.0 = Debug|x64
		{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}.Release|Win32.ActiveCfg = Release|Win32
		{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}.Release|Win32.Build.0 = Release|Win32
		{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}.Release|x64.ActiveCfg = Release|x64
		{DEE5733A-E93E-449D-9114-9BFFCAEB4DF9}.Release|x64.Build.0 = Release|x64
		{2EF070A1-F62F-4E6A-944B-88D140945C3C}.Debug|Win32.ActiveCfg = Debug|Win32
		{2EF070A1-F62F-4E6A-944B-88D140945C3C}.Debug|Win32.Build.0 = Debug|Win32
		{2EF070A1-F62F-4E6A-944B-88D140945C3C}.Debug|x64.ActiveCfg = Debug|x64
		{2EF070A1-F62F-4E6A-944B-88D140945C3C}.Debug|x64.Build.0 = Debug|x64
		{2EF070A1-F62F-4E6A-944B-88D140945C3C}.Release|Win32.ActiveCfg = Release|Win32
		{2EF070A1-F62F-4E6A-944B-88D140945C3C}.Release|Win32.Build.0 = Release|Win32
		{2EF070A1-F62F-4E6A-944B-88D140945C3C}.Release|x64.ActiveCfg = Release|x64
		{2EF070A1-F62F-4E6A-944B-88D140945C3C}.Release|x64.Build.0 = Release|x64
		{87F53C53-957E-4E91-878A-BC27828FB9EB}.Debug|Win32.ActiveCfg = Debug|Win32
		{87F53C53-957E-4E91-878A-BC27828FB9EB}.Debug|Win32.Build.0 = Debug|Win32
		{87F53C53-957E-4E91-878A-BC27828FB9EB}.Debug|x64.ActiveCfg = Debug|x64
		{87F53C53-957E-4E91-878A-BC27828FB9EB}.Debug|x64.Build.0 = Debug|x64
		{87F53C53-957E-4E91-878A-BC27828FB9EB}.Release|Win32.ActiveCfg = Release|Win32
		{87F53C53-957E-4E91-878A-BC27828FB9EB}.Release|Win32.Build.0 = Release|Win32
		{87F53C53-957E-4E91-878A-BC27828FB9EB}.Release|x64.ActiveCfg = Release|x64
		{87F53C53-957E-4E91-878A-BC27828FB9EB}.Release|x64.Build.0 = Release|x64
		{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Debug|Win32.ActiveCfg = Debug|Win32
		{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Debug|Win32.Build.0 = Debug|Win32
		{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Debug|x64.ActiveCfg = Debug|x64
		{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Debug|x64.Build.0 = Debug|x64
		{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Release|Win32.ActiveCfg = Release|Win32
		{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Release|Win32.Build.0 = Release|Win32
		{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Release|x64.ActiveCfg = Release|x64
		{D923BB7E-A7C8-4850-8FCF-0EB9CE35B4E8}.Release|x64.Build.0 = Release|x64
	EndGlobalSection
	GlobalSection(SolutionProperties) = preSolution
		HideSolutionNode = FALSE
	EndGlobalSection
EndGlobal
9
examples_cuda/gmres/Makefile
Normal file
@@ -0,0 +1,9 @@

EXAMPLE=gmres
CPP_SRC=algorithm.cpp main.cpp matrix.cpp
CC_SRC=mmio.c
ISPC_SRC=matrix.ispc
ISPC_IA_TARGETS=sse2,sse4-x2,avx-x2
ISPC_ARM_TARGETS=neon

include ../common.mk
231
examples_cuda/gmres/algorithm.cpp
Normal file
@@ -0,0 +1,231 @@
/*
  Copyright (c) 2012, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
   IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
   TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
   PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
   OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/


/*===========================================================================*\
|* Includes
\*===========================================================================*/
#include "algorithm.h"
#include "stdio.h"
#include "debug.h"


/*===========================================================================*\
|* GMRES
\*===========================================================================*/
/* upper_triangular_right_solve:
 * ----------------------------
 * Given upper triangular matrix R and rhs vector b, solve for
 * x. This "solve" ignores the rows, columns of R that are greater than the
 * dimensions of x.
 */
void upper_triangular_right_solve (const DenseMatrix &R, const Vector &b, Vector &x)
{
    // Dimensionality check
    ASSERT(R.rows() >= b.size());
    ASSERT(R.cols() >= x.size());
    ASSERT(b.size() >= x.size());

    int max_row = x.size() - 1;

    // first solve step:
    x[max_row] = b[max_row] / R(max_row, max_row);

    for (int row = max_row - 1; row >= 0; row--) {
        double xi = b[row];
        for (int col = max_row; col > row; col--)
            xi -= x[col] * R(row, col);
        x[row] = xi / R(row, row);
    }
}
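
// Illustrative sketch only, not part of the commit: the same back
// substitution written against std::vector so it can be tried without the
// DenseMatrix and Vector classes of this example. The Matrix alias and
// back_substitute are hypothetical names; n plays the role of x.size(), and
// rows and columns of R beyond n are ignored, as in the routine above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solves the leading n-by-n upper triangular system R x = b, working from
// the last row upwards so that every x[j] with j > i is known before row i.
std::vector<double> back_substitute(const Matrix &R,
                                    const std::vector<double> &b,
                                    std::size_t n) {
    assert(n > 0 && R.size() >= n && b.size() >= n);
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0;) {
        assert(R[i].size() >= n);
        double xi = b[i];
        for (std::size_t j = i + 1; j < n; ++j)
            xi -= x[j] * R[i][j];
        x[i] = xi / R[i][i];
    }
    return x;
}
```

// For R = [[2, 1], [0, 4]] and b = [4, 8] the last row gives x[1] = 8 / 4 = 2,
// and the first row then gives x[0] = (4 - 1 * 2) / 2 = 1.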
|
||||||
|
|
||||||
|
/* create_rotation (used in gmres):
|
||||||
|
* -------------------------------
|
||||||
|
* Construct a Givens rotation to zero out the lowest non-zero entry in a partially
|
||||||
|
 * factored Hessenberg matrix. Note that the previous Givens rotations should be
|
||||||
|
* applied to this column before creating a new rotation.
|
||||||
|
*/
|
||||||
|
void create_rotation (const DenseMatrix &H, size_t col, Vector &Cn, Vector &Sn)
|
||||||
|
{
|
||||||
|
double a = H(col, col);
|
||||||
|
double b = H(col + 1, col);
|
||||||
|
double r;
|
||||||
|
|
||||||
|
if (b == 0) {
|
||||||
|
Cn[col] = copysign(1, a);
|
||||||
|
Sn[col] = 0;
|
||||||
|
}
|
||||||
|
else if (a == 0) {
|
||||||
|
Cn[col] = 0;
|
||||||
|
Sn[col] = copysign(1, b);
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
r = sqrt(a*a + b*b);
|
||||||
|
Sn[col] = -b / r;
|
||||||
|
Cn[col] = a / r;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Applies the 'col'th Givens rotation stored in vectors Sn and Cn to the 'col'th
|
||||||
|
* column of the DenseMatrix M. (Previous columns don't need the rotation applied b/c
|
||||||
|
 * presumably, the first col-1 columns are already upper triangular, and so their
|
||||||
|
* entries in the col and col+1 rows are 0.)
|
||||||
|
*/
|
||||||
|
void apply_rotation (DenseMatrix &H, size_t col, Vector &Cn, Vector &Sn)
|
||||||
|
{
|
||||||
|
double c = Cn[col];
|
||||||
|
double s = Sn[col];
|
||||||
|
double tmp = c * H(col, col) - s * H(col+1, col);
|
||||||
|
H(col+1, col) = s * H(col, col) + c * H(col+1, col);
|
||||||
|
H(col, col) = tmp;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Applies the 'col'th Givens rotation to the vector.
|
||||||
|
*/
|
||||||
|
void apply_rotation (Vector &v, size_t col, Vector &Cn, Vector &Sn)
|
||||||
|
{
|
||||||
|
double a = v[col];
|
||||||
|
double b = v[col + 1];
|
||||||
|
|
||||||
|
double c = Cn[col];
|
||||||
|
double s = Sn[col];
|
||||||
|
|
||||||
|
v[col] = c * a - s * b;
|
||||||
|
v[col + 1] = s * a + c * b;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* Applies the first 'col' Givens rotations to the newly-created column
|
||||||
|
* of H. (Leaves other columns alone.)
|
||||||
|
*/
|
||||||
|
void update_column (DenseMatrix &H, size_t col, Vector &Cn, Vector &Sn)
|
||||||
|
{
|
||||||
|
    for (size_t i = 0; i < col; i++) {
|
||||||
|
double c = Cn[i];
|
||||||
|
double s = Sn[i];
|
||||||
|
double t = c * H(i,col) - s * H(i+1,col);
|
||||||
|
H(i+1, col) = s * H(i,col) + c * H(i+1,col);
|
||||||
|
H(i, col) = t;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/* After a new column has been added to the Hessenberg matrix, factor it back into
|
||||||
|
* an upper-triangular matrix by:
|
||||||
|
* - applying the previous Givens rotations to the new column
|
||||||
|
 * - computing the new Givens rotation to make the column upper triangular
|
||||||
|
* - applying the new Givens rotation to the column, and
|
||||||
|
* - applying the new Givens rotation to the solution vector
|
||||||
|
*/
|
||||||
|
void update_qr_decomp (DenseMatrix &H, Vector &s, size_t col, Vector &Cn, Vector &Sn)
|
||||||
|
{
|
||||||
|
update_column( H, col, Cn, Sn);
|
||||||
|
create_rotation(H, col, Cn, Sn);
|
||||||
|
apply_rotation( H, col, Cn, Sn);
|
||||||
|
apply_rotation( s, col, Cn, Sn);
|
||||||
|
}
|
||||||
|
|
||||||
|
void gmres (const Matrix &A, const Vector &b, Vector &x, int num_iters, double max_err)
|
||||||
|
{
|
||||||
|
DEBUG_PRINT("gmres starting!\n");
|
||||||
|
x.zero();
|
||||||
|
|
||||||
|
ASSERT(A.rows() == A.cols());
|
||||||
|
DenseMatrix Qstar(num_iters + 1, A.rows());
|
||||||
|
DenseMatrix H(num_iters + 1, num_iters);
|
||||||
|
|
||||||
|
// arrays for storing parameters of givens rotations
|
||||||
|
Vector Sn(num_iters);
|
||||||
|
Vector Cn(num_iters);
|
||||||
|
|
||||||
|
    // array for storing the rhs projected onto the Hessenberg matrix's column space
|
||||||
|
Vector G(num_iters+1);
|
||||||
|
G.zero();
|
||||||
|
|
||||||
|
double beta = b.norm();
|
||||||
|
G[0] = beta;
|
||||||
|
|
||||||
|
// temp vector, stores Aqi
|
||||||
|
Vector w(A.rows());
|
||||||
|
|
||||||
|
w.copy(b);
|
||||||
|
w.normalize();
|
||||||
|
Qstar.set_row(0, w);
|
||||||
|
|
||||||
|
int iter = 0;
|
||||||
|
Vector temp(A.rows(), false);
|
||||||
|
    double rel_err = 0.0;  // initialized so the failure message is well-defined when num_iters == 0
|
||||||
|
|
||||||
|
while (iter < num_iters)
|
||||||
|
{
|
||||||
|
// w = Aqi
|
||||||
|
Qstar.row(iter, temp);
|
||||||
|
A.multiply(temp, w);
|
||||||
|
|
||||||
|
// construct ith column of H, i+1th row of Qstar:
|
||||||
|
for (int row = 0; row <= iter; row++) {
|
||||||
|
Qstar.row(row, temp);
|
||||||
|
H(row, iter) = temp.dot(w);
|
||||||
|
w.add_ax(-H(row, iter), temp);
|
||||||
|
}
|
||||||
|
|
||||||
|
H(iter+1, iter) = w.norm();
|
||||||
|
w.divide(H(iter+1, iter));
|
||||||
|
Qstar.set_row(iter+1, w);
|
||||||
|
|
||||||
|
update_qr_decomp (H, G, iter, Cn, Sn);
|
||||||
|
|
||||||
|
rel_err = fabs(G[iter+1] / beta);
|
||||||
|
|
||||||
|
if (rel_err < max_err)
|
||||||
|
break;
|
||||||
|
|
||||||
|
if (iter % 100 == 0)
|
||||||
|
DEBUG_PRINT("Iter %d: %f err\n", iter, rel_err);
|
||||||
|
|
||||||
|
iter++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (iter == num_iters) {
|
||||||
|
fprintf(stderr, "Error: gmres failed to converge in %d iterations (relative err: %f)\n", num_iters, rel_err);
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// We've reached an acceptable solution (?):
|
||||||
|
|
||||||
|
    DEBUG_PRINT("gmres completed in %d iterations (rel. resid. %f, max %f)\n", iter + 1, rel_err, max_err);
|
||||||
|
Vector y(iter+1);
|
||||||
|
upper_triangular_right_solve(H, G, y);
|
||||||
|
for (int i = 0; i < iter + 1; i++) {
|
||||||
|
Qstar.row(i, temp);
|
||||||
|
x.add_ax(y[i], temp);
|
||||||
|
}
|
||||||
|
}
|
||||||
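The Givens rotation logic in create_rotation and apply_rotation above can be checked in isolation. The following is a minimal sketch, not part of the example sources: the helper names givens and rotate are hypothetical, and plain doubles stand in for the DenseMatrix and Vector classes. It follows the same sign convention as create_rotation (c = a/r, s = -b/r), so that rotating the pair (a, b) leaves (r, 0).

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper (not in the example sources): returns (c, s) such that
//   [ c  -s ] [ a ]   [ r ]
//   [ s   c ] [ b ] = [ 0 ]
// using the same special cases and signs as create_rotation.
std::pair<double, double> givens(double a, double b)
{
    if (b == 0)
        return { std::copysign(1.0, a), 0.0 };
    if (a == 0)
        return { 0.0, std::copysign(1.0, b) };

    double r = std::hypot(a, b);   // sqrt(a*a + b*b) without intermediate overflow
    assert(r > 0.0);
    return { a / r, -b / r };
}

// Applies the rotation (c, s) to the pair (a, b) in place, mirroring
// apply_rotation: the first entry receives the combined magnitude and the
// second entry is zeroed when (c, s) came from givens(a, b).
void rotate(double c, double s, double &a, double &b)
{
    double t = c * a - s * b;
    b = s * a + c * b;
    a = t;
}
```

Calling givens(3.0, 4.0) and then rotate on the same pair should leave roughly (5, 0), which is the property the QR update in update_qr_decomp relies on.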
50
examples_cuda/gmres/algorithm.h
Normal file
@@ -0,0 +1,50 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2012, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef __ALGORITHM_H__
|
||||||
|
#define __ALGORITHM_H__
|
||||||
|
|
||||||
|
#include "matrix.h"
|
||||||
|
|
||||||
|
|
||||||
|
/* Generalized Minimal Residual Method:
|
||||||
|
* -----------------------------------
|
||||||
|
* Takes a square matrix and an rhs and uses GMRES to find an estimate for x.
|
||||||
|
* The specified error is relative.
|
||||||
|
*/
|
||||||
|
void gmres (const Matrix &A, const Vector &b, Vector &x, int num_iters, double err);
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#endif
|
||||||
8671
examples_cuda/gmres/data/c-18/c-18.mtx
Normal file
File diff suppressed because it is too large
2176
examples_cuda/gmres/data/c-18/c-18_b.mtx
Normal file
File diff suppressed because it is too large
17847
examples_cuda/gmres/data/c-21/c-21.mtx
Normal file
File diff suppressed because it is too large
3516
examples_cuda/gmres/data/c-21/c-21_b.mtx
Normal file
File diff suppressed because it is too large
16346
examples_cuda/gmres/data/c-22/c-22.mtx
Normal file
File diff suppressed because it is too large
3799
examples_cuda/gmres/data/c-22/c-22_b.mtx
Normal file
File diff suppressed because it is too large
26730
examples_cuda/gmres/data/c-25/c-25.mtx
Normal file
File diff suppressed because it is too large
3804
examples_cuda/gmres/data/c-25/c-25_b.mtx
Normal file
File diff suppressed because it is too large
55
examples_cuda/gmres/debug.h
Normal file
@@ -0,0 +1,55 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2012, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef __DEBUG_H__
|
||||||
|
#define __DEBUG_H__
|
||||||
|
|
||||||
|
#include <cassert>
#include <cstdio>   // printf, used by DEBUG_PRINT
|
||||||
|
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| Macros
|
||||||
|
\**************************************************************/
|
||||||
|
#define DEBUG
|
||||||
|
|
||||||
|
#ifdef DEBUG
|
||||||
|
#define ASSERT(expr) assert(expr)
|
||||||
|
#define DEBUG_PRINT(...) printf(__VA_ARGS__)
|
||||||
|
#else
|
||||||
|
#define ASSERT(expr)
|
||||||
|
#define DEBUG_PRINT(...)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
|
||||||
|
#endif
|
||||||
79
examples_cuda/gmres/main.cpp
Normal file
@@ -0,0 +1,79 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2012, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
#include "matrix.h"
|
||||||
|
#include "algorithm.h"
|
||||||
|
#include "util.h"
|
||||||
|
#include <cmath>
|
||||||
|
#include "../timing.h"
|
||||||
|
|
||||||
|
|
||||||
|
int main (int argc, char **argv)
|
||||||
|
{
|
||||||
|
if (argc < 4) {
|
||||||
|
printf("usage: %s <input-matrix> <input-rhs> <output-file>\n", argv[0]);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
double gmres_cycles;
|
||||||
|
|
||||||
|
DEBUG_PRINT("Loading A...\n");
|
||||||
|
Matrix *A = CRSMatrix::matrix_from_mtf(argv[1]);
|
||||||
|
if (A == NULL)
|
||||||
|
return -1;
|
||||||
|
    DEBUG_PRINT("... size: %zu\n", A->cols());
|
||||||
|
|
||||||
|
DEBUG_PRINT("Loading b...\n");
|
||||||
|
Vector *b = Vector::vector_from_mtf(argv[2]);
|
||||||
|
if (b == NULL)
|
||||||
|
return -1;
|
||||||
|
|
||||||
|
Vector x(A->cols());
|
||||||
|
DEBUG_PRINT("Beginning gmres...\n");
|
||||||
|
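    // Time the solve so that gmres_cycles is defined before it is printed below
    // (assumes ../timing.h provides reset_and_start_timer/get_elapsed_mcycles).
    reset_and_start_timer();
    gmres(*A, *b, x, A->cols() / 2, .01);
    gmres_cycles = get_elapsed_mcycles();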
|
||||||
|
|
||||||
|
// Write result out to file
|
||||||
|
x.to_mtf(argv[argc-1]);
|
||||||
|
|
||||||
|
// Compute residual (double-check)
|
||||||
|
#ifdef DEBUG
|
||||||
|
Vector bprime(b->size());
|
||||||
|
A->multiply(x, bprime);
|
||||||
|
Vector resid(bprime.size(), &(bprime[0]));
|
||||||
|
resid.subtract(*b);
|
||||||
|
DEBUG_PRINT("residual error check: %lg\n", resid.norm() / b->norm());
|
||||||
|
#endif
|
||||||
|
// Print profiling results
|
||||||
|
DEBUG_PRINT("-- Total mcycles to solve : %.03f --\n", gmres_cycles);
|
||||||
|
}
|
||||||
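The residual double-check at the end of main computes ||Ax - b|| / ||b|| with the example's Vector and Matrix classes. As a rough standalone illustration of that quantity, here is a sketch that assumes a small dense row-major matrix held in a std::vector; the function name relative_residual is hypothetical and does not appear in the example sources.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: A is an n x n dense matrix in row-major order.
// Returns the relative residual ||A x - b||_2 / ||b||_2, the same measure
// that main prints in its "residual error check" line.
double relative_residual(const std::vector<double> &A,
                         const std::vector<double> &x,
                         const std::vector<double> &b)
{
    const std::size_t n = b.size();
    assert(x.size() == n && A.size() == n * n);

    double resid_sq = 0.0;   // accumulates ||A x - b||^2
    double b_sq = 0.0;       // accumulates ||b||^2
    for (std::size_t i = 0; i < n; i++) {
        double ax = 0.0;
        for (std::size_t j = 0; j < n; j++)
            ax += A[i * n + j] * x[j];
        resid_sq += (ax - b[i]) * (ax - b[i]);
        b_sq += b[i] * b[i];
    }
    return std::sqrt(resid_sq) / std::sqrt(b_sq);
}
```

A value near zero means x solves the system to working precision; gmres stops once its own estimate of this ratio falls below the tolerance passed in (.01 in main).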
246
examples_cuda/gmres/matrix.cpp
Normal file
@@ -0,0 +1,246 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2012, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| Includes
|
||||||
|
\**************************************************************/
|
||||||
|
#include "matrix.h"
|
||||||
|
#include "matrix_ispc.h"
|
||||||
|
|
||||||
|
extern "C" {
|
||||||
|
#include "mmio.h"
|
||||||
|
}
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| DenseMatrix methods
|
||||||
|
\**************************************************************/
|
||||||
|
void DenseMatrix::multiply (const Vector &v, Vector &r) const
|
||||||
|
{
|
||||||
|
// Dimensionality check
|
||||||
|
ASSERT(v.size() == cols());
|
||||||
|
ASSERT(r.size() == rows());
|
||||||
|
|
||||||
|
for (int i = 0; i < rows(); i++)
|
||||||
|
r[i] = v.dot(entries + i * num_cols);
|
||||||
|
}
|
||||||
|
|
||||||
|
const Vector *DenseMatrix::row (size_t row) const {
|
||||||
|
return new Vector(num_cols, entries + row * num_cols, true);
|
||||||
|
}
|
||||||
|
|
||||||
|
void DenseMatrix::row (size_t row, Vector &r) {
|
||||||
|
r.entries = entries + row * cols();
|
||||||
|
r._size = cols();
|
||||||
|
}
|
||||||
|
|
||||||
|
void DenseMatrix::set_row(size_t row, const Vector &v)
|
||||||
|
{
|
||||||
|
ASSERT(v.size() == num_cols);
|
||||||
|
memcpy(entries + row * num_cols, v.entries, num_cols * sizeof(double));
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| CRSMatrix Methods
|
||||||
|
\**************************************************************/
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <vector>
|
||||||
|
#include <algorithm>
|
||||||
|
|
||||||
|
|
||||||
|
struct entry {
|
||||||
|
int row;
|
||||||
|
int col;
|
||||||
|
double val;
|
||||||
|
};
|
||||||
|
|
||||||
|
bool compare_entries(struct entry i, struct entry j) {
|
||||||
|
if (i.row < j.row)
|
||||||
|
return true;
|
||||||
|
if (i.row > j.row)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
return i.col < j.col;
|
||||||
|
}
|
||||||
|
|
||||||
|
#define ERR_OUT(...) { fprintf(stderr, __VA_ARGS__); return NULL; }
|
||||||
|
|
||||||
|
CRSMatrix *CRSMatrix::matrix_from_mtf (char *path) {
|
||||||
|
FILE *f;
|
||||||
|
MM_typecode matcode;
|
||||||
|
|
||||||
|
int m, n, nz;
|
||||||
|
|
||||||
|
if ((f = fopen(path, "r")) == NULL)
|
||||||
|
ERR_OUT("Error: %s does not name a valid/readable file.\n", path);
|
||||||
|
|
||||||
|
if (mm_read_banner(f, &matcode) != 0)
|
||||||
|
ERR_OUT("Error: Could not process Matrix Market banner.\n");
|
||||||
|
|
||||||
|
if (mm_is_complex(matcode))
|
||||||
|
        ERR_OUT("Error: Application does not support complex numbers.\n");
|
||||||
|
|
||||||
|
if (mm_is_dense(matcode))
|
||||||
|
ERR_OUT("Error: supplied matrix is dense (should be sparse.)\n");
|
||||||
|
|
||||||
|
if (!mm_is_matrix(matcode))
|
||||||
|
        ERR_OUT("Error: %s does not encode a matrix.\n", path);
|
||||||
|
|
||||||
|
if (mm_read_mtx_crd_size(f, &m, &n, &nz) != 0)
|
||||||
|
ERR_OUT("Error: could not read matrix size from file.\n");
|
||||||
|
|
||||||
|
if (m != n)
|
||||||
|
        ERR_OUT("Error: Application does not support non-square matrices.\n");
|
||||||
|
|
||||||
|
std::vector<struct entry> entries;
|
||||||
|
entries.resize(nz);
|
||||||
|
|
||||||
|
for (int i = 0; i < nz; i++) {
|
||||||
|
fscanf(f, "%d %d %lg\n", &entries[i].row, &entries[i].col, &entries[i].val);
|
||||||
|
// Adjust from 1-based to 0-based
|
||||||
|
entries[i].row--;
|
||||||
|
entries[i].col--;
|
||||||
|
}
|
||||||
|
|
||||||
|
sort(entries.begin(), entries.end(), compare_entries);
|
||||||
|
|
||||||
|
CRSMatrix *M = new CRSMatrix(m, n, nz);
|
||||||
|
int cur_row = -1;
|
||||||
|
for (int i = 0; i < nz; i++) {
|
||||||
|
while (entries[i].row > cur_row)
|
||||||
|
M->row_offsets[++cur_row] = i;
|
||||||
|
M->entries[i] = entries[i].val;
|
||||||
|
M->columns[i] = entries[i].col;
|
||||||
|
}
|
||||||
|
|
||||||
|
    fclose(f);  // the input file is no longer needed once all entries are read
    return M;
|
||||||
|
}
|
||||||
|
|
||||||
|
Vector *Vector::vector_from_mtf (char *path) {
|
||||||
|
FILE *f;
|
||||||
|
MM_typecode matcode;
|
||||||
|
|
||||||
|
int m, n, nz;
|
||||||
|
|
||||||
|
if ((f = fopen(path, "r")) == NULL)
|
||||||
|
ERR_OUT("Error: %s does not name a valid/readable file.\n", path);
|
||||||
|
|
||||||
|
if (mm_read_banner(f, &matcode) != 0)
|
||||||
|
ERR_OUT("Error: Could not process Matrix Market banner.\n");
|
||||||
|
|
||||||
|
if (mm_is_complex(matcode))
|
||||||
|
        ERR_OUT("Error: Application does not support complex numbers.\n");
|
||||||
|
|
||||||
|
if (mm_is_dense(matcode)) {
|
||||||
|
if (mm_read_mtx_array_size(f, &m, &n) != 0)
|
||||||
|
ERR_OUT("Error: could not read matrix size from file.\n");
|
||||||
|
} else {
|
||||||
|
if (mm_read_mtx_crd_size(f, &m, &n, &nz) != 0)
|
||||||
|
ERR_OUT("Error: could not read matrix size from file.\n");
|
||||||
|
}
|
||||||
|
if (n != 1)
|
||||||
|
ERR_OUT("Error: %s does not describe a vector.\n", path);
|
||||||
|
|
||||||
|
Vector *x = new Vector(m);
|
||||||
|
|
||||||
|
if (mm_is_dense(matcode)) {
|
||||||
|
double val;
|
||||||
|
for (int i = 0; i < m; i++) {
|
||||||
|
fscanf(f, "%lg\n", &val);
|
||||||
|
(*x)[i] = val;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
x->zero();
|
||||||
|
double val;
|
||||||
|
int row;
|
||||||
|
int col;
|
||||||
|
for (int i = 0; i < nz; i++) {
|
||||||
|
fscanf(f, "%d %d %lg\n", &row, &col, &val);
|
||||||
|
(*x)[row-1] = val;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
    fclose(f);  // the input file is no longer needed once all entries are read
    return x;
|
||||||
|
}
|
||||||
|
|
||||||
|
#define ERR(...) { fprintf(stderr, __VA_ARGS__); exit(-1); }
|
||||||
|
|
||||||
|
void Vector::to_mtf (char *path) {
|
||||||
|
FILE *f;
|
||||||
|
MM_typecode matcode;
|
||||||
|
|
||||||
|
mm_initialize_typecode(&matcode);
|
||||||
|
mm_set_matrix(&matcode);
|
||||||
|
mm_set_real(&matcode);
|
||||||
|
mm_set_dense(&matcode);
|
||||||
|
mm_set_general(&matcode);
|
||||||
|
|
||||||
|
if ((f = fopen(path, "w")) == NULL)
|
||||||
|
ERR("Error: cannot open/write to %s\n", path);
|
||||||
|
|
||||||
|
mm_write_banner(f, matcode);
|
||||||
|
mm_write_mtx_array_size(f, size(), 1);
|
||||||
|
for (int i = 0; i < size(); i++)
|
||||||
|
fprintf(f, "%lg\n", entries[i]);
|
||||||
|
|
||||||
|
fclose(f);
|
||||||
|
}
|
||||||
|
|
||||||
|
void CRSMatrix::multiply (const Vector &v, Vector &r) const
|
||||||
|
{
|
||||||
|
ASSERT(v.size() == cols());
|
||||||
|
ASSERT(r.size() == rows());
|
||||||
|
|
||||||
|
for (int row = 0; row < rows(); row++)
|
||||||
|
{
|
||||||
|
int row_offset = row_offsets[row];
|
||||||
|
int next_offset = ((row + 1 == rows()) ? _nonzeroes : row_offsets[row + 1]);
|
||||||
|
|
||||||
|
double sum = 0;
|
||||||
|
for (int i = row_offset; i < next_offset; i++)
|
||||||
|
{
|
||||||
|
sum += v[columns[i]] * entries[i];
|
||||||
|
}
|
||||||
|
r[row] = sum;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
void CRSMatrix::zero ( )
|
||||||
|
{
|
||||||
|
entries.clear();
|
||||||
|
row_offsets.clear();
|
||||||
|
columns.clear();
|
||||||
|
_nonzeroes = 0;
|
||||||
|
}
|
||||||
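The compressed-row layout built in matrix_from_mtf and consumed by CRSMatrix::multiply can be sketched on its own. The version below is an illustration under stated assumptions, not the example's implementation: the names Crs, Triplet, crs_from_sorted and crs_multiply are hypothetical, and it stores rows + 1 offsets (a common convention) instead of the rows offsets plus a separate _nonzeroes count used above, which keeps empty trailing rows well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical coordinate-format entry, 0-based.
struct Triplet {
    std::size_t row, col;
    double val;
};

// Hypothetical compressed-row matrix: the nonzeros of row r occupy
// indices [row_offsets[r], row_offsets[r + 1]) of columns/values.
struct Crs {
    std::size_t rows;
    std::vector<std::size_t> row_offsets;  // rows + 1 entries
    std::vector<std::size_t> columns;
    std::vector<double> values;
};

// Builds the compressed layout from triplets already sorted by (row, col).
Crs crs_from_sorted(std::size_t rows, const std::vector<Triplet> &sorted)
{
    Crs m;
    m.rows = rows;
    m.row_offsets.assign(rows + 1, 0);
    for (const Triplet &e : sorted) {
        assert(e.row < rows);
        m.row_offsets[e.row + 1]++;        // count nonzeros per row
        m.columns.push_back(e.col);
        m.values.push_back(e.val);
    }
    for (std::size_t r = 0; r < rows; r++) // prefix sum turns counts into offsets
        m.row_offsets[r + 1] += m.row_offsets[r];
    return m;
}

// Sparse matrix-vector product r = M v, touching only stored nonzeros.
std::vector<double> crs_multiply(const Crs &m, const std::vector<double> &v)
{
    std::vector<double> r(m.rows, 0.0);
    for (std::size_t row = 0; row < m.rows; row++)
        for (std::size_t i = m.row_offsets[row]; i < m.row_offsets[row + 1]; i++)
            r[row] += m.values[i] * v[m.columns[i]];
    return r;
}
```

With the extra trailing offset, the inner loop needs no special case for the last row, and a row without nonzeros simply has equal start and end offsets.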
279
examples_cuda/gmres/matrix.h
Normal file
@@ -0,0 +1,279 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2012, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef __MATRIX_H__
|
||||||
|
#define __MATRIX_H__
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| Includes
|
||||||
|
\**************************************************************/
|
||||||
|
#include <cstring>   // size_t, memcpy
|
||||||
|
#include <cstdlib>   // malloc, free
|
||||||
|
#include <cmath> // sqrt
|
||||||
|
#include <vector>
|
||||||
|
|
||||||
|
#include "debug.h"
|
||||||
|
#include "matrix_ispc.h"
|
||||||
|
|
||||||
|
|
||||||
|
class DenseMatrix;
|
||||||
|
/**************************************************************\
|
||||||
|
| Vector class
|
||||||
|
\**************************************************************/
|
||||||
|
class Vector {
|
||||||
|
public:
|
||||||
|
static Vector *vector_from_mtf(char *path);
|
||||||
|
void to_mtf (char *path);
|
||||||
|
|
||||||
|
Vector(size_t size, bool alloc_mem=true)
|
||||||
|
{
|
||||||
|
shared_ptr = false;
|
||||||
|
_size = size;
|
||||||
|
|
||||||
|
if (alloc_mem)
|
||||||
|
entries = (double *) malloc(sizeof(double) * _size);
|
||||||
|
else {
|
||||||
|
shared_ptr = true;
|
||||||
|
entries = NULL;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
Vector(size_t size, double *content, bool share_ptr=false)
|
||||||
|
{
|
||||||
|
_size = size;
|
||||||
|
if (share_ptr) {
|
||||||
|
entries = content;
|
||||||
|
shared_ptr = true;
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
shared_ptr = false;
|
||||||
|
entries = (double *) malloc(sizeof(double) * _size);
|
||||||
|
memcpy(entries, content, sizeof(double) * _size);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
~Vector() { if (!shared_ptr) free(entries); }
|
||||||
|
|
||||||
|
const double & operator [] (size_t index) const
|
||||||
|
{
|
||||||
|
ASSERT(index < _size);
|
||||||
|
return *(entries + index);
|
||||||
|
}
|
||||||
|
|
||||||
|
double &operator [] (size_t index)
|
||||||
|
{
|
||||||
|
ASSERT(index < _size);
|
||||||
|
return *(entries + index);
|
||||||
|
}
|
||||||
|
|
||||||
|
bool operator == (const Vector &v) const
|
||||||
|
{
|
||||||
|
if (v.size() != _size)
|
||||||
|
return false;
|
||||||
|
|
||||||
|
        for (size_t i = 0; i < _size; i++)
|
||||||
|
if (entries[i] != v[i])
|
||||||
|
return false;
|
||||||
|
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
size_t size() const {return _size; }
|
||||||
|
|
||||||
|
double dot (const Vector &b) const
|
||||||
|
{
|
||||||
|
ASSERT(b.size() == this->size());
|
||||||
|
return ispc::vector_dot(entries, b.entries, size());
|
||||||
|
}
|
||||||
|
|
||||||
|
double dot (const double * const b) const
|
||||||
|
{
|
||||||
|
return ispc::vector_dot(entries, b, size());
|
||||||
|
}
|
||||||
|
|
||||||
|
void zero ()
|
||||||
|
{
|
||||||
|
ispc::zero(entries, size());
|
||||||
|
}
|
||||||
|
|
||||||
|
    double norm () const { return sqrt(dot(entries)); }
|
||||||
|
|
||||||
|
void normalize () { this->divide(this->norm()); }
|
||||||
|
|
||||||
|
void add (const Vector &a)
|
||||||
|
{
|
||||||
|
ASSERT(size() == a.size());
|
||||||
|
ispc::vector_add(entries, a.entries, size());
|
||||||
|
}
|
||||||
|
|
||||||
|
void subtract (const Vector &s)
|
||||||
|
{
|
||||||
|
ASSERT(size() == s.size());
|
||||||
|
ispc::vector_sub(entries, s.entries, size());
|
||||||
|
}
|
||||||
|
|
||||||
|
void multiply (double scalar)
|
||||||
|
{
|
||||||
|
ispc::vector_mult(entries, scalar, size());
|
||||||
|
}
|
||||||
|
|
||||||
|
void divide (double scalar)
|
||||||
|
{
|
||||||
|
ispc::vector_div(entries, scalar, size());
|
||||||
|
}
|
||||||
|
|
||||||
|
// Note: x may be longer than *(this)
|
||||||
|
void add_ax (double a, const Vector &x) {
|
||||||
|
ASSERT(x.size() >= size());
|
||||||
|
ispc::vector_add_ax(entries, a, x.entries, size());
|
||||||
|
}
|
||||||
|
|
||||||
|
// Note that copy only copies the first size() elements of the
|
||||||
|
// supplied vector, i.e. the supplied vector can be longer than
|
||||||
|
// this one. This is useful in least squares calculations.
|
||||||
|
void copy (const Vector &other) {
|
||||||
|
ASSERT(other.size() >= size());
|
||||||
|
memcpy(entries, other.entries, size() * sizeof(double));
|
||||||
|
}
|
||||||
|
|
||||||
|
friend class DenseMatrix;
|
||||||
|
|
||||||
|
private:
|
||||||
|
size_t _size;
|
||||||
|
bool shared_ptr;
|
||||||
|
double *entries;
|
||||||
|
};
|
||||||
|
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| Matrix base class
|
||||||
|
\**************************************************************/
|
||||||
|
class Matrix {
|
||||||
|
friend class Vector;
|
||||||
|
|
||||||
|
public:
|
||||||
|
Matrix(size_t size_r, size_t size_c)
|
||||||
|
{
|
||||||
|
num_rows = size_r;
|
||||||
|
num_cols = size_c;
|
||||||
|
}
|
||||||
|
    virtual ~Matrix() {}
|
||||||
|
|
||||||
|
size_t rows() const { return num_rows; }
|
||||||
|
size_t cols() const { return num_cols; }
|
||||||
|
|
||||||
|
virtual void multiply (const Vector &v, Vector &r) const = 0;
|
||||||
|
virtual void zero () = 0;
|
||||||
|
|
||||||
|
protected:
|
||||||
|
size_t num_rows;
|
||||||
|
size_t num_cols;
|
||||||
|
};
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| DenseMatrix class
|
||||||
|
\**************************************************************/
|
||||||
|
class DenseMatrix : public Matrix {
|
||||||
|
friend class Vector;
|
||||||
|
|
||||||
|
public:
|
||||||
|
DenseMatrix(size_t size_r, size_t size_c) : Matrix(size_r, size_c)
|
||||||
|
{
|
||||||
|
entries = (double *) malloc(size_r * size_c * sizeof(double));
|
||||||
|
}
|
||||||
|
|
||||||
|
DenseMatrix(size_t size_r, size_t size_c, const double *content) : Matrix (size_r, size_c)
|
||||||
|
{
|
||||||
|
entries = (double *) malloc(size_r * size_c * sizeof(double));
|
||||||
|
memcpy(entries, content, size_r * size_c * sizeof(double));
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void multiply (const Vector &v, Vector &r) const;
|
||||||
|
|
||||||
|
double &operator () (unsigned int r, unsigned int c)
|
||||||
|
{
|
||||||
|
return *(entries + r * num_cols + c);
|
||||||
|
}
|
||||||
|
|
||||||
|
const double &operator () (unsigned int r, unsigned int c) const
|
||||||
|
{
|
||||||
|
return *(entries + r * num_cols + c);
|
||||||
|
}
|
||||||
|
|
||||||
|
const Vector *row(size_t row) const;
|
||||||
|
void row(size_t row, Vector &r);
|
||||||
|
void set_row(size_t row, const Vector &v);
|
||||||
|
|
||||||
|
virtual void zero() { ispc::zero(entries, rows() * cols()); }
|
||||||
|
|
||||||
|
void copy (const DenseMatrix &other)
|
||||||
|
{
|
||||||
|
ASSERT(rows() == other.rows());
|
||||||
|
ASSERT(cols() == other.cols());
|
||||||
|
memcpy(entries, other.entries, rows() * cols() * sizeof(double));
|
||||||
|
}
|
||||||
|
|
||||||
|
private:
|
||||||
|
double *entries;
|
||||||
|
bool shared_ptr;
|
||||||
|
};
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| CRSMatrix (compressed row storage, a sparse matrix format)
|
||||||
|
\**************************************************************/
|
||||||
|
class CRSMatrix : public Matrix {
|
||||||
|
public:
|
||||||
|
CRSMatrix (size_t size_r, size_t size_c, size_t nonzeroes) :
|
||||||
|
Matrix(size_r, size_c)
|
||||||
|
{
|
||||||
|
_nonzeroes = nonzeroes;
|
||||||
|
entries.resize(nonzeroes);
|
||||||
|
columns.resize(nonzeroes);
|
||||||
|
row_offsets.resize(size_r);
|
||||||
|
}
|
||||||
|
|
||||||
|
virtual void multiply(const Vector &v, Vector &r) const;
|
||||||
|
|
||||||
|
virtual void zero();
|
||||||
|
|
||||||
|
static CRSMatrix *matrix_from_mtf (char *path);
|
||||||
|
|
||||||
|
private:
|
||||||
|
unsigned int _nonzeroes;
|
||||||
|
std::vector<double> entries;
|
||||||
|
std::vector<int> row_offsets;
|
||||||
|
std::vector<int> columns;
|
||||||
|
};
|
||||||
|
|
||||||
|
#endif
|
||||||
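The operator() overloads of DenseMatrix above address element (r, c) at entries + r * num_cols + c, i.e. row-major storage. The following is a minimal sketch of what that layout implies for a matrix-vector product. The helper name dense_multiply and the use of std::vector are assumptions made for illustration; this is not the DenseMatrix::multiply from this commit, whose body is not shown here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the commit: computes r = A * v where A is
// a rows x cols matrix stored row-major, so element (i, j) lives at
// a[i * cols + j], the same indexing DenseMatrix::operator() uses.
std::vector<double> dense_multiply(const std::vector<double> &a,
                                   std::size_t rows, std::size_t cols,
                                   const std::vector<double> &v)
{
    std::vector<double> r(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            sum += a[i * cols + j] * v[j];
        r[i] = sum;
    }
    return r;
}
```

Each output element depends on one contiguous row of the matrix, which is why a row-major layout suits a per-row dot product.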
122
examples_cuda/gmres/matrix.ispc
Normal file
@@ -0,0 +1,122 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2012, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| General
|
||||||
|
\**************************************************************/
|
||||||
|
export void zero (uniform double data[],
|
||||||
|
uniform int size)
|
||||||
|
{
|
||||||
|
foreach (i = 0 ... size)
|
||||||
|
data[i] = 0.0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| Vector helpers
|
||||||
|
\**************************************************************/
|
||||||
|
export void vector_add (uniform double a[],
|
||||||
|
const uniform double b[],
|
||||||
|
const uniform int size)
|
||||||
|
{
|
||||||
|
foreach (i = 0 ... size)
|
||||||
|
a[i] += b[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
export void vector_sub (uniform double a[],
|
||||||
|
const uniform double b[],
|
||||||
|
const uniform int size)
|
||||||
|
{
|
||||||
|
foreach (i = 0 ... size)
|
||||||
|
a[i] -= b[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
export void vector_mult (uniform double a[],
|
||||||
|
const uniform double b,
|
||||||
|
const uniform int size)
|
||||||
|
{
|
||||||
|
foreach (i = 0 ... size)
|
||||||
|
a[i] *= b;
|
||||||
|
}
|
||||||
|
|
||||||
|
export void vector_div (uniform double a[],
|
||||||
|
const uniform double b,
|
||||||
|
const uniform int size)
|
||||||
|
{
|
||||||
|
foreach (i = 0 ... size)
|
||||||
|
a[i] /= b;
|
||||||
|
}
|
||||||
|
|
||||||
|
export void vector_add_ax (uniform double r[],
|
||||||
|
const uniform double a,
|
||||||
|
const uniform double x[],
|
||||||
|
const uniform int size)
|
||||||
|
{
|
||||||
|
foreach (i = 0 ... size)
|
||||||
|
r[i] += a * x[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
export uniform double vector_dot (const uniform double a[],
|
||||||
|
const uniform double b[],
|
||||||
|
const uniform int size)
|
||||||
|
{
|
||||||
|
varying double sum = 0.0;
|
||||||
|
foreach (i = 0 ... size)
|
||||||
|
sum += a[i] * b[i];
|
||||||
|
return reduce_add(sum);
|
||||||
|
}
|
||||||
|
|
||||||
|
/**************************************************************\
|
||||||
|
| Matrix helpers
|
||||||
|
\**************************************************************/
|
||||||
|
export void sparse_multiply (const uniform double entries[],
|
||||||
|
const uniform int columns[],
const uniform int row_offsets[],
|
||||||
|
const uniform int rows,
|
||||||
|
const uniform int cols,
|
||||||
|
const uniform int nonzeroes,
|
||||||
|
const uniform double v[],
|
||||||
|
uniform double r[])
|
||||||
|
{
|
||||||
|
foreach (row = 0 ... rows) {
|
||||||
|
int row_offset = row_offsets[row];
|
||||||
|
int next_offset = ((row + 1 == rows) ? nonzeroes : row_offsets[row+1]);
|
||||||
|
|
||||||
|
double sum = 0;
|
||||||
|
for (int j = row_offset; j < next_offset; j++)
|
||||||
|
sum += v[columns[j]] * entries[j];
|
||||||
|
r[row] = sum;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
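The sparse_multiply kernel above walks a matrix in compressed row storage: row_offsets gives the index of the first stored entry of each row, and a row ends where the next one starts (or at nonzeroes for the last row). As a rough serial sketch of the same traversal in C++, assuming integer column and offset arrays as in the CRSMatrix class (crs_multiply is a hypothetical name, not a function from this commit):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the commit: r = A * v for a matrix in
// compressed row storage. row_offsets holds one start index per row and no
// trailing sentinel, so the last row ends at entries.size().
std::vector<double> crs_multiply(const std::vector<double> &entries,
                                 const std::vector<int> &columns,
                                 const std::vector<int> &row_offsets,
                                 const std::vector<double> &v)
{
    const std::size_t rows = row_offsets.size();
    std::vector<double> r(rows, 0.0);
    for (std::size_t row = 0; row < rows; ++row) {
        const std::size_t begin = row_offsets[row];
        const std::size_t end = (row + 1 == rows) ? entries.size()
                                                  : row_offsets[row + 1];
        double sum = 0.0;
        for (std::size_t j = begin; j < end; ++j)
            sum += entries[j] * v[columns[j]];  // gather from v by column index
        r[row] = sum;
    }
    return r;
}
```

For the 2 x 3 matrix [[1, 0, 2], [0, 3, 0]] the arrays would be entries = {1, 2, 3}, columns = {0, 2, 1} and row_offsets = {0, 2}.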
511
examples_cuda/gmres/mmio.c
Normal file
@@ -0,0 +1,511 @@
|
|||||||
|
/*
|
||||||
|
* Matrix Market I/O library for ANSI C
|
||||||
|
*
|
||||||
|
* See http://math.nist.gov/MatrixMarket for details.
|
||||||
|
*
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <ctype.h>
|
||||||
|
|
||||||
|
#include "mmio.h"
|
||||||
|
|
||||||
|
int mm_read_unsymmetric_sparse(const char *fname, int *M_, int *N_, int *nz_,
|
||||||
|
double **val_, int **I_, int **J_)
|
||||||
|
{
|
||||||
|
FILE *f;
|
||||||
|
MM_typecode matcode;
|
||||||
|
int M, N, nz;
|
||||||
|
int i;
|
||||||
|
double *val;
|
||||||
|
int *I, *J;
|
||||||
|
|
||||||
|
if ((f = fopen(fname, "r")) == NULL)
|
||||||
|
return -1;
|
||||||
|
|
||||||
|
|
||||||
|
if (mm_read_banner(f, &matcode) != 0)
|
||||||
|
{
|
||||||
|
printf("mm_read_unsymmetric_sparse: Could not process Matrix Market banner");
|
||||||
|
printf(" in file [%s]\n", fname);
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
if ( !(mm_is_real(matcode) && mm_is_matrix(matcode) &&
|
||||||
|
mm_is_sparse(matcode)))
|
||||||
|
{
|
||||||
|
fprintf(stderr, "Sorry, this application does not support ");
|
||||||
|
fprintf(stderr, "Matrix Market type: [%s]\n",
|
||||||
|
mm_typecode_to_str(matcode));
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/* find out size of sparse matrix: M, N, nz .... */
|
||||||
|
|
||||||
|
if (mm_read_mtx_crd_size(f, &M, &N, &nz) !=0)
|
||||||
|
{
|
||||||
|
fprintf(stderr, "mm_read_unsymmetric_sparse(): could not parse matrix size.\n");
|
||||||
|
return -1;
|
||||||
|
}
|
||||||
|
|
||||||
|
*M_ = M;
|
||||||
|
*N_ = N;
|
||||||
|
*nz_ = nz;
|
||||||
|
|
||||||
|
/* reserve memory for matrices */
|
||||||
|
|
||||||
|
I = (int *) malloc(nz * sizeof(int));
|
||||||
|
J = (int *) malloc(nz * sizeof(int));
|
||||||
|
val = (double *) malloc(nz * sizeof(double));
|
||||||
|
|
||||||
|
*val_ = val;
|
||||||
|
*I_ = I;
|
||||||
|
*J_ = J;
|
||||||
|
|
||||||
|
/* NOTE: when reading in doubles, ANSI C requires the use of the "l" */
|
||||||
|
/* specifier as in "%lg", "%lf", "%le", otherwise errors will occur */
|
||||||
|
/* (ANSI C X3.159-1989, Sec. 4.9.6.2, p. 136 lines 13-15) */
|
||||||
|
|
||||||
|
for (i=0; i<nz; i++)
|
||||||
|
{
|
||||||
|
if (fscanf(f, "%d %d %lg\n", &I[i], &J[i], &val[i]) != 3)
{
    fclose(f);
    return -1; /* truncated or malformed entry */
}
|
||||||
|
I[i]--; /* adjust from 1-based to 0-based */
|
||||||
|
J[i]--;
|
||||||
|
}
|
||||||
|
fclose(f);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
int mm_is_valid(MM_typecode matcode)
|
||||||
|
{
|
||||||
|
if (!mm_is_matrix(matcode)) return 0;
|
||||||
|
if (mm_is_dense(matcode) && mm_is_pattern(matcode)) return 0;
|
||||||
|
if (mm_is_real(matcode) && mm_is_hermitian(matcode)) return 0;
|
||||||
|
if (mm_is_pattern(matcode) && (mm_is_hermitian(matcode) ||
|
||||||
|
mm_is_skew(matcode))) return 0;
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
int mm_read_banner(FILE *f, MM_typecode *matcode)
|
||||||
|
{
|
||||||
|
char line[MM_MAX_LINE_LENGTH];
|
||||||
|
char banner[MM_MAX_TOKEN_LENGTH];
|
||||||
|
char mtx[MM_MAX_TOKEN_LENGTH];
|
||||||
|
char crd[MM_MAX_TOKEN_LENGTH];
|
||||||
|
char data_type[MM_MAX_TOKEN_LENGTH];
|
||||||
|
char storage_scheme[MM_MAX_TOKEN_LENGTH];
|
||||||
|
char *p;
|
||||||
|
|
||||||
|
|
||||||
|
mm_clear_typecode(matcode);
|
||||||
|
|
||||||
|
if (fgets(line, MM_MAX_LINE_LENGTH, f) == NULL)
|
||||||
|
return MM_PREMATURE_EOF;
|
||||||
|
|
||||||
|
if (sscanf(line, "%s %s %s %s %s", banner, mtx, crd, data_type,
|
||||||
|
storage_scheme) != 5)
|
||||||
|
return MM_PREMATURE_EOF;
|
||||||
|
|
||||||
|
for (p=mtx; *p!='\0'; *p=tolower(*p),p++); /* convert to lower case */
|
||||||
|
for (p=crd; *p!='\0'; *p=tolower(*p),p++);
|
||||||
|
for (p=data_type; *p!='\0'; *p=tolower(*p),p++);
|
||||||
|
for (p=storage_scheme; *p!='\0'; *p=tolower(*p),p++);
|
||||||
|
|
||||||
|
/* check for banner */
|
||||||
|
if (strncmp(banner, MatrixMarketBanner, strlen(MatrixMarketBanner)) != 0)
|
||||||
|
return MM_NO_HEADER;
|
||||||
|
|
||||||
|
/* first field should be "mtx" */
|
||||||
|
if (strcmp(mtx, MM_MTX_STR) != 0)
|
||||||
|
return MM_UNSUPPORTED_TYPE;
|
||||||
|
mm_set_matrix(matcode);
|
||||||
|
|
||||||
|
|
||||||
|
/* second field describes whether this is a sparse matrix (in coordinate
|
||||||
|
storage) or a dense array */
|
||||||
|
|
||||||
|
|
||||||
|
if (strcmp(crd, MM_SPARSE_STR) == 0)
|
||||||
|
mm_set_sparse(matcode);
|
||||||
|
else
|
||||||
|
if (strcmp(crd, MM_DENSE_STR) == 0)
|
||||||
|
mm_set_dense(matcode);
|
||||||
|
else
|
||||||
|
return MM_UNSUPPORTED_TYPE;
|
||||||
|
|
||||||
|
|
||||||
|
/* third field */
|
||||||
|
|
||||||
|
if (strcmp(data_type, MM_REAL_STR) == 0)
|
||||||
|
mm_set_real(matcode);
|
||||||
|
else
|
||||||
|
if (strcmp(data_type, MM_COMPLEX_STR) == 0)
|
||||||
|
mm_set_complex(matcode);
|
||||||
|
else
|
||||||
|
if (strcmp(data_type, MM_PATTERN_STR) == 0)
|
||||||
|
mm_set_pattern(matcode);
|
||||||
|
else
|
||||||
|
if (strcmp(data_type, MM_INT_STR) == 0)
|
||||||
|
mm_set_integer(matcode);
|
||||||
|
else
|
||||||
|
return MM_UNSUPPORTED_TYPE;
|
||||||
|
|
||||||
|
|
||||||
|
/* fourth field */
|
||||||
|
|
||||||
|
if (strcmp(storage_scheme, MM_GENERAL_STR) == 0)
|
||||||
|
mm_set_general(matcode);
|
||||||
|
else
|
||||||
|
if (strcmp(storage_scheme, MM_SYMM_STR) == 0)
|
||||||
|
mm_set_symmetric(matcode);
|
||||||
|
else
|
||||||
|
if (strcmp(storage_scheme, MM_HERM_STR) == 0)
|
||||||
|
mm_set_hermitian(matcode);
|
||||||
|
else
|
||||||
|
if (strcmp(storage_scheme, MM_SKEW_STR) == 0)
|
||||||
|
mm_set_skew(matcode);
|
||||||
|
else
|
||||||
|
return MM_UNSUPPORTED_TYPE;
|
||||||
|
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
int mm_write_mtx_crd_size(FILE *f, int M, int N, int nz)
|
||||||
|
{
|
||||||
|
if (fprintf(f, "%d %d %d\n", M, N, nz) < 0) /* fprintf returns a character count, negative on error */
|
||||||
|
return MM_COULD_NOT_WRITE_FILE;
|
||||||
|
else
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
int mm_read_mtx_crd_size(FILE *f, int *M, int *N, int *nz )
|
||||||
|
{
|
||||||
|
char line[MM_MAX_LINE_LENGTH];
|
||||||
|
int num_items_read;
|
||||||
|
|
||||||
|
/* set return null parameter values, in case we exit with errors */
|
||||||
|
*M = *N = *nz = 0;
|
||||||
|
|
||||||
|
/* now continue scanning until you reach the end-of-comments */
|
||||||
|
do
|
||||||
|
{
|
||||||
|
if (fgets(line,MM_MAX_LINE_LENGTH,f) == NULL)
|
||||||
|
return MM_PREMATURE_EOF;
|
||||||
|
}while (line[0] == '%');
|
||||||
|
|
||||||
|
/* line[] is either blank or has M,N, nz */
|
||||||
|
if (sscanf(line, "%d %d %d", M, N, nz) == 3)
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
else
|
||||||
|
do
|
||||||
|
{
|
||||||
|
num_items_read = fscanf(f, "%d %d %d", M, N, nz);
|
||||||
|
if (num_items_read == EOF) return MM_PREMATURE_EOF;
|
||||||
|
}
|
||||||
|
while (num_items_read != 3);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
int mm_read_mtx_array_size(FILE *f, int *M, int *N)
|
||||||
|
{
|
||||||
|
char line[MM_MAX_LINE_LENGTH];
|
||||||
|
int num_items_read;
|
||||||
|
/* set return null parameter values, in case we exit with errors */
|
||||||
|
*M = *N = 0;
|
||||||
|
|
||||||
|
/* now continue scanning until you reach the end-of-comments */
|
||||||
|
do
|
||||||
|
{
|
||||||
|
if (fgets(line,MM_MAX_LINE_LENGTH,f) == NULL)
|
||||||
|
return MM_PREMATURE_EOF;
|
||||||
|
}while (line[0] == '%');
|
||||||
|
|
||||||
|
/* line[] is either blank or has M, N */
|
||||||
|
if (sscanf(line, "%d %d", M, N) == 2)
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
else /* we have a blank line */
|
||||||
|
do
|
||||||
|
{
|
||||||
|
num_items_read = fscanf(f, "%d %d", M, N);
|
||||||
|
if (num_items_read == EOF) return MM_PREMATURE_EOF;
|
||||||
|
}
|
||||||
|
while (num_items_read != 2);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
int mm_write_mtx_array_size(FILE *f, int M, int N)
|
||||||
|
{
|
||||||
|
if (fprintf(f, "%d %d\n", M, N) < 0) /* fprintf returns a character count, negative on error */
|
||||||
|
return MM_COULD_NOT_WRITE_FILE;
|
||||||
|
else
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
/*-------------------------------------------------------------------------*/
|
||||||
|
|
||||||
|
/******************************************************************/
|
||||||
|
/* use when I[], J[], and val[] are already allocated             */
|
||||||
|
/******************************************************************/
|
||||||
|
|
||||||
|
int mm_read_mtx_crd_data(FILE *f, int M, int N, int nz, int I[], int J[],
|
||||||
|
double val[], MM_typecode matcode)
|
||||||
|
{
|
||||||
|
int i;
|
||||||
|
if (mm_is_complex(matcode))
|
||||||
|
{
|
||||||
|
for (i=0; i<nz; i++)
|
||||||
|
if (fscanf(f, "%d %d %lg %lg", &I[i], &J[i], &val[2*i], &val[2*i+1])
|
||||||
|
!= 4) return MM_PREMATURE_EOF;
|
||||||
|
}
|
||||||
|
else if (mm_is_real(matcode))
|
||||||
|
{
|
||||||
|
for (i=0; i<nz; i++)
|
||||||
|
{
|
||||||
|
if (fscanf(f, "%d %d %lg\n", &I[i], &J[i], &val[i])
|
||||||
|
!= 3) return MM_PREMATURE_EOF;
|
||||||
|
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
else if (mm_is_pattern(matcode))
|
||||||
|
{
|
||||||
|
for (i=0; i<nz; i++)
|
||||||
|
if (fscanf(f, "%d %d", &I[i], &J[i])
|
||||||
|
!= 2) return MM_PREMATURE_EOF;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
return MM_UNSUPPORTED_TYPE;
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
int mm_read_mtx_crd_entry(FILE *f, int *I, int *J,
|
||||||
|
double *real, double *imag, MM_typecode matcode)
|
||||||
|
{
|
||||||
|
if (mm_is_complex(matcode))
|
||||||
|
{
|
||||||
|
if (fscanf(f, "%d %d %lg %lg", I, J, real, imag)
|
||||||
|
!= 4) return MM_PREMATURE_EOF;
|
||||||
|
}
|
||||||
|
else if (mm_is_real(matcode))
|
||||||
|
{
|
||||||
|
if (fscanf(f, "%d %d %lg\n", I, J, real)
|
||||||
|
!= 3) return MM_PREMATURE_EOF;
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
else if (mm_is_pattern(matcode))
|
||||||
|
{
|
||||||
|
if (fscanf(f, "%d %d", I, J) != 2) return MM_PREMATURE_EOF;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
return MM_UNSUPPORTED_TYPE;
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/************************************************************************
|
||||||
|
mm_read_mtx_crd() fills M, N, nz, the index and value arrays, and the
type code, e.g. 'MCRS'
|
||||||
|
|
||||||
|
if matrix is complex, values[] is of size 2*nz,
|
||||||
|
(nz pairs of real/imaginary values)
|
||||||
|
************************************************************************/
|
||||||
|
|
||||||
|
int mm_read_mtx_crd(char *fname, int *M, int *N, int *nz, int **I, int **J,
|
||||||
|
double **val, MM_typecode *matcode)
|
||||||
|
{
|
||||||
|
int ret_code;
|
||||||
|
FILE *f;
|
||||||
|
|
||||||
|
if (strcmp(fname, "stdin") == 0) f=stdin;
|
||||||
|
else
|
||||||
|
if ((f = fopen(fname, "r")) == NULL)
|
||||||
|
return MM_COULD_NOT_READ_FILE;
|
||||||
|
|
||||||
|
|
||||||
|
if ((ret_code = mm_read_banner(f, matcode)) != 0)
|
||||||
|
return ret_code;
|
||||||
|
|
||||||
|
if (!(mm_is_valid(*matcode) && mm_is_sparse(*matcode) &&
|
||||||
|
mm_is_matrix(*matcode)))
|
||||||
|
return MM_UNSUPPORTED_TYPE;
|
||||||
|
|
||||||
|
if ((ret_code = mm_read_mtx_crd_size(f, M, N, nz)) != 0)
|
||||||
|
return ret_code;
|
||||||
|
|
||||||
|
|
||||||
|
*I = (int *) malloc(*nz * sizeof(int));
|
||||||
|
*J = (int *) malloc(*nz * sizeof(int));
|
||||||
|
*val = NULL;
|
||||||
|
|
||||||
|
if (mm_is_complex(*matcode))
|
||||||
|
{
|
||||||
|
*val = (double *) malloc(*nz * 2 * sizeof(double));
|
||||||
|
ret_code = mm_read_mtx_crd_data(f, *M, *N, *nz, *I, *J, *val,
|
||||||
|
*matcode);
|
||||||
|
if (ret_code != 0) return ret_code;
|
||||||
|
}
|
||||||
|
else if (mm_is_real(*matcode))
|
||||||
|
{
|
||||||
|
*val = (double *) malloc(*nz * sizeof(double));
|
||||||
|
ret_code = mm_read_mtx_crd_data(f, *M, *N, *nz, *I, *J, *val,
|
||||||
|
*matcode);
|
||||||
|
if (ret_code != 0) return ret_code;
|
||||||
|
}
|
||||||
|
|
||||||
|
else if (mm_is_pattern(*matcode))
|
||||||
|
{
|
||||||
|
ret_code = mm_read_mtx_crd_data(f, *M, *N, *nz, *I, *J, *val,
|
||||||
|
*matcode);
|
||||||
|
if (ret_code != 0) return ret_code;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (f != stdin) fclose(f);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
int mm_write_banner(FILE *f, MM_typecode matcode)
|
||||||
|
{
|
||||||
|
char *str = mm_typecode_to_str(matcode);
|
||||||
|
int ret_code;
|
||||||
|
|
||||||
|
ret_code = fprintf(f, "%s %s\n", MatrixMarketBanner, str);
|
||||||
|
free(str);
|
||||||
|
if (ret_code < 0) /* fprintf returns a character count, negative on error */
|
||||||
|
return MM_COULD_NOT_WRITE_FILE;
|
||||||
|
else
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
int mm_write_mtx_crd(char fname[], int M, int N, int nz, int I[], int J[],
|
||||||
|
double val[], MM_typecode matcode)
|
||||||
|
{
|
||||||
|
FILE *f;
|
||||||
|
int i;
|
||||||
|
|
||||||
|
if (strcmp(fname, "stdout") == 0)
|
||||||
|
f = stdout;
|
||||||
|
else
|
||||||
|
if ((f = fopen(fname, "w")) == NULL)
|
||||||
|
return MM_COULD_NOT_WRITE_FILE;
|
||||||
|
|
||||||
|
/* print banner followed by typecode */
|
||||||
|
fprintf(f, "%s ", MatrixMarketBanner);
|
||||||
|
fprintf(f, "%s\n", mm_typecode_to_str(matcode));
|
||||||
|
|
||||||
|
/* print matrix sizes and nonzeros */
|
||||||
|
fprintf(f, "%d %d %d\n", M, N, nz);
|
||||||
|
|
||||||
|
/* print values */
|
||||||
|
if (mm_is_pattern(matcode))
|
||||||
|
for (i=0; i<nz; i++)
|
||||||
|
fprintf(f, "%d %d\n", I[i], J[i]);
|
||||||
|
else
|
||||||
|
if (mm_is_real(matcode))
|
||||||
|
for (i=0; i<nz; i++)
|
||||||
|
fprintf(f, "%d %d %20.16g\n", I[i], J[i], val[i]);
|
||||||
|
else
|
||||||
|
if (mm_is_complex(matcode))
|
||||||
|
for (i=0; i<nz; i++)
|
||||||
|
fprintf(f, "%d %d %20.16g %20.16g\n", I[i], J[i], val[2*i],
|
||||||
|
val[2*i+1]);
|
||||||
|
else
|
||||||
|
{
|
||||||
|
if (f != stdout) fclose(f);
|
||||||
|
return MM_UNSUPPORTED_TYPE;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (f !=stdout) fclose(f);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Create a new copy of a string s. strdup() is a common routine, but it is
* not part of ANSI C, so mm_strdup() is provided here. Used by mm_typecode_to_str().
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
char *mm_strdup(const char *s)
|
||||||
|
{
|
||||||
|
int len = strlen(s);
|
||||||
|
char *s2 = (char *) malloc((len+1)*sizeof(char));
|
||||||
|
return strcpy(s2, s);
|
||||||
|
}
|
||||||
|
|
||||||
|
char *mm_typecode_to_str(MM_typecode matcode)
|
||||||
|
{
|
||||||
|
char buffer[MM_MAX_LINE_LENGTH];
|
||||||
|
char *types[4];
|
||||||
|
char *mm_strdup(const char *);
|
||||||
|
int error =0;
|
||||||
|
|
||||||
|
/* check for MTX type */
|
||||||
|
if (mm_is_matrix(matcode))
|
||||||
|
types[0] = MM_MTX_STR;
|
||||||
|
else
|
||||||
|
return NULL; /* types[0] would otherwise be used uninitialized below */
|
||||||
|
|
||||||
|
/* check for CRD or ARR matrix */
|
||||||
|
if (mm_is_sparse(matcode))
|
||||||
|
types[1] = MM_SPARSE_STR;
|
||||||
|
else
|
||||||
|
if (mm_is_dense(matcode))
|
||||||
|
types[1] = MM_DENSE_STR;
|
||||||
|
else
|
||||||
|
return NULL;
|
||||||
|
|
||||||
|
/* check for element data type */
|
||||||
|
if (mm_is_real(matcode))
|
||||||
|
types[2] = MM_REAL_STR;
|
||||||
|
else
|
||||||
|
if (mm_is_complex(matcode))
|
||||||
|
types[2] = MM_COMPLEX_STR;
|
||||||
|
else
|
||||||
|
if (mm_is_pattern(matcode))
|
||||||
|
types[2] = MM_PATTERN_STR;
|
||||||
|
else
|
||||||
|
if (mm_is_integer(matcode))
|
||||||
|
types[2] = MM_INT_STR;
|
||||||
|
else
|
||||||
|
return NULL;
|
||||||
|
|
||||||
|
|
||||||
|
/* check for symmetry type */
|
||||||
|
if (mm_is_general(matcode))
|
||||||
|
types[3] = MM_GENERAL_STR;
|
||||||
|
else
|
||||||
|
if (mm_is_symmetric(matcode))
|
||||||
|
types[3] = MM_SYMM_STR;
|
||||||
|
else
|
||||||
|
if (mm_is_hermitian(matcode))
|
||||||
|
types[3] = MM_HERM_STR;
|
||||||
|
else
|
||||||
|
if (mm_is_skew(matcode))
|
||||||
|
types[3] = MM_SKEW_STR;
|
||||||
|
else
|
||||||
|
return NULL;
|
||||||
|
|
||||||
|
sprintf(buffer,"%s %s %s %s", types[0], types[1], types[2], types[3]);
|
||||||
|
return mm_strdup(buffer);
|
||||||
|
|
||||||
|
}
|
||||||
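The readers above scan each coordinate entry as "row col value", use "%lg" for the double, and (in mm_read_unsymmetric_sparse) shift the 1-based file indices to 0-based. A minimal sketch of that per-line step, written against a string rather than a FILE so it can be exercised in isolation; parse_coordinate_entry is a hypothetical helper and not part of mmio.c:

```cpp
#include <cstdio>

// Hypothetical helper, not part of mmio.c: parses one "row col value" line of
// a real-valued coordinate file and converts the 1-based Matrix Market
// indices to 0-based ones. Returns true only if all three fields were read.
bool parse_coordinate_entry(const char *line, int *row, int *col, double *val)
{
    // "%lg" (with the 'l') is required when scanning into a double.
    if (std::sscanf(line, "%d %d %lg", row, col, val) != 3)
        return false;
    --*row;  // adjust from 1-based to 0-based
    --*col;
    return true;
}
```

Checking the scanf-family return value against the expected field count is what lets a caller tell a truncated file apart from a valid entry.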
135
examples_cuda/gmres/mmio.h
Normal file
@@ -0,0 +1,135 @@
|
|||||||
|
/*
|
||||||
|
* Matrix Market I/O library for ANSI C
|
||||||
|
*
|
||||||
|
* See http://math.nist.gov/MatrixMarket for details.
|
||||||
|
*
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef MM_IO_H
|
||||||
|
#define MM_IO_H
|
||||||
|
|
||||||
|
#define MM_MAX_LINE_LENGTH 1025
|
||||||
|
#define MatrixMarketBanner "%%MatrixMarket"
|
||||||
|
#define MM_MAX_TOKEN_LENGTH 64
|
||||||
|
|
||||||
|
typedef char MM_typecode[4];
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
|
||||||
|
char *mm_typecode_to_str(MM_typecode matcode);
|
||||||
|
|
||||||
|
int mm_read_banner(FILE *f, MM_typecode *matcode);
|
||||||
|
int mm_read_mtx_crd_size(FILE *f, int *M, int *N, int *nz);
|
||||||
|
int mm_read_mtx_array_size(FILE *f, int *M, int *N);
|
||||||
|
|
||||||
|
int mm_write_banner(FILE *f, MM_typecode matcode);
|
||||||
|
int mm_write_mtx_crd_size(FILE *f, int M, int N, int nz);
|
||||||
|
int mm_write_mtx_array_size(FILE *f, int M, int N);
|
||||||
|
|
||||||
|
|
||||||
|
/********************* MM_typecode query functions ***************************/
|
||||||
|
|
||||||
|
#define mm_is_matrix(typecode) ((typecode)[0]=='M')
|
||||||
|
|
||||||
|
#define mm_is_sparse(typecode) ((typecode)[1]=='C')
|
||||||
|
#define mm_is_coordinate(typecode)((typecode)[1]=='C')
|
||||||
|
#define mm_is_dense(typecode) ((typecode)[1]=='A')
|
||||||
|
#define mm_is_array(typecode) ((typecode)[1]=='A')
|
||||||
|
|
||||||
|
#define mm_is_complex(typecode) ((typecode)[2]=='C')
|
||||||
|
#define mm_is_real(typecode) ((typecode)[2]=='R')
|
||||||
|
#define mm_is_pattern(typecode) ((typecode)[2]=='P')
|
||||||
|
#define mm_is_integer(typecode) ((typecode)[2]=='I')
|
||||||
|
|
||||||
|
#define mm_is_symmetric(typecode)((typecode)[3]=='S')
|
||||||
|
#define mm_is_general(typecode) ((typecode)[3]=='G')
|
||||||
|
#define mm_is_skew(typecode) ((typecode)[3]=='K')
|
||||||
|
#define mm_is_hermitian(typecode)((typecode)[3]=='H')
|
||||||
|
|
||||||
|
int mm_is_valid(MM_typecode matcode); /* too complex for a macro */
|
||||||
|
|
||||||
|
|
||||||
|
/********************* MM_typecode modify functions ***************************/
|
||||||
|
|
||||||
|
#define mm_set_matrix(typecode) ((*typecode)[0]='M')
|
||||||
|
#define mm_set_coordinate(typecode) ((*typecode)[1]='C')
|
||||||
|
#define mm_set_array(typecode) ((*typecode)[1]='A')
|
||||||
|
#define mm_set_dense(typecode) mm_set_array(typecode)
|
||||||
|
#define mm_set_sparse(typecode) mm_set_coordinate(typecode)
|
||||||
|
|
||||||
|
#define mm_set_complex(typecode)((*typecode)[2]='C')
|
||||||
|
#define mm_set_real(typecode) ((*typecode)[2]='R')
|
||||||
|
#define mm_set_pattern(typecode)((*typecode)[2]='P')
|
||||||
|
#define mm_set_integer(typecode)((*typecode)[2]='I')
|
||||||
|
|
||||||
|
|
||||||
|
#define mm_set_symmetric(typecode)((*typecode)[3]='S')
|
||||||
|
#define mm_set_general(typecode)((*typecode)[3]='G')
|
||||||
|
#define mm_set_skew(typecode) ((*typecode)[3]='K')
|
||||||
|
#define mm_set_hermitian(typecode)((*typecode)[3]='H')
|
||||||
|
|
||||||
|
#define mm_clear_typecode(typecode) ((*typecode)[0]=(*typecode)[1]= \
|
||||||
|
(*typecode)[2]=' ',(*typecode)[3]='G')
|
||||||
|
|
||||||
|
#define mm_initialize_typecode(typecode) mm_clear_typecode(typecode)
|
||||||
|
|
||||||
|
|
||||||
|
/********************* Matrix Market error codes ***************************/
|
||||||
|
|
||||||
|
|
||||||
|
#define MM_COULD_NOT_READ_FILE 11
|
||||||
|
#define MM_PREMATURE_EOF 12
|
||||||
|
#define MM_NOT_MTX 13
|
||||||
|
#define MM_NO_HEADER 14
|
||||||
|
#define MM_UNSUPPORTED_TYPE 15
|
||||||
|
#define MM_LINE_TOO_LONG 16
|
||||||
|
#define MM_COULD_NOT_WRITE_FILE 17
|
||||||
|
|
||||||
|
|
||||||
|
/******************** Matrix Market internal definitions ********************

   MM_matrix_typecode: 4-character sequence

                     object     sparse/    data        storage
                                dense      type        scheme

   string position:  [0]        [1]        [2]         [3]

   Matrix typecode:  M(atrix)   C(oord)    R(eal)      G(eneral)
                                A(rray)    C(omplex)   H(ermitian)
                                           P(attern)   S(ymmetric)
                                           I(nteger)   K(skew)

 ***********************************************************************/
|
||||||
|
|
||||||
|
#define MM_MTX_STR "matrix"
|
||||||
|
#define MM_ARRAY_STR "array"
|
||||||
|
#define MM_DENSE_STR "array"
|
||||||
|
#define MM_COORDINATE_STR "coordinate"
|
||||||
|
#define MM_SPARSE_STR "coordinate"
|
||||||
|
#define MM_COMPLEX_STR "complex"
|
||||||
|
#define MM_REAL_STR "real"
|
||||||
|
#define MM_INT_STR "integer"
|
||||||
|
#define MM_GENERAL_STR "general"
|
||||||
|
#define MM_SYMM_STR "symmetric"
|
||||||
|
#define MM_HERM_STR "hermitian"
|
||||||
|
#define MM_SKEW_STR "skew-symmetric"
|
||||||
|
#define MM_PATTERN_STR "pattern"
|
||||||
|
|
||||||
|
|
||||||
|
/* high level routines */
|
||||||
|
|
||||||
|
int mm_write_mtx_crd(char fname[], int M, int N, int nz, int I[], int J[],
|
||||||
|
double val[], MM_typecode matcode);
|
||||||
|
int mm_read_mtx_crd_data(FILE *f, int M, int N, int nz, int I[], int J[],
|
||||||
|
double val[], MM_typecode matcode);
|
||||||
|
int mm_read_mtx_crd_entry(FILE *f, int *I, int *J, double *real, double *img,
|
||||||
|
MM_typecode matcode);
|
||||||
|
|
||||||
|
int mm_read_unsymmetric_sparse(const char *fname, int *M_, int *N_, int *nz_,
|
||||||
|
double **val_, int **I_, int **J_);
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#endif
|
||||||
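The header above encodes a matrix type as four characters, one per position (object, sparse/dense, data type, storage scheme), and the mm_is_* macros test a single position each. As an illustrative sketch of reading such a code back into the words used in a Matrix Market banner, assuming the letter assignments from the macros above (describe_typecode is a hypothetical name and is not part of mmio.h):

```cpp
#include <string>

// Hypothetical helper, not part of mmio.h: spells out a 4-character typecode
// using the same position/letter convention as the mm_is_* macros.
// Returns an empty string for any code it does not recognise.
std::string describe_typecode(const char code[4])
{
    if (code[0] != 'M') return "";
    std::string s = "matrix";

    switch (code[1]) {            // sparse/dense
    case 'C': s += " coordinate"; break;
    case 'A': s += " array"; break;
    default: return "";
    }
    switch (code[2]) {            // data type
    case 'R': s += " real"; break;
    case 'C': s += " complex"; break;
    case 'P': s += " pattern"; break;
    case 'I': s += " integer"; break;
    default: return "";
    }
    switch (code[3]) {            // storage scheme
    case 'G': s += " general"; break;
    case 'S': s += " symmetric"; break;
    case 'H': s += " hermitian"; break;
    case 'K': s += " skew-symmetric"; break;
    default: return "";
    }
    return s;
}
```

The sketch returns early on an unknown letter in any position, so a partially filled typecode never produces a partial description.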
53
examples_cuda/gmres/util.h
Normal file
@@ -0,0 +1,53 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2012, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
#ifndef __UTIL_H__
|
||||||
|
#define __UTIL_H__
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include "matrix.h"
|
||||||
|
|
||||||
|
|
||||||
|
inline void printMatrix (DenseMatrix &M, const char *name) {
|
||||||
|
printf("Matrix %s:\n", name);
|
||||||
|
for (int row = 0; row < M.rows(); row++) {
|
||||||
|
printf("row %2d: ", row + 1);
|
||||||
|
for (int col = 0; col < M.cols(); col++)
|
||||||
|
printf("%6f ", M(row, col));
|
||||||
|
printf("\n");
|
||||||
|
}
|
||||||
|
printf("\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
#endif
|
||||||
1798
examples_cuda/intrinsics/generic-16.h
Normal file
File diff suppressed because it is too large
1849
examples_cuda/intrinsics/generic-32.h
Normal file
File diff suppressed because it is too large
1982
examples_cuda/intrinsics/generic-64.h
Normal file
1982
examples_cuda/intrinsics/generic-64.h
Normal file
File diff suppressed because it is too large
Load Diff
2762
examples_cuda/intrinsics/knc-i1x16.h
Normal file
2762
examples_cuda/intrinsics/knc-i1x16.h
Normal file
File diff suppressed because it is too large
Load Diff
2818
examples_cuda/intrinsics/knc-i1x8.h
Normal file
2818
examples_cuda/intrinsics/knc-i1x8.h
Normal file
File diff suppressed because it is too large
Load Diff
86
examples_cuda/intrinsics/knc-i1x8unsafe_fast.h
Normal file
86
examples_cuda/intrinsics/knc-i1x8unsafe_fast.h
Normal file
@@ -0,0 +1,86 @@
#define __ZMM64BIT__
#include "knc-i1x8.h"

/* The following tests fail because on KNC the native vec8_i32 and vec8_float are 512 bits, not 256 bits, in size.
 *
 * Using test compiler: Intel(r) SPMD Program Compiler (ispc), 1.4.5dev (build commit d68dbbc7bce74803 @ 20130919, LLVM 3.3)
 * Using C/C++ compiler: icpc (ICC) 14.0.0 20130728
 *
 */

/* knc-i1x8unsafe_fast.h fails:
 * ----------------------------
1 / 1206 tests FAILED compilation:
    ./tests/ptr-assign-lhs-math-1.ispc
33 / 1206 tests FAILED execution:
    ./tests/array-gather-simple.ispc
    ./tests/array-gather-vary.ispc
    ./tests/array-multidim-gather-scatter.ispc
    ./tests/array-scatter-vary.ispc
    ./tests/atomics-5.ispc
    ./tests/atomics-swap.ispc
    ./tests/cfor-array-gather-vary.ispc
    ./tests/cfor-gs-improve-varying-1.ispc
    ./tests/cfor-struct-gather-2.ispc
    ./tests/cfor-struct-gather-3.ispc
    ./tests/cfor-struct-gather.ispc
    ./tests/gather-struct-vector.ispc
    ./tests/global-array-4.ispc
    ./tests/gs-improve-varying-1.ispc
    ./tests/half-1.ispc
    ./tests/half-3.ispc
    ./tests/half.ispc
    ./tests/launch-3.ispc
    ./tests/launch-4.ispc
    ./tests/masked-scatter-vector.ispc
    ./tests/masked-struct-scatter-varying.ispc
    ./tests/new-delete-6.ispc
    ./tests/ptr-24.ispc
    ./tests/ptr-25.ispc
    ./tests/short-vec-15.ispc
    ./tests/struct-gather-2.ispc
    ./tests/struct-gather-3.ispc
    ./tests/struct-gather.ispc
    ./tests/struct-ref-lvalue.ispc
    ./tests/struct-test-118.ispc
    ./tests/struct-vary-index-expr.ispc
    ./tests/typedef-2.ispc
    ./tests/vector-varying-scatter.ispc
*/

/* knc-i1x8.h fails:
 * ----------------------------
1 / 1206 tests FAILED compilation:
    ./tests/ptr-assign-lhs-math-1.ispc
3 / 1206 tests FAILED execution:
    ./tests/half-1.ispc
    ./tests/half-3.ispc
    ./tests/half.ispc
*/

/* knc-i1x8.h fails:
 * ----------------------------
1 / 1206 tests FAILED compilation:
    ./tests/ptr-assign-lhs-math-1.ispc
4 / 1206 tests FAILED execution:
    ./tests/half-1.ispc
    ./tests/half-3.ispc
    ./tests/half.ispc
    ./tests/test-141.ispc
*/

/* generic-16.h fails: (knc-i1x8.h and knc-i1x16.h are derived from this file)
 * ----------------------------
1 / 1206 tests FAILED compilation:
    ./tests/ptr-assign-lhs-math-1.ispc
6 / 1206 tests FAILED execution:
    ./tests/func-overload-max.ispc
    ./tests/half-1.ispc
    ./tests/half-3.ispc
    ./tests/half.ispc
    ./tests/test-141.ispc
    ./tests/test-143.ispc
*/
2144
examples_cuda/intrinsics/knc.h
Normal file
File diff suppressed because it is too large
2078
examples_cuda/intrinsics/knc2x.h
Normal file
File diff suppressed because it is too large
4114
examples_cuda/intrinsics/sse4.h
Normal file
File diff suppressed because it is too large
3
examples_cuda/mandelbrot/.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
mandelbrot
*.ppm
objs
8
examples_cuda/mandelbrot/Makefile
Normal file
@@ -0,0 +1,8 @@
EXAMPLE=mandelbrot
CPP_SRC=mandelbrot.cpp mandelbrot_serial.cpp
ISPC_SRC=mandelbrot.ispc
ISPC_IA_TARGETS=sse2,sse4-x2,avx-x2
ISPC_ARM_TARGETS=neon

include ../common.mk
BIN
examples_cuda/mandelbrot/avx.out
Executable file
Binary file not shown.
BIN
examples_cuda/mandelbrot/avx1.out
Executable file
Binary file not shown.
118
examples_cuda/mandelbrot/mandelbrot.cpp
Normal file
@@ -0,0 +1,118 @@
/*
  Copyright (c) 2010-2011, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
  IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
  TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
  OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#ifdef _MSC_VER
#define _CRT_SECURE_NO_WARNINGS
#define NOMINMAX
#pragma warning (disable: 4244)
#pragma warning (disable: 4305)
#endif

#include <stdio.h>
#include <algorithm>
#include "../timing.h"
#include "mandelbrot_ispc.h"
using namespace ispc;

extern void mandelbrot_serial(float x0, float y0, float x1, float y1,
                              int width, int height, int maxIterations,
                              int output[]);

/* Write a PPM image file with the image of the Mandelbrot set */
static void
writePPM(int *buf, int width, int height, const char *fn) {
    FILE *fp = fopen(fn, "wb");
    fprintf(fp, "P6\n");
    fprintf(fp, "%d %d\n", width, height);
    fprintf(fp, "255\n");
    for (int i = 0; i < width*height; ++i) {
        // Map the iteration count to colors by just alternating between
        // two greys.
        char c = (buf[i] & 0x1) ? 240 : 20;
        for (int j = 0; j < 3; ++j)
            fputc(c, fp);
    }
    fclose(fp);
    printf("Wrote image file %s\n", fn);
}


int main() {
    unsigned int width = 768;
    unsigned int height = 512;
    float x0 = -2;
    float x1 = 1;
    float y0 = -1;
    float y1 = 1;

    int maxIterations = 256;
    int *buf = new int[width*height];

    //
    // Compute the image using the ispc implementation; report the minimum
    // time of three runs.
    //
    double minISPC = 1e30;
    for (int i = 0; i < 3; ++i) {
        reset_and_start_timer();
        mandelbrot_ispc(x0, y0, x1, y1, width, height, maxIterations, buf);
        double dt = get_elapsed_mcycles();
        minISPC = std::min(minISPC, dt);
    }

    printf("[mandelbrot ispc]:\t\t[%.3f] million cycles\n", minISPC);
    writePPM(buf, width, height, "mandelbrot-ispc.ppm");

    // Clear out the buffer
    for (unsigned int i = 0; i < width * height; ++i)
        buf[i] = 0;

    //
    // And run the serial implementation 3 times, again reporting the
    // minimum time.
    //
    double minSerial = 1e30;
    for (int i = 0; i < 3; ++i) {
        reset_and_start_timer();
        mandelbrot_serial(x0, y0, x1, y1, width, height, maxIterations, buf);
        double dt = get_elapsed_mcycles();
        minSerial = std::min(minSerial, dt);
    }

    printf("[mandelbrot serial]:\t\t[%.3f] million cycles\n", minSerial);
    writePPM(buf, width, height, "mandelbrot-serial.ppm");

    printf("\t\t\t\t(%.2fx speedup from ISPC)\n", minSerial/minISPC);

    return 0;
}
78
examples_cuda/mandelbrot/mandelbrot.ispc
Normal file
@@ -0,0 +1,78 @@
/*
  Copyright (c) 2010-2012, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
  IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
  TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
  OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

static inline int mandel(float c_re, float c_im, int count) {
    float z_re = c_re, z_im = c_im;
    int i;
    for (i = 0; i < count; ++i) {
        if (z_re * z_re + z_im * z_im > 4.)
            break;

        float new_re = z_re*z_re - z_im*z_im;
        float new_im = 2.f * z_re * z_im;
        unmasked {
            z_re = c_re + new_re;
            z_im = c_im + new_im;
        }
    }

    return i;
}

export void mandelbrot_ispc(uniform float x0, uniform float y0,
                            uniform float x1, uniform float y1,
                            uniform int width, uniform int height,
                            uniform int maxIterations,
                            uniform int output[])
{
    float dx = (x1 - x0) / width;
    float dy = (y1 - y0) / height;

    for (uniform int j = 0; j < height; j++) {
        // Note that we'll be doing programCount computations in parallel;
        // foreach advances i by that much on each step and masks off any
        // leftover lanes, so width need not be a multiple of programCount.
        foreach (i = 0 ... width) {
            // Figure out the position on the complex plane to compute the
            // number of iterations at. Note that the x values are
            // different across different program instances, since its
            // initializer incorporates the value of the programIndex
            // variable.
            float x = x0 + i * dx;
            float y = y0 + j * dy;

            int index = j * width + i;
            output[index] = mandel(x, y, maxIterations);
        }
    }
}
175
examples_cuda/mandelbrot/mandelbrot.vcxproj
Normal file
@@ -0,0 +1,175 @@
<?xml version="1.0" encoding="utf-8"?>
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <ItemGroup Label="ProjectConfigurations">
    <ProjectConfiguration Include="Debug|Win32">
      <Configuration>Debug</Configuration>
      <Platform>Win32</Platform>
    </ProjectConfiguration>
    <ProjectConfiguration Include="Debug|x64">
      <Configuration>Debug</Configuration>
      <Platform>x64</Platform>
    </ProjectConfiguration>
    <ProjectConfiguration Include="Release|Win32">
      <Configuration>Release</Configuration>
      <Platform>Win32</Platform>
    </ProjectConfiguration>
    <ProjectConfiguration Include="Release|x64">
      <Configuration>Release</Configuration>
      <Platform>x64</Platform>
    </ProjectConfiguration>
  </ItemGroup>
  <PropertyGroup Label="Globals">
    <ProjectGuid>{6D3EF8C5-AE26-407B-9ECE-C27CB988D9C1}</ProjectGuid>
    <Keyword>Win32Proj</Keyword>
    <RootNamespace>mandelbrot</RootNamespace>
  </PropertyGroup>
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
    <ConfigurationType>Application</ConfigurationType>
    <UseDebugLibraries>true</UseDebugLibraries>
    <CharacterSet>Unicode</CharacterSet>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
    <ConfigurationType>Application</ConfigurationType>
    <UseDebugLibraries>true</UseDebugLibraries>
    <CharacterSet>Unicode</CharacterSet>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
    <ConfigurationType>Application</ConfigurationType>
    <UseDebugLibraries>false</UseDebugLibraries>
    <WholeProgramOptimization>true</WholeProgramOptimization>
    <CharacterSet>Unicode</CharacterSet>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
    <ConfigurationType>Application</ConfigurationType>
    <UseDebugLibraries>false</UseDebugLibraries>
    <WholeProgramOptimization>true</WholeProgramOptimization>
    <CharacterSet>Unicode</CharacterSet>
  </PropertyGroup>
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
  <ImportGroup Label="ExtensionSettings">
  </ImportGroup>
  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
  </ImportGroup>
  <ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
  </ImportGroup>
  <ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
  </ImportGroup>
  <ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
    <Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
  </ImportGroup>
  <PropertyGroup Label="UserMacros" />
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
    <LinkIncremental>true</LinkIncremental>
    <ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
    <LinkIncremental>true</LinkIncremental>
    <ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
    <LinkIncremental>false</LinkIncremental>
    <ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
  </PropertyGroup>
  <PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
    <LinkIncremental>false</LinkIncremental>
    <ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
  </PropertyGroup>
  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
    <ClCompile>
      <PrecompiledHeader>
      </PrecompiledHeader>
      <WarningLevel>Level3</WarningLevel>
      <Optimization>Disabled</Optimization>
      <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
      <AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
      <IntrinsicFunctions>true</IntrinsicFunctions>
      <FloatingPointModel>Fast</FloatingPointModel>
    </ClCompile>
    <Link>
      <SubSystem>Console</SubSystem>
      <GenerateDebugInformation>true</GenerateDebugInformation>
    </Link>
  </ItemDefinitionGroup>
  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
    <ClCompile>
      <PrecompiledHeader>
      </PrecompiledHeader>
      <WarningLevel>Level3</WarningLevel>
      <Optimization>Disabled</Optimization>
      <PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
      <AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
      <IntrinsicFunctions>true</IntrinsicFunctions>
      <FloatingPointModel>Fast</FloatingPointModel>
    </ClCompile>
    <Link>
      <SubSystem>Console</SubSystem>
      <GenerateDebugInformation>true</GenerateDebugInformation>
    </Link>
  </ItemDefinitionGroup>
  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
    <ClCompile>
      <WarningLevel>Level3</WarningLevel>
      <PrecompiledHeader>
      </PrecompiledHeader>
      <Optimization>MaxSpeed</Optimization>
      <FunctionLevelLinking>true</FunctionLevelLinking>
      <IntrinsicFunctions>true</IntrinsicFunctions>
      <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
      <AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
      <FloatingPointModel>Fast</FloatingPointModel>
    </ClCompile>
    <Link>
      <SubSystem>Console</SubSystem>
      <GenerateDebugInformation>true</GenerateDebugInformation>
      <EnableCOMDATFolding>true</EnableCOMDATFolding>
      <OptimizeReferences>true</OptimizeReferences>
    </Link>
  </ItemDefinitionGroup>
  <ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
    <ClCompile>
      <WarningLevel>Level3</WarningLevel>
      <PrecompiledHeader>
      </PrecompiledHeader>
      <Optimization>MaxSpeed</Optimization>
      <FunctionLevelLinking>true</FunctionLevelLinking>
      <IntrinsicFunctions>true</IntrinsicFunctions>
      <PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
      <AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
      <FloatingPointModel>Fast</FloatingPointModel>
    </ClCompile>
    <Link>
      <SubSystem>Console</SubSystem>
      <GenerateDebugInformation>true</GenerateDebugInformation>
      <EnableCOMDATFolding>true</EnableCOMDATFolding>
      <OptimizeReferences>true</OptimizeReferences>
    </Link>
  </ItemDefinitionGroup>
  <ItemGroup>
    <ClCompile Include="mandelbrot.cpp" />
    <ClCompile Include="mandelbrot_serial.cpp" />
  </ItemGroup>
  <ItemGroup>
    <CustomBuild Include="mandelbrot.ispc">
      <FileType>Document</FileType>
      <Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4-x2,avx-x2
</Command>
      <Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4-x2,avx-x2
</Command>
      <Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
      <Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
      <Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4-x2,avx-x2
</Command>
      <Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4-x2,avx-x2
</Command>
      <Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
      <Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
    </CustomBuild>
  </ItemGroup>
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
  <ImportGroup Label="ExtensionTargets">
  </ImportGroup>
</Project>
68
examples_cuda/mandelbrot/mandelbrot_serial.cpp
Normal file
@@ -0,0 +1,68 @@
/*
  Copyright (c) 2010-2011, Intel Corporation
  All rights reserved.

  Redistribution and use in source and binary forms, with or without
  modification, are permitted provided that the following conditions are
  met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of Intel Corporation nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.


  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
  IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
  TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
  OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
  EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
  PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
  LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
  NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/


static int mandel(float c_re, float c_im, int count) {
    float z_re = c_re, z_im = c_im;
    int i;
    for (i = 0; i < count; ++i) {
        if (z_re * z_re + z_im * z_im > 4.f)
            break;

        float new_re = z_re*z_re - z_im*z_im;
        float new_im = 2.f * z_re * z_im;
        z_re = c_re + new_re;
        z_im = c_im + new_im;
    }

    return i;
}

void mandelbrot_serial(float x0, float y0, float x1, float y1,
                       int width, int height, int maxIterations,
                       int output[])
{
    float dx = (x1 - x0) / width;
    float dy = (y1 - y0) / height;

    for (int j = 0; j < height; j++) {
        for (int i = 0; i < width; ++i) {
            float x = x0 + i * dx;
            float y = y0 + j * dy;

            int index = (j * width + i);
            output[index] = mandel(x, y, maxIterations);
        }
    }
}
BIN
examples_cuda/mandelbrot/out.o
Normal file
Binary file not shown.
843
examples_cuda/mandelbrot/out.ptx
Normal file
@@ -0,0 +1,843 @@
|
|||||||
|
//
|
||||||
|
// Generated by LLVM NVPTX Back-End
|
||||||
|
//
|
||||||
|
|
||||||
|
.version 3.1
|
||||||
|
.target sm_35, texmode_independent
|
||||||
|
.address_size 64
|
||||||
|
|
||||||
|
// .globl __vselect_i8
|
||||||
|
// @__vselect_i8
|
||||||
|
.func (.param .align 1 .b8 func_retval0[1]) __vselect_i8(
|
||||||
|
.param .align 1 .b8 __vselect_i8_param_0[1],
|
||||||
|
.param .align 1 .b8 __vselect_i8_param_1[1],
|
||||||
|
.param .align 4 .b8 __vselect_i8_param_2[4]
|
||||||
|
)
|
||||||
|
{
|
||||||
|
.reg .pred %p<396>;
|
||||||
|
.reg .s16 %rc<396>;
|
||||||
|
.reg .s16 %rs<396>;
|
||||||
|
.reg .s32 %r<396>;
|
||||||
|
.reg .s64 %rl<396>;
|
||||||
|
.reg .f32 %f<396>;
|
||||||
|
.reg .f64 %fl<396>;
|
||||||
|
|
||||||
|
// BB#0:
|
||||||
|
ld.param.u32 %r0, [__vselect_i8_param_2];
|
||||||
|
setp.eq.s32 %p0, %r0, 0;
|
||||||
|
ld.param.u8 %rc0, [__vselect_i8_param_0];
|
||||||
|
ld.param.u8 %rc1, [__vselect_i8_param_1];
|
||||||
|
selp.b16 %rc0, %rc0, %rc1, %p0;
|
||||||
|
st.param.b8 [func_retval0+0], %rc0;
|
||||||
|
ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
// .globl __vselect_i16
|
||||||
|
.func (.param .align 2 .b8 func_retval0[2]) __vselect_i16(
|
||||||
|
.param .align 2 .b8 __vselect_i16_param_0[2],
|
||||||
|
.param .align 2 .b8 __vselect_i16_param_1[2],
|
||||||
|
.param .align 4 .b8 __vselect_i16_param_2[4]
|
||||||
|
) // @__vselect_i16
|
||||||
|
{
|
||||||
|
.reg .pred %p<396>;
|
||||||
|
.reg .s16 %rc<396>;
|
||||||
|
.reg .s16 %rs<396>;
|
||||||
|
.reg .s32 %r<396>;
|
||||||
|
.reg .s64 %rl<396>;
|
||||||
|
.reg .f32 %f<396>;
|
||||||
|
.reg .f64 %fl<396>;
|
||||||
|
|
||||||
|
// BB#0:
|
||||||
|
ld.param.u32 %r0, [__vselect_i16_param_2];
|
||||||
|
setp.eq.s32 %p0, %r0, 0;
|
||||||
|
ld.param.u16 %rs0, [__vselect_i16_param_0];
|
||||||
|
ld.param.u16 %rs1, [__vselect_i16_param_1];
|
||||||
|
selp.b16 %rs0, %rs0, %rs1, %p0;
|
||||||
|
st.param.b16 [func_retval0+0], %rs0;
|
||||||
|
ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
// .globl __vselect_i64
|
||||||
|
.func (.param .align 8 .b8 func_retval0[8]) __vselect_i64(
|
||||||
|
.param .align 8 .b8 __vselect_i64_param_0[8],
|
||||||
|
.param .align 8 .b8 __vselect_i64_param_1[8],
|
||||||
|
.param .align 4 .b8 __vselect_i64_param_2[4]
|
||||||
|
) // @__vselect_i64
|
||||||
|
{
|
||||||
|
.reg .pred %p<396>;
|
||||||
|
.reg .s16 %rc<396>;
|
||||||
|
.reg .s16 %rs<396>;
|
||||||
|
.reg .s32 %r<396>;
|
||||||
|
.reg .s64 %rl<396>;
|
||||||
|
.reg .f32 %f<396>;
|
||||||
|
.reg .f64 %fl<396>;
|
||||||
|
|
||||||
|
// BB#0:
|
||||||
|
ld.param.u32 %r0, [__vselect_i64_param_2];
|
||||||
|
setp.eq.s32 %p0, %r0, 0;
|
||||||
|
ld.param.u64 %rl0, [__vselect_i64_param_0];
|
||||||
|
ld.param.u64 %rl1, [__vselect_i64_param_1];
|
||||||
|
selp.b64 %rl0, %rl0, %rl1, %p0;
|
||||||
|
st.param.b64 [func_retval0+0], %rl0;
|
||||||
|
ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
// .globl __aos_to_soa4_float1
.func __aos_to_soa4_float1(
    .param .align 4 .b8 __aos_to_soa4_float1_param_0[4],
    .param .align 4 .b8 __aos_to_soa4_float1_param_1[4],
    .param .align 4 .b8 __aos_to_soa4_float1_param_2[4],
    .param .align 4 .b8 __aos_to_soa4_float1_param_3[4],
    .param .b64 __aos_to_soa4_float1_param_4,
    .param .b64 __aos_to_soa4_float1_param_5,
    .param .b64 __aos_to_soa4_float1_param_6,
    .param .b64 __aos_to_soa4_float1_param_7
) // @__aos_to_soa4_float1
{
    .reg .pred %p<396>;
    .reg .s16 %rc<396>;
    .reg .s16 %rs<396>;
    .reg .s32 %r<396>;
    .reg .s64 %rl<396>;
    .reg .f32 %f<396>;
    .reg .f64 %fl<396>;

// BB#0:
    ld.param.u64 %rl0, [__aos_to_soa4_float1_param_4];
    ld.param.u64 %rl1, [__aos_to_soa4_float1_param_5];
    ld.param.u64 %rl2, [__aos_to_soa4_float1_param_6];
    ld.param.u64 %rl3, [__aos_to_soa4_float1_param_7];
    ld.param.f32 %f0, [__aos_to_soa4_float1_param_0];
    ld.param.f32 %f1, [__aos_to_soa4_float1_param_1];
    ld.param.f32 %f2, [__aos_to_soa4_float1_param_2];
    ld.param.f32 %f3, [__aos_to_soa4_float1_param_3];
    st.f32 [%rl0], %f0;
    st.f32 [%rl1], %f1;
    st.f32 [%rl2], %f2;
    st.f32 [%rl3], %f3;
    ret;
}

// .globl __soa_to_aos4_float1
.func __soa_to_aos4_float1(
    .param .align 4 .b8 __soa_to_aos4_float1_param_0[4],
    .param .align 4 .b8 __soa_to_aos4_float1_param_1[4],
    .param .align 4 .b8 __soa_to_aos4_float1_param_2[4],
    .param .align 4 .b8 __soa_to_aos4_float1_param_3[4],
    .param .b64 __soa_to_aos4_float1_param_4,
    .param .b64 __soa_to_aos4_float1_param_5,
    .param .b64 __soa_to_aos4_float1_param_6,
    .param .b64 __soa_to_aos4_float1_param_7
) // @__soa_to_aos4_float1
{
    .reg .pred %p<396>;
    .reg .s16 %rc<396>;
    .reg .s16 %rs<396>;
    .reg .s32 %r<396>;
    .reg .s64 %rl<396>;
    .reg .f32 %f<396>;
    .reg .f64 %fl<396>;

// BB#0:
    ld.param.u64 %rl0, [__soa_to_aos4_float1_param_4];
    ld.param.u64 %rl1, [__soa_to_aos4_float1_param_5];
    ld.param.u64 %rl2, [__soa_to_aos4_float1_param_6];
    ld.param.u64 %rl3, [__soa_to_aos4_float1_param_7];
    ld.param.f32 %f0, [__soa_to_aos4_float1_param_0];
    ld.param.f32 %f1, [__soa_to_aos4_float1_param_1];
    ld.param.f32 %f2, [__soa_to_aos4_float1_param_2];
    ld.param.f32 %f3, [__soa_to_aos4_float1_param_3];
    st.f32 [%rl0], %f0;
    st.f32 [%rl1], %f1;
    st.f32 [%rl2], %f2;
    st.f32 [%rl3], %f3;
    ret;
}

// .globl __aos_to_soa3_float1
.func __aos_to_soa3_float1(
    .param .align 4 .b8 __aos_to_soa3_float1_param_0[4],
    .param .align 4 .b8 __aos_to_soa3_float1_param_1[4],
    .param .align 4 .b8 __aos_to_soa3_float1_param_2[4],
    .param .b64 __aos_to_soa3_float1_param_3,
    .param .b64 __aos_to_soa3_float1_param_4,
    .param .b64 __aos_to_soa3_float1_param_5
) // @__aos_to_soa3_float1
{
    .reg .pred %p<396>;
    .reg .s16 %rc<396>;
    .reg .s16 %rs<396>;
    .reg .s32 %r<396>;
    .reg .s64 %rl<396>;
    .reg .f32 %f<396>;
    .reg .f64 %fl<396>;

// BB#0:
    ld.param.u64 %rl0, [__aos_to_soa3_float1_param_3];
    ld.param.u64 %rl1, [__aos_to_soa3_float1_param_4];
    ld.param.u64 %rl2, [__aos_to_soa3_float1_param_5];
    ld.param.f32 %f0, [__aos_to_soa3_float1_param_0];
    ld.param.f32 %f1, [__aos_to_soa3_float1_param_1];
    ld.param.f32 %f2, [__aos_to_soa3_float1_param_2];
    st.f32 [%rl0], %f0;
    st.f32 [%rl1], %f1;
    st.f32 [%rl2], %f2;
    ret;
}

// .globl __soa_to_aos3_float1
.func __soa_to_aos3_float1(
    .param .align 4 .b8 __soa_to_aos3_float1_param_0[4],
    .param .align 4 .b8 __soa_to_aos3_float1_param_1[4],
    .param .align 4 .b8 __soa_to_aos3_float1_param_2[4],
    .param .b64 __soa_to_aos3_float1_param_3,
    .param .b64 __soa_to_aos3_float1_param_4,
    .param .b64 __soa_to_aos3_float1_param_5
) // @__soa_to_aos3_float1
{
    .reg .pred %p<396>;
    .reg .s16 %rc<396>;
    .reg .s16 %rs<396>;
    .reg .s32 %r<396>;
    .reg .s64 %rl<396>;
    .reg .f32 %f<396>;
    .reg .f64 %fl<396>;

// BB#0:
    ld.param.u64 %rl0, [__soa_to_aos3_float1_param_3];
    ld.param.u64 %rl1, [__soa_to_aos3_float1_param_4];
    ld.param.u64 %rl2, [__soa_to_aos3_float1_param_5];
    ld.param.f32 %f0, [__soa_to_aos3_float1_param_0];
    ld.param.f32 %f1, [__soa_to_aos3_float1_param_1];
    ld.param.f32 %f2, [__soa_to_aos3_float1_param_2];
    st.f32 [%rl0], %f0;
    st.f32 [%rl1], %f1;
    st.f32 [%rl2], %f2;
    ret;
}

// .globl __rsqrt_varying_double
.func (.param .align 8 .b8 func_retval0[8]) __rsqrt_varying_double(
    .param .align 8 .b8 __rsqrt_varying_double_param_0[8]
) // @__rsqrt_varying_double
{
    .reg .pred %p<396>;
    .reg .s16 %rc<396>;
    .reg .s16 %rs<396>;
    .reg .s32 %r<396>;
    .reg .s64 %rl<396>;
    .reg .f32 %f<396>;
    .reg .f64 %fl<396>;

// BB#0:
    ld.param.f64 %fl0, [__rsqrt_varying_double_param_0];
    rsqrt.approx.f64 %fl0, %fl0;
    st.param.f64 [func_retval0+0], %fl0;
    ret;
}

// .globl mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E_
.func mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E_(
    .param .b32 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_0,
    .param .b32 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_1,
    .param .b32 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_2,
    .param .b32 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_3,
    .param .b32 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_4,
    .param .b32 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_5,
    .param .b32 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_6,
    .param .b64 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_7,
    .param .align 4 .b8 mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_8[4]
) // @mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E_
{
    .reg .pred %p<396>;
    .reg .s16 %rc<396>;
    .reg .s16 %rs<396>;
    .reg .s32 %r<396>;
    .reg .s64 %rl<396>;
    .reg .f32 %f<396>;
    .reg .f64 %fl<396>;

// BB#0: // %allocas
    ld.param.f32 %f0, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_0];
    ld.param.f32 %f1, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_1];
    ld.param.f32 %f3, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_2];
    ld.param.f32 %f2, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_3];
    ld.param.u32 %r0, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_4];
    ld.param.u32 %r1, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_5];
    ld.param.u32 %r2, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_6];
    ld.param.u64 %rl0, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_7];
    ld.param.u32 %r3, [mandelbrot_ispc___unfunfunfunfuniuniuniun_3C_uni_3E__param_8];
    setp.lt.s32 %p0, %r3, 0;
    sub.f32 %f3, %f3, %f0;
    cvt.rn.f32.s32 %f4, %r0;
    sub.f32 %f2, %f2, %f1;
    cvt.rn.f32.s32 %f5, %r1;
    div.rn.f32 %f2, %f2, %f5;
    div.rn.f32 %f3, %f3, %f4;
    @%p0 bra BB8_9;
// BB#1: // %for_test110.preheader
    setp.lt.s32 %p0, %r1, 1;
    @%p0 bra BB8_45;
// BB#2: // %outer_not_in_extras140.preheader.lr.ph
    setp.gt.s32 %p0, %r2, 0;
    mov.u32 %r3, 0;
    selp.b32 %r4, -1, 0, %p0;
    shl.b32 %r5, %r0, 2;
    mov.u32 %r6, %r3;
BB8_3: // %outer_not_in_extras140.preheader
    // =>This Loop Header: Depth=1
    // Child Loop BB8_41 Depth 2
    // Child Loop BB8_43 Depth 2
    // Child Loop BB8_38 Depth 2
    // Child Loop BB8_33 Depth 3
    setp.lt.s32 %p0, %r0, 1;
    @%p0 bra BB8_4;
// BB#31: // %foreach_full_body120.lr.ph
    // in Loop: Header=BB8_3 Depth=1
    setp.lt.s32 %p0, %r4, 0;
    mov.u32 %r7, %r0;
    mov.u32 %r8, %r3;
    @%p0 bra BB8_32;
    bra.uni BB8_43;
BB8_32: // in Loop: Header=BB8_3 Depth=1
    mov.u64 %rl1, 0;
    cvt.rn.f32.s32 %f4, %r6;
    fma.rn.f32 %f4, %f2, %f4, %f1;
    mul.lo.s32 %r7, %r6, %r0;
BB8_38: // %for_loop.i380.lr.ph.us
    // Parent Loop BB8_3 Depth=1
    // => This Loop Header: Depth=2
    // Child Loop BB8_33 Depth 3
    cvt.u32.u64 %r8, %rl1;
    cvt.rn.f32.s32 %f5, %r8;
    fma.rn.f32 %f5, %f3, %f5, %f0;
    mov.u32 %r10, 0;
    mov.u32 %r12, %r4;
    mov.u32 %r11, %r10;
    mov.u32 %r9, %r10;
    mov.f32 %f7, %f5;
    mov.f32 %f6, %f4;
BB8_33: // %for_loop.i380.us
    // Parent Loop BB8_3 Depth=1
    // Parent Loop BB8_38 Depth=2
    // => This Inner Loop Header: Depth=3
    mul.f32 %f8, %f7, %f7;
    fma.rn.f32 %f9, %f6, %f6, %f8;
    setp.gtu.f32 %p0, %f9, 0f40800000;
    selp.b32 %r13, %r12, 0, %p0;
    or.b32 %r11, %r13, %r11;
    shr.u32 %r13, %r11, 31;
    shr.u32 %r14, %r12, 31;
    setp.eq.s32 %p0, %r13, %r14;
    @%p0 bra BB8_34;
    bra.uni BB8_35;
BB8_34: // in Loop: Header=BB8_33 Depth=3
    mov.u32 %r12, %r10;
    bra.uni BB8_36;
BB8_35: // %not_all_continued_or_breaked.i394.us
    // in Loop: Header=BB8_33 Depth=3
    mul.f32 %f9, %f6, %f6;
    not.b32 %r13, %r11;
    and.b32 %r12, %r12, %r13;
    sub.f32 %f8, %f8, %f9;
    add.f32 %f8, %f5, %f8;
    add.f32 %f7, %f7, %f7;
    fma.rn.f32 %f6, %f6, %f7, %f4;
    mov.f32 %f7, %f8;
BB8_36: // %for_step.i363.us
    // in Loop: Header=BB8_33 Depth=3
    setp.ne.s32 %p0, %r12, 0;
    selp.u32 %r13, 1, 0, %p0;
    add.s32 %r9, %r9, %r13;
    setp.lt.s32 %p0, %r9, %r2;
    selp.b32 %r12, %r12, 0, %p0;
    setp.lt.s32 %p0, %r12, 0;
    @%p0 bra BB8_33;
// BB#37: // %mandel___vyfvyfvyi.exit395.us
    // in Loop: Header=BB8_38 Depth=2
    add.s32 %r8, %r8, %r7;
    shl.b32 %r8, %r8, 2;
    cvt.s64.s32 %rl2, %r8;
    add.s64 %rl2, %rl2, %rl0;
    st.u32 [%rl2], %r9;
    add.s64 %rl1, %rl1, 1;
    cvt.u32.u64 %r8, %rl1;
    setp.eq.s32 %p0, %r8, %r0;
    @%p0 bra BB8_44;
    bra.uni BB8_38;
BB8_43: // %mandel___vyfvyfvyi.exit395
    // Parent Loop BB8_3 Depth=1
    // => This Inner Loop Header: Depth=2
    cvt.s64.s32 %rl1, %r8;
    add.s64 %rl1, %rl1, %rl0;
    mov.u32 %r9, 0;
    st.u32 [%rl1], %r9;
    add.s32 %r8, %r8, 4;
    add.s32 %r7, %r7, -1;
    setp.eq.s32 %p0, %r7, 0;
    @%p0 bra BB8_44;
    bra.uni BB8_43;
BB8_4: // %partial_inner_all_outer156
    // in Loop: Header=BB8_3 Depth=1
    @%p0 bra BB8_44;
// BB#5: // %partial_inner_only197
    // in Loop: Header=BB8_3 Depth=1
    setp.gt.s32 %p0, %r0, 0;
    mov.u32 %r8, 0;
    fma.rn.f32 %f4, %f3, 0f00000000, %f0;
    cvt.rn.f32.s32 %f5, %r6;
    fma.rn.f32 %f5, %f2, %f5, %f1;
    selp.b32 %r7, %r4, 0, %p0;
    setp.lt.s32 %p1, %r7, 0;
    mov.u32 %r10, %r4;
    mov.u32 %r9, %r8;
    mov.u32 %r7, %r8;
    mov.f32 %f7, %f4;
    mov.f32 %f6, %f5;
    @%p1 bra BB8_41;
    bra.uni BB8_6;
BB8_41: // %for_loop.i
    // Parent Loop BB8_3 Depth=1
    // => This Inner Loop Header: Depth=2
    selp.b32 %r11, %r10, 0, %p0;
    mul.f32 %f8, %f7, %f7;
    fma.rn.f32 %f9, %f6, %f6, %f8;
    setp.gtu.f32 %p1, %f9, 0f40800000;
    selp.b32 %r12, %r10, 0, %p1;
    or.b32 %r9, %r12, %r9;
    selp.b32 %r12, %r9, 0, %p0;
    shr.u32 %r12, %r12, 31;
    shr.u32 %r11, %r11, 31;
    setp.eq.s32 %p1, %r12, %r11;
    @%p1 bra BB8_42;
    bra.uni BB8_39;
BB8_42: // in Loop: Header=BB8_41 Depth=2
    mov.u32 %r10, %r8;
    bra.uni BB8_40;
BB8_39: // %not_all_continued_or_breaked.i
    // in Loop: Header=BB8_41 Depth=2
    mul.f32 %f9, %f6, %f6;
    not.b32 %r11, %r9;
    and.b32 %r10, %r10, %r11;
    sub.f32 %f8, %f8, %f9;
    add.f32 %f8, %f4, %f8;
    add.f32 %f7, %f7, %f7;
    fma.rn.f32 %f6, %f6, %f7, %f5;
    mov.f32 %f7, %f8;
BB8_40: // %for_step.i
    // in Loop: Header=BB8_41 Depth=2
    setp.ne.s32 %p1, %r10, 0;
    selp.u32 %r11, 1, 0, %p1;
    add.s32 %r7, %r7, %r11;
    setp.lt.s32 %p1, %r7, %r2;
    selp.b32 %r10, %r10, 0, %p1;
    selp.b32 %r11, %r10, 0, %p0;
    setp.gt.s32 %p1, %r11, -1;
    @%p1 bra BB8_7;
    bra.uni BB8_41;
BB8_6: // in Loop: Header=BB8_3 Depth=1
    mov.u32 %r7, %r8;
BB8_7: // %mandel___vyfvyfvyi.exit
    // in Loop: Header=BB8_3 Depth=1
    setp.lt.s32 %p0, %r0, 1;
    @%p0 bra BB8_44;
// BB#8: // %pl_dolane.i
    // in Loop: Header=BB8_3 Depth=1
    mul.lo.s32 %r8, %r6, %r0;
    shl.b32 %r8, %r8, 2;
    cvt.s64.s32 %rl1, %r8;
    add.s64 %rl1, %rl1, %rl0;
    st.u32 [%rl1], %r7;
BB8_44: // %foreach_reset128
    // in Loop: Header=BB8_3 Depth=1
    add.s32 %r6, %r6, 1;
    add.s32 %r3, %r3, %r5;
    setp.eq.s32 %p0, %r6, %r1;
    @%p0 bra BB8_45;
    bra.uni BB8_3;
BB8_9: // %for_test.preheader
    setp.lt.s32 %p0, %r1, 1;
    @%p0 bra BB8_45;
// BB#10: // %outer_not_in_extras.preheader.lr.ph
    setp.gt.s32 %p0, %r2, 0;
    mov.u32 %r3, 0;
    selp.b32 %r4, -1, 0, %p0;
    shl.b32 %r5, %r0, 2;
    mov.u32 %r6, %r3;
BB8_11: // %outer_not_in_extras.preheader
    // =>This Loop Header: Depth=1
    // Child Loop BB8_23 Depth 2
    // Child Loop BB8_20 Depth 2
    // Child Loop BB8_19 Depth 2
    // Child Loop BB8_14 Depth 3
    setp.lt.s32 %p0, %r0, 1;
    @%p0 bra BB8_28;
// BB#12: // %foreach_full_body.lr.ph
    // in Loop: Header=BB8_11 Depth=1
    setp.lt.s32 %p0, %r4, 0;
    mov.u32 %r7, %r0;
    mov.u32 %r8, %r3;
    @%p0 bra BB8_13;
    bra.uni BB8_20;
BB8_13: // in Loop: Header=BB8_11 Depth=1
    mov.u64 %rl1, 0;
    cvt.rn.f32.s32 %f4, %r6;
    fma.rn.f32 %f4, %f2, %f4, %f1;
    mul.lo.s32 %r7, %r6, %r0;
BB8_19: // %for_loop.i281.lr.ph.us
    // Parent Loop BB8_11 Depth=1
    // => This Loop Header: Depth=2
    // Child Loop BB8_14 Depth 3
    cvt.u32.u64 %r8, %rl1;
    cvt.rn.f32.s32 %f5, %r8;
    fma.rn.f32 %f5, %f3, %f5, %f0;
    mov.u32 %r10, 0;
    mov.u32 %r12, %r4;
    mov.u32 %r11, %r10;
    mov.u32 %r9, %r10;
    mov.f32 %f7, %f5;
    mov.f32 %f6, %f4;
BB8_14: // %for_loop.i281.us
    // Parent Loop BB8_11 Depth=1
    // Parent Loop BB8_19 Depth=2
    // => This Inner Loop Header: Depth=3
    mul.f32 %f8, %f7, %f7;
    fma.rn.f32 %f9, %f6, %f6, %f8;
    setp.gtu.f32 %p0, %f9, 0f40800000;
    selp.b32 %r13, %r12, 0, %p0;
    or.b32 %r11, %r13, %r11;
    shr.u32 %r13, %r11, 31;
    shr.u32 %r14, %r12, 31;
    setp.eq.s32 %p0, %r13, %r14;
    @%p0 bra BB8_15;
    bra.uni BB8_16;
BB8_15: // in Loop: Header=BB8_14 Depth=3
    mov.u32 %r12, %r10;
    bra.uni BB8_17;
BB8_16: // %not_all_continued_or_breaked.i295.us
    // in Loop: Header=BB8_14 Depth=3
    mul.f32 %f9, %f6, %f6;
    not.b32 %r13, %r11;
    and.b32 %r12, %r12, %r13;
    sub.f32 %f8, %f8, %f9;
    add.f32 %f8, %f5, %f8;
    add.f32 %f7, %f7, %f7;
    fma.rn.f32 %f6, %f6, %f7, %f4;
    mov.f32 %f7, %f8;
BB8_17: // %for_step.i264.us
    // in Loop: Header=BB8_14 Depth=3
    setp.ne.s32 %p0, %r12, 0;
    selp.u32 %r13, 1, 0, %p0;
    add.s32 %r9, %r9, %r13;
    setp.lt.s32 %p0, %r9, %r2;
    selp.b32 %r12, %r12, 0, %p0;
    setp.lt.s32 %p0, %r12, 0;
    @%p0 bra BB8_14;
// BB#18: // %mandel___vyfvyfvyi.exit296.us
    // in Loop: Header=BB8_19 Depth=2
    add.s32 %r8, %r8, %r7;
    shl.b32 %r8, %r8, 2;
    cvt.s64.s32 %rl2, %r8;
    add.s64 %rl2, %rl2, %rl0;
    st.u32 [%rl2], %r9;
    add.s64 %rl1, %rl1, 1;
    cvt.u32.u64 %r8, %rl1;
    setp.eq.s32 %p0, %r8, %r0;
    @%p0 bra BB8_27;
    bra.uni BB8_19;
BB8_20: // %mandel___vyfvyfvyi.exit296
    // Parent Loop BB8_11 Depth=1
    // => This Inner Loop Header: Depth=2
    cvt.s64.s32 %rl1, %r8;
    add.s64 %rl1, %rl1, %rl0;
    mov.u32 %r9, 0;
    st.u32 [%rl1], %r9;
    add.s32 %r8, %r8, 4;
    add.s32 %r7, %r7, -1;
    setp.eq.s32 %p0, %r7, 0;
    @%p0 bra BB8_27;
    bra.uni BB8_20;
BB8_28: // %partial_inner_all_outer
    // in Loop: Header=BB8_11 Depth=1
    @%p0 bra BB8_27;
// BB#29: // %partial_inner_only
    // in Loop: Header=BB8_11 Depth=1
    setp.gt.s32 %p0, %r0, 0;
    mov.u32 %r8, 0;
    fma.rn.f32 %f4, %f3, 0f00000000, %f0;
    cvt.rn.f32.s32 %f5, %r6;
    fma.rn.f32 %f5, %f2, %f5, %f1;
    selp.b32 %r7, %r4, 0, %p0;
    setp.lt.s32 %p1, %r7, 0;
    mov.u32 %r10, %r4;
    mov.u32 %r9, %r8;
    mov.u32 %r7, %r8;
    mov.f32 %f7, %f4;
    mov.f32 %f6, %f5;
    @%p1 bra BB8_23;
    bra.uni BB8_30;
BB8_23: // %for_loop.i332
    // Parent Loop BB8_11 Depth=1
    // => This Inner Loop Header: Depth=2
    selp.b32 %r11, %r10, 0, %p0;
    mul.f32 %f8, %f7, %f7;
    fma.rn.f32 %f9, %f6, %f6, %f8;
    setp.gtu.f32 %p1, %f9, 0f40800000;
    selp.b32 %r12, %r10, 0, %p1;
    or.b32 %r9, %r12, %r9;
    selp.b32 %r12, %r9, 0, %p0;
    shr.u32 %r12, %r12, 31;
    shr.u32 %r11, %r11, 31;
    setp.eq.s32 %p1, %r12, %r11;
    @%p1 bra BB8_24;
    bra.uni BB8_21;
BB8_24: // in Loop: Header=BB8_23 Depth=2
    mov.u32 %r10, %r8;
    bra.uni BB8_22;
BB8_21: // %not_all_continued_or_breaked.i346
    // in Loop: Header=BB8_23 Depth=2
    mul.f32 %f9, %f6, %f6;
    not.b32 %r11, %r9;
    and.b32 %r10, %r10, %r11;
    sub.f32 %f8, %f8, %f9;
    add.f32 %f8, %f4, %f8;
    add.f32 %f7, %f7, %f7;
    fma.rn.f32 %f6, %f6, %f7, %f5;
    mov.f32 %f7, %f8;
BB8_22: // %for_step.i313
    // in Loop: Header=BB8_23 Depth=2
    setp.ne.s32 %p1, %r10, 0;
    selp.u32 %r11, 1, 0, %p1;
    add.s32 %r7, %r7, %r11;
    setp.lt.s32 %p1, %r7, %r2;
    selp.b32 %r10, %r10, 0, %p1;
    selp.b32 %r11, %r10, 0, %p0;
    setp.gt.s32 %p1, %r11, -1;
    @%p1 bra BB8_25;
    bra.uni BB8_23;
BB8_30: // in Loop: Header=BB8_11 Depth=1
    mov.u32 %r7, %r8;
BB8_25: // %mandel___vyfvyfvyi.exit347
    // in Loop: Header=BB8_11 Depth=1
    setp.lt.s32 %p0, %r0, 1;
    @%p0 bra BB8_27;
// BB#26: // %pl_dolane.i452
    // in Loop: Header=BB8_11 Depth=1
    mul.lo.s32 %r8, %r6, %r0;
    shl.b32 %r8, %r8, 2;
    cvt.s64.s32 %rl1, %r8;
    add.s64 %rl1, %rl1, %rl0;
    st.u32 [%rl1], %r7;
BB8_27: // %foreach_reset
    // in Loop: Header=BB8_11 Depth=1
    add.s32 %r6, %r6, 1;
    add.s32 %r3, %r3, %r5;
    setp.eq.s32 %p0, %r6, %r1;
    @%p0 bra BB8_45;
    bra.uni BB8_11;
BB8_45: // %for_exit
    ret;
}

// .globl mandelbrot_ispc
.func mandelbrot_ispc(
    .param .b32 mandelbrot_ispc_param_0,
    .param .b32 mandelbrot_ispc_param_1,
    .param .b32 mandelbrot_ispc_param_2,
    .param .b32 mandelbrot_ispc_param_3,
    .param .b32 mandelbrot_ispc_param_4,
    .param .b32 mandelbrot_ispc_param_5,
    .param .b32 mandelbrot_ispc_param_6,
    .param .b64 mandelbrot_ispc_param_7
) // @mandelbrot_ispc
{
    .reg .pred %p<396>;
    .reg .s16 %rc<396>;
    .reg .s16 %rs<396>;
    .reg .s32 %r<396>;
    .reg .s64 %rl<396>;
    .reg .f32 %f<396>;
    .reg .f64 %fl<396>;

// BB#0: // %allocas
    ld.param.u32 %r0, [mandelbrot_ispc_param_5];
    setp.lt.s32 %p0, %r0, 1;
    @%p0 bra BB9_18;
// BB#1: // %outer_not_in_extras.preheader.lr.ph
    ld.param.f32 %f0, [mandelbrot_ispc_param_0];
    ld.param.f32 %f1, [mandelbrot_ispc_param_1];
    ld.param.f32 %f3, [mandelbrot_ispc_param_2];
    ld.param.f32 %f2, [mandelbrot_ispc_param_3];
    ld.param.u32 %r1, [mandelbrot_ispc_param_4];
    ld.param.u32 %r2, [mandelbrot_ispc_param_6];
    ld.param.u64 %rl0, [mandelbrot_ispc_param_7];
    sub.f32 %f3, %f3, %f0;
    cvt.rn.f32.s32 %f4, %r1;
    sub.f32 %f2, %f2, %f1;
    cvt.rn.f32.s32 %f5, %r0;
    div.rn.f32 %f2, %f2, %f5;
    div.rn.f32 %f3, %f3, %f4;
    setp.gt.s32 %p0, %r2, 0;
    mov.u32 %r3, 0;
    selp.b32 %r4, -1, 0, %p0;
BB9_2: // %outer_not_in_extras.preheader
    // =>This Loop Header: Depth=1
    // Child Loop BB9_13 Depth 2
    // Child Loop BB9_4 Depth 2
    // Child Loop BB9_9 Depth 3
    setp.lt.s32 %p0, %r1, 1;
    @%p0 bra BB9_19;
// BB#3: // %foreach_full_body.lr.ph
    // in Loop: Header=BB9_2 Depth=1
    mov.u64 %rl1, 0;
    cvt.rn.f32.s32 %f4, %r3;
    fma.rn.f32 %f4, %f2, %f4, %f1;
    mul.lo.s32 %r5, %r3, %r1;
BB9_4: // %foreach_full_body
    // Parent Loop BB9_2 Depth=1
    // => This Loop Header: Depth=2
    // Child Loop BB9_9 Depth 3
    setp.lt.s32 %p0, %r4, 0;
    cvt.u32.u64 %r6, %rl1;
    cvt.rn.f32.s32 %f5, %r6;
    fma.rn.f32 %f5, %f3, %f5, %f0;
    mov.u32 %r8, 0;
    mov.u32 %r10, %r4;
    mov.u32 %r9, %r8;
    mov.u32 %r7, %r8;
    mov.f32 %f7, %f5;
    mov.f32 %f6, %f4;
    @%p0 bra BB9_9;
    bra.uni BB9_5;
BB9_9: // %for_loop.i281
    // Parent Loop BB9_2 Depth=1
    // Parent Loop BB9_4 Depth=2
    // => This Inner Loop Header: Depth=3
    mul.f32 %f8, %f7, %f7;
    fma.rn.f32 %f9, %f6, %f6, %f8;
    setp.gtu.f32 %p0, %f9, 0f40800000;
    selp.b32 %r11, %r10, 0, %p0;
    or.b32 %r9, %r11, %r9;
    shr.u32 %r11, %r9, 31;
    shr.u32 %r12, %r10, 31;
    setp.eq.s32 %p0, %r11, %r12;
    @%p0 bra BB9_10;
    bra.uni BB9_7;
BB9_10: // in Loop: Header=BB9_9 Depth=3
    mov.u32 %r10, %r8;
    bra.uni BB9_8;
BB9_7: // %not_all_continued_or_breaked.i295
    // in Loop: Header=BB9_9 Depth=3
    mul.f32 %f9, %f6, %f6;
    not.b32 %r11, %r9;
    and.b32 %r10, %r10, %r11;
    sub.f32 %f8, %f8, %f9;
    add.f32 %f8, %f5, %f8;
    add.f32 %f7, %f7, %f7;
    fma.rn.f32 %f6, %f6, %f7, %f4;
    mov.f32 %f7, %f8;
BB9_8: // %for_step.i264
    // in Loop: Header=BB9_9 Depth=3
    setp.ne.s32 %p0, %r10, 0;
    selp.u32 %r11, 1, 0, %p0;
    add.s32 %r7, %r7, %r11;
    setp.lt.s32 %p0, %r7, %r2;
    selp.b32 %r10, %r10, 0, %p0;
    setp.gt.s32 %p0, %r10, -1;
    @%p0 bra BB9_6;
    bra.uni BB9_9;
BB9_5: // in Loop: Header=BB9_4 Depth=2
    mov.u32 %r7, %r8;
BB9_6: // %mandel___vyfvyfvyi.exit296
    // in Loop: Header=BB9_4 Depth=2
    add.s32 %r6, %r6, %r5;
    shl.b32 %r6, %r6, 2;
    cvt.s64.s32 %rl2, %r6;
    add.s64 %rl2, %rl2, %rl0;
    st.u32 [%rl2], %r7;
    add.s64 %rl1, %rl1, 1;
    cvt.u32.u64 %r6, %rl1;
    setp.eq.s32 %p0, %r6, %r1;
    @%p0 bra BB9_17;
    bra.uni BB9_4;
BB9_19: // %partial_inner_all_outer
    // in Loop: Header=BB9_2 Depth=1
    @%p0 bra BB9_17;
// BB#20: // %partial_inner_only
    // in Loop: Header=BB9_2 Depth=1
    setp.gt.s32 %p0, %r1, 0;
    mov.u32 %r6, 0;
    fma.rn.f32 %f4, %f3, 0f00000000, %f0;
    cvt.rn.f32.s32 %f5, %r3;
    fma.rn.f32 %f5, %f2, %f5, %f1;
    selp.b32 %r5, %r4, 0, %p0;
    setp.lt.s32 %p1, %r5, 0;
    mov.u32 %r8, %r4;
    mov.u32 %r7, %r6;
    mov.u32 %r5, %r6;
    mov.f32 %f7, %f4;
    mov.f32 %f6, %f5;
    @%p1 bra BB9_13;
    bra.uni BB9_21;
BB9_13: // %for_loop.i332
    // Parent Loop BB9_2 Depth=1
    // => This Inner Loop Header: Depth=2
    selp.b32 %r9, %r8, 0, %p0;
    mul.f32 %f8, %f7, %f7;
    fma.rn.f32 %f9, %f6, %f6, %f8;
    setp.gtu.f32 %p1, %f9, 0f40800000;
    selp.b32 %r10, %r8, 0, %p1;
    or.b32 %r7, %r10, %r7;
    selp.b32 %r10, %r7, 0, %p0;
    shr.u32 %r10, %r10, 31;
    shr.u32 %r9, %r9, 31;
    setp.eq.s32 %p1, %r10, %r9;
    @%p1 bra BB9_14;
    bra.uni BB9_11;
BB9_14: // in Loop: Header=BB9_13 Depth=2
    mov.u32 %r8, %r6;
    bra.uni BB9_12;
BB9_11: // %not_all_continued_or_breaked.i346
    // in Loop: Header=BB9_13 Depth=2
    mul.f32 %f9, %f6, %f6;
    not.b32 %r9, %r7;
    and.b32 %r8, %r8, %r9;
    sub.f32 %f8, %f8, %f9;
    add.f32 %f8, %f4, %f8;
    add.f32 %f7, %f7, %f7;
    fma.rn.f32 %f6, %f6, %f7, %f5;
    mov.f32 %f7, %f8;
BB9_12: // %for_step.i313
    // in Loop: Header=BB9_13 Depth=2
    setp.ne.s32 %p1, %r8, 0;
    selp.u32 %r9, 1, 0, %p1;
    add.s32 %r5, %r5, %r9;
    setp.lt.s32 %p1, %r5, %r2;
    selp.b32 %r8, %r8, 0, %p1;
    selp.b32 %r9, %r8, 0, %p0;
    setp.gt.s32 %p1, %r9, -1;
    @%p1 bra BB9_15;
    bra.uni BB9_13;
BB9_21: // in Loop: Header=BB9_2 Depth=1
    mov.u32 %r5, %r6;
BB9_15: // %mandel___vyfvyfvyi.exit347
    // in Loop: Header=BB9_2 Depth=1
    setp.lt.s32 %p0, %r1, 1;
    @%p0 bra BB9_17;
// BB#16: // %pl_dolane.i
    // in Loop: Header=BB9_2 Depth=1
    mul.lo.s32 %r6, %r3, %r1;
    shl.b32 %r6, %r6, 2;
    cvt.s64.s32 %rl1, %r6;
    add.s64 %rl1, %rl1, %rl0;
    st.u32 [%rl1], %r5;
BB9_17: // %foreach_reset
    // in Loop: Header=BB9_2 Depth=1
    add.s32 %r3, %r3, 1;
    setp.eq.s32 %p0, %r3, %r0;
    @%p0 bra BB9_18;
    bra.uni BB9_2;
BB9_18: // %for_exit
    ret;
}

BIN
examples_cuda/mandelbrot/out.s
Normal file
Binary file not shown.
BIN
examples_cuda/mandelbrot/out1.o
Normal file
Binary file not shown.
2
examples_cuda/mandelbrot_tasks/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
mandelbrot
*.ppm
8
examples_cuda/mandelbrot_tasks/Makefile
Normal file
@@ -0,0 +1,8 @@

EXAMPLE=mandelbrot_tasks
CPP_SRC=mandelbrot_tasks.cpp mandelbrot_tasks_serial.cpp
ISPC_SRC=mandelbrot_tasks.ispc
ISPC_IA_TARGETS=sse2,sse4-x2,avx-x2
ISPC_ARM_TARGETS=neon

include ../common.mk
146
examples_cuda/mandelbrot_tasks/mandelbrot_tasks.cpp
Normal file
@@ -0,0 +1,146 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2010-2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifdef _MSC_VER
|
||||||
|
#define _CRT_SECURE_NO_WARNINGS
|
||||||
|
#define NOMINMAX
|
||||||
|
#pragma warning (disable: 4244)
|
||||||
|
#pragma warning (disable: 4305)
|
||||||
|
#endif
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <algorithm>
|
||||||
|
#include <string.h>
|
||||||
|
#include "../timing.h"
|
||||||
|
#include "mandelbrot_tasks_ispc.h"
|
||||||
|
using namespace ispc;
|
||||||
|
|
||||||
|
extern void mandelbrot_serial(float x0, float y0, float x1, float y1,
|
||||||
|
int width, int height, int maxIterations,
|
||||||
|
int output[]);
|
||||||
|
|
||||||
|
/* Write a PPM image file with the image of the Mandelbrot set */
|
||||||
|
static void
|
||||||
|
writePPM(int *buf, int width, int height, const char *fn) {
|
||||||
|
FILE *fp = fopen(fn, "wb");
|
||||||
|
fprintf(fp, "P6\n");
|
||||||
|
fprintf(fp, "%d %d\n", width, height);
|
||||||
|
fprintf(fp, "255\n");
|
||||||
|
for (int i = 0; i < width*height; ++i) {
|
||||||
|
// Map the iteration count to colors by just alternating between
|
||||||
|
// two greys.
|
||||||
|
char c = (buf[i] & 0x1) ? 240 : 20;
|
||||||
|
for (int j = 0; j < 3; ++j)
|
||||||
|
fputc(c, fp);
|
||||||
|
}
|
||||||
|
fclose(fp);
|
||||||
|
printf("Wrote image file %s\n", fn);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
static void usage() {
|
||||||
|
fprintf(stderr, "usage: mandelbrot [--scale=<factor>]\n");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
int main(int argc, char *argv[]) {
|
||||||
|
unsigned int width = 1536;
|
||||||
|
unsigned int height = 1024;
|
||||||
|
float x0 = -2;
|
||||||
|
float x1 = 1;
|
||||||
|
float y0 = -1;
|
||||||
|
float y1 = 1;
|
||||||
|
|
||||||
|
if (argc == 1)
|
||||||
|
;
|
||||||
|
else if (argc == 2) {
|
||||||
|
if (strncmp(argv[1], "--scale=", 8) == 0) {
|
||||||
|
float scale = atof(argv[1] + 8);
|
||||||
|
if (scale == 0.f)
|
||||||
|
usage();
|
||||||
|
width *= scale;
|
||||||
|
height *= scale;
|
||||||
|
// round up to multiples of 16
|
||||||
|
width = (width + 0xf) & ~0xf;
|
||||||
|
height = (height + 0xf) & ~0xf;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
usage();
|
||||||
|
}
|
||||||
|
else
|
||||||
|
usage();
|
||||||
|
|
||||||
|
int maxIterations = 512;
|
||||||
|
int *buf = new int[width*height];
|
||||||
|
|
||||||
|
//
|
||||||
|
// Compute the image using the ispc implementation; report the minimum
|
||||||
|
// time of three runs.
|
||||||
|
//
|
||||||
|
double minISPC = 1e30;
|
||||||
|
for (int i = 0; i < 3; ++i) {
|
||||||
|
// Clear out the buffer
|
||||||
|
for (unsigned int j = 0; j < width * height; ++j)
|
||||||
|
buf[j] = 0;
|
||||||
|
reset_and_start_timer();
|
||||||
|
mandelbrot_ispc(x0, y0, x1, y1, width, height, maxIterations, buf);
|
||||||
|
double dt = get_elapsed_mcycles();
|
||||||
|
minISPC = std::min(minISPC, dt);
|
||||||
|
}
|
||||||
|
|
||||||
|
printf("[mandelbrot ispc+tasks]:\t[%.3f] million cycles\n", minISPC);
|
||||||
|
writePPM(buf, width, height, "mandelbrot-ispc.ppm");
|
||||||
|
|
||||||
|
|
||||||
|
//
|
||||||
|
// And run the serial implementation 3 times, again reporting the
|
||||||
|
// minimum time.
|
||||||
|
//
|
||||||
|
double minSerial = 1e30;
|
||||||
|
for (int i = 0; i < 3; ++i) {
|
||||||
|
// Clear out the buffer
|
||||||
|
for (unsigned int j = 0; j < width * height; ++j)
|
||||||
|
buf[j] = 0;
|
||||||
|
reset_and_start_timer();
|
||||||
|
mandelbrot_serial(x0, y0, x1, y1, width, height, maxIterations, buf);
|
||||||
|
double dt = get_elapsed_mcycles();
|
||||||
|
minSerial = std::min(minSerial, dt);
|
||||||
|
}
|
||||||
|
|
||||||
|
printf("[mandelbrot serial]:\t\t[%.3f] million cycles\n", minSerial);
|
||||||
|
writePPM(buf, width, height, "mandelbrot-serial.ppm");
|
||||||
|
|
||||||
|
printf("\t\t\t\t(%.2fx speedup from ISPC + tasks)\n", minSerial/minISPC);
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
86
examples_cuda/mandelbrot_tasks/mandelbrot_tasks.ispc
Normal file
@@ -0,0 +1,86 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2010-2012, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
static inline int
|
||||||
|
mandel(float c_re, float c_im, int count) {
|
||||||
|
float z_re = c_re, z_im = c_im;
|
||||||
|
int i;
|
||||||
|
for (i = 0; i < count; ++i) {
|
||||||
|
if (z_re * z_re + z_im * z_im > 4.)
|
||||||
|
break;
|
||||||
|
|
||||||
|
float new_re = z_re*z_re - z_im*z_im;
|
||||||
|
float new_im = 2.f * z_re * z_im;
|
||||||
|
unmasked {
|
||||||
|
z_re = c_re + new_re;
|
||||||
|
z_im = c_im + new_im;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return i;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/* Task to compute the Mandelbrot iterations for a block of `span` consecutive scanlines.
|
||||||
|
*/
|
||||||
|
task void
|
||||||
|
mandelbrot_scanline(uniform float x0, uniform float dx,
|
||||||
|
uniform float y0, uniform float dy,
|
||||||
|
uniform int width, uniform int height,
|
||||||
|
uniform int span,
|
||||||
|
uniform int maxIterations, uniform int output[]) {
|
||||||
|
uniform int ystart = taskIndex * span;
|
||||||
|
uniform int yend = min((taskIndex+1) * span, (unsigned int)height);
|
||||||
|
|
||||||
|
foreach (yi = ystart ... yend, xi = 0 ... width) {
|
||||||
|
float x = x0 + xi * dx;
|
||||||
|
float y = y0 + yi * dy;
|
||||||
|
|
||||||
|
int index = yi * width + xi;
|
||||||
|
output[index] = mandel(x, y, maxIterations);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
export void
|
||||||
|
mandelbrot_ispc(uniform float x0, uniform float y0,
|
||||||
|
uniform float x1, uniform float y1,
|
||||||
|
uniform int width, uniform int height,
|
||||||
|
uniform int maxIterations, uniform int output[]) {
|
||||||
|
uniform float dx = (x1 - x0) / width;
|
||||||
|
uniform float dy = (y1 - y0) / height;
|
||||||
|
uniform int span = 4;
|
||||||
|
|
||||||
|
launch[height/span] mandelbrot_scanline(x0, dx, y0, dy, width, height, span,
|
||||||
|
maxIterations, output);
|
||||||
|
}
|
||||||
180
examples_cuda/mandelbrot_tasks/mandelbrot_tasks.vcxproj
Normal file
@@ -0,0 +1,180 @@
|
|||||||
|
<?xml version="1.0" encoding="utf-8"?>
|
||||||
|
<Project DefaultTargets="Build" ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
|
||||||
|
<ItemGroup Label="ProjectConfigurations">
|
||||||
|
<ProjectConfiguration Include="Debug|Win32">
|
||||||
|
<Configuration>Debug</Configuration>
|
||||||
|
<Platform>Win32</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Debug|x64">
|
||||||
|
<Configuration>Debug</Configuration>
|
||||||
|
<Platform>x64</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Release|Win32">
|
||||||
|
<Configuration>Release</Configuration>
|
||||||
|
<Platform>Win32</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
<ProjectConfiguration Include="Release|x64">
|
||||||
|
<Configuration>Release</Configuration>
|
||||||
|
<Platform>x64</Platform>
|
||||||
|
</ProjectConfiguration>
|
||||||
|
</ItemGroup>
|
||||||
|
<PropertyGroup Label="Globals">
|
||||||
|
<ProjectGuid>{E80DA7D4-AB22-4648-A068-327307156BE6}</ProjectGuid>
|
||||||
|
<Keyword>Win32Proj</Keyword>
|
||||||
|
<RootNamespace>mandelbrot_tasks</RootNamespace>
|
||||||
|
</PropertyGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>true</UseDebugLibraries>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>true</UseDebugLibraries>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>false</UseDebugLibraries>
|
||||||
|
<WholeProgramOptimization>true</WholeProgramOptimization>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
|
||||||
|
<ConfigurationType>Application</ConfigurationType>
|
||||||
|
<UseDebugLibraries>false</UseDebugLibraries>
|
||||||
|
<WholeProgramOptimization>true</WholeProgramOptimization>
|
||||||
|
<CharacterSet>Unicode</CharacterSet>
|
||||||
|
</PropertyGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
|
||||||
|
<ImportGroup Label="ExtensionSettings">
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="PropertySheets">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Label="PropertySheets" Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<ImportGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="PropertySheets">
|
||||||
|
<Import Project="$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props" Condition="exists('$(UserRootDir)\Microsoft.Cpp.$(Platform).user.props')" Label="LocalAppDataPlatform" />
|
||||||
|
</ImportGroup>
|
||||||
|
<PropertyGroup Label="UserMacros" />
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<LinkIncremental>true</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<TargetName>mandelbrot_tasks</TargetName>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
|
||||||
|
<LinkIncremental>true</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<TargetName>mandelbrot_tasks</TargetName>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<LinkIncremental>false</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<TargetName>mandelbrot_tasks</TargetName>
|
||||||
|
</PropertyGroup>
|
||||||
|
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
|
||||||
|
<LinkIncremental>false</LinkIncremental>
|
||||||
|
<ExecutablePath>$(ProjectDir)..\..;$(ExecutablePath)</ExecutablePath>
|
||||||
|
<TargetName>mandelbrot_tasks</TargetName>
|
||||||
|
</PropertyGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">
|
||||||
|
<ClCompile>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<Optimization>Disabled</Optimization>
|
||||||
|
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">
|
||||||
|
<ClCompile>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<Optimization>Disabled</Optimization>
|
||||||
|
<PreprocessorDefinitions>WIN32;_DEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">
|
||||||
|
<ClCompile>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<Optimization>MaxSpeed</Optimization>
|
||||||
|
<FunctionLevelLinking>true</FunctionLevelLinking>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
<EnableCOMDATFolding>true</EnableCOMDATFolding>
|
||||||
|
<OptimizeReferences>true</OptimizeReferences>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemDefinitionGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'">
|
||||||
|
<ClCompile>
|
||||||
|
<WarningLevel>Level3</WarningLevel>
|
||||||
|
<PrecompiledHeader>
|
||||||
|
</PrecompiledHeader>
|
||||||
|
<Optimization>MaxSpeed</Optimization>
|
||||||
|
<FunctionLevelLinking>true</FunctionLevelLinking>
|
||||||
|
<IntrinsicFunctions>true</IntrinsicFunctions>
|
||||||
|
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
|
||||||
|
<AdditionalIncludeDirectories>$(TargetDir)</AdditionalIncludeDirectories>
|
||||||
|
<FloatingPointModel>Fast</FloatingPointModel>
|
||||||
|
</ClCompile>
|
||||||
|
<Link>
|
||||||
|
<SubSystem>Console</SubSystem>
|
||||||
|
<GenerateDebugInformation>true</GenerateDebugInformation>
|
||||||
|
<EnableCOMDATFolding>true</EnableCOMDATFolding>
|
||||||
|
<OptimizeReferences>true</OptimizeReferences>
|
||||||
|
</Link>
|
||||||
|
</ItemDefinitionGroup>
|
||||||
|
<ItemGroup>
|
||||||
|
<ClCompile Include="mandelbrot_tasks.cpp" />
|
||||||
|
<ClCompile Include="mandelbrot_tasks_serial.cpp" />
|
||||||
|
<ClCompile Include="../tasksys.cpp" />
|
||||||
|
</ItemGroup>
|
||||||
|
<ItemGroup>
|
||||||
|
<CustomBuild Include="mandelbrot_tasks.ispc">
|
||||||
|
<FileType>Document</FileType>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4-x2,avx-x2
|
||||||
|
</Command>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4-x2,avx-x2
|
||||||
|
</Command>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Debug|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --arch=x86 --target=sse2,sse4-x2,avx-x2
|
||||||
|
</Command>
|
||||||
|
<Command Condition="'$(Configuration)|$(Platform)'=='Release|x64'">ispc -O2 %(Filename).ispc -o $(TargetDir)%(Filename).obj -h $(TargetDir)%(Filename)_ispc.h --target=sse2,sse4-x2,avx-x2
|
||||||
|
</Command>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|Win32'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
<Outputs Condition="'$(Configuration)|$(Platform)'=='Release|x64'">$(TargetDir)%(Filename).obj;$(TargetDir)%(Filename)_sse2.obj;$(TargetDir)%(Filename)_sse4.obj;$(TargetDir)%(Filename)_avx.obj;$(TargetDir)%(Filename)_ispc.h</Outputs>
|
||||||
|
</CustomBuild>
|
||||||
|
</ItemGroup>
|
||||||
|
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
|
||||||
|
<ImportGroup Label="ExtensionTargets">
|
||||||
|
</ImportGroup>
|
||||||
|
</Project>
|
||||||
68
examples_cuda/mandelbrot_tasks/mandelbrot_tasks_serial.cpp
Normal file
@@ -0,0 +1,68 @@
|
|||||||
|
/*
|
||||||
|
Copyright (c) 2010-2011, Intel Corporation
|
||||||
|
All rights reserved.
|
||||||
|
|
||||||
|
Redistribution and use in source and binary forms, with or without
|
||||||
|
modification, are permitted provided that the following conditions are
|
||||||
|
met:
|
||||||
|
|
||||||
|
* Redistributions of source code must retain the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer.
|
||||||
|
|
||||||
|
* Redistributions in binary form must reproduce the above copyright
|
||||||
|
notice, this list of conditions and the following disclaimer in the
|
||||||
|
documentation and/or other materials provided with the distribution.
|
||||||
|
|
||||||
|
* Neither the name of Intel Corporation nor the names of its
|
||||||
|
contributors may be used to endorse or promote products derived from
|
||||||
|
this software without specific prior written permission.
|
||||||
|
|
||||||
|
|
||||||
|
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
|
||||||
|
IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
||||||
|
TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
|
||||||
|
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER
|
||||||
|
OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
|
||||||
|
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
|
||||||
|
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
|
||||||
|
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
|
||||||
|
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
|
||||||
|
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
|
||||||
|
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||||
|
*/
|
||||||
|
|
||||||
|
|
||||||
|
static int mandel(float c_re, float c_im, int count) {
|
||||||
|
float z_re = c_re, z_im = c_im;
|
||||||
|
int i;
|
||||||
|
for (i = 0; i < count; ++i) {
|
||||||
|
if (z_re * z_re + z_im * z_im > 4.f)
|
||||||
|
break;
|
||||||
|
|
||||||
|
float new_re = z_re*z_re - z_im*z_im;
|
||||||
|
float new_im = 2.f * z_re * z_im;
|
||||||
|
z_re = c_re + new_re;
|
||||||
|
z_im = c_im + new_im;
|
||||||
|
}
|
||||||
|
|
||||||
|
return i;
|
||||||
|
}
|
||||||
|
|
||||||
|
void mandelbrot_serial(float x0, float y0, float x1, float y1,
|
||||||
|
int width, int height, int maxIterations,
|
||||||
|
int output[])
|
||||||
|
{
|
||||||
|
float dx = (x1 - x0) / width;
|
||||||
|
float dy = (y1 - y0) / height;
|
||||||
|
|
||||||
|
for (int j = 0; j < height; j++) {
|
||||||
|
for (int i = 0; i < width; ++i) {
|
||||||
|
float x = x0 + i * dx;
|
||||||
|
float y = y0 + j * dy;
|
||||||
|
|
||||||
|
int index = (j * width + i);
|
||||||
|
output[index] = mandel(x, y, maxIterations);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
2
examples_cuda/mandelbrot_tasks3d/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
|
|||||||
|
mandelbrot
|
||||||
|
*.ppm
|
||||||
127
examples_cuda/mandelbrot_tasks3d/1.s
Normal file
@@ -0,0 +1,127 @@
|
|||||||
|
|
||||||
|
code for sm_35
|
||||||
|
Function : mandelbrot_scanline
|
||||||
|
.headerflags @"EF_CUDA_SM35 EF_CUDA_PTX_SM(EF_CUDA_SM35)"
|
||||||
|
/* 0x08a0b010a0a01000 */
|
||||||
|
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x64c03c00089c0006 */
|
||||||
|
/*0010*/ S2R R2, SR_CTAID.Y; /* 0x86400000131c000a */
|
||||||
|
/*0018*/ MOV R3, c[0x0][0x15c]; /* 0x64c03c002b9c000e */
|
||||||
|
/*0020*/ IMAD R3, R2, c[0x0][0x15c], R3; /* 0x51080c002b9c080e */
|
||||||
|
/*0028*/ ISETP.LT.AND P0, PT, R3, c[0x0][0x154], PT; /* 0x5b181c002a9c0c1e */
|
||||||
|
/*0030*/ IMUL R0, R2, c[0x0][0x15c]; /* 0x61c018002b9c0802 */
|
||||||
|
/*0038*/ SEL R3, R3, c[0x0][0x154], P0; /* 0x650000002a9c0c0e */
|
||||||
|
/* 0x089c8010a01000b0 */
|
||||||
|
/*0048*/ ISETP.GE.AND P0, PT, R0, R3, PT; /* 0xdb681c00019c001e */
|
||||||
|
/*0050*/ @P0 EXIT ; /* 0x180000000000003c */
|
||||||
|
/*0058*/ IADD R2, R2, 0x1; /* 0xc0800000009c0809 */
|
||||||
|
/*0060*/ MOV R3, c[0x0][0x158]; /* 0x64c03c002b1c000e */
|
||||||
|
/*0068*/ IMUL R5, R2, c[0x0][0x15c]; /* 0x61c018002b9c0816 */
|
||||||
|
/*0070*/ LOP.PASS_B R4, RZ, ~c[0x0][0x154]; /* 0x620038002a9ffc12 */
|
||||||
|
/*0078*/ S2R R2, SR_CTAID.X; /* 0x86400000129c000a */
|
||||||
|
/* 0x08ac80109c108010 */
|
||||||
|
/*0088*/ LOP.PASS_B R7, RZ, ~R5; /* 0xe2003800029ffc1e */
|
||||||
|
/*0090*/ LOP.PASS_B R6, RZ, ~c[0x0][0x154]; /* 0x620038002a9ffc1a */
|
||||||
|
/*0098*/ LOP.PASS_B R5, RZ, ~R5; /* 0xe2003800029ffc16 */
|
||||||
|
/*00a0*/ IMAD R3, R2, c[0x0][0x158], R3; /* 0x51080c002b1c080e */
|
||||||
|
/*00a8*/ ISETP.GT.AND P0, PT, R4, R7, PT; /* 0xdb481c00039c101e */
|
||||||
|
/*00b0*/ IMUL R2, R2, c[0x0][0x158]; /* 0x61c018002b1c080a */
|
||||||
|
/*00b8*/ ISETP.LT.AND P1, PT, R3, c[0x0][0x150], PT; /* 0x5b181c002a1c0c3e */
|
||||||
|
/* 0x0800b010008010a0 */
|
||||||
|
/*00c8*/ SEL R4, R5, R6, !P0; /* 0xe5002000031c1412 */
|
||||||
|
/*00d0*/ ISETP.LT.AND P0, PT, RZ, c[0x0][0x160], PT; /* 0x5b181c002c1ffc1e */
|
||||||
|
/*00d8*/ LOP.PASS_B R4, RZ, ~R4; /* 0xe2003800021ffc12 */
|
||||||
|
/*00e0*/ SEL R3, R3, c[0x0][0x150], P1; /* 0x650004002a1c0c0e */
|
||||||
|
/*00e8*/ ISETP.GE.AND P1, PT, R2, R3, PT; /* 0xdb681c00019c083e */
|
||||||
|
/*00f0*/ SSY 0x368; /* 0x1480000138000000 */
|
||||||
|
/*00f8*/ @P1 BRA 0x360; /* 0x120000013004003c */
|
||||||
|
/* 0x089c108010001080 */
|
||||||
|
/*0108*/ IMUL R5, R0, c[0x0][0x150]; /* 0x61c018002a1c0016 */
|
||||||
|
/*0110*/ MOV R8, R2; /* 0xe4c03c00011c0022 */
|
||||||
|
/*0118*/ @!P0 BRA 0x2d8; /* 0x12000000dc20003c */
|
||||||
|
/*0120*/ I2F.F32.S32 R6, R0; /* 0xe5c00000001ca81a */
|
||||||
|
/*0128*/ MOV R7, c[0x0][0x148]; /* 0x64c03c00291c001e */
|
||||||
|
/*0130*/ MOV R14, R2; /* 0xe4c03c00011c003a */
|
||||||
|
/*0138*/ MOV R16, c[0x0][0x140]; /* 0x64c03c00281c0042 */
|
||||||
|
/* 0x089c80a010a01000 */
|
||||||
|
/*0148*/ FFMA R6, R6, c[0x0][0x14c], R7; /* 0x4c001c00299c181a */
|
||||||
|
/*0150*/ S2R R10, SR_TID.X; /* 0x86400000109c002a */
|
||||||
|
/*0158*/ MOV R9, R6; /* 0xe4c03c00031c0026 */
|
||||||
|
/*0160*/ LOP.AND R7, R10, 0x1f; /* 0xc20000000f9c281d */
|
||||||
|
/*0168*/ PSETP.AND.AND P2, PT, PT, PT, PT; /* 0x84801c07001dc05e */
|
||||||
|
/*0170*/ IADD R12, R7, R14; /* 0xe0800000071c1c32 */
|
||||||
|
/*0178*/ PSETP.AND.AND P3, PT, P0, PT, PT; /* 0x84801c07001c007e */
|
||||||
|
/* 0x08a00010a010a010 */
|
||||||
|
/*0188*/ I2F.F32.S32 R7, R12; /* 0xe5c00000061ca81e */
|
||||||
|
/*0190*/ PSETP.AND.AND P1, PT, !PT, PT, PT; /* 0x84801c07001fc03e */
|
||||||
|
/*0198*/ FFMA R11, R7, c[0x0][0x144], R16; /* 0x4c004000289c1c2e */
|
||||||
|
/*01a0*/ SSY 0x260; /* 0x148000005c000000 */
|
||||||
|
/*01a8*/ MOV R7, RZ; /* 0xe4c03c007f9c001e */
|
||||||
|
/*01b0*/ MOV R8, R11; /* 0xe4c03c00059c0022 */
|
||||||
|
/*01b8*/ FMUL R15, R8, R8; /* 0xe3400000041c203e */
|
||||||
|
/* 0x08b0b0ac80b0a010 */
|
||||||
|
/*01c8*/ PSETP.AND.AND P3, PT, P2, P3, PT; /* 0x84801c03001c807e */
|
||||||
|
/*01d0*/ FFMA R13, R9, R9, R15; /* 0xcc003c00049c2436 */
|
||||||
|
/*01d8*/ FSETP.GTU.AND P2, PT, R13, 4, PT; /* 0xb5e01e04001c345d */
|
||||||
|
/*01e0*/ PSETP.AND.OR P1, PT, P3, P2, P1; /* 0x84810402001cc03e */
|
||||||
|
/*01e8*/ PSETP.AND.AND P2, PT, !PT, PT, PT; /* 0x84801c07001fc05e */
|
||||||
|
/*01f0*/ PSETP.XOR.AND P5, PT, P1, P3, PT; /* 0x84801c03101c40be */
|
||||||
|
/*01f8*/ @P5 PSETP.AND.AND P2, PT, P3, !P1, PT; /* 0x84801c090014c05e */
|
||||||
|
/* 0x08ac8010b09c1080 */
|
||||||
|
/*0208*/ @P2 IADD R7, R7, 0x1; /* 0xc080000000881c1d */
|
||||||
|
/*0210*/ @P5 FFMA R13, -R9, R9, R15; /* 0xcc083c0004942436 */
|
||||||
|
/*0218*/ @P5 FADD R15, R8, R8; /* 0xe2c000000414203e */
|
||||||
|
/*0220*/ ISETP.LT.AND P3, PT, R7, c[0x0][0x160], PT; /* 0x5b181c002c1c1c7e */
|
||||||
|
/*0228*/ @P5 FADD R13, R11, R13; /* 0xe2c0000006942c36 */
|
||||||
|
/*0230*/ PSETP.AND.AND P4, PT, P2, P3, PT; /* 0x84801c03001c809e */
|
||||||
|
/*0238*/ @P5 FFMA R9, R9, R15, R6; /* 0xcc00180007942426 */
|
||||||
|
/* 0x08a0a0100000b810 */
|
||||||
|
/*0248*/ @P5 MOV R8, R13; /* 0xe4c03c0006940022 */
|
||||||
|
/*0250*/ @P4 BRA 0x1b8; /* 0x12007fffb010003c */
|
||||||
|
/*0258*/ ISETP.GE.AND.S P1, PT, R12, R3, PT; /* 0xdb681c0001dc303e */
|
||||||
|
/*0260*/ @P1 BRA.U 0x2b0; /* 0x120000002404023c */
|
||||||
|
/*0268*/ @!P1 LOP32I.AND R9, R10, 0x4000001f; /* 0x202000000fa42824 */
|
||||||
|
/*0270*/ @!P1 IADD R8, R14, R5; /* 0xe080000002a43822 */
|
||||||
|
/*0278*/ @!P1 IADD R8, R8, R9; /* 0xe080000004a42022 */
|
||||||
|
/* 0x08b0a000a0b010a0 */
|
||||||
|
/*0288*/ @!P1 SHF.L R8, RZ, 0x2, R8; /* 0xb7c020000127fc21 */
|
||||||
|
/*0290*/ @!P1 BFE R9, R8, 0x11f; /* 0xc00800008fa42025 */
|
||||||
|
/*0298*/ @!P1 IADD R8.CC, R8, c[0x0][0x168]; /* 0x608400002d242022 */
|
||||||
|
/*02a0*/ @!P1 IADD.X R9, R9, c[0x0][0x16c]; /* 0x608040002da42426 */
|
||||||
|
/*02a8*/ @!P1 ST.E [R8], R7; /* 0xe48000000024201c */
|
||||||
|
/*02b0*/ IADD R14, R14, 0x20; /* 0xc0800000101c3839 */
|
||||||
|
/*02b8*/ ISETP.LT.AND P1, PT, R14, R3, PT; /* 0xdb181c00019c383e */
|
||||||
|
/* 0x0880b0a0a0a0b8b8 */
|
||||||
|
/*02c8*/ @P1 BRA 0x150; /* 0x12007fff4004003c */
|
||||||
|
/*02d0*/ BRA 0x360; /* 0x12000000441c003c */
|
||||||
|
/*02d8*/ S2R R7, SR_TID.X; /* 0x86400000109c001e */
|
||||||
|
/*02e0*/ LOP.AND R6, R7, 0x1f; /* 0xc20000000f9c1c19 */
|
||||||
|
/*02e8*/ IADD R6, R6, R8; /* 0xe0800000041c181a */
|
||||||
|
/*02f0*/ ISETP.LT.AND P1, PT, R6, R3, PT; /* 0xdb181c00019c183e */
|
||||||
|
/*02f8*/ @P1 LOP32I.AND R7, R7, 0x4000001f; /* 0x202000000f841c1c */
|
||||||
|
/* 0x08a0b010a0a0a010 */
|
||||||
|
/*0308*/ @P1 IADD R6, R8, R5; /* 0xe08000000284201a */
|
||||||
|
/*0310*/ IADD R8, R8, 0x20; /* 0xc0800000101c2021 */
|
||||||
|
/*0318*/ @P1 IADD R6, R6, R7; /* 0xe08000000384181a */
|
||||||
|
/*0320*/ @P1 SHF.L R6, RZ, 0x2, R6; /* 0xb7c018000107fc19 */
|
||||||
|
/*0328*/ @P1 BFE R7, R6, 0x11f; /* 0xc00800008f84181d */
|
||||||
|
/*0330*/ @P1 IADD R6.CC, R6, c[0x0][0x168]; /* 0x608400002d04181a */
|
||||||
|
/*0338*/ @P1 IADD.X R7, R7, c[0x0][0x16c]; /* 0x608040002d841c1e */
|
||||||
|
/* 0x0880b8b000b8b0c8 */
|
||||||
|
/*0348*/ @P1 ST.E [R6], RZ; /* 0xe480000000041bfc */
|
||||||
|
/*0350*/ ISETP.LT.AND P1, PT, R8, R3, PT; /* 0xdb181c00019c203e */
|
||||||
|
/*0358*/ @P1 BRA 0x2d8; /* 0x12007fffbc04003c */
|
||||||
|
/*0360*/ IADD.S R0, R0, 0x1; /* 0xc080000000dc0001 */
|
||||||
|
/*0368*/ ISETP.EQ.AND P1, PT, R0, R4, PT; /* 0xdb281c00021c003e */
|
||||||
|
/*0370*/ @!P1 BRA 0xe8; /* 0x12007ffeb824003c */
|
||||||
|
/*0378*/ MOV RZ, RZ; /* 0xe4c03c007f9c03fe */
|
||||||
|
/* 0x08000000000000b8 */
|
||||||
|
/*0388*/ EXIT ; /* 0x18000000001c003c */
|
||||||
|
/*0390*/ BRA 0x390; /* 0x12007ffffc1c003c */
|
||||||
|
/*0398*/ NOP; /* 0x85800000001c3c02 */
|
||||||
|
/*03a0*/ NOP; /* 0x85800000001c3c02 */
|
||||||
|
/*03a8*/ NOP; /* 0x85800000001c3c02 */
|
||||||
|
/*03b0*/ NOP; /* 0x85800000001c3c02 */
|
||||||
|
/*03b8*/ NOP; /* 0x85800000001c3c02 */
|
||||||
|
....................................
|
||||||
|
|
||||||
|
|
||||||
127
examples_cuda/mandelbrot_tasks3d/1a.s
Normal file
@@ -0,0 +1,127 @@
|
|||||||
|
|
||||||
|
code for sm_35
|
||||||
|
Function : mandelbrot_scanline
|
||||||
|
.headerflags @"EF_CUDA_SM35 EF_CUDA_PTX_SM(EF_CUDA_SM35)"
/* 0x08a0b010a0a01000 */
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x64c03c00089c0006 */
/*0010*/ S2R R2, SR_CTAID.Y; /* 0x86400000131c000a */
/*0018*/ MOV R3, c[0x0][0x15c]; /* 0x64c03c002b9c000e */
/*0020*/ IMAD R3, R2, c[0x0][0x15c], R3; /* 0x51080c002b9c080e */
/*0028*/ ISETP.LT.AND P0, PT, R3, c[0x0][0x154], PT; /* 0x5b181c002a9c0c1e */
/*0030*/ IMUL R0, R2, c[0x0][0x15c]; /* 0x61c018002b9c0802 */
/*0038*/ SEL R3, R3, c[0x0][0x154], P0; /* 0x650000002a9c0c0e */
/* 0x089c8010a01000b0 */
/*0048*/ ISETP.GE.AND P0, PT, R0, R3, PT; /* 0xdb681c00019c001e */
/*0050*/ @P0 EXIT ; /* 0x180000000000003c */
/*0058*/ IADD R2, R2, 0x1; /* 0xc0800000009c0809 */
/*0060*/ MOV R3, c[0x0][0x158]; /* 0x64c03c002b1c000e */
/*0068*/ IMUL R5, R2, c[0x0][0x15c]; /* 0x61c018002b9c0816 */
/*0070*/ LOP.PASS_B R4, RZ, ~c[0x0][0x154]; /* 0x620038002a9ffc12 */
/*0078*/ S2R R2, SR_CTAID.X; /* 0x86400000129c000a */
/* 0x08ac80109c108010 */
/*0088*/ LOP.PASS_B R7, RZ, ~R5; /* 0xe2003800029ffc1e */
/*0090*/ LOP.PASS_B R6, RZ, ~c[0x0][0x154]; /* 0x620038002a9ffc1a */
/*0098*/ LOP.PASS_B R5, RZ, ~R5; /* 0xe2003800029ffc16 */
/*00a0*/ IMAD R3, R2, c[0x0][0x158], R3; /* 0x51080c002b1c080e */
/*00a8*/ ISETP.GT.AND P0, PT, R4, R7, PT; /* 0xdb481c00039c101e */
/*00b0*/ IMUL R2, R2, c[0x0][0x158]; /* 0x61c018002b1c080a */
/*00b8*/ ISETP.LT.AND P1, PT, R3, c[0x0][0x150], PT; /* 0x5b181c002a1c0c3e */
/* 0x0800b010008010a0 */
/*00c8*/ SEL R4, R5, R6, !P0; /* 0xe5002000031c1412 */
/*00d0*/ ISETP.LT.AND P0, PT, RZ, c[0x0][0x160], PT; /* 0x5b181c002c1ffc1e */
/*00d8*/ LOP.PASS_B R4, RZ, ~R4; /* 0xe2003800021ffc12 */
/*00e0*/ SEL R3, R3, c[0x0][0x150], P1; /* 0x650004002a1c0c0e */
/*00e8*/ ISETP.GE.AND P1, PT, R2, R3, PT; /* 0xdb681c00019c083e */
/*00f0*/ SSY 0x368; /* 0x1480000138000000 */
/*00f8*/ @P1 BRA 0x360; /* 0x120000013004003c */
/* 0x089c108010001080 */
/*0108*/ IMUL R5, R0, c[0x0][0x150]; /* 0x61c018002a1c0016 */
/*0110*/ MOV R8, R2; /* 0xe4c03c00011c0022 */
/*0118*/ @!P0 BRA 0x2d8; /* 0x12000000dc20003c */
/*0120*/ I2F.F32.S32 R6, R0; /* 0xe5c00000001ca81a */
/*0128*/ MOV R7, c[0x0][0x148]; /* 0x64c03c00291c001e */
/*0130*/ MOV R14, R2; /* 0xe4c03c00011c003a */
/*0138*/ MOV R16, c[0x0][0x140]; /* 0x64c03c00281c0042 */
/* 0x089c80a010a01000 */
/*0148*/ FFMA R6, R6, c[0x0][0x14c], R7; /* 0x4c001c00299c181a */
/*0150*/ S2R R10, SR_TID.X; /* 0x86400000109c002a */
/*0158*/ MOV R9, R6; /* 0xe4c03c00031c0026 */
/*0160*/ LOP.AND R7, R10, 0x1f; /* 0xc20000000f9c281d */
/*0168*/ PSETP.AND.AND P2, PT, PT, PT, PT; /* 0x84801c07001dc05e */
/*0170*/ IADD R12, R7, R14; /* 0xe0800000071c1c32 */
/*0178*/ PSETP.AND.AND P3, PT, P0, PT, PT; /* 0x84801c07001c007e */
/* 0x08a00010a010a010 */
/*0188*/ I2F.F32.S32 R7, R12; /* 0xe5c00000061ca81e */
/*0190*/ PSETP.AND.AND P1, PT, !PT, PT, PT; /* 0x84801c07001fc03e */
/*0198*/ FFMA R11, R7, c[0x0][0x144], R16; /* 0x4c004000289c1c2e */
/*01a0*/ SSY 0x260; /* 0x148000005c000000 */
/*01a8*/ MOV R7, RZ; /* 0xe4c03c007f9c001e */
/*01b0*/ MOV R8, R11; /* 0xe4c03c00059c0022 */
/*01b8*/ FMUL R15, R8, R8; /* 0xe3400000041c203e */
/* 0x08b0b0ac80b0a010 */
/*01c8*/ PSETP.AND.AND P3, PT, P2, P3, PT; /* 0x84801c03001c807e */
/*01d0*/ FFMA R13, R9, R9, R15; /* 0xcc003c00049c2436 */
/*01d8*/ FSETP.GTU.AND P2, PT, R13, 4, PT; /* 0xb5e01e04001c345d */
/*01e0*/ PSETP.AND.OR P1, PT, P3, P2, P1; /* 0x84810402001cc03e */
/*01e8*/ PSETP.AND.AND P2, PT, !PT, PT, PT; /* 0x84801c07001fc05e */
/*01f0*/ PSETP.XOR.AND P5, PT, P1, P3, PT; /* 0x84801c03101c40be */
/*01f8*/ @P5 PSETP.AND.AND P2, PT, P3, !P1, PT; /* 0x84801c090014c05e */
/* 0x08ac8010b09c1080 */
/*0208*/ @P2 IADD R7, R7, 0x1; /* 0xc080000000881c1d */
/*0210*/ @P5 FFMA R13, -R9, R9, R15; /* 0xcc083c0004942436 */
/*0218*/ @P5 FADD R15, R8, R8; /* 0xe2c000000414203e */
/*0220*/ ISETP.LT.AND P3, PT, R7, c[0x0][0x160], PT; /* 0x5b181c002c1c1c7e */
/*0228*/ @P5 FADD R13, R11, R13; /* 0xe2c0000006942c36 */
/*0230*/ PSETP.AND.AND P4, PT, P2, P3, PT; /* 0x84801c03001c809e */
/*0238*/ @P5 FFMA R9, R9, R15, R6; /* 0xcc00180007942426 */
/* 0x08a0a0100000b810 */
/*0248*/ @P5 MOV R8, R13; /* 0xe4c03c0006940022 */
/*0250*/ @P4 BRA 0x1b8; /* 0x12007fffb010003c */
/*0258*/ ISETP.GE.AND.S P1, PT, R12, R3, PT; /* 0xdb681c0001dc303e */
/*0260*/ @P1 BRA.U 0x2b0; /* 0x120000002404023c */
/*0268*/ @!P1 LOP32I.AND R9, R10, 0x4000001f; /* 0x202000000fa42824 */
/*0270*/ @!P1 IADD R8, R14, R5; /* 0xe080000002a43822 */
/*0278*/ @!P1 IADD R8, R8, R9; /* 0xe080000004a42022 */
/* 0x08b0a000a0b010a0 */
/*0288*/ @!P1 SHF.L R8, RZ, 0x2, R8; /* 0xb7c020000127fc21 */
/*0290*/ @!P1 BFE R9, R8, 0x11f; /* 0xc00800008fa42025 */
/*0298*/ @!P1 IADD R8.CC, R8, c[0x0][0x168]; /* 0x608400002d242022 */
/*02a0*/ @!P1 IADD.X R9, R9, c[0x0][0x16c]; /* 0x608040002da42426 */
/*02a8*/ @!P1 ST.E [R8], R7; /* 0xe48000000024201c */
/*02b0*/ IADD R14, R14, 0x20; /* 0xc0800000101c3839 */
/*02b8*/ ISETP.LT.AND P1, PT, R14, R3, PT; /* 0xdb181c00019c383e */
/* 0x0880b0a0a0a0b8b8 */
/*02c8*/ @P1 BRA 0x150; /* 0x12007fff4004003c */
/*02d0*/ BRA 0x360; /* 0x12000000441c003c */
/*02d8*/ S2R R7, SR_TID.X; /* 0x86400000109c001e */
/*02e0*/ LOP.AND R6, R7, 0x1f; /* 0xc20000000f9c1c19 */
/*02e8*/ IADD R6, R6, R8; /* 0xe0800000041c181a */
/*02f0*/ ISETP.LT.AND P1, PT, R6, R3, PT; /* 0xdb181c00019c183e */
/*02f8*/ @P1 LOP32I.AND R7, R7, 0x4000001f; /* 0x202000000f841c1c */
/* 0x08a0b010a0a0a010 */
/*0308*/ @P1 IADD R6, R8, R5; /* 0xe08000000284201a */
/*0310*/ IADD R8, R8, 0x20; /* 0xc0800000101c2021 */
/*0318*/ @P1 IADD R6, R6, R7; /* 0xe08000000384181a */
/*0320*/ @P1 SHF.L R6, RZ, 0x2, R6; /* 0xb7c018000107fc19 */
/*0328*/ @P1 BFE R7, R6, 0x11f; /* 0xc00800008f84181d */
/*0330*/ @P1 IADD R6.CC, R6, c[0x0][0x168]; /* 0x608400002d04181a */
/*0338*/ @P1 IADD.X R7, R7, c[0x0][0x16c]; /* 0x608040002d841c1e */
/* 0x0880b8b000b8b0c8 */
/*0348*/ @P1 ST.E [R6], RZ; /* 0xe480000000041bfc */
/*0350*/ ISETP.LT.AND P1, PT, R8, R3, PT; /* 0xdb181c00019c203e */
/*0358*/ @P1 BRA 0x2d8; /* 0x12007fffbc04003c */
/*0360*/ IADD.S R0, R0, 0x1; /* 0xc080000000dc0001 */
/*0368*/ ISETP.EQ.AND P1, PT, R0, R4, PT; /* 0xdb281c00021c003e */
/*0370*/ @!P1 BRA 0xe8; /* 0x12007ffeb824003c */
/*0378*/ MOV RZ, RZ; /* 0xe4c03c007f9c03fe */
/* 0x08000000000000b8 */
/*0388*/ EXIT ; /* 0x18000000001c003c */
/*0390*/ BRA 0x390; /* 0x12007ffffc1c003c */
/*0398*/ NOP; /* 0x85800000001c3c02 */
/*03a0*/ NOP; /* 0x85800000001c3c02 */
/*03a8*/ NOP; /* 0x85800000001c3c02 */
/*03b0*/ NOP; /* 0x85800000001c3c02 */
/*03b8*/ NOP; /* 0x85800000001c3c02 */
....................................

79
examples_cuda/mandelbrot_tasks3d/2.s
Normal file
@@ -0,0 +1,79 @@

code for sm_35
Function : _Z19mandelbrot_scanlineffffiiiiiPi
.headerflags @"EF_CUDA_SM35 EF_CUDA_PTX_SM(EF_CUDA_SM35)"
/* 0x0880a010a0a01000 */
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x64c03c00089c0006 */
/*0010*/ S2R R0, SR_CTAID.Y; /* 0x86400000131c0002 */
/*0018*/ MOV R4, c[0x0][0x158]; /* 0x64c03c002b1c0012 */
/*0020*/ IMUL R2, R0, c[0x0][0x15c]; /* 0x61c018002b9c000a */
/*0028*/ IADD R0, R2, c[0x0][0x15c]; /* 0x608000002b9c0802 */
/*0030*/ S2R R9, SR_CTAID.X; /* 0x86400000129c0026 */
/*0038*/ IMNMX R11, R0, c[0x0][0x154], PT; /* 0x61081c002a9c002e */
/* 0x08b0a0100010b09c */
/*0048*/ IMAD R0, R9, c[0x0][0x158], R4; /* 0x510810002b1c2402 */
/*0050*/ ISETP.GE.AND P0, PT, R2, R11, PT; /* 0xdb681c00059c081e */
/*0058*/ IMNMX R0, R0, c[0x0][0x150], PT; /* 0x61081c002a1c0002 */
/*0060*/ @P0 EXIT ; /* 0x180000000000003c */
/*0068*/ IMUL R3, R9, c[0x0][0x158]; /* 0x61c018002b1c240e */
/*0070*/ SSY 0x1f8; /* 0x14800000c0000000 */
/*0078*/ ISETP.GE.AND P0, PT, R3, R0, PT; /* 0xdb681c00001c0c1e */
/* 0x08a0100010a01000 */
/*0088*/ @P0 BRA 0x1f0; /* 0x12000000b000003c */
/*0090*/ I2F.F32.S32 R4, R2; /* 0xe5c00000011ca812 */
/*0098*/ MOV R5, c[0x0][0x148]; /* 0x64c03c00291c0016 */
/*00a0*/ MOV R16, c[0x0][0x140]; /* 0x64c03c00281c0042 */
/*00a8*/ FFMA R4, R4, c[0x0][0x14c], R5; /* 0x4c001400299c1012 */
/*00b0*/ S2R R5, SR_TID.X; /* 0x86400000109c0016 */
/*00b8*/ MOV R6, RZ; /* 0xe4c03c007f9c001a */
/* 0x08800010a0a0a010 */
/*00c8*/ LOP.AND R10, R5, 0x1f; /* 0xc20000000f9c1429 */
/*00d0*/ ISETP.LT.AND P0, PT, RZ, c[0x0][0x160], PT; /* 0x5b181c002c1ffc1e */
/*00d8*/ IADD R12, R10, R3; /* 0xe0800000019c2832 */
/*00e0*/ I2F.F32.U32 R5, R12; /* 0xe5c00000061c2816 */
/*00e8*/ FFMA R5, R5, c[0x0][0x144], R16; /* 0x4c004000289c1416 */
/*00f0*/ @!P0 BRA 0x190; /* 0x120000004c20003c */
/*00f8*/ MOV R7, R4; /* 0xe4c03c00021c001e */
/* 0x0800b0a0a0100010 */
/*0108*/ MOV R8, R5; /* 0xe4c03c00029c0022 */
/*0110*/ PBK 0x190; /* 0x150000003c000000 */
/*0118*/ FMUL R13, R7, R7; /* 0xe3400000039c1c36 */
/*0120*/ FMUL R14, R8, R8; /* 0xe3400000041c203a */
/*0128*/ FADD R15, R14, R13; /* 0xe2c00000069c383e */
/*0130*/ FSETP.GT.AND P0, PT, R15, 4, PT; /* 0xb5a01e04001c3c1d */
/*0138*/ @P0 BRK ; /* 0x1a0000000000003c */
/* 0x080010ac809c8010 */
/*0148*/ IADD R6, R6, 0x1; /* 0xc0800000009c1819 */
/*0150*/ FADD R8, R8, R8; /* 0xe2c00000041c2022 */
/*0158*/ FADD R14, R14, -R13; /* 0xe2c10000069c383a */
/*0160*/ ISETP.LT.AND P0, PT, R6, c[0x0][0x160], PT; /* 0x5b181c002c1c181e */
/*0168*/ FFMA R7, R8, R7, R4; /* 0xcc001000039c201e */
/*0170*/ FADD R8, R5, R14; /* 0xe2c00000071c1422 */
/*0178*/ @!P0 BRK ; /* 0x1a0000000020003c */
/* 0x08b0a00010ac80b8 */
/*0188*/ BRA 0x118; /* 0x12007fffc41c003c */
/*0190*/ ISETP.GE.U32.AND P0, PT, R12, R0, PT; /* 0xdb601c00001c301e */
/*0198*/ IMAD R5, R2, c[0x0][0x150], R3; /* 0x51080c002a1c0816 */
/*01a0*/ IADD R5, R5, R10; /* 0xe0800000051c1416 */
/*01a8*/ @P0 BRA.U 0x1d8; /* 0x120000001400023c */
/*01b0*/ @!P0 MOV32I R8, 0x4; /* 0x740000000223c022 */
/*01b8*/ @!P0 IMAD R12.CC, R5, R8, c[0x0][0x168]; /* 0x910c20002d201432 */
/* 0x08b000b8b0a000a0 */
/*01c8*/ @!P0 IMAD.HI.X R13, R5, R8, c[0x0][0x16c]; /* 0x931820002da01436 */
/*01d0*/ @!P0 ST.E [R12], R6; /* 0xe480000000203018 */
/*01d8*/ IADD R3, R3, 0x20; /* 0xc0800000101c0c0d */
/*01e0*/ ISETP.LT.AND P0, PT, R3, R0, PT; /* 0xdb181c00001c0c1e */
/*01e8*/ @P0 BRA 0xb0; /* 0x12007fff6000003c */
/*01f0*/ IADD.S R2, R2, 0x1; /* 0xc080000000dc0809 */
/*01f8*/ ISETP.LT.AND P0, PT, R2, R11, PT; /* 0xdb181c00059c081e */
/* 0x0800000000b810b8 */
/*0208*/ @P0 BRA 0x68; /* 0x12007fff2c00003c */
/*0210*/ MOV RZ, RZ; /* 0xe4c03c007f9c03fe */
/*0218*/ EXIT ; /* 0x18000000001c003c */
/*0220*/ BRA 0x220; /* 0x12007ffffc1c003c */
/*0228*/ NOP; /* 0x85800000001c3c02 */
/*0230*/ NOP; /* 0x85800000001c3c02 */
/*0238*/ NOP; /* 0x85800000001c3c02 */
...................................................

111
examples_cuda/mandelbrot_tasks3d/3.s
Normal file
@@ -0,0 +1,111 @@

code for sm_35
Function : mandelbrot_scanline
.headerflags @"EF_CUDA_SM35 EF_CUDA_PTX_SM(EF_CUDA_SM35)"
/* 0x0880a010a0a01000 */
/*0008*/ MOV R1, c[0x0][0x44]; /* 0x64c03c00089c0006 */
/*0010*/ S2R R4, SR_CTAID.Y; /* 0x86400000131c0012 */
/*0018*/ MOV R6, c[0x0][0x158]; /* 0x64c03c002b1c001a */
/*0020*/ IMUL R0, R4, c[0x0][0x15c]; /* 0x61c018002b9c1002 */
/*0028*/ IADD R3, R0, c[0x0][0x15c]; /* 0x608000002b9c000e */
/*0030*/ S2R R2, SR_CTAID.X; /* 0x86400000129c000a */
/*0038*/ IMNMX R5, R3, c[0x0][0x154], PT; /* 0x61081c002a9c0c16 */
/* 0x08a010a000b010a0 */
/*0048*/ IMAD R3, R2, c[0x0][0x158], R6; /* 0x510818002b1c080e */
/*0050*/ ISETP.GE.AND P0, PT, R0, R5, PT; /* 0xdb681c00029c001e */
/*0058*/ IMNMX R3, R3, c[0x0][0x150], PT; /* 0x61081c002a1c0c0e */
/*0060*/ @P0 EXIT ; /* 0x180000000000003c */
/*0068*/ IADD R4, R4, 0x1; /* 0xc0800000009c1011 */
/*0070*/ IMUL R5, R4, c[0x0][0x15c]; /* 0x61c018002b9c1016 */
/*0078*/ LOP.PASS_B R4, RZ, ~c[0x0][0x154]; /* 0x620038002a9ffc12 */
/* 0x0800b0a01000a0a0 */
/*0088*/ LOP.PASS_B R5, RZ, ~R5; /* 0xe2003800029ffc16 */
/*0090*/ IMNMX R4, R4, R5, !PT; /* 0xe1083c00029c1012 */
/*0098*/ LOP.PASS_B R4, RZ, ~R4; /* 0xe2003800021ffc12 */
/*00a0*/ IMUL R5, R2, c[0x0][0x158]; /* 0x61c018002b1c0816 */
/*00a8*/ SSY 0x318; /* 0x1480000134000000 */
/*00b0*/ ISETP.GE.AND P0, PT, R5, R3, PT; /* 0xdb681c00019c141e */
/*00b8*/ @P0 BRA 0x310; /* 0x120000012800003c */
/* 0x08a0a00010ac8010 */
/*00c8*/ ISETP.LT.AND P0, PT, RZ, c[0x0][0x160], PT; /* 0x5b181c002c1ffc1e */
/*00d0*/ I2F.F32.S32 R6, R0; /* 0xe5c00000001ca81a */
/*00d8*/ MOV R7, c[0x0][0x148]; /* 0x64c03c00291c001e */
/*00e0*/ FFMA R6, R6, c[0x0][0x14c], R7; /* 0x4c001c00299c181a */
/*00e8*/ @P0 BRA 0x180; /* 0x120000004800003c */
/*00f0*/ S2R R7, SR_TID.X; /* 0x86400000109c001e */
/*00f8*/ LOP.AND R6, R7, 0x1f; /* 0xc20000000f9c1c19 */
/* 0x08a010a0a080b0a0 */
/*0108*/ IADD R6, R6, R5; /* 0xe0800000029c181a */
/*0110*/ ISETP.GE.AND P0, PT, R6, R3, PT; /* 0xdb681c00019c181e */
/*0118*/ @!P0 LOP32I.AND R7, R7, 0x4000001f; /* 0x202000000fa01c1c */
/*0120*/ @!P0 IMAD R6, R0, c[0x0][0x150], R5; /* 0x510814002a20001a */
/*0128*/ @!P0 IADD R6, R6, R7; /* 0xe080000003a0181a */
/*0130*/ @!P0 SHF.L R6, RZ, 0x2, R6; /* 0xb7c018000123fc19 */
/*0138*/ IADD R5, R5, 0x20; /* 0xc0800000101c1415 */
/* 0x08b8b8b0c8a0b010 */
/*0148*/ @!P0 BFE R7, R6, 0x11f; /* 0xc00800008fa0181d */
/*0150*/ @!P0 IADD R6.CC, R6, c[0x0][0x168]; /* 0x608400002d20181a */
/*0158*/ @!P0 IADD.X R7, R7, c[0x0][0x16c]; /* 0x608040002da01c1e */
/*0160*/ @!P0 ST.E [R6], RZ; /* 0xe480000000201bfc */
/*0168*/ ISETP.LT.AND P0, PT, R5, R3, PT; /* 0xdb181c00019c141e */
/*0170*/ @P0 BRA 0xf0; /* 0x12007fffbc00003c */
/*0178*/ BRA 0x310; /* 0x12000000c81c003c */
/* 0x08a0a0a010a01000 */
/*0188*/ MOV R16, c[0x0][0x140]; /* 0x64c03c00281c0042 */
/*0190*/ S2R R10, SR_TID.X; /* 0x86400000109c002a */
/*0198*/ SSY 0x2a0; /* 0x1480000080000000 */
/*01a0*/ LOP.AND R8, R10, 0x1f; /* 0xc20000000f9c2821 */
/*01a8*/ PSETP.AND.AND P2, PT, PT, PT, PT; /* 0x84801c07001dc05e */
/*01b0*/ IADD R12, R8, R5; /* 0xe0800000029c2032 */
/*01b8*/ I2F.F32.S32 R7, R12; /* 0xe5c00000061ca81e */
/* 0x0880009880108010 */
/*01c8*/ PSETP.AND.AND P3, PT, P0, PT, PT; /* 0x84801c07001c007e */
/*01d0*/ FFMA R11, R7, c[0x0][0x144], R16; /* 0x4c004000289c1c2e */
/*01d8*/ PSETP.AND.AND P1, PT, !PT, PT, PT; /* 0x84801c07001fc03e */
/*01e0*/ MOV R7, RZ; /* 0xe4c03c007f9c001e */
/*01e8*/ MOV R8, R6; /* 0xe4c03c00031c0022 */
/*01f0*/ MOV R9, R11; /* 0xe4c03c00059c0026 */
/*01f8*/ FMUL R14, R9, R9; /* 0xe3400000049c243a */
/* 0x08b0ac80b0a0a010 */
/*0208*/ FMUL R15, R8, R8; /* 0xe3400000041c203e */
/*0210*/ PSETP.AND.AND P3, PT, P2, P3, PT; /* 0x84801c03001c807e */
/*0218*/ FADD R13, R15, R14; /* 0xe2c00000071c3c36 */
/*0220*/ FSETP.GTU.AND P2, PT, R13, 4, PT; /* 0xb5e01e04001c345d */
/*0228*/ PSETP.AND.OR P1, PT, P3, P2, P1; /* 0x84810402001cc03e */
/*0230*/ PSETP.AND.AND P2, PT, !PT, PT, PT; /* 0x84801c07001fc05e */
/*0238*/ PSETP.XOR.AND P5, PT, P1, P3, PT; /* 0x84801c03101c40be */
/* 0x08ac8010b0a010b0 */
/*0248*/ @P5 PSETP.AND.AND P2, PT, P3, !P1, PT; /* 0x84801c090014c05e */
/*0250*/ @P2 IADD R7, R7, 0x1; /* 0xc080000000881c1d */
/*0258*/ @P5 FADD R13, R9, R9; /* 0xe2c0000004942436 */
/*0260*/ ISETP.LT.AND P3, PT, R7, c[0x0][0x160], PT; /* 0x5b181c002c1c1c7e */
/*0268*/ @P5 FADD R14, R14, -R15; /* 0xe2c100000794383a */
/*0270*/ PSETP.AND.AND P4, PT, P2, P3, PT; /* 0x84801c03001c809e */
/*0278*/ @P5 FFMA R8, R8, R13, R6; /* 0xcc00180006942022 */
/* 0x08a0a0800000b810 */
/*0288*/ @P5 FADD R9, R11, R14; /* 0xe2c0000007142c26 */
/*0290*/ @P4 BRA 0x1f8; /* 0x12007fffb010003c */
/*0298*/ ISETP.GE.AND.S P1, PT, R12, R3, PT; /* 0xdb681c0001dc303e */
/*02a0*/ @P1 BRA.U 0x2f0; /* 0x120000002404023c */
/*02a8*/ @!P1 LOP32I.AND R9, R10, 0x4000001f; /* 0x202000000fa42824 */
/*02b0*/ @!P1 IMAD R8, R0, c[0x0][0x150], R5; /* 0x510814002a240022 */
/*02b8*/ @!P1 IADD R8, R8, R9; /* 0xe080000004a42022 */
/* 0x08b0a000a0b010a0 */
/*02c8*/ @!P1 SHF.L R8, RZ, 0x2, R8; /* 0xb7c020000127fc21 */
/*02d0*/ @!P1 BFE R9, R8, 0x11f; /* 0xc00800008fa42025 */
/*02d8*/ @!P1 IADD R8.CC, R8, c[0x0][0x168]; /* 0x608400002d242022 */
/*02e0*/ @!P1 IADD.X R9, R9, c[0x0][0x16c]; /* 0x608040002da42426 */
/*02e8*/ @!P1 ST.E [R8], R7; /* 0xe48000000024201c */
/*02f0*/ IADD R5, R5, 0x20; /* 0xc0800000101c1415 */
/*02f8*/ ISETP.LT.AND P1, PT, R5, R3, PT; /* 0xdb181c00019c143e */
/* 0x0800b810b8b000b8 */
/*0308*/ @P1 BRA 0x190; /* 0x12007fff4004003c */
/*0310*/ IADD.S R0, R0, 0x1; /* 0xc080000000dc0001 */
/*0318*/ ISETP.NE.AND P0, PT, R0, R4, PT; /* 0xdb581c00021c001e */
/*0320*/ @P0 BRA 0xa0; /* 0x12007ffebc00003c */
/*0328*/ MOV RZ, RZ; /* 0xe4c03c007f9c03fe */
/*0330*/ EXIT ; /* 0x18000000001c003c */
/*0338*/ BRA 0x338; /* 0x12007ffffc1c003c */
....................................

8
examples_cuda/mandelbrot_tasks3d/Makefile
Normal file
@@ -0,0 +1,8 @@

EXAMPLE=mandelbrot_tasks3d
CPP_SRC=mandelbrot_tasks3d.cpp mandelbrot_tasks_serial.cpp
ISPC_SRC=mandelbrot_tasks3d.ispc
ISPC_IA_TARGETS=avx,sse2,sse4
ISPC_ARM_TARGETS=neon

include ../common.mk
186
examples_cuda/mandelbrot_tasks3d/crap.s
Normal file
@@ -0,0 +1,186 @@
//
// Generated by LLVM NVPTX Back-End
//

.version 3.1
.target sm_35, texmode_independent
.address_size 64

	// .globl mandelbrot_scanline
	// @mandelbrot_scanline
.entry mandelbrot_scanline(
	.param .f32 mandelbrot_scanline_param_0,
	.param .f32 mandelbrot_scanline_param_1,
	.param .f32 mandelbrot_scanline_param_2,
	.param .f32 mandelbrot_scanline_param_3,
	.param .u32 mandelbrot_scanline_param_4,
	.param .u32 mandelbrot_scanline_param_5,
	.param .u32 mandelbrot_scanline_param_6,
	.param .u32 mandelbrot_scanline_param_7,
	.param .u32 mandelbrot_scanline_param_8,
	.param .u64 .ptr .align 4 mandelbrot_scanline_param_9
)
{
	.reg .pred %p<396>;
	.reg .s16 %rc<396>;
	.reg .s16 %rs<396>;
	.reg .s32 %r<396>;
	.reg .s64 %rl<396>;
	.reg .f32 %f<396>;
	.reg .f64 %fl<396>;

// BB#0: // %allocas
	ld.param.u32 %r6, [mandelbrot_scanline_param_5];
	mov.u32 %r5, %ctaid.y;
	ld.param.u32 %r7, [mandelbrot_scanline_param_7];
	mul.lo.s32 %r0, %r5, %r7;
	mad.lo.s32 %r1, %r5, %r7, %r7;
	setp.lt.s32 %p0, %r1, %r6;
	selp.b32 %r1, %r1, %r6, %p0;
	setp.ge.s32 %p0, %r0, %r1;
	@%p0 bra BB0_13;
// BB#1: // %for_test28.preheader.lr.ph
	ld.param.f32 %f0, [mandelbrot_scanline_param_0];
	mov.u32 %r2, %ctaid.x;
	ld.param.u32 %r3, [mandelbrot_scanline_param_6];
	mul.lo.s32 %r1, %r2, %r3;
	ld.param.f32 %f1, [mandelbrot_scanline_param_1];
	mad.lo.s32 %r3, %r2, %r3, %r3;
	ld.param.f32 %f2, [mandelbrot_scanline_param_2];
	ld.param.u32 %r2, [mandelbrot_scanline_param_4];
	setp.lt.s32 %p0, %r3, %r2;
	ld.param.f32 %f3, [mandelbrot_scanline_param_3];
	selp.b32 %r3, %r3, %r2, %p0;
	ld.param.u32 %r4, [mandelbrot_scanline_param_8];
	ld.param.u64 %rl0, [mandelbrot_scanline_param_9];
	setp.gt.s32 %p0, %r4, 0;
	not.b32 %r6, %r6;
	add.s32 %r5, %r5, 1;
	mul.lo.s32 %r5, %r5, %r7;
	not.b32 %r5, %r5;
	setp.gt.s32 %p1, %r6, %r5;
	selp.b32 %r5, %r6, %r5, %p1;
	not.b32 %r5, %r5;
BB0_2: // %for_test28.preheader
	// =>This Loop Header: Depth=1
	// Child Loop BB0_15 Depth 2
	// Child Loop BB0_8 Depth 2
	// Child Loop BB0_11 Depth 3
	setp.ge.s32 %p1, %r1, %r3;
	@%p1 bra BB0_12;
// BB#3: // %for_loop30.lr.ph
	// in Loop: Header=BB0_2 Depth=1
	mul.lo.s32 %r6, %r0, %r2;
	mov.u32 %r7, %r1;
	@%p0 bra BB0_4;
	bra.uni BB0_15;
BB0_4: // in Loop: Header=BB0_2 Depth=1
	cvt.rn.f32.s32 %f4, %r0;
	fma.rn.f32 %f4, %f4, %f3, %f2;
	mov.u32 %r7, %r1;
BB0_8: // %for_loop.i.lr.ph.us
	// Parent Loop BB0_2 Depth=1
	// => This Loop Header: Depth=2
	// Child Loop BB0_11 Depth 3
	mov.u32 %r9, %tid.x;
	mov.u32 %r8, WARP_SZ;
	add.s32 %r10, %r8, -1;
	and.b32 %r10, %r10, %r9;
	add.s32 %r11, %r10, %r7;
	cvt.rn.f32.s32 %f5, %r11;
	fma.rn.f32 %f5, %f5, %f1, %f0;
	mov.u32 %r10, 0;
	mov.pred %p1, 0;
	mov.pred %p3, -1;
	mov.pred %p4, %p0;
	mov.pred %p2, %p1;
	mov.f32 %f7, %f5;
	mov.f32 %f6, %f4;
BB0_11: // %for_loop.i.us
	// Parent Loop BB0_2 Depth=1
	// Parent Loop BB0_8 Depth=2
	// => This Inner Loop Header: Depth=3
	and.pred %p4, %p3, %p4;
	mul.f32 %f8, %f7, %f7;
	fma.rn.f32 %f9, %f6, %f6, %f8;
	setp.gtu.f32 %p3, %f9, 0f40800000;
	and.pred %p3, %p4, %p3;
	or.pred %p2, %p3, %p2;
	xor.pred %p5, %p2, %p4;
	mov.pred %p3, %p1;
	@!%p5 bra BB0_10;
	bra.uni BB0_9;
BB0_9: // %not_all_continued_or_breaked.i.us
	// in Loop: Header=BB0_11 Depth=3
	mul.f32 %f9, %f6, %f6;
	not.pred %p3, %p2;
	and.pred %p3, %p4, %p3;
	sub.f32 %f8, %f8, %f9;
	add.f32 %f8, %f5, %f8;
	add.f32 %f7, %f7, %f7;
	fma.rn.f32 %f6, %f6, %f7, %f4;
	mov.f32 %f7, %f8;
BB0_10: // %for_step.i.us
	// in Loop: Header=BB0_11 Depth=3
	add.s32 %r12, %r10, 1;
	selp.b32 %r10, %r12, %r10, %p3;
	setp.lt.s32 %p4, %r10, %r4;
	and.pred %p5, %p3, %p4;
	@%p5 bra BB0_11;
// BB#5: // %mandel___vyfvyfvyi.exit.us
	// in Loop: Header=BB0_8 Depth=2
	setp.ge.s32 %p1, %r11, %r3;
	@%p1 bra BB0_7;
// BB#6: // %if_then.us
	// in Loop: Header=BB0_8 Depth=2
	add.s32 %r11, %r8, 1073741823;
	and.b32 %r9, %r11, %r9;
	add.s32 %r11, %r7, %r6;
	add.s32 %r9, %r11, %r9;
	shl.b32 %r9, %r9, 2;
	cvt.s64.s32 %rl1, %r9;
	add.s64 %rl1, %rl1, %rl0;
	st.u32 [%rl1], %r10;
BB0_7: // %if_exit.us
	// in Loop: Header=BB0_8 Depth=2
	add.s32 %r7, %r8, %r7;
	setp.lt.s32 %p1, %r7, %r3;
	@%p1 bra BB0_8;
	bra.uni BB0_12;
BB0_15: // %mandel___vyfvyfvyi.exit
	// Parent Loop BB0_2 Depth=1
	// => This Inner Loop Header: Depth=2
	mov.u32 %r9, %tid.x;
	mov.u32 %r8, WARP_SZ;
	add.s32 %r10, %r8, -1;
	and.b32 %r10, %r10, %r9;
	add.s32 %r10, %r10, %r7;
	setp.lt.s32 %p1, %r10, %r3;
	@%p1 bra BB0_16;
	bra.uni BB0_14;
BB0_16: // %if_then
	// in Loop: Header=BB0_15 Depth=2
	add.s32 %r10, %r8, 1073741823;
	and.b32 %r9, %r10, %r9;
	add.s32 %r10, %r7, %r6;
	add.s32 %r9, %r10, %r9;
	shl.b32 %r9, %r9, 2;
	cvt.s64.s32 %rl1, %r9;
	add.s64 %rl1, %rl1, %rl0;
	mov.u32 %r9, 0;
	st.u32 [%rl1], %r9;
BB0_14: // %if_exit
	// in Loop: Header=BB0_15 Depth=2
	add.s32 %r7, %r8, %r7;
	setp.lt.s32 %p1, %r7, %r3;
	@%p1 bra BB0_15;
BB0_12: // %for_exit31
	// in Loop: Header=BB0_2 Depth=1
	add.s32 %r0, %r0, 1;
	setp.eq.s32 %p1, %r0, %r5;
	@%p1 bra BB0_13;
	bra.uni BB0_2;
BB0_13: // %for_exit
	ret;
}
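
// Reading aid, not part of the generated PTX. The sketch below is a scalar
// C++ reading of what the innermost loop at BB0_11 appears to compute for a
// single lane, assuming %f5/%f4 hold the real/imaginary parts of c, %f7/%f6
// hold z, and %r4 is the iteration limit. The name mandel_reference is
// hypothetical. The sketch leaves out the predicate bookkeeping (%p2..%p5)
// that the compiler emits for divergent lanes, and it uses a plain ">" where
// the PTX uses setp.gtu (which is also true for NaN).

```cpp
#include <cassert>

// Scalar reference for one pixel: iterate z = z^2 + c until |z|^2 > 4
// (0f40800000 in the PTX is the float 4.0) or the limit is reached.
static int mandel_reference(float c_re, float c_im, int max_iterations) {
    assert(max_iterations >= 0);
    float z_re = c_re, z_im = c_im;
    int i = 0;
    for (; i < max_iterations; ++i) {
        if (z_re * z_re + z_im * z_im > 4.0f)
            break;
        const float new_re = z_re * z_re - z_im * z_im;  // sub.f32 %f8, %f8, %f9
        const float new_im = 2.0f * z_re * z_im;         // (%f7 + %f7) * %f6
        z_re = c_re + new_re;
        z_im = c_im + new_im;
    }
    return i;  // value the kernel stores through st.u32
}
```

// A point inside the set, such as c = 0, runs to the iteration limit, while a
// point far outside, such as c = 2 + 2i, exits on the first test.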
BIN
examples_cuda/mandelbrot_tasks3d/cuLaunch
Executable file
Binary file not shown.
321
examples_cuda/mandelbrot_tasks3d/cuLaunch.cpp
Normal file
@@ -0,0 +1,321 @@
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <algorithm>
#include <string.h>
#include <cuda.h>
#include <vector>
#include <cassert>
#include "drvapi_error_string.h"

#define checkCudaErrors(err) __checkCudaErrors (err, __FILE__, __LINE__)
// These are the inline versions for all of the SDK helper functions
void __checkCudaErrors(CUresult err, const char *file, const int line) {
  if(CUDA_SUCCESS != err) {
    std::cerr << "checkCudaErrors() Driver API error = " << err << " \""
      << getCudaDrvErrorString(err) << "\" from file <" << file
      << ">, line " << line << "\n";
    exit(-1);
  }
}


/**********************/
/* Basic CUDriver API */
CUcontext context;

void createContext(const int deviceId = 0)
{
  CUdevice device;
  int devCount;
  checkCudaErrors(cuInit(0));
  checkCudaErrors(cuDeviceGetCount(&devCount));
  assert(devCount > 0);
  // Fall back to device 0 when the requested id is out of range.
  const int selectedId = (deviceId >= 0 && deviceId < devCount) ? deviceId : 0;
  checkCudaErrors(cuDeviceGet(&device, selectedId));

  char name[128];
  checkCudaErrors(cuDeviceGetName(name, 128, device));
  std::cout << "Using CUDA Device [" << selectedId << "]: " << name << "\n";

  int devMajor, devMinor;
  checkCudaErrors(cuDeviceComputeCapability(&devMajor, &devMinor, device));
  std::cout << "Device Compute Capability: "
    << devMajor << "." << devMinor << "\n";
  if (devMajor < 2) {
    std::cerr << "ERROR: Device " << selectedId << " is not SM 2.0 or greater\n";
    exit(1);
  }

  // Create driver context
  checkCudaErrors(cuCtxCreate(&context, 0, device));
}
void destroyContext()
{
  checkCudaErrors(cuCtxDestroy(context));
}

CUmodule loadModule(const char * module)
{
  CUmodule cudaModule;
  checkCudaErrors(cuModuleLoadData(&cudaModule, module));
  return cudaModule;
}
void unloadModule(CUmodule &cudaModule)
{
  checkCudaErrors(cuModuleUnload(cudaModule));
}

CUfunction getFunction(CUmodule &cudaModule, const char * function)
{
  CUfunction cudaFunction;
  checkCudaErrors(cuModuleGetFunction(&cudaFunction, cudaModule, function));
  return cudaFunction;
}

CUdeviceptr deviceMalloc(const size_t size)
{
  CUdeviceptr d_buf;
  checkCudaErrors(cuMemAlloc(&d_buf, size));
  return d_buf;
}
void deviceFree(CUdeviceptr d_buf)
{
  checkCudaErrors(cuMemFree(d_buf));
}
void memcpyD2H(void * h_buf, CUdeviceptr d_buf, const size_t size)
{
  checkCudaErrors(cuMemcpyDtoH(h_buf, d_buf, size));
}
void memcpyH2D(CUdeviceptr d_buf, void * h_buf, const size_t size)
{
  checkCudaErrors(cuMemcpyHtoD(d_buf, h_buf, size));
}
#define deviceLaunch(func,nbx,nby,nbz,params) \
  checkCudaErrors( \
      cuLaunchKernel( \
        (func), \
        (nbx), (nby), (nbz), \
        32, 1, 1, \
        0, NULL, (params), NULL \
        ));

typedef CUdeviceptr devicePtr;


/**************/

extern "C"
{
#if 0
  /* Disabled sketch of a per-name module cache; needs <map> and <string> if enabled. */
  struct ModuleManager
  {
    private:
      typedef std::pair<std::string, CUmodule> ModulePair;
      typedef std::map <std::string, CUmodule> ModuleMap;
      ModuleMap module_list;

      ModuleMap::iterator findModule(const char * module_name)
      {
        return module_list.find(std::string(module_name));
      }

    public:

      CUmodule loadModule(const char * module_name, const char * module_data)
      {
        const ModuleMap::iterator it = findModule(module_name);
        if (it == module_list.end())
        {
          /* Not cached yet: load the module image and remember it by name. */
          CUmodule cudaModule = ::loadModule(module_data);
          module_list.insert(std::make_pair(std::string(module_name), cudaModule));
          return cudaModule;
        }
        return it->second;
      }
      void unloadModule(const char * module_name)
      {
        ModuleMap::iterator it = findModule(module_name);
        if (it != module_list.end())
          module_list.erase(it);
      }
  };
#endif

  void *CUDAAlloc(void **handlePtr, int64_t size, int32_t alignment)
  {
    return NULL;
  }
  void CUDALaunch(
      void **handlePtr,
      const char * module_name,
      const char * module,
      const char * func_name,
      void **func_args,
      int countx, int county, int countz)
  {
    CUmodule cudaModule = loadModule(module);
    CUfunction cudaFunction = getFunction(cudaModule, func_name);
    deviceLaunch(cudaFunction, countx, county, countz, func_args);
    unloadModule(cudaModule);
  }
  void CUDASync(void *handle)
  {
    checkCudaErrors(cuStreamSynchronize(0));
  }
  void CUDAFree(void *handle)
  {
  }
}

/********************/


/* Write a PPM image file with the image of the Mandelbrot set */
static void
writePPM(int *buf, int width, int height, const char *fn)
{
    FILE *fp = fopen(fn, "wb");
    if (!fp) {
        fprintf(stderr, "Error: could not open %s for writing\n", fn);
        return;
    }
    fprintf(fp, "P6\n");
    fprintf(fp, "%d %d\n", width, height);
    fprintf(fp, "255\n");
    for (int i = 0; i < width*height; ++i) {
        // Map the iteration count to colors by just alternating between
        // two greys.
        unsigned char c = (buf[i] & 0x1) ? 240 : 20;
        for (int j = 0; j < 3; ++j)
            fputc(c, fp);
    }
    fclose(fp);
    printf("Wrote image file %s\n", fn);
}
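The byte layout that writePPM produces can be checked without touching the file system. The sketch below is not part of the commit: `encodePPM` is a hypothetical helper that builds the same P6 stream in memory, using the same parity-based grey mapping.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the commit): builds the same P6 byte
// stream that writePPM emits, but in memory instead of a file.
std::string encodePPM(const std::vector<int> &buf, int width, int height)
{
    std::string out = "P6\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";
    for (int i = 0; i < width * height; ++i) {
        // Odd iteration counts become light grey, even counts dark grey.
        const unsigned char c = (buf[i] & 0x1) ? 240 : 20;
        out.append(3, static_cast<char>(c));  // R, G and B share one value
    }
    return out;
}
```

Each pixel takes three bytes after the text header, so a 2x2 image yields 12 payload bytes.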
|
||||||
|
|
||||||
|
std::vector<char> readBinary(const char * filename)
{
    std::vector<char> buffer;
    FILE *fp = fopen(filename, "rb");
    if (!fp)
    {
        fprintf(stderr, "file %s not found\n", filename);
        exit(1);
    }
#if 0
    int c;  /* int, so that EOF is distinguishable from a data byte */
    while ((c = fgetc(fp)) != EOF)
        buffer.push_back((char)c);
#else
    fseek(fp, 0, SEEK_END);
    const long pos = ftell(fp);  /* file size in bytes, or -1 on failure */
    fseek(fp, 0, SEEK_SET);

    if (pos <= 0) {  /* ftell failed or the file is empty */
        fprintf(stderr, "Error: There was an Error reading the file %s \n", filename);
        exit(1);
    }
    const size_t size = (size_t)pos;
    buffer.resize(size);

    if (fread(&buffer[0], sizeof(char), size, fp) != size) {  /* short read */
        fprintf(stderr, "Error: There was an Error reading the file %s \n", filename);
        exit(1);
    }
#endif
    fclose(fp);
    fprintf(stderr, " read buffer of size= %d bytes \n", (int)buffer.size());
    return buffer;
}
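For comparison, the same whole-file read can be written with iostreams and a boolean result, so the caller decides how to react to a failure. This is a sketch under assumptions, not the commit's method: `readFileBytes` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <vector>

// Hypothetical alternative to readBinary (not part of the commit): reports
// failure through the return value instead of terminating the process.
bool readFileBytes(const char *filename, std::vector<char> &buffer)
{
    // Open at the end of the file so tellg() yields the size directly.
    std::ifstream in(filename, std::ios::binary | std::ios::ate);
    if (!in)
        return false;                        // could not open the file
    const std::streamoff size = in.tellg();
    if (size < 0)
        return false;                        // size query failed
    buffer.resize(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    if (size > 0 && !in.read(&buffer[0], size))
        return false;                        // short read
    return true;
}
```

Unlike readBinary, this version treats an empty file as a successful read of zero bytes. Whether that is desirable for a PTX or cubin image is a design choice left to the caller.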
|
||||||
|
|
||||||
|
|
||||||
|
static void usage()
|
||||||
|
{
|
||||||
|
fprintf(stderr, "usage: mandelbrot [--scale=<factor>]\n");
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
extern "C"
|
||||||
|
void mandelbrot_ispc(
|
||||||
|
float x0, float y0,
|
||||||
|
float x1, float y1,
|
||||||
|
int width, int height,
|
||||||
|
int maxIterations, int output[])
|
||||||
|
{
|
||||||
|
float dx = (x1 - x0) / width;
|
||||||
|
float dy = (y1 - y0) / height;
|
||||||
|
int xspan = 16; /* make sure it is big enough to avoid false-sharing */
|
||||||
|
int yspan = 4;
|
||||||
|
|
||||||
|
const int nbx = width/xspan;
|
||||||
|
const int nby = height/yspan;
|
||||||
|
const int nbz = 1;
|
||||||
|
|
||||||
|
fprintf(stderr ," nbx= %d nby= %d nbtot= %d \n", nbx, nby, nbx*nby);
|
||||||
|
|
||||||
|
#if 0
|
||||||
|
launch [nbx,nby]
|
||||||
|
mandelbrot_scanline(x0, dx, y0, dy, width, height, xspan, yspan,
|
||||||
|
maxIterations, output);
|
||||||
|
#endif
|
||||||
|
|
||||||
|
// const std::vector<char> cubin = readBinary("cuLaunch.cubin");
|
||||||
|
const std::vector<char> cubin = readBinary("cuLaunch.ptx");
|
||||||
|
void *params[] = {&x0, &dx, &y0, &dy, &width, &height, &xspan, &yspan, &maxIterations, &output};
|
||||||
|
CUDALaunch(
|
||||||
|
NULL, //void **handlePtr,
|
||||||
|
"module_01", // const char * module_name,
|
||||||
|
&cubin[0], //const char * module,
|
||||||
|
"mandelbrot_scanline", //const char * func_name,
|
||||||
|
params, //void **func_args,
|
||||||
|
nbx,nby,nbz); //int countx, int county, int countz)
|
||||||
|
CUDASync(NULL);
|
||||||
|
}
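mandelbrot_ispc derives nbx and nby by plain integer division, which drops a partial tile whenever the image size is not a multiple of the span. The default 1536x1024 image and the rounding done in main avoid that case. A round-up division is the usual way to cover an arbitrary size. The sketch below uses a hypothetical helper, `blocksFor`, that is not in the commit, and it assumes the kernel clamps each tile against width and height, which is how the `setp.lt`/`selp` pairs in the generated PTX read.

```cpp
#include <cassert>

// Hypothetical helper (not in the commit): number of blocks of `span`
// elements needed to cover `extent` elements, rounding up.
int blocksFor(int extent, int span)
{
    return (extent + span - 1) / span;
}
```

For sizes that are already multiples of the span, the result matches the truncating division used in the commit.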
|
||||||
|
|
||||||
|
int main(int argc, char *argv[])
|
||||||
|
{
|
||||||
|
unsigned int width = 1536;
|
||||||
|
unsigned int height = 1024;
|
||||||
|
float x0 = -2;
|
||||||
|
float x1 = 1;
|
||||||
|
float y0 = -1;
|
||||||
|
float y1 = 1;
|
||||||
|
|
||||||
|
if (argc == 1)
|
||||||
|
;
|
||||||
|
else if (argc == 2) {
|
||||||
|
if (strncmp(argv[1], "--scale=", 8) == 0) {
|
||||||
|
float scale = atof(argv[1] + 8);
|
||||||
|
if (scale == 0.f)
|
||||||
|
usage();
|
||||||
|
width *= scale;
|
||||||
|
height *= scale;
|
||||||
|
// round up to multiples of 16
|
||||||
|
width = (width + 0xf) & ~0xf;
|
||||||
|
height = (height + 0xf) & ~0xf;
|
||||||
|
}
|
||||||
|
else
|
||||||
|
usage();
|
||||||
|
}
|
||||||
|
else
|
||||||
|
usage();
|
||||||
|
|
||||||
|
/*******************/
|
||||||
|
createContext();
|
||||||
|
/*******************/
|
||||||
|
|
||||||
|
int maxIterations = 512;
|
||||||
|
int *h_buf = new int[width*height];
|
||||||
|
for (unsigned int i = 0; i < width*height; i++)
|
||||||
|
h_buf[i] = 0;
|
||||||
|
|
||||||
|
const size_t bufsize = sizeof(int)*width*height;
|
||||||
|
devicePtr d_buf = deviceMalloc(bufsize);
|
||||||
|
memcpyH2D(d_buf, h_buf, bufsize);
|
||||||
|
|
||||||
|
mandelbrot_ispc(x0,y0,x1,y1,width, height, maxIterations, (int*)d_buf);
|
||||||
|
|
||||||
|
memcpyD2H(h_buf, d_buf, bufsize);
|
||||||
|
deviceFree(d_buf);
|
||||||
|
|
||||||
|
writePPM(h_buf, width, height, "mandelbrot-cuda.ppm");
|
||||||
|
|
||||||
|
/*******************/
|
||||||
|
destroyContext();
|
||||||
|
/*******************/
|
||||||
|
|
||||||
|
return 0;
|
||||||
|
}
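The `(x + 0xf) & ~0xf` expression in main rounds a size up to the next multiple of 16 by adding 15 and clearing the low four bits. A minimal sketch of that step in isolation, with `roundUp16` as a hypothetical name that does not appear in the commit:

```cpp
#include <cassert>

// Hypothetical helper: rounds n up to the next multiple of 16 using the
// same add-then-mask expression as main. Values that are already
// multiples of 16 are returned unchanged.
unsigned int roundUp16(unsigned int n)
{
    return (n + 0xfu) & ~0xfu;
}
```

The trick relies on 16 being a power of two. A different tile size that is not a power of two would need a division instead.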
|
||||||
BIN
examples_cuda/mandelbrot_tasks3d/cuLaunch.cubin
Normal file
Binary file not shown.
186
examples_cuda/mandelbrot_tasks3d/cuLaunch.ptx
Normal file
@@ -0,0 +1,186 @@
|
|||||||
|
//
|
||||||
|
// Generated by LLVM NVPTX Back-End
|
||||||
|
//
|
||||||
|
|
||||||
|
.version 3.1
|
||||||
|
.target sm_35, texmode_independent
|
||||||
|
.address_size 64
|
||||||
|
|
||||||
|
// .globl mandelbrot_scanline
|
||||||
|
// @mandelbrot_scanline
|
||||||
|
.entry mandelbrot_scanline(
|
||||||
|
.param .f32 mandelbrot_scanline_param_0,
|
||||||
|
.param .f32 mandelbrot_scanline_param_1,
|
||||||
|
.param .f32 mandelbrot_scanline_param_2,
|
||||||
|
.param .f32 mandelbrot_scanline_param_3,
|
||||||
|
.param .u32 mandelbrot_scanline_param_4,
|
||||||
|
.param .u32 mandelbrot_scanline_param_5,
|
||||||
|
.param .u32 mandelbrot_scanline_param_6,
|
||||||
|
.param .u32 mandelbrot_scanline_param_7,
|
||||||
|
.param .u32 mandelbrot_scanline_param_8,
|
||||||
|
.param .u64 .ptr .align 4 mandelbrot_scanline_param_9
|
||||||
|
)
|
||||||
|
{
|
||||||
|
.reg .pred %p<396>;
|
||||||
|
.reg .s16 %rc<396>;
|
||||||
|
.reg .s16 %rs<396>;
|
||||||
|
.reg .s32 %r<396>;
|
||||||
|
.reg .s64 %rl<396>;
|
||||||
|
.reg .f32 %f<396>;
|
||||||
|
.reg .f64 %fl<396>;
|
||||||
|
|
||||||
|
// BB#0: // %allocas
|
||||||
|
ld.param.u32 %r6, [mandelbrot_scanline_param_5];
|
||||||
|
mov.u32 %r5, %ctaid.y;
|
||||||
|
ld.param.u32 %r7, [mandelbrot_scanline_param_7];
|
||||||
|
mul.lo.s32 %r0, %r5, %r7;
|
||||||
|
mad.lo.s32 %r1, %r5, %r7, %r7;
|
||||||
|
setp.lt.s32 %p0, %r1, %r6;
|
||||||
|
selp.b32 %r1, %r1, %r6, %p0;
|
||||||
|
setp.ge.s32 %p0, %r0, %r1;
|
||||||
|
@%p0 bra BB0_13;
|
||||||
|
// BB#1: // %for_test28.preheader.lr.ph
|
||||||
|
ld.param.f32 %f0, [mandelbrot_scanline_param_0];
|
||||||
|
mov.u32 %r2, %ctaid.x;
|
||||||
|
ld.param.u32 %r3, [mandelbrot_scanline_param_6];
|
||||||
|
mul.lo.s32 %r1, %r2, %r3;
|
||||||
|
ld.param.f32 %f1, [mandelbrot_scanline_param_1];
|
||||||
|
mad.lo.s32 %r3, %r2, %r3, %r3;
|
||||||
|
ld.param.f32 %f2, [mandelbrot_scanline_param_2];
|
||||||
|
ld.param.u32 %r2, [mandelbrot_scanline_param_4];
|
||||||
|
setp.lt.s32 %p0, %r3, %r2;
|
||||||
|
ld.param.f32 %f3, [mandelbrot_scanline_param_3];
|
||||||
|
selp.b32 %r3, %r3, %r2, %p0;
|
||||||
|
ld.param.u32 %r4, [mandelbrot_scanline_param_8];
|
||||||
|
ld.param.u64 %rl0, [mandelbrot_scanline_param_9];
|
||||||
|
setp.gt.s32 %p0, %r4, 0;
|
||||||
|
not.b32 %r6, %r6;
|
||||||
|
add.s32 %r5, %r5, 1;
|
||||||
|
mul.lo.s32 %r5, %r5, %r7;
|
||||||
|
not.b32 %r5, %r5;
|
||||||
|
setp.gt.s32 %p1, %r6, %r5;
|
||||||
|
selp.b32 %r5, %r6, %r5, %p1;
|
||||||
|
not.b32 %r5, %r5;
|
||||||
|
BB0_2: // %for_test28.preheader
|
||||||
|
// =>This Loop Header: Depth=1
|
||||||
|
// Child Loop BB0_15 Depth 2
|
||||||
|
// Child Loop BB0_8 Depth 2
|
||||||
|
// Child Loop BB0_11 Depth 3
|
||||||
|
setp.ge.s32 %p1, %r1, %r3;
|
||||||
|
@%p1 bra BB0_12;
|
||||||
|
// BB#3: // %for_loop30.lr.ph
|
||||||
|
// in Loop: Header=BB0_2 Depth=1
|
||||||
|
mul.lo.s32 %r6, %r0, %r2;
|
||||||
|
mov.u32 %r7, %r1;
|
||||||
|
@%p0 bra BB0_4;
|
||||||
|
bra.uni BB0_15;
|
||||||
|
BB0_4: // in Loop: Header=BB0_2 Depth=1
|
||||||
|
cvt.rn.f32.s32 %f4, %r0;
|
||||||
|
fma.rn.f32 %f4, %f4, %f3, %f2;
|
||||||
|
mov.u32 %r7, %r1;
|
||||||
|
BB0_8: // %for_loop.i.lr.ph.us
|
||||||
|
// Parent Loop BB0_2 Depth=1
|
||||||
|
// => This Loop Header: Depth=2
|
||||||
|
// Child Loop BB0_11 Depth 3
|
||||||
|
mov.u32 %r9, %tid.x;
|
||||||
|
mov.u32 %r8, WARP_SZ;
|
||||||
|
add.s32 %r10, %r8, -1;
|
||||||
|
and.b32 %r10, %r10, %r9;
|
||||||
|
add.s32 %r11, %r10, %r7;
|
||||||
|
cvt.rn.f32.s32 %f5, %r11;
|
||||||
|
fma.rn.f32 %f5, %f5, %f1, %f0;
|
||||||
|
mov.u32 %r10, 0;
|
||||||
|
mov.pred %p1, 0;
|
||||||
|
mov.pred %p3, -1;
|
||||||
|
mov.pred %p4, %p0;
|
||||||
|
mov.pred %p2, %p1;
|
||||||
|
mov.f32 %f7, %f5;
|
||||||
|
mov.f32 %f6, %f4;
|
||||||
|
BB0_11: // %for_loop.i.us
|
||||||
|
// Parent Loop BB0_2 Depth=1
|
||||||
|
// Parent Loop BB0_8 Depth=2
|
||||||
|
// => This Inner Loop Header: Depth=3
|
||||||
|
and.pred %p4, %p3, %p4;
|
||||||
|
mul.f32 %f8, %f7, %f7;
|
||||||
|
fma.rn.f32 %f9, %f6, %f6, %f8;
|
||||||
|
setp.gtu.f32 %p3, %f9, 0f40800000;
|
||||||
|
and.pred %p3, %p4, %p3;
|
||||||
|
or.pred %p2, %p3, %p2;
|
||||||
|
xor.pred %p5, %p2, %p4;
|
||||||
|
mov.pred %p3, %p1;
|
||||||
|
@!%p5 bra BB0_10;
|
||||||
|
bra.uni BB0_9;
|
||||||
|
BB0_9: // %not_all_continued_or_breaked.i.us
|
||||||
|
// in Loop: Header=BB0_11 Depth=3
|
||||||
|
mul.f32 %f9, %f6, %f6;
|
||||||
|
not.pred %p3, %p2;
|
||||||
|
and.pred %p3, %p4, %p3;
|
||||||
|
sub.f32 %f8, %f8, %f9;
|
||||||
|
add.f32 %f8, %f5, %f8;
|
||||||
|
add.f32 %f7, %f7, %f7;
|
||||||
|
fma.rn.f32 %f6, %f6, %f7, %f4;
|
||||||
|
mov.f32 %f7, %f8;
|
||||||
|
BB0_10: // %for_step.i.us
|
||||||
|
// in Loop: Header=BB0_11 Depth=3
|
||||||
|
add.s32 %r12, %r10, 1;
|
||||||
|
selp.b32 %r10, %r12, %r10, %p3;
|
||||||
|
setp.lt.s32 %p4, %r10, %r4;
|
||||||
|
and.pred %p5, %p3, %p4;
|
||||||
|
@%p5 bra BB0_11;
|
||||||
|
// BB#5: // %mandel___vyfvyfvyi.exit.us
|
||||||
|
// in Loop: Header=BB0_8 Depth=2
|
||||||
|
setp.ge.s32 %p1, %r11, %r3;
|
||||||
|
@%p1 bra BB0_7;
|
||||||
|
// BB#6: // %if_then.us
|
||||||
|
// in Loop: Header=BB0_8 Depth=2
|
||||||
|
add.s32 %r11, %r8, 1073741823;
|
||||||
|
and.b32 %r9, %r11, %r9;
|
||||||
|
add.s32 %r11, %r7, %r6;
|
||||||
|
add.s32 %r9, %r11, %r9;
|
||||||
|
shl.b32 %r9, %r9, 2;
|
||||||
|
cvt.s64.s32 %rl1, %r9;
|
||||||
|
add.s64 %rl1, %rl1, %rl0;
|
||||||
|
st.u32 [%rl1], %r10;
|
||||||
|
BB0_7: // %if_exit.us
|
||||||
|
// in Loop: Header=BB0_8 Depth=2
|
||||||
|
add.s32 %r7, %r8, %r7;
|
||||||
|
setp.lt.s32 %p1, %r7, %r3;
|
||||||
|
@%p1 bra BB0_8;
|
||||||
|
bra.uni BB0_12;
|
||||||
|
BB0_15: // %mandel___vyfvyfvyi.exit
|
||||||
|
// Parent Loop BB0_2 Depth=1
|
||||||
|
// => This Inner Loop Header: Depth=2
|
||||||
|
mov.u32 %r9, %tid.x;
|
||||||
|
mov.u32 %r8, WARP_SZ;
|
||||||
|
add.s32 %r10, %r8, -1;
|
||||||
|
and.b32 %r10, %r10, %r9;
|
||||||
|
add.s32 %r10, %r10, %r7;
|
||||||
|
setp.lt.s32 %p1, %r10, %r3;
|
||||||
|
@%p1 bra BB0_16;
|
||||||
|
bra.uni BB0_14;
|
||||||
|
BB0_16: // %if_then
|
||||||
|
// in Loop: Header=BB0_15 Depth=2
|
||||||
|
add.s32 %r10, %r8, 1073741823;
|
||||||
|
and.b32 %r9, %r10, %r9;
|
||||||
|
add.s32 %r10, %r7, %r6;
|
||||||
|
add.s32 %r9, %r10, %r9;
|
||||||
|
shl.b32 %r9, %r9, 2;
|
||||||
|
cvt.s64.s32 %rl1, %r9;
|
||||||
|
add.s64 %rl1, %rl1, %rl0;
|
||||||
|
mov.u32 %r9, 0;
|
||||||
|
st.u32 [%rl1], %r9;
|
||||||
|
BB0_14: // %if_exit
|
||||||
|
// in Loop: Header=BB0_15 Depth=2
|
||||||
|
add.s32 %r7, %r8, %r7;
|
||||||
|
setp.lt.s32 %p1, %r7, %r3;
|
||||||
|
@%p1 bra BB0_15;
|
||||||
|
BB0_12: // %for_exit31
|
||||||
|
// in Loop: Header=BB0_2 Depth=1
|
||||||
|
add.s32 %r0, %r0, 1;
|
||||||
|
setp.eq.s32 %p1, %r0, %r5;
|
||||||
|
@%p1 bra BB0_13;
|
||||||
|
bra.uni BB0_2;
|
||||||
|
BB0_13: // %for_exit
|
||||||
|
ret;
|
||||||
|
}
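As a reading aid for the PTX above, the inner loop (BB0_11, BB0_9, BB0_10) appears to implement the usual escape-time iteration: the constant `0f40800000` is 4.0f, `%f7` and `%f6` hold the real and imaginary parts, and `%r10` is the iteration counter. The scalar C++ below is a sketch of that reading, not code taken from the commit. The name `mandel` and its signature are assumptions.

```cpp
#include <cassert>

// Scalar sketch of the per-pixel loop, as read from the PTX (an
// interpretation, not the commit's source). Returns the number of
// iterations completed before |z|^2 exceeds 4, capped at `count`.
int mandel(float c_re, float c_im, int count)
{
    float z_re = c_re, z_im = c_im;
    int i;
    for (i = 0; i < count; ++i) {
        if (z_re * z_re + z_im * z_im > 4.f)
            break;                              // point has escaped
        const float new_re = z_re * z_re - z_im * z_im;
        const float new_im = 2.f * z_re * z_im;
        z_re = c_re + new_re;
        z_im = c_im + new_im;
    }
    return i;
}
```

Points inside the set run to the iteration cap, while points far outside escape immediately.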
|
||||||
|
|
||||||
76102
examples_cuda/mandelbrot_tasks3d/cuda.hex
Normal file
File diff suppressed because it is too large
370
examples_cuda/mandelbrot_tasks3d/drvapi_error_string.h
Normal file
@@ -0,0 +1,370 @@
|
|||||||
|
/*
|
||||||
|
* Copyright 1993-2012 NVIDIA Corporation. All rights reserved.
|
||||||
|
*
|
||||||
|
* Please refer to the NVIDIA end user license agreement (EULA) associated
|
||||||
|
* with this source code for terms and conditions that govern your use of
|
||||||
|
* this software. Any use, reproduction, disclosure, or distribution of
|
||||||
|
* this software and related documentation outside the terms of the EULA
|
||||||
|
* is strictly prohibited.
|
||||||
|
*
|
||||||
|
*/
|
||||||
|
|
||||||
|
#ifndef _DRVAPI_ERROR_STRING_H_
|
||||||
|
#define _DRVAPI_ERROR_STRING_H_
|
||||||
|
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
|
||||||
|
// Error Code string definitions here
|
||||||
|
typedef struct
|
||||||
|
{
|
||||||
|
char const *error_string;
|
||||||
|
int error_id;
|
||||||
|
} s_CudaErrorStr;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Error codes
|
||||||
|
*/
|
||||||
|
static s_CudaErrorStr sCudaDrvErrorString[] =
|
||||||
|
{
|
||||||
|
/**
|
||||||
|
* The API call returned with no errors. In the case of query calls, this
|
||||||
|
* can also mean that the operation being queried is complete (see
|
||||||
|
* ::cuEventQuery() and ::cuStreamQuery()).
|
||||||
|
*/
|
||||||
|
{ "CUDA_SUCCESS", 0 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that one or more of the parameters passed to the API call
|
||||||
|
* is not within an acceptable range of values.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_INVALID_VALUE", 1 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* The API call failed because it was unable to allocate enough memory to
|
||||||
|
* perform the requested operation.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_OUT_OF_MEMORY", 2 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the CUDA driver has not been initialized with
|
||||||
|
* ::cuInit() or that initialization has failed.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NOT_INITIALIZED", 3 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the CUDA driver is in the process of shutting down.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_DEINITIALIZED", 4 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates profiling APIs are called while application is running
|
||||||
|
* in visual profiler mode.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_PROFILER_DISABLED", 5 },
|
||||||
|
/**
|
||||||
|
* This indicates profiling has not been initialized for this context.
|
||||||
|
* Call cuProfilerInitialize() to resolve this.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_PROFILER_NOT_INITIALIZED", 6 },
|
||||||
|
/**
|
||||||
|
* This indicates profiler has already been started and probably
|
||||||
|
* cuProfilerStart() is incorrectly called.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_PROFILER_ALREADY_STARTED", 7 },
|
||||||
|
/**
|
||||||
|
* This indicates profiler has already been stopped and probably
|
||||||
|
* cuProfilerStop() is incorrectly called.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_PROFILER_ALREADY_STOPPED", 8 },
|
||||||
|
/**
|
||||||
|
* This indicates that no CUDA-capable devices were detected by the installed
|
||||||
|
* CUDA driver.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NO_DEVICE (no CUDA-capable devices were detected)", 100 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the device ordinal supplied by the user does not
|
||||||
|
* correspond to a valid CUDA device.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_INVALID_DEVICE (device specified is not a valid CUDA device)", 101 },
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the device kernel image is invalid. This can also
|
||||||
|
* indicate an invalid CUDA module.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_INVALID_IMAGE", 200 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This most frequently indicates that there is no context bound to the
|
||||||
|
* current thread. This can also be returned if the context passed to an
|
||||||
|
* API call is not a valid handle (such as a context that has had
|
||||||
|
* ::cuCtxDestroy() invoked on it). This can also be returned if a user
|
||||||
|
* mixes different API versions (i.e. 3010 context with 3020 API calls).
|
||||||
|
* See ::cuCtxGetApiVersion() for more details.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_INVALID_CONTEXT", 201 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the context being supplied as a parameter to the
|
||||||
|
* API call was already the active context.
|
||||||
|
* \deprecated
|
||||||
|
* This error return is deprecated as of CUDA 3.2. It is no longer an
|
||||||
|
* error to attempt to push the active context via ::cuCtxPushCurrent().
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_CONTEXT_ALREADY_CURRENT", 202 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a map or register operation has failed.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_MAP_FAILED", 205 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that an unmap or unregister operation has failed.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_UNMAP_FAILED", 206 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the specified array is currently mapped and thus
|
||||||
|
* cannot be destroyed.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_ARRAY_IS_MAPPED", 207 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the resource is already mapped.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_ALREADY_MAPPED", 208 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that there is no kernel image available that is suitable
|
||||||
|
* for the device. This can occur when a user specifies code generation
|
||||||
|
* options for a particular CUDA source file that do not include the
|
||||||
|
* corresponding device configuration.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NO_BINARY_FOR_GPU", 209 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a resource has already been acquired.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_ALREADY_ACQUIRED", 210 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a resource is not mapped.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NOT_MAPPED", 211 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a mapped resource is not available for access as an
|
||||||
|
* array.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NOT_MAPPED_AS_ARRAY", 212 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a mapped resource is not available for access as a
|
||||||
|
* pointer.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NOT_MAPPED_AS_POINTER", 213 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that an uncorrectable ECC error was detected during
|
||||||
|
* execution.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_ECC_UNCORRECTABLE", 214 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the ::CUlimit passed to the API call is not
|
||||||
|
* supported by the active device.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_UNSUPPORTED_LIMIT", 215 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the ::CUcontext passed to the API call can
|
||||||
|
* only be bound to a single CPU thread at a time but is already
|
||||||
|
* bound to a CPU thread.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_CONTEXT_ALREADY_IN_USE", 216 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that peer access is not supported across the given
|
||||||
|
* devices.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_PEER_ACCESS_UNSUPPORTED", 217},
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the device kernel source is invalid.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_INVALID_SOURCE", 300 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the file specified was not found.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_FILE_NOT_FOUND", 301 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a link to a shared object failed to resolve.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_SHARED_OBJECT_SYMBOL_NOT_FOUND", 302 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that initialization of a shared object failed.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_SHARED_OBJECT_INIT_FAILED", 303 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that an OS call failed.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_OPERATING_SYSTEM", 304 },
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a resource handle passed to the API call was not
|
||||||
|
* valid. Resource handles are opaque types like ::CUstream and ::CUevent.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_INVALID_HANDLE", 400 },
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a named symbol was not found. Examples of symbols
|
||||||
|
* are global/constant variable names, texture names, and surface names.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NOT_FOUND", 500 },
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that asynchronous operations issued previously have not
|
||||||
|
* completed yet. This result is not actually an error, but must be indicated
|
||||||
|
* differently than ::CUDA_SUCCESS (which indicates completion). Calls that
|
||||||
|
* may return this value include ::cuEventQuery() and ::cuStreamQuery().
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NOT_READY", 600 },
|
||||||
|
|
||||||
|
|
||||||
|
/**
|
||||||
|
* An exception occurred on the device while executing a kernel. Common
|
||||||
|
* causes include dereferencing an invalid device pointer and accessing
|
||||||
|
* out of bounds shared memory. The context cannot be used, so it must
|
||||||
|
* be destroyed (and a new one should be created). All existing device
|
||||||
|
* memory allocations from this context are invalid and must be
|
||||||
|
* reconstructed if the program is to continue using CUDA.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_LAUNCH_FAILED", 700 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that a launch did not occur because it did not have
|
||||||
|
* appropriate resources. This error usually indicates that the user has
|
||||||
|
* attempted to pass too many arguments to the device kernel, or the
|
||||||
|
* kernel launch specifies too many threads for the kernel's register
|
||||||
|
* count. Passing arguments of the wrong size (i.e. a 64-bit pointer
|
||||||
|
* when a 32-bit int is expected) is equivalent to passing too many
|
||||||
|
* arguments and can also result in this error.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES", 701 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that the device kernel took too long to execute. This can
|
||||||
|
* only occur if timeouts are enabled - see the device attribute
|
||||||
|
* ::CU_DEVICE_ATTRIBUTE_KERNEL_EXEC_TIMEOUT for more information. The
|
||||||
|
* context cannot be used (and must be destroyed similar to
|
||||||
|
* ::CUDA_ERROR_LAUNCH_FAILED). All existing device memory allocations from
|
||||||
|
* this context are invalid and must be reconstructed if the program is to
|
||||||
|
* continue using CUDA.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_LAUNCH_TIMEOUT", 702 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates a kernel launch that uses an incompatible texturing
|
||||||
|
* mode.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_LAUNCH_INCOMPATIBLE_TEXTURING", 703 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that a call to ::cuCtxEnablePeerAccess() is
|
||||||
|
* trying to re-enable peer access to a context which has already
|
||||||
|
* had peer access to it enabled.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED", 704 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that ::cuCtxDisablePeerAccess() is
|
||||||
|
* trying to disable peer access which has not been enabled yet
|
||||||
|
* via ::cuCtxEnablePeerAccess().
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_PEER_ACCESS_NOT_ENABLED", 705 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that the primary context for the specified device
|
||||||
|
* has already been initialized.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_PRIMARY_CONTEXT_ACTIVE", 708 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that the context current to the calling thread
|
||||||
|
* has been destroyed using ::cuCtxDestroy, or is a primary context which
|
||||||
|
* has not yet been initialized.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_CONTEXT_IS_DESTROYED", 709 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A device-side assert triggered during kernel execution. The context
|
||||||
|
* cannot be used anymore, and must be destroyed. All existing device
|
||||||
|
* memory allocations from this context are invalid and must be
|
||||||
|
* reconstructed if the program is to continue using CUDA.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_ASSERT", 710 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that the hardware resources required to enable
|
||||||
|
* peer access have been exhausted for one or more of the devices
|
||||||
|
* passed to ::cuCtxEnablePeerAccess().
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_TOO_MANY_PEERS", 711 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that the memory range passed to ::cuMemHostRegister()
|
||||||
|
* has already been registered.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_HOST_MEMORY_ALREADY_REGISTERED", 712 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that the pointer passed to ::cuMemHostUnregister()
|
||||||
|
* does not correspond to any currently registered memory region.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_HOST_MEMORY_NOT_REGISTERED", 713 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that the attempted operation is not permitted.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NOT_PERMITTED", 800 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This error indicates that the attempted operation is not supported
|
||||||
|
* on the current system or device.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_NOT_SUPPORTED", 801 },
|
||||||
|
|
||||||
|
/**
|
||||||
|
* This indicates that an unknown internal error has occurred.
|
||||||
|
*/
|
||||||
|
{ "CUDA_ERROR_UNKNOWN", 999 },
|
||||||
|
{ NULL, -1 }
|
||||||
|
};
|
||||||
|
|
||||||
|
// This is just a linear search through the array, since the error_id's are not
|
||||||
|
// always occurring consecutively
|
||||||
|
const char * getCudaDrvErrorString(CUresult error_id)
|
||||||
|
{
|
||||||
|
int index = 0;
|
||||||
|
while (sCudaDrvErrorString[index].error_id != error_id &&
|
||||||
|
sCudaDrvErrorString[index].error_id != -1)
|
||||||
|
{
|
||||||
|
index++;
|
||||||
|
}
|
||||||
|
if (sCudaDrvErrorString[index].error_id == error_id)
|
||||||
|
return (const char *)sCudaDrvErrorString[index].error_string;
|
||||||
|
else
|
||||||
|
return (const char *)"CUDA_ERROR not found!";
|
||||||
|
}
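The lookup above is a linear scan over a table terminated by a `{ NULL, -1 }` sentinel. A miniature, CUDA-free sketch of the same pattern follows. `ErrorEntry`, `kErrorTable` and `lookupError` are hypothetical names, and the table holds only three of the real entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical miniature of the table above: three entries plus the
// { NULL, -1 } sentinel that terminates the search.
struct ErrorEntry {
    const char *name;
    int id;
};

static const ErrorEntry kErrorTable[] = {
    { "CUDA_SUCCESS", 0 },
    { "CUDA_ERROR_INVALID_VALUE", 1 },
    { "CUDA_ERROR_UNKNOWN", 999 },
    { NULL, -1 }
};

const char *lookupError(int id)
{
    // Walk until the id matches or the sentinel entry is reached.
    for (const ErrorEntry *e = kErrorTable; e->name != NULL; ++e)
        if (e->id == id)
            return e->name;
    return "CUDA_ERROR not found!";
}
```

This sketch stops on the sentinel's NULL name rather than on its id, so a query for -1 yields the fallback string instead of the sentinel's NULL pointer.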
|
||||||
|
|
||||||
|
#endif
|
||||||
76116
examples_cuda/mandelbrot_tasks3d/ispc.hex
Normal file
File diff suppressed because it is too large
BIN
examples_cuda/mandelbrot_tasks3d/mandel
Executable file
Binary file not shown.
352
examples_cuda/mandelbrot_tasks3d/mandel.cpp
Normal file
@@ -0,0 +1,352 @@
|
|||||||
|
#include <stdio.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <iostream>
|
||||||
|
#include <algorithm>
|
||||||
|
#include <string.h>
|
||||||
|
#include <cuda.h>
|
||||||
|
#include <vector>
|
||||||
|
#include <cassert>
|
||||||
|
#include "drvapi_error_string.h"
|
||||||
|
|
||||||
|
#define checkCudaErrors(err) __checkCudaErrors (err, __FILE__, __LINE__)
|
||||||
|
// These are the inline versions for all of the SDK helper functions
|
||||||
|
void __checkCudaErrors(CUresult err, const char *file, const int line) {
|
||||||
|
if(CUDA_SUCCESS != err) {
|
||||||
|
std::cerr << "checkCudaErrors() Driver API error = " << err << "\""
|
||||||
|
<< getCudaDrvErrorString(err) << "\" from file <" << file
|
||||||
|
<< ", line " << line << "\n";
|
||||||
|
exit(-1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
/**********************/
|
||||||
|
/* Basic CUDriver API */
|
||||||
|
CUcontext context;
|
||||||
|
|
||||||
|
void createContext(const int deviceId = 0)
|
||||||
|
{
|
||||||
|
CUdevice device;
|
||||||
|
int devCount;
|
||||||
|
checkCudaErrors(cuInit(0));
|
||||||
|
checkCudaErrors(cuDeviceGetCount(&devCount));
|
||||||
|
assert(devCount > 0);
|
||||||
|
checkCudaErrors(cuDeviceGet(&device, deviceId < devCount ? deviceId : 0));
|
||||||
|
|
||||||
|
char name[128];
|
||||||
|
checkCudaErrors(cuDeviceGetName(name, 128, device));
|
||||||
|
std::cout << "Using CUDA Device [0]: " << name << "\n";
|
||||||
|
|
||||||
|
int devMajor, devMinor;
|
||||||
|
checkCudaErrors(cuDeviceComputeCapability(&devMajor, &devMinor, device));
|
||||||
|
std::cout << "Device Compute Capability: "
|
||||||
|
<< devMajor << "." << devMinor << "\n";
|
||||||
|
if (devMajor < 2) {
|
||||||
|
std::cerr << "ERROR: Device 0 is not SM 2.0 or greater\n";
|
||||||
|
exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Create driver context
|
||||||
|
checkCudaErrors(cuCtxCreate(&context, 0, device));
|
||||||
|
}
|
||||||
|
void destroyContext()
|
||||||
|
{
|
||||||
|
checkCudaErrors(cuCtxDestroy(context));
|
||||||
|
}
|
||||||
|
|
||||||
|
CUmodule loadModule(const char * module)
|
||||||
|
{
|
||||||
|
CUmodule cudaModule;
|
||||||
|
checkCudaErrors(cuModuleLoadData(&cudaModule, module));
|
||||||
|
return cudaModule;
|
||||||
|
}
|
||||||
|
void unloadModule(CUmodule &cudaModule)
|
||||||
|
{
|
||||||
|
checkCudaErrors(cuModuleUnload(cudaModule));
|
||||||
|
}
|
||||||
|
|
||||||
|
CUfunction getFunction(CUmodule &cudaModule, const char * function)
|
||||||
|
{
|
||||||
|
CUfunction cudaFunction;
|
||||||
|
checkCudaErrors(cuModuleGetFunction(&cudaFunction, cudaModule, function));
|
||||||
|
return cudaFunction;
|
||||||
|
}
|
||||||
|
|
||||||
|
CUdeviceptr deviceMalloc(const size_t size)
|
||||||
|
{
|
||||||
|
CUdeviceptr d_buf;
|
||||||
|
checkCudaErrors(cuMemAlloc(&d_buf, size));
|
||||||
|
return d_buf;
|
||||||
|
}
|
||||||
|
void deviceFree(CUdeviceptr d_buf)
|
||||||
|
{
|
||||||
|
checkCudaErrors(cuMemFree(d_buf));
|
||||||
|
}
|
||||||
|
void memcpyD2H(void * h_buf, CUdeviceptr d_buf, const size_t size)
|
||||||
|
{
|
||||||
|
checkCudaErrors(cuMemcpyDtoH(h_buf, d_buf, size));
|
||||||
|
}
|
||||||
|
void memcpyH2D(CUdeviceptr d_buf, void * h_buf, const size_t size)
|
||||||
|
{
|
||||||
|
checkCudaErrors(cuMemcpyHtoD(d_buf, h_buf, size));
|
||||||
|
}
|
||||||
|
#define deviceLaunch(func,nbx,nby,nbz,params) \
|
||||||
|
checkCudaErrors( \
|
||||||
|
cuLaunchKernel( \
|
||||||
|
(func), \
|
||||||
|
(nbx), (nby), (nbz), \
|
||||||
|
32, 1, 1, \
|
||||||
|
0, NULL, (params), NULL \
|
||||||
|
));
|
||||||
|
|
||||||
|
typedef CUdeviceptr devicePtr;
|
||||||
|
|
||||||
|
|
||||||
|
/**************/
|
||||||
|
|
||||||
|
extern "C"
|
||||||
|
{
|
||||||
|
#if 0
|
||||||
|
struct ModuleManager
{
    // Note: enabling this block also requires <map> and <string>.
    private:
        typedef std::pair<std::string, CUmodule> ModulePair;
        typedef std::map <std::string, CUmodule> ModuleMap;
        ModuleMap module_list;

        ModuleMap::iterator findModule(const char * module_name)
        {
            return module_list.find(std::string(module_name));
        }

    public:

        CUmodule loadModule(const char * module_name, const char * module_data)
        {
            const ModuleMap::iterator it = findModule(module_name);
            if (it == module_list.end())
            {
                // Not cached yet: load via the free function and remember the handle.
                CUmodule cudaModule = ::loadModule(module_data);
                module_list.insert(std::make_pair(std::string(module_name), cudaModule));
                return cudaModule;
            }
            return it->second;
        }
        void unloadModule(const char * module_name)
        {
            ModuleMap::iterator it = findModule(module_name);
            if (it != module_list.end())
            {
                ::unloadModule(it->second);
                module_list.erase(it);
            }
        }
};
|
||||||
|
#endif
|
||||||
|
|
||||||
|
  void *CUDAAlloc(void **handlePtr, int64_t size, int32_t alignment)
  {
#if 0
    fprintf(stderr, " ptr= %p\n", *handlePtr);
    fprintf(stderr, " size= %d\n", (int)size);
    fprintf(stderr, " alignment= %d\n", (int)alignment);
    fprintf(stderr, " ------- \n\n");
#endif
    return NULL;
  }
  void CUDALaunch(
      void **handlePtr,
      const char * module_name,
      const char * module,
      const char * func_name,
      void **func_args,
      int countx, int county, int countz)
  {
    assert(module_name != NULL);
    assert(module != NULL);
    assert(func_name != NULL);
    assert(func_args != NULL);
#if 1
    CUmodule cudaModule = loadModule(module);
    CUfunction cudaFunction = getFunction(cudaModule, func_name);
    deviceLaunch(cudaFunction, countx, county, countz, func_args);
    unloadModule(cudaModule);
#else
    fprintf(stderr, " handle= %p\n", *handlePtr);
    fprintf(stderr, " count= %d %d %d\n", countx, county, countz);

    fprintf(stderr, " module_name= %s \n", module_name);
    fprintf(stderr, " func_name= %s \n", func_name);
    // fprintf(stderr, " ptx= %s \n", module);
    fprintf(stderr, " x0= %g \n", *((float*)(func_args[0])));
    fprintf(stderr, " dx= %g \n", *((float*)(func_args[1])));
    fprintf(stderr, " y0= %g \n", *((float*)(func_args[2])));
    fprintf(stderr, " dy= %g \n", *((float*)(func_args[3])));
    fprintf(stderr, " w= %d \n", *((int*)(func_args[4])));
    fprintf(stderr, " h= %d \n", *((int*)(func_args[5])));
    fprintf(stderr, " xs= %d \n", *((int*)(func_args[6])));
    fprintf(stderr, " ys= %d \n", *((int*)(func_args[7])));
    fprintf(stderr, " maxit= %d \n", *((int*)(func_args[8])));
    fprintf(stderr, " ptr= %p \n", *((int**)(func_args[9])));
    fprintf(stderr, " ------- \n\n");
#endif
  }
  void CUDASync(void *handle)
  {
    checkCudaErrors(cuStreamSynchronize(0));
  }
  void ISPCSync(void *handle)
  {
  }
  void CUDAFree(void *handle)
  {
  }
}

/********************/


/* Write a PPM image file with the image of the Mandelbrot set */
static void
writePPM(int *buf, int width, int height, const char *fn)
{
  FILE *fp = fopen(fn, "wb");
  if (!fp) {
    fprintf(stderr, "could not open %s for writing\n", fn);
    return;
  }
  fprintf(fp, "P6\n");
  fprintf(fp, "%d %d\n", width, height);
  fprintf(fp, "255\n");
  for (int i = 0; i < width*height; ++i) {
    // Map the iteration count to colors by just alternating between
    // two greys.
    unsigned char c = (buf[i] & 0x1) ? 240 : 20;
    for (int j = 0; j < 3; ++j)
      fputc(c, fp);
  }
  fclose(fp);
  printf("Wrote image file %s\n", fn);
}

std::vector<char> readBinary(const char * filename)
{
  std::vector<char> buffer;
  FILE *fp = fopen(filename, "rb");
  if (!fp )
  {
    fprintf(stderr, "file %s not found\n", filename);
    assert(0);
  }
#if 0
  int c; /* int, not char, so that EOF can be told apart from data bytes */
  while ((c = fgetc(fp)) != EOF)
    buffer.push_back((char)c);
#else
  fseek(fp, 0, SEEK_END);
  const unsigned long long size = ftell(fp); /* size of the file in bytes */
  fseek(fp, 0, SEEK_SET);
  buffer.resize(size);

  if (size == 0) { /* error: the file is empty */
    fprintf(stderr, "Error: file %s is empty\n", filename);
    exit(1);
  }
  else if (fread(&buffer[0], sizeof(char), size, fp) != size) { /* error: fewer bytes read than the file size */
    fprintf(stderr, "Error: could not read all of file %s\n", filename);
    exit(1);
  }
#endif
  fclose(fp);
  fprintf(stderr, " read buffer of size= %d bytes \n", (int)buffer.size());
  return buffer;
}


static void usage()
{
  fprintf(stderr, "usage: mandelbrot [--scale=<factor>]\n");
  exit(1);
}

extern "C"
void mandelbrot_ispc(
    float x0, float y0,
    float x1, float y1,
    int width, int height,
    int maxIterations, int output[])
#if 1
;
#else
{
  float dx = (x1 - x0) / width;
  float dy = (y1 - y0) / height;
  int xspan = 32; /* make sure it is big enough to avoid false-sharing */
  int yspan = 4;

  const int nbx = width/xspan;
  const int nby = height/yspan; /* blocks along y cover the image height, not the width */
  const int nbz = 1;

  fprintf(stderr ," nbx= %d nby= %d nbtot= %d \n", nbx, nby, nbx*nby);

  // const std::vector<char> cubin = readBinary("cuLaunch.cubin");
  const std::vector<char> cubin = readBinary("cuLaunch.ptx");
  void *params[] = {&x0, &dx, &y0, &dy, &width, &height, &xspan, &yspan, &maxIterations, &output};
  CUDALaunch(
      NULL,                  // void **handlePtr,
      "module_01",           // const char * module_name,
      &cubin[0],             // const char * module,
      "mandelbrot_scanline", // const char * func_name,
      params,                // void **func_args,
      nbx,nby,nbz);          // int countx, int county, int countz)
  CUDASync(NULL);
}
#endif

int main(int argc, char *argv[])
{
  unsigned int width = 1536;
  unsigned int height = 1024;
  float x0 = -2;
  float x1 = 1;
  float y0 = -1;
  float y1 = 1;

  if (argc == 1)
    ;
  else if (argc == 2) {
    if (strncmp(argv[1], "--scale=", 8) == 0) {
      float scale = atof(argv[1] + 8);
      if (scale == 0.f)
        usage();
      width *= scale;
      height *= scale;
      // round up to multiples of 16
      width = (width + 0xf) & ~0xf;
      height = (height + 0xf) & ~0xf;
    }
    else
      usage();
  }
  else
    usage();

  /*******************/
  createContext();
  /*******************/

  int maxIterations = 512;
  int *h_buf = new int[width*height];
  for (unsigned int i = 0; i < width*height; i++)
    h_buf[i] = 0;

  const size_t bufsize = sizeof(int)*width*height;
  devicePtr d_buf = deviceMalloc(bufsize);
  memcpyH2D(d_buf, h_buf, bufsize);

  mandelbrot_ispc(x0,y0,x1,y1,width, height, maxIterations, (int*)d_buf);

  memcpyD2H(h_buf, d_buf, bufsize);
  deviceFree(d_buf);

  writePPM(h_buf, width, height, "mandelbrot-cuda.ppm");
  delete[] h_buf; /* release the host buffer once the image is written */

  /*******************/
  destroyContext();
  /*******************/

  return 0;
}
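To check the image copied back from the device, the per-pixel iteration count can be recomputed serially on the host. The function below is only a sketch and is not part of this commit: the name mandel_ref is hypothetical, and the loop assumes the usual escape-time recurrence (z = z*z + c, stopping once |z|^2 > 4), which is what the multiply/subtract/add sequence in the generated kernel code appears to implement.

```cpp
// Hypothetical host-side reference (an assumption, not code from this commit):
// escape-time iteration count for the point c = (c_re, c_im), capped at `count`.
static int mandel_ref(float c_re, float c_im, int count)
{
  float z_re = c_re, z_im = c_im;
  int i;
  for (i = 0; i < count; ++i) {
    // Stop as soon as |z|^2 exceeds 4: the point has escaped.
    if (z_re * z_re + z_im * z_im > 4.f)
      break;
    // z = z*z + c, written out for the real and imaginary parts.
    const float new_re = z_re * z_re - z_im * z_im;
    const float new_im = 2.f * z_re * z_im;
    z_re = c_re + new_re;
    z_im = c_im + new_im;
  }
  return i;
}
```

Calling mandel_ref(x0 + i*dx, y0 + j*dy, maxIterations) for each pixel (i, j) gives values that can be compared with h_buf after memcpyD2H. Small differences near the boundary of the set are possible if the GPU code fuses multiplies and adds.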
410
examples_cuda/mandelbrot_tasks3d/mandel.ll
Normal file
@@ -0,0 +1,410 @@
; ModuleID = 'mandelbrot_task.bc'
target datalayout = "e-p:64:64:64-S0-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64-v16:16:16-v32:32:32-n16:32:64"
target triple = "nvptx64"

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #0

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #0

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() #0

; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.warpsize() #0

; Function Attrs: alwaysinline nounwind readnone
define <1 x i8> @__vselect_i8(<1 x i8>, <1 x i8>, <1 x i32> %mask) #1 {
  %m = extractelement <1 x i32> %mask, i32 0
  %cmp = icmp eq i32 %m, 0
  %d0 = extractelement <1 x i8> %0, i32 0
  %d1 = extractelement <1 x i8> %1, i32 0
  %sel = select i1 %cmp, i8 %d0, i8 %d1
  %r = insertelement <1 x i8> undef, i8 %sel, i32 0
  ret <1 x i8> %r
}

; Function Attrs: alwaysinline nounwind readnone
define <1 x i16> @__vselect_i16(<1 x i16>, <1 x i16>, <1 x i32> %mask) #1 {
  %m = extractelement <1 x i32> %mask, i32 0
  %cmp = icmp eq i32 %m, 0
  %d0 = extractelement <1 x i16> %0, i32 0
  %d1 = extractelement <1 x i16> %1, i32 0
  %sel = select i1 %cmp, i16 %d0, i16 %d1
  %r = insertelement <1 x i16> undef, i16 %sel, i32 0
  ret <1 x i16> %r
}

; Function Attrs: alwaysinline nounwind readnone
define <1 x i64> @__vselect_i64(<1 x i64>, <1 x i64>, <1 x i32> %mask) #1 {
  %m = extractelement <1 x i32> %mask, i32 0
  %cmp = icmp eq i32 %m, 0
  %d0 = extractelement <1 x i64> %0, i32 0
  %d1 = extractelement <1 x i64> %1, i32 0
  %sel = select i1 %cmp, i64 %d0, i64 %d1
  %r = insertelement <1 x i64> undef, i64 %sel, i32 0
  ret <1 x i64> %r
}

; Function Attrs: nounwind readnone
declare double @llvm.nvvm.rsqrt.approx.d(double) #0

; Function Attrs: alwaysinline nounwind
define void @__aos_to_soa4_float1(<1 x float> %v0, <1 x float> %v1, <1 x float> %v2, <1 x float> %v3, <1 x float>* noalias nocapture %out0, <1 x float>* noalias nocapture %out1, <1 x float>* noalias nocapture %out2, <1 x float>* noalias nocapture %out3) #2 {
  store <1 x float> %v0, <1 x float>* %out0, align 4
  store <1 x float> %v1, <1 x float>* %out1, align 4
  store <1 x float> %v2, <1 x float>* %out2, align 4
  store <1 x float> %v3, <1 x float>* %out3, align 4
  ret void
}

; Function Attrs: alwaysinline nounwind
define void @__soa_to_aos4_float1(<1 x float> %v0, <1 x float> %v1, <1 x float> %v2, <1 x float> %v3, <1 x float>* noalias nocapture %out0, <1 x float>* noalias nocapture %out1, <1 x float>* noalias nocapture %out2, <1 x float>* noalias nocapture %out3) #2 {
  store <1 x float> %v0, <1 x float>* %out0, align 4
  store <1 x float> %v1, <1 x float>* %out1, align 4
  store <1 x float> %v2, <1 x float>* %out2, align 4
  store <1 x float> %v3, <1 x float>* %out3, align 4
  ret void
}

; Function Attrs: nounwind
define void @__aos_to_soa3_float1(<1 x float> %v0, <1 x float> %v1, <1 x float> %v2, <1 x float>* nocapture %out0, <1 x float>* nocapture %out1, <1 x float>* nocapture %out2) #3 {
  store <1 x float> %v0, <1 x float>* %out0, align 4
  store <1 x float> %v1, <1 x float>* %out1, align 4
  store <1 x float> %v2, <1 x float>* %out2, align 4
  ret void
}

; Function Attrs: nounwind
define void @__soa_to_aos3_float1(<1 x float> %v0, <1 x float> %v1, <1 x float> %v2, <1 x float>* nocapture %out0, <1 x float>* nocapture %out1, <1 x float>* nocapture %out2) #3 {
  store <1 x float> %v0, <1 x float>* %out0, align 4
  store <1 x float> %v1, <1 x float>* %out1, align 4
  store <1 x float> %v2, <1 x float>* %out2, align 4
  ret void
}

; Function Attrs: alwaysinline nounwind readonly
define <1 x double> @__rsqrt_varying_double(<1 x double> %v) #4 {
  %vs = extractelement <1 x double> %v, i32 0
  %rs = tail call double @llvm.nvvm.rsqrt.approx.d(double %vs)
  %rv = insertelement <1 x double> undef, double %rs, i32 0
  ret <1 x double> %rv
}

; Function Attrs: nounwind
|
||||||
|
define void @mandelbrot_scanline___unfunfunfunfuniuniuniuniuniun_3C_uni_3E_({ float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* noalias nocapture, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32) #5 {
|
||||||
|
allocas:
|
||||||
|
%x01 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 0
|
||||||
|
%x02 = load float* %x01, align 4
|
||||||
|
%dx3 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 1
|
||||||
|
%dx4 = load float* %dx3, align 4
|
||||||
|
%y05 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 2
|
||||||
|
%y06 = load float* %y05, align 4
|
||||||
|
%dy7 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 3
|
||||||
|
%dy8 = load float* %dy7, align 4
|
||||||
|
%width9 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 4
|
||||||
|
%width10 = load i32* %width9, align 4
|
||||||
|
%height11 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 5
|
||||||
|
%height12 = load i32* %height11, align 4
|
||||||
|
%xspan13 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 6
|
||||||
|
%xspan14 = load i32* %xspan13, align 4
|
||||||
|
%yspan15 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 7
|
||||||
|
%yspan16 = load i32* %yspan15, align 4
|
||||||
|
%maxIterations17 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 8
|
||||||
|
%maxIterations18 = load i32* %maxIterations17, align 4
|
||||||
|
%output19 = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 9
|
||||||
|
%output20 = load i32** %output19, align 8
|
||||||
|
%task_struct_mask = getelementptr { float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* %0, i64 0, i32 10
|
||||||
|
%mask = load <1 x i32>* %task_struct_mask, align 4
|
||||||
|
%item.i = extractelement <1 x i32> %mask, i32 0
|
||||||
|
%cmp.i = icmp slt i32 %item.i, 0
|
||||||
|
%bid.i.i = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.x() #3
|
||||||
|
%mul_calltmp_xspan_load = mul i32 %bid.i.i, %xspan14
|
||||||
|
%add_xstart_load_xspan_load25 = add i32 %mul_calltmp_xspan_load, %xspan14
|
||||||
|
%c.i.i = icmp slt i32 %add_xstart_load_xspan_load25, %width10
|
||||||
|
%r.i.i = select i1 %c.i.i, i32 %add_xstart_load_xspan_load25, i32 %width10
|
||||||
|
%bid.i.i177 = call i32 @llvm.nvvm.read.ptx.sreg.ctaid.y() #3
|
||||||
|
%mul_calltmp31_yspan_load = mul i32 %bid.i.i177, %yspan16
|
||||||
|
%add_ystart_load_yspan_load32 = add i32 %mul_calltmp31_yspan_load, %yspan16
|
||||||
|
%c.i.i178 = icmp slt i32 %add_ystart_load_yspan_load32, %height12
|
||||||
|
%r.i.i179 = select i1 %c.i.i178, i32 %add_ystart_load_yspan_load32, i32 %height12
|
||||||
|
%less_yi_load_yend_load319 = icmp slt i32 %mul_calltmp31_yspan_load, %r.i.i179
|
||||||
|
br i1 %cmp.i, label %for_test.preheader, label %for_test104.preheader
|
||||||
|
|
||||||
|
for_test104.preheader: ; preds = %allocas
|
||||||
|
br i1 %less_yi_load_yend_load319, label %for_test115.preheader.lr.ph, label %for_exit
|
||||||
|
|
||||||
|
for_test115.preheader.lr.ph: ; preds = %for_test104.preheader
|
||||||
|
%less_xi_load122_xend_load123331 = icmp slt i32 %mul_calltmp_xspan_load, %r.i.i
|
||||||
|
%maxIterations_load140_broadcast_init = insertelement <1 x i32> undef, i32 %maxIterations18, i32 0
|
||||||
|
%less_i_load_count_load.i321 = icmp sgt <1 x i32> %maxIterations_load140_broadcast_init, zeroinitializer
|
||||||
|
%"oldMask&test.i322" = select <1 x i1> %less_i_load_count_load.i321, <1 x i32> <i32 -1>, <1 x i32> zeroinitializer
|
||||||
|
%"internal_mask&function_mask10.i323" = and <1 x i32> %"oldMask&test.i322", %mask
|
||||||
|
%item.i.i324 = extractelement <1 x i32> %"internal_mask&function_mask10.i323", i32 0
|
||||||
|
%cmp.i.i325 = icmp slt i32 %item.i.i324, 0
|
||||||
|
%11 = xor i32 %height12, -1
|
||||||
|
%12 = add i32 %bid.i.i177, 1
|
||||||
|
%13 = mul i32 %yspan16, %12
|
||||||
|
%14 = xor i32 %13, -1
|
||||||
|
%15 = icmp sgt i32 %11, %14
|
||||||
|
%smax336 = select i1 %15, i32 %11, i32 %14
|
||||||
|
%16 = xor i32 %smax336, -1
|
||||||
|
br label %for_test115.preheader
|
||||||
|
|
||||||
|
for_test.preheader: ; preds = %allocas
|
||||||
|
br i1 %less_yi_load_yend_load319, label %for_test40.preheader.lr.ph, label %for_exit
|
||||||
|
|
||||||
|
for_test40.preheader.lr.ph: ; preds = %for_test.preheader
|
||||||
|
%less_xi_load_xend_load317 = icmp slt i32 %mul_calltmp_xspan_load, %r.i.i
|
||||||
|
%maxIterations_load_broadcast_init = insertelement <1 x i32> undef, i32 %maxIterations18, i32 0
|
||||||
|
%less_i_load_count_load.i204308 = icmp sgt <1 x i32> %maxIterations_load_broadcast_init, zeroinitializer
|
||||||
|
%"oldMask&test.i205309" = select <1 x i1> %less_i_load_count_load.i204308, <1 x i32> <i32 -1>, <1 x i32> zeroinitializer
|
||||||
|
%item.i.i206310 = extractelement <1 x i32> %"oldMask&test.i205309", i32 0
|
||||||
|
%cmp.i.i207311 = icmp slt i32 %item.i.i206310, 0
|
||||||
|
%output_load_ptr2int = ptrtoint i32* %output20 to i64
|
||||||
|
%17 = xor i32 %height12, -1
|
||||||
|
%18 = add i32 %bid.i.i177, 1
|
||||||
|
%19 = mul i32 %yspan16, %18
|
||||||
|
%20 = xor i32 %19, -1
|
||||||
|
%21 = icmp sgt i32 %17, %20
|
||||||
|
%smax = select i1 %21, i32 %17, i32 %20
|
||||||
|
%22 = xor i32 %smax, -1
|
||||||
|
br label %for_test40.preheader
|
||||||
|
|
||||||
|
for_test40.preheader: ; preds = %for_exit43, %for_test40.preheader.lr.ph
|
||||||
|
%yi.0320 = phi i32 [ %mul_calltmp31_yspan_load, %for_test40.preheader.lr.ph ], [ %yi_load77_plus1, %for_exit43 ]
|
||||||
|
br i1 %less_xi_load_xend_load317, label %for_loop42.lr.ph, label %for_exit43
|
||||||
|
|
||||||
|
for_loop42.lr.ph: ; preds = %for_test40.preheader
|
||||||
|
%yi_load52_to_float = sitofp i32 %yi.0320 to float
|
||||||
|
%mul_yi_load52_to_float_dy_load = fmul float %dy8, %yi_load52_to_float
|
||||||
|
%add_y0_load_mul_yi_load52_to_float_dy_load = fadd float %y06, %mul_yi_load52_to_float_dy_load
|
||||||
|
%add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init = insertelement <1 x float> undef, float %add_y0_load_mul_yi_load52_to_float_dy_load, i32 0
|
||||||
|
%mul_yi_load56_width_load57 = mul i32 %yi.0320, %width10
|
||||||
|
br i1 %cmp.i.i207311, label %for_loop.i229.lr.ph.us, label %mandel___vyfvyfvyi.exit244
|
||||||
|
|
||||||
|
mandel___vyfvyfvyi.exit244.us: ; preds = %for_step.i212.us
|
||||||
|
%tid.i.i189.us = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #3
|
||||||
|
%tid.i.i.i190.us = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%sub_calltmp3_.i191.us = add i32 %tid.i.i.i190.us, -1
|
||||||
|
%bitop.i192.us = and i32 %sub_calltmp3_.i191.us, %tid.i.i189.us
|
||||||
|
%add_xi_load62_calltmp65.us = add i32 %bitop.i192.us, %xi.0318.us
|
||||||
|
%less_add_xi_load62_calltmp65_xend_load66.us = icmp slt i32 %add_xi_load62_calltmp65.us, %r.i.i
|
||||||
|
br i1 %less_add_xi_load62_calltmp65_xend_load66.us, label %if_then.us, label %if_exit.us
|
||||||
|
|
||||||
|
if_then.us: ; preds = %mandel___vyfvyfvyi.exit244.us
|
||||||
|
%tid.i.i.i194.us = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%sub_calltmp3_.i195.us = add i32 %tid.i.i.i194.us, 1073741823
|
||||||
|
%tid.i.i193.us = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #3
|
||||||
|
%bitop.i196.us = and i32 %sub_calltmp3_.i195.us, %tid.i.i193.us
|
||||||
|
%add_xi_load58_calltmp61.us = add i32 %xi.0318.us, %mul_yi_load56_width_load57
|
||||||
|
%add_mul_yi_load56_width_load57_add_xi_load58_calltmp61.us = add i32 %add_xi_load58_calltmp61.us, %bitop.i196.us
|
||||||
|
%23 = shl i32 %add_mul_yi_load56_width_load57_add_xi_load58_calltmp61.us, 2
|
||||||
|
%iptr__id.i264.rhs.us = sext i32 %23 to i64
|
||||||
|
%iptr__id.i264.us = add i64 %iptr__id.i264.rhs.us, %output_load_ptr2int
|
||||||
|
%ptr__id.i265.us = inttoptr i64 %iptr__id.i264.us to i32*
|
||||||
|
store i32 %sel.i.i291.us, i32* %ptr__id.i265.us, align 4
|
||||||
|
br label %if_exit.us
|
||||||
|
|
||||||
|
if_exit.us: ; preds = %if_then.us, %mandel___vyfvyfvyi.exit244.us
|
||||||
|
%tid.i.i188.us = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%add_xi_load76_calltmp74.us = add i32 %tid.i.i188.us, %xi.0318.us
|
||||||
|
%less_xi_load_xend_load.us = icmp slt i32 %add_xi_load76_calltmp74.us, %r.i.i
|
||||||
|
br i1 %less_xi_load_xend_load.us, label %for_loop.i229.lr.ph.us, label %for_exit43
|
||||||
|
|
||||||
|
for_loop.i229.us: ; preds = %for_loop.i229.lr.ph.us, %for_step.i212.us
|
||||||
|
%"oldMask&test.i205316.us" = phi <1 x i32> [ %"oldMask&test.i205309", %for_loop.i229.lr.ph.us ], [ %"oldMask&test.i205.us", %for_step.i212.us ]
|
||||||
|
%break_lanes_memory.0.i201315.us = phi <1 x i32> [ zeroinitializer, %for_loop.i229.lr.ph.us ], [ %"mask|break_mask.i220.us", %for_step.i212.us ]
|
||||||
|
%r.i.i292295314.us = phi <1 x i32> [ zeroinitializer, %for_loop.i229.lr.ph.us ], [ %r.i.i292.us, %for_step.i212.us ]
|
||||||
|
%add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init301313.us = phi <1 x float> [ %add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init.us, %for_loop.i229.lr.ph.us ], [ %add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init300.us, %for_step.i212.us ]
|
||||||
|
%add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init303312.us = phi <1 x float> [ %add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init, %for_loop.i229.lr.ph.us ], [ %add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init302.us, %for_step.i212.us ]
|
||||||
|
%mul_z_re_load_z_re_load13.i214.us = fmul <1 x float> %add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init301313.us, %add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init301313.us
|
||||||
|
%mul_z_im_load_z_im_load14.i216.us = fmul <1 x float> %add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init303312.us, %add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init303312.us
|
||||||
|
%add_mul_z_re_load_z_re_load13_mul_z_im_load_z_im_load14.i217.us = fadd <1 x float> %mul_z_im_load_z_im_load14.i216.us, %mul_z_re_load_z_re_load13.i214.us
|
||||||
|
%greater_add_mul_z_re_load_z_re_load13_mul_z_im_load_z_im_load14_.i218.us = fcmp ugt <1 x float> %add_mul_z_re_load_z_re_load13_mul_z_im_load_z_im_load14.i217.us, <float 4.000000e+00>
|
||||||
|
%"oldMask&test16.i219.us" = select <1 x i1> %greater_add_mul_z_re_load_z_re_load13_mul_z_im_load_z_im_load14_.i218.us, <1 x i32> %"oldMask&test.i205316.us", <1 x i32> zeroinitializer
|
||||||
|
%"mask|break_mask.i220.us" = or <1 x i32> %"oldMask&test16.i219.us", %break_lanes_memory.0.i201315.us
|
||||||
|
%item.i63.i222.us = extractelement <1 x i32> %"mask|break_mask.i220.us", i32 0
|
||||||
|
%v.i64.i223.us = lshr i32 %item.i63.i222.us, 31
|
||||||
|
%item.i62.i225.us = extractelement <1 x i32> %"oldMask&test.i205316.us", i32 0
|
||||||
|
%v.i.i226.us = lshr i32 %item.i62.i225.us, 31
|
||||||
|
%"equal_finished&func_internal_mask&function_mask12.i228.us" = icmp eq i32 %v.i64.i223.us, %v.i.i226.us
|
||||||
|
br i1 %"equal_finished&func_internal_mask&function_mask12.i228.us", label %for_step.i212.us, label %not_all_continued_or_breaked.i243.us
|
||||||
|
|
||||||
|
not_all_continued_or_breaked.i243.us: ; preds = %for_loop.i229.us
|
||||||
|
%"!(break|continue)_lanes.i232.us" = xor <1 x i32> %"mask|break_mask.i220.us", <i32 -1>
|
||||||
|
%new_mask28.i233.us = and <1 x i32> %"oldMask&test.i205316.us", %"!(break|continue)_lanes.i232.us"
|
||||||
|
%sub_mul_z_re_load31_z_re_load32_mul_z_im_load33_z_im_load34.i238.us = fsub <1 x float> %mul_z_re_load_z_re_load13.i214.us, %mul_z_im_load_z_im_load14.i216.us
|
||||||
|
%mul__z_re_load35.i239.us = fmul <1 x float> %add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init301313.us, <float 2.000000e+00>
|
||||||
|
%mul_mul__z_re_load35_z_im_load36.i240.us = fmul <1 x float> %add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init303312.us, %mul__z_re_load35.i239.us
|
||||||
|
%add_c_re_load42_new_re_load.i241.us = fadd <1 x float> %add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init.us, %sub_mul_z_re_load31_z_re_load32_mul_z_im_load33_z_im_load34.i238.us
|
||||||
|
%add_c_im_load44_new_im_load.i242.us = fadd <1 x float> %add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init, %mul_mul__z_re_load35_z_im_load36.i240.us
|
||||||
|
br label %for_step.i212.us
|
||||||
|
|
||||||
|
for_step.i212.us: ; preds = %not_all_continued_or_breaked.i243.us, %for_loop.i229.us
|
||||||
|
%add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init302.us = phi <1 x float> [ %add_y0_load_mul_yi_load52_to_float_dy_load_broadcast_init303312.us, %for_loop.i229.us ], [ %add_c_im_load44_new_im_load.i242.us, %not_all_continued_or_breaked.i243.us ]
|
||||||
|
%add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init300.us = phi <1 x float> [ %add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init301313.us, %for_loop.i229.us ], [ %add_c_re_load42_new_re_load.i241.us, %not_all_continued_or_breaked.i243.us ]
|
||||||
|
%internal_mask_memory.1.i209.us = phi <1 x i32> [ zeroinitializer, %for_loop.i229.us ], [ %new_mask28.i233.us, %not_all_continued_or_breaked.i243.us ]
|
||||||
|
%m.i.i287.us = extractelement <1 x i32> %internal_mask_memory.1.i209.us, i32 0
|
||||||
|
%d0.i.i289.us = extractelement <1 x i32> %r.i.i292295314.us, i32 0
|
||||||
|
%not.cmp.i.i288.us = icmp ne i32 %m.i.i287.us, 0
|
||||||
|
%d1.i.i290.us = zext i1 %not.cmp.i.i288.us to i32
|
||||||
|
%sel.i.i291.us = add i32 %d0.i.i289.us, %d1.i.i290.us
|
||||||
|
%r.i.i292.us = insertelement <1 x i32> undef, i32 %sel.i.i291.us, i32 0
|
||||||
|
%less_i_load_count_load.i204.us = icmp slt <1 x i32> %r.i.i292.us, %maxIterations_load_broadcast_init
|
||||||
|
%"oldMask&test.i205.us" = select <1 x i1> %less_i_load_count_load.i204.us, <1 x i32> %internal_mask_memory.1.i209.us, <1 x i32> zeroinitializer
|
||||||
|
%item.i.i206.us = extractelement <1 x i32> %"oldMask&test.i205.us", i32 0
|
||||||
|
%cmp.i.i207.us = icmp slt i32 %item.i.i206.us, 0
|
||||||
|
br i1 %cmp.i.i207.us, label %for_loop.i229.us, label %mandel___vyfvyfvyi.exit244.us
|
||||||
|
|
||||||
|
for_loop.i229.lr.ph.us: ; preds = %if_exit.us, %for_loop42.lr.ph
|
||||||
|
%xi.0318.us = phi i32 [ %add_xi_load76_calltmp74.us, %if_exit.us ], [ %mul_calltmp_xspan_load, %for_loop42.lr.ph ]
|
||||||
|
%tid.i.i180.us = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #3
|
||||||
|
%tid.i.i.i181.us = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%sub_calltmp3_.i182.us = add i32 %tid.i.i.i181.us, -1
|
||||||
|
%bitop.i183.us = and i32 %sub_calltmp3_.i182.us, %tid.i.i180.us
|
||||||
|
%add_xi_load48_calltmp51.us = add i32 %bitop.i183.us, %xi.0318.us
|
||||||
|
%add_xi_load48_calltmp51_to_float.us = sitofp i32 %add_xi_load48_calltmp51.us to float
|
||||||
|
%mul_add_xi_load48_calltmp51_to_float_dx_load.us = fmul float %dx4, %add_xi_load48_calltmp51_to_float.us
|
||||||
|
%add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load.us = fadd float %x02, %mul_add_xi_load48_calltmp51_to_float_dx_load.us
|
||||||
|
%add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load_broadcast_init.us = insertelement <1 x float> undef, float %add_x0_load_mul_add_xi_load48_calltmp51_to_float_dx_load.us, i32 0
|
||||||
|
br label %for_loop.i229.us
|
||||||
|
|
||||||
|
for_exit: ; preds = %for_exit118, %for_exit43, %for_test.preheader, %for_test104.preheader
|
||||||
|
ret void
|
||||||
|
|
||||||
|
mandel___vyfvyfvyi.exit244: ; preds = %if_exit, %for_loop42.lr.ph
|
||||||
|
%xi.0318 = phi i32 [ %add_xi_load76_calltmp74, %if_exit ], [ %mul_calltmp_xspan_load, %for_loop42.lr.ph ]
|
||||||
|
%tid.i.i189 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #3
|
||||||
|
%tid.i.i.i190 = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%sub_calltmp3_.i191 = add i32 %tid.i.i.i190, -1
|
||||||
|
%bitop.i192 = and i32 %sub_calltmp3_.i191, %tid.i.i189
|
||||||
|
%add_xi_load62_calltmp65 = add i32 %bitop.i192, %xi.0318
|
||||||
|
%less_add_xi_load62_calltmp65_xend_load66 = icmp slt i32 %add_xi_load62_calltmp65, %r.i.i
|
||||||
|
br i1 %less_add_xi_load62_calltmp65_xend_load66, label %if_then, label %if_exit
|
||||||
|
|
||||||
|
for_exit43: ; preds = %if_exit, %if_exit.us, %for_test40.preheader
|
||||||
|
%yi_load77_plus1 = add i32 %yi.0320, 1
|
||||||
|
%exitcond = icmp eq i32 %yi_load77_plus1, %22
|
||||||
|
br i1 %exitcond, label %for_exit, label %for_test40.preheader
|
||||||
|
|
||||||
|
if_then: ; preds = %mandel___vyfvyfvyi.exit244
|
||||||
|
%tid.i.i.i194 = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%sub_calltmp3_.i195 = add i32 %tid.i.i.i194, 1073741823
|
||||||
|
%tid.i.i193 = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #3
|
||||||
|
%bitop.i196 = and i32 %sub_calltmp3_.i195, %tid.i.i193
|
||||||
|
%add_xi_load58_calltmp61 = add i32 %xi.0318, %mul_yi_load56_width_load57
|
||||||
|
%add_mul_yi_load56_width_load57_add_xi_load58_calltmp61 = add i32 %add_xi_load58_calltmp61, %bitop.i196
|
||||||
|
%24 = shl i32 %add_mul_yi_load56_width_load57_add_xi_load58_calltmp61, 2
|
||||||
|
%iptr__id.i264.rhs = sext i32 %24 to i64
|
||||||
|
%iptr__id.i264 = add i64 %iptr__id.i264.rhs, %output_load_ptr2int
|
||||||
|
%ptr__id.i265 = inttoptr i64 %iptr__id.i264 to i32*
|
||||||
|
store i32 0, i32* %ptr__id.i265, align 4
|
||||||
|
br label %if_exit
|
||||||
|
|
||||||
|
if_exit: ; preds = %if_then, %mandel___vyfvyfvyi.exit244
|
||||||
|
%tid.i.i188 = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%add_xi_load76_calltmp74 = add i32 %tid.i.i188, %xi.0318
|
||||||
|
%less_xi_load_xend_load = icmp slt i32 %add_xi_load76_calltmp74, %r.i.i
|
||||||
|
br i1 %less_xi_load_xend_load, label %mandel___vyfvyfvyi.exit244, label %for_exit43
|
||||||
|
|
||||||
|
for_test115.preheader: ; preds = %for_exit118, %for_test115.preheader.lr.ph
|
||||||
|
%yi109.0335 = phi i32 [ %mul_calltmp31_yspan_load, %for_test115.preheader.lr.ph ], [ %yi_load171_plus1, %for_exit118 ]
|
||||||
|
br i1 %less_xi_load122_xend_load123331, label %for_loop117.lr.ph, label %for_exit118
|
||||||
|
|
||||||
|
for_loop117.lr.ph: ; preds = %for_test115.preheader
|
||||||
|
%yi_load135_to_float = sitofp i32 %yi109.0335 to float
|
||||||
|
%mul_yi_load135_to_float_dy_load136 = fmul float %dy8, %yi_load135_to_float
|
||||||
|
%add_y0_load134_mul_yi_load135_to_float_dy_load136 = fadd float %y06, %mul_yi_load135_to_float_dy_load136
|
||||||
|
%add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init = insertelement <1 x float> undef, float %add_y0_load134_mul_yi_load135_to_float_dy_load136, i32 0
|
||||||
|
br i1 %cmp.i.i325, label %for_loop.i.lr.ph.us, label %if_exit159
|
||||||
|
|
||||||
|
if_exit159.us: ; preds = %for_step.i.us
|
||||||
|
%tid.i.i.us = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%add_xi120_load_calltmp169.us = add i32 %tid.i.i.us, %xi120.0332.us
|
||||||
|
%less_xi_load122_xend_load123.us = icmp slt i32 %add_xi120_load_calltmp169.us, %r.i.i
|
||||||
|
br i1 %less_xi_load122_xend_load123.us, label %for_loop.i.lr.ph.us, label %for_exit118
|
||||||
|
|
||||||
|
for_loop.i.us: ; preds = %for_loop.i.lr.ph.us, %for_step.i.us
|
||||||
|
%"oldMask&test.i329.us" = phi <1 x i32> [ %"oldMask&test.i322", %for_loop.i.lr.ph.us ], [ %"oldMask&test.i.us", %for_step.i.us ]
|
||||||
|
%break_lanes_memory.0.i328.us = phi <1 x i32> [ zeroinitializer, %for_loop.i.lr.ph.us ], [ %"mask|break_mask.i.us", %for_step.i.us ]
|
||||||
|
%25 = phi <1 x i32> [ zeroinitializer, %for_loop.i.lr.ph.us ], [ %r.i.i261.us, %for_step.i.us ]
|
||||||
|
%add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init305327.us = phi <1 x float> [ %add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init.us, %for_loop.i.lr.ph.us ], [ %add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init304.us, %for_step.i.us ]
|
||||||
|
%add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init307326.us = phi <1 x float> [ %add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init, %for_loop.i.lr.ph.us ], [ %add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init306.us, %for_step.i.us ]
|
||||||
|
%"internal_mask&function_mask12.i.us" = and <1 x i32> %"oldMask&test.i329.us", %mask
|
||||||
|
%mul_z_re_load_z_re_load13.i.us = fmul <1 x float> %add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init305327.us, %add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init305327.us
|
||||||
|
%mul_z_im_load_z_im_load14.i.us = fmul <1 x float> %add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init307326.us, %add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init307326.us
|
||||||
|
%add_mul_z_re_load_z_re_load13_mul_z_im_load_z_im_load14.i.us = fadd <1 x float> %mul_z_im_load_z_im_load14.i.us, %mul_z_re_load_z_re_load13.i.us
|
||||||
|
%greater_add_mul_z_re_load_z_re_load13_mul_z_im_load_z_im_load14_.i.us = fcmp ugt <1 x float> %add_mul_z_re_load_z_re_load13_mul_z_im_load_z_im_load14.i.us, <float 4.000000e+00>
|
||||||
|
%"oldMask&test16.i.us" = select <1 x i1> %greater_add_mul_z_re_load_z_re_load13_mul_z_im_load_z_im_load14_.i.us, <1 x i32> %"oldMask&test.i329.us", <1 x i32> zeroinitializer
|
||||||
|
%"mask|break_mask.i.us" = or <1 x i32> %"oldMask&test16.i.us", %break_lanes_memory.0.i328.us
|
||||||
|
%"finished&func.i.us" = and <1 x i32> %"mask|break_mask.i.us", %mask
|
||||||
|
%item.i63.i.us = extractelement <1 x i32> %"finished&func.i.us", i32 0
|
||||||
|
%v.i64.i.us = lshr i32 %item.i63.i.us, 31
|
||||||
|
%item.i62.i.us = extractelement <1 x i32> %"internal_mask&function_mask12.i.us", i32 0
|
||||||
|
%v.i.i.us = lshr i32 %item.i62.i.us, 31
|
||||||
|
%"equal_finished&func_internal_mask&function_mask12.i.us" = icmp eq i32 %v.i64.i.us, %v.i.i.us
|
||||||
|
br i1 %"equal_finished&func_internal_mask&function_mask12.i.us", label %for_step.i.us, label %not_all_continued_or_breaked.i.us
|
||||||
|
|
||||||
|
not_all_continued_or_breaked.i.us: ; preds = %for_loop.i.us
|
||||||
|
%"!(break|continue)_lanes.i.us" = xor <1 x i32> %"mask|break_mask.i.us", <i32 -1>
|
||||||
|
%new_mask28.i.us = and <1 x i32> %"oldMask&test.i329.us", %"!(break|continue)_lanes.i.us"
|
||||||
|
%sub_mul_z_re_load31_z_re_load32_mul_z_im_load33_z_im_load34.i.us = fsub <1 x float> %mul_z_re_load_z_re_load13.i.us, %mul_z_im_load_z_im_load14.i.us
|
||||||
|
%mul__z_re_load35.i.us = fmul <1 x float> %add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init305327.us, <float 2.000000e+00>
|
||||||
|
%mul_mul__z_re_load35_z_im_load36.i.us = fmul <1 x float> %add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init307326.us, %mul__z_re_load35.i.us
|
||||||
|
%add_c_re_load42_new_re_load.i.us = fadd <1 x float> %add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init.us, %sub_mul_z_re_load31_z_re_load32_mul_z_im_load33_z_im_load34.i.us
|
||||||
|
%add_c_im_load44_new_im_load.i.us = fadd <1 x float> %add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init, %mul_mul__z_re_load35_z_im_load36.i.us
|
||||||
|
br label %for_step.i.us
|
||||||
|
|
||||||
|
for_step.i.us: ; preds = %not_all_continued_or_breaked.i.us, %for_loop.i.us
|
||||||
|
%add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init306.us = phi <1 x float> [ %add_y0_load134_mul_yi_load135_to_float_dy_load136_broadcast_init307326.us, %for_loop.i.us ], [ %add_c_im_load44_new_im_load.i.us, %not_all_continued_or_breaked.i.us ]
|
||||||
|
%add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init304.us = phi <1 x float> [ %add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init305327.us, %for_loop.i.us ], [ %add_c_re_load42_new_re_load.i.us, %not_all_continued_or_breaked.i.us ]
|
||||||
|
%internal_mask_memory.1.i.us = phi <1 x i32> [ zeroinitializer, %for_loop.i.us ], [ %new_mask28.i.us, %not_all_continued_or_breaked.i.us ]
|
||||||
|
%m.i.i.us = extractelement <1 x i32> %internal_mask_memory.1.i.us, i32 0
|
||||||
|
%d0.i.i259.us = extractelement <1 x i32> %25, i32 0
|
||||||
|
%not.cmp.i.i258.us = icmp ne i32 %m.i.i.us, 0
|
||||||
|
%d1.i.i260.us = zext i1 %not.cmp.i.i258.us to i32
|
||||||
|
%sel.i.i.us = add i32 %d0.i.i259.us, %d1.i.i260.us
|
||||||
|
%r.i.i261.us = insertelement <1 x i32> undef, i32 %sel.i.i.us, i32 0
|
||||||
|
%less_i_load_count_load.i.us = icmp slt <1 x i32> %r.i.i261.us, %maxIterations_load140_broadcast_init
|
||||||
|
%"oldMask&test.i.us" = select <1 x i1> %less_i_load_count_load.i.us, <1 x i32> %internal_mask_memory.1.i.us, <1 x i32> zeroinitializer
|
||||||
|
%"internal_mask&function_mask10.i.us" = and <1 x i32> %"oldMask&test.i.us", %mask
|
||||||
|
%item.i.i.us = extractelement <1 x i32> %"internal_mask&function_mask10.i.us", i32 0
|
||||||
|
%cmp.i.i.us = icmp slt i32 %item.i.i.us, 0
|
||||||
|
br i1 %cmp.i.i.us, label %for_loop.i.us, label %if_exit159.us
|
||||||
|
|
||||||
|
for_loop.i.lr.ph.us: ; preds = %if_exit159.us, %for_loop117.lr.ph
|
||||||
|
%xi120.0332.us = phi i32 [ %add_xi120_load_calltmp169.us, %if_exit159.us ], [ %mul_calltmp_xspan_load, %for_loop117.lr.ph ]
|
||||||
|
%tid.i.i184.us = call i32 @llvm.nvvm.read.ptx.sreg.tid.x() #3
|
||||||
|
%tid.i.i.i185.us = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%sub_calltmp3_.i186.us = add i32 %tid.i.i.i185.us, -1
|
||||||
|
%bitop.i187.us = and i32 %sub_calltmp3_.i186.us, %tid.i.i184.us
|
||||||
|
%add_xi_load128_calltmp131.us = add i32 %bitop.i187.us, %xi120.0332.us
|
||||||
|
%add_xi_load128_calltmp131_to_float.us = sitofp i32 %add_xi_load128_calltmp131.us to float
|
||||||
|
%mul_add_xi_load128_calltmp131_to_float_dx_load132.us = fmul float %dx4, %add_xi_load128_calltmp131_to_float.us
|
||||||
|
%add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132.us = fadd float %x02, %mul_add_xi_load128_calltmp131_to_float_dx_load132.us
|
||||||
|
%add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132_broadcast_init.us = insertelement <1 x float> undef, float %add_x0_load127_mul_add_xi_load128_calltmp131_to_float_dx_load132.us, i32 0
|
||||||
|
br label %for_loop.i.us
|
||||||
|
|
||||||
|
for_exit118: ; preds = %if_exit159, %if_exit159.us, %for_test115.preheader
|
||||||
|
%yi_load171_plus1 = add i32 %yi109.0335, 1
|
||||||
|
%exitcond337 = icmp eq i32 %yi_load171_plus1, %16
|
||||||
|
br i1 %exitcond337, label %for_exit, label %for_test115.preheader
|
||||||
|
|
||||||
|
if_exit159: ; preds = %if_exit159, %for_loop117.lr.ph
|
||||||
|
%xi120.0332 = phi i32 [ %add_xi120_load_calltmp169, %if_exit159 ], [ %mul_calltmp_xspan_load, %for_loop117.lr.ph ]
|
||||||
|
%tid.i.i = call i32 @llvm.nvvm.read.ptx.sreg.warpsize() #3
|
||||||
|
%add_xi120_load_calltmp169 = add i32 %tid.i.i, %xi120.0332
|
||||||
|
%less_xi_load122_xend_load123 = icmp slt i32 %add_xi120_load_calltmp169, %r.i.i
|
||||||
|
br i1 %less_xi_load122_xend_load123, label %if_exit159, label %for_exit118
|
||||||
|
}
|
||||||
|
|
||||||
|
attributes #0 = { nounwind readnone }
|
||||||
|
attributes #1 = { alwaysinline nounwind readnone }
|
||||||
|
attributes #2 = { alwaysinline nounwind }
|
||||||
|
attributes #3 = { nounwind }
|
||||||
|
attributes #4 = { alwaysinline nounwind readonly }
|
||||||
|
attributes #5 = { nounwind "target-features"="+sm_35" }
|
||||||
|
!nvvm.annotations = !{!1}
|
||||||
|
!1 = metadata !{void ({ float, float, float, float, i32, i32, i32, i32, i32, i32*, <1 x i32> }* , i32, i32, i32, i32, i32, i32, i32, i32, i32, i32)* @mandelbrot_scanline___unfunfunfunfuniuniuniuniuniun_3C_uni_3E_, metadata !"kernel", i32 1}
|
||||||
53
examples_cuda/mandelbrot_tasks3d/mandel_task_cu.cu
Normal file
@@ -0,0 +1,53 @@
|
|||||||
|
#include <stdio.h>
|
||||||
|
#define blockIndex0 (blockIdx.x)
|
||||||
|
#define blockIndex1 (blockIdx.y)
|
||||||
|
#define vectorWidth (32)
|
||||||
|
#define vectorIndex (threadIdx.x & (vectorWidth-1))
|
||||||
|
|
||||||
|
int __device__ __forceinline__
|
||||||
|
mandel(float c_re, float c_im, int count)
|
||||||
|
{
|
||||||
|
float z_re = c_re, z_im = c_im;
|
||||||
|
int i;
|
||||||
|
for (i = 0; i < count; ++i) {
|
||||||
|
if (z_re * z_re + z_im * z_im > 4.0f)
|
||||||
|
break;
|
||||||
|
|
||||||
|
float new_re = z_re*z_re - z_im*z_im;
|
||||||
|
float new_im = 2.0f * z_re * z_im;
|
||||||
|
{
|
||||||
|
z_re = c_re + new_re;
|
||||||
|
z_im = c_im + new_im;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return i;
|
||||||
|
}
|
||||||
|
|
||||||
|
extern "C"
|
||||||
|
__global__ void mandelbrot_scanline(
|
||||||
|
float x0, float dx,
|
||||||
|
float y0, float dy,
|
||||||
|
int width, int height,
|
||||||
|
int xspan, int yspan,
|
||||||
|
int maxIterations, int output[])
|
||||||
|
{
|
||||||
|
const int xstart = blockIndex0 * xspan;
|
||||||
|
const int xend = min(xstart + xspan, width);
|
||||||
|
|
||||||
|
const int ystart = blockIndex1 * yspan;
|
||||||
|
const int yend = min(ystart + yspan, height);
|
||||||
|
|
||||||
|
for (int yi = ystart; yi < yend; yi++)
|
||||||
|
for (int xi = xstart; xi < xend; xi += vectorWidth)
|
||||||
|
{
|
||||||
|
const float x = x0 + (xi + vectorIndex) * dx;
|
||||||
|
const float y = y0 + yi * dy;
|
||||||
|
|
||||||
|
const int res = mandel(x,y,maxIterations);
|
||||||
|
const int index = yi * width + (xi + vectorIndex);
|
||||||
|
if (xi + vectorIndex < xend)
|
||||||
|
output[index] = res;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
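The lane indexing in mandel_task_cu.cu can be checked on the host with a serial sketch like the one below. It is not part of the commit: the names mandel_ref and scanline_tile_ref are hypothetical, and the inner lane loop merely stands in for the 32 threads that the kernel addresses through vectorIndex. It assumes the same parameters as the kernel and a row-major output buffer of width * height ints.

```cpp
#include <algorithm>
#include <vector>

// Serial counterpart of the device function mandel() (hypothetical name).
int mandel_ref(float c_re, float c_im, int count) {
    float z_re = c_re, z_im = c_im;
    int i;
    for (i = 0; i < count; ++i) {
        if (z_re * z_re + z_im * z_im > 4.0f)
            break;
        const float new_re = z_re * z_re - z_im * z_im;
        const float new_im = 2.0f * z_re * z_im;
        z_re = c_re + new_re;
        z_im = c_im + new_im;
    }
    return i;
}

// Emulates one thread block (bx, by) of mandelbrot_scanline on the CPU.
// The lane loop plays the role of vectorIndex = threadIdx.x & 31.
void scanline_tile_ref(int bx, int by,
                       float x0, float dx, float y0, float dy,
                       int width, int height, int xspan, int yspan,
                       int maxIterations, std::vector<int>& output) {
    const int kVectorWidth = 32;  // mirrors the vectorWidth macro
    const int xstart = bx * xspan;
    const int xend = std::min(xstart + xspan, width);
    const int ystart = by * yspan;
    const int yend = std::min(ystart + yspan, height);

    for (int yi = ystart; yi < yend; ++yi)
        for (int xi = xstart; xi < xend; xi += kVectorWidth)
            for (int lane = 0; lane < kVectorWidth; ++lane) {
                // Same guard as the kernel: lanes past the tile edge do nothing.
                if (xi + lane >= xend)
                    continue;
                const float x = x0 + (xi + lane) * dx;
                const float y = y0 + yi * dy;
                output[yi * width + (xi + lane)] = mandel_ref(x, y, maxIterations);
            }
}
```

With xspan = 16 and a vector width of 32, only lanes 0..15 pass the guard in each tile, so comparing this output against the GPU buffer is one way to confirm that the idle lanes never write out of bounds.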
BIN
examples_cuda/mandelbrot_tasks3d/mandel_task_cu.cubin
Normal file
Binary file not shown.
BIN
examples_cuda/mandelbrot_tasks3d/mandel_task_cu.fatbin
Normal file
Binary file not shown.
213
examples_cuda/mandelbrot_tasks3d/mandel_task_cu.ptx
Normal file
@@ -0,0 +1,213 @@
|
|||||||
|
//
|
||||||
|
// Generated by NVIDIA NVVM Compiler
|
||||||
|
// Compiler built on Thu Jul 18 02:37:37 2013 (1374107857)
|
||||||
|
// Cuda compilation tools, release 5.5, V5.5.0
|
||||||
|
//
|
||||||
|
|
||||||
|
.version 3.2
|
||||||
|
.target sm_35
|
||||||
|
.address_size 64
|
||||||
|
|
||||||
|
.file 1 "/home/evghenii/soft/ispc-code/ispc/examples/mandelbrot_tasks3d/mandel_task_cu.cu", 1383122156, 1370
|
||||||
|
.file 2 "/usr/local/cuda-5.5/bin/..//include/cuda_device_runtime_api.h", 1375338991, 7655
|
||||||
|
.file 3 "/usr/local/cuda-5.5/bin/..//include/device_functions.h", 1375338991, 185228
|
||||||
|
.extern .func (.param .b32 func_retval0) vprintf
|
||||||
|
(
|
||||||
|
.param .b64 vprintf_param_0,
|
||||||
|
.param .b64 vprintf_param_1
|
||||||
|
)
|
||||||
|
;
|
||||||
|
.global .align 1 .b8 $str[26] = {118, 101, 99, 116, 111, 114, 73, 110, 100, 101, 120, 61, 32, 37, 100, 32, 32, 98, 105, 100, 61, 32, 37, 100, 10, 0};
|
||||||
|
|
||||||
|
.weak .func (.param .b32 func_retval0) cudaMalloc(
|
||||||
|
.param .b64 cudaMalloc_param_0,
|
||||||
|
.param .b64 cudaMalloc_param_1
|
||||||
|
)
|
||||||
|
{
|
||||||
|
.reg .s32 %r<2>;
|
||||||
|
|
||||||
|
|
||||||
|
mov.u32 %r1, 30;
|
||||||
|
st.param.b32 [func_retval0+0], %r1;
|
||||||
|
.loc 2 66 3
|
||||||
|
ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
.weak .func (.param .b32 func_retval0) cudaFuncGetAttributes(
|
||||||
|
.param .b64 cudaFuncGetAttributes_param_0,
|
||||||
|
.param .b64 cudaFuncGetAttributes_param_1
|
||||||
|
)
|
||||||
|
{
|
||||||
|
.reg .s32 %r<2>;
|
||||||
|
|
||||||
|
|
||||||
|
mov.u32 %r1, 30;
|
||||||
|
st.param.b32 [func_retval0+0], %r1;
|
||||||
|
.loc 2 71 3
|
||||||
|
ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
.visible .entry mandelbrot_scanline(
|
||||||
|
.param .f32 mandelbrot_scanline_param_0,
|
||||||
|
.param .f32 mandelbrot_scanline_param_1,
|
||||||
|
.param .f32 mandelbrot_scanline_param_2,
|
||||||
|
.param .f32 mandelbrot_scanline_param_3,
|
||||||
|
.param .u32 mandelbrot_scanline_param_4,
|
||||||
|
.param .u32 mandelbrot_scanline_param_5,
|
||||||
|
.param .u32 mandelbrot_scanline_param_6,
|
||||||
|
.param .u32 mandelbrot_scanline_param_7,
|
||||||
|
.param .u32 mandelbrot_scanline_param_8,
|
||||||
|
.param .u64 mandelbrot_scanline_param_9
|
||||||
|
)
|
||||||
|
{
|
||||||
|
.local .align 8 .b8 __local_depot2[8];
|
||||||
|
.reg .b64 %SP;
|
||||||
|
.reg .b64 %SPL;
|
||||||
|
.reg .pred %p<9>;
|
||||||
|
.reg .s32 %r<40>;
|
||||||
|
.reg .f32 %f<20>;
|
||||||
|
.reg .s64 %rd<8>;
|
||||||
|
|
||||||
|
|
||||||
|
mov.u64 %SPL, __local_depot2;
|
||||||
|
cvta.local.u64 %SP, %SPL;
|
||||||
|
ld.param.f32 %f9, [mandelbrot_scanline_param_0];
|
||||||
|
ld.param.f32 %f10, [mandelbrot_scanline_param_1];
|
||||||
|
ld.param.f32 %f11, [mandelbrot_scanline_param_2];
|
||||||
|
ld.param.f32 %f12, [mandelbrot_scanline_param_3];
|
||||||
|
ld.param.u32 %r14, [mandelbrot_scanline_param_4];
|
||||||
|
ld.param.u32 %r17, [mandelbrot_scanline_param_5];
|
||||||
|
ld.param.u32 %r15, [mandelbrot_scanline_param_6];
|
||||||
|
ld.param.u32 %r18, [mandelbrot_scanline_param_7];
|
||||||
|
ld.param.u32 %r16, [mandelbrot_scanline_param_8];
|
||||||
|
ld.param.u64 %rd1, [mandelbrot_scanline_param_9];
|
||||||
|
add.u64 %rd2, %SP, 0;
|
||||||
|
.loc 1 35 1
|
||||||
|
cvta.to.local.u64 %rd3, %rd2;
|
||||||
|
mov.u32 %r19, %tid.x;
|
||||||
|
and.b32 %r20, %r19, 31;
|
||||||
|
mov.u32 %r21, %ntid.x;
|
||||||
|
cvta.global.u64 %rd4, $str;
|
||||||
|
st.local.v2.u32 [%rd3], {%r20, %r21};
|
||||||
|
// Callseq Start 0
|
||||||
|
{
|
||||||
|
.reg .b32 temp_param_reg;
|
||||||
|
.param .b64 param0;
|
||||||
|
st.param.b64 [param0+0], %rd4;
|
||||||
|
.param .b64 param1;
|
||||||
|
st.param.b64 [param1+0], %rd2;
|
||||||
|
.param .b32 retval0;
|
||||||
|
.loc 1 35 1
|
||||||
|
call.uni (retval0),
|
||||||
|
vprintf,
|
||||||
|
(
|
||||||
|
param0,
|
||||||
|
param1
|
||||||
|
);
|
||||||
|
ld.param.b32 %r22, [retval0+0];
|
||||||
|
}
|
||||||
|
// Callseq End 0
|
||||||
|
.loc 1 36 1
|
||||||
|
mov.u32 %r23, %ctaid.x;
|
||||||
|
.loc 1 37 1
|
||||||
|
mad.lo.s32 %r24, %r23, %r15, %r15;
|
||||||
|
.loc 3 2621 10
|
||||||
|
min.s32 %r1, %r24, %r14;
|
||||||
|
.loc 1 39 1
|
||||||
|
mov.u32 %r25, %ctaid.y;
|
||||||
|
mul.lo.s32 %r37, %r25, %r18;
|
||||||
|
.loc 1 40 1
|
||||||
|
add.s32 %r26, %r37, %r18;
|
||||||
|
.loc 3 2621 10
|
||||||
|
min.s32 %r3, %r26, %r17;
|
||||||
|
.loc 1 42 1
|
||||||
|
setp.ge.s32 %p1, %r37, %r3;
|
||||||
|
@%p1 bra BB2_12;
|
||||||
|
|
||||||
|
cvta.to.global.u64 %rd5, %rd1;
|
||||||
|
|
||||||
|
BB2_2:
|
||||||
|
.loc 1 36 1
|
||||||
|
mul.lo.s32 %r38, %r23, %r15;
|
||||||
|
.loc 1 43 1
|
||||||
|
setp.ge.s32 %p2, %r38, %r1;
|
||||||
|
@%p2 bra BB2_11;
|
||||||
|
|
||||||
|
.loc 1 46 1
|
||||||
|
cvt.rn.f32.s32 %f13, %r37;
|
||||||
|
fma.rn.f32 %f1, %f13, %f12, %f11;
|
||||||
|
|
||||||
|
BB2_4:
|
||||||
|
.loc 1 45 1
|
||||||
|
add.s32 %r7, %r20, %r38;
|
||||||
|
cvt.rn.f32.u32 %f14, %r7;
|
||||||
|
fma.rn.f32 %f2, %f14, %f10, %f9;
|
||||||
|
mov.u32 %r39, 0;
|
||||||
|
setp.gt.s32 %p3, %r16, 0;
|
||||||
|
.loc 1 12 1
|
||||||
|
@%p3 bra BB2_5;
|
||||||
|
bra.uni BB2_8;
|
||||||
|
|
||||||
|
BB2_5:
|
||||||
|
mov.f32 %f18, %f1;
|
||||||
|
mov.f32 %f19, %f2;
|
||||||
|
|
||||||
|
BB2_6:
|
||||||
|
.loc 1 13 1
|
||||||
|
mov.f32 %f4, %f19;
|
||||||
|
mov.f32 %f3, %f18;
|
||||||
|
mul.f32 %f5, %f3, %f3;
|
||||||
|
mul.f32 %f6, %f4, %f4;
|
||||||
|
add.f32 %f15, %f6, %f5;
|
||||||
|
setp.gt.f32 %p4, %f15, 0f40800000;
|
||||||
|
@%p4 bra BB2_8;
|
||||||
|
|
||||||
|
.loc 1 16 1
|
||||||
|
sub.f32 %f16, %f6, %f5;
|
||||||
|
.loc 1 17 1
|
||||||
|
add.f32 %f17, %f4, %f4;
|
||||||
|
.loc 1 19 1
|
||||||
|
add.f32 %f7, %f2, %f16;
|
||||||
|
.loc 1 20 1
|
||||||
|
fma.rn.f32 %f8, %f17, %f3, %f1;
|
||||||
|
.loc 1 12 96
|
||||||
|
add.s32 %r39, %r39, 1;
|
||||||
|
.loc 1 12 1
|
||||||
|
setp.lt.s32 %p5, %r39, %r16;
|
||||||
|
mov.f32 %f18, %f8;
|
||||||
|
mov.f32 %f19, %f7;
|
||||||
|
@%p5 bra BB2_6;
|
||||||
|
|
||||||
|
BB2_8:
|
||||||
|
.loc 1 49 1
|
||||||
|
mad.lo.s32 %r34, %r37, %r14, %r38;
|
||||||
|
add.s32 %r11, %r34, %r20;
|
||||||
|
.loc 1 50 1
|
||||||
|
setp.ge.u32 %p6, %r7, %r1;
|
||||||
|
@%p6 bra BB2_10;
|
||||||
|
|
||||||
|
.loc 1 51 1
|
||||||
|
mul.wide.s32 %rd6, %r11, 4;
|
||||||
|
add.s64 %rd7, %rd5, %rd6;
|
||||||
|
st.global.u32 [%rd7], %r39;
|
||||||
|
|
||||||
|
BB2_10:
|
||||||
|
.loc 1 43 57
|
||||||
|
add.s32 %r38, %r38, 32;
|
||||||
|
.loc 1 43 1
|
||||||
|
setp.lt.s32 %p7, %r38, %r1;
|
||||||
|
@%p7 bra BB2_4;
|
||||||
|
|
||||||
|
BB2_11:
|
||||||
|
.loc 1 42 57
|
||||||
|
add.s32 %r37, %r37, 1;
|
||||||
|
.loc 1 42 1
|
||||||
|
setp.lt.s32 %p8, %r37, %r3;
|
||||||
|
@%p8 bra BB2_2;
|
||||||
|
|
||||||
|
BB2_12:
|
||||||
|
.loc 1 53 2
|
||||||
|
ret;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
22
examples_cuda/mandelbrot_tasks3d/mandelbrot_launch.ispc
Normal file
@@ -0,0 +1,22 @@
|
|||||||
|
extern task void
|
||||||
|
mandelbrot_scanline(
|
||||||
|
uniform float x0, uniform float dx,
|
||||||
|
uniform float y0, uniform float dy,
|
||||||
|
uniform int width, uniform int height,
|
||||||
|
uniform int xspan, uniform int yspan,
|
||||||
|
uniform int maxIterations, uniform int output[]);
|
||||||
|
|
||||||
|
export void
|
||||||
|
mandelbrot_ispc(uniform float x0, uniform float y0,
|
||||||
|
uniform float x1, uniform float y1,
|
||||||
|
uniform int width, uniform int height,
|
||||||
|
uniform int maxIterations, uniform int output[]) {
|
||||||
|
uniform float dx = (x1 - x0) / width;
|
||||||
|
uniform float dy = (y1 - y0) / height;
|
||||||
|
const uniform int xspan = 16; /* make sure it is big enough to avoid false-sharing */
|
||||||
|
const uniform int yspan = 16;
|
||||||
|
|
||||||
|
launch [width/xspan, height/yspan]
|
||||||
|
mandelbrot_scanline(x0, dx, y0, dy, width, height, xspan, yspan,
|
||||||
|
maxIterations, output);
|
||||||
|
}
|
||||||
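A note on the launch dimensions in mandelbrot_launch.ispc: width/xspan and height/yspan are integer divisions, so if an image dimension is not a multiple of 16, the trailing partial tile would presumably not be launched. Since mandelbrot_scanline already clamps xend and yend with min(), rounding the tile count up should be safe. The sketch below illustrates the arithmetic in C++; tiles_needed is a hypothetical helper, not something defined in the commit, and whether the example's default image size is a multiple of 16 is not shown here.

```cpp
// Hypothetical helper: number of tiles of size `span` needed to cover `extent`.
// Assumes extent >= 0 and span > 0.
int tiles_needed(int extent, int span) {
    return (extent + span - 1) / span;  // integer ceiling of extent / span
}
```

For a width of 768 both forms give 48 tiles; for a width of 770, plain division still gives 48 while tiles_needed gives 49, which covers the last two columns.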
BIN
examples_cuda/mandelbrot_tasks3d/mandelbrot_task.bc
Normal file
Binary file not shown.
BIN
examples_cuda/mandelbrot_tasks3d/mandelbrot_task.fatbin
Normal file
Binary file not shown.
Some files were not shown because too many files have changed in this diff.