I’m trying to push the limits of interactive scientific visualization, so I gotta start somewhere! Ok, so 20 million particles running at ~40fps is not a bad start, but this is also using the simplest rendering possible (GL_POINTS of size 1, with no alpha blending or lighting).
I’m running my code on Ubuntu 10.04 on a Dell Precision T7500 with a NVIDIA GTX480 (1.5GB GDDR5). The machine has 12GB of RAM and 8 CPU cores, which go almost entirely unused here (aside from initializing the system and running the main loop, of course). Check out the video to see some pretty patterns and a large amount of particles!
A quick note for those waiting for these in Blender: currently I have it working, but only with about 100k particles if I use a Mesh as the emitter (any more and Blender starts choking before I even get into the game engine). Now that I’m getting comfortable with OpenCL/OpenGL, I’m going to start looking at the existing particle interface (as well as the redesign) so I can integrate more tightly into Blender.
Let’s talk some math and look at some numbers, if that doesn’t sound fun it’s ok to stop reading now ;)
So first let me admit that this OpenCL kernel is not doing that much work. I’m solving the Lorenz Attractor ODEs with the RK4 method for each particle. It’s all embarrassingly parallel and there is no interaction between them. Second, I pared down the rendering so it’s doing as little as possible. This lets us get to the memory limit of the GTX480 at 20 million particles with my setup. Let’s see why:
I use float4 arrays (OpenCL vectors of four 32-bit floats) for vertices, colors, generators and velocities. I use one float array to keep track of the life of each particle.
4 (arrays) x 20,000,000 (particles) x 4 (floats) x 4 (bytes) = 1,280,000,000 bytes
1 (array) x 20,000,000 (particles) x 1 (float) x 4 (bytes) = 80,000,000 bytes
So that’s 1,360,000,000 bytes / 1024³ ≈ 1.267GB of memory on the graphics card!
Luckily I’m using OpenGL interop, so the vertex and color arrays are actually VBOs in OpenGL’s context and are modified in place. Since that’s the case we never transfer any of that memory back to the CPU, which would be a serious bottleneck. It’s possible my kernel could be optimized more, but right now memory and rendering are the limiting factors. When I start implementing more interesting physics like collision detection and fluid dynamics this will be a bigger issue. I’m also planning to implement depth sorting using an index array VBO so I can render cool effects with proper alpha blending. This of course will also limit the number of particles possible, with my guess being that rendering will be the biggest culprit (not memory).