Today we journey as pythonauts into the world of particle systems and fast 3D graphics, manipulating thousands upon thousands of little dots in mere milliseconds! We accomplish this with two Python modules, PyOpenCL and PyOpenGL. This is a port of my C++ OpenCL tutorial, but with much less code and all the splendors of numpy.
What you’ll need:
- numpy (at least version 1.4)
- PyOpenCL(for now, Mac users will need to build from git)
- the code
- A graphics card and driver that support OpenCL
Let’s get started!
First lets take a look at the files we will be dissecting
- important files main.py part2.py part2.cl initialize.py - support files glutil.py vector.py timing.py
You can go ahead and run python main.py to see the code in action. The way our particle system works is that we have some collection of particles, really just 3D points in space, each with it’s own lifetime. Every time we loop we want to update the positions of our particles based on some set of rules (e.g. gravity, initial velocity) and decrease their lifetime a little. When the lifetime of a particle reaches 0 we set it back to its original position and reset its lifetime. So we want to structure our code so that we initialize the positions of the particles first, then every frame we update and render them.
main.py is responsible for setting up the OpenGL environment using PyOpenGL and GLUT, providing mouse and keyboard interaction as well as a run loop for our program. So now we have some 3D space we can look around by clicking and dragging (left and right mouse buttons) or hit ‘q’ to quit and ‘t’ to print out timing of the update function.
Once we have our environment we need to setup some particles, for this look in initialize.py. The first function you will see is function_np which creates several numpy arrays and sets their values using numpy slice operators. If that’s a little confusing, it may be easier to follow the fountain_loopy function which does the same thing but with a (slower) for loop. Essentially we are randomly placing the particles on a flat donut in the x, y plane. They each start with an initial velocity in the same direction as their position and a random lifetime. We also make an RGBA color array so that each particle can have its own color.
Notice the fountain function at the bottom, which just calls the fountain_np function and creates Vertex Buffer Objects (VBO) out of the numpy arrays. Presumably when you go on to make your own OpenCL/OpenGL code you will start with your own numpy arrays and this is a key point for setting up your memory on the GPU. If you want OpenCL to be able to act on OpenGL memory (without making copies) you will need to prepare VBOs (or render buffer objects with clImage which should be the subject of a later tutorial) before you go to OpenCL. Luckily its dead simple with PyOpenGL
from OpenGL.arrays import vbo pos_vbo = vbo.VBO(data=pos, usage=GL_DYNAMIC_DRAW, target=GL_ARRAY_BUFFER) pos_vbo.bind()
Now lets look in main.py to see how the initialize functions are called and the results are passed into OpenCL.
#set up initial conditions (pos_vbo, col_vbo, vel) = initialize.fountain(num) #create our OpenCL instance self.cle = part2.Part2(num, dt) self.cle.loadData(pos_vbo, col_vbo, vel)
Here num and dt are defined at the top of main.py, which are user parameters for how many particles to make and essentially how much simulation time passes each frame. Now we get to the OpenCL, we’ve made an instance of a Part2 class. Then we call loadData with the vbos and velocity array.
Interfacing with OpenCL
Let’s take a look at what the constructor and loadData are doing in part2.py
self.clinit() self.loadProgram("part2.cl"); self.num = num self.dt = numpy.float32(dt)
Notice how we explicitly “cast” dt to a numpy float, this is required to pass variables (non-buffers) to a PyOpenCL kernel.
Now let’s take a minute to look at what clinit is doing, for it is responsible for setting up our CL Context using the existing GL context.
plats = cl.get_platforms() from pyopencl.tools import get_gl_sharing_context_properties import sys if sys.platform == "darwin": self.ctx = cl.Context(properties=get_gl_sharing_context_properties(), devices=) else: self.ctx = cl.Context(properties=[ (cl.context_properties.PLATFORM, plats)] + get_gl_sharing_context_properties(), devices=None) self.queue = cl.CommandQueue(self.ctx)
First we get a list of available OpenCL platforms on the machine and then import a handy function provided by PyOpenCL which abstracts the messy business[pdf] of setting up the right properties for sharing a GL context. Due to the eccentric whims of Apple (hey, they pretty much came up with OpenCL) the way you create the context on Mac OS X is slightly different, hence the if statement checking for “darwin”. After that its back to business as usual by creating a CommandQueue from the context. If you are using PyOpenCL .92 or beta2011 you will need to replace the contents of the clinit function with this code.
So loadProgram is the same as in Part 1 where we simply read in the file to instantiate and build a program object. So lets skip that and get straight into loading the data:
def loadData(self, pos_vbo, col_vbo, vel): import pyopencl as cl mf = cl.mem_flags #... cut out the saving of variables to self ... #Setup vertex buffer objects and share them with OpenCL as GLBuffers self.pos_vbo.bind() self.pos_cl = cl.GLBuffer(self.ctx, mf.READ_WRITE, int(self.pos_vbo.buffers)) self.col_vbo.bind() self.col_cl = cl.GLBuffer(self.ctx, mf.READ_WRITE, int(self.col_vbo.buffers)) #pure OpenCL arrays self.vel_cl = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vel) self.pos_gen_cl = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.pos) self.vel_gen_cl = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=self.vel) self.queue.finish() # set up the list of GL objects to share with opencl self.gl_objects = [self.pos_cl, self.col_cl]
The key here is on line 33 in creating OpenCL buffers from the vbo objects using the cl.GLBuffer class. Notice how we do not pass in the vbo object itself, but the actual integer value of the buffer as it is represented by OpenGL. We create normal OpenCL buffers as we did in Part 1 simply by passing in numpy arrays (♥). The other significant change is defining the gl_objects list which we will use in execute!
def execute(self, sub_intervals): cl.enqueue_acquire_gl_objects(self.queue, self.gl_objects) global_size = (self.num,) local_size = None kernelargs = (self.pos_cl, self.col_cl, self.vel_cl, self.pos_gen_cl, self.vel_gen_cl, self.dt) for i in xrange(0, sub_intervals): self.program.part2(self.queue, global_size, local_size, *(kernelargs)) cl.enqueue_release_gl_objects(self.queue, self.gl_objects) self.queue.finish()
The first thing to point out is that we are acquiring the gl_objects before we pass them in as arguments to the kernel. This makes sure that OpenGL is not using the buffers for anything and allows us to read from and write to them. We are setting our global workgroup size to the length of our arrays, essentially saying that each thread global workitem will be one element of our original array.
We also introduce sub intervals, allowing us to perform a variable number of updates per frame. This means one could make dt smaller, generally making the simulation more accurate but not slow down the desired motion of the particles. For example if you decrease dt from .01 to .001 you will be making the particles move 10 times slower every iteration, so to keep the visual speed the same you would do 10 sub iterations.
So after all that setup we are ready to run our kernel, found in part2.cl.
This is also the exact same kernel as used in my C++ version.
So there you have it, the essentials of interoperating between OpenCL and OpenGL in Python! I didn’t cover what I did in my utility files but those are subjects of later posts. You should be able to poke around to see whats going on, and as far as rendering I’m using very basic GL calls for VBOs which there are other tutorials for.
I’d like to shout out to Keith Brafford for helping test and refactor this code on Windows as well as the PyOpenCL patch I worked on to get GL interop working on the Mac. Of course this tutorial wouldn’t be possible without the valiant efforts of Andreas Klöckner!