Comments on: Adventures in OpenCL Part 3: Constant Memory Structs http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/ casin' the joint since '85 Thu, 03 Mar 2016 20:39:33 +0000

By: Balthazar http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-42884 Fri, 15 May 2015 15:09:48 +0000 http://enja.org/?p=521#comment-42884 I see there is a discussion about passing structs as arguments, and alignment. What works on one computer/compiler might not work on another. I see that in the cl.h file from Khronos there are alignment attributes added to the definitions of types like cl_float. It may be best to just use those types for structs that are passed between the host and the kernel.

By: Max http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-2696 Tue, 24 Sep 2013 11:17:09 +0000 http://enja.org/?p=521#comment-2696 I just wanted to comment on “Emanuel Ey” regarding his way of passing a struct by value into private memory.

First off, the only way the struct will actually end up in private memory is by copying it from global or constant memory. Second, every thread has its own private memory, so the struct will be duplicated for each thread, wasting memory space. Third, a struct with array components that are accessed in a dynamic way (through a pointer, for example) cannot be stored in registers anyway, so it will reside in global memory.
Making things worse, private memory that is spilled to global memory will not be cached because it makes no sense to do so, since each thread has its own copy.

Regarding: “This is ok for me, since ‘constant’ is a special type of ‘global’ and reading it comes at a performance penalty anyway.”
Yes, it is global memory, but there is a special constant cache (usually 8 KB) on each multiprocessor which is as fast as L1 cache.
So storing the struct in constant memory is in fact the most efficient way (those GPU designers have done their homework after all).

Please don’t take my comment the wrong way; I’ve done some crazy “optimizations” myself in the past :)

By: Emanuel Ey http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-1893 Tue, 17 Apr 2012 10:26:16 +0000 http://enja.org/?p=521#comment-1893 So I have been playing around with passing structures to OpenCL kernels as well, and I’ve actually been able to pass structs directly (i.e., without a pointer).
I did this in C, but it should be easy to adapt for C++.

Here are the most relevant points from the host code:

typedef struct{
    cl_uchar cDist;
    cl_uchar cClass;
    float y[N_POINTS_DEPTH];
    float x[N_POINTS_RANGE];
    float c1D[N_POINTS_DEPTH];
}myStruct_t;

The test kernel takes only 2 arguments: the struct as input, and another struct of the same type to hold some test data computed from the input. Here’s the host-side memory allocation:

//instantiate a struct for input:
myStruct_t a;
a.y[0] = 1.4;
a.y[1] = 2.5;

//allocate memory for output:
cl_mem out = clCreateBuffer(context, CL_MEM_READ_WRITE, sizeof(myStruct_t), NULL, &errNum);
if(errNum != CL_SUCCESS){
    fatal(clError(errNum));
}else{
    DEBUG(2, "Successfully allocated memory for output.\n");
}

Then, set the kernel args:

errNum = clSetKernelArg(kernel, 0, sizeof(myStruct_t), &a);
errNum |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &out);
if (errNum != CL_SUCCESS){
    fprintf(stderr, "Error setting kernel arguments.\n");
    fatal(clError(errNum));
    exit(EXIT_FAILURE);
}else{
    DEBUG(2, "Successfully defined kernel arguments.\n");
}

The kernel:

__kernel void testStructs( private myStruct_t soundIn,
                           global myStruct_t *result){
    local myStruct_t ssp;

    ssp.y[0] = 2*soundIn.y[0];
    ssp.y[1] = 2*soundIn.y[1];

    *result = ssp;
}

So from my tests I figured out that apparently you cannot actually use the ‘constant’ qualifier when passing in a struct; it has to be ‘private’. This is ok for me, since ‘constant’ is a special type of ‘global’ and reading it comes at a performance penalty anyway. To me it makes sense to have a settings struct in the fastest available memory.
I tested this on an Nvidia GPU with OpenCL 1.1 and on an Intel Ivy Bridge CPU, also with an OpenCL 1.1 implementation.

By: David Garcia http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-770 Fri, 01 Apr 2011 23:21:56 +0000 http://enja.org/?p=521#comment-770 “As far as passing in a struct not as an array, I have tried many different combinations to no avail.”

I looked at the OpenCL conformance tests and couldn’t find any place that tests that feature. It’s possible that it’s broken in some implementations :(

By: enj http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-767 Fri, 01 Apr 2011 21:38:49 +0000 http://enja.org/?p=521#comment-767 @David
I just did a quick test: if I don’t have padding, it works whether I specify the alignment or not, so I will remove that part from the post until I find a use for it.

By: enj http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-766 Fri, 01 Apr 2011 21:07:56 +0000 http://enja.org/?p=521#comment-766 Hello David,
Thanks for pointing these out. I will revise my post with some of your corrections after making sure I have a grasp on my misunderstandings.
The alignment issue is certainly a case of me not having the whole story. I have been telling the compiler to align the structs in my project’s code to 16 bytes (with the __attribute__((aligned(16))) keyword, or #pragma pack(16) on Windows). I read that, because compilers can arbitrarily add padding, it is a good idea to specify the alignment. I left that out of the tutorial, but perhaps I should put it back after double-checking this. It helps me to try it out for myself.

As far as passing in a struct not as an array, I have tried many different combinations to no avail. If someone can give me an example which works, I will gladly update the tutorial, but we’ve had to play this trick, having found no alternative.

Also, thanks for pointing out the constant args device info; that’s good to know. 9 still seems rather small.

Thanks again
Ian

By: David Garcia http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-761 Fri, 01 Apr 2011 02:20:12 +0000 http://enja.org/?p=521#comment-761 I believe there are some inaccuracies in the article.

“Even though we only want one structure, we still need to pass it in as a buffer to appease OpenCL.”

I don’t think there’s anything in the OpenCL spec preventing you from passing a struct as a kernel argument. For instance, section 5.7.2 lists how to pass different types of arguments to clSetKernelArg. After listing all built-in types and pointers, it says “For all other kernel arguments, the arg_value entry must be a pointer to the actual data to be used as argument value.” Additionally, section 6.8-p reads “Arguments to __kernel functions that are declared to be a struct do not allow OpenCL objects to be passed as elements of the struct”, which implies that it’s okay to pass structs as kernel arguments.

“When interpreting a struct, OpenCL accesses the memory in blocks of 16 bytes, which is the same as 4 floats (each 4 bytes).”

That is not correct either. What OpenCL requires is that each struct member must be naturally aligned. For example, a float variable, since it has a size of 4 bytes, must be aligned to a 4-byte boundary. Your example struct could be defined just fine as

typedef struct Params
{
    float A;
    float B;
    int C;
} Params;

Keep in mind that, in accordance with C99 rules (of which OpenCL C is a derivative), compilers are free to insert padding between struct members and at the end of the struct. It wouldn’t be surprising at all if the particular compiler you are using has decided to pad the struct size to 16 bytes.
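One way to see what padding a given host compiler has actually chosen is to ask it. The following is a minimal sketch in plain C, not code from the article: Params is the struct shown above, while Mixed and the three helper functions are hypothetical names introduced only for illustration. It reports the host compiler's layout only; an OpenCL device compiler may lay the same struct out differently.

```c
#include <stddef.h>  /* offsetof, size_t */

/* The struct from the comment above. */
typedef struct Params {
    float A;
    float B;
    int   C;
} Params;

/* Hypothetical struct that mixes a 1-byte and a 4-byte member,
 * so the compiler usually has to insert padding between them. */
typedef struct Mixed {
    unsigned char tag;
    float         value;
} Mixed;

/* Bytes the compiler added to Params beyond the sum of its members. */
size_t params_padding(void)
{
    return sizeof(Params) - (sizeof(float) + sizeof(float) + sizeof(int));
}

/* Bytes the compiler added to Mixed beyond the sum of its members. */
size_t mixed_padding(void)
{
    return sizeof(Mixed) - (sizeof(unsigned char) + sizeof(float));
}

/* Byte offset at which Mixed.value actually starts. */
size_t mixed_value_offset(void)
{
    return offsetof(Mixed, value);
}
```

On a typical desktop ABI one would expect params_padding() to return 0 and mixed_padding() to return 3, but neither value is guaranteed by the language, so it seems safer to check than to assume.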

If you want to pass structs to kernels, it makes sense to specify alignment attributes, as explained in section 6.10.1, rather than attempting to guess what the particular OpenCL compiler you have installed is doing. Inserting hand-crafted padding members means that your program may not work correctly on a different computer.
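As a rough host-side illustration of stating the alignment explicitly instead of hand-padding, here is a sketch in standard C11. AlignedParams and the two accessor functions are hypothetical names, and the choice of 16 bytes is an assumption taken from the discussion rather than a requirement of OpenCL. The kernel-side definition would still need a matching alignment attribute of its own for the two layouts to agree.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical host-side mirror of Params with an explicit 16-byte alignment.
 * _Alignas on the first member raises the alignment of the whole struct,
 * so no hand-written padding members are needed. */
typedef struct AlignedParams {
    _Alignas(16) float A;
    float B;
    int   C;
} AlignedParams;

/* Fail the build, rather than misread data at run time, if the layout drifts. */
static_assert(_Alignof(AlignedParams) == 16,
              "AlignedParams must be 16-byte aligned");
static_assert(sizeof(AlignedParams) % 16 == 0,
              "size must be a multiple of the alignment");
static_assert(offsetof(AlignedParams, A) == 0,
              "A must be the first member");

/* Size in bytes the host compiler settled on, including tail padding. */
size_t aligned_params_size(void)
{
    return sizeof(AlignedParams);
}

/* Alignment in bytes of the whole struct. */
size_t aligned_params_alignment(void)
{
    return _Alignof(AlignedParams);
}
```

The compile-time checks are the main point of the sketch: if a different compiler or platform produces another layout, the build stops instead of the kernel silently reading the wrong bytes.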

“Additionally, at least on some implementations there seems to be an arbitrary limit of 9 constant (non-buffer) parameters, which using a struct will help you avoid.”

This limit is not arbitrary. It can be queried through clGetDeviceInfo() and CL_DEVICE_MAX_CONSTANT_ARGS.

If you have any other doubts or questions about the OpenCL standard, please refer to the Khronos message boards where people will be happy to help.

By: enj http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-754 Thu, 31 Mar 2011 02:38:18 +0000 http://enja.org/?p=521#comment-754 Hey Bunt, I didn’t switch; this one has the same code in both languages. My main project is still in C++ so I will have to focus on it. I still prefer PyOpenCL for development though, and I’ve been playing in it recently. From now on I will probably be doing my tutorials in both languages.

By: Bunt http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/comment-page-1/#comment-753 Thu, 31 Mar 2011 02:14:42 +0000 http://enja.org/?p=521#comment-753 Hey enja, great tutorials. Is there any reason why you switched from Python back to C++, though? Did you find any particular drawbacks in working with Python and PyOpenCL? Cheers, Bunt, BVI
