About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail
fosstodon.org/@notzed

Wednesday, 05 March 2014, 12:41

opencl javafx mandelbrot thingy

I didn't want to think too much about coming up with something to test my opencl binding so I took the mandelbrot demo code from the parallella opencl forum and ported it to JavaFX and zcl. The aim is to try it on the parallella but i'm having trouble with the opencl environment there, so for now I just ran it on my new workstation.

Because I was too lazy to do a full 'explorer' I just converted it to a julia-set generator and use the mouse location to set the seed point.
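
For reference, the per-pixel work of a julia-set generator boils down to something like the following. This is a plain-Java sketch and not the OpenCL kernel from the forum demo; the class name, the [-2,2] viewport and the pixel-centre sampling are all my own assumptions for illustration. The seed (cx, cy) is what the mouse location would supply.

```java
public final class JuliaSketch {

    /**
     * Counts iterations of z = z*z + c, starting from z = (zx, zy),
     * until |z| exceeds 2 or maxIter is reached.
     */
    public static int iterations(double zx, double zy,
                                 double cx, double cy, int maxIter) {
        int i = 0;
        while (i < maxIter && zx * zx + zy * zy <= 4.0) {
            double t = zx * zx - zy * zy + cx;
            zy = 2.0 * zx * zy + cy;
            zx = t;
            i++;
        }
        return i;
    }

    /**
     * Renders a size x size grid covering [-2,2] x [-2,2] (assumed viewport).
     * (cx, cy) is the seed, e.g. the mouse position mapped into the plane.
     */
    public static int[] render(int size, double cx, double cy, int maxIter) {
        int[] counts = new int[size * size];
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                // Sample at pixel centres.
                double zx = -2.0 + 4.0 * (x + 0.5) / size;
                double zy = -2.0 + 4.0 * (y + 0.5) / size;
                counts[y * size + x] = iterations(zx, zy, cx, cy, maxIter);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = render(8, -0.8, 0.156, 64);
        System.out.println("iterations near the centre: " + counts[4 * 8 + 4]);
    }
}
```

Unlike the mandelbrot set, the seed c is fixed for the whole image and only the starting z varies per pixel, which is why moving the mouse changes the entire picture.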

Using the gpu it easily keeps up with the screen refresh even at 4x this resolution (1024x1024). The julia set escapes much faster than the mandelbrot set so even the cpu driver can keep up quite well, at least until you hit the blobby empty areas. I'm using a kaveri 7850 APU and the above image takes about 100-200µs on the gpu and about 3ms on the cpu driver according to the profiling timestamps.

At first I was a little stumped as to how to hook the rendering into the JavaFX refresh. But I came up with a couple of approaches:

  1. Use a never-ending Transition.

    Each time the interpolate method fires it checks to see if the last rendering has finished and if it has it fires off another rendering (and event) and copies the pixels to the WritableImage.

  2. Use an event callback.

    When the job is queued, add an event callback for when the output event reaches the CL_COMPLETE state. When it fires queue a copy to the WritableImage using Platform.runLater() and at the same time enqueue a new render.

The former hits the sort of rendering issues you get from double-buffering (if it's not fast enough it jumps to half speed) although it loads the system quite lightly and does a lot less mucking about with cross-thread invocation.

I wonder if you can efficiently triple-buffer with JavaFX? I guess render to a non-attached WritableImage from another thread and dump it into the ImageView at the next pulse? Maybe sort of.
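
As a thought experiment, the hand-off part of triple buffering doesn't need anything JavaFX specific. Below is a minimal sketch using plain int[] pixel arrays and an AtomicReference for the middle slot; the class and method names are made up for illustration and I haven't tried this against a real WritableImage. The idea would be that the render thread fills backBuffer() and calls publish(), and the UI side calls acquire() once per pulse (e.g. from an AnimationTimer) and copies frontBuffer() to the image only when it returns true.

```java
import java.util.concurrent.atomic.AtomicReference;

public final class TripleBuffer {

    /** A pixel array plus a flag saying whether it holds an unseen frame. */
    private static final class Slot {
        final int[] pixels;
        final boolean fresh;

        Slot(int[] pixels, boolean fresh) {
            this.pixels = pixels;
            this.fresh = fresh;
        }
    }

    private int[] back;   // touched only by the render thread
    private int[] front;  // touched only by the display thread
    private final AtomicReference<Slot> middle;

    public TripleBuffer(int pixelCount) {
        back = new int[pixelCount];
        front = new int[pixelCount];
        middle = new AtomicReference<>(new Slot(new int[pixelCount], false));
    }

    /** Render thread: the array to draw the next frame into. */
    public int[] backBuffer() {
        return back;
    }

    /** Render thread: hand over the finished frame and take a spare array back. */
    public void publish() {
        back = middle.getAndSet(new Slot(back, true)).pixels;
    }

    /** Display thread: swap in the newest frame, if any. Returns true if front changed. */
    public boolean acquire() {
        if (!middle.get().fresh) {
            return false;
        }
        // Only this thread ever stores a stale slot, so whatever we swap out
        // here is still a fresh frame even if the renderer published again.
        front = middle.getAndSet(new Slot(front, false)).pixels;
        return true;
    }

    /** Display thread: the most recently acquired frame. */
    public int[] frontBuffer() {
        return front;
    }
}
```

Neither side ever blocks: a slow display just skips frames, and a slow renderer just leaves the old frame on screen.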

Of course ... it would be better if the OpenCL just wrote directly to the GL texture, ... but I guess we can't have everything we want right away.

The code is too ugly to publish and it's not like the world needs another one of these demos but if i ever get opencl working on the parallella i'll drop it somewhere.

Software "upgrades"

Yes, scary quotes.

On to a short rant. As part of this post I needed to get a screenshot. No worries I says, fire up the gimp as usual, grab a screenshot, save it for blogger as png. Oh, it won't let me. What the hell? Why would I want to save in xcf unless I wanted to save in xcf? Bizarre.

Seems there's a whole ocean of complaints about this one I've managed to avoid until now because I so rarely re-install my operating system. The whole thing about not wanting "normal" users is pretty fucked up though. Who the hell do they think is going to use it then, Hollywood? And if so - why the fuck are they pandering to those DRM-eye-pee-bullshit bastards in the first place? Free software gone mad. Effectively forked from its mission by its own bully developers and yes-men cronies. Krita looks a bit too hip for my tastes but I suppose it's worth a look (yeah, i'm probably not going to poke at ImageZ again, although, you know, opencl + javafx could be a 'thing' there).

I'm sure i've mentioned this many times but in part i think it's just that nobody wants to 'just maintain' fully mature software and they have to keep poking at it just for the sake of it. And new blood always wants to make its mark - and there are no wiser hands left to stop them. And no doubt they've caught a bit of the software-treadmill disease from M$ or apple since that's what they were brought up with. Not to mention this whole rolling-upgrade-never-ending-changes bullshit that mozilla and google are so fond of. I had reason to go to the australian acma website yesterday (robot caller kept calling back every half hour - which is apparently completely legal ... sigh ... and given it was polling political data for the upcoming state election we have no hope of the pollies doing anything about it) and I was absolutely astounded at how shitty and difficult to use it has become. Obviously a M$ weenie is in the design driver's seat because the whole front page is a huge mess of metro-inspired animating vomit.

It's no wonder i'm so averse to "upgrading" my gnu/linux install. Every time there's some piece or even whole suite of software which is just that little bit more fucked up and harder to use than the previous version. I gotta spend half a day just getting the defaults usable again and it's weeks before everything is sorted (e.g. like this gimp surprise). It even keeps happening to emacs of all things (visual line mode - i mean really?) and gcc (well the C standards) can't seem to agree on what is worth a warning from version to version.

Tagged hacking, java, javafx, opencl, parallella, rants.
Wednesday, 05 March 2014, 04:11

zcl 0.2 released

Details on zcl home page but the main thing is that it now builds on parallella and with OpenCL 1.1 (and presumably other 32-bit platforms).

Doesn't seem to work as per the previous post, but it builds.

Tagged hacking, java, opencl, parallella.
Wednesday, 05 March 2014, 04:08

parallella + java + opencl = not happy jan

So following the previous post I thought i'd try getting zcl running on the parallella. I can't really think of anything to use it for but I thought maybe a javafx frontend to an opencl-something would do for an experiment and I haven't used the machine for months and feel a bit like I should.

I had to fix some 32-bit issues and also discovered some api bits missing but the main work was getting zcl to support an OpenCL 1.1 backend since unfortunately coprthr only supports OpenCL 1.1 at the moment. Rather than pollute the api I kept the Java API the same (i.e. the 1.2 interfaces) and either emulate the newer functions on top of the 1.1 ones where possible (e.g. clCreateImage calls clCreateImage2D/3D as appropriate) or throw an UnsupportedOperationException for new functionality. I fixed some other issues as logged on the downloads page.
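
The shape of that emulation is roughly as follows. In zcl the dispatch lives in the JNI C code; this is just a Java rendering of the idea, and the Backend11 interface, the ImageType enum and the FakeBackend are hypothetical stand-ins rather than anything in the actual binding.

```java
public final class ImageCompat {

    /** Image kinds the 1.2-style front end accepts (names are illustrative). */
    public enum ImageType { IMAGE1D, IMAGE2D, IMAGE3D }

    /** Hypothetical stand-in for the two OpenCL 1.1 native entry points. */
    public interface Backend11 {
        long createImage2D(int width, int height);
        long createImage3D(int width, int height, int depth);
    }

    /** One 1.2-style call, routed to whichever 1.1 call can serve it. */
    public static long createImage(Backend11 cl, ImageType type,
                                   int width, int height, int depth) {
        switch (type) {
            case IMAGE2D:
                return cl.createImage2D(width, height);
            case IMAGE3D:
                return cl.createImage3D(width, height, depth);
            default:
                // No 1.1 equivalent exists, so refuse rather than guess.
                throw new UnsupportedOperationException(type + " requires OpenCL 1.2");
        }
    }

    /** Fake backend for the demo: returns 2 or 3 instead of a real handle. */
    public static final class FakeBackend implements Backend11 {
        @Override
        public long createImage2D(int width, int height) {
            return 2L;
        }

        @Override
        public long createImage3D(int width, int height, int depth) {
            return 3L;
        }
    }

    public static void main(String[] args) {
        Backend11 cl = new FakeBackend();
        System.out.println(createImage(cl, ImageType.IMAGE2D, 64, 64, 1));
        System.out.println(createImage(cl, ImageType.IMAGE3D, 64, 64, 8));
    }
}
```

The caller sees one entry point either way, and only hits the exception when asking for something a 1.1 platform genuinely can't do.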

So I compiled it and linked it to libocl instead of libOpenCL.so (the first surprise), compiled up some pre-requisites for coprthr ... and yeah it just crashes in loadLibrary(). Bummer.

It's most likely that my disk image is out of date; I was just hoping to avoid a whole sysadmin session just to poke at some software. I'm kinda over that shit these days.

Update: See the follow-up post where I got it working. It was just a library versioning issue. The OpenCL stuff is a little 'rough' IMNSHO, but it should still be possible to do real work with it too.

Tagged hacking, java, opencl, parallella.
Tuesday, 04 March 2014, 08:40

zcl 0.1 released

Just wanted to get it out of the way so I did a bit of house-keeping on my isp web page and created a zcl home page. It pretty much explains everything that could be said here.

As part of the house-keeping I created a basic but functional software index page to consolidate the mess of stuff I had there and moved all the download files to a common location.

Update: I went looking for that option that turns on warnings for functions without prototypes ... and found a bug. enqueueReadImage() can't work in 0.1.

Anyway I found something to do: get it going on parallella.

Tagged code, hacking, java, opencl.
Tuesday, 04 March 2014, 04:00

toArray or not to array

So I guess I found some more stuff to play around with. It's a really lovely day here but I know town is going to be packed a bit more than I'd like and I've kinda had enough of going out for the moment. I guess i'll see after lunch on that.

First I thought i'd fix the array getter methods. The OpenCL get methods are built in a way where if you don't know the size of a field you have to call it twice - once to get the size and again to get the content. There didn't seem to be much point exposing this to the Java side of things as I had initially done and I went with just doing that directly in the C code and returning a newly allocated and correctly sized result. My original thoughts were that perhaps buffers could be re-used for multiple gets but in reality it just isn't that useful: for small buffers it doesn't matter and for large ones you need to find out the size anyway and any management overheads (thread-specific etc, and simply having memory sitting around doing nothing) of reusable buffers are going to swamp allocation and GC.
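
The query-twice pattern looks something like this. The real thing is done in C inside the JNI code; this is only a Java illustration, and RawInfo and fixed() are hypothetical stand-ins for a clGet*Info-style call which reports the full size and fills whatever destination it is given.

```java
public final class SizedQuery {

    /**
     * Hypothetical stand-in for a clGet*Info-style call: copies as much as
     * fits into dst (if dst is not null) and returns the full size in bytes.
     */
    public interface RawInfo {
        int query(byte[] dst);
    }

    /** Hides the two-step dance: ask for the size, then fetch the content. */
    public static byte[] getInfo(RawInfo info) {
        int size = info.query(null);      // first call: size only
        byte[] result = new byte[size];   // correctly sized, freshly allocated
        info.query(result);               // second call: content
        return result;
    }

    /** Demo source that always reports the same bytes. */
    public static RawInfo fixed(byte[] data) {
        return dst -> {
            if (dst != null) {
                System.arraycopy(data, 0, dst, 0, Math.min(dst.length, data.length));
            }
            return data.length;
        };
    }

    public static void main(String[] args) {
        byte[] name = getInfo(fixed("Some Device".getBytes()));
        System.out.println(new String(name));
    }
}
```

The caller never sees the size query at all, which is the whole point.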

I also realised I could fix one of the last bastions of the exposed native pointers and change the getInfoP methods, which returned a long, to return an object directly:

 native public <T extends CLObject> T getInfoP(int param, Class<T> klass);

Which is kind of nice. Actually getInfoP() was hidden by type-specific getters but doing it this way (and particularly for the array types) saved even more code in the Java side for a minimal cost on the C side (actually I ended up saving code by reorganising the array getters).
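
The native side has to turn a raw pointer into an instance of whatever class was asked for. A pure-Java equivalent of that step might look like the sketch below; it assumes every wrapper class has a constructor taking the native pointer as a long, and the nested CLObject and CLDevice here are cut-down stand-ins, not the zcl classes.

```java
import java.lang.reflect.Constructor;

public final class InfoSketch {

    /** Cut-down stand-in: every wrapper just holds its native pointer. */
    public static class CLObject {
        public final long p;

        public CLObject(long p) {
            this.p = p;
        }
    }

    /** Cut-down stand-in for one concrete wrapper type. */
    public static class CLDevice extends CLObject {
        public CLDevice(long p) {
            super(p);
        }
    }

    /** Builds a wrapper of the requested class around a native pointer. */
    public static <T extends CLObject> T wrap(long p, Class<T> klass) {
        try {
            Constructor<T> ctor = klass.getDeclaredConstructor(long.class);
            ctor.setAccessible(true); // allow non-public constructors too
            return ctor.newInstance(p);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no (long) constructor on " + klass, e);
        }
    }

    public static void main(String[] args) {
        CLDevice dev = wrap(0x1234L, CLDevice.class);
        System.out.println(dev.getClass().getSimpleName() + " @ " + Long.toHexString(dev.p));
    }
}
```

Passing the Class in is what lets a single generic method serve every object-valued property without a cast at the call site.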

Then I thought about whether I could add native array types to the CLCommandQueue interfaces. e.g.

    native public void enqueueReadBuffer(CLBuffer mem, boolean blocking,
        long mem_offset, long size,
        byte[] buffer, long buf_offset,
        CLEventList wait,
        CLEventList event) throws CLException;

In addition to the interface that uses nio buffers.

The tricky bit is that these can run asynchronously so you can't use the GetPrimitiveArrayCritical() calls and you're basically left with either manually copying them using Get*ArrayRegion() or using Get*ArrayElements() which just seems to copy them on hotspot anyway.

As an experiment I tried the latter. Actually it ends up copying both ways which is a bit of a waste.

When called without blocking I use an event callback to await completion and then release the array back to Java. Strictly speaking I should also do the same for the Buffer versions so that the Buffer doesn't get GC'd while it's running but that's something I think can be left to the programmer to keep track of.

I tried a test program which just did many calls followed by a flush each time and actually performance wasn't too bad relative to the Buffer version. Maybe 10-20% slower (which is ok since accessing arrays is faster and simpler than Buffers in java). But then I tried a silly example of moving the flush outside of the loop. Ok, now it's 4x slower and god knows how much memory it ends up swallowing whilst executing.

So I followed up by trying the Get*ArrayRegion() interface. This is a little bit faster but nothing to write home about.

At this point I think i'll just keep the binding and api smaller and leave it with using a ByteBuffer (sigh, which i still need to fix the endianness of) but i'll save the code for maybe later.
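
On the endianness thing: a ByteBuffer always starts out big-endian no matter what the hardware is, so on the Java side the fix is presumably just a matter of setting the order before handing the buffer out. A small sketch (the helper name is made up):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class NativeOrderSketch {

    /** Allocates a direct buffer that reads and writes in the platform's byte order. */
    public static ByteBuffer allocate(int bytes) {
        return ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        ByteBuffer raw = ByteBuffer.allocateDirect(4);
        // A fresh buffer is big-endian regardless of platform.
        System.out.println("default order: " + raw.order());

        ByteBuffer fixed = allocate(4);
        System.out.println("native order:  " + fixed.order());

        // Views inherit the order the parent had when the view was created.
        fixed.asIntBuffer().put(0, 0x01020304);
        System.out.printf("first byte in memory: %02x%n", fixed.get(0));
    }
}
```

The same order() call applies to a buffer created from C with NewDirectByteBuffer() once it reaches Java, since that also comes back big-endian.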

Actually probably the most surprising thing is just how slow the OpenCL stuff is here. This is only using the CPU driver so there's no weird memory busses to go over (even if this wasn't an apu). It's about 100x slower than copying a ByteBuffer to a byte array the same number of times. I thought it might be because the calls are non-blocking, but making them blocking only makes it worse. I tested the JNI overhead too by simply nooping out the clEnqueueReadBuffer call on the array Region version and that is only about 2x slower than ByteBuffer.get().

Yeah ...

Tagged hacking, java, opencl.
Monday, 03 March 2014, 09:33

ZCL == feature complete

I managed to completely fill out the OpenCL binding i've been playing with: wrote accessors for all the properties which made sense (only the refcount stuff omitted, CLDevice has a fuckload), cleaned up a few bits and pieces, added lots of validity checks, fixed up some portability things, added some (incomplete) javadocs, license headers, a README, and a couple of incomplete hacked-up makefiles which build all 4 outputs (jar, javadoc, jni library, source distribution) in under 5 seconds.

Counting semi-colons it's about 1KLOC of Java and 1K3LOC of C; which seems quite reasonable for 100% api coverage, and KLOC was a big part of the experiment I was conducting. And even then about 400 of those Java lines are just a copy of the defines from cl.h. Kinda lost track of how much time i've spent on it at this point - something like 4 not-quite-full days work.

But now i'm kinda bored with that toy.

Don't really feel like testing enough of it to get it to bee-ta state at this point. And that's the point I guess; there isn't a point to any of it. The end-goal wasn't the interesting bit.

Journey's over, what next?

But I spose ... i'll dump it somewhere when I feel like spitting out a home-page for it.

Tagged hacking, java, opencl.
Sunday, 02 March 2014, 13:15

Native kernels too?

I kept poking from the previous post and ended up getting native kernels going as well. I'm not really sure how useful they are but it's nice to come up with a neat solution.

It took me a while to grok the interface to clEnqueueNativeKernel but it seems to make sense.

This is the result I managed:

  public interface CLNativeKernel {
    public void invoke(Object[] args);
  }

  class CLCommandQueue {
      public native void enqueueNativeKernel(
          CLNativeKernel kernel,
          CLEventList waiters,
          CLEventList events,
          Object... args) throws CLException;
  }

Which leads to a relatively clean usage:

  CLBuffer mem = cl.createBuffer(0, 1024 * 4, null);

  q.enqueueNativeKernel((Object[] args) -> {
      System.out.printf("native kernel invoked %s\n", Thread.currentThread());
      for (Object o : args) {
          System.out.printf(" %s = %s\n", o.getClass().getName(), o);
      }
  }, null, null, mem, 10, mem, 10L);

Produces:

native kernel invoked Thread[Thread-0,5,main]
 java.nio.DirectByteBuffer = java.nio.DirectByteBuffer[pos=0 lim=4096 cap=4096]
 java.lang.Integer = 10
 java.nio.DirectByteBuffer = java.nio.DirectByteBuffer[pos=0 lim=4096 cap=4096]
 java.lang.Long = 10

The tricky bit is getting the memory handled. clEnqueueNativeKernel takes cl_mem arguments as input but then remaps them to physical (virtual) memory pointers when invoking the kernel. The only equivalent of a pointer in Java is a ByteBuffer ... but that also needs a length.

But basically I just copy over the jobject references from the jobject array and change any CLMemory classes to be the cl_mem they point to. In the native kernel hook I then have to remap the provided pointers of any CLMemory instances to direct ByteBuffers, and I obtain the actual memory size using clGetMemObjectInfo(). Because the native kernel hook can only take one set of arguments I fudge it by internally using argument 0 as a structure which contains all the copies of stuff I need and then free it afterwards. It does force the java code to deal with some of the bytebuffer details but the only alternatives I can think of get pretty messy and actually doing lots of processing on memory buffers isn't something you should be doing from any native kernel to start with. They only work on CPU targets (APU?) anyway.

I did hit an issue in that AttachCurrentThread() was attaching to another native thread this time; so I tried using AttachCurrentThreadAsDaemon() instead. That may actually not be a good idea but it depends on whether a given OpenCL implementation is using thread pools or not. I guess?

Anyway, i'm fairly pleased with the result here.

Tagged hacking, java, opencl.
Sunday, 02 March 2014, 10:00

on JNI callback reference handling

After the previous post detailing some issues with callback reference handling I had another look at it this evening.

First for the clBuildProgram() function I just deleted the global reference in the callback. I tried to identify reference leaks using the netbeans memory profiler but it was a little difficult to interpret the results. For starters running the demo routine in a loop didn't result in loop-number reference leaks as one would expect (or even loop-number of reference creations oddly enough; may be related to hotspot and/or it being a static method) ... anyway I think it should work regardless except in the specific case where OpenCL doesn't actually call the notify callback for whatever reason: it is unclear from the specification if it MUST always call it for example. I'm just going to have to assume that if that ever happens the system is in such a state that adding a leak is of no practical importance.

Then I took a look at the clCreateContext() issue which seemed a bit trickier. On a hunch I looked up how weak references work from JNI and at first I didn't see anything useful but whilst poking around a tidy solution became apparent.

All I have to do is save the original reference to any notify function in the CLContext on the Java side. This lets Java handle the reference as it normally would and any notify object should automatically have the same lifetime as the reference to CLContext.

From the (rather badly formatted) JNI document:

Weak Global References

Weak global references are a special kind of global reference. Unlike normal global references, a weak global reference allows the underlying Java object to be garbage collected. Weak global references may be used in any situation where global or local references are used. When the garbage collector runs, it frees the underlying object if the object is only referred to by weak references. A weak global reference pointing to a freed object is functionally equivalent to NULL. Programmers can detect whether a weak global reference points to a freed object by using IsSameObject to compare the weak reference against NULL.

Weak global references in JNI are a simplified version of the Java Weak References, available as part of the Java 2 Platform API ( java.lang.ref package and its classes).

Clarification    (added June 2001)

Since garbage collection may occur while native methods are running, objects referred to by weak global references can be freed at any time. While weak global references can be used where global references are used, it is generally inappropriate to do so, as they may become functionally equivalent to NULL without notice.

While IsSameObject can be used to determine whether a weak global reference refers to a freed object, it does not prevent the object from being freed immediately thereafter. Consequently, programmers may not rely on this check to determine whether a weak global reference may used (as a non- NULL reference) in any future JNI function call.

To overcome this inherent limitation, it is recommended that a standard (strong) local or global reference to the same object be acquired using the JNI functions NewLocalRef or NewGlobalRef , and that this strong reference be used to access the intended object. These functions will return NULL if the object has been freed, and otherwise will return a strong reference (which will prevent the object from being freed). The new reference should be explicitly deleted when immediate access to the object is no longer required, allowing the object to be freed.

So all the native callback function has to do is call NewLocalRef() on the passed in handle, and if that is not-null it is still live and can be called; otherwise it can print some warning and continue on its merry way. The reference can either be saved by creating a different constructor or by adding a wrapper to the native method which does the saving.
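
Since the JNI document says weak global references are a simplified version of java.lang.ref, the same pattern can be shown in plain Java, with WeakReference.get() standing in for NewLocalRef(). This is only an analogy to illustrate the check and not the binding code; WeakNotify and simulateCollection() are invented for the demo, the latter because forcing a real GC on demand isn't reliable.

```java
import java.lang.ref.WeakReference;

public final class WeakNotify {

    private final WeakReference<Runnable> ref;

    public WeakNotify(Runnable notify) {
        // Holds the callback without keeping it alive, like NewWeakGlobalRef().
        this.ref = new WeakReference<>(notify);
    }

    /** Invokes the callback if it is still alive. Returns false if it has gone. */
    public boolean fire() {
        // Promote to a strong reference first, like NewLocalRef(): from here
        // on the object cannot vanish until 'strong' goes out of scope.
        Runnable strong = ref.get();
        if (strong == null) {
            System.err.println("notify called after object death");
            return false;
        }
        strong.run();
        return true;
    }

    /** Demo only: clears the reference the way the collector eventually would. */
    public void simulateCollection() {
        ref.clear();
    }

    public static void main(String[] args) {
        Runnable callback = () -> System.out.println("notified");
        WeakNotify hook = new WeakNotify(callback);
        System.out.println(hook.fire());
        hook.simulateCollection();
        System.out.println(hook.fire());
        callback.run(); // keeps the callback strongly reachable throughout the demo
    }
}
```

Checking for null and then using the weak reference directly would be the mistake the clarification warns about; taking the strong reference and testing that is what makes it safe.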

If I don't find some short-coming in this implementation then this is a nice clean solution without having to try to create my own mirror of either the opencl or java reference trees - which would be very undesirable.

For the buildProgram notify I decided to pass a reference to the actual CLProgram rather than create a new instance, not particularly important but a bit tidier. Other than that it just deletes the references and frees the callback block after invoking the notify interface.

For createContext I went with a new constructor mechanism and it only needed some minor changes in the JNI code.

  public class CLContext extends CLObject {

    final CLContextNotify notify;

    CLContext(long p, CLContextNotify notify) {
      super(p);
      this.notify = notify;
    }
  }

And some changes to the JNI init code:

  -  data = (*env)->NewGlobalRef(env, jnotify);
  +  data = (*env)->NewWeakGlobalRef(env, jnotify);

And JNI callback code:

  +  jnotify = (*env)->NewLocalRef(env, jnotify);
  +  if (!jnotify) {
  +    fprintf(stderr, "cl_context notify called after object death\n");
  +    return;
  +  }

I'm still not sure how i'm going to manage native kernels yet, hopefully it is like CLBuild and just runs once per invocation.

I guess over the next few hacking sessions i'll fill it out a bit and look at dumping the source somewhere. I'm not sure if i'm even going to use it for anything or just use it as a learning exercise.

Tagged code, hacking, java, opencl.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!