More on project panama
So i've continues to work on code utilising project panama/jdk.foreign to bind directly to C apis. Most of the work has gone into one of the vulkan bindings in panamaz - i started a 3rd one that creates the api directly from registry but utilises templates heavily to be a bit less hairy than the first attempt. It's still grown to be pretty hairy but at least it runs fast and generates a good quality api. I might split it out into it's own project soon as netbeans is getting a bit unwildy opening panamaz with so source files to work with.
Now i'm going through the vulkan tutorial and using tha to refine the api a bit further. I'm working towards using signed distance fields for digital art of some sort. What? Who knows, i'm lacking inspiration.
Also my heart isn't really in it at the moment so it's a pretty slow burn. I've stopped tracking the development version and i'm just using openjdk 18 - the primary reason is that netbeans wont open projects properly when i use my development builds so i've kinda given up. The api has moved on quite a bit from jdk 18 so it will be quite a bit of change when it comes to update unfortunately. I'd been using emacs to get around this but without code completion Java isn't a lot of fun.
notzed.nativez, java.make
I've been working on an update java.make which removes the JNI bits and replaces whem with jdk.foriegn bits via an updated notzed.nativez (jdk-foreign branch). I had made a couple of mistakes: using .PHONY targets as dependencies just doesn't work, and trying to hook everyting to the 'classes remade' sentinal wasn't sufficient granularity. I've added a few more sentinal markers which can be used as dependency targets and this has greatly simplified the dependency setup and allows builds to properly parallelise.
Long term i'll probably update jjmpeg and zcl to use these new mechanisms but for now they are not a priority.
I have another background project going to try to optimsie the build process. I have almost all the code done in both c and java to handle accurate incremental builds, e.g. it can find out what needs to be remade, what needs to be deleted and so on, run a compiler server, and generate makefile dependency rules, but for now i'm using javac with '-m'.
netbeans
Along the way I ran into some performance issues with Netbeans and java code completion. As part of the code completion it runs through a routine which filters out the visible symbols based on Java's scoping rules, but the routine uses an N^2 algorithm, and to make matters worse the inner loop does a string comparision based on output of CharSequence:toString() - which is very expensive, genereates a ton of garbage, and by far the majority of the time expense. Even the somewhat modest 3000 constants from vulkan will slow the routine down to around 500ms on a ryzen cpu. Oddly there was alreayd a solution to this coded up but for some reason it was never hooked up to all uses of the visibility check. I reported it with some potential fixes and then found out jira is no longer the bugtracker for netbeans. Sigh, it sucked but at least it wasn't github.
Netbeans code completion seems more broken than it used to be as well since they switched away from the gpl licensed nb-javac. It wont really complete properly until you've added the import and syntax errors (you know, the ones that exist as you're writing new code because it's decided to repase it mid-phrase) will break all sorts of things. Now i've got a codebase I can edit i might see if i can at least build a local copy without all that tooltip bullshit that keeps getting in the way of what you're doing. Be nice to fix the slow performance of the output window too - there's no reason it should be so miserably slow.
jdk.vector
A few weeks ago I also played a bit with the vector api which allows one to access SIMD instruction sets natively in java. It's a bit better than I thought it was when I first looked at it (it seemed very clumsy), but it's also less abstract than I had hoped. It's more or less a mapping to AVX intrinsics - you need to specify the vector width and although you have abstracted intrinsics you must code to the platform the same way you need to with intrinsics otherwise the performans tanks. Also suffers from the general problem of compilers over-usign registers so it's easy to get spills if you try to unroll anything.
I was mostly poking around with graphics related maths functions like matrices, one routine was a 4x4 matrix inversion using gauss-jordan elimination. I tried to unroll the inner loops entirely using SIMD selection logic to implement the row sorting but it didn't create much speed-up. Oddly I tried a branchy bit of code that generated some very odd hotspot output that executed microbenchmarks quite a bit faster - it seems to branch off to a sort of exception point, re-arrange the registers via memory swaps and then re-enter the main loop through some hotspot function. Kinda hard to follow what was going on with the assembler output. I might post about this later if i revisit it.