About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail >
fosstodon.org/@notzed >

Sunday, 01 August 2010, 14:27

Hungover, wild storms, ...

... so what better to do than spend the whole weekend hacking on code.

So since last time I have kept poking at my 'graphics editing' programme, and a few things are finally starting to come together. I think enough even to give it a name, for which I decided ImageZ worked well.

Say Hello, ImageZ

I had another try at the Java2D compositing system, but this time I wrote my own alpha compositor designed just for the float format i'm using. Much much faster, still slower than the custom code but it might be an option although I have other plans going forward. I also fixed my alpha blending code - I kept finding the tool-layer blend darkened when I applied it to the target layer. Not pre-multiplying alpha properly, and not applying alpha properly. Now it works nicely.
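As a rough illustration of the pre-multiplied blend described above, here is a minimal sketch of a 'source over' operation on float pixels. It assumes each pixel is four floats (R, G, B, A) in the 0..1 range; the class and method names are hypothetical and this is not the ImageZ code itself.

```java
/** Hypothetical helper: source-over blending on float RGBA pixels. */
final class FloatBlend {
    /** Convert one non-premultiplied pixel at offset off to pre-multiplied form, in place. */
    static void premultiply(float[] px, int off) {
        float a = px[off + 3];
        px[off] *= a;
        px[off + 1] *= a;
        px[off + 2] *= a;
    }

    /**
     * Blend a pre-multiplied source pixel over a pre-multiplied destination
     * pixel, writing the result into dst. The opacity argument scales the
     * whole source, e.g. for a layer opacity slider.
     */
    static void srcOver(float[] src, int so, float[] dst, int doff, float opacity) {
        float sa = src[so + 3] * opacity;   // effective source alpha
        float k = 1f - sa;                  // how much of the destination survives
        for (int c = 0; c < 3; c++)
            dst[doff + c] = src[so + c] * opacity + dst[doff + c] * k;
        dst[doff + 3] = sa + dst[doff + 3] * k;
    }
}
```

Skipping the premultiply step, or applying alpha to the colour channels twice, is the sort of mistake that makes a blended layer come out darker than expected.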

I got the paint applicator tool working - it just steps along the drawing line and puts a dab of paint every time it's travelled a certain distance. This allowed me to very easily write a 'texta' tool and a 'fuzzy brush' tool in half a dozen lines of code each - just by changing the Paint applied to the dot, one just has a radial texture. Since the paint itself is applied using a Shape, it can be anything so this covers a fair whack of the drawing end of things. And it won't take much to add pen jitter, implement an airbrush, or add bitmap brushes.
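The 'dab every so often' stepping could look something like the sketch below. It is only an assumption about how such an applicator might be structured: the DabStepper class, its spacing parameter and the choice to always dab at the stroke start are all made up for illustration.

```java
import java.awt.geom.Point2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical dab placer: emits a point every `spacing` units of travel. */
final class DabStepper {
    private final double spacing;
    private double lastX, lastY;
    private double carried;     // distance travelled since the last dab
    private boolean started;

    DabStepper(double spacing) { this.spacing = spacing; }

    /** Feed the next pointer position; returns the dab centres to paint. */
    List<Point2D> moveTo(double x, double y) {
        List<Point2D> dabs = new ArrayList<>();
        if (!started) {
            started = true;
            lastX = x;
            lastY = y;
            dabs.add(new Point2D.Double(x, y));   // first dab at the stroke start
            return dabs;
        }
        double dx = x - lastX, dy = y - lastY;
        double len = Math.hypot(dx, dy);
        if (len == 0) return dabs;
        double pos = spacing - carried;           // distance along this segment to the next dab
        while (pos <= len) {
            dabs.add(new Point2D.Double(lastX + dx * pos / len, lastY + dy * pos / len));
            pos += spacing;
        }
        carried = len - (pos - spacing);          // leftover travel past the last dab
        lastX = x;
        lastY = y;
        return dabs;
    }
}
```

Each returned point would then be handed to whatever fills the brush Shape with the chosen Paint, so the tool itself stays a few lines long.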

I have the backend but not the front end for the selection tool. It can select arbitrarily shaped regions with all the boolean operators applied to each sub-selection (including exclusive or), along with feathering. I just used the Area class to implement most of the logic, and then spent way too much time trying to get a gaussian blur running properly. I'm not sure how i'll go about displaying the selection. The gimp draws lines around individual pixels which is sort of interesting, although it doesn't really work well at showing anti-aliasing or feathering. I can use the path object to draw lines for the square, ellipse and 'free select' tools, but that won't work for select-by-colour and so on.

Selection contents - union of two rectangular regions with feathering.
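For the boolean part of the selection logic, java.awt.geom.Area already provides add, subtract, intersect and exclusiveOr. A minimal sketch of a selection model built on it follows; the SelectionSketch class and its Op enum are hypothetical, and feathering (the blur step) is left out.

```java
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.geom.Area;

/** Hypothetical selection model built on java.awt.geom.Area. */
final class SelectionSketch {
    enum Op { REPLACE, ADD, SUBTRACT, INTERSECT, XOR }

    private Area area = new Area();   // starts empty

    /** Combine a new sub-selection shape with the current selection. */
    void apply(Shape s, Op op) {
        Area a = new Area(s);
        switch (op) {
            case REPLACE:   area = a; break;
            case ADD:       area.add(a); break;
            case SUBTRACT:  area.subtract(a); break;
            case INTERSECT: area.intersect(a); break;
            case XOR:       area.exclusiveOr(a); break;
        }
    }

    boolean contains(double x, double y) { return area.contains(x, y); }

    public static void main(String[] args) {
        SelectionSketch sel = new SelectionSketch();
        sel.apply(new Rectangle(0, 0, 10, 10), Op.ADD);
        sel.apply(new Rectangle(5, 5, 10, 10), Op.XOR);
        System.out.println(sel.contains(7, 7));   // inside the overlap of the two rectangles
    }
}
```

Because Area accepts any Shape, the same code covers rectangles, ellipses and free-hand paths; a select-by-colour tool would presumably need a pixel mask instead.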

I had a look at saving files in a way which preserves the layers. I was just going to write a zip file with the layers stored in separate 'standard' file formats. Unfortunately I couldn't get anything to save out the float images (i'm not sure what the TIFF saver expects or if I need to get JAI for that), so i'm not sure what to do there. There are quite a few Java image libraries so I probably just have to look around. I had a go at writing an OpenEXR loader since that is a pretty simple format that supports floats. The file format is nice and easy to parse and after a few hours I had something which parsed pretty much a whole file. But unfortunately I couldn't work out the format of the line chunks - I was getting something out but the stride was wrong - the image was offset and squashed and stretched and nothing I tried worked. Not sure the C++ code I grabbed for doing the half float conversion translated to Java 100% correctly either (should be, assuming the Float functions are taking the same format for ints). Since I wasn't making much progress and I was getting really tired I thought I'd better move onto something easier before it got me too bogged down. So no saving for now.
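On the half float question: Float.intBitsToFloat takes the standard IEEE 754 single-precision bit layout, so a conversion can be written with plain bit shuffling. The sketch below is not the C++ code mentioned above, just one straightforward way to do it, with a hypothetical HalfFloat class name; the 16-bit value is passed in the low bits of an int.

```java
/** Hypothetical helper: IEEE 754 half-precision (16-bit) to float conversion. */
final class HalfFloat {
    static float toFloat(int h) {
        int sign = (h & 0x8000) << 16;      // move sign to bit 31
        int exp = (h >> 10) & 0x1f;         // 5-bit exponent, bias 15
        int man = h & 0x3ff;                // 10-bit mantissa

        if (exp == 0) {
            if (man == 0)
                return Float.intBitsToFloat(sign);      // signed zero
            // Subnormal half: shift the mantissa up until the implicit bit appears.
            int e = 127 - 15 + 1;
            while ((man & 0x400) == 0) {
                man <<= 1;
                e--;
            }
            man &= 0x3ff;
            return Float.intBitsToFloat(sign | (e << 23) | (man << 13));
        }
        if (exp == 31)                                  // infinity or NaN
            return Float.intBitsToFloat(sign | 0x7f800000 | (man << 13));
        // Normal number: re-bias the exponent from 15 to 127.
        return Float.intBitsToFloat(sign | ((exp + 112) << 23) | (man << 13));
    }
}
```

Checking a handful of known bit patterns (0x3C00 for 1.0, 0xC000 for -2.0) is a quick way to rule the conversion out when the decoded image still looks wrong.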

I also had a go at adding a frequency convolution mechanism. The speed is ok - visually within 1-2x the speed of the Gimp for a gaussian blur and I think it's using 2 threads for the FFT most of the time (about 0.5s for an RGBA image about 800x600). But with big blur factors or big motion blur you get the edges bleeding in (although it still runs at the same speed), so I need to pad the data first (extend pixels I guess?). The mathematical neatness of it is nice though and it allows for some interesting things that can't be done with a spatial-domain convolution kernel.
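The edge bleed comes from FFT convolution being circular: the kernel wraps around and pulls in pixels from the opposite side. One common fix, and the 'extend pixels' idea guessed at above, is to pad the image by at least the kernel radius with copies of the edge pixels before transforming, then crop afterwards. A minimal sketch for a single-channel float plane, with a hypothetical EdgePad helper:

```java
/** Hypothetical helper: pad a single-channel float image by replicating its edge pixels. */
final class EdgePad {
    /**
     * Returns a new (w + 2*pad) x (h + 2*pad) image whose border is filled
     * with the nearest edge pixel of the source.
     */
    static float[] padReplicate(float[] src, int w, int h, int pad) {
        int pw = w + 2 * pad, ph = h + 2 * pad;
        float[] dst = new float[pw * ph];
        for (int y = 0; y < ph; y++) {
            int sy = Math.min(Math.max(y - pad, 0), h - 1);      // clamp row into the source
            for (int x = 0; x < pw; x++) {
                int sx = Math.min(Math.max(x - pad, 0), w - 1);  // clamp column into the source
                dst[y * pw + x] = src[sy * w + sx];
            }
        }
        return dst;
    }
}
```

The padded size would usually also be rounded up to whatever sizes the FFT implementation handles efficiently, which is an extra step not shown here.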

Then today I got really side-tracked. I really really hate having menus attached to every window. Just such a huge waste of space, ugly and so on - and the 'animated slide to hide' crap just shits me off no end since it just gets in the way. So I had a go at trying to work out how to display AmigaOS style menus for Swing applications. They're hidden till you hit the menu button, then they work like any other - but it allows that part of the screen to be used for other things. After a very long journey of dead-ends I finally have something that works remarkably well. I had to add a mouse-listener to the glass pane of every window and I have to manually track and route mouse events. Basically I created a simple sub-class of JMenuBar that uses a PopupFactory to present it as a popup menu instead and close it once the selection has been made. I can position that anywhere on the screen - e.g. at the top a-la-AmigaOS, although for now i'm sticking to putting it on the top-left of the window because top-left on a dual-screen display is a bit of distance away. It is no doubt rather hacky and almost certainly not portable but it's still bloody fucking cool.

So that's where the menu went!

I even hacked up the JFileChooser so that it opens in 'details mode' and a lot taller (why do they open it up so unusably small by default?). For that I have to walk the widget tree and then programmatically 'press' the details button. This makes it look basically the same as the ASL file requester (AmigaOS again) although unfortunately it isn't nearly as nice to use - the ASL one let you navigate easily from the keyboard without having to tab around to every single gadget (e.g. key the up or down cursor whilst the filename gadget has focus and it moves the selected item in the file list whilst dropping it into the file entry for editing, hit return on a drawer and it opens it rather than giving your application a `file' it can't use). It also ran asynchronously much better (the GNOME one is getting dreadfully slow).

I had a bit of a time trying to work out how to get the image window to open the right size. revalidate() is the key here. Although on big images they're opening bigger than the screen now :( And no matter what 'setMaximumSize()'s I used it makes no difference. Known bug.

I got zoom working. A lot easier than I thought it would be in the end. I just had to add an AffineTransform to the drawImage() call, do a little bit of scaling so the flattened image updates properly based on paint events and vice versa, and finally scale the mouse events for the tools. Then it just worked. Simple. Really fast too - basically instantaneous - since the only thing scaled is the backing image on its way to the screen. I have it hooked up to the keys 1-8 for now although I don't have it centring nicely when you zoom yet.

2x Zoom, with the two paint tools so far and different opacity settings.
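The zoom plumbing boils down to one transform used in two directions: forwards when drawing the image, inverted when mapping mouse events back to image pixels. A minimal sketch, assuming a pure scale with no scrolling offset; the ZoomMapper class is hypothetical.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.NoninvertibleTransformException;
import java.awt.geom.Point2D;

/** Hypothetical helper tying a zoom factor to drawing and mouse mapping. */
final class ZoomMapper {
    private final AffineTransform toScreen;

    ZoomMapper(double zoom) {
        toScreen = AffineTransform.getScaleInstance(zoom, zoom);
    }

    /** Transform to pass to Graphics2D.drawImage(image, transform, observer). */
    AffineTransform transform() {
        return new AffineTransform(toScreen);
    }

    /** Map a mouse position in component coordinates back to image coordinates. */
    Point2D toImage(double sx, double sy) {
        try {
            return toScreen.inverseTransform(new Point2D.Double(sx, sy), null);
        } catch (NoninvertibleTransformException e) {
            throw new IllegalStateException("zoom must be non-zero", e);
        }
    }
}
```

Centring on the cursor when zooming would mean adding a translation to the same transform, and the inverse mapping keeps working unchanged.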

I have a small toolbox and was fighting with Netbeans earlier in the night trying to get the widgets laid out nicely. Not entirely successful there. There's some painful stuff when you try getting any of the various layouts like GridBagLayout to size to their content, and Netbeans doesn't let you set glue in a BoxLayout. I will probably just resort to hand coding the widgets, I guess there isn't really that much that needs doing anyway. I don't have it all hooked up, and I only have 'normal' blending mode in that menu, but I do have enough backend to implement the options shown.

Yes, the toolbox has a menu too (hidden).

I'm quite pleased with the progress so far - I have had some extremely late nights so it's sucked up quite a few hours (and i've been doing plenty of hours for work too; I work to forget). I'm not that happy with the way the layer window is implemented and how the tools are interacting with the image. I probably need a 'tool layer' as part of the image somewhere and not as part of the tools. And likewise the 'toolbox is the state' isn't really clean - although that's more a matter of re-factoring into some other static state class.

I'm also pretty pleased with the performance, considering I haven't exactly done much in the way of optimisation and everything apart from the FFT is only using one thread. I'm throwing float images around like nobody's business and apart from using a keg of memory it's all nice and snappy (it's not really fair to compare a full app to a tech demo but certain things like the image window are noticeably snappier when magnified, probably because it's not bothering with tiles and Java does a bunch of multi-threading behind the scenes and doesn't need to deal with event polling all the time either). That's a little surprising since i'm using a non-standard format which is at least an order of magnitude slower for Java2D than a standard one. Part of the reason for playing with this was to have something I could accelerate using OpenCL, but at least right now it seems barely necessary (until I get some complex filters going). I'm kind of in two minds now - whether I just take out the float stuff and see how well it can go when using a supported backend format, or whether I look at moving most of the pipeline to OpenCL for the fun of it (or OpenGL I suppose, but that misses the point of what i'm trying to do). I wasn't originally going to support different data formats, but perhaps if I think about it a bit more I can find a way to experiment with multiple pipelines without adding a whole pile of support framework. I will keep most of the 'tool layer' or structured graphics layers (text, etc) CPU side regardless so it might be fairly easy to do.

Got a work deadline in under two weeks, so I might be a bit busy for a little while :-(

Tagged imagez, java.
Thursday, 22 July 2010, 14:12

Java2D and Float Images

So much for a day off, well I didn't get too wrapped up in work nor too wrapped in coding but I did dabble a bit. It was one of those crappy cold days - no wind, just no sun and a seeping cold that gets into your bones and turns your toes numb and fingers stiff.

I did finally work out one thing, or maybe re-worked it out: how to create Java BufferedImages backed by a floating point buffer (and i'll get the details out of the way first).

{
        // Needs java.awt.color.ColorSpace and java.awt.image.*
        int width = 1024, height = 768;
        // 4 floats per pixel: R, G, B, A
        float [] data = new float[width * height * 4];

        ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        // hasAlpha = true, isAlphaPremultiplied = false
        ColorModel cm = new ComponentColorModel(cs, true, false,
          ColorModel.TRANSLUCENT, DataBuffer.TYPE_FLOAT);
        // pixel stride 4, scanline stride width * 4, band offsets R, G, B, A
        SampleModel sm = new ComponentSampleModel(DataBuffer.TYPE_FLOAT, width,
          height, 4, width * 4, new int[]{0, 1, 2, 3});
        DataBufferFloat db = new DataBufferFloat(data, data.length);
        WritableRaster wr = Raster.createWritableRaster(sm, db, null);
        BufferedImage bimage = new BufferedImage(cm, wr, false, null);
}

Each pixel is then stored in the array data[] in the order R, G, B, A. With the backing array, image ops can work directly on the float data (FFT convolution anyone?), or you can get a Graphics2D from the bimage and work with that.
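To make the layout concrete, the sketch below wraps the same construction in a small class and adds the index arithmetic for reaching a given channel of a given pixel. The FloatImage class and its index() method are hypothetical names; the point is only that a write to data[] is visible through the BufferedImage's raster, since both share the one array.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.image.BufferedImage;
import java.awt.image.ColorModel;
import java.awt.image.ComponentColorModel;
import java.awt.image.ComponentSampleModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferFloat;
import java.awt.image.Raster;
import java.awt.image.SampleModel;
import java.awt.image.WritableRaster;

/** Hypothetical wrapper: a float[] and a BufferedImage sharing the same storage. */
final class FloatImage {
    final int width, height;
    final float[] data;          // R, G, B, A interleaved, row by row
    final BufferedImage image;

    FloatImage(int width, int height) {
        this.width = width;
        this.height = height;
        data = new float[width * height * 4];
        ColorSpace cs = ColorSpace.getInstance(ColorSpace.CS_sRGB);
        ColorModel cm = new ComponentColorModel(cs, true, false,
                Transparency.TRANSLUCENT, DataBuffer.TYPE_FLOAT);
        SampleModel sm = new ComponentSampleModel(DataBuffer.TYPE_FLOAT, width,
                height, 4, width * 4, new int[]{0, 1, 2, 3});
        WritableRaster wr = Raster.createWritableRaster(sm,
                new DataBufferFloat(data, data.length), null);
        image = new BufferedImage(cm, wr, false, null);
    }

    /** Array index of channel c (0=R, 1=G, 2=B, 3=A) of pixel (x, y). */
    int index(int x, int y, int c) {
        return (y * width + x) * 4 + c;
    }
}
```

For example, after `img.data[img.index(2, 1, 1)] = 0.5f`, calling `img.image.getRaster().getSampleFloat(2, 1, 1)` reads back the same green sample.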

I was poking at a layered image system, using floating point buffers in RGBA to store everything. I load the image, convert it to the buffers, and run a really crap, really simple multi-pass compositing system to blend them into the display. So I have an image I can scroll around and set the alpha of, but what about drawing?

I recall trying to get float backed buffers working before and not having much luck, so I was going to look at using java2d to write to a byte or short buffer and then just converting that over (good enough for what i want). But I did finally work out the float buffers so I don't need to do that - and that's despite the documentation saying that 'TYPE_FLOAT' is just a placeholder. Actually it's even better since I can just attach a BufferedImage to any arbitrary layer's float array and then use the nice Java2D API to write to it directly - there goes most of a 'drawing application'. It only needs a little bit of damaged-area tracking to get this onto the screen efficiently.

Currently i'm still converting the composited float image to INT_ARGB since that is a bit faster than drawing the float-backed image itself, but it isn't a huge difference.
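The float-to-packed-int conversion step might look like the following sketch. It assumes non-premultiplied float RGBA in the 0..1 range and the packed layout used by BufferedImage.TYPE_INT_ARGB (alpha in the top byte); the FloatToArgb class is a hypothetical name and out-of-range values are simply clamped.

```java
/** Hypothetical helper: convert float RGBA pixels to packed 8-bit ARGB ints. */
final class FloatToArgb {
    /** Scale a 0..1 float to 0..255, clamping anything out of range. */
    static int toByte(float v) {
        int i = Math.round(v * 255f);
        return Math.max(0, Math.min(255, i));
    }

    /** Convert pixelCount pixels of interleaved R, G, B, A floats to ARGB ints. */
    static int[] convert(float[] rgba, int pixelCount) {
        int[] out = new int[pixelCount];
        for (int i = 0, j = 0; i < pixelCount; i++, j += 4) {
            out[i] = (toByte(rgba[j + 3]) << 24)
                   | (toByte(rgba[j]) << 16)
                   | (toByte(rgba[j + 1]) << 8)
                   | toByte(rgba[j + 2]);
        }
        return out;
    }
}
```

The resulting array can be pushed into a TYPE_INT_ARGB image with BufferedImage.setRGB(), or written straight into that image's own int buffer to avoid the extra copy.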

Tada ... 2 layered image, the photo is about 70% opacity by the slider on the right with the 'background' showing through, and the top layer was drawn to using Java2D (I forgot to turn on anti-aliasing, but that's trivial to add). Actually java has its own compositing mechanism so I can probably throw those few lines of code away too. Update: Tried this. Way too slow. Never mind.

It doesn't do much, but then it didn't take a lot of code to do it either.

Tagged java.
Thursday, 22 July 2010, 04:48

XBMC beagle, GSOC 2010

Well I 'promised' an update on the beagleboard gsoc 2010 xbmc whatsit, and since we've just had the 'mid-terms' and I have some spare time it seems like a good point to poke it out.

The good news of the day is that Tobias passed the midterms well - although I haven't had a huge amount of time to devote to it, he has thankfully worked very well independently. He's been working well with both the xbmc and beagleboard communities, finding relevant experts to aid the task which has let me off the hook quite a bit. He's had to spend a lot of time just on the beagleboard environment which was an unavoidable pain since the hardware arrived late - and xbmc is a mammoth bit of code that takes an age and a half to compile. But most of the code to this point has been changing the rendering system from a game-like render-all loop to a damage-based system - which could be done on a pc. Still bugs, but it's getting there. The patches look nice, and he keeps the committed code building (just as well - it takes hours to build on the target).

He's started on the video overlay system now, so i'm expecting some big improvements. Some initial timing suggests it's spending nearly 60% of its time in the 'gpu' doing YUV conversion (i'm not sure what resolution he's running it at). The video overlay will do that for free, and more in that it reduces the memory bandwidth requirements significantly.

XBMC basically 'runs' on the beagleboard now, but can only play quite low-resolution video and there are a few issues with missing text, but it does run. With a simpler theme and the video overlay work i'm hoping it will at least be at the SD-video media player level. The XM might even manage 720p for simpler video formats like mpeg2. Although out of scope for this stage of the project, there's also the DSP sitting idle at the moment so the hardware is capable of quite a bit more yet.

Tagged beagle, gsoc.
Thursday, 22 July 2010, 03:27

Lots o threads

I got a new work machine - hence the previous post. That was a short diversion into ms vista 7, which I thankfully didn't need to keep up - I was having massive problems with the nvidia graphics drivers under fedora 13, and problems with my code. But it turned out that it was just my broken code and it crashed just as badly in ms vista 7. Wow what a horrid system they've designed. Move a window to look at something behind it and suddenly it maximises so you can't see what you wanted, the 'file browser manager' thing which seems confused as to what it's trying to be, and probably the worst item - move a mouse over a list and the scroll wheel keeps scrolling the last list that had focus. Not even clicking on the scroll-bar gives it focus and you need to click in the list (often activating it - which you don't want). Ugh. It's like a hollow shell of a tech demo of slightly wacky ideas from GNOME and KDE all wrapped together with a questionably 'pretty' interface (I found it far too spaced out with poor font choices). It kind of looks ok, but there's no meat under it and lots of things don't work quite right. The OS installs pretty fast at least - but you don't get anything that lets you do any work and it just turns into a laborious hunt for some crap that probably doesn't work very well, install, repeat, until you have a remotely usable system. And it still does product registration? Jesus fucking Christ, that's just offensive.

OpenCL

So I had a few problems when I started moving code from the ATI card i've been using to the Nvidia one in the new machine. The compiler is a bit pickier/different about a few things, although iirc that was mostly not auto-casting scalars to vector types in a variable declaration. A bit of a pain but fortunately I don't have too much code yet and it was mostly a mechanical conversion process.

I suppose the main problem I had - had I known it at the time it would've saved me a very long and wasted day or two - was that the nvidia GPUs are much pickier about the code they'll execute. The ATI card doesn't mind some stray memory accesses but the nvidia one just crashes. That is good really since the code is buggy - but unfortunately you get no indicator of why it crashed, or even when it crashed. At some random point after some code you've queued to execute runs you get a random and meaningless (and undocumented/not to spec I might add) error code which says things have stopped working. I was thrown off because the nvidia drivers were a pain to set up - the 'development drivers' just wouldn't run, and the production drivers ran but were a little touchy - if I log out of the session X won't restart. I was also thrown off since adding some debugging code made the routine run too (and since I had it working on the other machine ...).

Anyway now that I know any of these random errors are actually just segfaults it's much easier to deal with without getting a splitting headache. Actually I think I was getting so stressed (or maybe it's because i've been eating all sorts of crap) I spent most of one day with an anxiety induced dizzy spell and headache (ms vista 7 helped there too).

So anyway, the one main routine on which i've been working for the last few weeks got running again and I cleaned it up and whatnot. It's only about 2x faster than the ATI card (HD 5770, vs GTX 480 IIRC), but the code was 'tuned' for the ATI. Although using the word 'tuned' is being a bit generous really, I just kept trying things and seeing what was faster, since there are zero tools on Linux to perform any detailed profiling. I guess that isn't so surprising - if I coded it right it should be completely memory constrained anyway. I did make some minor changes since the nvidia GPUs support better datatype conversion than juniper, e.g. loading floats from bytes in 2 instructions, not dozens (it was much faster to load uints directly and convert manually on Juniper, but the other way around on nvidia). Right now i'm taking data and converting to floats and working with that everywhere which was the right approach on the Juniper arch but might not be on nvidia since it multiplies the memory bandwidth by 4. But there's just not enough time in every day to try everything - I worked over the wet dreary weekend and ended up over 50 hours by COB Wednesday so i'm having a break now. I was supposed to be dropping to 4 days/week this financial year!

I'm still getting to grips with mapping problems efficiently to the GPUs. I've had some success with a more complex approach which copies data to local memory in coalesced accesses and then works from the local memory - which is fast (and pretty much essential on the ATI with no cache). But for smaller problem sets it gets difficult to find enough threads to work together on the problem or even to work out the addressing arithmetic so the algorithm works. Although I don't think it leads to the ultimate performance, and may not work terribly well on the ATI - a solution that seems to be working somewhat is to just throw as many threads at it as possible - reduce the address arithmetic to very simple operations and then process as little as one result per kernel. And it makes it practical to vary parameters without needing to hand-code every scenario to get usable performance, let alone best performance.

Free as a thing of freeness

If I could think of something to work on i'd also like to write some free software using OpenCL now i'm starting to get the hang of it - well if I can invent a time machine so I can add an extra week to every week so I can fit it in. But the trivial stuff I can think of seems too pointless, or the more complex stuff way too complex.

In the back of my mind i've had the idea of doing a Gimp-ish/ImageJ-ish application in Java (see ImageJ - many big operations work faster than the gimp), and using OpenCL to accelerate (or indeed completely implement) the operations. But ... it's such a big fucking task to get something even useful - and requires a huge amount of work in the UI department, so i'm not sure I want to commit to it. Just the basic window with a zoomable editable layered surface with a couple of drawing tools, selection and filter/effect options is quite a task (ok ok, it's basically the whole app ;-). I guess if I can get over the hurdle of a main editing surface widget I might be able to move forward with this idea.

Another idea is a 'gimp for video'. There's a nice java wrapping for ffmpeg which sorts out the codec end of things (yes there is, although like many java things, it's fucking hard to find non-stale shit on google - xuggler). But here i'm lacking a bit of domain knowledge (and about all I really want to do is create slideshows/splice video together), and i'm not sure OpenCL is a good fit (simple fades and wipes are probably faster on a modern cpu). And working with media containers is entering a world of pain. Let alone the sorry fucked up state of linux sound which is something I don't think I could face sober and wouldn't put up with drunk. Might leave that idea.

I can't really think of anything else I might use that could make use of it to be honest.

Tagged opencl, rants.
Friday, 16 July 2010, 15:51

Ahaah

Well, now I know where KDE4 got its fucked up shithouse 'start menu' from. And the original from which it is blatantly copied is also fucked up and shithouse.

Wow.

How fucked up.

And shithouse.

Tagged rants.
Tuesday, 13 July 2010, 14:59

Cordial

I had just a few limequats left on a rather sickly looking tree I have in a pot so I thought i'd make some cordial from it before I used them up ('syrup' for americans). They have a very nice flavour - much as you'd expect, ripe lime mixed with kumquat, so a little like a tart orange/lemon. I threw in a couple of lemons too since the limequats were a bit small.

I ended up with nearly 2 litres of this nice golden liquid, plus some glace peel I can use in a cake if I remember to save it.

I dropped the sugar a bit off the recipe I found on the ABC, and added a bit more citric acid, because I like it less sweet and a bit more tart (and the pot was too full!). I used 1kg sugar and about 1.5tbs of acid.

The tree was looking pretty ill and I think I overdid the treatments and now most of the rest of the leaves have fallen off! It was much like that last year by mid-winter too, so I guess i'll have to wait till spring to see if it will recover - I hope so because I love the flavour. Being in a big pot I didn't keep it watered properly over summer either. I also found that my lime tree has borers in the trunk and given I also failed to water it properly over summer it wasn't in great shape anyway - may well lose that one. If so I might get a native lime (if i ever see one in a shop again - saw one once, 5 years ago), or a more acidic lime (or lemon).

Tagged cooking.
Thursday, 08 July 2010, 14:13

Yay

Had a bit of a victory today - after a kick in the nuts or two. Finally got some of my OpenCL code running with the correct results at a reasonable clip.

I spent most of the day working out why the results were wrong - partially because of a minor bug or three, but mostly because all of the synchronisation primitives don't work when you call a kernel function from another kernel function (at least in the ATI sdk). Wish I had known that to start with ...

I think it's roughly 100x faster than the original java or c code (although I should quantify it), so that's a pretty penny in the bank, and I think there's a bit more I can squeeze out of it - let alone using beefier hardware. One of the keys was to use a native format for most operations - I take the input data which is in a packed byte format and convert it to floats, and then operate on those. The other key is to use local memory as a programmed cache to reduce the load on global memory. And finally to utilise registers as much as possible - once i've loaded data from memory re-use the data repeatedly before needing to go back to memory or running out of registers. The OpenCL api also has some nice queuing and job management which makes it easy to let the CPU do other work whilst the GPU is busy, without having to synchronise every operation - which is the real mind killer. And it goes without saying that the data is loaded once to the graphics card memory and all operations operate there until I get a result out (converted to the format I need).

I still haven't managed to get the image datatypes to work but I will keep trying as it should fit this problem well (and nice to see that the JOCL guys were quick to implement the missing APIs to support them). Using arrays is a bit of a pita tbh - i've had to split my work 'tile' into multiple slices, and keeping track of where each of the work units (threads) within the work group ('process') is gets hairier than a hippie's armpits. Using the texture units should let me remove all of the manual cache code and messy address arithmetic - although whether it executes faster is the real test.

Tagged opencl.
Tuesday, 29 June 2010, 04:22

Sourdough 0.3 - Crusty Loaf Edition

Well, another week another attempt at sourdough, and much more success this time.

I probably didn't let it proof quite long enough because it was so cold - I gave it a good 4 hours, but after forming a loaf it managed to rise ok - although it took about 18 hours. I went straight from creating the dough to forming the loaf without an intermediate rise and that definitely worked better.

Given the cake yesterday took an extra 30 minutes to bake at what should have been the correct temperature, I ramped the oven setting up to over 200 to try to compensate. It was probably a bit hot and I cooked it a bit too long, but after 20 minutes I ended up with a pretty decent loaf of bread.

It is a little burnt at the back, but not so much it isn't edible. I put a large frying pan with hot water in the base of the oven to provide a bit of steam and the crust turned out a little shiny, and crunchy without being hard. Fairly even texture inside, no big bubbles, and although there is not a very strong sour flavour it tastes nice and bready.

Next time I might have to try proofing and raising the bread in a warming box so it doesn't take quite so long, and lowering the oven temperature a little bit (and watching it more closely).

Tagged cooking.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!