About Me
Michael Zucchi
B.E. (Comp. Sys. Eng.)
also known as Zed
to his mates & enemies!
< notzed at gmail >
< fosstodon.org/@notzed >
XBMC beagle, GSOC 2010
Well I 'promised' an update on the beagleboard gsoc 2010 xbmc whatsit, and since we've just had the 'mid-terms' and I have some spare time it seems like a good point to poke it out.
The good news of the day is that Tobias passed the midterms well - although I haven't had a huge amount of time to devote to it, he has thankfully worked very well independently. He's been working well with both the xbmc and beagleboard communities, finding relevant experts to aid the task, which has let me off the hook quite a bit. He's had to spend a lot of time just on the beagleboard environment, which was an unavoidable pain since the hardware arrived late - and xbmc is a mammoth bit of code that takes an age and a half to compile. But most of the code to this point has been about changing the rendering system from a game-like render-everything loop to a damage-based system - work which could be done on a pc. Still bugs, but it's getting there. The patches look nice, and he keeps the committed code building (just as well - it takes hours to build on the target).
He's started on the video overlay system now, so i'm expecting some big improvements. Some initial timing suggests it's spending nearly 60% of its time in the 'gpu' doing YUV conversion (i'm not sure what resolution he's running it at). The video overlay will do that for free, and better still it reduces the memory bandwidth requirements significantly.
XBMC basically 'runs' on the beagleboard now, but can only play quite low-resolution video and there's a few issues with missing text, but it does run. With a simpler theme and the video overlay work i'm hoping it will at least be at the SD-video media player level. The XM might even manage 720p for simpler video formats like mpeg2. Although out of scope for this stage of the project, there's also the DSP sitting idle at the moment so the hardware is capable of quite a bit more yet.
Lots o threads
I got a new work machine - hence the previous post. That was a short diversion into ms vista 7, which I thankfully didn't need to keep up - I was having massive problems with the nvidia graphics drivers under fedora 13, and problems with my code. But it turned out that it was just my broken code, and it crashed just as badly on ms vista 7. Wow, what a horrid system they've designed. Move a window to look at something behind it and suddenly it maximises so you can't see what you wanted; the 'file browser manager' thing seems confused as to what it's trying to be; and probably the worst item - move a mouse over a list and the scroll wheel keeps scrolling the last list that had focus. Not even clicking on the scroll-bar gives it focus - you need to click in the list (often activating it, which you don't want). Ugh. It's like a hollow shell of a tech demo of slightly wacky ideas from GNOME and KDE all wrapped together with a questionably 'pretty' interface (i found it far too spaced out with poor font choices). It kind of looks ok, but there's no meat under it and lots of things don't work quite right. The OS installs pretty fast at least - but you don't get anything that lets you do any work, and it just turns into a laborious hunt for some crap that probably doesn't work very well, install, repeat, until you have a remotely usable system. And it still does product registration? Jesus fucking Christ, that's just offensive.
OpenCL
So I had a few problems when I started moving code from the ATI card i've been using to the Nvidia one in the new machine. The compiler is a bit pickier/different about a few things, although iirc that was mostly not auto-casting scalars to vector types in a variable declaration. A bit of a pain but fortunately I don't have too much code yet and it was mostly a mechanical conversion process.
I suppose the main problem I had - had I known it at the time it would've saved me a very long and wasted day or two - was that the GPUs are much pickier about the code they'll execute. The ATI card doesn't mind some stray memory accesses but the nvidia one just crashes. That is good really, since the code is buggy - but unfortunately you get no indicator of why it crashed, or even when it crashed. At some random point after some code you've queued to execute runs, you get a random and meaningless (and undocumented/not to spec I might add) error code which says things have stopped working. I was thrown off because the nvidia drivers were a pain to set up - the 'development drivers' just wouldn't run, and the production drivers ran but were a little touchy - if I log out of the session X won't restart. I was also thrown off because adding some debugging code made the routine run too (and since I had it working on the other machine ...).
Anyway now that I know any of these random errors are actually just segfaults it's much easier to deal with without getting a splitting headache. Actually I think I was getting so stressed (or maybe it's because i've been eating all sorts of crap) I spent most of one day with an anxiety induced dizzy spell and headache (ms vista 7 helped there too).
So anyway, the one main routine i've been working on for the last few weeks got running again and I cleaned it up and whatnot. It's only about 2x faster than the ATI card (HD 5770, vs GTX 480 IIRC), but the code was 'tuned' for the ATI. Although using the word 'tuned' is being a bit generous really - I just kept trying things and seeing what was faster, since there are zero tools on Linux to perform any detailed profiling. I guess that isn't so surprising - if I coded it right it should be completely memory constrained anyway. I did make some minor changes since the nvidia GPUs support better datatype conversion than Juniper, e.g. loading floats from bytes in 2 instructions, not dozens (it was much faster to load uints directly and convert manually on Juniper, but the other way around on nvidia). Right now i'm taking the data, converting it to floats and working with that everywhere, which was the right approach on the Juniper arch but might not be on nvidia since it multiplies the memory bandwidth by 4. But there's just not enough time in every day to try everything - I worked over the wet dreary weekend and ended up over 50 hours by COB Wednesday, so i'm having a break now. I was supposed to be dropping to 4 days/week this financial year!
I'm still getting to grips with mapping problems efficiently to the GPUs. I've had some success with a more complex approach which copies data to local memory in coalesced accesses and then works from the local memory - which is fast (and pretty much essential on the ATI with no cache). But for smaller problem sets it gets difficult to find enough threads to work together on the problem, or even to work out the addressing arithmetic so the algorithm works. Although I don't think it leads to the ultimate performance, and it may not work terribly well on the ATI, a solution that seems to be working somewhat is to just throw as many threads at it as possible - reduce the address arithmetic to very simple operations and then process as little as one result per kernel. And it makes it practical to vary parameters without needing to hand-code every scenario to get usable performance, let alone best performance.
Free as a thing of freeness
If I could think of something to work on i'd also like to write some free software using OpenCL now i'm starting to get the hang of it - well, if I can invent a time machine that adds an extra week to every week so I can fit it in. But the trivial stuff I can think of seems too pointless, and the more complex stuff way too complex.
In the back of my mind i've had the idea of doing a Gimp-ish/ImageJ-ish application in Java (see ImageJ - many big operations work faster than the gimp), and using OpenCL to accelerate (or indeed completely implement) the operations. But ... it's such a big fucking task to get something even useful - and requires a huge amount of work in the UI department, so i'm not sure I want to commit to it. Just the basic window with a zoomable editable layered surface with a couple of drawing tools, selection and filter/effect options is quite a task (ok ok, it's basically the whole app ;-). I guess if I can get over the hurdle of a main editing surface widget I might be able to move forward with this idea.
Another idea is a 'gimp for video'. There's a nice java wrapping for ffmpeg which sorts out the codec end of things (yes there is, although like many java things, its fucking hard to find non-stale shit on google - xuggler). But here i'm lacking a bit of domain knowledge (and about all I really want to do is create slideshows/splice video together), and i'm not sure OpenCL is a good fit (simple fades and wipes are probably faster on a modern cpu). And working with media containers is entering a world of pain. Let alone the sorry fucked up state of linux sound which is something I don't think I could face sober and wouldn't put up with drunk. Might leave that idea.
I can't really think of anything else I might use that could make use of it to be honest.
Ahaah
Well, now I know where KDE4 got its fucked up shithouse 'start menu' from. And the original from which is blatantly copied is also fucked up and shithouse.
Wow.
How fucked up.
And shithouse.
Cordial
I had just a few limequats left on a rather sickly looking tree I have in a pot so I thought i'd make some cordial from it before I used them up ('syrup' for americans). They have a very nice flavour - much as you'd expect, ripe lime mixed with kumquat, so a little like a tart orange/lemon. I threw in a couple of lemons too since the limequats were a bit small.
I ended up with nearly 2 litres of this nice golden liquid, plus some glace peel I can use in a cake if I remember to save it.
I dropped the sugar a bit from the recipe I found on the ABC, and upped the citric acid, because I don't like it too sweet and prefer it a bit more tart (and the pot was too full!). I used 1kg of sugar and about 1.5 tbsp of acid.
The tree was looking pretty ill and I think I overdid the treatments and now most of the rest of the leaves have fallen off! It was much like that last year by mid-winter too, so I guess i'll have to wait till spring to see if it will recover - I hope so because I love the flavour. Being in a big pot I didn't keep it watered properly over summer either. I also found that my lime tree has borers in the trunk and given I also failed to water it properly over summer it wasn't in great shape anyway - may well lose that one. If so I might get a native lime (if i ever see one in a shop again - saw one once, 5 years ago), or a more acidic lime (or lemon).
Yay
Had a bit of a victory today - after a kick in the nuts or two. Finally got some of my OpenCL code running with the correct results at a reasonable clip.
I spent most of the day working out why the results were wrong - partially because of a minor bug or three, but mostly because all of the synchronisation primitives don't work when you call a kernel function from another kernel function (at least in the ATI sdk). Wish I had known that to start with ...
I think it's roughly 100x faster than the original java or c code (although I should quantify it), so that's a pretty penny in the bank, and I think there's a bit more I can squeeze out of it - let alone using beefier hardware. One of the keys was to use a native format for most operations - I take the input data which is in a packed byte format and convert it to floats, and then operate on those. The other key is to use local memory as a programmed cache to reduce the load on global memory. And finally to utilise registers as much as possible - once i've loaded data from memory re-use the data repeatedly before needing to go back to memory or running out of registers. The OpenCL api also has some nice queuing and job management which makes it easy to let the CPU do other work whilst the GPU is busy, without having to synchronise every operation - which is the real mind killer. And it goes without saying that the data is loaded once to the graphics card memory and all operations operate there until I get a result out (converted to the format I need).
I still haven't managed to get the image datatypes to work, but I will keep trying as it should fit this problem well (and nice to see that the JOCL guys were quick to implement the missing APIs to support them). Using arrays is a bit of a pita tbh - i've had to split my work 'tile' into multiple slices, and keeping track of where each of the work units (threads) sits within the work group ('process') gets hairier than a hippie's armpits. Using the texture units should let me remove all of the manual cache code and messy address arithmetic - although whether it executes faster is the real test.
Sourdough 0.3 - Crusty Loaf Edition
Well, another week another attempt at sourdough, and much more success this time.
I probably didn't let it proof quite long enough because it was so cold - I gave it a good 4 hours - but after forming a loaf it managed to rise ok, although that took about 18 hours. I went straight from creating the dough to forming the loaf without an intermediate rise, and that definitely worked better.
Given the cake yesterday took an extra 30 minutes to bake at what should have been the correct temperature, I ramped the oven setting up to over 200 to try to compensate. It was probably a bit hot and I cooked it a bit too long, but after 20 minutes I ended up with a pretty decent loaf of bread.
It is a little burnt at the back, but not so much it isn't edible. I put a large frying pan with hot water in the base of the oven to provide a bit of steam and the crust turned out a little shiny, and crunchy without being hard. Fairly even texture inside, no big bubbles, and although there is not a very strong sour flavour it tastes nice and bready.
Next time I might have to try proofing and raising the bread in a warming box so it doesn't take quite so long, and lowering the oven temperature a little bit (and watching it more closely).
Pumpkin Soup
Yes, another food post ... The old lady came to visit today and dropped off a small butternut pumpkin and a small pressure cooker. After growing some pumpkins myself last year I went off pumpkins a bit, and about the only thing I could think of was pumpkin soup, and it also gave an opportunity to try out the pressure cooker.
Fried up the pumpkin, some potato, onion, garlic, chillies, some keens curry powder and oil in a separate pot since this one was too small, and then threw it into the cooker. I grabbed some frozen stock from the fridge, melted it in the microwave, added another chicken stock cube and poured that all in.
Cooked it for 10 minutes under pressure - once I worked out how much heat it needed to come up to pressure. Blended it up with a little milk and pepper, and it was done. Turns out that one of the frozen stock blocks wasn't chicken after all but some crab stock I made from the remains of some chilli crab I couldn't let go to waste. Glad I did, it really added something quite special.
And perhaps some dessert? I bought some pears to widen my diet but they were so sweet and the texture so soggy I found them unpalatable. This cake (whose recipe is suspiciously like a nice apple cake recipe I have) looks the part, but it's still cooling and seems a little fragile so far.
Omelette
Spent all afternoon in the garden, mowing the lawn and nature strip, sweeping up leaves, weeding, fertilising and watering all the pots, spraying some oil on the citrus. To finish the day off I made a nice variation of a Spanish Omelette for dinner, and it turned out ok even though I forgot a few ingredients.
Bacon, eggs, capsicum, a bunch of greenery from the garden (stinging nettles and sweet potato leaves), chillies, chives, pepper, fish sauce. Fried in a hot pan and then finished under the grill topped with cheese and spices. For something different I also put in some limequat juice and chopped-up rind (it's a kumquat hybrid) which added a nice tang and some bitterness.
Ate the whole lot in one sitting! Well it was the only meal I had today.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved.
Powered by gcc & me!