About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail
fosstodon.org/@notzed

Tuesday, 26 April 2011, 00:56

Parallel Prefix Sum

14/9/11: added a further paragraph on additional thoughts

Since coming across the parallel prefix sum a couple of weeks ago, a lot of things I need to solve seem to fall into the class of problems it is suited for within OpenCL on GPU platforms. However after a lot of trial and error and experimentation i've found it is usually just slower - sometimes by quite a margin.

In short, it takes advantage of the very high speed local memory ('LS') and parallelism to combine every element with every previous element, using an associative operator such as sum or maximum, in log2(n/2) steps.

But with GPUs there are a couple of problems with it:

  1. Even in the ideal case many of the threads are computing redundant data or not operating (depending how one chooses to implement it).
  2. A synchronisation step is required after every single operation - which is usually something trivially simple.

The first leads to an over-commitment of threading resources which impacts the scalability as the overall job size increases. And the second leads to very inefficient scheduling even on simple tasks, and a much heavier 'inner loop'.
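As a rough illustration of both problems, here is a minimal sketch of the textbook (Hillis-Steele) form of the scan, written as plain C that emulates the work-items on the CPU. It is not the code used for the timings in this post; the group size `N` and the function name `scan_inclusive` are made up for the example. This form takes ceil(log2(N)) passes.

```c
#include <assert.h>
#include <string.h>

#define N 16  /* elements per work-group (assumed for the example) */

/* Inclusive prefix sum of N floats, emulating N work-items on the CPU.
 * Returns the number of passes, each of which needs a barrier on a GPU. */
static int scan_inclusive(float *data)
{
    float next[N];
    int passes = 0;

    assert(data != NULL);
    for (int offset = 1; offset < N; offset <<= 1) {
        /* One pass: on a GPU every work-item i runs this body at once. */
        for (int i = 0; i < N; i++) {
            /* Work-items with i < offset have nothing to add: they idle. */
            next[i] = (i >= offset) ? data[i] + data[i - offset] : data[i];
        }
        /* A GPU version needs a barrier(CLK_LOCAL_MEM_FENCE) about here,
         * after what was only a single addition per work-item. */
        memcpy(data, next, sizeof next);
        passes++;
    }
    return passes;
}
```

For 16 elements that is 4 passes, each one a single add followed by a synchronisation, with more work-items sitting idle on every pass.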

For example, I implemented a 5x5 maximum operation (for non-maximum suppression peak detection) using a separate X and Y operation (I realise a 5-tap test doesn't really exercise the log2(N) nature of the algorithm much, but more on that later).

My first implementation uses a 16x16 workgroup size (after much experimentation this seems to be the generally best workgroup size for operating on images on my hardware - it leads to an occupancy of 1 and seems to be a good fit for the texture cache configuration). Each local workgroup reads a 16x16 area into LS and then 16 threads work together on each row of result. It only does a couple of 'prefix sum' steps because I only need the result from 4 samples, and I do the last one manually. I use the trick of offsetting the starting point so no thread requires any conditional execution. Finally, it only produces 12 valid results for the 16 inputs since you need overlap.

Figure 1: Steps taken for parallel maximum calculation. Only the workings of 4 of the 16 threads are shown.

Because it only generates 12 results it needs to be run 16/12 times the width of the image. This runs in about 65uS on the test data set.

Then I tried a version which reads 2x 16x16 blocks into memory so it can produce all 16 results in one go - unfortunately i've lost the timings and I can't be bothered to re-run it, but i'm fairly confident it wasn't terribly impressive.

Finally I implemented a very simple version which just reads in 2 16x16 blocks into local memory, and then does the operation on the 2 pixels before and 2 pixels after the current location (i.e. an unrolled loop). This was somewhat quicker - 48uS, or about 25% faster.

I didn't bother trying it for the parallel sum case, but I also tried larger window sizes for the simple version - and even with a 9-wide window it is still 20% faster than the 5-wide case for the parallel sum version. And this is for the single channel case - for a 4 channel image you have a 4x LS load, which is not required when it is calculated in registers.

Intuition would tell you that increasing the data-size will eventually lead to a case where it out-performs the simple cases. But the wider the data being calculated the more threads you require and this reduces the opportunity for hiding latencies by letting the GPU schedule independent workgroups. The local store can also be a factor since it too can limit how wide you can go.

I also applied it to (larger) problems where you're only interested in the final result. Because branching is expensive it seems on paper that it doesn't matter if you generate many redundant results since the overall number of steps is much lower - e.g. a 16x16 summation only takes 7 steps rather than 256. In reality though the simple looping version is broken up into 16 strips, each 1 wide, so it's only 32 steps (16 loops of 16 running in parallel, plus 1 loop of 16 to add up the partial sums). And it only takes 16 threads rather than 256, so you can execute 16x as many at once for a given number of threads. And you don't need any local store.

I found in all cases it was (sometimes much) faster to split it into 16x1 loops which operate on 16 data items, and then have a single thread complete the partial sums.

And finally the one case where it seemed to have traction - calculating an integral image where every pixel has its value added to every pixel to the right/below it - did seem faster than another implementation I had. But that initial implementation was before I had discovered other performance improvements so I suspect I could probably do better if I had another go. To satisfy my curiosity I just tried implementing part of it using a looping implementation and with little effort managed to beat or at least equal the prefix-sum version. Incidentally both require splitting the problem into smaller parts and then a final step to 'fix' the integral image - for the parallel prefix sum version you run out of local store or threads, and in both cases you need the parallelism to help improve the GPU efficiency.

Further Thoughts 14/9/11

Since writing this a lot more water has flowed under the bridge and I have a few more thoughts to add.

Having a smaller rather than larger work-size is important as I alluded to above: but larger problems can be made smaller by storing intermediate values in registers and then only sharing the work to reduce a smaller-multiple of the dataset. e.g. storing 4 registers locally allows 4x as much data to be 'processed' using the same amount of shared-work (and shared memory too) - which is the expensive stuff.

Since I was sticking to spec I have never tried removing the barriers and relying on the hardware's behaviour. So I don't know how much difference this makes: the technique in the paragraph above is even more useful then, if you can reduce the problem to the 64 elements required to benefit from the hardware characteristics.

The Integral Image code in socles uses these techniques, and in this case the parallel prefix sum was a (small) win. And IMHO is a fairly tight bit of code.

Tagged hacking, opencl.
Tuesday, 26 April 2011, 00:35

Hot Sauce #0 - Incendiary Tomato

Update: I finally gave it a name and labelled the bottles - "Incendiary Tomato" sounded like a good name. I've also downed 1/4 of a bottle of this stuff with cheese and crackers or on my dinner, and it's really bloody nice!

I have a chilli or two from the garden and have been making a few hot(ish) sauces - tomato sauce with a few chillies thrown in to give it a pleasant kick, a hot sweet chilli and ginger sauce, and an apple, ginger, and chilli chutney with much more of a kick. The latter is pretty nice, I took a recipe that asked for 500g of capsicum and 6 chillies and just used 500g of chillies. I think it's fairly medium-warm on the heat scale but a mate can't stomach it. I also tried a green fermented sauce (unfortunately 'killed' that with too much vinegar) and have some red fermented chillies i'm not sure what to do with yet.

Habanero Chillies in Blue Bowl

The habaneros are hitting their fruiting stride at the moment so I thought i'd try a hand at something with a bit more bite and less sweetness. I also wanted to avoid the vinegar flavour which had overpowered my last effort so I based the acid on lime and citric acid. I found a recipe that looked a bit dodgy - based on a tin of tomatoes - but I had some tomatoes I had to use so I started with that and then spiced it up beyond recognition. For once I recorded everything I put into it, so I thought i'd share ... I pretty much made it up as I went and mixed a few ideas I really wanted to try separately but once it was in the pot there was no going back.

Ingredients

700g Roasted whole tomatoes.
12 Ripe Habanero chillies.
1 Lime, juice and zest.
1 tsp Citric acid.
1 tsp White pepper (see below).
1 tsp Black pepper.
8 Cloves
1/2 tsp Allspice.
1/4 tsp Ground mustard.
1/2 tsp Ground ginger.
1 tsp Salt
1 tbs Palm Sugar.
1 tbs Sugar.

Method

Cut the chillies into small pieces. Pound the whole dry spices in a mortar and pestle. Break up the palm sugar (I only did 1 tablespoon because i was too lazy to do another and used plain sugar when I needed more sweetness).

Put everything in a pot and simmer for about an hour - until the chillies are soft.

Use a (stick) blender to puree everything together. Don't splash your eyes.

Bottle in sterilised jars whilst hot.

Results

I'd probably rate it about a '7/10' for heat, where Tabasco sauce is 5 (although it's been some time since I had any). The heat lingers much longer though, and builds up the more you have. A teaspoon would be enough for a nicely burning steak.

I'll have to let it sit in the bottle a while to finalise the flavours but for now I think it has a bit too much pepper which over-powers the chillies; although it's a bit hard to tell since the bite over-powers the flavour on the initial taste. Possibly more mustard, ginger, and allspice would work too, even lime juice. Maybe a little vinegar wouldn't hurt? I think the sugar level is about right (for my palate), as is the saltiness - both of which are required to bring out the flavours. It's basically a very strong, extremely hot tomato sauce.

I don't know how it'll keep yet - the acid in the tomatoes, lime and added citric acid should hopefully be enough since I won't be finishing the litre or so I made too quickly and this is probably one sauce I can't share with friends.

And i've got more chillies and ideas to try so it might be competing for condiment time.

Tagged cooking.
Saturday, 16 April 2011, 12:11

Apple Pie

Back to regular programming after the last rant ...

I ended up with some ageing/damaged fruit and after sitting there degrading for a week I finally pulled a finger out and made my first apple pie with the recoverable fruit. I'm not a big dessert person - I can't really remember having apple pie since I was a kid (although i'm sure I've had some here and there) - and it was never like this.

It worked out pretty well in the end; it tastes at least as good as it looks and quite possibly better. I didn't have any lemon so used citric acid with some water to keep the sliced apples from going brown - and I think that gave it a bit of extra tang. I threw in a few more spices too for good measure (vanilla, extra whole cloves, more cinnamon). Even the crust on the bottom was a bit crunchy - I tried a tip I saw on the Hairy Bikers a couple of days ago, sprinkling the base with semolina to absorb some extra moisture. Although having never done it before I don't know if it made any difference. I used this recipe - although I had a hell of a time with the pastry. I just can't seem to get the hang of working with short pastry although it usually turns out ok.

I've been pretty busy hacking some OpenCL stuff of late - so busy i've barely left a keyboard. More on that soon I imagine, once I get a bit of sleep and remember just what it is i've done - and if any of it is worth sharing.

Tagged cooking.
Friday, 01 April 2011, 04:49

Julia Gillard: Offensive Idiot

So one wonders what exactly prompted our Prime Minister to call The Greens extremists the other day, and then follow that up with the following gem yesterday:

"The Greens will never embrace Labor's delight at sharing the values of everyday Australians, in our cities, suburbs, towns and bush, who day after day, do the right thing, leading purposeful and dignified lives, driven by love of family and nation."

This is the sort of offensive wedge politics that had fuckwit little johnny rotten howard in power for a decade. Some minda of a minder has convinced her to pander to the bogan set obviously.

"The differences between Labor and the Greens take many forms but at the bottom of it are two vital ones. The Greens wrongly reject the moral imperative to a strong economy.

In this case, clearly "strong economy" is just a euphemism for "pandering to multinational and big-money interests through free trade, globalisation, and pro-business tax policies", and so on.

Not to mention fair go for all, mutual respect, and all that. Who's really being "un-Australian" here? After all, she's the one locking babies up behind razor wire in remote desert camps.

I'd suggest that if The Greens don't delight in sharing the values of 'everyday' Australians then it is probably the values of the particular group of 'everyday' Australians she's referring to that need to be questioned. Clearly at least 11% of the country thinks The Greens are ok enough to vote for which is a pretty big group to offend so directly (add the fact that these 11% are keeping her in power and it is clearly an idiotic thing to say).

Tagged rants.
Tuesday, 29 March 2011, 07:28

Social interaction on the level of the checkout

I was just thinking the other day about just what it is with facebook and twatter that most annoys me.

I think it's because for the most part (particularly for facebook), the interaction never really gets beyond the level of discourse during a visit to the checkout. A hollow greeting and perhaps a comment on some immediate concern such as the weather.

That this is amongst people who ostensibly know each other on some personal level - a so-called 'friend' - is quite sad.

World in chaos

Of course, there have been an awful lot of other things occupying my mind of late - the disasters around the Pacific Ring Of Fire, North Africa, the collapse of western economies and so forth. After a while it turns into a sort of 'disaster porn' that you need a break from though. Having empathy for people's plight is one thing but it can become overwhelming and for nobody's benefit.

I'm pretty much convinced that people's plight is just misery anyway. The strong take what they want and the rest suffer. Seems to be the winning evolutionary strategy since we started agriculture, since before that the social groups were too close and small for cheating to win in the long term.

Back to work

After a nice break and some good weather i'm back to work again too. Going is a little slow but i'm making progress and getting back into the groove of things. Must start doing some exercise though - after xmas I gave up on the health kick and have been on a downward slide ever since.

Tagged rants.
Monday, 14 February 2011, 22:59

A house full of stuff

Stuff stuff everywhere - it's amazing how much stuff one accumulates over the years, even for someone like me who detests shopping.

After being a bit of a shut-in for a few weeks just doing nothing or poking away at the back yard i've been trundling about on the deadly treadly trying to procure a few more items.

First I was after a West Indian Lime (although I have since discovered that the 'Sublime' my sister in law gave me is basically the same). Unfortunately the closest nursery had none. Neither did the next closest (it was stupidly expensive anyway). So a bit of a hike to the next one - Bunnings at Mile End. Gigantic shop. Gigantic nursery. Not a single fruiting lime tree in their stock. Plenty of grapefruit. I mean who buys grapefruit any more? And fucking greeters and grumpy old ladies as checkout chicks. At least the greeters are hidden behind a 'desk' now, and not assaulting you as you enter the shop. I try to avoid Bunnings but I thought with their super-sized-shop they might at least have some. But once again they've demonstrated they have quantity over quality or range - I found another local hardware store anyway so hopefully that's the last time I ever have to go there. So the last place I could think of was in Beulah Park on the other side of the city across the road from where I used to live. Fortunately they had plenty of lime trees - and hot checkout chicks too. And their prices were even a fair bit cheaper. I bought a couple of tall ones (one for my sister) and rode home with two tree-flags flapping in the draft. Pannier bags are awesome. Now I have to work out where to put it.

That, and making some tomato sauce pretty much did the whole day in.

So the next day I set out to get another PlayTV (for a mythtv instance - only because I know for certain it works well with the computer, the TinyTwin TV tuner I am using currently is a bit crappy) and a chest freezer. I first went to Radio Rentals because it's close. But my patience was already tried somewhat - I got pissed off with the lack of freezers (they had 2 on display), the lack of staff in the giant shop - and the bizarre one-way doors. You go in the front and have to exit out the back. I jogged through the aisles to get out just to make the point how silly it was.

Next stop was Hardly Normals, or so I thought. I don't like shopping there (or shopping in general) but it was worth a comparison since it wasn't too far away. Unfortunately I had forgotten they closed the one I was headed to and moved it another 5km up the road to a disgustingly giant strip mall. Well to put it bluntly - fuck that, how do they expect people to shop there if you have to travel 10km each way just to get looked at funny and treated poorly? On the way i'd dropped by tricky dickies to see the price on the PlayTV but didn't get one thinking I could try to 'deal' with Hardly Normals to get both, but the lack of shop put a dent in that idea and then I was so pissed off I forgot about it.

I decided to head to the other side of town instead since I know of a few shops that way - but since I was going past home I stopped to look up other shops on the internets and to cut a long story short decided to really 'fuck that', and just bought it online. Last time I bought something just up the road they wanted $50 or more for delivery anyway, with no indication of when it might arrive other than the day (i'm often home but i don't like being tied home unnecessarily). I ordered this from Sydney, $130 below retail, $40 delivery and within 5 hours someone had called to say it would arrive between 11 and 2 the next day (obviously it isn't coming from Sydney). Assuming it arrives that was a much easier experience.

On the plus side I discovered an independent hardware store on the way home - and one somewhat closer too. That'll give me another option when I want to avoid Bunnings. Although the manager warned me not to leave my bike outside for the thieves which frequent the area - unfortunately I missed the opportunity to suggest bike rails, which are sorely lacking in almost all of Prospect (so much for their 'sustainable vision') - there aren't even any parking control signs to lock your bike to ...

So on Sunday I also made up a batch of Tomato sauce. I'm fairly happy the way it turned out although I probably should've reduced it a little more as it is a touch runny. But in the end I simply lost patience after cooking and simmering for more than 6 hours, and it should do. I couldn't find the recipe I used last time and since I had a bottle of Ezy-Sauce I'd bought more than 5 years ago I thought it was about time I used that. There's a recipe on taste.com.au as 'Grandma's Tomato Sauce' but it's really just the same as the one on the bottle so I followed that instead. Unfortunately I forgot to shake the bottle before opening it so I stuffed it up a bit, but I made it up with a few more crushed cloves, black pepper and a huge pile of chillies. It's got a nice little kick to it although perhaps a bit sweet for my taste. I don't even use it much myself but it is absolutely scrumptious on a bit of burnt snag in a fresh slice of bread and perfect for hot dogs - so the 4.5L I made should do me for a while.

I'm also a couple of weeks into making some fermented 'Tabasco' sauce. I have some 'Cayesan' chillies which have gone ballistic and the chillies were so heavy one of the branches broke off. I didn't even pick all of them off (I managed to save the branch by tying it up) and ended up with about 750g of green chillies. Ideally i'd use red ones but since I had a whole lot of fresh chillies at once I thought i'd see how a green version of the fermented sauce works out. Still a few weeks left on that. Now the Cayesans are starting to ripen I'm getting a few red ones each day and waiting till I get enough to make something with.

My habanero plants are starting to produce fruit at last too. They taste fantastic as always - I put a couple of green ones in the tomato sauce, I love the almost apple-like sweetness followed by the searing heat. I only just finished the last of the big crop I had 2 years ago and I finally have some in the ground rather than just in pots so I'm hoping for a crap-load again. Although i've had the occasional problem with pests on the ones in pots, in general the pests have been under better control this year, and that's perhaps because i'm growing a bigger variety of stuff and it's all a bit healthier with all the rain we've had.

I've finally levelled off the lawn at the back and well on the way to having it full of lush grass. After 3 odd years of having piles of crap, dirt, and general mess it's a relief to have it almost done. It's a pity I don't have a bit more room for horticulture but I think I have the balance about right given the layout of the yard. There's always pots I guess - but I have heaps of those already and they take a lot more work and the plants usually don't grow as well.

Oh, I got my BeagleBoard polo shirt and Beagleboard-XM yesterday evening, at long last - a bonus from working on GSOC 2010. That took a while to arrive. I haven't got it working yet because it uses a different gender for serial cable and I don't have the right PSU and all my cables are in a box behind other boxes, but hopefully i'll get it working soon. It looks like a tidy little unit anyway. I've barely been touching the computer for the last few weeks other than to read a few blogs and keep up with the news (there's been a lot of it lately), but hey, it's summer so that stuff can wait for the long rainy days of winter.

Tagged biographical, horticulture, house, rants.
Tuesday, 04 January 2011, 02:02

Gran Turismo 5

I finally got over being angry at Sony and played a bit of Gran Turismo 5 on the weekend (it had nothing to do with the discovery of the broken PS3 security over the same weekend - but yay for that).

In a word: unfinished.

Compared to GT3 - which I consider the pinnacle of the series - there are so many graphical issues, slow loading times, poor models and frustrating game-play mechanics.

Models
Some of the models are really bad - almost certainly direct PSP imports. If as reported they spent 6 months on modelling each of the 200 'premium' cars then it was mostly a waste of time - you can barely see much of the interior if you do drive from inside, and there's little reason to anyway. The sounds are pretty weak too.
1080p@ under 50 fps
The sharpness and crispness only shows up how plain some of the models and race-tracks are. I'd rather 2x AA on 720p, so at least the frame-rate could keep up and you could improve the textures (also allow simpler models). It makes the screenshots look bare, and the moving game less than impressive.
Screen tearing
Tearing tearing everywhere. Trying to do too much AA or too high a resolution is all lost as soon as tearing rips the screen in two. I've definitely seen worse tearing ... but that was in games that weren't Gran Turismo.
Pop in
There's just too much pop in. Occasionally a whole road's worth of shadows pops in while you're driving on it too.
Shadows
Shadow mask is way too low a resolution, as are the environmental maps used for reflections (additionally they have low temporal resolution). Again this looks particularly bad at 1080p.
Volumetric mist/dust
I think the dust and mist actually looks quite nice - at least you can't see it as a bunch of layered animated sprites. Except at the edges that is - where it looks like total shit. Nice algorithm, simply unfinished or simply unsuitable for such a high screen resolution.
Slow/inconsistent menus
Obviously different teams/individuals work on every separate menu system in isolation as they still have the problem where they all look and work differently.
OK pox
Way too many 'are you sure' 'ok?' boxes and so on. Win some money - click ok. Win a car - click ok. Then go to the ticket, click on it, click ok to get it added to your garage. Click ok after it shows it driving up in the dark. Click ok/cancel to 'use it now'. I mean ... really? It's even worse when one considers this is the only game these guys have been working on for well over a decade, and yet the basic navigation mechanics haven't progressed at all - or gotten worse!
Loading slog
This is supposed to be a 'next gen' console, why does it still have so much loading delay? Loading a track is one thing but almost every action from going to the car settings to the options to changing menus triggers another load. I don't consider a 10G 'install' a viable option to mitigate this poor design and reportedly it's hardly greased lightning with it anyway. These sort of delays had a reason on the PS2 with its tiny RAM but there's no such reason on the PS3 - only excuses.

I really can't see where all the time went to come up with this result. Although the modelling and engine coding no doubt took a long time, all of these things can be done in parallel and they would have been playing with the PS3 as early as Evolution Studios anyway (Motorstorm devs who are now polishing their 3rd game), so it can't be the coding. There doesn't seem to be any obvious delay on the 'critical path' that could have caused such a long release cycle for such an unfinished game. Apart that is from poor management. Sony for letting them get away with it and Polyphony for fucking around too much. I'm not counting the over-detailed modelling here either, i'm talking about the engine and menu systems.

It just looks like Sony demanded that they stop fucking around and release it by xmas 2010, perhaps because the PS4 is coming up, or simply to ensure the PS3 hadn't hit its inevitable decline.

It's only beta quality - a decent beta, but still lacking all the polish we should all expect from such a triple-A game. If I were to review it I wouldn't give it more than 60% given all of the above (a low 'C'/'Pass' grade), and the fact the game itself isn't much other than 'GT4-HD'.

If it was a launch title then there would be good excuses for all of these problems, but it isn't.

Tagged games, ps3.
Thursday, 16 December 2010, 04:23

Branchy code

This week I was looking at feature detectors - and one of those I was trying is FAST. This is pretty much the definition of branchy code - a function which is a single if statement of 2900 lines.

I didn't think it would be something that would map to OpenCL particularly well, but I was pleasantly surprised.

I simply took the if statement and wrapped it in a kernel which I call for every pixel, and added a simple list append function at the end (more on that below). With a bit of playing with the kernel work size I got it down to about 75uS for a 1024x768 frame at around 1000 output points.

I still haven't done non-maximum suppression or the like but it certainly lives up to its name - it's damn FAST. I've been playing with SURF and others and even a partial implementation is already taking around 2000uS/frame. FAST seems to be very sensitive to noise and camera focus though, so i'm not sure I can use it - hopefully the non-maximum suppression will help.

GPU List Append

One problem with GPU coding is that it particularly likes having large well-defined data-sets to work with, and what I needed to do was just generate points beyond a threshold. In the past i've just had a separate post-process which 'reduces' the data, but that input had already been reduced and wasn't just a whole frame's-worth.

So I came up with something very simple based on atomics. I don't know whether it's the best solution but it seems to work ok in this case.

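// atom_inc() on global memory needs this extension in OpenCL 1.0
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

kernel void somekernel(..., global uint *indexp, global float2 *posp) {

    // do stuff

    if (result > threshold) {
        // atomically reserve the next slot; returns the previous count
        uint index = atom_inc(&indexp[0]);

        // the count keeps growing past the end, so only store if it fits
        if (index < 1024) {
            posp[index] = (float2)(x, y);
        }
    }
}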

Anything that then uses the 'index' count just has to limit it to the maximum (e.g. 1024) and away it goes.

Leave

As of about an hour ago i'm now on leave until next year. Yay. I've been hanging out for it for the last few days and it's been pretty difficult keeping the motivation level up (despite working on some interesting stuff). Yard should keep me busy and i'll probably get into some hacking before long too. Maybe i'll finally play GT5 and 'upgrade' the PS3 firmware. But for the rest of the day it's time for SFA. And maybe a brewski or twoski. I've got one more wort busy in the cellar, and once i've bottled that next week i'll have over 100 longnecks - and even if that isn't enough for the whole summer it should be a good start.

Tagged opencl.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!