About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail >
fosstodon.org/@notzed >

Tags

android (44)
beagle (63)
biographical (104)
blogz (9)
business (1)
code (77)
compilerz (1)
cooking (31)
dez (7)
dusk (31)
esp32 (4)
extensionz (1)
ffts (3)
forth (3)
free software (4)
games (32)
gloat (2)
globalisation (1)
gnu (4)
graphics (16)
gsoc (4)
hacking (459)
haiku (2)
horticulture (10)
house (23)
hsa (6)
humour (7)
imagez (28)
java (231)
java ee (3)
javafx (49)
jjmpeg (81)
junk (3)
kobo (15)
libeze (7)
linux (5)
mediaz (27)
ml (15)
nativez (10)
opencl (120)
os (17)
panamaz (5)
parallella (97)
pdfz (8)
philosophy (26)
picfx (2)
players (1)
playerz (2)
politics (7)
ps3 (12)
puppybits (17)
rants (137)
readerz (8)
rez (1)
socles (36)
termz (3)
videoz (6)
vulkan (3)
wanki (3)
workshop (3)
zcl (4)
zedzone (26)
Wednesday, 27 April 2011, 23:54

Simple padding trick

For a few algorithms I have, I end up with data which is 32 elements wide operated on by 16x16 threads. This ends up with 100% local memory contention if mapped directly to the local memory since every 16th thread aliases to the same bank.

Although this can be addressed with a 16 word padding this is wasteful of precious local memory which might mean the code can't run in parallel to the extent it might otherwise, or simply cannot fit.

A simple trick which still keeps the addressing quite simple is to shift every odd line to the second half of the data which is then offset by 16 words. In effect bit 0 of the y address is shifted to the top of the addressing index, with an offset.

For example, if a kernel is working on a 16x16 region of memory but requires some data either side of the target tile, it might do something like:

   local float ldata[32*16];
   int lx = get_local_id(0);
   int ly = get_local_id(1);

   int i = lx + ly * 32;

   // load data
   ldata[i] = ..read data block 8 to the left ...;
   ldata[i+16] = ..read data 8 to the right...;
   barrier(CLK_LOCAL_MEM_FENCE);

   // work using i+8 as the centre pixel for this thread

By only changing the calculation of i and padding the storage with only 16 words, the contention is easily removed without changing any other code:

   local float ldata[32*16+16];
   ...

   int i = lx + ( ly >> 1 ) * 32 + (32*8+16)*(y & 1);

   ...

Assuming one is only working in the X direction, for Y the addressing is slightly more complex of course. But this could come at no extra run-time cost once the loops are unwound.

Tagged hacking, opencl.
Wednesday, 27 April 2011, 10:05

Transpose Is Your Friend

With graphics programming a lot of algorithms can be split into separate X and Y passes. This generally works particularly well in the X case but in Y you can hit issues with memory (or processor) locality which can have a big impact on the algorithm.

But the GPU texture cache is block oriented rather than line oriented so both X and Y oriented algorithms can be implemented equally (in)efficiently if you store data in images. However, once in local memory you're effectively back to line-oriented access ... (if you want to preserve your sanity whilst working out the memory addressing to efficiently access the banked memory).

The trick is just to transpose the data on read and write, and always work in the X direction locally. It also means the X and Y working code is often identical. This can be done just within the local work-group, but for 2D workgroups one has the added complication that work units are allocated in row-major order, i.e. in X first.

The simple solution is just to transpose the global X and Y work-size as well, and simply swap the result of get_global_id(0) and get_global_id(1) when reading or writing the images.

Tagged opencl.
Tuesday, 26 April 2011, 00:56

Parallel Prefix Sum

14/9/11: added a further paragraph on additional thoughts

Since coming across the parallel prefix sum a couple of weeks ago, a lot of things I need to solve seem to fall into the class of problems it is suited for within OpenCL on GPU platforms. However after a lot of trial and error and experimentation i've found it is usually just slower - sometimes by quite a margin.

In short, it takes advantage of the very high speed local memory ('LS') and parallelism to compute a commutative result from every element to every previous element in log2(n/2) steps.

But with GPU's there are a couple of problems with it:

  1. Even in the ideal case many of the threads are computing redundant data or not operating (depending how one chooses to implement it).
  2. A synchronisation step is required after every single operation - which is usually something trivially simple.

The first leads to an over-commitment of threading resources which impacts the scalability as the overall job size increases. And the second leads to very inefficient scheduling even on simple tasks, and a much heavier 'inner loop'.

For example, I implemented a 5x5 maximum operation (for non-maximum suppression peak detection) using a separate X and Y operation (I realise a 5-tap test doesn't really exercise the log2(N) nature of the algorithm much, but more on that later).

My first implementation uses a 16x16 workgroup size (after much experimentation this seems to be the generally best workgroup size for operating on images on my hardware - it leads to an occupancy of 1 and seems to be a good fit for the texture cache configuration). Each local workgroup reads a 16x16 area into LS and then 16 threads work together on each row of result. It only does a couple of 'prefix sum' steps because I only need the result from 4 samples, and I do the last one manually. I use the trick of offsetting the starting point so no thread requires any conditional execution. Finally, it only produces 12 valid results for the 16 inputs since you need overlap.

Figure 1: Steps taken for parallel maximum calculation. Only the workings of 4 of the 16 threads are shown.

Because it only generates 12 results it needs to be run 16/12 times the width of the image. This runs in about 65uS on the test data set.

Then I tried a version which reads 2x 16x16 blocks into memory so it can produce all 16 results in one go - unfortunately i've lost the timings and I can't be bothered to re-run it, but i'm fairly confident it wasn't terribly impressive.

Finally I implemented a very simple version which just reads in 2 16x16 blocks into local memory, and then does the operation on the 2 pixels before and 2 pixels after the current location (i.e. an unrolled loop). This was somewhat quicker - 48uS, or about 25% faster.

I didn't bother trying it for the parallel sum case, but I also tried larger window sizes for the simple version - and even at 9 it is still 20% faster than the 5X case for the parallel sum version. And this is for the single channel case - for a 4 channel image you have a 4x LS load, which is not required when it is calculated in registers.

Intuition would tell you that increasing the data-size will eventually lead to a case where it out-performs the simple cases. But the wider the data being calculated the more threads you require and this reduces the opportunity for hiding latencies by letting the GPU schedule independent workgroups. The local store can also be a factor since it too can limit how wide you can go.

I also applied it to (larger) problems where you're only interested in the final result. Because branching is expensive it seems on paper that it doesn't matter if you generate many redundant results since the overall number of steps is much lower - e.g. a 16x16 summation only takes 7 steps rather than 256. Although in reality you break it up into 16 strips 1xwide so it's only 32 steps (16 lots of 16 plus 1 of 16). And it only takes 16 threads rather than 256, so you can execute 16x as many at once for a given number of threads. And you don't need any local store.

I found in all cases it was (sometimes much) faster to split it into 16x1 loops which operate on 16 data items, and then have a single thread complete the partial sums.

And finally the one case where it seemed to have traction - calculating an integral image where every pixel has it's value added to every pixel to the right/below it - did seem faster than another implementation I had. But that initial implementation was before I had discovered other performance improvements so I suspect I could probably do better if i had another go. To satisfy my curiosity I just tried implementing part of it using a looping implementation and with little effort managed to beat or at least equal the prefix-sum version. Incidentally both require splitting the problem into smaller parts and then a final step to 'fix' the integral image - for the parallel prefix sum version you run out of local store or threads, and in both cases you need the parallelism to help improve the GPU efficiency.

Further Thoughts 14/9/11

Since writing this a lot more water has flowed under the bridge and I have a few more thoughts to add.

Having a smaller rather than larger work-size is important as I alluded to above: but larger problems can be made smaller by storing intermediate values in registers and then only sharing the work to reduce a smaller-multiple of the dataset. e.g. storing 4 registers locally allows 4x as much data to be 'processed' using the same amount of shared-work (and shared memory too) - which is the expensive stuff.

Since I was sticking to spec I have never tried removing the barriers and relying on the hardware's behaviour. So I don't know how much difference this makes: the technique in the paragraph above is even more useful then, if you can reduce the problem to the 64 elements required to benefit from the hardware characteristics.

The Integral Image code in socles uses these techniques, and in this case the parallel prefix sum was a (small) win. And IMHO is a fairly tight bit of code.

Tagged hacking, opencl.
Tuesday, 26 April 2011, 00:35

Hot Sauce #0 - Incendiary Tomato

Update I finally gave it a name and labelled the bottles - "Incendiary Tomato" sounded like a good name. I've also downed 1/4 of a bottle of this stuff with cheese and crackers or on my dinner, and it's really bloody nice!

I have a chilli or two from the garden and have been making a few hot(ish) sauces - tomato sauce with a few chillies thrown in to give it a pleasant kick, a hot sweet chilli and ginger sauce, and an apple, ginger, and chilli chuntney with much more of a kick. The latter is pretty nice, I took a recipe that asked for 500g of capsicum and 6 chillies and just used 500g of chillies. I think it's fairly medium-warm on the heat scale but a mate can't stomach it. I also tried a green fermented sauce (unfortunately `killed' that with too much vinegar) and have some red fermented chillies i'm not sure what to do with yet.

Habanero Chillies in Blue Bowl

The habaneros are hitting their fruiting stride at the moment so I thought i'd try a hand at something with a bit more bite and less sweetness. I also wanted to avoid the vinegar flavour which had overpowered my last effort so I based the acid on lime and citric acid. I found a recipe that looked a bit dogey - based on a tin of tomatoes - but I had some tomatoes i had to use so I started with that and then spiced it up beyond recognition. For once I recorded everything I put into it, so i thought i'd share ... I pretty much made it up as I went and mixed a few ideas I really wanted to try separately but once it was in the pot there was no going back.

Ingredients

700g Roasted whole tomatoes.
12 Ripe Habanero chillies.
1 Lime, juice and zest.
1 tsp Citric acid.
1 tsp White pepper (see below).
1 tsp Black pepper.
8 Cloves
1/2 tsp All spice.
1/4 tsp Ground mustard.
1/2 tsp Ground ginger.
1 tsp Salt
1 tbs Palm Sugar.
1 tbs Sugar.

Method

Cut the chillies into small pieces. Pound the whole dry spices in a mortar and pestle. Break up the palm sugar (I only did 1 tablespoon because i was too lazy to do another and used plain sugar when I needed more sweetness).

Put everything in a pot and simmer for about an hour - until the chillies are soft.

Use a (stick) blender to puree everything together. Don't splash your eyes.

Bottle in sterlised jars whilst hot.

Results

I'd probably rate it about a '7/10' for heat, where tobasco sauce is 5 (although it's been some time since I had any). Although the heat lingers much longer, and builds up the more you have. A teaspoon would be enough for a nicely burning steak.

I'll have to let it sit in the bottle a while to finalise the flavours but for now I think it has a bit too much pepper which over-powers the chillies; although it's a bit hard to tell since the bite over-powers the flavour on the initial taste. Possibly more mustard, ginger, and all-spice would work too, even lime juice. Maybe a little vinegar wouldn't hurt? I think the sugar level is about right (for my palette), as is the saltiness - both of which are required to bring out the flavours. It's basically a very strong, extremely hot tomato sauce.

I don't know how it'll keep yet - the acid in the tomatoes, lime and added citric acid should hopefully be enough since I wont be finishing the litre or so I made too quickly and this is probably one sauce I can't share with friends.

And i've got more chillies and ideas to try so it might be competing for condiment time.

Tagged cooking.
Saturday, 16 April 2011, 12:11

Apple Pie

Back to regular programming after the last rant ...

I ended up with some ageing/damaged fruit and after sitting there degrading for a week I finally pulled a finger out and made my first apple pie with the recoverable fruit. I'm not a big dessert person - I can't really remember having apple pie since I was a kid (although i'm sure I've had some here and there) - and it was never like this.

It worked out pretty well in the end; it tastes at least as good as it looks and quite possibly better. I didn't have any lemon so used citric acid with some water to keep the sliced apples from going brown - and I think that gave it a bit of extra tang. I threw in a few more spices too for good measure (vanilla, extra whole cloves, more cinnamon). Even the crust on the bottom was a bit crunchy - I tried a tip I saw on the Hairy Biker's a coupe of days ago, sprinkling the base with semolina to absorb some extra moisture. Although having never done it before I don't know if it made any difference. I used this recipe - although I had a hell of a time with the pastry. I just can't seem to get the hang of working with short pastry although it usually turns out ok.

I've been pretty busy hacking some OpenCL stuff of late - so busy i've barely left a keyboard. More on that soon I imagine, once I get a bit of sleep and remember it is just what i've done - and if any of it is worth sharing.

Tagged cooking.
Friday, 01 April 2011, 04:49

Julia Gillard: Offensive Idiot

So one wonders what exactly prompted our Prime Minister to call The Greens extremists the other day, and then follow that up with the following gem yesterday:

"The Greens will never embrace Labor's delight at sharing the values of everyday Australians, in our cities, suburbs, towns and bush, who day after day, do the right thing, leading purposeful and dignified lives, driven by love of family and nation."

This is the sort of offensive wedge politics that had fuckwit little johnny rotten howard in power for a decade. Some minda of a minder has convinced her to pander to the bogan set obviously.

"The differences between Labor and the Greens take many forms but at the bottom of it are two vital ones. The Greens wrongly reject the moral imperative to a strong economy.

In this case, clearly "strong economy" is just a euphamism for "pandering to multinational and big-money interests through free trade, globalisation, and pro-business tax policies", and so on.

Not to mention fair go for all, mutual respect, and all that. Who's really being ``un-australian'' here? After-all, she's the one locking babies up behind razor wire in remote desert camps.

I'd suggest that if The Green's don't delight in sharing the values of 'everyday' Australians then it is probably the values of the particular group of 'everyday' Australians she's referring to that need to be questioned. Clearly at least 11% of the country thinks The Green's are ok enough to vote for which is a pretty big group to offend so directly (add the fact that these 11% are keeping her in power and it is clearly an idiotic thing to say).

Tagged rants.
Tuesday, 29 March 2011, 07:28

Social interaction on the level of the checkout

I was just thinking the other day about just what it is with facebook and twatter that most annoys me.

I think its because for the most part (particularly for facebook), the interaction never really gets beyond the level of discourse during a visit to the checkout. A hollow greeting and perhaps a comment on some immediate concern such as the weather.

That this is amongst people who ostensibly know each other on some personal level - a so-called 'friend' - is quite sad.

World in chaos

Of course, there have been an awful lot of other things occupying my mind of late - the disasters around the Pacific Ring Of Fire, North Africa, the collapse of western economies and so forth. After a while it turns into a sort of 'disaster porn' that you need a break from though. Having empathy for people's plight is one thing but it can become overwhelming and for nobody's benefit.

I'm pretty much convinced that people's plight is just misery anyway. The strong take what they want and the rest suffer. Seems to be the winning evolutionary strategy since we started agriculture, since before that the social groups were too close and small for cheating to win in the long term.

Back to work

After a nice break and some good weather i'm back to work again too. Going is a little slow but i'm making progress and getting back into the groove of things. Must start doing some exercise though - after xmas I gave up on the health kick and have been on a downward slide ever since.

Tagged rants.
Monday, 14 February 2011, 22:59

A house full of stuff

Stuff stuff everywhere - it's amazing how much stuff one accumulates over the years, even for someone like me who detests shopping.

After being a bit of a shut-in for a few weeks just doing nothing or poking away at the back yard i've been trundling about on the deadly treadly trying to procure a few more items.

First I was after a West Indian Lime (although I have since discovered that the 'Sublime' my sister in law gave me is basically the same). Unfortunately the closest nursery had none. Neither did the next closest (it was stupidly expensive anyway). So a bit of a hike to the next one - Bunnings at Mile End. Gigantic shop. Gigantic nursery. Not a single fruiting lime tree in their stock. Plenty of grapefruit. I mean who buys grapefruit any more? And fucking greeters and grumpy old ladies as checkout chicks. At least the greeters are hidden behind a 'desk' now, and not assaulting you as you enter the shop. I try to avoid Bunnings but I thought with their super-sized-shop they might at least have some. But once again they've demonstrated they have quantity of quality or range - I found another local hardware store anyway so hopefully that's the last time I ever have to go there. So the last place I could think of was in Beulah Park on the other side of the city across the road from where I used to live. Fortunately they had plenty of lime trees - and hot checkout chicks too. And their prices were even a fair bit cheaper. I bought a couple of tall ones (one for my sister) and rode home with two tree-flags flapping in the draft. Pannier bags are awesome. Now I have to work out where to put it.

That, and making some tomato sauce pretty much did the whole day in.

So the next day I set out to get another PlayTV (for a mythtv instance - only because I know for certain it works well with the computer, the TinyTwin TV tuner I am using currently is a bit crappy) and a chest freezer. I first went to Radio Rentals because it's close. But my patience was already tried somewhat - I got pissed off with the lack of freezers (they had 2 on display), the lack of staff in the giant shop - and the bizarre one-way doors. You go in the front and have exit out the back. I jogged through the isles to get out just to make the point how silly it was.

Next stop was Hardly Normals, or so I thought. I don't like shopping there (or shopping in general) but it was worth a comparison since it wasn't too far away. Unfortunately I had forgotten they closed the one I was headed to and moved it another 5km up the road to a disgustingly giant strip mall. Well to put it bluntly - fuck that, how do they expect people to shop there if you have to travel 10km each way just to get looked at funny and treated poorly? On the way i'd dropped by tricky dickies to see the price on the PlayTV but didn't get one thinking I could try to 'deal' with Hardly Normals to get both, but the lack of shop put a dent in that idea and then I was so pissed off I forgot about it.

I decided to head to the other side of town instead since I know of a few shops that way - but since I was going past home I stopped to look up other shops on the internets and to cut a long story short decided to really 'fuck that', and just bought it online. Last time I bought something just up the road they wanted $50 or more for delivery anyway, with no indication of when it might arrive other than the day (i'm often home but i don't like being tied home unnecessarily). I ordered this from Sydney, $130 below retail, $40 delivery and within 5 hours someone had called to say it would arrive between 11 and 2 the next day (obviously it isn't coming from Sydney). Assuming it arrives that was a much easier experience.

On the plus side I discovered an independent hardware store on the way home - and one somewhat closer too. That'll give me another option when I want to avoid Bunnings. Although the manager warned me not to leave my bike outside for the thieves which frequent the area - unfortunately I missed the opportunity to suggest a bike rail which are sorely lacking in almost all of Prospect (so much for their 'sustainable vision') - there isn't even any parking control signs to lock your bike to ...

So on Sunday I also made up a batch of Tomato sauce. I'm fairly happy the way it turned out although I probably should've reduced it a little more as it is a touch runny. But in the end I simply lost patience after cooking and simmering for more than 6 hours, and it should do. I couldn't find the recipe I used last time and since I had a bottle of Ezy-Sauce I'd bought more than 5 years ago I thought it was about time I used that. There's a recipe on taste.com.au as 'Grandma's Tomato Sauce' but it's really just the same as the one on the bottle so I followed that instead. Unfortunately I forgot to shake the bottle before opening it so I stuffed it up a bit, but I made it up with a few more crushed cloves, black pepper and a huge pile of chillies. It's got a nice little kick to it although perhaps a bit sweet for my taste. I don't even use it much myself but it is absolutely scrumptious on a bit of burnt snag in a fresh slice of bread and perfect for hot dogs - so the 4.5L I made should do me for a while.

I'm also a couple of weeks into making some fermented 'tobasco' sauce. I have some 'Cayesan' chillies which have gone ballistic and the chillies were so heavy one of the branches broke off. I didn't even pick all of them off (I managed to save the branch by tying it up) and ended up with about 750g of green chillies. Ideally i'd use red ones but since I had a whole lot of fresh chillies at once I thought i'd see how a green version of the fermented sauce works out. Still a few weeks left on that. Now the Cayesan's are starting to ripen I'm getting a few red ones each day and waiting till I get enough to make something with.

My habanero plants are starting to produce fruit at last too. They taste fantastic as always - I put a couple of green ones in the tomato sauce, I love the almost apple-like sweetness followed by the searing heat. I only just finished the last of the big crop I had 2 years ago and I finally have some in the ground rather than just in pots so I'm hoping for a crap-load again. Although i've had the occasional problems with pests on the ones in pots in general the pests have been under control better this year, and that's perhaps because i'm growing a bigger variety of stuff and it's all a bit healthier with all the rain we've had.

I've finally levelled off the lawn at the back and well on the way to having it full of lush grass. After 3 odd years of having piles of crap, dirt, and general mess it's a relief to have it almost done. It's a pity I don't have a bit more room for horticulture but I think I have the balance about right given the layout of the yard. There's always pots I guess - but I have heaps of those already and they take a lot more work and the plants usually don't grow as well.

Oh, I got my BeagleBoard polo shirt and Beagleboard-XM yesterday evening, at long last - a bonus from working on GSOC 2010. That took a while to arrive. I haven't got it working yet because it uses a different gender for serial cable and I don't have the right PSU and all my cables are in a box behind other boxes, but hopefully i'll get it working soon. It looks like a tidy little unit anyway. I've barely been touching the computer for the last few weeks other than to read a few blogs and keep up with the news (there's been a lot of it lately), but hey, it's summer so that stuff can wait for the long rainy days of winter.

Tagged biographical, horticulture, house, rants.
Newer Posts | Older Posts
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!