About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail
fosstodon.org/@notzed

Thursday, 24 June 2010, 22:01

Beanz Meanz ...

I was going to make some refried beans. So I soaked some red beans overnight, and then put them in a pot to simmer. And simmer ... and simmer.

After about 8 hours I ended up with this brown stuff. I found out later that I should've also added some salt and onions and garlic to add some more flavour, but I missed out this time.

I realised I didn't have enough onions to do the refried beans, so instead I made a variation on a Mexican breakfast dish "huevos tirados" as suggested by an old Mexican workmate Frederico.

Onion, bacon, eggs, chillies of course, together with the cooked beans, salt and pepper, all fried up with some oil. Topped with some fresh mint and basil collected from the garden. Even with the relatively bland unadulterated beans as a base it turned out quite tasty.

Frederico also suggested a pressure cooker and I might try that next time if I get hold of one. They're not terribly common here so tend to be a bit expensive, but mum said she might have an unused one in the shed.

I ended up cooking way too much of the beans and subsequently had beans for a few meals in a row, which was probably a bit of a mistake in hindsight. Emphasis on the hind bit.

Tagged cooking.
Thursday, 24 June 2010, 21:47

Sourdough 0.2

I have this never-ending battle to try to bake decent bread. Results are very inconsistent and mostly quite poor, but one persists. The latest chapter has been sourdough, using naturally grown yeast.

I think my first attempt was a bit premature as the starter hadn't settled down properly yet and I only used plain wholemeal baking flour. For this second attempt I had a much more active starter, but hit some cold weather which adversely affected the final rising. I got quite an active proofing although it took all night to run, and after making the dough it rose fairly well although it took all day as well. But I think I made a mistake in punching down the dough and expecting it to rise again - after a very long time it barely went anywhere and I was starting to worry the whole lot would go off so I just baked it.

It might look the part, but it's about 1/2 of the size it should be. You could build roads with this.

I had trouble getting the right moisture level using the wholegrain flour too which seems a perennial problem when I can't remember how it was last time I did it.

Tagged cooking.
Thursday, 24 June 2010, 21:40

Cheese & Rosemary Biscuits

I did some midnight baking a couple of days ago, and tried making cheese and rosemary biscuits this time.

It's mostly butter and cheese. It was very greasy. I added more flour than the recipe called for too, although since I used cheddar rather than parmesan it probably had a lot more grease to absorb. It's not that I didn't have parmesan handy, but it was easier grating the cheddar.

I don't think my oven is at the correct temperature. The recipe called for 6-8 minutes but after 10 they were still not quite done. I cooked them a bit more to brown them up.

Once they cooled down they were quite tasty, the rosemary sharpens the cheese taste quite well. I added some chillies too for a little bite in the after-taste.

They were a little crumbly and fragile so not really a great biscuit, although they do 'melt in your mouth'.

Tagged cooking.
Thursday, 24 June 2010, 21:33

Cheese & Chive Biscuits

I tried making some cheese and chive biscuits.

It looks like I added too many garlic chives. And the food processor over-processed them.

And the garlic was a bit overpowering. The recipe called for oil and not butter, which seems wrong.

They look a lot better than they tasted. Very tough and too heavy, with a strange flavour. It was almost like a thick tough party-pie base. Not cheesy enough either, despite using plenty of parmesan.

Tagged cooking.
Thursday, 24 June 2010, 14:40

CLGLTexture2d

Hmm, a bit of a slow and frustrating day today, although it felt like I did something by the end of it. I was up till nearly 4 but couldn't last it out to watch the Australian soccer game - pity, it sounds like it was a decent match. Just as well, as some tree loppers came cutting trees around 10am and woke me up with the chainsaw and mulching machine. The fence and wall where my bedroom is seem to act like a wave guide for audible sound waves, so anything on the street immediately in front of the house is loud and clear through the closed window.

I was trying to play with using image2d_t objects in the OpenCL code to see how they work. I didn't get very far - there's not a lot of documentation and the whole API is still a little too new. I eventually found one simple but show-stopping bug, but there was also other puzzling behaviour such as not being able to use mono-channel textures. I think I filed the first bug for JOCL on the jogamp bugzilla with that patch, which implies either that it's quite stable or that nobody else is using it. I haven't had much luck building JOCL so I might have to wait for a new build to include the patch I sent before progressing along that track.

I guess I'll leave that stuff for now and just try to get the basic algorithm working. I'll be happier once I can remember the bits of API I need without having to look up every single change, so I don't have to spend so much time mucking about with the setup.

Tagged hacking, opencl.
Thursday, 24 June 2010, 03:10

This call may be monitored ...

For once, when I got an unsolicited call from someone trying to sell me something, I said that I didn't want the call monitored.

And they said well we can't continue as it has to be monitored ...

Well fuck 'em then, not like I was going to change my phone account anyway as I know I'm on the cheapest one and I never make calls. Pity everyone doesn't do it: it's hardly a way to sell something if they can't talk to you, so they'd quickly change their tune.

Telstra btw ... actually they shouldn't even be calling me, last time they asked me (a very long time ago) I said I didn't want to know about 'special offers' anymore - probably just trying to drum up more business with the NBN deal.

Update: I'm not entirely sure of this, but it seems from some whirlpool discussions that any call involving a potential financial transaction must be recorded for legislative requirements. Still, they should not have been calling me anyway, and as soon as Internode offer a way to switch from ADSL2+ to Naked ADSL2+ in a way in which I won't potentially lose an ADSL capable phone line (crowded copper issues), then the big T can get the big F..

Tagged rants.
Wednesday, 23 June 2010, 11:22

convert_uint4

Got the main algorithm I'm working on converted to run on the GPU today - it doesn't seem to be producing the correct results yet, but it should be doing the right steps. Speed is a bit disappointing so far too. But on Linux you're flying pretty blind as to what is going on on the card so I don't know if it's merely an uncoalesced memory access problem, not mapping well to the multiprocessors, or something else. I guess I should work out the FLOPS and see if I'm just pushing any limits.

Because local memory is so limited I'm storing data packed in bytes (I need to re-read the same section of data many times), but the unpacking into vectors is pretty inefficient. Even the built-in functions provided aren't as efficient as they could be:

  uchar4 *src;
  uint4 work = convert_uint4(src[0]);

Compiles into more code than:

  uchar4 *src;
  uchar4 v = src[0];
  uint4 work = ((uint4) { v.x, v.y, v.z, v.w }) & ((uint4)0xff);

Which surprisingly compiles into more code than:

  uint *src;
  uint v = src[0];
  uint4 work = (uint4) { v>>24, (v<<8)>>24, (v<<16)>>24, (v<<24)>>24 };

The latter would be pretty nasty with SIMD (without an unpack or shuffle at least), but with at least the AMD GPU it is only a half dozen shifts and ands.
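As a rough illustration of the shift-based unpacking, here is a minimal sketch in plain C rather than OpenCL C. The `u32x4` struct and both helper names are hypothetical stand-ins for this example, not part of any OpenCL API. It also shows one thing worth checking: the shift-only form puts the highest byte of the word in `x`, whereas on a little-endian device the first byte in memory is the lowest one, so the two orderings are reversed.

```c
#include <stdint.h>

/* Hypothetical stand-in for an OpenCL uint4; not the real vector type. */
typedef struct { uint32_t x, y, z, w; } u32x4;

/* Unpack in memory order, assuming little-endian: x is the lowest byte. */
static u32x4 unpack_le(uint32_t v)
{
    u32x4 r = { v & 0xff, (v >> 8) & 0xff, (v >> 16) & 0xff, v >> 24 };
    return r;
}

/* Shift-only form as in the post: x ends up holding the highest byte. */
static u32x4 unpack_shift(uint32_t v)
{
    u32x4 r = { v >> 24, (v << 8) >> 24, (v << 16) >> 24, (v << 24) >> 24 };
    return r;
}
```

For the word `0x44332211`, `unpack_le` yields `11 22 33 44` in `x..w` and `unpack_shift` yields `44 33 22 11`, so which one matches `convert_uint4` on a `uchar4` depends on the byte order of the device.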

Changing a single convert_float4 to a variation of the last one above gave me a 20% speedup ... which probably means I'm just doing it the wrong way to start with ... If I had more local memory I'd convert to a more convenient working size as I loaded it, but alas I don't have enough for this algorithm.

Next I'm going to look at using image objects to access the memory instead. This gives automatic format conversion and caching, and removing the need to manually cache stuff in local memory allows for more flexibility in parallelisation. Unfortunately JOGL doesn't seem to expose the basic image routines from OpenCL without using OpenGL as well, which I was hoping to put off for a bit.

I guess I should really try and get a version that at least gives correct results too.

Tagged hacking, opencl.
Tuesday, 22 June 2010, 08:46

vloadn

Well I finally have a lowly OpenCL capable card to play with. Just an ATI HD 5770, mostly because of the power budget. Unfortunately that was still enough to turn an almost silent machine into a loud fan, and there's another card in there which almost completely covers the fan inlet on the graphics card, so it might have to be removed.

I have been using a simple bit of code to play around with to learn how the device accesses memory and processes jobs and discovered a few things along the way. The code does a simple debayering, converting 5 channel data into 5 separate data planes.

vloadn seems to be something to avoid. It appears to treat the input as unaligned, even though the documentation implies it should be aligned. Perhaps I need to use an aligned directive on the parameters too ... but an easier solution seems to be just to change the source datatype to a vector type and just index it as an array.

Just by changing the code to use a vector array type versus a vload I got a 24x speedup(!).

Some other less obvious results ... If I remove the output of one or two of the channels, the code runs nearly 50% slower. Running on the CPU (quad phenom II something or other) using the same OpenCL code via AMD's CPU backend is about 8x slower than the GPU. I wonder what hand tuned SIMD code could manage ... given they have comparable power profiles.

Splitting the job into multiple parts effectively - both `horizontally' to allow coalesced memory access, and `vertically' to allow greater concurrency - seems to be a bit of an art, and no-doubt very architecture dependent. And unfortunately critical to getting better output.

Horizontally I assign a work-item to each column of 8 output bytes, for 16 bytes of input, across 128 work items. I think that should be an optimal memory access pattern.
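The horizontal mapping described above can be sketched as simple index arithmetic. This is plain C, not the kernel itself. The three constants come from the post, while the helper names and the flat, contiguous row layout are assumptions made for illustration.

```c
#include <stddef.h>

enum {
    WORK_ITEMS = 128,  /* work-items across one row, from the post */
    IN_BYTES   = 16,   /* input bytes read per work-item */
    OUT_BYTES  = 8     /* output bytes written per work-item */
};

/* Byte offset of the first input byte that work-item `gid` reads in a row. */
static size_t in_offset(size_t gid)  { return gid * IN_BYTES; }

/* Byte offset of the first output byte that work-item `gid` writes. */
static size_t out_offset(size_t gid) { return gid * OUT_BYTES; }

/* Total bytes covered by one row of work-items. */
static size_t row_in_bytes(void)  { return (size_t)WORK_ITEMS * IN_BYTES; }
static size_t row_out_bytes(void) { return (size_t)WORK_ITEMS * OUT_BYTES; }
```

Under these assumptions adjacent work-items touch adjacent 16-byte input blocks, and one row of 128 work-items covers 2048 input bytes and 1024 output bytes.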

Vertically I'm sticking to powers of 2 since the algorithm needs it, but splitting the local work-groups into sets of the number of compute units (i.e. 10 on this card) seems to work better. But I'm not really clear how the global/local work dimensions really map to the hardware once you get beyond the trivial case of single jobs.

I'm not sure if I'm interpreting the disassembly correctly, but the processor appears to be more VLIW than SIMD. Each of the 5 channels seems to execute independent instructions on independent data. I guess this should allow it to execute scalar code better, but it must come at a pretty significant cost to die space and power. I wonder if this is also why they still clock relatively slowly versus something like the CELL.

My final code equates to about 12000 decodes per second of a 1024x768 frame, which is more like what I was expecting - my first cut was doing about 400 which was obviously way out. I'm not sure if using image accessors rather than arrays would be a win either - it's a bit fiddly to fit it in with this code. It might be though, since I think you get format conversion 'for free' rather than requiring a bunch of shifts and fart arsing about.
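To put those frame rates in terms of raw throughput, a small helper (the name is hypothetical) multiplies the frame size by the decode rate. It counts output pixels only and says nothing about bytes moved, since the sample sizes aren't stated here.

```c
#include <stdint.h>

/* Pixels per second for a given frame size and decode rate. */
static uint64_t pixels_per_second(uint64_t width, uint64_t height, uint64_t fps)
{
    return width * height * fps;
}
```

With the figures above, 1024x768 at 12000 frames per second is 9,437,184,000 pixels per second, against 314,572,800 for the first cut at 400 frames per second, a factor of 30.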

Tagged hacking, opencl.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!