About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail
fosstodon.org/@notzed

Wednesday, 23 June 2010, 11:22

convert_uint4

Got the main algorithm I'm working on converted to run on the GPU today - it doesn't seem to be producing the correct results yet, but it should be doing the right steps. Speed is a bit disappointing so far too. But on Linux you're flying pretty blind as to what is going on on the card, so I don't know if it's merely an uncoalesced memory access problem, not mapping well to the multiprocessors, or something else. I guess I should work out the FLOPS and see if I'm just pushing any limits.

Because local memory is so limited I'm storing data packed in bytes (I need to re-read the same section of data many times), but the unpacking into vectors is pretty inefficient. Even the built-in functions provided aren't as efficient as they could be:

  uchar4 *src;
  uint4 work = convert_uint4(src[0]);

Compiles into more code than:

  uchar4 *src;
  uchar4 v = src[0];
  uint4 work = ((uint4)(v.x, v.y, v.z, v.w)) & ((uint4)0xff);

Which surprisingly compiles into more code than:

  uint *src;
  uint v = src[0];
  // v is a scalar here, so each byte is shifted out of it directly (most significant byte first)
  uint4 work = (uint4)(v>>24, (v<<8)>>24, (v<<16)>>24, (v<<24)>>24);

The latter would be pretty nasty with SIMD (without an unpack or shuffle at least), but with at least the AMD GPU it is only a half dozen shifts and ANDs.
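As a rough host-side check of the unpacking variants above, here is a plain C sketch (not OpenCL C, and not the code from the post). The `uint4_t` struct and all function names are made up for illustration. It assumes a little-endian device, so that the `x` component of a `uchar4` is the lowest byte of the same 32-bit word. Under that assumption the shift-only form yields the bytes most significant first, i.e. in the reverse of the `uchar4` component order, which may or may not matter depending on how the result is used.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OpenCL's uint4; plain C has no built-in vector types. */
typedef struct { uint32_t x, y, z, w; } uint4_t;

/* Load four bytes as one 32-bit word, the way a little-endian device
   would (the endianness is an assumption, not stated in the post). */
uint32_t load_word_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Unpack in memory order: x is the byte at the lowest address. */
uint4_t unpack_memory_order(uint32_t v)
{
    uint4_t r = { v & 0xffu, (v >> 8) & 0xffu, (v >> 16) & 0xffu, v >> 24 };
    return r;
}

/* The shift-only form: shift left to drop the higher bytes, then shift
   right by 24 to bring the wanted byte down. Most significant byte first. */
uint4_t unpack_shifts(uint32_t v)
{
    uint4_t r = { v >> 24, (v << 8) >> 24, (v << 16) >> 24, (v << 24) >> 24 };
    return r;
}

/* Usage: both forms recover the same four bytes, in opposite orders. */
void unpack_demo(void)
{
    const uint8_t bytes[4] = { 0x11, 0x22, 0x33, 0x44 };
    uint32_t v = load_word_le(bytes);
    uint4_t a = unpack_memory_order(v);
    uint4_t b = unpack_shifts(v);

    assert(a.x == b.w && a.y == b.z && a.z == b.y && a.w == b.x);
}
```

If the component order matters, swapping the shift amounts (or the component names on use) would line the two forms up.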

Changing a single convert_float4 to a variation of the last one above gave me a 20% speedup ... which probably means i'm just doing it the wrong way to start with ... If I had more local memory I'd convert to a more convenient working size as I loaded it, but alas I don't have enough for this algorithm.

Next i'm going to look at using image objects to access the memory instead. This gives automatic format conversion and caching, and removing the need to manually cache stuff in local memory allows for more flexibility in parallelisation. Unfortunately JOGL doesn't seem to expose the basic image routines from OpenCL without using OpenGL as well which I was hoping to put off for a bit.

I guess I should really try and get a version that at least gives correct results too.

Tagged hacking, opencl.
Tuesday, 22 June 2010, 08:46

vloadn

Well I finally have a lowly OpenCL capable card to play with. Just an ATI HD 5770, mostly because of the power budget. Unfortunately that was still enough to turn an almost silent machine into a loud one, and there's another card in there which almost completely covers the fan inlet on the graphics card, so it might have to be removed.

I have been using a simple bit of code to play around with to learn how the device accesses memory and processes jobs and discovered a few things along the way. The code does a simple debayering, converting 5 channel data into 5 separate data planes.

vloadn seems to be something to avoid. It appears to treat the input as unaligned, even though the documentation implies it should be aligned. Perhaps I need to use an aligned directive on the parameters too ... but an easier solution seems to be just to change the source datatype to a vector type and just index it as an array.

Just by changing the code to use a vector array type versus a vload I got a 24x speedup(!).

Some other less obvious results ... If I remove the output of one or two of the channels, the code runs nearly 50% slower. Running on the CPU (quad phenom II something or other) using the same OpenCL code via AMD's CPU backend is about 8x slower than the GPU. I wonder what hand tuned SIMD code could manage ... given they have comparable power profiles.

Splitting the job into multiple parts effectively - both `horizontally' to allow coalesced memory access, and `vertically' to allow greater concurrency - seems to be a bit of an art, and no doubt very architecture dependent. And unfortunately it is critical to getting better output.

Horizontally I assign a work-item to each column of 8 output bytes, for 16 bytes of input, across 128 work items. I think that should be an optimal memory access pattern.
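To make the horizontal split concrete, here is a small plain C sketch of the offset arithmetic it implies. Only the three figures (8 output bytes, 16 input bytes, 128 work-items) come from the post. The function names, the assumption that the work-items sit side by side on one strip of data, and the assumption that each work-item's chunk starts directly after its neighbour's are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the post. */
enum {
    IN_BYTES_PER_ITEM  = 16,  /* input bytes read by one work-item     */
    OUT_BYTES_PER_ITEM = 8,   /* output bytes written by one work-item */
    WORK_ITEMS         = 128  /* work-items placed side by side        */
};

/* Byte offset of the input chunk read by a work-item (assumed layout:
   chunks are contiguous and do not overlap). */
size_t input_offset(size_t item)
{
    assert(item < WORK_ITEMS);
    return item * IN_BYTES_PER_ITEM;
}

/* Byte offset of the output chunk written by a work-item (same assumption). */
size_t output_offset(size_t item)
{
    assert(item < WORK_ITEMS);
    return item * OUT_BYTES_PER_ITEM;
}
```

Under these assumptions neighbouring work-items touch neighbouring 16-byte chunks, which is the pattern that coalescing wants, and the 128 work-items together read 2048 contiguous input bytes and write 1024 output bytes.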

Vertically I'm sticking to powers of 2 since the algorithm needs it, but splitting the local work-groups into sets of the number of compute units (i.e. 10 on this card) seems to work better. But I'm not really clear how the global/local work dimensions really map to the hardware once you get beyond the trivial case of single jobs.

I'm not sure if i'm interpreting the disassembly correctly, but the processor appears to be more VLIW than SIMD. Each of the 5 channels seems to execute independent instructions on independent data. I guess this should allow it to execute scalar code better, but it must come at a pretty significant cost to die space and power. I wonder if this is also why they still clock relatively slowly versus something like the CELL.

My final code equates to about 12000 decodes per second of a 1024x768 frame, which is more like what I was expecting - my first cut was doing about 400 which was obviously way out. I'm not sure if using image accessors rather than arrays would be a win either - it's a bit fiddly to fit it in with this code. It might be though, since I think you get format conversion 'for free' rather than requiring a bunch of shifts and fart arsing about.
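For a sense of scale, the frame rates quoted above can be turned into raw pixel throughput. This is a plain C sketch with made-up function names. It counts pixels only, since the post does not say how many bytes each pixel takes.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels processed per second for a given frame size and frame rate. */
uint64_t pixels_per_second(uint32_t width, uint32_t height, uint32_t frames_per_second)
{
    return (uint64_t)width * height * frames_per_second;
}

/* Ratio between two throughput figures. */
double speedup(uint64_t before, uint64_t after)
{
    assert(before > 0);
    return (double)after / (double)before;
}
```

With the figures from the post, `pixels_per_second(1024, 768, 12000)` comes to 9,437,184,000 pixels per second against 314,572,800 for the first cut at 400 frames per second, a factor of 30 overall.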

Tagged hacking, opencl.
Monday, 21 June 2010, 04:51

On thin and fat clients

Like chicks (and blokes too for that matter) it seems even the thin ones are now fat.

Moving all your apps to a browser is all well and good, but they seem to be getting bigger and slower - and no matter how 'nice' they might be in relation to other 'web apps', they just don't stack up against 'thick clients'.

Case in point: google mail.

It actually makes a pretty shitty mail client, and it only gets worse once you're on a few mailing lists with multiple conversations going on. It's quite difficult - and rather slow - navigating conversations and everything needs a click-and-wait. Not to mention the single 'modal' message editor. And is it just me or has it become a lot slower lately? Maybe it's because i'm using a different machine with higher resolution, but it's taking about a second to do things like delete one message. It took 3 seconds to open the inbox after I was looking at 'all messages'. I've only got 7K messages in total, hardly enough to blink at.

I used to be able to easily read a few hundred mails a day with evolution, and there's no way I could get even close to that using gmail and still get anything else done. So one tends to skim the subjects a lot more rather than scanning the message - and you can often miss important things doing this.

So now I have firefox using more memory than evolution did, handling a tiny fraction of the mail in a less usable manner, more slowly, and using more cpu time, much more network bandwidth, and more power. That's progress?

Scribd

I noticed that 'scribd' has moved from its awful flash app (which meant I never bothered to follow scribd links) to an awful HTML app (which means I will continue not to follow scribd links). Try overriding your fonts and loading a page - weird things happen that make it completely unreadable. Absolute winner for something designed to share documents, wouldn't you think?

Well that's if it'll even let you read anything - I get the following on one of my computers when I go to the front page (or click on the big link it gives me on this page).

Even if it worked it is such a horrid thing anyway - I have a perfectly serviceable (if a little flawed) PDF viewer on my computer, why not just give me a link to a PDF? If you want to share documents, just put a PDF somewhere people can see it, you don't need these horrid online viewers.

Or just use plain HTML without all those shitful scrolling boxes and other useless clingons which make it harder to use.

facebook

Hmm, I got suckered into using facebook again - since none of my 'friends' ever mail me any more there doesn't seem to be any other way to stay in touch or talk to them. Which sucks. The shallowness and emptiness of it all rubs me up the wrong way; like Twatter(tm), every utterance needs to be made in tiny truncated sentence form. Also I'm not really sure I want to know what teenage nieces and nephews are up to in the sort of detail they tend to put online. And I find it deeply disturbing that totally unrelated websites now have dynamic 'facebook social' crap coming up if you were ever logged into facebook with 'keep me logged in'. It doesn't seem to be something you can 'log out' of.

And although I know google tracks everything you do to the same extent, knowing that is one thing; being shown blatantly that a visit to a news story is being linked back to your account is another matter entirely.

Tagged rants.
Monday, 21 June 2010, 04:30

The Wall

Current state of my back yard.

The wall will only be 4 high, the other bricks are there to get them out of the way.

I've finally just about levelled off the main area which will be planted with buffalo grass - the grass partially visible grew in under a season from a few cuttings so it won't take long. The sand left over from under the long-removed pool will top-dress the grass.

Looking down the yard, on the left is a large garden bed - which will be herbs and vegetables. It gets full sun in winter and most of the day in summer.

Behind that - under the Golden Rain Tree - is to be paving, made from the old bricks stacked up next to the house. Unless they end up being too ugly.

Tagged house.
Thursday, 10 June 2010, 13:17

GPGPU & OpenCL

Hmm, after getting a bit of stuff working with the CPU OpenCL yesterday I thought I'd better dig deeper into the GPU side of things. Kinda wish I hadn't. It's ... a bit ... messy. It's kind of like it wants to be a vector processor but can't make up its mind. To get the best performance you have to know a lot more details about the internals (on the specific card in question) than you do on something like the CELL CPU, and it's more difficult to realise it. The minuscule local store could be a real bottleneck. I wish I still had linux on my PS3 to be able to try out some CELL code to compare, but fucking Sony did that in didn't they?

I guess it'll be a challenge.

Tagged opencl.
Wednesday, 09 June 2010, 14:48

Winter.

Damn, winter hit properly all of a sudden this week. Went to the pub on Monday night and spent most of the way riding home shivering a lot, and that was with some new winter gloves and a good riding jacket. Even resorted to wearing long thermal underwear to try to stop my feet freezing during the day, but with only limited success. It's not even really that cold in absolute terms - about 10C, but I seem to be feeling it more of late.

I got a fair chunk of retaining wall done a few weeks ago with the last of the sunshine, but have been very sedentary lately - the weather, the short days, eating oddly, sleeping irregularly, it all adds up. Always seem to feel extra flat around this time of year too and I seem to be even more extra-flat for various reasons. And overly anxious for no good reason.

I started looking at OpenCL this week for work at last. The basics look pretty simple and straightforward, and I've got some little prototype code working already, although it looks somewhat more involved to get the most out of it. The work synchronisation stuff looks pretty rich and relatively flexible and will hopefully be able to match the problem I'm working on. I'm just using the ATI CPU implementation so far but will hopefully get some real hardware soon. I don't think I'm quite as excited about it as I thought I might be, but I guess there are other factors at play.

Been making a lot of noodles and flat breads lately. I've just about got a decent naan-ish bread worked out. Need to use a hotplate or frying pan to cook them - my oven just isn't hot enough (there's a thought, perhaps I could try the grill). Also tried making crumpets which sort of worked - only having wholemeal flour at the time I think that was a bit of a handicap. I think my attempt at making sourdough has spontaneously aborted itself though so I might have to reset on that - it's been too cold to keep it active amongst other errors. I'm not sure all this bread and noodles is really all that great though - well apart from being somewhat fattening while i'm otherwise being such a lazy arse, it seems to aggravate the sleep apnoea. Although that's so fickle it's hard to tell - particularly when you haven't had enough sleep.

I watched a bit of the oil spill cameras there for a while. Interesting certainly, and in many cases didn't really inspire confidence in the equipment they're using or the procedures they were attempting. One day I saw a sub spend an hour picking up a block and tackle from a snag-riddled box only to put it back where it got it from after finally untangling itself from all the snags. And cutting off the final bit of the riser (which I saw live) just created a torrent of hot oil which barely seems to have been throttled by the mounting of the LMRP - the later camera shots were from closer in which made the over-flow appear more controlled. I'm glad my sister is getting out of Florida - although she'd decided to leave before this anyway because she's sick of being paid 'minimum wage' $4.20 an hour plus tips in the only work she can find there. Although I'm not sure what she expects to find back here in oz, it seems to be headed toward the stupid-right as well.

On an unrelated note, I never really understood the service that facebook provides, and didn't particularly like it and do not use it much. But I think i've finally worked out why I dislike it. To cut a long babble short it pretty much just makes me feel lonelier, more alienated from society, and more of a boorish, boring loser than I might otherwise feel. What rubs it in even more is these are people I was at least once friends with; although most were some time ago. But even with those people I have very little in the way of shared interests or anything much even to talk about.

Hmm, this could be a long winter.

Tagged biographical.
Thursday, 20 May 2010, 13:38

Australian Bananas Suck

They do, they really do.

We get to eat tasteless over-firm shitty bananas so some dumb Queensland farmers can protect their third-world crop from foreign competitors. It's just the public who suffers - subsidising their production and having to put up with sub-par produce.

Tagged rants.
Friday, 30 April 2010, 12:34

Shared Libraries n Stuff

Hmm, another long weekend followed by what turned into a long week. Damn I'm tired.

Well I gave up on building XBMC on the beagleboard for now, and I haven't had time to work on setting up a cross compilation environment yet either. Work is getting in the way too much, just too worn out to concentrate enough. I should try this weekend, although I need to get outside a bit too for some exercise and my own sanity.

I spent the rest of the long weekend past trying to grok shared libraries and thinking about how to get newlib into one for WoofƆs. I didn't make a lot of progress. It would be nice just to use something like Amiga libraries which are more like objects, including an instance pointer to all functions, but that just makes it too hard to re-use existing code. Partly because all the code also must be re-entrant and not reference .data or .bss, not to mention the object pointer. Easier to map/load, not so easy to port existing libraries. I just about had a handle on the whole elf-shared-library thing, but I was a bit too tired/distracted so it's still a bit up in the air.

Well at least one nice thing is it only takes 17 seconds(!) to do a complete build of newlib on my workstation, although I fear I will need a few iterations before I work it out. I was trying to enable _REENT_ONLY, but it seems to be an old option which is no longer supported and fixing it all up will be more hassle than it's worth, so I will probably just have to work out elf shared libraries to make any progress there.

It's not that I really need a C library just yet, but I was thinking about how to write the application server and how to hook system startup into application startup, i.e. accessing dynamic memory and so on, and so I was just thinking of killing two birds with one stone. Might have to leave that on the back burner for a bit, and try something a bit more modest. Actually I should probably put it all on the back burner for a while, I have other things that need attention.

I didn't do anything on the yard at all last weekend - a bit hungover Saturday and then the weather turned. It's been a bit cold and crappy most of the week (in relative terms) which is always pretty draining. And the days are so short already ... (particularly when you get up at 11 or 12!)

Sleep. Damn. Eating poorly, sleeping worse. It adds up. My mouth has been a bit sore so I have avoided using the mouth splint every night - but boy do I notice if I don't use it. It's like the `Rain God' truck driver in So Long and Thanks for All the Fish who upon experiencing rain so heavy he imagines it makes no difference whether the wipers are on or off tries turning them off, only to discover it really does make it a lot worse. However to his dismay it doesn't get any better when he turns them on again. That's exactly what it is like with the mouth splint and sleep apnoea.

Well i'm glad I stayed in rather than going to the pub tonight - starting to feel a bit weird, and last weekend was too depressing - my drinking buddy went off to a show and I had a couple more at another pub, and what I can remember got pretty boring by the end.

Tagged biographical, os.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!