About Me
Michael Zucchi
B.E. (Comp. Sys. Eng.)
also known as Zed
to his mates & enemies!
< notzed at gmail >
< fosstodon.org/@notzed >
Gran Turismo 5
I finally got over being angry at Sony and played a bit of Gran Turismo 5 on the weekend (it had nothing to do with the discovery of the broken PS3 security over the same weekend - but yay for that).
In a word: unfinished.
Compared to GT3 - which I consider the pinnacle of the series - there are so many graphical issues, slow loading times, poor models and frustrating game-play mechanics.
- Models
- Some of the models are really bad - almost certainly direct PSP imports. If as reported they spent 6 months on modelling each of the 200 'premium' cars then it was mostly a waste of time - you can barely see much of the interior if you do drive from inside, and there's little reason to anyway. The sounds are pretty weak too.
- 1080p at under 50 fps
- The sharpness and crispness only show up how plain some of the models and race-tracks are. I'd rather have 2x AA at 720p, so at least the frame-rate could keep up and the textures could be improved (it would also allow simpler models). It makes the screenshots look bare, and the moving game less than impressive.
- Screen tearing
- Tearing, tearing everywhere. Trying to do too much AA or too high a resolution is all lost as soon as tearing rips the screen in two. I've definitely seen worse tearing ... but that was in games that weren't Gran Turismo.
- Pop in
- There's just too much pop in. Occasionally a whole road's worth of shadows pops in while you're driving on it too.
- Shadows
- The shadow mask is way too low in resolution, as are the environment maps used for reflections (which also have a low temporal resolution). Again this looks particularly bad at 1080p.
- Volumetric mist/dust
- I think the dust and mist actually looks quite nice - at least you can't see it as a bunch of layered animated sprites. Except at the edges that is - where it looks like total shit. Nice algorithm, simply unfinished or simply unsuitable for such a high screen resolution.
- Slow/inconsistent menus
- Obviously different teams/individuals work on every separate menu system in isolation as they still have the problem where they all look and work differently.
- OK pox
- Way too many 'are you sure' and 'ok?' boxes and so on. Win some money - click ok. Win a car - click ok. Then go to the ticket, click on it, click ok to get it added to your garage. Click ok after it shows it driving up in the dark. Click ok/cancel to 'use it now'. I mean ... really? It's even worse when one considers this is the only game these guys have been working on for well over a decade, and yet the basic navigation mechanics haven't progressed at all - or have gotten worse!
- Loading slog
- This is supposed to be a 'next gen' console, so why does it still have so much loading delay? Loading a track is one thing, but almost every action, from going to the car settings to the options to changing menus, triggers another load. I don't consider a 10GB 'install' a viable way to mitigate this poor design, and reportedly it's hardly greased lightning with it anyway. These sorts of delays had a reason on the PS2 with its tiny RAM, but there's no such reason on the PS3 - only excuses.
I really can't see where all the time went to come up with this result. Although the modelling and engine coding no doubt took a long time, all of these things can be done in parallel, and they would have been playing with the PS3 as early as Evolution Studios anyway (the Motorstorm devs, who are now polishing their 3rd game), so it can't be the coding. There doesn't seem to be any obvious delay on the 'critical path' that could have caused such a long release cycle for such an unfinished game. Apart, that is, from poor management: Sony for letting them get away with it and Polyphony for fucking around too much. I'm not counting the over-detailed modelling here either, I'm talking about the engine and menu systems.
It just looks like Sony demanded that they stop fucking around and release it by xmas 2010, perhaps because the PS4 is coming up, or simply to get it out before the PS3 hits its inevitable decline.
It's only beta quality - a decent beta, but still lacking all the polish we should expect from such a triple-A game. If I were to review it I wouldn't give it more than 60% (a low 'C'/'Pass' grade), given all of the above and the fact that the game itself isn't much more than 'GT4-HD'.
If it was a launch title then there would be good excuses for all of these problems, but it isn't.
Branchy code
This week I was looking at feature detectors, and one of those I was trying is FAST. This is pretty much the definition of branchy code - a function which is a single if statement of 2900 lines.
I didn't think it would be something that would map to OpenCL particularly well, but I was pleasantly surprised.
I simply took the if statement and wrapped it in a kernel which I call for every pixel, and added a simple list append function at the end (more on that below). With a bit of playing with the kernel work size I got it down to about 75μs for a 1024x768 frame at around 1000 output points.
I still haven't done non-maximum suppression or the like, but it certainly lives up to its name - it's damn FAST. I've been playing with SURF and others, and even a partial implementation is pushing 2000μs/frame. FAST seems to be very sensitive to noise and camera focus though, so I'm not sure I can use it - hopefully the non-maximum suppression will help.
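For reference, the huge if statement encodes the FAST segment test as an unrolled decision tree. A minimal plain-C sketch of the test itself is below; it is much slower than the tree since it examines every circle pixel, and it assumes FAST-9 on the usual 16-pixel circle of radius 3 over an 8-bit greyscale image. The names `fast_corner`, `circle_dx` and `circle_dy` are hypothetical. In the OpenCL version something equivalent is the per-pixel body of the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets of the 16 pixels on a radius-3 circle, in order around the ring. */
static const int circle_dx[16] = { 0, 1, 2, 3, 3, 3, 2, 1, 0, -1, -2, -3, -3, -3, -2, -1 };
static const int circle_dy[16] = { -3, -3, -2, -1, 0, 1, 2, 3, 3, 3, 2, 1, 0, -1, -2, -3 };

/* Segment test: returns 1 if at least n contiguous circle pixels are all
   brighter than p + t or all darker than p - t, where p is the centre pixel.
   The caller must keep (x, y) at least 3 pixels away from the image border. */
static int fast_corner(const uint8_t *img, int stride, int x, int y, int t, int n)
{
    assert(n >= 1 && n <= 16);
    int p = img[y * stride + x];
    int run_bright = 0, run_dark = 0;

    /* Walk the ring twice so that runs wrapping past index 15 are counted. */
    for (int i = 0; i < 32; i++) {
        int k = i & 15;
        int v = img[(y + circle_dy[k]) * stride + (x + circle_dx[k])];
        run_bright = (v > p + t) ? run_bright + 1 : 0;
        run_dark = (v < p - t) ? run_dark + 1 : 0;
        if (run_bright >= n || run_dark >= n)
            return 1;
    }
    return 0;
}
```

Running the ring twice is the simplest way to catch arcs that wrap around from the last circle pixel to the first; the generated decision tree avoids all of this by testing the most discriminating pixels first and bailing out early.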
GPU List Append
One problem with GPU coding is that it particularly likes having large, well-defined data-sets to work with, whereas what I needed to do was just generate the points beyond a threshold. In the past I've used a separate post-process which 'reduces' the data, but in those cases the input had already been reduced and wasn't a whole frame's worth.
So I came up with something very simple based on atomics. I don't know whether it's the best solution but it seems to work ok in this case.
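kernel void somekernel(/* ..., */ global uint *indexp, global float2 *posp) {
    // (x, y) is assumed to be this work-item's pixel coordinate
    int x = get_global_id(0);
    int y = get_global_id(1);
    // do stuff: compute 'result' for pixel (x, y)
    if (result > threshold) {
        // reserve the next free slot; indexp[0] must be zeroed before each launch
        // (atomic_inc is core in OpenCL 1.1; OpenCL 1.0 uses atom_inc via cl_khr_global_int32_base_atomics)
        uint index = atomic_inc(&indexp[0]);
        // only store the point if the slot fits in the 1024-entry output buffer
        if (index < 1024) {
            posp[index] = (float2)((float)x, (float)y);
        }
    }
}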
Anything that then uses the 'index' count just has to limit it to the maximum (e.g. min(count, 1024)), since the counter keeps incrementing even after the buffer is full, and away it goes.
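The same pattern can be sketched on the CPU with C11 atomics, which makes the two rules easy to test: zero the counter before each pass, and clamp it when reading. This is a serial emulation under hypothetical names (`point_list`, `list_append`, `list_size`, `detect`); `atomic_fetch_add` stands in for the OpenCL atomic increment and, like it, returns the old value, so the slot reservation would stay correct if the loop were spread across threads.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_POINTS 1024 /* capacity of the output list, as in the kernel */

typedef struct { float x, y; } point2;

typedef struct {
    atomic_uint count;      /* plays the role of indexp[0] */
    point2 pos[MAX_POINTS]; /* plays the role of posp */
} point_list;

/* Reserve a slot with an atomic increment; drop the point if the list is full. */
static void list_append(point_list *l, float x, float y)
{
    unsigned index = atomic_fetch_add(&l->count, 1u); /* returns the previous value */
    if (index < MAX_POINTS) {
        l->pos[index].x = x;
        l->pos[index].y = y;
    }
}

/* The counter keeps growing past the capacity, so readers must clamp it. */
static unsigned list_size(point_list *l)
{
    unsigned n = atomic_load(&l->count);
    return n < MAX_POINTS ? n : MAX_POINTS;
}

/* Serial stand-in for launching the kernel over a w x h response map. */
static unsigned detect(point_list *l, const float *response, int w, int h, float threshold)
{
    assert(w > 0 && h > 0);
    atomic_store(&l->count, 0u); /* the counter must be zeroed before every pass */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            if (response[y * w + x] > threshold)
                list_append(l, (float)x, (float)y);
    return list_size(l);
}
```

Note that with real parallel work-items the order of points in the list is not deterministic; only the serial emulation gives raster order.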
Leave
As of about an hour ago I'm now on leave until next year. Yay. I've been hanging out for it for the last few days and it's been pretty difficult keeping the motivation level up (despite working on some interesting stuff). The yard should keep me busy and I'll probably get into some hacking before long too. Maybe I'll finally play GT5 and 'upgrade' the PS3 firmware. But for the rest of the day it's time for SFA. And maybe a brewski or twoski. I've got one more wort busy in the cellar, and once I've bottled that next week I'll have over 100 longnecks - and even if that isn't enough for the whole summer it should be a good start.
Stuff
ImageZ
When I get some spare time and a bit of inspiration I hack on bits and pieces and work on making this usable for what I need it for. I'm using it now for screenshots - I really like the save requesters and the simplicity of the interface. I added a 'resize image' function which works fine, but I was having trouble getting the scrolled pane to recognise its content had changed size. I have something that mostly works, but I got sick of fighting with it so I've left it for the moment.
I also thought about the tool mechanics - the affine tool doesn't really work as a per-application tool, and I'm thinking of making it, and all of the other tools, per-window instead. I don't have a good feel for whether this is a good, bad, or insignificant move. At the least it might help me clean up some messy input routing - although there are a few hours of menial work in getting that sorted out.
VideoZ
Not much progress here; I'm still thinking about what to do. I set up a webcam input using V4L4J, although I'm not terribly happy with the way it passes data around (it takes 3 copies to get it onto the graphics card). I also started looking into some OpenGL 3.x stuff (no fixed-function pipeline), but my heart wasn't really in it and I didn't get very far - although for the most part it looks pretty straightforward. I'm thinking I will just try to get a couple of video input streams going and work on some simple cross-fading and swipe functions to start with, which will let me play with ideas such as timebase correction, chroma keying and the like.
But I just haven't gone anywhere near sound - if sound on Linux is fucked, it's even more fucked in Java. That will be a hassle.
Leaks
Well, good to see Rudd at last stating the obvious - the leaks are ultimately the yanks' fault for not being able to handle security properly. Although I really just think he's playing the astute politician here - partly as blowback against the US diplomats for being a bit blunt about him, and also, I would suggest, to play up to local feeling and to contrast with the total nonsense from the PM and the AG. But she deserves all she gets in that regard, what an utter idiot. Still, his are the only clear statements by a member of the ruling party suggesting they might have some purpose in being in power other than serving the interests of the USA.
Booze n Fat
Well, the experiment continues, although I'm drinking in moderation again (work bbq today aside!) and eating in moderation too. I was surprised at how quickly I got used to eating basically half or less of what I was eating before - and feeling less hungry than I had been. My scales are a bit crap and the reading varies so much depending on when one weighs oneself, but I'm shedding in the order of a kilogramme a week. Mood has been all over the shop though - and when I do drink a bit it gets pretty low, which is another reason to slow it down. Sleep hasn't been great either, although I think that in general, even though I'm getting less sleep, I'm normally a bit more refreshed - but that's so subjective it's almost impossible to tell.
Yard
I have a good batch of leave coming up so I'm starting to think about the yard. I should be doing the shed but I'm putting that off ... in the meantime I have some garden beds, lawn, and paving to work out. The more I think about it the more vegetable growing areas I want - they are just so much more useful than grass or flowers. I'm also aiming to get another citrus in - a tart lime like an African lime would be my choice. I have one decent spot left for that, but I'm growing a tomato there at the moment ;-)
I had a good think about it yesterday and think I've settled on a plan - I need to move a few tons of dirt and a pile of metal out of the way to lay the wall foundations though (which is why I ended up hacking yesterday rather than going further). My chilli plants are going pretty well (already 10cm fruit on the cayenne pepper plants), the tomatoes are looking good, I'm using the mints daily in dinner, and last year's basil has started growing again. I also pruned most of the roses right back, since they have finished flowering for now and had too many long stems which were falling over. The citrus is also sprouting like crazy - maybe I'll finally get some fruit again this year. I also planted a few seeds in pots for a bit more variety, although it's a bit late in the year to start with seeds so I'm not expecting a lot. If the rain keeps up this year, though, it should be a great growing season.
A bit wet.
Got out with an umbrella during a lull in the downpour and took a couple of shots of the road outside.
My small rainwater tank is already overflowing too - it was well under 1/2 full this morning (it's 2100L).
Well that looks like about it for a couple more hours anyway. Fortunately the forecast hail held off - at least for now.
Update: We ended up with 70mm of rain in about 18 hours, which although not unheard of isn't a typical spring storm around here.
Pretty clear cut ...
From the 1st of december, a statement which shocked me somewhat:
“I absolutely condemn the placement of this information on the WikiLeaks website,” Gillard said today. “It is a grossly irresponsible thing to do and an illegal thing to do.”
From today:
But asked directly what Australian laws had been broken by either WikiLeaks or Assange, Ms Gillard said the Australian Federal Police were investigating. "The foundation stone of it is an illegal act," Ms Gillard said today.
But the "foundation stone" was the leaking of the documents to the website, not the publishing of the cables.
"It would not happen, information would not be on WikiLeaks, if there had not been an illegal act undertaken," Ms Gillard said.
Obviously the spinmeisters told this lawyer (who should have known better) that calling things illegal without basis probably wasn't terribly wise. Now of course there will be the plausible deniability that she was only ever talking about the 'foundation stone'(?) and not any action of wikileaks ... although to suggest that from her first statement above really beggars belief.
Well, at least the opposition - deceitful hypocrites though they are - are making some noise about this, finally spotting what an outrageous statement it was for the country's leader to make. The Greens could be a bit more vocal, mind you (although that may simply be the media's tendency to ignore them).
Update: The ABC has just published an open letter from more than a few influential people who aren't too happy with the bend-over behaviour of the supposedly sovereign and free democratic government of ours.
The green, green grass of home.
We've had some spring weather this year which seems more like the spring weather of 20-odd years ago than that of recent years. Warm thunderstorms, short but heavy downpours, a bit of humidity and warmth. So perhaps it wasn't just nostalgia for how it used to be that made it feel like the weather had been more dreary the last few years.
The heat and damp has sent the grass totally boonta. I mowed that 2 days ago and it's almost ready again for another haircut.
The weekend.
Another weekend upon us. And it's a scorcher, headed for 36 today. The evap is getting a good workout and it's really nice inside. I had plans to go for a ride to a mate's near the beach and other sorts of things but after last night I don't feel up to much so I might just sit inside and do sfa. Spent the morning watering the pot plants and the garden (after a massive down-pour on Wednesday it's amazing how quickly things dry out in the heat), mowing the lawn, fertilizing, and constructing a frame to support the tomatoes. I also had the bright and in hindsight totally obvious idea of keeping the mint plants out of the full sun so they develop large soft leaves rather than the small hard ones.
And so to some thoughts of the moment ...
Just because I like to see Microsoft suffer I've been watching what happened with their latest phone release ... and by what limited data is available it sounds like it was a bit of a flop. Ahh well and good. Over a few beers last night I was chatting to a mate about their weird-arse advertising campaign and I think I worked out what they were trying to do. Someone looked at some charts - one showing the total population of the world, and the other showing how many people already owned a jesus phone or alternative. Then they decided they'd try to sell to the biggest slice of the pie by making fun of the minority rather than trying to steal those customers. At best, the adverts seem to be aimed at those of us who think jesus phone users are wankers - but we're just not interested in the technology itself, so you're not going to sell anything to us (and incidentally I don't see any problem with fishing a phone out of a pissing trough - these things get dropped all the time and it's not like you're going to leave it there). Then a big section of the market will never be interested in such devices because they're simply too complex, and although they might appreciate the adverts (if they understand them at all) they are not potential sales. So that leaves the other section of the market - call them the 'aspirational smart-phone' buyers - who want to 'grow up' and become jesus phone owning wankers themselves one day. And you're not going to sell to them by pointing out they want to be jesus phone owning wankers. But apart from all that, if people are going to spend a bit of their hard-earned on a 'cool gadget' they want it to be 'cool'. And that has never been Microsoft, apart from a pretty small band of retards who don't know any better.
Despite efforts to shut it down, wikileaks seems to be soldiering on. Good thing. The media coverage of their latest spill, particularly here in Australia where we expect things to be a bit better than the US, has been utterly appalling. Apart from a couple of alternate viewpoints published on the ABC blogging site (aka 'the drum') the media (and all TV channels) have been in one voice - roundly condemning wikileaks and running full fisted with the US state department line of how terrible it all is and how illegal it has to be. And the Australian pollies have all but given up the idea of Australian sovereignty with their limp-wristed kow-towing to the USA, basically offering an Australian citizen up for whatever the yanks want to do to him. Disgraceful gormless arseholes - I can't imagine any of the actual cables that might mention Australia could paint a worse picture of their gutlessness than they're doing by themselves on prime-time TV. At most they're probably worried that Australians might realise how irrelevant their country is in the wider scheme of things, and that we don't actually 'punch above our weight' in any sense.
Oh and the whole 'wikileaks is just a CIA/Mossad conspiracy' thing is exactly the sort of 'grass roots' conspiracy you'd expect the CIA to use to discredit them. It does sound plausible at first, but ... then a bit of common sense prevails.
And apparently the fishing fleets have run out of new fisheries to plunder. That is utterly mind-blowing when one considers just how big the oceans are - they cover more than two thirds of the planet's surface. Even after centuries of logging there are still areas of untouched rainforest - which has only ever covered a fraction of the land surface.
OpenCL Images and Arrays
Curiosity got the better of me and I ran a bunch of tests on separable convolution filters using OpenCL image types compared to float arrays.
Not surprisingly, perhaps, the card seems to be designed for graphics workloads more than computational workloads.
Tests
The test runs a 31x31 separable convolution kernel over a 1024x768 image, implemented as two passes: a horizontal convolution followed by a vertical one.
The image version is run over both normalised unsigned byte data and float data (4 channels). The array version only uses single-channel float planes.
In both cases a single thread calculates each output pixel. Timings are from the NVidia Compute Visual Profiler and the card is an NVidia GTX 480.
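Separability is what makes the two-pass approach worthwhile: a direct 31x31 convolution costs 31 × 31 = 961 multiply-adds per output pixel, whereas a horizontal pass followed by a vertical pass costs 2 × 31 = 62. A minimal single-plane CPU reference of the two passes, handy for checking GPU output, might look like the following. The clamp-to-edge border handling and the names `convolve_separable` and `clampi` are assumptions, not the code used for the timings.

```c
#include <assert.h>

/* Clamp a coordinate into [0, n - 1] (assumed clamp-to-edge borders). */
static int clampi(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

/* Two-pass separable convolution of one float plane.
   k holds 2 * r + 1 taps (31 taps for r = 15); tmp and dst hold w * h floats. */
static void convolve_separable(const float *src, float *tmp, float *dst,
                               int w, int h, const float *k, int r)
{
    assert(r >= 0);
    /* Pass 1: horizontal, src -> tmp. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float sum = 0.0f;
            for (int i = -r; i <= r; i++)
                sum += k[i + r] * src[y * w + clampi(x + i, w)];
            tmp[y * w + x] = sum;
        }
    /* Pass 2: vertical, tmp -> dst. */
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            float sum = 0.0f;
            for (int i = -r; i <= r; i++)
                sum += k[i + r] * tmp[clampi(y + i, h) * w + x];
            dst[y * w + x] = sum;
        }
}
```

On the GPU each pass becomes one kernel launch with one work-item per output pixel; the inner tap loop is what the local-memory tricks below try to feed efficiently.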
Array version
For the X convolution it copies the kernel and 128 elements of the source array to local memory, which are then shared amongst the 64 threads in the work-group.
For the Y convolution, doing the same makes things slower because of the way it accesses memory, so it just relies on memory coalescing, and on memory accesses being interleaved with processing to hide the latency.
The code must manually handle the edges - it just clips to the boundary.
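The local-memory staging for the X pass can be emulated on the CPU to check the indexing. With a 31-tap kernel (radius 15), 64 outputs need 64 + 2 × 15 = 94 source elements, which fits in the 128-element buffer. In this sketch (the names `convolve_row_tiled`, `GROUP` and `TILE` are hypothetical, and the edge clamping is an assumption) the inner loops stand in for the 64 work-items: on a GPU each would copy its share of the tile, wait at a `barrier(CLK_LOCAL_MEM_FENCE)`, then read only from local memory.

```c
#include <assert.h>

#define GROUP 64 /* outputs per work-group */
#define TILE 128 /* floats staged in "local memory" per work-group */

static int clampi(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

/* Horizontal pass over one row, processed in GROUP-wide chunks.
   k holds 2 * r + 1 taps; the tile must hold GROUP + 2 * r elements. */
static void convolve_row_tiled(const float *row, float *out, int w, const float *k, int r)
{
    float tile[TILE];
    assert(r >= 0 && GROUP + 2 * r <= TILE);

    for (int g = 0; g < w; g += GROUP) {
        /* Stage the group's input window, clamped at the row edges. */
        for (int j = 0; j < GROUP + 2 * r; j++)
            tile[j] = row[clampi(g - r + j, w)];

        /* Each "work-item" lx computes out[g + lx] from the tile only:
           tile[lx + i] corresponds to row[g + lx + i - r]. */
        for (int lx = 0; lx < GROUP && g + lx < w; lx++) {
            float sum = 0.0f;
            for (int i = 0; i <= 2 * r; i++)
                sum += k[i] * tile[lx + i];
            out[g + lx] = sum;
        }
    }
}
```

Each source element is then fetched from global memory about 94/64 ≈ 1.5 times per group instead of 31 times, which is where the saving comes from.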
Timings: X=192μs Y=400μs Total=592μs (per plane), 1776μs (3 planes).
I also tried changing the array types to float4 and processing 4 packed planes at once. This scaled pretty much linearly - I'd expected it to scale better than linearly.
Timings: X=820μs Y=1460μs Total=2280μs (4 planes), 570μs (per plane)
Image version
The first image version was a very simple implementation that just reads pixels directly from the source image. Although the data is stored in UBYTE RGBA format it only calculates 3 channels (4 channels can be done for <10% extra time). The X and Y convolution code is more or less identical save for the direction it works in.
Timings: X=618μs Y=618μs Total=1236μs (3 channels), 1269μs (4 channels)
A pretty clear win - but this is only with octet data.
I then tried using floating point as the storage, and things weren't so rosy for the image version.
Timings: X=1824μs Y=2541μs Total=4365μs (3 channels)
So I started moving some of the optimisations required for the array version into the image version. First I just copied the convolution kernel to local memory in both the X and Y versions. A pretty major improvement.
Timings: X=1176μs Y=2117μs Total=3293μs
And finally I added the code which copies 128 elements of the data to local memory. To do this for the Y convolution I also had to change the local work size to be 64 in Y rather than X - and this probably explains why it ran faster since it creates more work groups.
Timings: X=770μs Y=732μs Total=1502μs
What is strange, though, is that this version is slower on the byte data. I guess the extra complication and overhead of copying stuff locally slows it down too much.
Timings: X=712μs Y=731μs Total=1444μs
And if I remove the local copy of the image data the timings improve further.
Timings: X=677μs Y=725μs Total=1402μs
But they are still behind the naive version for BYTE data.
Conclusions
Storing data in array buffers with properly written code can achieve similar performance to image storage - even though they have radically different data paths and cache characteristics. Array types can process individual planes separately - but can also process vector/multi-channel types fairly easily too.
Although a trivial implementation worked well for 32-bit packed pixel types, non-byte image types require almost identical treatment to the array-based implementation in order to gain good performance.
Even though it might not be the most efficient, the same code can also be executed for different image storage types - the image read/write methods just use floating point values in registers which is the most convenient for the arithmetic (and tuned for the GPU). For the array code it would require completely different code for each data type - e.g. normalising to float or using fixed point arithmetic.
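The convenience described here can be mimicked in plain C by hiding the storage format behind a read function, roughly as `read_imagef` does: for a normalised unsigned byte image it returns values scaled to [0, 1] (a stored v reads as approximately v / 255), and for a float image it returns the stored value. In the sketch below (all names are hypothetical) one convolution routine is kept and only the reader is swapped, whereas the array version would need separate code per type.

```c
#include <assert.h>
#include <stdint.h>

/* Reader: returns element idx of some storage as a float. */
typedef float (*read_fn)(const void *data, int idx);

/* Normalised unsigned byte storage, scaled to [0, 1]. */
static float read_unorm8(const void *data, int idx) { return ((const uint8_t *)data)[idx] / 255.0f; }

/* Plain float storage. */
static float read_float(const void *data, int idx) { return ((const float *)data)[idx]; }

static int clampi(int v, int n) { return v < 0 ? 0 : (v >= n ? n - 1 : v); }

/* One horizontal convolution for any storage type; arithmetic is always float. */
static void convolve_row_any(const void *row, read_fn read, float *out,
                             int w, const float *k, int r)
{
    assert(r >= 0);
    for (int x = 0; x < w; x++) {
        float sum = 0.0f;
        for (int i = -r; i <= r; i++)
            sum += k[i + r] * read(row, clampi(x + i, w));
        out[x] = sum;
    }
}
```

On the GPU the conversion happens in the texture hardware rather than through a function pointer, which is part of why the image path costs so little for byte data.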
In short, the NVidia GPU seems optimised for accessing data through image types. And particularly for typically screen-sized images stored in 32 bit packed format. Not so surprising for a graphics card.
It would be interesting to compare to the ATI card I have - I suspect it would be pretty much a similar result, perhaps even more so, since it doesn't have any L1 cache for array accesses. But profiling that is somewhat more work and I can't be bothered right now. I have also yet to try it with single-channel images.
Update: Actually, I need to know about single-channel images, so I tried that, and it was a bit disappointing for BYTE data: X=593μs Y=600μs Total=1193μs. The texture cache probably stores all channels anyway, and for all I know the image is being stored in memory at 32 bits per pixel. For float data the optimised version does somewhat better: X=263μs Y=301μs Total=564μs. And bizarrely, the optimised version is now faster for the BYTE data as well: X=242μs Y=295μs Total=537μs. Presumably this is because the smaller amount of processing isn't able to hide the memory latency but the manual caching is (and the smaller local array sizes are less of a limitation on concurrency - the minuscule local memory is the main bottleneck when optimising OpenCL).
I'm running into some memory stress for work, and if the byte data were stored packed it might be a big benefit here - right now I'm using float arrays. Using images might simplify some of the code too, although it looks like the more memory-heavy stuff will still need to use local memory - though at least in this example that extra work would make it run faster than array types.
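For a sense of scale, assuming the same 1024x768 frame as the tests above, the storage per frame works out as follows (1 MiB = 1,048,576 bytes):

```latex
\begin{aligned}
1024 \times 768 &= 786\,432 \text{ pixels} \\
\text{one float plane: } 786\,432 \times 4\,\text{B} &= 3\,145\,728\,\text{B} = 3\,\text{MiB} \\
\text{one byte plane: } 786\,432 \times 1\,\text{B} &= 786\,432\,\text{B} = 0.75\,\text{MiB} \\
\text{three float planes: } 3 \times 3\,\text{MiB} &= 9\,\text{MiB} \\
\text{packed RGBA8 image: } 786\,432 \times 4\,\text{B} &= 3\,\text{MiB}
\end{aligned}
```

So byte storage would cut a plane to a quarter of its float size, but only if the device really stores it that compactly; as suspected above, a single-channel byte image may still be padded to 32 bits per pixel.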
Copyright (C) 2019 Michael Zucchi, All Rights Reserved.
Powered by gcc & me!