Wednesday, 02 January 2013, 06:31

NEON yuv + scale

Well I still haven't checked the jjmpeg code in but I did end up playing with NEON yuv conversion yesterday, and a bit more today.

The YUV conversion alone for a 680x480 frame on the beagleboard-xm is about 4.3ms, which is ok enough. However with bi-linear scaling to 1024x600 as well it blows out somewhat to 28ms or so - which is definitely too slow.

Right now it's doing somewhat more work than it needs to - it's scaling two rows each time in X so it can feed into the Y scaling. Perhaps this could be reduced by about half (depending on the scaling going on), which might knock about 10ms off the processing time (assuming no funny cache interactions going on) - but that is still too slow to be useful. I'm a bit bored with it now and don't really feel like trying it out just yet.

Maybe the YUV-only conversion might still be a win on Android though - if loading an RGB texture (or an RGB 565 one) is significantly faster than the 3x greyscale textures I'm using now. I need to run some benchmarks there to find out how fast each option is, although that will have to wait for another day.

yuv to rgb

The YUV conversion code is fairly straightforward in NEON, although I used 2:6 fixed-point for the scaling factors so I could multiply the 8 bit pixel values directly. I didn't check to see if it introduces too many errors to be practical mind you.

I got the constants and the maths from here.

        @ pre-load constants
        vmov.u8 d28,#90                 @ 1.402 * 64
        vmov.u8 d29,#113                @ 1.772 * 64
        vmov.u8 d30,#22                 @ 0.34414 * 64
        vmov.u8 d31,#46                 @ 0.71414 * 64
        vmov.u8 q3,#128                 @ u/v bias (used by the vsub below)

The main calculation is performed using 2.14 fixed-point signed arithmetic, with the Y value being pre-scaled before accumulation. For simplicity the code assumes YUV444 input, with a separate format conversion pass if required; run per row, that pass should stay cheap through the L1 cache.

        vld1.u8 { d0, d1 }, [r0]!       @ y is 0-255
        vld1.u8 { d2, d3 }, [r1]!       @ u is to be -128-127
        vld1.u8 { d4, d5 }, [r2]!       @ v is to be -128-127

        vshll.u8        q10,d0,#6       @ y * 64
        vshll.u8        q11,d1,#6

        vsub.s8         q1,q3           @ u -= 128
        vsub.s8         q2,q3           @ v -= 128
        
        vmull.s8        q12,d29,d2      @ u * 1.772
        vmull.s8        q13,d29,d3

        vmull.s8        q8,d28,d4       @ v * 1.402
        vmull.s8        q9,d28,d5

        vadd.s16        q12,q10         @ y + 1.772 * u
        vadd.s16        q13,q11
        vadd.s16        q8,q10          @ y + 1.402 * v
        vadd.s16        q9,q11

        vmlsl.s8        q10,d30,d2      @ y -= 0.34414 * u
        vmlsl.s8        q11,d30,d3
        vmlsl.s8        q10,d31,d4      @ y -= 0.71414 * v
        vmlsl.s8        q11,d31,d5

And this neatly leaves the RGB results for 16 pixels, in order, in q8-q13.

They still need to be clamped, which is performed in the 2.14 fixed-point scale (i.e. 16383 == 1.0):

        vmov.u8         q0,#0
        vmov.u16        q1,#16383

        vmax.s16        q8,q0
        vmax.s16        q9,q0
        vmax.s16        q10,q0
        vmax.s16        q11,q0
        vmax.s16        q12,q0
        vmax.s16        q13,q0
        
        vmin.s16        q8,q1
        vmin.s16        q9,q1
        vmin.s16        q10,q1
        vmin.s16        q11,q1
        vmin.s16        q12,q1
        vmin.s16        q13,q1

Then the fixed point values need to be scaled and converted back to byte:

        vshrn.i16       d16,q8,#6
        vshrn.i16       d17,q9,#6
        vshrn.i16       d18,q10,#6
        vshrn.i16       d19,q11,#6
        vshrn.i16       d20,q12,#6
        vshrn.i16       d21,q13,#6

And finally re-ordered into 3-byte RGB triplets and written to memory. vst3.u8 does this directly:

        vst3.u8         { d16,d18,d20 },[r3]!
        vst3.u8         { d17,d19,d21 },[r3]!

vst4.u8 could also be used to write out RGBx, or the planes kept separate if that is more useful.
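
As a sanity check, the per-pixel arithmetic the code above implements is, in scalar C, roughly the following - an untested sketch using the same 2:6 constants, with made-up function names:

        /* untested scalar sketch of the same fixed-point arithmetic */
        #include <stdint.h>

        static uint8_t clamp8(int x)
        {
                return x < 0 ? 0 : x > 255 ? 255 : x;
        }

        static void yuv2rgb(uint8_t y, uint8_t u, uint8_t v, uint8_t rgb[3])
        {
                int y6 = y << 6;         /* y * 64            */
                int us = (int)u - 128;   /* remove u,v bias   */
                int vs = (int)v - 128;

                rgb[0] = clamp8((y6 + 90 * vs) >> 6);           /* r = y + 1.402 v           */
                rgb[1] = clamp8((y6 - 22 * us - 46 * vs) >> 6); /* g = y - 0.344 u - 0.714 v */
                rgb[2] = clamp8((y6 + 113 * us) >> 6);          /* b = y + 1.772 u           */
        }

The NEON version just does 16 of these at once, with the clamp performed in the 16-bit domain before the narrowing shift.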

Again, perhaps the 8x8 bit multiply is pushing it in terms of accuracy, although it's a fairly simple matter to use shorts instead. If shorts were used then perhaps the saturating doubling returning high half instructions (VQDMULH) could be used too, to avoid at least the input and output scaling.

Stop Press

As happens when one is writing this kind of thing, I noticed that there is a saturating narrowing shift instruction - and as it supports signed input and unsigned output, it looks like it should allow me to remove the clamping code entirely, if I read it correctly.

This leads to the following combined clamping and scaling stage:

        vqshrun.s16     d16,q8,#6
        vqshrun.s16     d17,q9,#6
        vqshrun.s16     d18,q10,#6
        vqshrun.s16     d19,q11,#6
        vqshrun.s16     d20,q12,#6
        vqshrun.s16     d21,q13,#6

Which appears to work on my small test case. This drops the test case execution time down to about 3.9ms.

And given that replacing the yuv2rgb step with a memcpy of the same data (all else being equal - i.e. yuv420p to yuv444 conversion) still takes over 3.7ms, that isn't too shabby at all.

RGB 565

An alternative scaling & output stage (after the clamping) could produce RGB 565 directly (I haven't checked this code works yet):

        vshl.i16        q8,#2           @ red in upper 8 bits
        vshl.i16        q9,#2
        vshl.i16        q10,#2          @ green in upper 8 bits
        vshl.i16        q11,#2
        vshl.i16        q12,#2          @ blue in upper 8 bits
        vshl.i16        q13,#2

        vsri.16         q8,q10,#5       @ insert green
        vsri.16         q9,q11,#5
        vsri.16         q8,q12,#11      @ insert blue
        vsri.16         q9,q13,#11

        vst1.u16        { d16,d17,d18,d19 },[r3]!

Tagged android, beagle, hacking, jjmpeg.
Monday, 31 December 2012, 23:07

jjmpeg android video player work

Yesterday I was too lazy to get out of the house, so after getting over-bored I fired up NetBeans and had a poke at the Android video player JJplayer again.

And now it's next year.

I tried a few buffering strategies: copying video frames, loading the decoded frames directly with multiple buffers (works only on some decoders), and synchronously loading each decoded frame into a texture as it is decoded. I previously had some code to load the texture on another GL context but I didn't check whether it still worked (it was rather slow, which is why I let it rot, but it's probably worth a re-visit).

Just copying the raw video frame seems to be the most reliable solution, even with its supposed overheads. Actually it didn't seem to make much difference to performance how I did it - they all ran with similar CPU time (according to top).

I've looked into doing the scaling/yuv/rgb565 conversion in NEON but haven't got any code up and running yet (one has to be a bit keen to get stuck into it). I doubt it will be quicker as this means I won't be using the GPU for this processing, but given how slow the texture loading is it might be a win, and it will let me avoid redundant copies.

I also fixed some of the Android behavioural stuff - pause/resume pauses/resumes the playback, it runs in full-screen with a hidden 'ui' (although for whatever reason, setting the slider to invisible doesn't always work, and never on the Mele), and it now opens the video in another thread with a busy spinner.

Although it's playing back most SD-sized videos ok on my (dual core) Ainol tablet, it is struggling on the Mele. Even when the Mele can decode fast enough the timing is funny - jumping around (and oddly, the load average is well over 2 yet it decodes all frames fast enough). I guess the video decode was behind the audio, so it was just displaying frames as fast as it decoded them rather than with per-frame timing. So I tried another timing mechanism: rather than just using an absolute clock from the first frame decoded, I based it on the audio playback position. This works quite a bit better and lets me add a tunable delay, although it isn't perfect. It falls down when you seek backwards, but otherwise it works reasonably well - and should be fixable. OTOH even with the busted timing the audio sync was consistent - unlike Dolphin Player, which loses sync quite rapidly for whatever reason.
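
The per-frame decision is then roughly the following (a sketch of the idea only, with made-up names - not the actual player code):

        /* illustrative only - not the actual player code */
        #include <stdint.h>

        /* ms to wait before presenting a frame; 0 means display it now */
        static int64_t frame_wait_ms(int64_t frame_pts_ms, int64_t audio_pos_ms,
                                     int64_t delay_ms)
        {
                int64_t due = audio_pos_ms + delay_ms;

                return frame_pts_ms > due ? frame_pts_ms - due : 0;
        }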

Unfortunately opening some videos seems to have become quite slow ... not sure what's going on there, whether it's just slow i/o or some ffmpeg tunable (I was poking around with some other stuff before getting back to jjplayer so I might've left something in). I don't know how to drop the decoding of video frames yet either - which would be useful. I think I need to add some debugging output to the display to see what's going on inside too, and eventually run it in the profiler again when I have most of a day free.

The code is still a bit of a pig's breakfast and I haven't checked it in yet. But I should get to it either today or soon after and will update the project news. I should probably release another package too - as the current one is too broken to be interesting.

In the not-too-distant future I will probably poke at a JavaFX version as well. If I don't find something better to do ... I must really look at the Epiphany SDK sometime.

Tagged android, hacking, jjmpeg.
Friday, 21 December 2012, 01:23

50K

Sometime in the next 12 hours this blog should breach the 50K pageview tally. So 'yay' for me. Anyway, I keep an eye on the stats just to see what's hot and what's not, and I'll review the recent trends.

So last time I did this the posts getting the most hits were to do with performing an FFT in Java. The new king is 'Kobo Hacking'. I'm pretty sure this is mostly people just trying to hack the DRM or the adverts out, but there do seem to be a few individuals coding for the machine as well. And what little organisation there is seems to be centred on the mobileread forums, although it seems that, like me, it's mostly individuals working on their own for personal education and entertainment.

Another big source of pageviews is the JavaFX stuff (actually it's the biggest combined) - and this is only because those posts were added to the JavaFX home page (it wasn't something I asked for, btw). I must admit I've done a couple of 'page hit' posts under the tag, but it's always just been stuff I've been playing with anyway.

The post about the Mele A2000 gets quite a few hits although I'm not sure what people are still looking up (probably wrt the Allwinner A10 for hacking). The ARM Android TV boxes keep leaping ahead in performance and specifications and it's already quite dated. I still use mine mainly for playing internet radio, and sometimes watching stuff recorded with MythTV (I normally use the original PS3, but I try to save some power when I remember to).

The Android programming posts get regular visits - although I think it's mostly people looking up stuff they should be able to find in the SDK. Last time I had to do some Android code I didn't like it at all - the APIs added to work around earlier poor API design choices, the lack of documentation, and the frustration with the shitty lifecycle model more than made up for any 'cool' factor. I'd rather be coding JavaFX with real Java - actually the fact that Oracle have an Android version internally but are trying to monetise it ... puts me off that idea a bit too. You can't really monetise languages and toolkits anymore, there's too much good free stuff (e.g. Rebol finally went free software, lest it wallow in obscurity forever).

jjmpeg also contributes a steady stream. Given how many downloads there have been I'm a bit, I dunno, bummed I guess, about how little correspondence has been entered into regarding the work. Free Software for most just seems to mean 'free to take'. Still, I made some progress with the Android player yesterday so it might actually become useful enough to me at some point to be worth working on.

The image processing and computer vision posts have picked up a bit in the last few months, although I'm sure most of the hits are people just looking for working solutions or finished homework, or how to use OpenCV. I hate OpenCV so don't ask me!

I get a good response to any NEON posts containing code - although I haven't put too many up at this point. That's always fun to work on because one only ever considers fairly small problems that can be solved in a few days, keeping it fresh.

I even get a few hits on the cooking posts although I wish the hot sauces had more interest - because I think they're pretty unique and very nice.

The future

So I mentioned earlier how Free Software just seems to mean "free to take" - I certainly have no idea who is using any of the code I put out, apart from the Kobo Touch software using the toolkit/backend from ReaderZ - which I only found out about by accident. Although the code is out there for this very reason, it's a bit disappointing that there is almost zero feedback or any indication of where the code is being used - and I'm sure, for example, that jjmpeg is being used somewhere.

Once you hit a certain point in developing something I find the fun goes out of it - problems get too big to handle casually, earlier design mistakes require big rewrites, and unless you're using the software in anger there's no real reason to work on it at all after the initial fun phase. A lot of the stuff I have sitting in public has hit that point for me, and it's not like I'm using most of the applications I come up with (or would even if I finished them).

For me it's about the journey and not the destination. And particularly for anything I work on in my spare time it has to be fun or what's the point? I already get paid to work on software, I don't want to "work" on it for free as well.

It's just for fun

Not sure where I'm going here ... I guess I will amble along and keep doing what I'm doing, mucking about with whatever takes my fancy or is interesting at the time.

So probably not much will change then.

Tagged biographical.
Sunday, 16 December 2012, 09:56

Meat drier

I've been meaning to make a meat drier for a while, and I decided not to hang around now I'm on holiday: I spent a couple of days putting together a drying cabinet, and then making some content to put into it.

I had some cheap shitty old all-steel screw-together shelving unit that I hadn't unpacked since I moved from Perth, and I figured that was about the right size to at least get some parts out of. I originally intended for it to stand upright with the shelving standing up length-wise, but after I got it screwed together I realised I could just flip it on its side and it also had legs ... which solved one of the problems I had wrt mounting any additional hardware ...

I'm just using a 40w globe for heat/airflow for now, and I'll see how that goes. After a few hot and humid weeks we're back to cool and dry which should be fine.

I also had a fly-screen lying around not doing anything, and by complete chance it happened to fit the width exactly. For now I made a very simple wire clip to hold it in place and have the foam to prop it up a little to make it cover the top.

The only metal I had to cut were the sides and the length of the rails. The sides were off-cuts from the shed, and I only had to cut about 50mm off those. And I had to drill a few holes where the existing ones didn't align. The only other fabrication was to pound down the wider inner seam under the shelving so the 4 panels fit together snugly - a 6" piece of railway line and a hammer did the job. Oh, and bend a few bits of thick wire for the clips and an off-cut for the drip cover.

I already had the coffee-tin bulb holder from previous efforts, and the profile of the steel on the sides provides a ledge for the rails at several heights.

For the biltong, I just did the main muscle in a whole rump (which is the cheapest bulk meat I have handy), which leaves plenty of drying space, although if I used wire hooks it would hang sideways and let me fit more in.

Now hopefully it comes out ok ... I've made it before using a cardboard box, but it's been a while and although the details don't seem to matter much, sometimes they do.

Tagged cooking.
Thursday, 13 December 2012, 23:40

Hacking n garden n stuff

Well, work has finished for another year. The expected JavaFX stuff didn't eventuate, and I spent the last couple of months on a little bit of maintenance and a lot of non-coding work.

So I wasn't in the right mindset for hacking of any sort - ARM assembly, Android, video stuff (I noticed that Jelly Bean now has a Java API to the hardware video codecs though) - nor did I have time to look at the Parallella SDK which I got last week.

Or the Kobo Touch. Although I indirectly found a project that was using my "ReaderZ" code, including the widget toolkit, to write another frontend for it. I only found it as I noticed the mobileread site showing up in my stats information for this blog.

And I just haven't had the energy or need to work on anything much in my spare time.

So when I do have some time I've been looking after the garden - mostly the lawn, pot plants, and veggie patches. On the whole we had a very cold and dry spring this year so it's been pretty slow getting some of the plants going, but I now have some well-established purple beans, cucumbers, chillies, egg-plants, sweet potatoes, and even some tomatoes which sprouted on their own. I'm expecting a bumper crop of the purple beans - the vines are growing around 10-15cm per warm day and one is already 3m tall and has reached the gutter on the rear verandah!

Haven't been cooking too much interesting stuff, although I'm slowly working my way through the condiments and frozen curries - the lime chilli marmalade/chutney is the current favourite with a good strong cheese and some crackers.

And when I'm done with all that, trying to get some reading in. Which usually amounts to 2-3 pages to find out where I was, 2 pages of reading, and then waking up half an hour later with the device asleep and my spot lost again.

Now I'm on leave I'll wind down a bit, drink a bit, poke at the garden a bit, try to get out on the bike and visit friends a bit more often, try to at least get started on the shed floor and other junk in the yard ... and I'm sure I'll even find time to hack on something if the inspiration hits.

Tagged biographical.
Friday, 07 December 2012, 03:16

I don't get tablets ...

But that's ok, they're not aimed at me.

I didn't quite understand how the i(ncontinence)pad took off like it did at the time, but now I think I know why.

Most people really just don't like computers at all (of course, M$ has to take the lion's share of the blame here), but they use them because they like what computers let you do. Obviously there is a not-so-subtle distinction between the two.

Apple sold an appliance, and by a fortuitous combination of technology that matured at around the same time - everything from GHz-class portable CPUs to wireless networks to ubiquitous home LANs to cheap manufacturing centres in China and internet publishing - they were able to make a slick-enough device to finally create an appliance-like machine. After all, similar things had been tried before - many times - and failed, mostly because the technology just wasn't there yet. Too slow, too clunky, too expensive, and so on.

And this is the precise reason the M$ $urface will crash and burn.

It just shows that M$ don't get it in the least.

They're still trying to peddle the "computer" experience - and if anything can be gleaned from the market over the last 5 years, it is that people do not want that. Least of all the M$ version of it.

And it's all the funnier because that is precisely their biggest selling point!

A netbook by any other name ...

We had small computers with underpowered CPUs, tiny screens, shitty battery life, and fold-out keyboards years ago - they were called netbooks. And at least they were cheap.

Tagged rants.
Saturday, 24 November 2012, 01:30

LBP for object recognition

So I finally got around to trying out my LBP-P (pixel) classifier algorithm for the task of object recognition. This is the one I've mentioned in numerous previous posts and have applied with moderate success to the problem of object detection.

I don't have any code or numbers to publish at this time, but if I haven't made a mistake it appears to work much better than the histogram/block-based LBP algorithm (which is one of the algorithms implemented in OpenCV) with a similarly sized descriptor on the data I'm testing. Computational cost is somewhat lower too - particularly if one can delve into some assembly language and the CPU has a vtbl-like (NEON) instruction. I have still yet to incorporate scale into the calculation, but I can only assume that could improve it: if I can work out how to combine the probabilities in a meaningful way.

That it works at all as a detector means it has to have a pretty good false-positive rate, and so I thought it should work quite well at the recognition task too.

And as I now know how to generate ROC curves ... one day I will do this for the object detector; but that is something for another day, as today is hot, and I am lazy.

I've also been trying various other LBP code variations.

The LBP 8,1 u2 code I started with is still clearly the best of the rest, but I found that the CS-LBP variation - centre-symmetric - usually works at least as well or even better in many cases. This is pretty significant as it's only a 4-bit code, which means classifiers and/or descriptors can be 1/4 the size of the 59-code (~6 bit) LBP 8,1 u2 encoding. Rather than do 8 comparisons around a centre pixel this just does 4 comparisons against opposite pixels: n/s, ne/sw, e/w, se/nw.
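
For concreteness, here is a rough scalar sketch of both codes (my own illustration, untested - in practice this gets vectorised, e.g. with NEON compares); src is a greyscale image with row stride w and (x,y) an interior pixel:

        /* untested scalar sketches; src is greyscale, w is the row stride */
        #include <stdint.h>

        /* plain LBP 8,1: 8 comparisons against the centre pixel */
        static uint8_t lbp_8_1(const uint8_t *src, int w, int x, int y)
        {
                const uint8_t *p = src + y * w + x;
                uint8_t c = p[0], code = 0;

                code |= (p[-w - 1] >= c) << 7;  /* nw */
                code |= (p[-w    ] >= c) << 6;  /* n  */
                code |= (p[-w + 1] >= c) << 5;  /* ne */
                code |= (p[     1] >= c) << 4;  /* e  */
                code |= (p[ w + 1] >= c) << 3;  /* se */
                code |= (p[ w    ] >= c) << 2;  /* s  */
                code |= (p[ w - 1] >= c) << 1;  /* sw */
                code |= (p[    -1] >= c) << 0;  /* w  */
                return code;
        }

        /* CS-LBP: 4 comparisons of opposing neighbours, a 4-bit code */
        static uint8_t cs_lbp(const uint8_t *src, int w, int x, int y)
        {
                const uint8_t *p = src + y * w + x;
                uint8_t code = 0;

                code |= (p[-w    ] >= p[ w    ]) << 0;  /* n  vs s  */
                code |= (p[-w + 1] >= p[ w - 1]) << 1;  /* ne vs sw */
                code |= (p[     1] >= p[    -1]) << 2;  /* e  vs w  */
                code |= (p[ w + 1] >= p[-w - 1]) << 3;  /* se vs nw */
                return code;
        }

The u2 mapping that folds the 256 raw LBP 8,1 codes down to the 59 uniform bins would sit on top of lbp_8_1() as a small lookup table.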

For the LBP histogram algorithm I generally had no luck trying to pre-process the images - usually anything I did adversely affected the results - although that depended on the image size. The size of the image seems to have more of an impact on the LBP histogram algorithms, and surprisingly a fairly small image had the strongest result: I suspect that scale is doing some of the pre-processing for me in this case. Unless I'm missing something, for LBP-based algorithms I never understood why anyone would perform any intensity-based normalisation (histogram equalisation and so on): the LBP code is a local gradient operator, and intensity scaling can't affect that (that's kind of the whole point of using them).

Although I'm now quite a fan of the LBP, I should also get a chance to try a similar set of algorithms using wavelets: it will be interesting to see how they perform, and even if they aren't as good they might be useful if the individual results aren't correlated too closely. Although given they also perform localised directional gradient measures, I suspect they will correlate. Still, they'd have to be quite a bit better to justify the increased processing required.

Update: Ahh bummer, when I did a more valid study the LBP-P classifier isn't quite as good as my first results suggested. That's a pity. I can still get a decent result, but it starts to take a lot of training and/or too much memory to get really good results.

Tagged hacking.
Thursday, 15 November 2012, 10:19

Community is Exclusion

I've been meaning to write something along these lines for a while. But I finally pulled my thoughts together with the help of many thousand millions of tireless, beautiful, and unique yeast cells, and am finally ready to have a crack at it.

Community blah

Community this, and community that - it seems to be the 'meme' of anything to do with any project or movement or idea these days. Do anything, start a community.

But I don't like the word, it's just a bullshit and divisive word.

On the surface, a community is all about inclusion, but in reality it is precisely the opposite: it is all and only about exclusion. And exclusion only means one thing: division and the politics thereof¹.

So when people talk about the "Linux community", "Ubuntu/Fedora/etc community", the "FOSS/FLOSS/Open Sauce(tm) community", or even "Free Software Community" they are really talking about a group who identifies themselves just as much by who they don't consider fellow members as who they do.

Once you identify with a particular "community" and are able to label another's, you tend to huddle together and treat those with different labels as 'forners', to be distrusted as a direct threat to you and your homies. An analogous situation is the wearing of religious paraphernalia in an attempt to isolate yourself from your fellow human beings on purpose, to provoke a reaction. Or wearing gang colours or tramp stamps.

Even within so-called "communities" there are stark divides. For example, for several years I have forced my browser to a fixed font type and size and so didn't realise Groklaw had at some point directly separated the 'haves' and the 'have nots' by deliberately de-emphasising anonymous posters from those with name accounts by using a tiny font for the former. I was utterly shocked at this insidious behaviour from a site purporting to be about freedom and legal rights.

And mailing lists and on-line forums all suffer from even worse problems: say something out of line and you are simply banished. Rather than starve the true trolls who will eventually leave (or seek medication), or simply use them for entertainment, they simply punish any and all divergent views in precisely the same way a cult would.

You're standing in it

Here's a better word: World.

The GNU/Linux World. The Free Software World. And so on.

World is an all-inclusive word that encompasses not only all of humanity, but also all other known living entities and the entirety of the system on which we depend for our fragile existence.

World is a word which recognises a whole spectrum of opinions: not merely black or white, true or false - the simple and stupidly stark 0 or 1 of binary state, which is all a machine can cope with.

World also implies there are issues and complications which sometimes simply cannot be solved at all, let alone on internet time-frames.

World also goes beyond the bronze-age tribal nonsense we've been saddled with since then - and it's about time we evolved socially to cope, as the next stage of human evolution, lest we perish.

There's been some recent noise about the dangers of the "cult of personality" that infests the tech world - but I think in the case of GNU/Linux the problem is more simply a plain old "cult" based on community-based tribal identification (Microsoft and Apple and any other multi-national with a strong brand presence are also well in on the act - but that is on purpose and for marketing reasons). I suggest that this leads directly to a tribal "herd" mentality and ultimately the "group think" which simply silences dissenting voices rather than acknowledging the breadth of human experience.

This situation is simply not healthy - and downright embarrassing after 100 000 years of so-called human "progress".

I think it's about time to grow up out of this tribal bronze age absurdity and call it out for the archaic and worthless nonsense it is.


1. Politics goes well beyond who we do or do not vote for on polling day. As soon as you have more than one person in a room: you have politics.

Tagged rants.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!