About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail >
fosstodon.org/@notzed >

Friday, 11 January 2013, 04:13

A/V sync

Had a bit of a limp stab at a/v sync today.

I started with something simple although it ended up a bit too complicated - trying to synchronise multiple decoding and display threads is a little messy. I was trying to hide as much as possible inside MediaReader and the Audio/VideoDecoder classes, but it got ugly.

As far as the timing goes, the simple approach nearly worked; there are just a couple of issues remaining.

The simple approach was to have a central MediaClock object which tracks the audio renderer position, and then does various timing calculations for the video sync. It manages pause as well as interacts with seek. Maybe it isn't so simple ...
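To make the idea a little more concrete, here is a minimal sketch of an audio-driven clock. It is not the MediaClock from jjmpeg: the class name, the method names and the use of millisecond timestamps passed in by the caller are all assumptions made for illustration.

```java
/** Hypothetical sketch of an audio-driven clock; not the actual jjmpeg MediaClock. */
final class MediaClockSketch {
    private long audioPositionMs;   // last position reported by the audio renderer
    private long reportedAtMs;      // wall-clock time at which it was reported
    private boolean paused;

    /** Called by the audio thread whenever it knows how much has been played. */
    synchronized void setAudioPosition(long positionMs, long nowMs) {
        audioPositionMs = positionMs;
        reportedAtMs = nowMs;
    }

    /** Freeze the clock at the current media time. */
    synchronized void pause(long nowMs) {
        if (!paused) {
            audioPositionMs = getMediaTime(nowMs);
            reportedAtMs = nowMs;
            paused = true;
        }
    }

    /** Restart the clock from wherever it was frozen. */
    synchronized void resume(long nowMs) {
        reportedAtMs = nowMs;
        paused = false;
    }

    /** A seek simply re-bases the clock on the new position. */
    synchronized void seek(long positionMs, long nowMs) {
        audioPositionMs = positionMs;
        reportedAtMs = nowMs;
    }

    /** Current media time, extrapolated from the last audio report. */
    synchronized long getMediaTime(long nowMs) {
        return paused ? audioPositionMs : audioPositionMs + (nowMs - reportedAtMs);
    }

    /** How long the video thread should wait before showing a frame; negative means late. */
    synchronized long videoDelay(long framePtsMs, long nowMs) {
        return framePtsMs - getMediaTime(nowMs);
    }
}
```

The video thread would sleep for videoDelay() when it is positive and could drop the frame when it is badly negative. Getting pause and seek to agree across several threads is where the real mess is, and this sketch doesn't attempt that.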

Eventually I should work it out.

Tagged hacking, jjmpeg.
Wednesday, 09 January 2013, 11:01

Preserving arbitrary Aspect Ratio in JavaFX

I had an attempt at displaying the proper aspect ratio in JavaFX, and after a couple of false starts came up with a pretty simple solution. The ImageView does its own aspect ratio preservation but for a WritableImage the pixels are always square - as far as I can tell.

So one must adjust it outside. First I just used:

        vout.setScaleX(aspect);

To set the ratio - this displayed the video properly but didn't take the adjusted size into account during layout and fitting to the display area. Not really a show-stopper as the user can just adjust the window until it fits, but I was sure I could do better than that.

I tried various things such as placing the vout (an ImageView) into a Group and so on - but this didn't work (of course) as I was setting the dimensions of the ImageView relative to the window size.

Actually it turned out to be extremely simple: since I'm scaling the ImageView when it is being displayed, I just have to scale by the inverse when I'm binding its dimensions:

        vout.fitWidthProperty().bind(root.widthProperty().divide(aspect));

The height is simply bound to the root.height.

And also remember to take it into account on the initial scene size too:

        Scene scene = new Scene(root, width * aspect, height);

So apart from calculating the aspect ratio, those were the only lines of code required. Seems to work although i've only tested it on one PAL 16:9 video so far ...
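The arithmetic behind those bindings can be checked without JavaFX at all. The helper below is a hypothetical sketch (the class and method names are made up), and it assumes the ImageView has preserveRatio enabled, so that it picks the smaller of the two fit scale factors before the X scale is applied.

```java
/** Hypothetical helper mirroring the arithmetic of the fitWidth/fitHeight bindings. */
final class AspectFit {
    // rootW, rootH: container size in screen pixels
    // imgW, imgH:   decoded frame size in (non-square) pixels
    // aspect:       pixel aspect ratio, i.e. width / height of one source pixel
    // returns { displayed width, displayed height } in screen pixels
    static double[] displayedSize(double rootW, double rootH,
                                  double imgW, double imgH, double aspect) {
        double fitW = rootW / aspect;   // what fitWidth is bound to
        double fitH = rootH;            // what fitHeight is bound to
        // with ratio preservation the view uses the smaller scale factor
        double s = Math.min(fitW / imgW, fitH / imgH);
        // setScaleX(aspect) then stretches only the width
        return new double[] { s * imgW * aspect, s * imgH };
    }
}
```

For example, a 720x576 PAL frame shown at 16:9 has a pixel aspect of 64:45. In a 512x576 container this yields a 512x288 picture (fitted horizontally), and in a 2048x576 container a 1024x576 picture (fitted vertically).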

Although the screen capture stuff just stores the unscaled frame still ... which is probably what I want tbh.

Well, not a bad haul today for a poor night's sleep and feeling a bit crap overall.

Update: Added some shots to show that it works in JJPlayer.

Window too narrow - automatically scaled to fit horizontally:

Window too wide - automatically scaled to fit vertically (and controls fading out):

A raw non-corrected frame grab showing the image using square pixels:

Tagged hacking, javafx, jjmpeg.
Wednesday, 09 January 2013, 05:50

More video player work

Mucked around with a few things in JJPlayer last night and this morning. Nothing on TV last night and I just wanted a couple of hours today.

The new buttons and styling.

There were a few annoying things along the way. The keyboard input was a bit of a pain to get working: I had to override Slider.requestFocus() so it wouldn't grab keyboard focus when used via the mouse (despite not being in the focus traversal), and I added a 'glass-pane' over the top of the window to grab all keyboard events. I had to remove all buttons from the focus traversal group and only put the 'glass-pane' in it. I made the glass-pane mouse-transparent so the buttons still work.

Another strange thing was full-screen mode. Although JavaFX captures ESC to turn it off, the ESC key event still ends up coming through to my keyboard handler. i.e. I have to track the fullscreen state separately and quit only if it's pressed twice.

Next to fix is those a/v sync issues ...

Update: Well, instead I added a frame capture function. Hit print-screen and it captures the currently displayed frame (raw RGB) and opens a pannable/zoomable image viewer. From here the image can be saved to a file.

Although I haven't implemented it, one could imagine adding options to automatically annotate it in various ways - timestamp, filename, and so on.

Update 2: I subsequently discovered "accelerators", so i've changed the code to use those instead of the 'glass pane' approach - a ton of pointless anonymous inner Runnables, but it feels less like a hack.

        scene.getAccelerators().put(new KeyCodeCombination(...), ...);

Update 3: I tried it on my laptop which is a bit dated and runs 32-bit fedora with a shitty Intel onboard GPU (i.e. worthless). It runs, but it's pretty inefficient - 2-4x higher load than mplayer on the same source. Partly due to the Java2D pipeline I guess, but it's probably all the excess frame copying and memory usage. I tried to do some profiling on the workstation but didn't have much luck. It just seems to be spending most of its time in Gtk_MainLoop, although the profiler doesn't seem to know how to sort properly either.

Tagged android, hacking, javafx, jjmpeg.
Monday, 07 January 2013, 02:38

JJPlayer controls

This morning I added sound and started work on some controls for the JavaFX version of JJPlayer.

It's mostly pretty buggy but it kind of works ... some of the time.

I got JavaSound working easily enough at least - although i'm just converting to stereo 16-bit for now as I did on Android.

But there are some big problems with the way i'm handling the a/v sync when pausing or seeking, so things get messed up some of the time. Too lazy to fix it right now ...

I kept playing a bit with the UI and added a fade in/out of the controls as well as hiding the mouse pointer, and some stylesheet stuff.

Clearly the styling and the ASCII-icon buttons both leave something to be desired at this point.

I thought i'd have trouble getting the hiding to reverse if the user started moving the mouse whilst it was hiding, but it turned out to be pretty easy. Just change the rate on the fading out animation to reverse it, and it still runs to completion at the other end instead. So it fades in and out smoothly depending on user action, and without any eye-jars.
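The reason this works can be shown with a toy model. The sketch below is plain Java rather than JavaFX, and every name in it is hypothetical. It treats the fade as a position between 0 and 1 that advances at a signed rate, which is roughly what flipping the rate of a running animation amounts to.

```java
/** Toy model of a fade whose direction can be flipped mid-flight; names are hypothetical. */
final class ReversibleFade {
    private double opacity = 1.0;      // 1 = controls fully visible
    private double rate = 0.0;         // +1 fades in, -1 fades out, 0 is idle
    private final double durationMs;   // time for a complete fade

    ReversibleFade(double durationMs) { this.durationMs = durationMs; }

    void fadeOut() { rate = -1.0; }
    void fadeIn()  { rate = 1.0; }

    /** Advance by dtMs and return the new opacity, stopping at either end. */
    double step(double dtMs) {
        opacity += rate * dtMs / durationMs;
        if (opacity <= 0.0) { opacity = 0.0; rate = 0.0; }
        if (opacity >= 1.0) { opacity = 1.0; rate = 0.0; }
        return opacity;
    }
}
```

Because a reversal only changes the sign of the rate, the opacity carries on from wherever it had got to. There is no jump back to a start value, which is what avoids the eye-jar.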

Tagged hacking, javafx, jjmpeg.
Saturday, 05 January 2013, 03:21

JJPlayer for JavaFX

I finally had enough of Android and started poking at a JavaFX version of JJPlayer this morning.

I'm hoping that working on both of them i'll be able to fine tune the design a bit and help with debugging - e.g. i already fixed the end-of-file bugs and "discovered" the av_frame_get_best_effort_timestamp() function (guh, what a name, bad memories of GNOME dev flooding back).

So far I just have unscaled video going (no sound). There are no controls but I should think JavaFX will make that pretty easy to add (and an opportunity to add some bling). But it can play multiple videos in sequence and the window sizes to fit without eye-jars. Once I get sound going i'll look at filling out the GUI for it - I prefer the OpenAL API but I might try with JavaSound this time.

Performance seems ok. My first measurements had it on par with mplayer with the GL backend, but that seemed to apply to one specific video (actually it's a touch lower cpu usage on that video). Uses 3-4x the memory, but that's cheap isn't it? Unfortunately, as one can only write RGB data to a WritableImage, it has to do a YUV-RGB step separately and then perform a redundant copy, but there's not much option there.

No screenshot yet as it's pretty basic ...

The code (GPL3) will be in the jjmpeg-1.0 branch of jjmpeg by the time you read this or shortly thereafter inside jjmpeg-javafx.

Tagged hacking, javafx, jjmpeg.
Friday, 04 January 2013, 04:28

JJPlayer 1.0-a1

I decided I had enough hacked into the JJPlayer code to warrant a release - this one is actually (somewhat) usable as a player now, unlike the previous ones which were totally broken. I'm getting a bit bored with it so it seemed as good a point as any.

My ainol elf 2 can manage a test 720p h.264 file pretty well - with only the occasional group of stutters on complex scenes with an eventual catch up. Unfortunately the Mele can barely handle PAL MPEG ... Update: I was working on some decoding-skipping throttle code, and for whatever reason, the Mele can now handle PAL MPEG just fine (nothing to do with the new code). No idea ... maybe i/o related - the Mele seems to go funny after suspend too.

I managed to fit in a few useful things in no particular order:

There are still some unsolved issues wrt audio sync, pixels are assumed square, the possibility of more aggressive frame dropping (i.e. not decoding B frames and/or I frames, etc), very inefficient memory use (e.g. i have 31 AVFrame buffers and 31 YUV textures although I only need 2 of the latter), performance, jarring UI manoeuvres when the UI is shown/hidden, and many others I can't be bothered enumerating at this point. And that doesn't include the 'missing features' like a pause button or subtitles.

Most of the Android specific code is a huge mess - experiments are still on-going.

See the downloads link in the jjmpeg project.

But if you're after a more polished free software Android video player, go look at Dolphin Player. Although that seems to have some audio glitches and drift, and is implemented afaict as a native SDL app ported to Android rather than as a Java player.

Whilst writing this my ADSL network went down - probably heat related - so maybe that's a hint to go drink beer instead of hack. Might start with the coffee i brewed and forgot to drink this morning - that'll make a nice iced coffee. Mmm, blenders.

We're in the middle of a heat-wave and I should probably be at the beach or something - but the house is (relatively) cool and I can get out and water the garden to keep it alive here ... meant to be 44 degrees today, but that isn't even enough to get a record. Yesterday I got out an IR thermometer and measured 75 degrees on the lid of a plastic green sulo bin that isn't even in full-sun all day. Today has the added bonus of being windy too - feels like being inside a fan-forced oven when you walk outside. Maybe it's better not to go riding in such weather ...

Tagged android, jjmpeg.
Thursday, 03 January 2013, 03:15

NEON YUV vs GPU

This morning I did some experiments with Android and the YUV code - although patience is wearing thin for such a shitty alternative to GNU/Linux that Android is. As icing on the cake most of the android developer site just doesn't render on most of my browsers anymore - I just get junk. Well I can always go elsewhere with my spare time ...

I changed the code to perform a simple doubling up of the U and V components without a separate pass, and changed to an RGB 565 output stage and embedded it into the code in another mess of crap. Then I did some profiling - comparing mainly to the frame-copying version.

Interestingly it is faster than sending the YUV planes to the GPU and using it to do the YUV conversion - and that is only including the CPU time for the frame copy/conversion, and the texture load. i.e. even using NEON it uses less CPU time (and presumably much less GPU time) even though it's doing more work. The volume of texture memory copied is also 33% more for the RGB565 case vs the YUV420p one.
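The 33% figure follows directly from the pixel formats: YUV420p averages 1.5 bytes per pixel and RGB565 uses 2. A small sketch of the arithmetic (the class and method names are just for illustration, and the 720x576 frame size is an example value):

```java
/** Bytes needed for one frame in each format; hypothetical helper for illustration. */
final class FrameBytes {
    /** YUV420p: a full-size Y plane plus quarter-size U and V planes, 1 byte per sample. */
    static long yuv420p(int width, int height) {
        long chroma = (long) ((width + 1) / 2) * ((height + 1) / 2);
        return (long) width * height + 2 * chroma;
    }

    /** RGB565: one 16-bit value per pixel. */
    static long rgb565(int width, int height) {
        return 2L * width * height;
    }
}
```

For a 720x576 frame that is 622080 bytes of YUV420p against 829440 bytes of RGB565, a ratio of exactly 4:3.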

Still, 1ms isn't very much out of 10 or so.

The actual YUV420p to RGB565 conversion is only around 1/2 the speed of a simple AVFrame.copy() - ok considering it's writing 33% more data and I didn't try to optimise the scheduling.

Stop press: Whilst writing this I thought i'd look at the scheduling and also using the saturating left shift to clamp the values implicitly. Got the inner loop down from 54 to 35 cycles (according to the cycle counter), although it only runs about 10% faster. Better than a kick in the nuts at any rate. Fortunately due to the way I already used registers I could decouple the input loading/formatting from the calculations, so i simply interleaved the next block of data load within the calculations wherever there were delay slots and only made the data loading conditional.

The (unscheduled) output stage now becomes:

        @ saturating left shift automatically clamps to unsigned [0,0xffff]
        vqshlu.s16      q8,#2           @ red in upper 8 bits
        vqshlu.s16      q9,#2
        vqshlu.s16      q10,#2          @ green in upper 8 bits
        vqshlu.s16      q11,#2
        vqshlu.s16      q12,#2          @ blue in upper 8 bits
        vqshlu.s16      q13,#2

        vsri.16         q8,q10,#5       @ insert green
        vsri.16         q9,q11,#5
        vsri.16         q8,q12,#11      @ insert blue
        vsri.16         q9,q13,#11

        vst1.u16        { d16,d17,d18,d19 },[r3]!

Which saves all those clamps.

As suspected, the 8 bit arithmetic leads to a fairly low quality result, although the non-dithered RGB565 can't help either. Perhaps using shorts could improve that without much impact on performance. Still, it's passable for a mobile device given the constraints (and source material), but it isn't much chop on a big tv.

Of course, all this wouldn't be necessary if one had access to the overlay framebuffer hardware present on pretty well all ARM SOCs ... but Android doesn't let you do that does it ...

Update: I've checked a couple of variations of this into yuv-neon.s, although i'm not using it in the released JJPlayer yet.

Mele vs Ainol Elf II

The Elf is much faster than the Mele at almost everything - particularly video decoding (which uses multiple threads), but pretty much everything else is faster (Better memory? The Cortex-A9? The GPU?) and with the dual-cores means it just works a lot better. Can't be good for the battery though.

(as an aside, someone who spoke english should've told the guys in China that "anal elf 2" is probably not a good name for a computer!)

But the code is written with multiple cores in mind - demux, decoding of video and audio, and presentation is all executed on separate threads. Having all of the cpu-bound tasks executed in a single thread may help on the Mele, although by how much I will only know if and when I do it ...

Tagged android, beagle, hacking, jjmpeg.
Wednesday, 02 January 2013, 06:31

NEON yuv + scale

Well I still haven't checked the jjmpeg code in but I did end up playing with NEON yuv conversion yesterday, and a bit more today.

The YUV conversion alone for a 680x480 frame on the beagleboard-xm is about 4.3ms, which is ok enough. However with bi-linear scaling to 1024x600 as well it blows out somewhat to 28ms or so - which is definitely too slow.

Right now it's doing somewhat more work than it needs to - it's scaling two rows each time in X so it can feed into the Y scaling. Perhaps this could be reduced by about half (depending on the scaling going on), which might knock about 10ms off the processing time (assuming no funny cache interactions going on) which is still too slow to be useful. I'm a bit bored with it now and don't really feel like trying it out just yet.

Maybe the YUV only conversion might still be a win on Android though - if loading an RGB texture (or an RGB 565 one) is significantly faster than the 3x greyscale textures i'm using now. I need to run some benchmarks there to find out how fast each option is, although that will have to wait for another day.

yuv to rgb

The YUV conversion code is fairly straightforward in NEON, although I used 2:6 fixed-point for the scaling factors so I could multiply the 8 bit pixel values directly. I didn't check to see if it introduces too many errors to be practical mind you.

I got the constants and the maths from here.

        @ pre-load constants
        vmov.u8 d28,#90                 @ 1.402 * 64
        vmov.u8 d29,#113                @ 1.772 * 64
        vmov.u8 d30,#22                 @ 0.34414 * 64
        vmov.u8 d31,#46                 @ 0.71414 * 64

The main calculation is performed using 2.14 fixed-point signed mathematics, with the Y value being pre-scaled before accumulation. For simplification the code assumes YUV444 with a separate format conversion pass if required, and if executed per row that pass should be cheap through L1 cache.

        vld1.u8 { d0, d1 }, [r0]!       @ y is 0-255
        vld1.u8 { d2, d3 }, [r1]!       @ u is to be -128-127
        vld1.u8 { d4, d5 }, [r2]!       @ v is to be -128-127

        vshll.u8        q10,d0,#6       @ y * 64
        vshll.u8        q11,d1,#6

        vsub.s8         q1,q3           @ u -= 128 (q3 assumed pre-loaded with 128 in every byte)
        vsub.s8         q2,q3           @ v -= 128
        
        vmull.s8        q12,d29,d2      @ u * 1.772
        vmull.s8        q13,d29,d3

        vmull.s8        q8,d28,d4       @ v * 1.402
        vmull.s8        q9,d28,d5

        vadd.s16        q12,q10         @ y + 1.772 * u
        vadd.s16        q13,q11
        vadd.s16        q8,q10          @ y + 1.402 * v
        vadd.s16        q9,q11

        vmlsl.s8        q10,d30,d2      @ y -= 0.34414 * u
        vmlsl.s8        q11,d30,d3
        vmlsl.s8        q10,d31,d4      @ y -= 0.71414 * v
        vmlsl.s8        q11,d31,d5

And this neatly leaves the 16 RGB result values in order in q8-q13.

They still need to be clamped which is performed in the 2.14 fixed point scale (i.e. 16383 == 1.0):

        vmov.u8         q0,#0
        vmov.u16        q1,#16383

        vmax.s16        q8,q0
        vmax.s16        q9,q0
        vmax.s16        q10,q0
        vmax.s16        q11,q0
        vmax.s16        q12,q0
        vmax.s16        q13,q0
        
        vmin.s16        q8,q1
        vmin.s16        q9,q1
        vmin.s16        q10,q1
        vmin.s16        q11,q1
        vmin.s16        q12,q1
        vmin.s16        q13,q1

Then the fixed point values need to be scaled and converted back to byte:

        vshrn.i16       d16,q8,#6
        vshrn.i16       d17,q9,#6
        vshrn.i16       d18,q10,#6
        vshrn.i16       d19,q11,#6
        vshrn.i16       d20,q12,#6
        vshrn.i16       d21,q13,#6

And finally re-ordered into 3-byte RGB triplets and written to memory. vst3.u8 does this directly:

        vst3.u8         { d16,d18,d20 },[r3]!
        vst3.u8         { d17,d19,d21 },[r3]!

vst4.u8 could also be used to write out RGBx, or the planes kept separate if that is more useful.
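For checking the NEON output against something readable, a scalar version of the same fixed-point steps is handy. The following is a sketch in Java, one pixel at a time. The class and method names are made up; only the constants and the order of operations follow the assembly above.

```java
/** Scalar reference for the fixed-point YUV to RGB steps; names are hypothetical. */
final class YuvFixedPoint {
    // 2.6 fixed-point coefficients, i.e. round(k * 64), as in the pre-loaded constants
    static final int KRV = 90, KBU = 113, KGU = 22, KGV = 46;

    /** Clamp a 2.14 fixed-point value to [0, 16383] and scale it back to 0-255. */
    static int toByte(int fixed) {
        return Math.max(0, Math.min(16383, fixed)) >> 6;
    }

    /** Convert one pixel; y, u, v are unsigned bytes (0-255). Returns 0xRRGGBB. */
    static int toRgb(int y, int u, int v) {
        int ys = y << 6;          // y * 64
        int us = u - 128;         // centre chroma on zero
        int vs = v - 128;
        int r = toByte(ys + KRV * vs);              // y + 1.402 * v
        int g = toByte(ys - KGU * us - KGV * vs);   // y - 0.34414 * u - 0.71414 * v
        int b = toByte(ys + KBU * us);              // y + 1.772 * u
        return (r << 16) | (g << 8) | b;
    }
}
```

Neutral chroma (u = v = 128) passes the luma straight through, so toRgb(128, 128, 128) gives 0x808080. Comparing this against a floating-point conversion over a range of inputs would show how much error the 8-bit coefficients introduce.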

Again, perhaps the 8x8 bit multiply is pushing it in terms of accuracy, although it's a fairly simple matter to use shorts instead. If shorts were used then perhaps the saturating doubling returning high half instructions could be used too, to avoid at least the input and output scaling.

Stop Press

As happens when one is writing this kind of thing I noticed that there is a saturating shift instruction - and as it supports signed input and unsigned output, it looks like it should allow me to remove the clamping code entirely if I read it correctly.

This leads to the following combined clamping and scaling stage:

        vqshrun.s16     d16,q8,#6
        vqshrun.s16     d17,q9,#6
        vqshrun.s16     d18,q10,#6
        vqshrun.s16     d19,q11,#6
        vqshrun.s16     d20,q12,#6
        vqshrun.s16     d21,q13,#6

Which appears to work on my small test case. This drops the test case execution time down to about 3.9ms.
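A scalar model suggests why dropping the explicit clamp is safe. The sketch below (hypothetical names, plain Java) compares "clamp to [0, 16383] then shift" with "arithmetic shift then saturate to an unsigned byte", which is my reading of what the saturating narrowing shift does, and checks them against each other for every 16-bit input.

```java
/** Scalar model of the two clamping strategies; names are hypothetical. */
final class SaturatingNarrow {
    /** Two-step version: clamp to [0, 16383], then shift right by 6. */
    static int clampThenShift(short x) {
        return Math.max(0, Math.min(16383, x)) >> 6;
    }

    /** One-step version: arithmetic shift right by 6, then saturate to [0, 255]. */
    static int shiftThenSaturate(short x) {
        return Math.max(0, Math.min(255, x >> 6));
    }

    /** True if both versions agree for every possible signed 16-bit input. */
    static boolean agreeForAllInputs() {
        for (int i = Short.MIN_VALUE; i <= Short.MAX_VALUE; i++) {
            if (clampThenShift((short) i) != shiftThenSaturate((short) i)) {
                return false;
            }
        }
        return true;
    }
}
```

This only models the arithmetic; it says nothing about the actual instruction timing.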

And given that replacing the yuv2rgb step with a memcpy of the same data (all else being equal - i.e. yuv420p to yuv444 conversion) still takes over 3.7ms, that isn't too shabby at all.

RGB 565

An alternative scaling & output stage (after the clamping) could produce RGB 565 directly (I haven't checked this code works yet):

        vshl.i16        q8,#2           @ red in upper 8 bits
        vshl.i16        q9,#2
        vshl.i16        q10,#2          @ green in upper 8 bits
        vshl.i16        q11,#2
        vshl.i16        q12,#2          @ blue in upper 8 bits
        vshl.i16        q13,#2

        vsri.16         q8,q10,#5       @ insert green
        vsri.16         q9,q11,#5
        vsri.16         q8,q12,#11      @ insert blue
        vsri.16         q9,q13,#11

        vst1.u16        { d16,d17,d18,d19 },[r3]!
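Since that code is untested, a scalar model is one way to reason about the bit layout. The Java sketch below uses hypothetical names and relies on my understanding of the shift-right-and-insert semantics: the top n bits of the destination are kept and the source shifted right by n fills the rest.

```java
/** Scalar model of the RGB565 packing stage; names are hypothetical. */
final class Rgb565Pack {
    /** Model of "vsri.16 dst, src, #n": keep the top n bits of dst, insert src >> n below. */
    static int vsri16(int dst, int src, int n) {
        int keep = (0xFFFF << (16 - n)) & 0xFFFF;
        return (dst & keep) | ((src & 0xFFFF) >>> n);
    }

    /** r, g, b are already-clamped 2.14 fixed-point values in [0, 16383]. */
    static int pack(int r, int g, int b) {
        int r16 = (r << 2) & 0xFFFF;   // 8 significant bits now at the top
        int g16 = (g << 2) & 0xFFFF;
        int b16 = (b << 2) & 0xFFFF;
        int rg = vsri16(r16, g16, 5);  // RRRRRGGG GGGGGGGG
        return vsri16(rg, b16, 11);    // RRRRRGGG GGGBBBBB
    }
}
```

With full-scale inputs the three channels land in 0xF800, 0x07E0 and 0x001F respectively, which is the usual RGB565 layout.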

Tagged android, beagle, hacking, jjmpeg.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!