rmi

So i'm a bit stuck with a problem at work: we're using some crappy obsolete API that only works in 32-bit mode, but the rest of the application needs to run in 64-bit to fully utilise the hardware. Bit of a shit really - it's unclear if there is a 64-bit API available - I think there is - but I don't have access to the device and he who does is currently travelling ...

As a fall-back in case there isn't another way I hacked up a quick system based on RMI - and I was surprised how easy it was considering I started from scratch. I think it helped that I already had a relatively simple API that abstracted different devices, so I just wrote a proxy object in a few dozen lines of code. And of course it helped that I've played with CORBA before and know how it all works.

The nice thing about RMI is it lets you send classes across the wire (with local processing behaviour as well), which meant I could implement some backend dependent code like getControls() in the GUI front-end process with only a few minor changes to use a more generic property interface.
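For illustration only, here is a minimal sketch of that kind of proxy, not the actual work code. The names `RemoteDevice`, `DeviceServer` and `Control` are made up, both ends are assumed to have the same classes on their classpath, and the server and client halves run in one JVM here just to keep it self-contained. `Control` is serializable, so it travels by value and its `clamp()` method runs in the caller's process.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

class RmiProxySketch {
    /** Hypothetical property class: serialised and sent by value. */
    static class Control implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int min, max;

        Control(String name, int min, int max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }

        /** Local behaviour: runs in whichever JVM holds the copy. */
        int clamp(int v) {
            return Math.max(min, Math.min(max, v));
        }
    }

    /** Hypothetical remote view of the device abstraction. */
    public interface RemoteDevice extends Remote {
        List<Control> getControls() throws RemoteException;
    }

    /** Would live in the 32-bit process, next to the legacy API. */
    static class DeviceServer implements RemoteDevice {
        @Override
        public List<Control> getControls() {
            List<Control> list = new ArrayList<>();
            list.add(new Control("gain", 0, 10));
            list.add(new Control("exposure", 1, 1000));
            return list;
        }
    }

    public static void main(String[] args) throws Exception {
        // "32-bit" side: export the object and publish it in a registry.
        DeviceServer server = new DeviceServer();
        RemoteDevice stub = (RemoteDevice) UnicastRemoteObject.exportObject(server, 0);
        Registry registry = LocateRegistry.createRegistry(Registry.REGISTRY_PORT);
        registry.rebind("device", stub);

        // "64-bit" side: look the proxy up and use it like a local object.
        RemoteDevice device = (RemoteDevice) LocateRegistry
                .getRegistry("localhost", Registry.REGISTRY_PORT).lookup("device");
        for (Control c : device.getControls()) {
            System.out.println(c.name + " clamp(42) = " + c.clamp(42));
        }

        // Unexport so the JVM can exit.
        UnicastRemoteObject.unexportObject(server, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}
```

In a real split the two halves of `main` would sit in separate processes, with the 32-bit JVM started first.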

No doubt it could be (much) more efficient, but I think it should suffice for this project. It's something i've been meaning to try for a while and if this experience is any guide it won't be the last either.

Yeah, i'm happy (as happy as i get anyway) to be hacking again after being stuck in maths for so long, although we had a phone meeting today and I've some more ideas to try so it's not over yet.

jjmpeg video creation

Been a bit quiet around here lately - actually it hasn't at all. Building going on next door means I haven't had a decent sleep in a couple of weeks and it's really wearing me down. Add to that I was stuck for a few weeks trying to grok some hairy maths for work and I just haven't had the inclination nor energy to pursue much other than eating, drinking, and some TV.

But back OT, I've started looking at moving my client's application to jjmpeg - as I need 64-bit microsoft windows to work, and i'm also having some trouble with gc load from lots of transient objects.

Getting video frame reading going was trivial but I had to code up a fair bit of extra stuff to be able to create videos in proper containers which is the other requirement I have. I've checked in a first cut at that - although I need to do more testing particularly wrt GC performance.

I tried to come up with a helper class with a nice API to use it, and the following demonstrates its use:

        AVFormatContext.registerAll();

        JJMediaWriter writer = new JJMediaWriter(filename);
        JJVideoStream vstream = writer.addVideoStream(width, height, fps, 400000);
        BufferedImage image = vstream.createImage();

        writer.open();

        Graphics2D gg = image.createGraphics();
        gg.setBackground(Color.black);
        gg.setColor(Color.white);
        for (int i = 0; i < fps * 5; i++) {
                gg.clearRect(0, 0, width, height);
                gg.drawString("Moving Text!", i, i);
                vstream.addFrame(image);
        }

        writer.close();

Well I can't imagine it being much simpler than that. This also (for the most part) avoids transient object creation, so should be (relatively) efficient.
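To make the "no transient objects" point concrete, here is a rough sketch using only standard AWT classes. `FrameSink` is a hypothetical stand-in for the video stream's `addFrame()` and is not part of jjmpeg. The one image and one graphics context are allocated up front and reused for every frame, so the loop itself allocates nothing.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

class FrameReuseSketch {
    /** Hypothetical stand-in for something like a video stream's addFrame(). */
    interface FrameSink {
        void addFrame(BufferedImage frame);
    }

    /** Renders 'frames' frames into a single reused image and hands each to the sink. */
    static void render(int width, int height, int frames, FrameSink sink) {
        // Allocated once, outside the loop.
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_3BYTE_BGR);
        Graphics2D gg = image.createGraphics();
        try {
            gg.setBackground(Color.black);
            gg.setColor(Color.white);
            for (int i = 0; i < frames; i++) {
                gg.clearRect(0, 0, width, height);
                gg.fillRect(i, i, 8, 8); // a moving square instead of text
                sink.addFrame(image);    // same object every time
            }
        } finally {
            gg.dispose();
        }
    }

    public static void main(String[] args) {
        render(64, 64, 25, frame ->
                System.out.println("frame " + System.identityHashCode(frame)));
    }
}
```

The catch with this approach is that the sink has to finish with (or copy) the pixels before the next iteration overwrites them.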

Unfortunately things aren't so clean under the bonnet, but I guess what you don't see won't hurt you, will it?

The full example is in VideoWriter.java

Bad typesetting, and evince sucks.

I've been reading a lot of papers of late, and any that aren't typeset using TeX are blatantly obvious and hard to read. Even very old papers are much easier to read than some of the modern ones - I wonder how they are being formatted. The microsoft word ones are really obvious - and utterly awful - but there are others which are pretty crap too. The mathematics is the obvious really low-point, but even the typefaces and layouts are pretty rank.

It is however quite a pleasure when I do come across a properly typeset manuscript in the familiar CMR typeface.

I am also utterly fed up with evince. It refuses to remember anything I ask it - from the location I save files to, to the size of the window, and the lack of global close is just annoying. I had a very wide two-page book I just played with at full two-monitor wide size ONE FUCKING TIME a few days ago, and Now every time I open EVERY document, evince opens at the same ginormous size. Fucking ticking me off big-time.

I even tried mupdf - which is just a bit bare-bones for my liking (although it renders much better), and discovered jmupdf along the way, whose tiny demo test application is almost as 'featureful' as evince (not that that means much ...), so I might feel another project coming on ... apart from reading them, i'm starting to get a sizeable collection of papers and it's hard to keep track of them based-on-the-title-in-the-filename.

Fucked up Fridays

What is it about Fridays lately ... Well the latest little thing to ruin my day has been the inability of Firefox 7 to function correctly with the primary selection.

~~It seems to want to ignore middlemouse.contentLoadURL for some reason~~. Given that it's a relatively new setting and fully documented I presume it's just a bug, but what a pain.

It's not something I use constantly but discovering it doesn't work is pretty annoying.

Update: So now it decides it's going to work. Well what can I say ... except maybe that I need to get AFK more often.

I'm totally sick of the upgrade treadmill and feel somewhat annoyed by being forced to install a newer version of Fedora just to get my graphics card working. I had everything working just nicely and was familiar enough with any of the warts left to not notice them. And now I have to go through all that crap again. The thought that firefox will become 'versionless' horrifies me, as does the love-fest that is HTML5+JavaScript where I will no longer be able to ignore CO2 belching crap like I can now by just disabling flash.

socles demos

I finally got off my fat arse - or is that sat on it further enlargening[sic] it - and tied up some of the test driver code I have for socles into a set of demos.

I also implemented the colour mode for the DCT denoising algorithm. Overall it's still a little slow - i.e. not fast enough for real-time video. One of these days i'll get around to the complex wavelet version, which should be a lot faster and can also sharpen. I haven't been able to suss out DCT sharpening and so far my attempts add too many artefacts to be useful (i.e. a pixel-level chess pattern).

The demos so far are:

AdaptiveBlur: An interactive window that shows an experimental algorithm I came up with some time ago for de-noising. It uses a sobel filter to detect edges, then uses that to progressively blend between a blurred and non-blurred image. Works ok sometimes.
ConvolveNonSeparable: Simple non-separable convolution that blurs an image.
ConvolveSeparable: Separable convolution to do the same thing (~~and demonstrates the code is broken atm~~ - demo was broken, fixed)
DCT8x8Mono, DCT8x8Colour: Interactive DCT based denoise demo for mono/colour images.
WebcamFX: Another old interactive demo I wrote which uses Video4Linux to access a webcam and apply a bunch of effects including KLT motion detection and viola-jones face detect. It also shows the first half of a low-overhead video display path: the GPU does the colour conversion from raw frames. Well as low as possible with v4l4j anyway.
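As a rough idea of what the AdaptiveBlur description amounts to, here is a plain-Java CPU sketch (the real demo is OpenCL). The 3x3 box blur, the `edgeScale` parameter and the linear blend are my assumptions for illustration, not necessarily what the socles code does.

```java
class AdaptiveBlurSketch {
    /** Reads a pixel with coordinates clamped to the image edge. */
    static float at(float[] img, int w, int h, int x, int y) {
        x = Math.max(0, Math.min(w - 1, x));
        y = Math.max(0, Math.min(h - 1, y));
        return img[y * w + x];
    }

    /**
     * Blends between a blurred and the original pixel by Sobel edge strength:
     * flat areas get the blur, strong edges keep the original.
     * edgeScale (assumed) maps gradient magnitude to a 0..1 blend factor.
     */
    static float[] adaptiveBlur(float[] src, int w, int h, float edgeScale) {
        float[] out = new float[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // 3x3 box blur
                float blur = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        blur += at(src, w, h, x + dx, y + dy);
                blur /= 9;

                // Sobel gradients
                float gx = (at(src, w, h, x + 1, y - 1) + 2 * at(src, w, h, x + 1, y) + at(src, w, h, x + 1, y + 1))
                         - (at(src, w, h, x - 1, y - 1) + 2 * at(src, w, h, x - 1, y) + at(src, w, h, x - 1, y + 1));
                float gy = (at(src, w, h, x - 1, y + 1) + 2 * at(src, w, h, x, y + 1) + at(src, w, h, x + 1, y + 1))
                         - (at(src, w, h, x - 1, y - 1) + 2 * at(src, w, h, x, y - 1) + at(src, w, h, x + 1, y - 1));

                float edge = Math.min(1f, (float) Math.sqrt(gx * gx + gy * gy) * edgeScale);
                float orig = src[y * w + x];
                out[y * w + x] = blur + (orig - blur) * edge;
            }
        }
        return out;
    }
}
```

With `edgeScale` at 0 this degenerates to a plain box blur; very large values leave anything with a gradient untouched.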

They're in the soclesdemo sub-module in socles' cvs.

Hmm, another week nearly down. I've been reading lots of papers and trying to suss out some fiddly crap for work, so this stuff has been a nice distraction. That's finally going somewhere so might keep me busy for a bit.

GC, finalisers

So I was doing some memory profiling the other day (using netbeans' excellent profiler - boy I could've used this 10 years ago) to try to track down some resource leakages and I noticed that xuggle was really exercising the system heavily.

So it seems I might look at moving to use jjmpeg in my client's application fairly soon. There are some other reasons as well: i.e. not being able to run in a 64-bit JVM on microsoft windows is starting to become a problem, and the bundled ffmpeg is just a bit out of date.

Since I haven't implemented memory handling completely in jjmpeg I went about looking how to do it 'properly'. I was just going to try to use finalisers, but then I came across this article on java finalisers which said it probably wasn't a good idea.

I was going to have a short look this morning but suddenly it was 4 hours later and although I had something which works i'm not sure yet that I like it. It seems the cleanest way to implement the suggestions of using weak references, and mixing the auto-generated and hand-crafted code I want, so I will probably end up running with it. The public api didn't need to change.

Previously, the binding worked with an object class hierarchy something like this

 AVNative [
   ByteBuffer p (points to allocated/mapped native memory)
 ]
   +- AVFormatContextAbstract [
    Generated field accessors and native methods
    Most methods are object methods
   ]
    +- AVFormatContext [
      Public factory methods/constructors
      Hand-coded specific methods
      Hand-coded helper native methods
      Hand-coded finalise/dispose methods
    ]

The new structure:

WeakReference<AVObject>
+- AVNative [
   ByteBuffer p pointing to native memory
   internal dispose() method
   weak reference queue/cleanup as from article above
   Weak reference is AVObject
 ]
 +- AVFormatContextNativeAbstract [
   Generated field accessors and native methods
   All methods and field accessors are static
   ]
   +- AVFormatContextNative [
     Hand-coded helper native methods
     Implements native resource dispose
   ]

Together with

AVObject [
  AVNative n (the pointer to the native wrapper object)
  public dispose method
  ]
  +- AVFormatContextAbstract [
      Generated public access methods which use AVFormatContextNative(Abstract) methods.
    ]
    +- AVFormatContext [
      Public factory methods/constructors
      Hand-coded specific methods
      ]

So yeah - a bit more complicated, and it requires 2 objects for each instance (and often 3 including the C side instance it's wrapping), as well as the overhead of the weakreference instance data and the list entry for tracking the references. The extra layer of indirection also adds another method invocation/stack frame to every method call.

On the other hand, it lets the client code use dispose() when it wants to, or if it forgets then dispose will automatically be called eventually. And makes it obvious in the code where dispose needs to sit.
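A stripped-down sketch of the pattern, with made-up names (`NativeHandle` standing in for the AVNative side, `PublicObject` for the AVObject side) and a counter in place of actually freeing native memory. It isn't the jjmpeg code, just the shape of it: explicit dispose works immediately, and anything forgotten turns up on the reference queue once its public object has been collected.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class WeakDisposeSketch {
    /** Counts releases; a stand-in for freeing native memory. */
    static final AtomicInteger released = new AtomicInteger();

    /** Native-side wrapper: a weak reference to the public object that owns it. */
    static class NativeHandle extends WeakReference<PublicObject> {
        static final ReferenceQueue<PublicObject> queue = new ReferenceQueue<>();
        // Keeps handles strongly reachable until they have been disposed.
        static final Set<NativeHandle> live = Collections.synchronizedSet(new HashSet<>());

        private boolean disposed;

        NativeHandle(PublicObject owner) {
            super(owner, queue);
            live.add(this);
        }

        /** Releases the resource exactly once, however it is reached. */
        synchronized void dispose() {
            if (disposed) {
                return;
            }
            disposed = true;
            live.remove(this);
            released.incrementAndGet();
        }

        /** Disposes handles whose owners have been garbage collected. */
        static int reap() {
            int n = 0;
            NativeHandle h;
            while ((h = (NativeHandle) queue.poll()) != null) {
                h.dispose();
                n++;
            }
            return n;
        }
    }

    /** Public-side object: client code only ever sees this. */
    static class PublicObject {
        final NativeHandle n = new NativeHandle(this);

        public void dispose() {
            n.dispose();
        }
    }
}
```

Something still has to call `reap()` now and then, e.g. whenever a new object is created or from a housekeeping thread.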

As usual it's a question of trade-offs. If the article is correct then presumably these trade-offs are worth it.

In this case the whole point of using jjmpeg is to avoid numerous allocations every frame anyway: I can allocate working and output buffers once and just use them directly. The actual number of objects is quite small and they aren't allocated very often, so I suspect that either mechanism would work about as well as the other.

Well this distraction has blown my morning away; I'd better leave it for now so I can clock up some work hours after lunch.

Update: I figured i'd gone too far down this route to do anything other than keep it. I've checked this in now as well as a bunch of other stuff described on the project page.

Update 2: Oracle keeps breaking links, but i've updated the pointer. I'm looking at this again (September 2012) because of some issues in jjmpeg.

OpenCL DCT Denoise

I've just checked in an OpenCL implementation of the DCT de-noising algorithm I mentioned previously. I've only done the mono version so far.

It's not terribly fast - 10ms wall-clock for a 512x512 mono image, and given that it requires 64 DCTs per 8x8 block and needs to accumulate the results, it probably never will be.
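For reference, this is roughly what the work looks like as a naive CPU sketch in Java. It's my reading of the sliding-window DCT approach, not the OpenCL kernel, and the hard threshold is an assumption: an 8x8 DCT is taken at every pixel offset, small coefficients are zeroed, and the inverse results are accumulated and averaged, which is where the 64 overlapping transforms per block come from.

```java
class DctDenoiseSketch {
    static final int N = 8;
    // Orthonormal DCT-II basis: C[k][n] = a(k) * cos((2n + 1) * k * pi / 2N)
    static final float[][] C = new float[N][N];
    static {
        for (int k = 0; k < N; k++) {
            double a = Math.sqrt((k == 0 ? 1.0 : 2.0) / N);
            for (int n = 0; n < N; n++) {
                C[k][n] = (float) (a * Math.cos((2 * n + 1) * k * Math.PI / (2 * N)));
            }
        }
    }

    /** 8x8 matrix product; ta/tb select the transpose of a/b. */
    static float[][] mul(float[][] a, boolean ta, float[][] b, boolean tb) {
        float[][] r = new float[N][N];
        for (int i = 0; i < N; i++) {
            for (int j = 0; j < N; j++) {
                float s = 0;
                for (int k = 0; k < N; k++) {
                    s += (ta ? a[k][i] : a[i][k]) * (tb ? b[j][k] : b[k][j]);
                }
                r[i][j] = s;
            }
        }
        return r;
    }

    /** Mono image, row-major, w and h at least 8. */
    static float[] denoise(float[] src, int w, int h, float threshold) {
        if (w < N || h < N) {
            throw new IllegalArgumentException("image must be at least 8x8");
        }
        float[] sum = new float[w * h];
        float[] count = new float[w * h];
        float[][] block = new float[N][N];
        for (int y0 = 0; y0 + N <= h; y0++) {
            for (int x0 = 0; x0 + N <= w; x0++) {
                for (int y = 0; y < N; y++)
                    for (int x = 0; x < N; x++)
                        block[y][x] = src[(y0 + y) * w + x0 + x];
                // Forward transform: Y = C X C^T
                float[][] coef = mul(mul(C, false, block, false), false, C, true);
                // Hard threshold: drop small coefficients
                for (int y = 0; y < N; y++)
                    for (int x = 0; x < N; x++)
                        if (Math.abs(coef[y][x]) < threshold) coef[y][x] = 0;
                // Inverse transform: X = C^T Y C
                float[][] rec = mul(mul(C, true, coef, false), false, C, false);
                // Accumulate the overlapping estimates
                for (int y = 0; y < N; y++) {
                    for (int x = 0; x < N; x++) {
                        int i = (y0 + y) * w + x0 + x;
                        sum[i] += rec[y][x];
                        count[i] += 1;
                    }
                }
            }
        }
        float[] out = new float[w * h];
        for (int i = 0; i < out.length; i++) {
            out[i] = sum[i] / count[i];
        }
        return out;
    }
}
```

A threshold of 0 keeps every coefficient, so the output should match the input to within rounding.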

The kernel source.

Update: Colour version implemented now.

It's beaten me. For now.

I should've stayed outside in the sun today gardening - but curiosity got the better of me. I hope the (absolutely stunning) weather continues tomorrow, otherwise i've blown it on nothing ...

I tried working on the AMD performance of the Viola & Jones detector in socles: I tried a whole bunch of stuff, from copying the image tiles pre-scaled (as summed area table) to local memory, to completely re-arranging the data structures so they are workgroup aligned, to even trying the cpu single-thread-per-location version.

I got some minor improvement, the most coming from copying the tile to local store and removing some of the calculations (since it doesn't need to scale the rects): but that only took a simple test case from about 25ms to 20ms. Barely noticeable in my webcam test harness.

I think the problem is with the fact it has to read so much data for each single test. It requires 3-4 uint4's just to describe the test, and 8-12 uint texture lookups for the summed area table lookups. The cascade I have has ~6 400 regions to test grouped in ~3 000 features, and although most aren't tested it's just a lot of data. It's too much for constant memory for example.
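The 8-12 lookups fall straight out of how a summed area table works: each rectangle sum costs 4 reads, and a feature is made of 2-3 rectangles. A small Java sketch of that (plain CPU code; the layout with an extra zero row and column, and the `twoRectFeature` helper, are my choices for illustration, not the socles data structures):

```java
class SummedAreaSketch {
    /**
     * Builds a (w+1) x (h+1) table where entry (x, y) holds the sum of all
     * pixels in the rectangle [0, x) x [0, y). Row 0 and column 0 are zero.
     */
    static int[] build(int[] img, int w, int h) {
        int stride = w + 1;
        int[] sat = new int[stride * (h + 1)];
        for (int y = 0; y < h; y++) {
            int row = 0; // running sum along the current row
            for (int x = 0; x < w; x++) {
                row += img[y * w + x];
                sat[(y + 1) * stride + x + 1] = sat[y * stride + x + 1] + row;
            }
        }
        return sat;
    }

    /** Sum of the rw x rh rectangle at (x, y): exactly 4 table lookups. */
    static int rectSum(int[] sat, int w, int x, int y, int rw, int rh) {
        int stride = w + 1;
        return sat[(y + rh) * stride + x + rw]
             - sat[y * stride + x + rw]
             - sat[(y + rh) * stride + x]
             + sat[y * stride + x];
    }

    /** Left rectangle minus the one to its right: 2 rects, 8 lookups. */
    static int twoRectFeature(int[] sat, int w, int x, int y, int rw, int rh) {
        return rectSum(sat, w, x, y, rw, rh) - rectSum(sat, w, x + rw, y, rw, rh);
    }
}
```

A three-rectangle feature is the same thing with one more `rectSum`, hence 12 reads.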

With a fix to use the atomic counters AMD hardware provides at least it's now in the same order of magnitude as the nvidia hardware, but still 2-4x slower.

Maybe ... if the stages were broken up into smaller parts it could work more efficiently, but it does seem a pretty long shot to me as the problem remains with the sheer amount of stuff that needs to be loaded for each test.

Time probably better spent on something else.
