About Me
Michael Zucchi
B.E. (Comp. Sys. Eng.)
also known as Zed
to his mates & enemies!
< notzed at gmail >
< fosstodon.org/@notzed >
ZedZone sitemap.xml
I've added a sitemap.xml to the site.
Maybe that'll make google consider adding it to it's index as it's
only indexing 130 odd pages out of 1K+. bing is probably a lost
cause, it's only indexed 10 pages.
It's a
patch to blogz which
generates both the short (id only) and long (date-title) urls as
well as a sitemap for the few plain html articles which mostly
cover the project home pages.
I also noticed some very old url's were being accessed so I added
a redirect from /blog?post=xx to /post/xx. Huh now i think of it
/blog?post=xx was blog by date, but no matter, it's good enough.
Been a bit sick and not really on top of it the last few days.
incremental javac, make
I've been looking into incremental javac compilation again. I had
most of the code for one approach done weeks ago but it never
really got to the point of doing anything useful.
The goal is to simplify a GNU make based Java build system while
ensuring consitent and complete builds.
javac -m <module>
comes very close to what I
want but the main problem is that it doesn't remove stale files.
These come about for the same reasons that might occur with C
development, for example the .java file is renamed or deleted.
But there are many more cases that occur regularly in Java, for
example an inner class or anonymous inner class is removed or
renamed. And in C these aren't such an issue since a link line or
whatever is just going to ignore any stale files anyway but with
Java you can't easily calculate all the possible .class files
(without recompiling the source) so you just grab all the files in
the directory when creating a jar or module, so you don't want
stale ones lying about.
So far i've created a tool called ijavac
that uses
the --module-source-path
only to automatically find
all source files that need recompiling. It optionally supports
per-module mode where it restricts processing to in-module
classes. It also automatically removes all stale files before
they are recompiled. It works by parsing all the existing .class
files, matching them up with their source based
on --module-source-path
and checking timestamps. The
parsed .class files are used to create the full set of down-stream
dependent classes, then match them to the corresponding set of
.java files, and then invoke javac with this set.
In per-module mode it isn't quite as fast as using javac -m, but
it's close and it ensures stale files are removed. Because it's
only performing file-level dependencies it can recompile more than
is necessary. In whole-project mode it depends on what was
changed and how many files could need recompiling. However i'm
not sure I can fit this in with my build system as I want to
support generated files which may require a per-module build
order.
There are also cases where module mode fails, regardless of
whether the stale files are removed or not. For example:
// module a
public class Bob {
int x;
}
// module b
public class Foo {
Bob bob;
int baz() {
return bob.x;
}
}
If x is renamed in class Bob then a per-module rebuild will only
rebuild Bob.class. Subsequently running a dependency-aware module
build on module b will not recompile Foo.
The whole-project mode will catch this case succesfully assuming a
per-module build hasn't already updated Bob.class independently.
Although if you have a deeply dependent object (that is used
widely in a project) it's about the same speed just to delete all
the classes and rebuild them all together.
The main reason is that the per-module mode restricts it's view to
only in-module classes and sources. I guess it should be able to
handle cross-module checks with some additional work.
Another idea i'm toying with is creating a cleanup routine that is
run as a post-process after javac -m
. Becasue this
only needs to match each .class with a .java it doesn't have to
worry about building the whole dependency graph and can get away
with only parsing the Source attribute. I'm not sure why javac -m
doesn't expunge stale files but alas it does not.
I also have the code to generate the module-level dependency lists
which is what would go into a makefile. The makefile wouldn't
track .class files as one would with C.
But for now i'm not sure I really got anywhere so I guess it'll
just go on the backburner again.
Apparently 'best practice' using maven is just to delete and
rebuild every time which is nonsense.
Kinect2 device for FFmpeg, Microsoft Kinect for Windows SDK 2.0
So further on the previous post about Kinect for Microsoft Windows
v2 support in ffmpeg ...
- My patch to ffmpeg-devel was completely ignored. Not even the
courtesy to say it wasn't welcome due to it being obsolete
hardware or any other comment whatsoever.
- My patches to libfreenect2 also seem to have been simply
ignored.
- I couldn't get libfreenect2 to work on microsoft windows 10.
It compiles and in some cases detects the camera but bulk
transfers always timeout. I tried everything!
- Using the libusb.dll in the binary download. This detects
but transfers timeout.
- Using visual studio to compile the libusb.dll. Doesn't
work at all.
- Using visual studio to compile libfreenect2 and a very
basic capture program but uses the libusb.dll from
libfreenect2 distribution. This only works if i put a couple
of Sleep() calls in the right place. Otherwise it also fails.
- Adding Sleep() calls in the ffmpeg module just times out.
- So I wrote a C dll to wrap the microsoft sdk so it could be
called from gcc compiled code as c++ has no ABI so there it can't
call it directly.
- Then I wrote another simple ffmpeg device 'kinect2' which uses
this C dll to capture on a microsoft platform.
- Due to delays in processing this locked up the capture when
trying to record all streams so I needed to use a capture thread
and dump copies of the frames to a limited-size write queue.
This works and is stable but is microsoft platform only. The
colour signal comes through as YUYV so it needs to be recompressed
for saving vs the libfreenect2 indev which supported raw jpeg
capture, so it takes considerably more cpu resources using a
software codec. Also the C dll can only be compiled with visual
studio as the Kinect headers (apart from being an auto-generated
unreadable mess) are are not standard C or c++ compatible. But
ffmpeg is still cross-compiled from a proper development system.
The c++ kinect sdk isn't too bad - apart from all the COM
overheads and unredable headers. The documentation is abysmal.
This is the first time iv'e ever used visual studio for C or c++,
and hopefully the last, it's pretty pants.
Given my shitty experience with ffmpeg-devel I probably wont
bother even submitting a patch for it there. If I decide to
publish it at all I might send a link to ffmpeg-devel just to get
some exposure.
I've found that it's almost impossible to find this site with any
search engine even if you look for something specific so it's
pretty pointless publishing it here anyway. Yesterday I was forced
to login to bing and google's search consoles to find out what the
hell was going on. bing has only indexed about a dozen pages and
google has indexed the site but barely ever shows anything, and
some of what it shows (10+ pages in) is dead links from previous
iterations of the blog software. A few months ago I added jjpeg
and zcl (at least) to the GNU Free Software Directory ... and even
there the jjmpeg entry has been sitting at 'not approved' ever
since and doesn't show the anything unless you dig further.
I've seen people asking on the microsoft developer forums for a
piece of software which does exactly this - mostly from academic
users - but if they can't find it what's the bloody point?
Kinect2 device for FFmpeg, libfreenect2
I've been working on a kinect2 device backend for FFmpeg the last
few days. Actually it's only about a day's work so far and i've
got the code talking to the c++
(sigh) libfreenect2,
building, and most of the glue written - I just haven't tested it
with the hardware yet. I hope it should be quite straightforward
but FFmpeg is a fairly complex library and there are a lot of
details that could be wrong.
One feature it has is that the jpeg frames are not decoded; which
means cheaper recording and no loss of capture quality. I had to
make some minor modifications to libfreenect2 for this to be possible.
It exports 3x streams: the RGB data as jpeg, the IR data as
grey16, and the depth data as grey16. I have options to enable
various subsets of these streams, so for example depth+ir decoding
can be skipped as it requires a good amount of flops. Cameras are
referenced by serial number or index. Device queries work as do
some basic capture settings. I'm also considering other options
which libfreenect2 provides such as streams with the depth/rgb
aligned to each other.
Once I have it at a working state and ported to git head (FFmpeg
git was down when i started working on it) I will see if FFmpeg is
interested in it. The fact that it requires c++ and a patched
libfreenect2 might be a downside but there is already a c++ device
in the source tree. Otherwise i'll just upload it
to code.
This was going to be for work but they decided they'd rather use
some junk matlab (ffs) on a shit platform (fffs) so i'm a little
annoyed about the whole thing. While it should be possible to get
this to work on their chosen shit platform as well it's a bit more
involved.
Parallel Streams, Blocking Queues
I've been using Java Streams a bit to do various bits of work.
One useful feature is the ability to go wide using parallel
streams. But I've often found it doesn't perform all that well.
- It only uses one thread per cpu thread so i/o heavy tasks are
underworking the machine;
- fork/join breaks work up by powers of 2 and not all jobs fit
this very well;
- Although it uses work stealing it is still basically
statically scheduled;
- Overheads.
I have written a Stream Parallel.map(Stream,
Function)
call which wraps a stream mapping process in a
one that farms it out to a pool of threads and recombines it
afterwards. This works well for some tasks particularly as you
can set the number of threads, but it is still quite coarse and
you can't recursively call it (actually you can, it just launches
lots of threads).
Anyway so i'm trying to think of a way to break up multiple levels
of parallel streams into smaller blocks of tasks that can be more
freely scheduled. Whilst trying to fit it within the Stream
processing model which is pull oriented.
I'm still nutting out the details but for now I have written a
lockless, bounded, blocking, priority queue. It only supports a
fixed set of discrete priority levels.
I did some micro-benchmarks against ArrayBlockingQueue (without
priority) and i'm surprised how well it worked - from about 5x to
20x faster depending on the contention level.
Each priority level has it's own lockless queue implemented using
an array and 3 counters. The array is accessed using cyclic
addressing so all operations are O(1).
static class Queue<T> {
volatile int head;
volatile int tail;
volatile int alloc;
final T[] queue;
}
The trick is that it doesn't implement any waiting operations
because it uses external arbitration to avoid the need to.
This makes put()
particularly simple. I'm using
pseudo-code atomics below, but these are implemented using
VarHandles.
void put(T value) {
int a = atomic_inc(alloc);
int b = a + 1;
volatile_set(queue, a & (queue.length-1), value);
while (!atomic_cas(head, a, b))
Thread.onSpinWait();
}
First, the allocation cannot fail and simply assigns a working
slot for the new item. The item is then filled and then the
atomic_cas() (compare-and-set) is used to ensure that the head
pointer is incremented sequentially regardless of the number of
threads which reserved slots.
The poll()
method is slightly more complex.
T poll() {
int h, n, t;
T node;
do {
t = tail;
h = head;
if (h == t)
return null;
node = volatile_get(queue, t & (queue.length - 1));
n = t + 1;
} while (!atomic_cas(tail, t, n));
return node;
}
First it checks it the queue is empty and if so simply exits. It
then takes the current queue tail and then updates the tail
counter. If the tail pointer changed because
another poll()
completed first, then it just retries.
The order of reading the head and tail counters is important here!
If tail is read second it is possible to double-read the same
value.
This isn't a full queue implementation as a number of important features still missing:
- Limiting the number of put()s so that the queue isn't overwritten;
- Blocking on full-write when the queue is full, without busy-waiting;
- Blocking on empty-read when the queue is empty, without busy waiting.
All of these can be almost trivially implemented using a pair of
Semaphores and an atomic integer.
- A sempaphore with (queue.length-1) reservations limits put()
calls. A successful poll releases a reservation.
- The first semaphore does this as well.
- An atomically updated counter and another semaphore is used to
implement wake-up on empty-read.
It's a bit tricky to benchmark and the results are quite noisy
despite setting the CPU to a specific (low) clock speed.
But in general this is around 10x faster than using
ArrayBlockingQueue for "low-contested" situations (6x writers, 6x
readers on a 12x thread cpu). In a "high-contested" situation
(32x writers, 32x readers), it's more like 15-20x faster, and
scales better. Despite tight loops the ArrayBlockingQueue is
unable to saturate the CPU resources (via top) and much of the
time is spent in overhead (?somewhere?). Profiling in netbeans
didn't offer any particular insight on where.
These are of course highly-contrived situations but the
performance was a pleasant surprise. It might not work on systems
with a weaker memory model than AMD-64 but I don't have access to
such exotic systems.
This still doesn't solve the base problem I was working on but
it might be a useful part thereof.
jjmpeg callbacks
Well I went and sorted out the AVIOContext callback code rather
than posting the previous post so there is more of a gap than the
date would imply ...
It turns out my idea to pass a reference to this
to
open2()
was a bit dumb! Because of
course open2()
is the routine which creates the
pointer that this
refers to in the first place.
Wrong!
Anyway the weak references were on the right track, and once I got
it working further testing showed that I had to use a weak
reference for the custom i/o handlers as well. In many cases it
would still have worked but if you created a non-static anonymous
class inside a method which stored the AVIOContext as a member
variable then it would leak all three objects. A non-static
anonymous inner class keeps a reference to this, which keeps a
reference to the AVIOContext which keeps a reference to the
non-static anonymous inner class.
So anyway the C now just uses weak references and the Java has a
holder for the AVIOInterrupt or AVIOHandler object with which it
is associated and some exploratory testing seems to confirm it
works as expected.
I reverted the nativez changes too, the reference holder wasn't a
completely terrible idea but it was duplicating existing JVM
functionality unnecessarily.
jjstuff n stuff
Well i've mostly been busy with work for the last couple of weeks.
I more or less ended up taking a couple of weeks off around Easter
and ANZAC Day so I had to do some more hours. There's no
deadlines at the moment and I was waiting for some resources to
become available so there wasn't much else to do anyway, and
between all the hacking I did I managed to get a little of the
waning sun before winter really hit a week ago.
I did a little bit on jjmpeg over the weekend. Partly as I was
using it to prototype some ideas I was evaluating for work; I will
probably end up doing it in C but its an easy platform to
experiment in.
I did a fair bit of cleanup around JJMediaWriter, although it
still needs work. Video works well but audio is probably broken.
The VideoPlay demo is a little nicer and I moved to the java.time
stuff for date formatting. Fixed a bug with error handling.
Another bug I fixed was with the initialisation of native methods
for functions which don't derive from AVObject. AVObject contains
the code which loads the native library so if you don't use it
first you don't get any native linkage. I hacked a simple
mechanism using a NOP function that seems to work properly. You
can't just load it in another class.
JNI Callbacks
I spent a bit too much time working on filling out AVIOContext to
make it more 'complete'. AVIOContext is used to read/write files
and network connections and allows for user-supplied callbacks to
implement custom i/o. jjmpeg supports both ffmpeg and custom i/o
but there was an updated open2() api for the former which takes an
interrupt-check callback when opening a context. This posed a
problem because for the custom i/o case I had a free
pointer opaque
I use to track a hard reference to the
Java side callbacks but for the ffmpeg i/o case it is used by
ffmpeg.
jjmpeg relies on the object pointers being the actual objects so I
can't easily put this through a proxy object that contains both
pointers. Actually there is only one place it is used but even
then it would be very messy to make it work sanely.
The solution I came up with was to have the Java object retain a
reference via another NativeZ managed object which retains any
number of JNI side references which gets freed JNI side when it
gets cleaned up. I then just poke an instance of this onto the
AVIOContext so it lasts at least as long as the AVIOContext
itself. I put this manager object into NativeZ because it might
be useful elsewhere.
While it should work, it does seem like a bit messier than it
might be. Just now I thought perhaps I could just store the Java
listener in the Java object, but the C callback still needs a
reference to this
to retrieve the listener and it
can't hold a hard reference otherwise the object will leak. But I
just looked up the JNI docs and found a WeakGlobalRef API so that
should probably work. I will try it and if so I will use that
instead and roll back the code I added to NativeZ as well.
Documentation & The Rest
I also started a longer term and possibly pointless exercise of
cleaning up the accessor methods and documentation. The accessor
methods just access the struct fields directly but most fields are
also accessible via the pretty messy libavutil/opt.h interface
which is mirrored in part in AVOptions. Originally I went through
the structures and just inserted the ones that looked like they
might be useful but there's a lot of cruft in there and AVOption
covers all the important onces. So I went throuhg AVCodecContext
and retained the useful looking ones and cleared out the rest. I
also cross-referenced the options table and documented the
corresponding names if they exist.
After that I got the javadocs to build. I had to fix a bunch of
badly formatted comments before I read the javadoc manual and a
bunch of broken comments that get imported by the constant
extractors. It all looks pretty bare though. It will take a lot
of work to make them something worth using. Will see, will see.
I even did a little OpenCL today. Just a tiny bit of code to run
a random fern inferencer. It's been so long since i've written
any it took a few iterations to get things going. Didn't need to
make any changes to zcl though, so that's a plus. But even with
all the niceties of zcl it's still a bit, well, messy. The CLTask
is a good kernel of an idea (pun intended) but there might be ways
to build on that to be something a little more convenient. How
deep I look into this will depend on how much OpenCL I end up
writing but that probably wont be much.
Tutorial
Oh another thing I did over the weekend was some Java tutoring for
a mate. He's doing JavaFX, so yeah it wasn't much effort even
with the hangover I had acquired from the night* before.
I was somewhat surprised at how bad the assignment was though.
For a first assignment the scope wasn't so bad but right from the
start things were a bit strange.
Must be Eclipse, must be JavaFX11. Why JavaFX11? Who knows, it's
not really that different as a toolkit compared to JavaFX8 apart
from being a much bigger pain to install not to mention the
complication with the module stuff. Just what you want for an
introductory course. Eclipse is pretty much total pants too if
you ask me.
There is a page for an eclipse module or plugin or whatever it is
but the 'for the lazy' page is so short it seems to be missing all
the details. And the screenshots are so low resolution you can't
read any of the text. The only remotely likely looking available
option didn't match the description and looked wrong - and it
didn't work when you installed it regardless. We did get it
running but I had to put in JVM arguments and frob around a bit.
Admittedly I need to do that in netbeans too but at least it
supports modules and you don't need to use explicit --add-module
JVM args.
(Being an apple machine i wasn't familair with it's weird
filesystem heirarchy and the funky mess of a thing 'Finder' has
turned into so that didn't help either).
Back to the assignment itself. Rather than a list the
requirements, it stepped you through the parts that needed to be
done. This is after first insisting that it wasn't a guideline of
the steps to follow - even though you couldn't do the assignment
without following them. In the addition to this odd idea, the
formatting was completely miserable, like an email where carriage
returns have been inserted randomly between sentences. What text
there was was also often contradictory, and when it wasn't it
offered questionable solutions to common problems.
For example one paragraph would talk about placing code in a
constructor then the next sentence would say the same thing needed
to go in some other method. In one case this was talking about
the root Application where you pretty much never use the
constructor and you put this stuff in start() so it was just
wrong. In another case it talked about creating a 'XYStage' and
then the code suggested for the constructor is 'stage = new
Stage()'. It even mentioned deriving from Stage although not
consistently! For the rest of the assignment it kept talking
about this stage variable which shouldn't exist. I really don't
like using the word wrong for software since so many ways will
work - poor, snot, junk, totally shithouse, sure all of those, but
there really was a lot of utter wrong here!
Even strange stylistic choices that indicate the author doesn't
write or read any code of any language. Who creates any
rectilinear object like a Window using (sizex,sizey,x,y) syntax
rather than (x,y,width,height)? And lets start out teaching bad
ideas like using undecorated windows so you can implement your own
buggy and non-accessible window manipulation features yourself.
Garish colour suggestions, check! No namespaces. Variables which
change names or types from paragraph to paragraph, go you covered
there too.
Bit bored tonight. I almost did some more work-related stuff but
decided I needed a break from it. Not much else to do though
which is why I posted this. Well it looks like i'm now playing
with jjmpeg for a little while at least.
Minimalistic Perfect Hashing
As a small diversion I played a little bit with minimal hash
functions over the weekend.
The goal here is to convert a static list of words into a token
value which could be used to implement a string switch statement
or other such uses.
The GNU gperf tool is a production-ready general-purpose solution
to this problem but I wanted to see if I could reduce the code
size and try a more naive approach.
The approach is to just perform an exhaustive search over a
limited set of possible functions. The function I am using is a
multiply and shift. The input is an integer formed from the next
1, 2, or 4 bytes. I map these to a power-of-two table and just
find a combination which produces no collisions over the table
size. Then there are a couple of tables used to map these to an
index and to a string which is used to verify the string is the
same and not just the hashcode.
Using the 32 keywords for the C language this is one solution that
takes only a single multiply-shift stage:
#include <string.h>
int hash_keyword(const char *word)
{
static const char *words =
"char\0" "long\0" "register\0" "if\0" "signed\0" "unsigned\0"
"float\0" "switch\0" "int\0" "for\0" "while\0" "volatile\0"
"static\0" "auto\0" "do\0" "union\0" "enum\0" "typedef\0"
"struct\0" "sizeof\0" "const\0" "extern\0" "case\0" "default\0"
"return\0" "break\0" "continue\0" "short\0" "else\0" "void\0"
"double\0" "goto\0";
static const unsigned char index[] =
{ 166, 0, 166, 166, 166, 4, 8, 16, 166, 18, 166, 166, 166, 166,
166, 166, 166, 24, 32, 37, 43, 166, 46, 166, 49, 166, 166, 54, 62, 166, 68, 72, 74, 79, 166,
166, 83, 90, 96, 102, 107, 166, 166, 113, 166, 117, 166, 124, 130, 166, 166, 166, 135, 166, 143,
166, 166, 166, 166, 148, 152, 156, 162, 166, };
static const char value[] =
{ -1, 3, -1, -1, -1, 17, 18, 15, -1, 21, -1, -1, -1, -1, -1, -1,
-1, 28, 12, 25, 16, -1, 13, -1, 31, -1, -1, 30, 23, -1, 0, 7, 27, 10, -1, -1, 26, 24,
22, 4, 11, -1, -1, 2, -1, 6, -1, 19, 1, -1, -1, -1, 5, -1, 20, -1, -1, -1, -1, 9, 29,
8, 14, -1, };
const unsigned char *p = (void *) word;
unsigned int h = 0;
{
unsigned int v = 0;
for (int i = 0; i < 4 && *p; i++)
v = (v << 8) | (*p++);
h ^= (v * 54991) >> 12;
}
h &= 63;
return (strcmp(word, words + index[h]) == 0)
? value[h] : -1;
}
This is not a minimal perfect hash function because the hash size
is 64 for 32 input words, but it does calculate the result using a
single hash table calculation and probe.
Using smaller sample inputs and more steps provides more solutions
but it requires more code at runtime. This one compiles into
.text=87, .rodata=352 bytes.
By comparison equivalent functionality using gperf creates 595
bytes of .text and constants and 720 bytes of .data. The primary
reason it is so much larger is because it stores the word and
index in a structure which will take at least 16 bytes extra per
entry (8 bytes for the string pointer, and 8 bytes alignment
including the index value). This more than offsets the smaller
table it uses.
The runtime should be comparable although being 1/3 the size
should help the cpu caches.
It's slow to search and might not produce a solution but with some
tuning it has worked for a limited number of cases i've tried.
There are some other options it could try:
- Use the word length. This finds more solutions;
- The samples needn't be linearly spaced;
- Other operations apart from multiply-shift;
- Other combining operators than EOR.
I think it's already as minimal as possible in terms of .text and
.data size, at least on amd64.
As a further comparison I compared a smaller problem to using a
linear search. This case was just for 6 strings. Apart from the
hash generator I wrote a cascaded if-else-if and a compacted linear
search where all strings are stored in 1 string with embedded nuls
used to separate them.
.text .(ro)data
phash 79 64
if-else 206 41
linear 74 42
So the only one that's smaller uses atypical coding anyway.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved.
Powered by gcc & me!