About Me
Michael Zucchi
B.E. (Comp. Sys. Eng.)
also known as Zed
to his mates & enemies!
< notzed at gmail >
< fosstodon.org/@notzed >
Fast incremental Java builds with openjdk 11 and GNU make
This post won't match the article but I think I've solved all the
main problems needed to make it work. The only thing missing is
ancestor scanning - which isn't trivial but should be
straightforward.
Conceptually it's quite simple and it doesn't take much code but
bloody hell it took a lot of mucking about with make and the javac
TaskListener.
I took the approach I outlined yesterday. I did try to get more
out of the AST but couldn't find the info I needed. The module
system is making navigating source-code a pain in Netbeans (it
won't find modules if the source can't). Some of the 'easy' steps
turned out to be a complete headfuck. Anyway some points of interest.
Dependency Tracking
Even though a java file can create any number of classes one
doesn't need to track any of the non top-level .class files that
might be created for dependency purposes. Any time a .java file
is compiled all of its generated classes are created at the same
time. So if any java file uses, for example, a nested class one
only needs to track the source file.
I didn't realise this at first and it got messy fast.
Modified Class List
I'm using a javac plugin (-Xplugin) to track the compilation via a
TaskListener. This notifies the plugin of various compilation
stages including generating class files. The painful bit here is
that you don't get information on the actual class files
generated, only on the source file and the name of the class being
generated. And you can't get the actual name of the class file
for anonymous inner classes (it's in the implementation but hidden
from public view). In short it's a bit messy getting a simple and
complete list of every class file generated from every java file
compiled.
But for various other reasons this isn't terribly important so I
just track the toplevel class files; but it was a tedious
discovery process on a very poorly documented api.
When the compiler plugin gets the COMPILATION finished event it
uses the information it gathered (and more it discovers) to
generate per-class dependency files similar to `gcc -MD'.
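As a rough illustration of the TaskListener side (not the actual plugin), here is a minimal sketch that compiles one in-memory source in-process and records every GENERATE event. It attaches the listener through JavacTask directly rather than via -Xplugin, and the class name GenerateListenerSketch, the method generated() and the temporary output directory are all made up for the example.

```java
import com.sun.source.util.JavacTask;
import com.sun.source.util.TaskEvent;
import com.sun.source.util.TaskListener;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class GenerateListenerSketch {
    /** Compiles one in-memory source; returns "source -> class" per GENERATE event. */
    static List<String> generated(String className, String source) {
        // getSystemJavaCompiler() returns null when only a JRE is installed.
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        JavaFileObject unit = new SimpleJavaFileObject(
                URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE) {
            @Override public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        List<String> seen = new ArrayList<>();
        try {
            // Hypothetical output location; a real build would use its classes directory.
            String out = Files.createTempDirectory("classes").toString();
            JavacTask task = (JavacTask) javac.getTask(null, null, null,
                    List.of("-d", out), null, List.of(unit));
            task.addTaskListener(new TaskListener() {
                @Override public void started(TaskEvent e) { }
                @Override public void finished(TaskEvent e) {
                    // Only the source file and the class element are exposed here,
                    // not the path of the .class file that was written.
                    if (e.getKind() == TaskEvent.Kind.GENERATE && e.getTypeElement() != null)
                        seen.add(e.getSourceFile().getName() + " -> "
                                + e.getTypeElement().getQualifiedName());
                }
            });
            task.call();
        } catch (IOException x) {
            throw new UncheckedIOException(x);
        }
        return seen;
    }
}
```

Calling generated("Demo", "class Demo { static class Inner {} }") should report the top-level class against its source file. Anonymous classes have an empty qualified name, which is the gap complained about above.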
Dependency Generation & Consistency
To find all the (immediate) dependencies the .class file is
processed. The ClassInfo records provide a starting point but all
field and method signatures (descriptors) must be parsed as well.
When an inner class is encountered its container class is used to
determine if the inner class is still extant in the source code -
if not it can be deleted.
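The descriptor part is the easy bit. A minimal sketch, assuming the descriptor string has already been pulled out of the constant pool (DescriptorSketch and classNames are names invented here): every class reference in a field or method descriptor is of the form L...; so a single scan is enough.

```java
import java.util.ArrayList;
import java.util.List;

final class DescriptorSketch {
    /** Returns the internal names (e.g. java/lang/String) of all classes in a descriptor. */
    static List<String> classNames(String descriptor) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < descriptor.length()) {
            if (descriptor.charAt(i) == 'L') {
                // Object type: L<internal name>;
                int end = descriptor.indexOf(';', i);
                names.add(descriptor.substring(i + 1, end));
                i = end + 1;
            } else {
                // '(' ')' '[' or a primitive type code (B C D F I J S Z V)
                i++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(classNames("(Ljava/lang/String;[I)Ljava/util/List;"));
    }
}
```

Nested classes show up with their binary name (for example java/util/Map$Entry), which is what lets the container class be found from the reference.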
And still this isn't quite enough - if you have a package private
additional class embedded inside the .java file there is no
cross-reference between the two apart from the SourceFile
attribute and implied package path. So to determine if this is
stale one needs to check the Modified Class List instead.
The upshot is that you can't just parse the modified class list
and any inner classes that reference them. I scan a whole package
at a time and then look for anomalies.
One-shot compile
Because invoking the compiler is slow - but also because it will
discover and compile classes as necessary - it's highly beneficial
to run it once only. Unfortunately this is not how make works and
so it needs to be manipulated somewhat. After a few false starts
I found a simple way that works:
The per-module rules are required due to the source-tree naming
conventions used by netbeans (src/[module]/classes/[name] to
build/modules/[module]/[name]), a common-stem based approach is
also possible in which case it wouldn't be required. In practice
it isn't particularly onerous as I use metamake facilities to
generate these per-module rules automatically.
I spent an inordinate amount of time trying to get this to work
but kept hitting puzzling (but documented) behaviour with pattern
and implicit rule chaining and various other issues. One big one
was that as soon as I used concrete rules (made files) for tracking
stages, everything suddenly broke.
I resorted to just individual java invocations as one would do for
gcc, and trying the compiler server idea to mitigate the costs.
It worked well enough particularly since it parallelises properly.
But after I went to bed I realised i'd fucked up and then spent a
few hours working out a better solution.
Example
This is the prototype i've been using to develop the idea.
modules:=notzed.proto notzed.build

SRCS:=$(shell find src -name '*.java')
CLASSES:=$(foreach mod,$(modules),\
	$(patsubst src/$(mod)/classes/%.java,classes/$(mod)/%.class,$(filter src/$(mod)/%,$(SRCS))))

all: $(CLASSES)
	lists='$(foreach mod,$(modules),$(wildcard status/$(mod).list))' ; \
	built='$(patsubst %.list,%.built,$(foreach mod,$(modules),$(wildcard status/$(mod).list)))' ; \
	files='$(addprefix @,$(foreach mod,$(modules),$(wildcard status/$(mod).list)))' ; \
	if [ -n "$$built" ] ; then \
		javac -Xplugin:javadepend --processor-module-path classes --module-source-path 'src/*/classes' -d classes $$files ; \
		touch $$built; \
		rm $$lists ; \
	else \
		echo "All classes up to date" ; \
	fi

define makemod=
classes/$1/%.class: src/$1/classes/%.java
	$$(file >> status/$1.list,$$<)

$1: $2
	if [ -f status/$1.list ] ; then \
		javac --module-source-path 'src/*/classes' -d classes @status/$1.list ; \
		rm status/$1.list ; \
		touch status/$1.built ; \
	fi
endef

$(foreach mod,$(modules),$(eval $(call makemod,$(mod),\
	$(patsubst src/$(mod)/classes/%.java,classes/$(mod)/%.class,$(filter src/$(mod)/%,$(SRCS))))))

-include $(patsubst classes/%,status/%.d,$(CLASSES))
In addition there is a compiler plugin which is about 500 lines of
standalone java code. This creates the dependency files (included
at the end above) and purges any stale .class files.
I still need to work out a few details with ancestor dependencies
and a few other things.
Java 11 Modules, Building
I'm basically done modularising the code at work - at least the
active code. I rather indulgently took the opportunity to do a
massive cleanup - pretty well all the FIXMEs, should be FIXMEs,
TODOs, dead-code and de-duplication that's collected over the last
few years. Even quite a few 'would be a bit nicers'. It's not
perfect but it was nice to be able to afford the time to do it.
I'm still trying to decide if I break the projects up into related
super-projects or just put everything in the single one as
modules. I'm aiming toward the latter because basically i'm sick
of typing "cd ../blah" so often, and Netbeans doesn't recompile
the dependencies properly.
I'm going to reset the repository and try using git. I don't like
it but I don't much like mercurial either.
Building
At the moment I have a build system which uses make and compiles
at the module level - i.e. any source changes and the whole module
is recompiled, and one can add explicit module-module dependencies
to control the build order and ensure a consistent build.
One reason I do this is because there is no 1:1 correspondence
between build sources and build classes. If you add or remove
nested or anonymous (or co-located) classes from a source file
that adds or removes .class files which are generated. So to
ensure there are no stale classes I just reset it on every build.
This isn't too bad and absolutely guarantees a consistent build
(assuming one configures the inter-module dependencies properly)
but the compiler is still invoked multiple times which has
overheads.
Building Faster
Really the speed isn't a problem for these projects but out of
interest i'm having a look at a couple of other things.
One is relatively simple - basically use JSR-199 to create a
compiler server something like the jdk uses to build itself.
The more complicated task is incremental builds using GNU Make. I
think I should be able to hook into JavacTask and with a small bit
of extra code create something akin to the "gcc -MD" option for
auto-generating dependencies. It has the added complication of
having to detect and remove stale .class files, and doing it all
in a way that make understands. I've already done a few little
experiments today while I was procrastinating over some weeding.
Using JavacTask it is possible to find out all the .class files
that are generated for a given source file. This is one big part
of the puzzle and covers the first-level dependencies
(i.e. %.class: %.java plus all the co-resident classes). One can
also get the AST and other information but that isn't necessary
here.
To find the other dependencies I wrote a simple class file decoder
which finds all the classes referenced by the binary. Some
relatively simple pattern matching and name resolution should be
able to turn this into a dependency list.
Actually it may not be necessary to use JavacTask for this because
the .class files contain enough information. There is extra
overhead because they must be parsed, but they are simple to
parse.
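To give an idea of how little is involved, here is a minimal sketch of such a decoder (not the one I wrote, and ClassRefsSketch is a made-up name). It only walks the constant pool and collects the CONSTANT_Class entries. It does not look at descriptors or attributes, and array types come back in their descriptor form such as [Ljava/lang/String;.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

final class ClassRefsSketch {
    /** Returns the internal names of every CONSTANT_Class entry in a class file. */
    static Set<String> referencedClasses(InputStream in) {
        try (DataInputStream d = new DataInputStream(in)) {
            if (d.readInt() != 0xCAFEBABE)
                throw new IllegalArgumentException("not a class file");
            d.readUnsignedShort(); // minor version
            d.readUnsignedShort(); // major version
            int count = d.readUnsignedShort();
            String[] utf8 = new String[count];
            int[] nameIndex = new int[count];
            for (int i = 1; i < count; i++) {
                int tag = d.readUnsignedByte();
                switch (tag) {
                case 1: utf8[i] = d.readUTF(); break;               // Utf8
                case 7: nameIndex[i] = d.readUnsignedShort(); break; // Class
                case 5: case 6: d.readLong(); i++; break;            // Long, Double: two slots
                case 3: case 4: case 9: case 10: case 11:
                case 12: case 17: case 18: d.readInt(); break;       // 4-byte entries
                case 15: d.readUnsignedByte(); d.readUnsignedShort(); break; // MethodHandle
                case 8: case 16: case 19: case 20: d.readUnsignedShort(); break;
                default: throw new IllegalArgumentException("unknown constant tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int i = 1; i < count; i++)
                if (nameIndex[i] != 0)
                    names.add(utf8[nameIndex[i]]);
            return names;
        } catch (IOException x) {
            throw new UncheckedIOException(x);
        }
    }
}
```

Feeding it ClassLoader.getSystemResourceAsStream("java/lang/String.class") is a quick way to try it without compiling anything first.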
Concurrent Hash Tables
So I got curious about whether the GC in NativeZ would cause any
bottlenecks in highly contested situations - one I already faced
with an earlier iteration of the library. The specific case I had
was running out of memory when many threads were creating
short-lived native objects; the single thread consuming values
from the ReferenceQueue wasn't able to keep up.
The out of memory situation was fairly easily addressed by just
running a ReferenceQueue poll whenever a new object is created,
but I was still curious about the locking overheads.
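The poll-on-create fix looks something like the following. This is only a sketch under simplifying assumptions: it uses a plain HashMap as the registry, a bare Object stands in for the Java wrapper, and the native free is just a counter. PollOnCreateSketch, Handle and create() are names invented for the example.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

final class PollOnCreateSketch {
    /** Weak reference that remembers the (hypothetical) native pointer to free. */
    static final class Handle extends WeakReference<Object> {
        final long p;
        Handle(Object referent, long p, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.p = p;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Map<Long, Handle> live = new HashMap<>();
    private int freed;

    /** Registers a new wrapper for p, first releasing anything the gc has cleared. */
    synchronized Object create(long p) {
        drain();
        Object wrapper = new Object(); // stand-in for the real Java-side object
        live.put(p, new Handle(wrapper, p, queue));
        return wrapper;
    }

    /** Non-blocking: poll() returns null as soon as the queue is empty. */
    private void drain() {
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            Handle h = (Handle) r;
            live.remove(h.p, h); // only remove if this exact handle is still mapped
            freed++;             // a real implementation would free the native memory here
        }
    }

    synchronized int liveCount() { return live.size(); }
    synchronized int freedCount() { return freed; }
}
```

The point is just that every allocating thread helps with the cleanup, so the backlog can't grow faster than allocations do.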
A field of tables
I made a few variations of hash tables which supported the
interface I desired which was to store a value which also provides
the key itself, as a primitive long. So it's more of a set but
with the ability to remove items by the key directly. For now
I'll just summarise them.
- SynchronisedHash
This subclasses HashMap and adds the desired interfaces, each of
which is synchronised to the object. Rather than use the pointer
directly as the key it is shifted by 4 (all malloc are 16-byte
aligned here) to avoid Long.hashCode() pitfalls on pointers.
- ConcurrentHash
This subclasses ConcurrentHashMap and adds the desired
interfaces, but no locking is required. The same shift trick is
used for the key as above.
- PointerTable
This was the use-weakreference-as-node implementation mentioned
in the last post - a simple single-threaded chained hash table
implementation. I just synchronised all the entry points. A load
factor of 2.0 is used for further memory savings.
- CopyHash
This is a novel(?) approach in which all modifications to the
bucket chains are implemented by copying the whole linked list of
nodes and replacing the table entry with the new list using
compare and set (CAS). A couple of special cases can avoid any
copies. A failed CAS means someone else updated the entry, so it
simply retries. Some logic akin to a simplified version of what
ConcurrentHashMap does is used to resize the table when needed,
potentially concurrently. It uses a load factor of 1.0.
I also tested a fast-path insert which doesn't try to find an
existing value (this occurs frequently with the design) but that
so overloaded the remove() mechanism it wasn't actually a good
fit!
- ArrayHash
This is similar to CopyHash but instead of a linked list of
container nodes it either stores the value directly in the table,
or it stores an array of objects. Again any modifications require
rewriting the table entry. Again resize is handled specially and
all threads can contribute. This also has a small modification in
that the retry loop includes a fibonacci-increasing
Thread.sleep(0,nanos) delay if it fails, which may slow down
wall-clock time but can improve the cpu load.
I had some ideas for other approaches but I've had enough
for now.
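For the curious, the CopyHash idea boils down to something like this. It is a minimal sketch and not the code I benchmarked: the table is a fixed power-of-two size with no resize, it uses AtomicReferenceArray instead of VarHandles for the CAS, and CopyHashSketch and its method names are invented here.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

final class CopyHashSketch {
    /** Immutable chain node; chains are never modified in place. */
    static final class Node {
        final long key;
        final Object value;
        final Node next;
        Node(long key, Object value, Node next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReferenceArray<Node> table;
    private final int mask;

    CopyHashSketch(int sizePow2) {
        table = new AtomicReferenceArray<>(sizePow2);
        mask = sizePow2 - 1;
    }

    static int hash(long p) { return (int) p >>> 4; }

    /** Lock-free lookup: a reader only ever sees complete chains. */
    Object get(long key) {
        for (Node n = table.get(hash(key) & mask); n != null; n = n.next)
            if (n.key == key) return n.value;
        return null;
    }

    /** Returns the existing value for key, or installs and returns value. */
    Object putIfAbsent(long key, Object value) {
        int i = hash(key) & mask;
        while (true) {
            Node head = table.get(i);
            for (Node n = head; n != null; n = n.next)
                if (n.key == key) return n.value;
            // Special case: prepending shares the old chain, so nothing is copied.
            if (table.compareAndSet(i, head, new Node(key, value, head)))
                return value;
            // CAS failed: another thread changed this bucket, so retry.
        }
    }

    boolean remove(long key) {
        int i = hash(key) & mask;
        while (true) {
            Node head = table.get(i);
            Node hit = head;
            while (hit != null && hit.key != key) hit = hit.next;
            if (hit == null) return false;
            // Copy the nodes in front of the hit; the tail after it is shared.
            Node copy = hit.next;
            for (Node n = head; n != hit; n = n.next)
                copy = new Node(n.key, n.value, copy);
            if (table.compareAndSet(i, head, copy)) return true;
        }
    }
}
```

Because nodes are immutable and the only write is the CAS on the table slot, a failed CAS can simply throw its copy away and start over.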
Tests
I created a native class which allocates and frees memory of a
given size, and a simple 'work' method which just adds up the
octets within. I ran two tests, one which just allocated 16 bytes
which immediately went out of scope, and another which allocated
1024 bytes, added them up (a small cpu-bound task), then let it
fall out of scope.
Note that lookups are never tested in this scenario apart from the
implicit lookup during get/put. I did implement get() - which is
always non-blocking in the case of the two latter
implementations (even during a resize), but I didn't test or use
them here.
I then created 8 000 000 objects in a tight loop in one
or more threads. Once the threads finish I invoke System.gc(),
then wait for all objects to be freed. The machine I'm running it
on has 4 cores/8 threads so I tried it with 1 and with 8 threads.
Phew, ok that's a bit of a mouthful that probably doesn't make
sense, bit tired. The numbers below are pretty rough and are from
a single run using openjdk 11+28, with -Xmx1G.
                    8 threads, alloc   8 threads, alloc+sum   1 thread, alloc    1 thread, alloc+sum
                    x 1 000 000        x 1 000 000            x 8 000 000        x 8 000 000
                    Elapsed   User     Elapsed   User         Elapsed   User     Elapsed   User
SynchronisedHash        8.3     38          11     54             8.4     27          16     40
ConcurrentHash          6.9     35          11     52             8.2     27          17     38
PointerTable            7.2     26          13     44             7.9     17          19     29
CopyHash                6.6     31         8.2     42             8.1     24          18     33
ArrayHash               6.0     28         8.2     39             8.5     23          16     24
ArrayHash*              6.9     23          12     30             8.1     21          17     23
* indicates using a delay in the retry loop for the remove() call
To be honest I think the only real conclusion is that this machine
doesn't have enough threads for this task to cause a bottleneck in
the hash table! Even in the most contested case (alloc only)
simple synchronised methods are just about as good as anything
else. And whilst this is representative of a real life scenario,
it's just a bloody pain to properly test high concurrency code and
I'm just not that into it for a hobby (the journey not the
destination and so forth).
I haven't shown the numbers above but in the case of the Copy and
Array implementations I count how many retries are required for
both put and get calls. In the 8-thread cases where there is no
explicit delay it can be in the order of a million times! And yet
they still run faster and use less memory. Shrug.
All of the non java.util based implementations also benefit in
both time and space from using the primitive key directly without
boxing and storing the key as part of the containee object and not
having to support the full Collections and Streams interfaces.
PointerTable also benefits from fewer gc passes due to not
needing any container nodes and having a higher load factor.
BTW one might note that this is pretty bloody slow compared to C
or pure Java - there are definitely undeniably high overheads of
the JNI back and forths.
The other thing of note is that well hashtables are hashtables -
they're all pretty good at what they're doing here because of the
algorithm that they share. There's not really all that much practical
difference between any of them.
But why?
I'm not sure what possessed me to look into it this deeply but I've
done it now. Maybe i'll post the code a bit later, it may have
enough bugs to invalidate all the results but it was still a
learning experience.
For the lockless algorithms I made use of VarHandles (Java 9's
'safe' interface to Unsafe) to do the CAS and other operations,
and some basic monitor locking for coordinating a resize pass.
The idea of 'read, do work, fail and retry' is something I
originally learnt about using a hardware feature of the CELL SPU
(and Power based PPU). On that you can reserve a memory location
(128 bytes on the SPU, 4 or 8 on the PPU), and if the ticket is
lost by the time you write back to it the write fails and you know
it failed and can retry. So rather than [spin-lock-doing-nothing
WORK unlock], you spin on [reserve WORK write-or-retry]. I guess
it's a speculative reservation. It's not quite as clean when
using READ work CAS (particularly without the 128 byte blocks on
the SPU!) but the net result is (mostly) the same. One significant
difference is that you actually have to write a different value
back and in this instance being able to merely indicate change
could have been useful.
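In Java terms the READ work CAS loop looks roughly like this. It is a minimal sketch using a VarHandle on a plain long field, with trivial arithmetic standing in for the 'work'. CasRetrySketch, addSquare and run are names made up for the example.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class CasRetrySketch {
    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(CasRetrySketch.class, "value", long.class);
        } catch (ReflectiveOperationException x) {
            throw new ExceptionInInitializerError(x);
        }
    }

    private volatile long value;

    /** READ, work, CAS; returns how many times the write had to be retried. */
    int addSquare(long n) {
        int retries = 0;
        while (true) {
            long seen = value;        // READ
            long next = seen + n * n; // WORK (speculative, may be thrown away)
            if (VALUE.compareAndSet(this, seen, next))
                return retries;       // the write landed on the value we read
            retries++;                // someone else got in first: redo the work
        }
    }

    long get() { return value; }

    /** Hammers one counter from several threads and returns the final value. */
    static long run(int threads, int perThread) {
        CasRetrySketch c = new CasRetrySketch();
        Thread[] pool = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            pool[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) c.addSquare(1);
            });
            pool[t].start();
        }
        for (Thread th : pool) {
            try {
                th.join();
            } catch (InterruptedException x) {
                throw new IllegalStateException(x);
            }
        }
        return c.get();
    }
}
```

Unlike the SPU reservation, the CAS only notices a change if the value itself differs, which is why a distinct value has to be written back each time.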
ConcurrentHashMap does something similar but only for the first
insert into an empty hash chain, after that it locks the entry and
only ever appends to the chain.
Some of the trickiest code was getting the resize mechanism to
synchronise across threads, but I really only threw it together
without much thought using object monitors and AtomicInteger.
Occasionally I'm getting hangs but they don't appear to make
sense: some number of threads will be blocked waiting to enter a
synchronised method, while a few others are already on a wait()
inside it, all the while another thread is calling it at will -
without blocking. If I get keen i'll revisit this part of the
code.
Other JNI bits/ NativeZ, jjmpeg.
Yesterday I spent a good deal of time continuing to experiment and
tune NativeZ. I also ported the latest version of jjmpeg to a
modularised build and to use NativeZ objects.
Hashing C Pointers
C pointers obtained by malloc are aligned to 16-byte boundaries on
64-bit GNU systems. Thus the lower 4 bits are always zero. Standard
malloc also allocates a contiguous virtual address range which is
extended using sbrk(2) which means the upper bits rarely change. Thus
it is sufficient to generate a hashcode which only takes into account
the lower bits (excluding the first 4).
I did some experimenting with hashing the C pointer values using
various algorithms, from Knuth's Magic Number to various integer
hashing algorithms (e.g. hash-prospector), to Long.hashCode(), to
a simple shift (both 64-bit and 32-bit).
The performance analysis was based on Chi-squared distance between
the hash chain lengths and the ideal, using pointers generated
from malloc(N) for different fixed values of N for multiple runs.
Although it wasn't the best statistically, the best performing
algorithm was a simple 32-bit, 4 bit shift due to its significantly
lower cost. And typically it compared quite well statistically
regardless.
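static int hashCode(long p) {
    // truncate to 32 bits first, then drop the 4 always-zero bits
    return (int)p >>> 4;
}
In the nonsensical event that 28 bits are not sufficient for the
hash bucket index it can be extended to 32 bits:
static int hashCode(long p) {
    // shift the full 64-bit value first, then truncate
    return (int)(p >>> 4);
}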
And despite all the JNI and reflection overheads, using the two-round
function from the hash-prospector project increased raw execution time
by approximately 30% over the trivial hashCode() above.
Whilst it might not be ideal for 8-byte aligned allocations it's
probably not that bad either in practice. One thing I can say for
certain though is NEVER use Long.hashCode() to hash C pointers!
Concurrency
I also tuned the use of synchronisation blocks very slightly to
make critical sections as short as possible whilst maintaining
correct behaviour. This made enough of a difference to be worth
it.
I also tried more complex synchronisation mechanisms - read-write
locks, hash bucket row-locks and so on - but they were at best a
bit slower than using synchronized {}.
The benchmark I was using wasn't particularly fantastic - just one
thread creating 10^7 `garbage' objects in a tight loop whilst the
cleaner thread freed them. No resolution of existing objects, no
multiple threads, and so on. But apart from the allocation rate
it isn't an entirely unrealistic scenario either and I was just
trying to identify raw overheads.
Reflection
I've only started looking at the reflection used for allocating
and releasing objects on the Java side, and in isolation these
are the highest costs of the implementation.
There are ways to reduce these costs but at the expense of extra
boilerplate (for instantiation) or memory requirements (for
release).
Still ongoing. And whilst the relative cost over C is very high,
the absolute cost is still only a few hundred nanoseconds per
object.
From a few small tests it looks like the maximum I could achieve
is a 30% reduction in object instantiation/finalisation costs, but
I don't think it's worth the effort or overheads.
Makefile foo
I'm still experimenting with this. I used some macros and implicit
rules to get most things building ok, but I'm not sure if it
couldn't be better. The basic makefile is working ok for
multi-module stuff so I think I'm getting there. Most of the work
is just done by the jdk tools as they handle modules and so on
quite well and mostly dictate the disk layout.
I've broken jjmpeg into 3 modules - the core, the javafx related
classes and the awt related classes.
GC JNI, HashTables, Memory
I had a very busy week with work porting libraries and
applications to Java modules - that wasn't really the busy part, I
also looked into making various implementations pluggable using
services and then creating various pluggable implementations,
often utilising native code. Just having some (much faster)
implementations of parts also opened other opportunities and it
sort of cascaded from there.
Anyway along the way I revisited my implementation of Garbage
Collection with JNI and started working on a modular version that
can be shared between libraries without having to copy core
objects, and then along the way found bugs and things to improve.
Here are some of the more interesting pieces I found along the way.
JNI call overheads
The way I'm writing jni these days is typically to just write the
method signature as if it were a Java method and just mark it
native. Let the jni handle Java to C mappings directly. This is
different to how I first started doing it and flies in the face of
the convention I've typically seen amongst JNI implementations
where the Java just passes the pointers as a long and has a wrapper
function which resolves these longs as appropriate.
The primary reason is to reduce boilerplate and significantly
simplify the Java class writing without having a major impact on
performance. I have done some performance testing before but I
re-ran some tests and they confirm the design decisions used in
zcl for example.
Array Access
First, I tested some mechanisms for accessing arrays. I passed
two arrays to a native function and had it perform various tests:
- No op;
- GetPrimitiveArrayCritical on both arrays;
- GetArrayElements for read-only arrays (call Release(ABORT))
- GetArrayElements for read-only on one array and read-write
on the other (call Release(Abort, Commit));
- GetArrayRegion for read-only, to memory allocated using alloca
- GetArrayRegion and SetArrayRegion for one array, to memory using alloca
- GetArrayRegion for read-only, to memory allocated using malloc
- GetArrayRegion and SetArrayRegion for one array, to memory using malloc
I then ran these tests for different sized float[] arrays, for
1 000 000 iterations, and the results in seconds are below. It's some intel laptop.
Size  NOOP         Critical     Elements                  Region/alloca             Region/malloc
      0            1            2            3            4            5            6            7
   1  0.014585537  0.116005779  0.199563981  0.207630731  0.104293268  0.127865782  0.185149189  0.217530639
   2  0.013524620  0.118654092  0.201340322  0.209417471  0.104695330  0.129843794  0.193392346  0.216096210
   4  0.012828157  0.113974453  0.206195102  0.214937432  0.107255090  0.127068808  0.190165219  0.215024016
   8  0.013321001  0.116550424  0.209304277  0.205794572  0.102955338  0.130785133  0.192472825  0.217064583
  16  0.013228272  0.116148320  0.207285227  0.211022409  0.106344162  0.139751496  0.196179709  0.222189471
  32  0.012778452  0.119130446  0.229446026  0.239275912  0.111609011  0.140076428  0.213169077  0.252453033
  64  0.012838540  0.115225274  0.250278658  0.259230054  0.124799171  0.161163577  0.230502836  0.260111468
 128  0.014115022  0.120103332  0.264680542  0.282062633  0.139830967  0.182051151  0.250609001  0.297405818
 256  0.013412645  0.114502078  0.315914219  0.344503396  0.180337154  0.241485525  0.297850562  0.366212494
 512  0.012669807  0.117750316  0.383725378  0.468324904  0.261062826  0.358558946  0.366857041  0.466997977
1024  0.013393850  0.120466096  0.550091063  0.707360155  0.413604094  0.576254053  0.518436072  0.711689270
2048  0.013493996  0.118718871  0.990865614  1.292385065  0.830819392  1.147347700  0.973258653  1.284913436
4096  0.012639675  0.116153318  1.808592969  2.558903773  1.628814486  2.400586604  1.778098089  2.514406096
Some points of note:
- Raw method invocation is around 14 nanoseconds, pretty much
irrelevant once you do any work.
- Get/ReleaseArrayElements is pretty much the same as using
Get/SetArrayRegion with malloc but with less flexibility.
- For small arrays 2 calls to malloc/free is nearly 50% of the
processing time. Given the gay abandon with which most C
programmers throw these around like they cost nothing, the extra
JNI overhead is modest.
- For larger arrays memcpy time dominates.
- For one way transfers shorter than 64 floats using
Get/SetArrayRegion to the stack or pre-allocated memory is the
fastest.
- For all other cases including any-sized two-way transfers,
GetPrimitiveArrayCritical is the fastest. But it has other
overheads and isn't always applicable.
I didn't look at ByteBuffer because it doesn't really fit what i'm
doing with these functions.
Anyway - the overheads are unavoidable with JNI but are quite
modest. The function in question does nothing with the data and
so any meaningful operation will quickly dominate the processing
time.
Object Pointer resolution
The next test I did was to compare various mechanisms for
transferring the native C pointer from Java to C.
I created a Native object with two long fields, a final long p and
a non-final long q.
- No operation;
- C invokes getP() method which returns p;
- C invokes getQ() method which returns q;
- C access to .p field;
- C access to .q field;
- The native signature takes a pointer directly, call it resolving the .p field in the caller;
- The native signature takes a pointer directly, call it resolving the .p field via a wrapper function.
Again invoking it 1 000 000 times.
NOOP         getP()       getQ()       (C).p        (C).q        (J).p        J wrapper
0            1            2            3            4            5            6
0.016606942  0.293797182  0.294253973  0.020146810  0.020154508  0.015827028  0.016979563
- final makes no difference.
- method invocation is 15x slower than a field lookup!
- Field lookups are much slower in C than Java, but the absolute
cost is insignificant at ~2.5nS per lookup.
In short, just passing Java objects directly and having the C
resolve the pointer via a field lookup is slightly slower but
requires much less boilerplate and so is the preferred solution.
Logger
After I sorted out the basic JNI mechanisms I started looking at
the reference tracking implementation (i'll call this NativeZ from
here on).
For debugging and trying to be a more re-usable library I had added
logging to various places in the C code using
Logger.getLogger(tag).fine(String.format(...));
It turns out this was really not a wise idea and the logging calls
alone were taking approximately 50% of the total execution time -
versus java to C to java, hashtable lookups and synchronisation
blocks.
Simply changing to use the Supplier versions of the logging
functions approximately doubled the performance.
Logger.getLogger(tag).fine(String.format(...));
->
Logger.getLogger(tag).fine(() -> String.format(...));
But I also decided to just make including any of the code optional
by bracketing each call to a test against a final static boolean
compile-time constant.
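Both tricks are easy to demonstrate from the Java side. A minimal sketch, with LazyLogSketch, describe() and the LOGGING flag invented for the example: the counter shows that the eager form pays for the formatting even when FINE is disabled, while the Supplier form never runs it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyLogSketch {
    /** Compile-time constant: when false the guarded call is dropped entirely. */
    static final boolean LOGGING = false;

    static int formatted; // counts how often the message was actually built

    static String describe(long p) {
        formatted++;
        return String.format("release %016x", p);
    }

    /** Message is formatted before fine() can check the level. */
    static int eager(Logger log, long p) {
        formatted = 0;
        log.fine(describe(p));
        return formatted;
    }

    /** Supplier is only invoked if FINE is loggable. */
    static int lazy(Logger log, long p) {
        formatted = 0;
        log.fine(() -> describe(p));
        return formatted;
    }

    /** The bracketed variant: no call at all unless LOGGING is true. */
    static int guarded(Logger log, long p) {
        formatted = 0;
        if (LOGGING)
            log.fine(describe(p));
        return formatted;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("sketch");
        log.setLevel(Level.INFO); // FINE is disabled
        System.out.println(eager(log, 0x7f00L) + " " + lazy(log, 0x7f00L)
                + " " + guarded(log, 0x7f00L));
    }
}
```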
This checking indirectly confirmed that the reflection invocations
aren't particularly onerous assuming they're doing any work.
HashMap<Long,WeakReference>
Now the other major component of the NativeZ object tracking is
using a hash-table to map C pointers to Java objects. This serves
two important purposes:
- Allows the Java to resolve separate C pointers to the same object;
- Maintains a hard reference to the WeakReference, without
which they just don't work.
For simplicity I just used a HashMap for this purpose. I knew it
wasn't ideal but I did the work to quantify it.
Using jol and perusing the source I got some numbers for a jvm
using compressed oops and an 8-byte object alignment.
Object           | Size | Notes                                     |
HashMap.Node     | 32   | Used for short hash chains.               |
HashMap.TreeNode | 56   | Used for long hash chains.                |
Long             | 24   | The node key                              |
CReference       | 48   | The node value. Subclass of WeakReference |
Thus the incremental overhead for a single C object is either 104
bytes (32 + 24 + 48) when a linear hashchain is used, or 128 bytes
(56 + 24 + 48) when a tree is used.
Actually it's a bit more than that because the hashtable (by
default) uses a 75% load factor so also allocates 1.5 pointers for
each object but that's neither here nor there and also a feature
of the algorithm regardless of implementation.
But there are other bigger problems: the Long.hashCode() method just
mixes the low and high words together using xor. If all C
pointers are 8 (or worse, 16) byte aligned then essentially only
every 8th (or 16th) bucket is ever in use. So apart from the
wasted buckets the HashMap is very likely to end up using Trees
to store each chain.
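This is easy to see with a toy experiment. The sketch below assumes 1024 consecutive 16-byte aligned addresses starting at a made-up base and indexes a 1024 entry table directly from the hash value (HashMap's own extra spreading step is left out). PointerHashSketch and bucketsUsed are names invented here.

```java
import java.util.function.LongToIntFunction;

final class PointerHashSketch {
    /** Counts distinct table slots hit by `count` pointers spaced 16 bytes apart. */
    static int bucketsUsed(LongToIntFunction hash, long base, int count, int tableSize) {
        boolean[] used = new boolean[tableSize]; // tableSize must be a power of two
        int n = 0;
        for (int i = 0; i < count; i++) {
            int index = hash.applyAsInt(base + 16L * i) & (tableSize - 1);
            if (!used[index]) {
                used[index] = true;
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long base = 0x7f1234560000L; // hypothetical heap address
        // xor of the high and low words: the low 4 bits never change
        System.out.println(bucketsUsed(p -> Long.hashCode(p), base, 1024, 1024));
        // simple 32-bit truncate then shift by 4
        System.out.println(bucketsUsed(p -> (int) p >>> 4, base, 1024, 1024));
    }
}
```

With these inputs the Long.hashCode() variant touches 64 of the 1024 slots, i.e. every 16th bucket, while the shifted variant uses all 1024.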
So I wrote another hashtable implementation which addresses this
by using the primitive long stored in the CReference directly as
the key, and using the CReference itself as the bucket nodes. I
also used a much better hash function. This reduced the memory
overhead to just the 48 bytes for the CReference plus a (tunable)
overhead for the root table - anywhere from 1/4 to 1 entry per
node works quite well with the improved hash function.
This uses less memory and runs a bit faster - mostly because the
gc is run less often.
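The shape of that table is roughly as follows. This is a minimal single-threaded sketch with a fixed-size root table, and it uses a simple shift for the hash rather than whatever function the real implementation ends up with. PointerTableSketch and CRef are illustrative names only.

```java
import java.lang.ref.WeakReference;

final class PointerTableSketch {
    /** The weak reference doubles as the chain node, keyed by its own pointer. */
    static final class CRef extends WeakReference<Object> {
        final long p; // native pointer, used directly as the primitive key
        CRef next;    // next node in the same bucket

        CRef(Object referent, long p) {
            super(referent);
            this.p = p;
        }
    }

    private final CRef[] table;
    private final int mask;

    PointerTableSketch(int sizePow2) {
        table = new CRef[sizePow2];
        mask = sizePow2 - 1;
    }

    static int hash(long p) { return (int) p >>> 4; }

    /** Links the reference into its bucket; no Long and no separate node object. */
    void put(CRef ref) {
        int i = hash(ref.p) & mask;
        ref.next = table[i];
        table[i] = ref;
    }

    CRef get(long p) {
        for (CRef r = table[hash(p) & mask]; r != null; r = r.next)
            if (r.p == p) return r;
        return null;
    }

    /** Unlinks and returns the reference for p, or null if it isn't present. */
    CRef remove(long p) {
        int i = hash(p) & mask;
        CRef prev = null;
        for (CRef r = table[i]; r != null; prev = r, r = r.next) {
            if (r.p == p) {
                if (prev == null) table[i] = r.next;
                else prev.next = r.next;
                r.next = null;
                return r;
            }
        }
        return null;
    }
}
```

The table itself holds the hard reference to each CRef, so the only per-object cost beyond the reference is its share of the root array.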
notzed.nativez
So i'm still working on wrapping this all up in a module
notzed.nativez which will include the Java base class and a shared
library for other JNI libraries to link to which includes the
(trivial) interface to the NativeZ object and some helpers to help
write small and robust JNI libraries.
And then of course eventually port jjmpeg and zcl to use it.
Bye Bye Jaxby
So one of the biggest changes affecting my projects with Java 11
is the removal of java.xml.bind from the openjdk. This is a bit
of a pain because the main reason I used it was the convenience,
which is a double pain because not only do I have to undo all that
convenience, all that time using and learning it in the first
place has just been confirmed as wasted.
I tried using the last release as modules but they are
incompatible with the module system because one or two of the
packages are split. I tried just making a module out of them but
couldn't get it to work either. And either i'm really shit at
google-foo or it's just shit but I couldn't for the life of me
find any other reasonable approach so after wasting too much time
on it I bit the bullet and just wrote some SAXParser and
XMLStreamWriter code mandraulically.
Fortunately the xml trees I had made parsing quite simple. First,
none of the element names overlapped so even parsing embedded
structures works without having to keep track of the element
state. Secondly almost all the simple fields were encoded as
attributes rather than elements. So this means almost all objects
can be parsed from the startElement callback, and a single stack
is used to track encapsulated fields. Because I use arrays in a
few places a couple of ancillary lists are used to build them (or
I could just change them to Lists).
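For a flavour of what the hand-written parser looks like, here is a minimal sketch on a made-up format (the shape/point elements and the SaxSketch class are hypothetical, not my actual file format). All simple fields are attributes, so each object is built entirely in startElement, and one stack tracks what the current element is nested inside.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxSketch {
    static final class Point { double x, y; }
    static final class Shape {
        String name;
        final List<Point> points = new ArrayList<>(); // could become an array afterwards
    }

    static List<Shape> parse(String xml) {
        List<Shape> shapes = new ArrayList<>();
        Deque<Object> stack = new ArrayDeque<>(); // one entry per open element
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes a) {
                switch (qName) {
                case "shape": {
                    Shape s = new Shape();
                    s.name = a.getValue("name");
                    shapes.add(s);
                    stack.push(s);
                    break;
                }
                case "point": {
                    Point p = new Point();
                    p.x = Double.parseDouble(a.getValue("x"));
                    p.y = Double.parseDouble(a.getValue("y"));
                    ((Shape) stack.peek()).points.add(p); // parent is the enclosing shape
                    stack.push(p);
                    break;
                }
                default:
                    stack.push(qName); // container element with no object of its own
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                stack.pop();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception x) {
            throw new IllegalArgumentException("parse failed", x);
        }
        return shapes;
    }
}
```

Since no element names overlap, the switch on the element name is all the state tracking that's needed.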
It's still tedious and error-prone and a pretty shit indictment on
the state of Java SE in 2018 vs other languages but once it's done
it's done and not having a dependency on half a dozen badly
over-engineered packages means it's only done once and i'm not
wasting my time learning another fucking "framework".
I didn't investigate where javaee is headed - it'll no doubt
eventually solve this problem but removing the dependency from
desktop and command-line tools isn't such a bad thing - there
have to be good reasons it was dropped from JavaSE in the first
place.
One might point to json but that's just as bad to use as a DOM
based mechanism which is also just as tedious and error prone.
json only really works with fully dynamic languages where you
don't have to write any of the field bindings, although there are
still plenty of issues with no canonicalised encoding of things
like empty arrays or null strings. In any event I need file
format compatibility so the fact that I also think it's an
unacceptably shit solution is entirely moot.
Modules
By the end of the week i'd modularised my main library and ported
one of the applications that uses it to the new structure. The
application itself also needs quite a bit of modularisation but
that's a job for next week, as is testing and debugging - it runs
but there's a bunch of broken shit.
So using the modules it's actually quite nice - IF you're using
modules all the way down. I didn't have time to look further to
find out if it's just a problem with netbeans but adding jars to
the classpath generally fucks up and it starts adding strange
dependencies to the build. So in a couple of cases I took
existing jars and added a module-info myself. When it works it's
actually really nice - it just works. When it doesn't, well i'm
getting resource path issues in one case.
I also like the fact the tools are the ones dictating the source
and class file structures - not left to 3rd party tools to mess
up.
Unfortunately I suspect modularisation will be a pretty slow-burn
and it will be a while before it benefits the average developer.
Netbeans / CVS
As an update on netbeans I joined the user mailing list and asked
about CVS - apparently it's in the netbeans plugin portal. Except
it isn't, and after providing screenshots of why I would think
that it doesn't exist I simply got ignored.
Yeah ok.
Command line will have to do for me until it decides to show up in
my copy.
Java After Next
So with Oracle loosening the reins a bit (?) on parts of the java
platform like JavaFX i'm a little concerned about where things
will end up.
Outside of the relatively tight core of SE the java
platform has some pretty shitty "industry standard" pieces.
ant - it's just a horrible tool to use. So horrible it looks like
they've added javascript to address some of its issues (oh yay).
maven has a lot of issues beyond just being slow as fuck. The
ease with which it allows one to bloat out dependencies is not a
positive feature.
So yeah, if the "industry" starts dictating things a bit more,
hopefully they won't have a negative impact.
Java Modules
So I might not be giving a shit and doing it for fun but I'm still
looking into it at work.
After a couple of days of experiments and quite a bit of hacking
i've taken most of the libraries I have and re-combined them into
a set of modules. Ostensibly the modules are grouped by
functionality but I also moved a few bits and pieces around for
dependency reasons.
One handy thing is the module-info (along with netbeans) lets you
quickly determine dependencies between modules, so for example
when I wanted to remove java.desktop and javafx from a library I
could easily find the usages. It has made the library slightly
more difficult to use because i've moved some methods to static
functions (and these functions are used a lot in my prototype code
so there's a lot of no-benefit fixing to be done to port it) but
it seems like a reasonable compromise for the first cut. There
may be other approaches using interfaces or subclasses too,
although I tend to think that falls into over-engineering.
Spi
One of the biggest benefits is the service provider mechanism that
enables pluggability by just including modules on the path. It's
something I should've looked into earlier rather than the messy
ad-hoc stuff i've been doing but I guess things get done
eventually.
I've probably not done a good job with it yet either but it's a
start and easy to modify. There should be a couple of other
places I can take advantage of it as well.
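For reference, the consumer side of the service provider mechanism is only a few lines with java.util.ServiceLoader. This is a sketch under assumptions: the ImageCodec interface, the RawCodec fallback and the Codecs class are invented names, not anything from my libraries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical service interface; it would live in the library module.
interface ImageCodec {
    String format();
}

// Hypothetical built-in fallback, used when no provider is found.
class RawCodec implements ImageCodec {
    public String format() { return "raw"; }
}

class Codecs {
    // In a modular build the consumer's module-info.java declares
    //   uses ImageCodec;
    // and each plugin module declares
    //   provides ImageCodec with <its implementation class>;
    // Dropping a plugin module on the module path is then enough
    // for it to show up here.
    static List<ImageCodec> available() {
        List<ImageCodec> list = new ArrayList<>();
        for (ImageCodec c : ServiceLoader.load(ImageCodec.class))
            list.add(c);
        if (list.isEmpty())
            list.add(new RawCodec());
        return list;
    }
}
```

With no provider modules present the loader just iterates over nothing, so the fallback is all you get.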
Redesign
I'm also mid-way through cleaning out a lot of stuff - cut and
paste, newer-better implementations, or just experiments that take
too much code and are rarely used.
I had a lot of stream processing experiments which just ended up
being over-engineered. For example I tried experimenting with
using streams and a Collector to calculate more than just
min/sum/max, instead calculating multi-dimensional statistics
(i.e. all at once) on multi-dimensional data (e.g. image
channels). So I came up with a set of classes (1 to 4
dimensions), collector factories, and so on - it's hundreds of
lines of code (and a lot of bytecode) and I think I only use it in
one or two places in non-performance critical code. So it's going
in the bin and if I do decide to replace it I think I can get by
with at most a single class and a few factory methods.
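As a guess at what the single-class version could look like (a sketch, not the code being binned): one accumulator holding per-channel min/max/sum for samples of any dimension, plus a factory method that wraps it in a Collector. ChannelStats and its members are hypothetical names.

```java
import java.util.Arrays;
import java.util.stream.Collector;

// Hypothetical accumulator: min/max/sum for every channel, in one pass.
class ChannelStats {
    final double[] min, max, sum;
    long count;

    ChannelStats(int dims) {
        min = new double[dims];
        max = new double[dims];
        sum = new double[dims];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
    }

    // Fold one sample (one value per channel) into the running statistics.
    void accept(double[] v) {
        for (int i = 0; i < sum.length; i++) {
            min[i] = Math.min(min[i], v[i]);
            max[i] = Math.max(max[i], v[i]);
            sum[i] += v[i];
        }
        count++;
    }

    // Merge partial results, as needed by parallel streams.
    ChannelStats combine(ChannelStats o) {
        for (int i = 0; i < sum.length; i++) {
            min[i] = Math.min(min[i], o.min[i]);
            max[i] = Math.max(max[i], o.max[i]);
            sum[i] += o.sum[i];
        }
        count += o.count;
        return this;
    }

    double mean(int i) {
        return sum[i] / count;
    }

    // Factory method: a collector over double[] samples with 'dims' channels.
    static Collector<double[], ChannelStats, ChannelStats> collector(int dims) {
        return Collector.of(() -> new ChannelStats(dims),
            ChannelStats::accept, ChannelStats::combine);
    }
}
```

Usage would be along the lines of Stream.of(new double[] {1, 10}, new double[] {3, 20}).collect(ChannelStats.collector(2)), with the dimension passed as a parameter rather than baked into separate 1 to 4 dimension classes.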
The NotAnywhereZone
Whilst looking for some info on netbeans+cvs I tried finding my
own posts, and it seems this whole site has effectively vanished
from the internet. Well with a specific search you can still find
blog posts on google, but not using the date-ranges (maybe the
date headers are wrong here). All you can find on duckduckgo is
the site root page.
So if you're reading this, congratulations on not being a spider
bot!
Not Netbeans 9.0, Java 11
Well that effort was short-lived, no CVS plugin anymore.
It's not that hard to live without, just use the command line
and/or emacs, but today i've already wasted enough time trying to
find out if it will ever return (on which question I found no
answer, clear or otherwise).
It was also going to be a bit of a pain translating an existing
project into a 'modular' one, even though from the perspective of
a makefile it's only a couple of small changes.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved.
Powered by gcc & me!