faster faster loops

Given i haven't touched opencl for a while I thought i'd stop faffing about with threads and streams and see what this APU can do.

But I found a silly bug in zcl which rendered it broken so just mucked around playing with that and got nowhere with my original aim ...

I call a bunch of JNIEnv *A() functions; these take an array of jvalues which should presumably be more efficient than walking a varargs list (if insignificantly so). But in an effort to clean up the way I was using it I broke it and hadn't gotten around to actually running anything until now. I will drop another zcl at some point but considering nobody's downloaded it i'm in no rush.

I also worked out some issue with the GPU driver, and possibly slackware. The mandelbrot demo works fine with javafx but other non-gui code just wasn't finding the GPU device. I couldn't work out what was going on but a strange error indicated it was probably some xfce session setup thing. I found an acceptable workaround in just setting export COMPUTE=localhost:0.0.

And then i spent the rest of the evening trying to work out why SVM wouldn't work on the GPU. It "works fine" on the CPU driver but although other operations are fine any kernel invocation leads to an insta-crash. After re-checking every code path i came to the conclusion that it's not the way i'm calling it, it just doesn't want to work.

And just now I tried a stripped down C implementation and it just crashes when I invoke a (do-nothing) kernel with an SVM argument :-/ Blast. I checked the BufferBandwidth sample and once I figured out netbeans was sucking too much for it to run and closed it; it worked fine. After a pretty long look i can't see why it isn't working so i must've done something really silly and small.

One part of svm - the common address pointers - aren't as useful from Java as from C anyway but the ability to share buffers without explicit map/unmap calls in the fine grained case should be, particularly on this APU.

netbeans

Netbeans is really starting to struggle for some reason. I was doing a big cleanup of a prototype which the boss gave to the customer (sigh) and moving lots of code around and suddenly it decided I had no main class and wouldn't even compile. After cleaning caches and other junk it was 'just' a non-obvious parser error. But I still had to resort to emacs+makefile to go through the compile errors one by one until I could get it to run. And the line that got it working in netbeans again was just a reference to a deleted import - i'd moved it to a common library.

But then it just started scanning dozens of files (dozens of times each) every time i switch windows; pausing for 250-1000ms each time. Cleaning the cache made no difference and it's already on an SSD. And it's running out of memory constantly - which messes up the incremental compilation something fierce. The last thing I did was I tried the same thing I did at home and disabled a dozen or so plugins i'll never use; but it didn't make much long-term difference at home so i'm not confident it will at work either.

But opening the zcl projects back up at home has pretty much busted it here - it's constantly running out of memory and taking a second or so to save any file and often not compiling it. I mean, it's actually become unfit for purpose.

Maybe lambda parsing is throwing it for a loop; but why? Any parameter/type matching should be somewhat limited in scope unlike C++.

Faster loops

I've been playing with streams and iterators and stuff again a bit. Although i've found that having a custom calculation loop is pretty good for performance, trying to call a (lambda) function for each item can have some fairly large overheads once the jvm decides it wants to de-optimise the calling loop. But the latter is just so convenient to use it's worth a little more effort if it will make a performance difference.

de-optimisation

This stuff is just based on observation and not some internal knowledge of hotspot

So I had another look at trying to optimise the case of processing per-item in a loop with minimal overheads in a way that is practical and efficient. So far i've found the jvm will inline a call inside a loop (depending on size?) if the loop only calls up to two different class types. Unfortunately each new invokedynamic call site counts as a new class type, so for example the first of the following will optimise fine, but the second won't.

  // these will remain optimised
  IntUnaryOperator op = Integer::reverse;
  A.forEach(op);
  A.forEach(op);
  A.forEach(op);

  // the third invocation will cause a de-optimisation,
  //  subsequently all will run as de-optimised
  A.forEach(Integer::reverse);
  A.forEach(Integer::reverse);
  A.forEach(Integer::reverse);

And this applies globally so using a singleton won't get you very far.
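One rough way to see the "each call site is a new type" effect without digging through hotspot logs is to look at the runtime class behind each method reference. This is only a sketch: CallSiteClasses and distinctClasses() are made-up names, and whether separate call sites really get separate classes is JVM implementation behaviour, not something the language guarantees.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntUnaryOperator;

// Hypothetical demo class, not part of the original code.
final class CallSiteClasses {

    // Count how many distinct runtime classes the given operators have.
    static int distinctClasses(IntUnaryOperator... ops) {
        Set<Class<?>> seen = new HashSet<>();
        for (IntUnaryOperator op : ops)
            seen.add(op.getClass());
        return seen.size();
    }

    public static void main(String[] args) {
        IntUnaryOperator op = Integer::reverse;

        // One operator object passed three times: always a single class.
        System.out.println("shared:   " + distinctClasses(op, op, op));

        // Three separate call sites: typically reported as three classes,
        // but that is an implementation detail and may differ per JVM.
        System.out.println("separate: " + distinctClasses(
            Integer::reverse, Integer::reverse, Integer::reverse));
    }
}
```

If the second count comes out above two, that matches the observation above that the shared loop ends up seeing too many receiver types to stay inlined.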

So how to address this?

forEacherator

I found that if i used bytecode manipulation and created a new copy-class the optimisation stayed around - because the loop only ever calls one function. So the goal then was to create the simplest class so the overhead of doing this (and the subsequent code) remained small.

After a couple of iterations I settled on the following pair of interface+implementation.

public interface ByteRowFunction {
        public void forEach(byte[] data, int offset, int length);
}

public class ByteRow implements ByteRowFunction {

        private final IntUnaryOperator op;

        public ByteRow(IntUnaryOperator op) {
                this.op = op;
        }

        public void forEach(byte[] data, int offset, int length) {
                for (; length > 0; length--, offset++)
                        data[offset] = (byte) op.applyAsInt(data[offset] & 0xff);
        }
}

This form of for loop is also the fastest I could find, at least with this hotspot on this platform. (i suppose also it's really a map() or apply() call, just read it as the one you prefer).

It still has the same issue as the examples above in that using it with 3 or more different 'op' values will de-optimise it, even if it can itself be used as a singleton (since the forEach call is pure).
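For reference, a small usage sketch of the pair above. The interface and class are repeated only so the example compiles on its own; ByteRowUsage and invert() are made-up names for illustration.

```java
import java.util.function.IntUnaryOperator;

interface ByteRowFunction {
    void forEach(byte[] data, int offset, int length);
}

// Same loop as above, repeated so the example is self-contained.
final class ByteRow implements ByteRowFunction {
    private final IntUnaryOperator op;

    ByteRow(IntUnaryOperator op) {
        this.op = op;
    }

    @Override
    public void forEach(byte[] data, int offset, int length) {
        for (; length > 0; length--, offset++)
            data[offset] = (byte) op.applyAsInt(data[offset] & 0xff);
    }
}

// Hypothetical driver class.
final class ByteRowUsage {
    // Invert the items in [offset, offset+length) in place.
    static byte[] invert(byte[] data, int offset, int length) {
        new ByteRow(v -> 255 - v).forEach(data, offset, length);
        return data;
    }

    public static void main(String[] args) {
        byte[] row = { 0, 1, 2, (byte) 200, (byte) 255 };
        // Only the middle three items are touched.
        invert(row, 1, 3);
        for (byte b : row)
            System.out.print((b & 0xff) + " ");
        System.out.println();   // prints "0 254 253 55 255 "
    }
}
```

The operator always sees the unsigned value (0..255) and only the low 8 bits of its result are stored.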

Class specialisation

So this is where the bytecode manipulation comes in. I use ASM to simply create a new class for each class of op. And the jvm will worry about in-lining the call if it makes sense otherwise.

public static ByteRowFunction of(IntUnaryOperator op) {
    try {
        Class cp = ASMTools.specialisedClass(ByteRow.class.getName(), op);
        
        return (ByteRowFunction) cp.getConstructor(IntUnaryOperator.class).newInstance(op);
    } catch (Exception ex) {
        throw new RuntimeException(ex);
    }
}

The specialisedClass() function simply takes the original class and creates an exact copy but renames it to a unique value tied to op.getClass(). There is an out-of-date example in the ASM FAQ on how to do this but it's pretty easy using ASM. And that's more or less all it takes.
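ASMTools.specialisedClass() isn't shown here, and the following is not it. As a JDK-only stand-in for the same idea, here is a sketch that swaps the ASM rename for `MethodHandles.Lookup.defineHiddenClass()` (JDK 15 or later), which defines a fresh class from the same bytes without needing a new name. RowFunction, RowTemplate and HiddenRows are hypothetical names, and whether hotspot treats the copies the same way as renamed ASM copies is an assumption that would need measuring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.invoke.MethodHandles;
import java.util.function.IntUnaryOperator;

interface RowFunction {
    void forEach(byte[] data, int offset, int length);
}

// Template whose bytecode is re-defined once per specialisation.
final class RowTemplate implements RowFunction {
    private final IntUnaryOperator op;

    public RowTemplate(IntUnaryOperator op) {
        this.op = op;
    }

    @Override
    public void forEach(byte[] data, int offset, int length) {
        for (; length > 0; length--, offset++)
            data[offset] = (byte) op.applyAsInt(data[offset] & 0xff);
    }
}

final class HiddenRows {
    static RowFunction of(IntUnaryOperator op) {
        try {
            byte[] code;
            // Load the compiled bytes of the template class.
            try (InputStream in = RowTemplate.class.getResourceAsStream("RowTemplate.class")) {
                if (in == null)
                    throw new IOException("RowTemplate.class not found");
                code = in.readAllBytes();
            }
            // Every call defines a brand new (hidden) class from those bytes.
            Class<?> copy = MethodHandles.lookup()
                .defineHiddenClass(code, true)
                .lookupClass();
            return (RowFunction) copy
                .getDeclaredConstructor(IntUnaryOperator.class)
                .newInstance(op);
        } catch (ReflectiveOperationException | IOException ex) {
            throw new RuntimeException(ex);
        }
    }
}
```

Unlike the version described above this creates a new class on every call rather than one per op.getClass(); a cache keyed on op.getClass() would bring it closer.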

Actually further ... in this case the forEach() call is pure (re-entrant and side-effect free) so the of() function could return a singleton instance as well. But that adds some other mess to gc and so on so probably isn't worth it or even detrimental in the long run; if necessary a caller could keep track of their own.

Results

I did two tests on a (2^25) byte array. The first tries to isolate just the overheads and invokes Integer.hashCode() on each item. The second calls Integer.reverse() which is somewhat slow on x86 without a bitrev instruction (ignoring that this will always result in zero when byte-integer-byte is used).

In each case i'm calling the same thing in 3 different places with 3 different lambdas (`Integer::xx' will create a new invokedynamic call site each time) which should trigger a de-optimisation if it's going to.

                                 hashCode        reverse
  new ByteRow().forEach          0.1800000       0.29
  new of().forEach               0.0000700       0.146

  singleton hash of().forEach    0.0000020       0.146
  singleton saved of().forEach   0.0000013       0.146
  for loop                       0.0000010       0.158

This is the best of 5 runs after a couple of warmup laps. Because it's so short the first column results are a bit noisy but the general trend is clear enough and i took some representative values from a few runs.

The first two include (one) object instantiation. The third uses a (non-synchronised) hash table lookup, the fourth just creates an instance and re-uses it. The last is a simple for-loop over the byte array.
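The timing harness itself isn't shown, so here is a minimal sketch of the "best of N after some warmup laps" approach. BestOf and bestOf() are made-up names, and returning seconds is this sketch's choice rather than a statement about the table above.

```java
import java.util.function.IntUnaryOperator;

final class BestOf {
    // Run the task 'warmup' times unmeasured, then return the best
    // (smallest) wall-clock time in seconds over 'runs' measured runs.
    static double bestOf(int warmup, int runs, Runnable task) {
        for (int i = 0; i < warmup; i++)
            task.run();
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best * 1e-9;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 25];
        IntUnaryOperator op = Integer::reverse;
        // Plain for-loop case: 2 warmup laps, best of 5.
        double t = bestOf(2, 5, () -> {
            for (int i = 0; i < data.length; i++)
                data[i] = (byte) op.applyAsInt(data[i] & 0xff);
        });
        System.out.printf("for loop: %.6f s%n", t);
    }
}
```

Taking the minimum rather than the mean helps hide gc pauses and scheduler noise, which matters when a run is this short.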

It would be handy to see the generated object code but one can guess that the first vs the rest is the difference between an in-line `foo[x] = foo[x]' and a function call `foo[x] = identity(foo[x])'.

Of course a `memcpy' operation isn't much of a test so with something a little more substantial like Integer.reverse() the overheads are only about 100% - which is still pretty much the "doesn't matter" point in most cases but it's there anyway. Oddly enough the for loop loses out here a little bit but that probably just comes down to different micro-optimisations.

The point of this is to save having to type out yet! another! for! loop! and use lambdas but still retain the performance of using a specialised for-loop. The grand prize would be to re-compile the code for SIMD instructions or HSA or OpenCL - i.e. aparapi. But that requires more than just a bit more effort ;-).

I was hoping that the same technique would be applicable to creating optimised spliterators for use with streams, but with the first approach I tried unfortunately by the time the spliterator gets to see the operator it's been wrapped in so many anonymous inner classes and/or invokedynamic handles that the compiler doesn't try or can't inline everything. Bummer I guess.

I guess if i expose the spliterator boundaries to the pipeline it could work. Instead of creating a stream of integers it could create a stream of batches or rows, and then some helper 'of()' functions could wrap simple per-item calculations into optimised loop running instances whilst still retaining most of the simplicity of use.

  thing.rows().map(ByteRow.ofMap((v) -> itemcalc)). ...;
  thing.rows().flatMap(ByteRow.ofFlatMap((v) -> itemcalc)). ...;
  etc

But i've had enough of this for today. I dunno why i was even on this thing - i had an overly long work week and spent too much time in front of screens as it is. But with a crappy cold day and that sore foot options are limited. Might see if the footy is on, but that channel 7 commentary makes it unbearable.

C?

But just for a ground-check i thought i'd see how C compares. Unfortunately the compiler detects that a bit reverse of a byte will end in zero and optimises it away to a byte-store of 0. Oops. Well i mean it's great that it does that but not really important here. I'm using the same code as in Integer.reverse().

So I changed it to the byte-correct reverse(i)>>24.
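To make the difference concrete, here is a small Java sketch of the two variants. ByteReverse, naive() and reverse8() are names invented for the illustration.

```java
final class ByteReverse {
    // What the earlier test effectively computed: reverse all 32 bits and
    // keep the low 8, which is always zero for inputs 0..255.
    static byte naive(int v) {
        return (byte) Integer.reverse(v & 0xff);
    }

    // Byte-correct version: after a 32-bit reverse the interesting bits
    // sit in the top byte, so shift them back down.
    static byte reverse8(int v) {
        return (byte) (Integer.reverse(v & 0xff) >> 24);
    }

    public static void main(String[] args) {
        System.out.println(naive(0xb4));                                // prints 0
        System.out.println(Integer.toHexString(reverse8(0xb4) & 0xff)); // prints 2d
    }
}
```

Because the result is narrowed to a byte anyway, the arithmetic shift `>>` and the logical shift `>>>` give the same answer here.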

                         reverse(i)>>24    ~i (x5 loops)
 new ByteRow().forEach   0.151                   0.037
 for loop                0.147                   0.035

 C call or forEach       0.118                   0.233
 C inline                0.08                    0.096

So yeah it's slower, but only about 2x in the worst case; in the more realistic case where you're not going to include inline implementations of everything it's only ~30% slower.

I also tried a 'not' function and here java pounces on gcc, even the in-line case is 3x slower and via a function call is 6x slower. This is just with -O2 and it is not doing any loop unrolling or simdisation. But -O3 doesn't make much difference. Using -O3 and -mtune=native results in no real difference either although it generates a bunch of SIMD code and unrolls the loop a few times.

The gcc generated code looks ok at a glance - not that i care enough about x86 mc to be able to determine the finer details. Maybe it's an alignment thing or something?

It is still a bit surprising if not very important but is enough to demonstrate C doesn't automatically mean best-of-speed.

post google code post

Well nobody bothered to comment about the stuff i removed from google code apart from the one lad or lass who lamented the loss of some javafx demos.

I had comments open+moderated for a few weeks but got hit by spammers a couple of days ago so had to go back to id+moderated. Maybe something got lost in those 500 bits of snot but i don't think so. The spam was quite strange; most mentioned web sites but didn't provide links or weren't very readable so i'm not sure what the point was. Perhaps they're just fishing for open sites or naive moderators they can then exploit. Like the "windows computer department" that keeps calling and calling hoping i'll not tell them to fuck off every time (sigh, no i don't normally say that although i would tonight).

I've still got the subversion clones but i'm not inclined to do much with any of it for the foreseeable future and i'm not even sure if i'm going to continue publishing other bits of code i play with going forward.

Desktop Java, OpenCL, ARM assembly language; these things are just not very common in the Free Software world. Server Java is pretty common but that's just, well, `open sauce' companies sharing costs and not hobbyists. So i think all i'm really doing is providing hints or solutions for some student's homework or help for graduate programmers to keep their jobs. And even then it's so niche it wouldn't be many, if any.

As an example of niche, I was looking up some way to communicate with adobe photoshop that doesn't involve psd format and one thing i came across was someone linking to one of my projects for some unfinished experiments with openraster format - on the first page of results. This happens rarely but still too often. Of course it could just be the search engine trying to be smart and tuning results to the user, which is a somewhat terrifying possibility (implications beyond these types of searches of course). FWIW I came to the conclusion photoshop is just one of those proprietary relics from the past which intentionally refuses to support other formats so its idiot users can continue to be arse-reamed by its inflated price.

It's just a hobby

As a hobby i have no desire to work on larger projects of my own or other established projects in my spare time. Occasionally i'll send in a patch to a project but if they want a bunch of fucking around then yeah, ... naah. In hindsight i somewhat regret how we did it on evolution but i think i've mentioned that before. Neither do i need to solicit work or build a portfolio or just gain experience.

I'm not sure how many hobbyists are around; anyone with remotely close to enough skill seems to be jumping into the wild casinos of app-stores or services and expecting to make billion$ and not just doing it for the fun of it. Some of those left over just seem to be arrogant egotistical fuckwits (and some would probably think the same of me). Same as it ever was I guess.

I suppose I will continue to code-drop even if it's just out of habit.

For another hobby I made kumquat marmalade on the weekend. Spent a couple of hours in the sun slicing the tiny fruit and extracting seeds (2-3 cups worth of seeds) and cooked it the next day. Unfortunately after all that effort it looks like it wasn't cooked quite enough and it probably wont set - it's a bit runny but at least it tastes good. Not sure what i'll do with 2-odd litres of the stuff though.

oh driveclub :(

Ho hum. The last patch gutted some of the driveclub features i most use or enjoy.

The "community challenges" list now only contains 20 entries which don't update very often at all (when one expires?) - and today of those there were 15 fucking drift events and a couple using locked cars (either due to club levels or paid DLC) and that left just a couple of events that I might even remotely be interested in ... but they weren't really my chop either. The early June update seems to have made some changes here. The list no longer seems hard-fixed at 20, and it seems to change a little more often. It is still fairly limited and has an over-representation of drifting (no doubt reflecting the customer base, unfortunately).

I detest drift events - they're not racing they're just wank. I worked out a couple of days ago how to get a half-ok score (turn-handbrake-release-pause-engage, the pause is the trick) to get through the "tour" but it didn't give me any more enjoyment out of it and so it isn't something I would choose to do. The locked events are kinda silly too, why even show em? (f2p hooks i guess).

The most popular challenge races are just single-lap, so they're getting boring. I don't really enjoy most of the faster cars so that pretty much rules out the rest. They seem to be focused on 'farming fame' but i got to level 50 without really trying so why bother?

Before this update I used to scroll along the list - sometimes far along - until i found something that suited my current mood and just took it, regardless of how many other people played it (it was sorted by descending popularity i think). Usually the slower cars and the shorter tracks that I've managed to memorise so far.

So this is a real bummer for me it sucks a lot of the fun out of it. At least an option to filter by event type would be a big plus, let alone by car class as well.

There have been some changes to the way challenges and time trials work too. Maybe it's just a server hiccup (hard to tell with dc) but a) you always have to set one time/score on your own before it even loads another timer up, ~~b) with time trial events it only shows YOUR best time's ghost now - not any other ghost,~~ c) with community time trial events it only ever seems to show your best ghost and the best overall ghost (or maybe it's the initiator's lap) - no longer do you see the ghost matching the next-nearest-target time.

a) kinda makes sense and is otherwise neither here nor there and showing the slowest time ever isn't very useful although it can be funny.

~~b) takes most of the fun out of this game mode. Trying to beat your own time can be fun but the other ghosts just make it much much more fun.~~ Looks like it was just the server on the night, seems it's back now although it can take a few laps to show up on the longer tracks. Hoorah!

c) Mostly the same as b) except not showing nearby times is also a pain since i'm just not that good in most cases i usually don't see the other ghost much so you can't see the driving lines taken, etc. It's not a target you can beat so it's not a target to beat, merely an indication of how shit you're doing. TBH I can't remember if it always worked like this ....

So it seems they keep gutting the features to try to get the server code working. Maybe it'll come back but since it's not been fixed after so long it's more likely that it's simply gone for good - and this may not be the end of the cuts.

I'm not that interested in online and i've given up anyway with the shitzbox penetrode sold me; one day i'll try routing through a gnu box but it's going to have to be a long wet weekend for me to get keen enough.

Of course i've got plenty of other games but i just haven't felt like playing any: driveclub is good for a couple of laps as a 'go' which sometimes ends there if i'm not feeling it or sometimes turns into an evening of engaged occupation. And if i'm useless at it it doesn't really matter i can just practice or try a different car and not get "stuck forever". Speaking of cars i've been doing some ferrari km mostly just to get the free one and because i haven't driven them much; but i really don't like the way they drive: "looks like a fish, moves like a fish, steers like a cow" is apt.

Too knackered to code and the idea of tv sounds dreadful and so if it wasn't so early (19:30) i'd go to bed and read.

Fixing netbeans' most utterly useless and annoying mis-feature

I finally got sick of hitting escape every time i used code completion to remove the damn tooltip which always shows up - despite having "turned off" tooltips and all the in-line popups i could find to turn off. This is the one that shows a grey box (in my colour scheme - everything is grey) which hovers just above the line you're entering and shows a list of stuff I can't read anyway.

So I worked out what the mis-feature was actually called - it's called "show method parameters".

Then I downloaded the netbeans source code (all of it, i used a zip - netbeans-src, not sure if i could've just got the ide module, mercurial looked like it was going to take forever so i didn't bother).

I worked out where the offending bit of snot lived: "editor.completion"

I fixed it:

--- editor.completion/src/org/netbeans/modules/editor/completion/CompletionImpl.java~   2014-11-18 19:07:58.000000000 +1030
+++ editor.completion/src/org/netbeans/modules/editor/completion/CompletionImpl.java    2015-05-02 22:39:20.935817693 +0930
@@ -1271,16 +1271,6 @@
      * May be called from any thread but it will be rescheduled into AWT.
      */
     public void showToolTip() {
-        if (!SwingUtilities.isEventDispatchThread()) {
-            // Re-call this method in AWT if necessary
-            SwingUtilities.invokeLater(new ParamRunnable(ParamRunnable.SHOW_TOOL_TIP));
-            return;
-        }
-
-        if (ensureActiveProviders()) {
-            toolTipCancel();
-            toolTipQuery();
-        }
     }
 
     /**

And then after a few false starts trying to work out how to compile the module, I got it compiled. I put "cluster.config=ide" in nbbuild/user.build.properties and then ran 'ant jar' in the editor.completion directory. I'm not exactly sure what made it work in the end because things sort of failed and then they didn't. It refuses to build on jdk8 (as documented) but i had a jdk7 lying about already. It didn't take very long - i was genuinely impressed.

I found where the generated module went (nbbuild/netbeans/ide/modules/org-netbeans-modules-editor-completion.jar) and then copied that into both the system (/usr/local/netbeans-8.0/ide/modules/) and local (~/.netbeans/8.0/modules) module dir.

And then I started netbeans and checked to see if it took. So far looks ok.

What were they thinking?

It's a pretty strange feature to start with given its functional overlap with code completion but it's a mystery why it is also turned on automatically every time you do any code completion or parameter lookup.

For starters the code completion window already shows this information - why pop up another less readable tooltip to duplicate it?

The other problem is that it shows the parameter lists for ALL functions of the same name not just the one specified by the current arguments (or even the type of the argument under the cursor). This just makes it a big mess with insufficient local context to make it readable at 'thinking speed'.

And the icing on the cake is that once you've 'done' your completion or parameter lookup task and moved the cursor - it decides it still wants to hang around a bit longer and jumps to the parameters of the function call you're inside of - i.e. if you're in an in-line lambda or an anonymous class definition it will move up to the function call that the lambda or new Thing is a parameter of. i.e. that code you already finished, perhaps months ago.

So ... out comes ESC. And again, and again. Almost EVERY TIME I use code completion. When i'm cutting lots of code this can be up to dozens of times in a given minute. Of course I type ctrl-space to initiate it that often too, but that's something I asked for and not some undecipherable clutter to piss off.

I already have all the popups and tooltips turned off and only call them up explicitly using ctrl-space; it got to the point there were so many little boxes flashing across the screen as i typed it was like standing around while flies keep landing on your face - once or twice is no big deal but constantly it becomes more than just a bit annoying. Swatting flies is a very apt description of using some ide's in their default configs.

That's it, ... for now ...

But given it was so easy to build I may look at another annoying behaviour. The 'show matching parenthesis' function shows the matching parenthesis of any bracket the cursor is merely touching and gives no indication of which side of the bracket the cursor is on. This just makes it harder to work out where the cursor insertion point is when you're otherwise not interested in the matching bracket.

It could just be familiarity (although i only recently turned it on in emacs so it really isn't) or its lisp implementation but the way emacs works makes it more useful. It will only highlight the matching bracket if the cursor is to the right of a close bracket or sitting on an open bracket (insertion point is to the left). If it's both, it will highlight the opening bracket of a close bracket since that's more useful. emacs uses a block cursor rather than a line which I also find easier to use for fixed-width character editors for a couple of reasons including that it goes hollow instead of vanishing when the window loses focus - which makes it easier to relocate if you're switching between windows on the same screen(s).

stream of abuse

I know it's a "cool new feature" and all, but really guys, think before you put everything in a stream.

I won't link to the article but i came across one that included some code that used streams everywhere for some examples. Apparently it's "obvious" how using streams simplifies the code and so on.

For a start the code isn't really any simpler; it avoids needing to calculate the output size but that isn't a very complicated calculation. Unless all you knew was streams it certainly isn't any more expressive or concise either.

I converted one function directly to arrays and it was shorter and runs at twice the speed for a given test case. It also uses about 1/6th of the memory and roughly 30 000 fewer memory allocations (oh my poor breaking gc, still, java alloc is rather fast isn't it?).

Apart from that ... oh boy it does some bad bad things to a poor innocent atomic counter.

My first thought was "very poor and unscalable use of atomic counter". Then I looked closer.

    // names changed to protect the innocent
    AtomicInteger count=new AtomicInteger();
    int bob[] = foo.stream()
        .map(f->{
            Thing t=list.get(count.getAndIncrement());
            int p0=f.a; int p1=f.b; int p2=f.c;
            int t0=t.a; int t1=t.b; int t2=t.c;
            return IntStream.of(p0, t0, p1, t1, p2, t2);
        }).flatMapToInt(i->i).toArray();

So what this is attempting to do is iterate through two lists and merge pairs at matching indices into a flattened array of integers.

Except one of those arrays has been forced into a stream (for no reason other than it can be done it seems) - thus losing the position information. So to "recover" this information so that it can be correctly indexed into the other array an atomic counter has been added. But this solution is both dangerous and confusing. It's confusing because it looks like it should be concurrent code - that's exactly what atomic counters are for and also a common desired side-effect of using streams, but this loop is not being invoked in a concurrent context. It's probably there because it was left-over from such an attempt. It's dangerous because it is vanishingly unlikely to actually work if it actually was invoked on a parallel stream because atomic counters by definition just don't provide the required constraints and thus there is no guarantee of obtaining matching pairs. At best it looks like a thoroughly cheeky and worst-practice approach to work around the final rules for lambdas.

Whatever it is, it's sick. Don't ever do this (in any language).

In the source example this is part of a larger function where the previous two loops generate one each of these lists independently and in-order and then after this merge they are simply discarded. i.e. there is not any practical reason for any of this intermediate garbage creation apart from saving a little arithmetic in pre-calculating the required size of the array required. The two loops could be retained and just write interleaved results and without losing the expressiveness of the implementation.

But for argument's sake if you really did want to merge two array lists of a known length to an interleaved integer array, here's one approach that has worked for a few decades and is still about as good as it's going to get:

 int[] bob = new int[foo.size() * 6];
 for (int i=0, j=0; i < foo.size(); i++) {
   Thing f = foo.get(i), t=list.get(i);
   bob[j++] = f.a; bob[j++] = t.a;
   bob[j++] = f.b; bob[j++] = t.b;
   bob[j++] = f.c; bob[j++] = t.c;
 }

Let's also say for argument's sake that the stream example did actually work in parallel, and that you would gain anything from it (hint: you won't, see the end), so you got that for free right? What about that ancient and daggy old for loop?

 int[] bob = new int[foo.size()*6];
 IntStream.range(0, foo.size()).parallel().forEach((i)-> {
   int j = i*6;
   Thing f = foo.get(i), t=list.get(i);
   bob[j++] = f.a; bob[j++] = t.a;
   bob[j++] = f.b; bob[j++] = t.b;
   bob[j++] = f.c; bob[j++] = t.c;
 });
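For anyone who wants to check the two loops against each other, here is a self-contained sketch. Thing here is just a stand-in with three int fields, and Interleave, put() and make() are names invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

final class Interleave {
    // Stand-in for the element type; only the three int fields matter.
    static final class Thing {
        final int a, b, c;
        Thing(int a, int b, int c) { this.a = a; this.b = b; this.c = c; }
    }

    // Copy one matching pair into its own 6-int slot starting at j.
    static void put(int[] bob, int j, Thing f, Thing t) {
        bob[j++] = f.a; bob[j++] = t.a;
        bob[j++] = f.b; bob[j++] = t.b;
        bob[j++] = f.c; bob[j]   = t.c;
    }

    static int[] serial(List<Thing> foo, List<Thing> list) {
        int[] bob = new int[foo.size() * 6];
        for (int i = 0; i < foo.size(); i++)
            put(bob, i * 6, foo.get(i), list.get(i));
        return bob;
    }

    static int[] parallel(List<Thing> foo, List<Thing> list) {
        int[] bob = new int[foo.size() * 6];
        // Slot i*6 .. i*6+5 belongs to index i alone, so no locks or
        // atomic counters are needed.
        IntStream.range(0, foo.size()).parallel()
            .forEach(i -> put(bob, i * 6, foo.get(i), list.get(i)));
        return bob;
    }

    // Build n distinguishable test items.
    static List<Thing> make(int n, int base) {
        List<Thing> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++)
            out.add(new Thing(base + i, base + 2 * i, base + 3 * i));
        return out;
    }

    public static void main(String[] args) {
        List<Thing> foo = make(5000, 0), list = make(5000, 1_000_000);
        System.out.println(Arrays.equals(serial(foo, list), parallel(foo, list))); // prints true
    }
}
```

The point of deriving j from i is that each index owns a disjoint slice of the output, so the order in which the workers run doesn't matter.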

Code re-use

One argument for using streams is code-reuse - which falls over once you have side-effects and so the initial example is no better than the last in that respect.

If you really absolutely positively must use streams for this because of whatever reason (there are some legitimate ones): write a proper spliterator and use StreamSupport.stream() to turn it into a stream. It will have to take the two lists in its constructor and iterate over a container object which holds the matching pair (like Entry<,>).

One will note however that for example the linked list spliterator just breaks the iterable into batches of arrays which are then spliterated over. Whilst this conceptually may seem that it could be a win due to locality of reference and doing the work piece-meal: in reality the spliterators are run to their limit before anything starts for scheduling purposes. So all it's really doing is breaking a single allocation and copy loop into many (sqrt(n)) smaller ones; which is always guaranteed to run slower and use less memory. i.e. you could just flatten both lists to arrays and then use the per-item or array methods above and get the same result - more efficiently - and with less programmer effort.

So whilst streams can save a lot of effort in writing concurrent code; it still has many of the same gotchas that can introduce performance side-effects for the ignorant. For example if you're using thread synchronisation primitives at all in any stream processing chain including custom spliterators or collectors then you're literally "doing it wrong" as:

the entire point of the parallel part of the stream framework is to exterminate this unscalable approach

(I thought it needed a bit more emphasis than em/strong/underline could provide :)

EntrySpliterator

I couldn't leave it there so I tried implementing an "Entry" spliterator: one that takes two lists and passes each matching pair into the stream as an Entry.

Because you really don't want to run this on a linked list i forced it to take arraylists. But there are few cases (random deletes or deletes from the head) where you would want to use linked lists and the container approach to linked lists creates pretty shit lists since you lose the ability to delete randomly in O(1). So even when they might be a win for the name of the data structure they often aren't due to the implementation details. But I digress.

So using a spliterator which is under 10 lines of significant code:

    bob = StreamSupport.stream(new SpliteratorEntry<>(
        (ArrayList<Thing>) foo,
        (ArrayList<Thing>) list), false)
    .map(e -> {
        Thing f = e.getKey();
        Thing t = e.getValue();
        int p0 = f.a; int p1 = f.b; int p2 = f.c;
        int t0 = t.a; int t1 = t.b; int t2 = t.c;
        return IntStream.of(p0, t0, p1, t1, p2, t2);
    }).flatMapToInt(i -> i).toArray();

It doesn't make any appreciable difference to the running time or to those underwhelming allocation overheads; but at least it now allows for a reusable function and it isn't just plain "wrong".

FWIW making this concurrent just makes it a bit slower while using more cpu cycles and energy. But I could have guessed "not worth it" beforehand since it just isn't doing enough work to compensate for all the overheads on such a small problem (a few thousand items).
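For reference, a pairing spliterator of this general kind might look roughly like the sketch below. This is a guess at the shape, not the actual SpliteratorEntry: PairSpliterator, PairDemo and this cut-down Thing are all made up here, and it assumes the shorter list sets the length.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Map.Entry;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

// Hypothetical spliterator that walks two ArrayLists in lock-step.
class PairSpliterator<A, B> implements Spliterator<Entry<A, B>> {
    private final ArrayList<A> keys;
    private final ArrayList<B> values;
    private int index;      // next position to hand out
    private final int end;  // exclusive upper bound

    PairSpliterator(ArrayList<A> keys, ArrayList<B> values) {
        this(keys, values, 0, Math.min(keys.size(), values.size()));
    }

    private PairSpliterator(ArrayList<A> keys, ArrayList<B> values, int index, int end) {
        this.keys = keys;
        this.values = values;
        this.index = index;
        this.end = end;
    }

    @Override
    public boolean tryAdvance(Consumer<? super Entry<A, B>> action) {
        if (index >= end) return false;
        action.accept(new SimpleImmutableEntry<>(keys.get(index), values.get(index)));
        index++;
        return true;
    }

    @Override
    public Spliterator<Entry<A, B>> trySplit() {
        int mid = (index + end) >>> 1;
        if (mid <= index) return null;  // too small to split
        PairSpliterator<A, B> head = new PairSpliterator<>(keys, values, index, mid);
        index = mid;                    // this instance keeps the back half
        return head;
    }

    @Override
    public long estimateSize() {
        return end - index;
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | NONNULL;
    }
}

// Stand-in for the Thing used above: just three int fields.
class Thing {
    final int a, b, c;

    Thing(int a, int b, int c) {
        this.a = a;
        this.b = b;
        this.c = c;
    }
}

class PairDemo {
    static int[] interleave(ArrayList<Thing> from, ArrayList<Thing> to, boolean parallel) {
        return StreamSupport.stream(new PairSpliterator<>(from, to), parallel)
            .flatMapToInt(e -> {
                Thing f = e.getKey();
                Thing t = e.getValue();
                return IntStream.of(f.a, t.a, f.b, t.b, f.c, t.c);
            })
            .toArray();
    }
}
```

Since random access into an ArrayList is cheap, trySplit() only has to halve an index range, which is the reason for insisting on arraylists in the first place.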

80ks!

My shitty scales said i broke through the 80kg floor this morning. It usually wavers a little over a week but the trend has been clear of late so even if it's not stable it's not far off being so.

Once I got to about 81kg I actually started to feel lighter, although the belt is the first indicator followed by the mirror. I've got a waist again.

Not sure how much more will go but 10kg from the start of Feb to the end of April is alright and i'm still eating like a chook.

Now if only my foot worked ...

hmmm

Playing with spliterators again.

This was my first go of the row spliterator forEachRemaining() func, and does what I believe the stream api expects:

public void forEachRemaining(Consumer<? super ByteRow> action) {
    for (int y = this.y, ey = y + height; y < ey; y++) {
        ByteRow row = ...;
        ...
        pixels.getRow(y, row);
        action.accept(row);
    }
}

And this is the way I would rather do it, because well any other way is a bit dumb:

public void forEachRemaining(Consumer<? super ByteRow> action) {
    ByteRow row = pixels.createByteRow();
    for (int y = this.y, ey = y + height; y < ey; y++) {
        ...
        pixels.getRow(y, row);
        action.accept(row);
    }
}

Because a spliterator is only ever driven by one thread at a time this actually works for some (all?) of the stream operations that make sense like "map()" (even multiple stages), "collect()", "forEach()", and a couple more. This is because these (or the parts which take the partial result) are all run in the same context as the spliterator. It won't provide meaningful results for others like "toArray()" or "sorted()", but those operations wouldn't be much use here even if they did, and one could always map to a new copy.

Of course, a different streams implementation could change that but I think it's one of those "i'll fix it if it breaks" things and I suspect it won't ever be able to be altered because someone who pays money to oracle will also think the same thing but won't want to fix it on their purse.

Maybe keeping valid instances around makes more sense for tiles, but if needed a copying collector could do that instead.
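A toy version of the reuse trick shows where it holds up and where it doesn't. Everything here is a stand-in: Row plays the part of ByteRow, a byte[][] plays the part of the pixel source, and RowSpliterator/RowDemo are invented names. It also leans on the behaviour of the current JDK streams implementation rather than anything the API promises.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Stand-in for ByteRow: a mutable buffer plus the index of the row it holds.
class Row {
    int y;
    final byte[] data;

    Row(int width) {
        data = new byte[width];
    }

    Row copy() {
        Row r = new Row(data.length);
        r.y = y;
        System.arraycopy(data, 0, r.data, 0, data.length);
        return r;
    }
}

class RowSpliterator implements Spliterator<Row> {
    private final byte[][] pixels;
    private final Row row;  // the single instance this spliterator hands out
    private int y;
    private final int end;

    RowSpliterator(byte[][] pixels, int y, int end) {
        this.pixels = pixels;
        this.y = y;
        this.end = end;
        this.row = new Row(pixels.length == 0 ? 0 : pixels[0].length);
    }

    @Override
    public boolean tryAdvance(Consumer<? super Row> action) {
        if (y >= end) return false;
        row.y = y;
        System.arraycopy(pixels[y], 0, row.data, 0, row.data.length);
        y++;
        action.accept(row);  // same object every time, new contents
        return true;
    }

    @Override
    public Spliterator<Row> trySplit() {
        int mid = (y + end) >>> 1;
        if (mid <= y) return null;
        RowSpliterator head = new RowSpliterator(pixels, y, mid);  // gets its own Row
        y = mid;
        return head;
    }

    @Override
    public long estimateSize() {
        return end - y;
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | SUBSIZED | NONNULL;
    }
}

class RowDemo {
    // Fine: each row is consumed before the next one overwrites the buffer.
    static int[] rowSums(byte[][] pixels, boolean parallel) {
        return StreamSupport.stream(new RowSpliterator(pixels, 0, pixels.length), parallel)
            .mapToInt(r -> {
                int sum = 0;
                for (byte b : r.data) sum += b;
                return sum;
            })
            .toArray();
    }

    // Not meaningful: every slot refers to the one reused Row.
    static Object[] rawRows(byte[][] pixels) {
        return StreamSupport.stream(new RowSpliterator(pixels, 0, pixels.length), false)
            .toArray();
    }

    // Mapping to a copy first makes the rows safe to keep.
    static Row[] copiedRows(byte[][] pixels) {
        return StreamSupport.stream(new RowSpliterator(pixels, 0, pixels.length), false)
            .map(Row::copy)
            .toArray(Row[]::new);
    }
}
```

rowSums() behaves the same sequentially or in parallel because each split gets its own Row, whereas rawRows() just ends up holding the last row loaded several times over.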

Update: Ok I decided not to post this originally because i just wasn't feeling like it, but now here goes, so this is a pre-post "update".

I followed on from these ideas and did a bunch of re-arranging and settled on a consistent approach: by default any objects in the "stream" are just references to per-spliterator instances. If they need to be unique for further processing they can be duplicated by clone() or the like. If it turns out this assumption is not sufficiently useful i can just change the spliterators then.

I did a bit of refactoring to try and save some small common bits of code in the spliterators but it just seemed like fiddling at the edges and adding extra classes for no real benefit.

Then I ran out of steam a bit and haven't really done much else on it.

Been doing some mildly interesting things with javafx though - i was going to say a few days ago how nice it is to work with, but then i hit some exceedingly frustrating problems with very simple layout needs which really gave me the shits. I got it working but damn it took way longer than it should have. I also spent too long trying to work out how to blend against a generated mask before I remembered about using the clip node. It's only a simple compositing task but having it running at interactive speed with "so little work" is nice.
