Super Cereal!

I got way too caught up with writing a new serialiser over the last couple of days. Actually I finished off another one I had so I ended up with two.

There are two cases i'm interested in. One is tight coupling where simplicity and performance outweights extensibility; basically for IPC. The other is where extensibility and size are the main considerations; for object serialisation / data storage.

So I have an XDR-like implementation for the former. The layout of items is the same as XDR (sans mistakes) but it uses native ordering of elements, so i dubbed it XDRN for xdr-native.

For the latter i have -yes- yet another tagged format. Each field is tagged and each object is also a tagged container. The header is at least 2 bytes - a control byte and a tag byte. I can't be bothered typing out all the details - here is whatI have in the source code at the moment.

      This is a streamable self-describing byte-oriented binary format.
      It is a general purpose format and supports a super-set of the
      ez_blob descriptor.  It supports primitive and struct types and
      sequences thereof and there is room for extension.

      Each item beings with a descriptor byte, then followed by a tag id,
      a possible count, and the payload.

      xxxxttcc control byte

      xxxx type code

      0 uint8       unsigned int, value zero-extended
      1 uint16
      2 uint32
      3 uint64
      - reserved
      5 float16
      6 float32
      7 float64
      - reserved
      f struct

      note that for int/float types, (code&3) == log2 of element size in bytes

      tt log2 of tag size in bytes

      0 1 byte
      1 2 byte
      2 4 byte
      3 reserved, no tag?

      cc log2 of count size in bytes, used to indicate sequence length or non-sequence.

      0 1 byte
      1 2 byte
      2 4 byte
      3 none, single item follows

      ff is struct-end code

      A header is a control byte followed by an optional 1/2/4 byte-length tag,
      followed by an optional 1/2/4 byte-length count.

      A structure payload is a list of tagged fields until a struct-end
      code.  A structure sequence is a list of count struct-encoded blocks.

      Integers can be stored in the smallest number of bytes, i.e. with
      all leading $00 bytes removed.

So basically each field has a type, a tag, and a count. Scalar values are with a special count code so don't require a count value. It also differentiates between scalars and single-item sequences. Sequences all have a count and no end sentinal.

It's versatile enough to hold most likely structures but isn't universal. String encoding is application layer. No 128+bit primitives (but there is room to add them). No map type, but there is room to add it (it could just application layer). Probably the only significant one is a 32-bit limit on sequence (array) lengths (for some level of significant!). There are only 96+1 valid codes defined now so there is room in a single control byte for some but not all of these but it's not likely to be as tidy.

One example: tt+cc only defines 12 codes, one could swap tt,cc when tt=11 and thus use all codes and support 1/2/4/8 byte counts with 1/2/4 byte tags.

    ttcc

    00cc   tag size 1, count size 1/2/4/8
    01cc   tag size 2, count size 1/2/4/8
    10cc   tag size 4, count size 1/2/4/8
    11tt   count size 0, tag size 1/2/4
    1111   spare (primtive) / sentinal (struct)

Ok, maybe that would work, and it's not really any more complex in the code. It could use a lookup table but shifts would probably be faster. And this still leaves room for 8 more data types.

I went through a few similar iterations to get to this point. It has a couple of noteworthy features.

write streamable

It doesn't need to calculate information it doesn't know in advance. For example the size of an encoded object. This was a mess in my initial attempts and sometimes required multiple recursive passes.

self describing / read streamable

To be robust to data format changes it needs to be able to skip over data it doesn't understand. The tag defines the field so can be used to identify known ones. The data type and length fields combine together to define the number of bytes to skip for unknown fields. An unitendified sequence of structs must be skipped one at a time, but they provide enough information to do so.

compact

Well, relatively compact for the features it provides.

Tags and integers only use the significant bytes. The minimum overhead for scalar values is 2 bytes per field for control+8 bit tag, which will cover almost everything. The minimum overhead for sequences if 3 bytes (control, tag, count), and for structures is also 3 bytes (control, tag, sentinal).

Fields all have default values and such values are simply not encoded into the byte stream.

I dunno I feel it's a bit over-engineered, but I couldn't see a way to simplify it as I really need that tag. It takes about 2x the amount of code to implement vs the xdrn implementation although a lot of that is mapping to the ez_blob descriptor tables. As it is a self-describing format it may be useful to have a map or stream based api too, and an implementation of either would be straightforward.

Internally both use a common robust i/o mechanism which is simple and reliable. This helps protect against common coding errors like buffer under/overruns. I may expose this as an api in itself.

I'm pretty useless at writing tests (can't be good at everything!) but I have tried to write a more comprehensive set of tests here. Particularly if i'm dumping information into a database I don't want it breaking.

I could've used an existing design, but well, where's the fun in that?

More developments

libeze and blogz are now on code.zedzone.au.

I still haven't moved zedzone to use it but I probably will this week.

I've already prototyped all the code required to move blogz to using lmdb as an index. Conceptually it's very simple, just 3 tables and a few indexes. But there's a lot of boiler-platish code to map it to tables and enforce referential integrity across multiple secondary indices.

Although I haven't implemented it yet the tables support the required meta-data for versioning, renaming, tagging and so on. It will also support comments, wiki-like, and book-chapter like articles. So it will be a stepping stone toward that.

And votes.

I'm thinking about another blob serialiser that uses a tagged format. This is to support simpler schema upgrades should they be required. On the other hand it may never be necessary and it start to fall into over-engineering so I need to see how it falls out once i've gotten there.

A big bloody rant

Ok it's time for another rant. I had a list of things to rant about and I lost it ... so I created another one. I may remember on the way.

Children or pearl clutching nannikins may want to look away.

In short; i'm getting pretty sick of junk on the internet, junk in software development, junk in politics. There's the other crap like anti-vaxxers and climate change denialists - but well that's more of a collective fuckup that we'll all have to reckon with in the end.

Alright where to begin. Firefox. Fuck firefox. Every time it upgrades it seems to break somefucking thing. Now the preferences have been lobotomised for no obvious reason: rather than a click you have to scroll forever to find what you want. And the default scroll using a wheel-mouse is so frustratingly slow it's a total headfuck. Why can't they just include a proper javascript and image blocker in the base fucking application rather than a buggy plugin (oh no, extension!) that breaks every upgrade. And then when you run it it wants to run you through a fucking tutorial about all the great shit it does. Yeah i've been using a browser for 20 years, they're just not that special you know. The pocket. I don't even know what it is but I know I don't want it. Fancy url rendering: look it's just a bit of text, I want to see what it is, you don't need to hide the http or the www or the com. It doesn't make it any more usable or more readable. Multi process; which just means heavy swapping. Javascript is just such a stupid language the last thing anyone should be doing is encouraging more of it.

cmake, got what a load of junk. After years of failures I finally am able to actually compile some projects with cmake out of the box. But it's still usually fucked and the only solution seems to be to go edit the usually non-documented application-specific configuration files.

C++ is such a shit language. Templates are joke. You have<to::tell::the::compilerExactlyWhat>youMean.EveryTime(YouUseIt(X)). And it's still uses gobs of memory and runs like a total pig and gives overly verbose and low-information error dumps. Everyone uses templates because it inlining absolutely everything makes the fastest and most efficient code. Actually, no, it really doesn't. Over-inlining starves registers (particularly on the terminally register-starved x86 platform), blows out the L1 cache, and wastes memory and disk space. Sure you can use it judiciously but but the time you've done that you may as well have just use something else.

Python. Spaces are important. Oh no, actually the number of white-space characters are important. I haven't got time to wait for the molasses of a cursor to step-one-space-at-a-fuck-ing-time-to-get-to-where-the-action-is. I saw a comment on reddit yesterday about how python is so cool because of the indenting. It really isn't. With C or any other sane language you can type whatever you want and the compiler can work it out. The bit that python is missing is you can then automatically format it in various more readable ways that show exactly how the compiler will treat it. With python you're stuck with what you see. Add a block? Now you how to indent everything else and make sure you do it right, now just add a couple of braces and hit re-fromat. Ugh.

This isn't even the real problem with python. I recently had to try to compile pytorch, a 'lightweight' (ROFLMAO what the fuck?) artificial neural network engine and the abomonation that was once caffe. The webpage says use anaconda. Ok 4-fucking-gigabytes later I get something that doesn't work; this version isn't compiled with the features I need (probably because of fucking software patents and ffmpeg). So I spent a couple of days trying to get it to build from source (actually I tried that first but had to go back to it). Fighting with cmake and whatnot I finally get it compiled. But then the python doesn't run. Just a meaningless backtrace (which is pretty much my experience with anything python). Ok so some package isn't there. Ok so ... one uses pip or whatever to install it like you might with apt or yum. Except it isn't really like apt or yum at all. It isn't even like slackpkg. All it does is download a tar and drop in on the filesystem. Dependencies? What are they? Versions? Hah, surely you jest! It's just a bunch of files on the system.

So you run the script again. Again it breaks. Again you install another package. Then lmdb wont even bloody install because setup.py is broken. The internets claim this is a pip bug but then you fuck up and end up with 3 copies of pip installed (/usr, /usr/local, ~/.local) ... and well you fuck it all off and start again. Eventually it turns out lmdb is a compile option hidden somewhere inside the cmake scripts of the fucking custom OpenCV build you had to compile when you started on this journey a couple of days ago.

Javascript. I mean I can't even manage to encompass the size of the fuckup of this conceptually let alone put it into words.

Web page feedback. You go to AMD or wherever to look up some hardware or software details and before the front page has even loaded the page goes white and an overlay asks you to provide feedback for your 'web' experience. Here's some fucking feedback:

                     /"\
                    |\./|
                    |   |
                    |   |
                    |>~<|
                    |   |
                 /'\|   |/'\..
             /~\|   |   |   | \
            |   =[@]=   |   |  \
            |   |   |   |   |   \
            | ~   ~   ~   ~ |`   )
            |                   /
             \                 /
              \               /
               \    _____    /
                |--//''`\--|
                | (( +==)) |
                |--\_|_//--|

Jesus, even that was hard to find: nobody knows what the fuck ASCII is anymore. It's all windows-cpfuck-you-too or some other junk.

Highly related is a web-site that keeps asking you to subscribe. Or keeps giving you 'helpful hints' on how to use it. You know what, I can read, if it's not obvious it doesn't fucking matter. I don't need to be stepped in turn through every single feature that is so unique it's on every other bloody website. I'm sure you're very proud of getting it to work, well good for you. Now you just fucked it up by bugging the fuck out of me.

Or bloody achievements. I just don't bother with any forums anymore but I am on the FSF member forum although it's pretty dead and I just don't fit in (story of my life). I mean I get a happy-face-stamp for writing my first post. Ok thanks I guess? What am i, in grade fucking one? There's enough trouble with the world infanticising everything, do we really need it on every bloody forum we visit? I know i know, it drivers user engagement and all the metrics. Well fuck the metrics.

Speaking of metrics, why the fuck can't the yanks get with the programme? Even when they do use them they can't spell it properly. It's a bloody metre, a meter is something with a dial. Fuck the freedom units.

Modern user interfaces. aka the hamburger menu. aka the stack of shit menu. Yeah so apparently a small row or column of easly read text is too much for the kool kidz these days and it's all hip 2D iconography and animations and shit. It's a shit style that will age poorly. Every time a site or application updates menu's go missing or get moved somewhere else. Jesus fucking stick to something you don't have to actually change this this just for the sake of it. You know that people will just get used to whatever you have; the cognitive overload of having to relean it every time massively increases the real costs of any 'improvement' you might make.

Glaring white pages. Well this goes back to the original internet explorer, M$ windows, M$ word (yes I said itt! There's good solid reasoning behing it!). I regularly spend 12+ hours a day on a computer, if I was being blasted with that white shit I would be blinder than I am already. I don't get headaches for a reason.

Constant updates, rolling releases, fluid apis. It's just too much. Sometimes things are just done and don't need updating anymore. But if you do that you dissapear off the google search results. You're no longer part of the zeitgeist! Heavens! Can't have that! No you need a fancy logo and an annual rebranding `event' and need never maintain anything again, just throw out a new release when you've got enough new buttons to make it look like it matters! Gotta rush to the end of that sprint!

If you can't keep up, well you've just been out-evolved by the equivalent of a telephone sanitiser.

There was more i'm sure but that'll do for now.

Oh hang on, one more rather important one in my notes. Back to firefox. I normally run firefox which is finely tuned to be actually usable. I don't use many plugins - disable javascript, disable fonts and colours and that's about it. But I customise the fuck out of everything else: no fucking animations of smooth-this-or-fucking-that, middle-click to open a url, etc etc. I even have a userContent.css that hides a whole of shit on sites I commonly use (like the banners on stackoverflow and so on). But I tried a naked profile in order to test out the stylesheets i've worked on for code and docs. Boy what a fucking joke. All the reminders about shit I don't want. They even fucked up ctrl-tab so it goes forward but wont return where you came from (isntead it opens a menu with very different keyboard navigation) - i mean what's 25 years of convention for anyway?

But the most alarming one was when I went to download my own fucking sourcecode. Apparently a tar.gz file is now a dangerous download.

Yeah fuck off mozilla you bunch of stupid cunts.

Now I just feel like shit, this has been really depressing.

Sad face.

blogz 0.3

I did a bit of work on blogz today and pushed out a 0.3 release. This is not really tested and i'm still not using in on zedzone but I will be testing it and perhaps do another 0.x release if I have bugs. But the reason it's out now is that this is the snapshot base that I will load into git and then work on that from now on.

If it all works out the next step will be working on using a database for indices - at the very least it will be some lmdb thing, but it may go the whole hog and start to look at using crez for the backend so I can look into some more interesting ideas, although this is a lot more work. This will also allow me to look into a comment system eventually.

See the blogz homepage for the downloads as usual.

L-O-L like anyone gives a flying fuck.

Moar Sauce!

I've now moved my main projects over to using modular Java 11 as well as using git. I've quite a few projects remaining.

I'm undecided on many of them whether they just go in the bin, get updated and published (and abandoned) or get worked on actively again.

blogz is almost ready.

This is mostly a git setup but I'm not actually using it yet on this site either, although i've synchronised the code between them now. Once I do both it'll be there.

playerz is a mess.

This is in the midst of a large amount of changes and I lost track of where I was at. Its a fairly interesting little project though so I will get back to it.

izlib is unfinished

I think I discussed this years ago but basically it's unpublished and well from unfinished. It's basically an image processing toolkit and an experimental platform for Java Streams and API refinement.

I don't really have any plans but I noticed the other day there's a lot of code just sitting there rotting away.

mediaz is archived

mediaz was a project I had on google code which was sort of a predecessor to izlib as well as a collection of early JavaFX experiments which are more or less tutorials for others. And the basic start of a layered bitmap image editor (imagez).

It was about the only project anyone asked about when google code went down. I have the repository archived so I guess I could at least convert it.

termz is unfinished

This was a little fun project using OpenCL to drive a terminal emulator display. It's pretty much pointless and could be done directly with OpenGL.

cdez is rotting

I actually have a C version of dez 1.2 too. It could be updated and published or something.

rez and crez are complicated

rez is a project that implements a `personal' versioned blob store. It supports free branches and cheap copies (iirc), renames, metadata and all that jazz, and uses dez for compact storage. Berkely DB (JE) is used as an embedded database.

This is a project i've worked on and off for years (15?+) with separate C and Java implementations. The original driver was for a web CMS. The last time I worked on it was a year ago porting it to C (including a C versin of dez), for this very blog - although in the end I stayed with the simple blogz.

It's explicitly not `enterprise' oriented on purpose.

There's probably plenty of alternatives around now and I don't really know what to do with it. I never quite got the API down to the point I was really happy with. The C version should probably use lmdb now.

What I have done actually works though so I should probably drop it out somewhere. The problems aren't all that complicated but I think I solved them in a fairly tidy and compact and reusable way.

libeze - 2.0

I just need to drop this into git as is.

This was also in the midst of a large number of changes but I kept them out for the last release. But I have a good number of bits and pieces I can add to it. playerz is the main driver here, but that doesn't need much.

SynthZ - the popular one

I'm not doing anything with this, but it has been downloaded fairly often. Maybe I will git it.

I did learn some more about SourceDataLine so the soft keyboard is real-time now!

wanki is who the hell knows

This was a wiki engine using texinfo markup and with the abiity to properly organise and export multi-page documents. It's always been pre-alpha and gone through a number of varients, from C to JavaEE.

It was the original driver for what became (c)rez.

Who knows, maybe one day, I still think it's got some merit. Wiki's still have major trouble with multi-page documents ordered like a book.

DuskZ is unfinished

An attempt at working on a simple game. I got caught up in pointless (de)serialisation stuff and other sort of unimportant details. I'm sorry to the lad that I chatted to about it. He is a nice fellow. But it was sort of at a bad time personally and I just dont have the interest to work on it anymore. I hate letting people down.

I don't know if there's really anything salvagable at this point as all I did was break a bunch of stuff.

socles is archived

This was an OpenCL image processing library. Nobody cared and eventually neither did I. There might be something salvagable for an eventual izlib backend, maybe.

low level arm code, puppybits, zedos

puppybitz is somewhat stale but maybe there's some stuff useful there.

zedos stopped when I hit the USB driver. Fuck intel for making that junk.

paralella is DEAD TO ME

The parallella stuff i'm not longer working on and it will be staying where it is (at least it's published).

I still have some boards but the whole thing soured me on kickstarter and I will never put money into anything of the sort again. It seems it's mostly used for these sort of projects as a 'lets get some zero-cost bridging finance so our real investors have more confidence' which is pretty much fucked-up capitalism at it's finest.

Wherein capital should be the one taking risks to invest in capital to make more of it. This instead is, well, get the plebs to give us free money and take all the risk for no return!

Android apps are going nowhere

Again the source is already out, it's unlikely I will do more but whether I do will be on a case-by-case basis.

I'm sure there is more - that's just what I found from my archive of google code and a quick look at what I have sitting on THIS computer. I've got drives and backups elsehwere, who knows.

Probably if anything I should start checkpointing more often and dumping shit on code.zedzone.au. Until I had that I didn't really have anywhere to put the random otherwise not really publishable-in-themselves experiments which abound.

It will be a while before this list is fully processed.

dez 2.0 for DEZ1 is here!

I just released dez 2.0 for the DEZ1 binary delta format.

Consider this a 1.0 release! The format is now frozen.

It was just going to be 1.3 but I noticed that 1.2 came out 4 years ago so I thought 2.0 was in order. I did prepare 1.3 in the repository but then I just tagged it again with 2.0.

There aren't really many tunables left so I dropped the flags field and made the OP_SINGLE_SPLIT value tunable. It doesn't make much difference anyway.

This is now a modular Java 11 project using my common java.make. I'm also using it to work on adding junit(4) testing to the java.make system. I may split out java.make into jni.make, java.make, junit.make, and so on. tuning it so haven't rolled it back into the base yet.

As usual the Delta-Z home page has download links and other snot. code.zedzone.au houses a browsable copy of the repository; which now uses git. q

Update: I decided to update the javadocs, as they are. The interesting page is the DEZFormat class which defines and describes the format.

These initial links may not be long-term stable as I will possibly look at a unified javadoc build across all my java projects. And I really need to do something about the blinding default stylesheet. My eyes! My EYES!

Update: I couldn't put up with it so I fixed the stylesheet. This time I used some Workbench 2.0 colours but without the 3D borders. Came out quite nice. I filled out the api documents somewhat as well.

Source Code Browser

I've just installed a source code browser on code.zedzone.au and re-added Code back to the main menu of most pages.

I'm using a slightly hacked up version of gitweb. I needed to add a bit more structure and style stuff to get the look I wanted and disable some junk I didn't need. Some more Commodore colourations for a bit of fun.

This is still work in progress and i'm still trying to decide how i'll manage the repositories but hey it works. I'll be slowly migrating all my projects over to it over time. Any java projects will become modular projects first. I'm not bothering to keep any history because I don't need it.

libeze 2.0, 1.1

I had a few fixes I did ages ago I thought I should push out. Mainly some stupid list errors.

I also redid the tree api such that a version bump was required. The changes are sore of either/or and add an extra argument to several functions but it was for some consitency with the list api (no allocations, static init) and I found some intersting uses for changing the comparison function at search time while working on playerz.

The software is on the libeze page as usual. I also added a CHANGES file with the release notes/changelog.

Weeks ago I also did a lot of other experiments and played with ideas but they never fully formed. Support for more sophsiticated serialisation, elf stuff, and lots of experiemnts with memory and pool allocators. Just can't make a decision on those and I forgot where they were at so who knows.

I haven't done it for libeze yet but i've started moving my projects to using git. I don't really like it (the commands are often obtuse and the documentation isn't user oriented) but CVS support is starting to wane quite markedly in modern tools. Sigh.

About Me

Tags