About Me
Michael Zucchi
B.E. (Comp. Sys. Eng.)
also known as Zed
to his mates & enemies!
< notzed at gmail >
< fosstodon.org/@notzed >
Super Cereal!
I got way too caught up with writing a new serialiser over the
last couple of days. Actually I finished off another one I had so
I ended up with two.
There are two cases I'm interested in. One is tight coupling
where simplicity and performance outweigh extensibility;
basically for IPC. The other is where extensibility and size are
the main considerations; for object serialisation / data storage.
So I have an XDR-like implementation for the former. The layout
of items is the same as XDR (sans mistakes) but it uses native
ordering of elements, so I dubbed it XDRN for xdr-native.
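For illustration only, here is a minimal sketch of what an XDR-style writer with native ordering could look like. It assumes XDRN keeps XDR's basic rules (every item occupies a multiple of 4 bytes, variable-length data is prefixed by a 32-bit length and zero-padded) and simply drops the byte swapping. The names `xdrn_out`, `xdrn_put_u32` and `xdrn_put_opaque` are hypothetical, not the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical XDRN writer: XDR-style layout (4-byte units, length-prefixed
   variable data, zero padding) but values stay in host byte order. */
typedef struct {
	uint8_t *buf;
	size_t size;
	size_t pos;     /* invariant: pos <= size */
} xdrn_out;

/* Write a 32-bit value in native byte order; returns 0, or -1 on overflow. */
static int xdrn_put_u32(xdrn_out *o, uint32_t v) {
	assert(o->pos <= o->size);
	if (o->size - o->pos < 4)
		return -1;
	memcpy(o->buf + o->pos, &v, 4);
	o->pos += 4;
	return 0;
}

/* Write variable-length opaque data: 32-bit length, the bytes, then zero
   padding up to the next multiple of 4. Nothing is written on overflow. */
static int xdrn_put_opaque(xdrn_out *o, const void *data, uint32_t len) {
	size_t padded = ((size_t)len + 3) & ~(size_t)3;

	assert(o->pos <= o->size);
	if (o->size - o->pos < 4 + padded)
		return -1;
	xdrn_put_u32(o, len);
	memcpy(o->buf + o->pos, data, len);
	memset(o->buf + o->pos + len, 0, padded - len);
	o->pos += padded;
	return 0;
}
```

Because nothing is byte-swapped, reading back on the same machine is just a `memcpy`, which is the point of using it for IPC.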
For the latter I have -yes- yet another tagged format. Each field
is tagged and each object is also a tagged container. The header
is at least 2 bytes - a control byte and a tag byte. I can't be
bothered typing out all the details - here is what I have in the
source code at the moment.
This is a streamable self-describing byte-oriented binary format.
It is a general purpose format and supports a super-set of the
ez_blob descriptor. It supports primitive and struct types and
sequences thereof and there is room for extension.
Each item begins with a descriptor byte, followed by a tag id,
a possible count, and the payload.
    xxxxttcc   control byte

    xxxx   type code
        0    uint8     unsigned int, value zero-extended
        1    uint16
        2    uint32
        3    uint64
        4    reserved
        5    float16
        6    float32
        7    float64
        8-e  reserved
        f    struct

        Note that for int/float types, (code & 3) == log2 of element size in bytes.

    tt     log2 of tag size in bytes
        0    1 byte
        1    2 bytes
        2    4 bytes
        3    reserved, no tag?

    cc     log2 of count size in bytes, used to indicate sequence length or non-sequence
        0    1 byte
        1    2 bytes
        2    4 bytes
        3    none, single item follows

    ff     struct-end code
A header is a control byte followed by an optional 1/2/4 byte-length tag,
followed by an optional 1/2/4 byte-length count.
A structure payload is a list of tagged fields until a struct-end
code. A structure sequence is a list of count struct-encoded blocks.
Integers can be stored in the smallest number of bytes, i.e. with
all leading $00 bytes removed.
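A minimal sketch of encoding a single scalar unsigned-int field under the layout above. Two assumptions are not stated in the format notes: multi-byte tags and values are little-endian, and "smallest number of bytes" rounds up to the 1/2/4/8-byte sizes the type codes can express. `put_uint_field` and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only. Assumed: little-endian tags/values, and integers shrink to
   the smallest of the 1/2/4/8-byte types (type codes 0..3). */

#define CC_SCALAR 3   /* count code 3: no count, single item follows */

/* log2 of the smallest 1/2/4/8-byte size that holds v */
static unsigned size_code(uint64_t v) {
	if (v <= 0xff) return 0;
	if (v <= 0xffff) return 1;
	if (v <= 0xffffffff) return 2;
	return 3;
}

/* Store the low (1 << code) bytes of v, least significant first. */
static size_t put_le(uint8_t *p, uint64_t v, unsigned code) {
	size_t n = (size_t)1 << code;
	for (size_t i = 0; i < n; i++)
		p[i] = (uint8_t)(v >> (8 * i));
	return n;
}

/* Writes control byte, tag and value into out (needs up to 13 bytes).
   Returns the number of bytes written. */
static size_t put_uint_field(uint8_t *out, uint32_t tag, uint64_t value) {
	unsigned type = size_code(value);   /* 0..3 = uint8..uint64 */
	unsigned tt = size_code(tag);       /* 0..2 = 1/2/4 byte tag */
	size_t n = 0;

	assert(tt <= 2);                    /* tt == 3 is reserved */
	out[n++] = (uint8_t)(type << 4 | tt << 2 | CC_SCALAR);
	n += put_le(out + n, tag, tt);
	n += put_le(out + n, value, type);
	return n;
}
```

With a tag and a value both under 256 this emits a control byte, a 1-byte tag and a 1-byte value: the 2-byte minimum field overhead plus the payload.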
So basically each field has a type, a tag, and a count. Scalar
values use a special count code so they don't require a count
value. It also differentiates between scalars and single-item
sequences. Sequences all have a count and no end sentinel.
It's versatile enough to hold most likely structures but isn't
universal. String encoding is application layer. No 128+bit
primitives (but there is room to add them). No map type, but
there is room to add it (it could just be application layer).
Probably the only significant one is a 32-bit limit on sequence
(array) lengths (for some level of significant!). There are only
96+1 valid codes defined now so there is room in a single control
byte for some but not all of these but it's not likely to be as
tidy.
One example: tt+cc only defines 12 codes, one could swap
tt,cc when tt=11 and thus use all codes and support 1/2/4/8 byte
counts with 1/2/4 byte tags.
    ttcc
    00cc   tag size 1, count size 1/2/4/8
    01cc   tag size 2, count size 1/2/4/8
    10cc   tag size 4, count size 1/2/4/8
    11tt   count size 0, tag size 1/2/4
    1111   spare (primitive) / sentinel (struct)
Ok, maybe that would work, and it's not really any more complex in
the code. It could use a lookup table but shifts would probably
be faster. And this still leaves room for 8 more data types.
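To see that the shift version really is simple, here is a sketch of decoding the 4-bit ttcc field of the proposed alternative layout, following the table above. Sizes are in bytes, a count size of 0 means a scalar, and `decode_ttcc` is a hypothetical name.

```c
#include <assert.h>

/* Decode the low 4 bits (ttcc) of a control byte under the swapped layout.
   Returns -1 for the spare/sentinel code 1111, otherwise 0. */
static int decode_ttcc(unsigned ttcc, unsigned *tag_size, unsigned *count_size) {
	unsigned hi = (ttcc >> 2) & 3;
	unsigned lo = ttcc & 3;

	assert(ttcc < 16);
	if (hi != 3) {
		*tag_size = 1u << hi;     /* 00/01/10: tag of 1/2/4 bytes */
		*count_size = 1u << lo;   /* count of 1/2/4/8 bytes */
		return 0;
	}
	if (lo == 3)
		return -1;                /* 1111: spare (primitive) / sentinel (struct) */
	*tag_size = 1u << lo;         /* 11tt: tag size moves to the low bits */
	*count_size = 0;              /* no count: a single scalar item */
	return 0;
}
```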
I went through a few similar iterations to get to this point. It
has a couple of noteworthy features.
- write streamable
It doesn't need to calculate information it doesn't know
in advance. For example the size of an encoded object. This
was a mess in my initial attempts and sometimes required
multiple recursive passes.
- self describing / read streamable
To be robust to data format changes it needs to be able to
skip over data it doesn't understand. The tag defines the
field so can be used to identify known ones. The data type
and length fields combine together to define the number of
bytes to skip for unknown fields. An unidentified sequence of
structs must be skipped one at a time, but they provide enough
information to do so.
- compact
Well, relatively compact for the features it provides.
Tags and integers only use the significant bytes. The minimum
overhead for scalar values is 2 bytes per field for control+8
bit tag, which will cover almost everything. The minimum
overhead for sequences is 3 bytes (control, tag, count), and
for structures is also 3 bytes (control, tag, sentinel).
Fields all have default values and such values are simply
not encoded into the byte stream.
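The skipping rule from the list above can be sketched as follows, using the original (non-swapped) control byte layout. It assumes little-endian counts; `skip_item` and `skip_struct_body` are hypothetical names. Primitive items are skipped by arithmetic alone (element size times count), while structs have to be walked field by field up to the struct-end code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Skip one item without understanding it. Assumes little-endian counts.
   Returns bytes consumed, or 0 on malformed or truncated input. */

static size_t skip_struct_body(const uint8_t *p, size_t len);

static size_t skip_item(const uint8_t *p, size_t len) {
	size_t pos = 1, count = 1;
	unsigned type, tt, cc;

	if (len < 1 || p[0] == 0xff)
		return 0;                           /* empty, or a struct-end code */
	type = p[0] >> 4;
	tt = (p[0] >> 2) & 3;
	cc = p[0] & 3;
	if (tt == 3 || type == 4 || (type >= 8 && type != 0xf))
		return 0;                           /* reserved codes */
	pos += (size_t)1 << tt;                 /* skip the tag */
	if (cc != 3) {                          /* sequence: read the count */
		size_t n = (size_t)1 << cc;
		if (pos + n > len)
			return 0;
		count = 0;
		for (size_t i = 0; i < n; i++)
			count |= (size_t)p[pos + i] << (8 * i);
		pos += n;
	}
	if (pos > len)
		return 0;
	if (type == 0xf) {                      /* structs: walk each one in turn */
		for (size_t i = 0; i < count; i++) {
			size_t n = skip_struct_body(p + pos, len - pos);
			if (n == 0)
				return 0;
			pos += n;
		}
		return pos;
	}
	size_t bytes = count << (type & 3);     /* element size * count */
	return bytes <= len - pos ? pos + bytes : 0;
}

/* A struct body is a list of fields terminated by the 0xff struct-end code. */
static size_t skip_struct_body(const uint8_t *p, size_t len) {
	size_t pos = 0;

	assert(p != NULL || len == 0);
	while (pos < len && p[pos] != 0xff) {
		size_t n = skip_item(p + pos, len - pos);
		if (n == 0)
			return 0;
		pos += n;
	}
	return pos < len ? pos + 1 : 0;         /* include the 0xff */
}
```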
I dunno I feel it's a bit over-engineered, but I couldn't see a
way to simplify it as I really need that tag. It takes about 2x
the amount of code to implement vs the xdrn implementation
although a lot of that is mapping to the ez_blob descriptor
tables. As it is a self-describing format it may be useful to
have a map or stream based api too, and an implementation of
either would be straightforward.
Internally both use a common robust i/o mechanism which is simple
and reliable. This helps protect against common coding errors
like buffer under/overruns. I may expose this as an api in
itself.
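The i/o layer isn't described further here, so this is only a guess at the general idea: a bounds-checked reader with a sticky error flag, so an overrun can never read past the buffer and the caller only needs to check once after decoding a whole object. `struct reader`, `in_take`, `in_u8` and `in_u16le` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bounds-checked reader. Once a read would run past the end,
   error stays set and every later read fails too. */
struct reader {
	const uint8_t *data;
	size_t size;
	size_t pos;
	int error;
};

static int in_take(struct reader *in, void *dst, size_t n) {
	assert(in->pos <= in->size);
	if (in->error || n > in->size - in->pos) {
		in->error = 1;
		if (dst)
			memset(dst, 0, n);   /* never hand back uninitialised data */
		return -1;
	}
	if (dst)
		memcpy(dst, in->data + in->pos, n);
	in->pos += n;
	return 0;
}

static uint8_t in_u8(struct reader *in) {
	uint8_t v;
	in_take(in, &v, 1);
	return v;
}

static uint16_t in_u16le(struct reader *in) {
	uint8_t b[2];
	in_take(in, b, 2);
	return (uint16_t)(b[0] | b[1] << 8);
}
```

Failed reads return zeros rather than garbage, so decoding can run to completion and the single `error` check decides whether to keep the result.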
I'm pretty useless at writing tests (can't be good at everything!)
but I have tried to write a more comprehensive set of tests here.
Particularly if I'm dumping information into a database I don't
want it breaking.
I could've used an existing design, but well, where's the fun in
that?
More developments
libeze and blogz are now on code.zedzone.au.
I still haven't moved zedzone to use it but I probably will this
week.
I've already prototyped all the code required to move blogz to
using lmdb as an index. Conceptually it's very simple, just 3
tables and a few indexes. But there's a lot of boiler-platish code
to map it to tables and enforce referential integrity across
multiple secondary indices.
Although I haven't implemented it yet the tables support the
required meta-data for versioning, renaming, tagging and so on.
It will also support comments, wiki-like, and book-chapter like
articles. So it will be a stepping stone toward that.
And votes.
I'm thinking about another blob serialiser that uses a tagged
format. This is to support simpler schema upgrades should they be
required. On the other hand it may never be necessary and it
starts to fall into over-engineering, so I need to see how it falls
out once I've gotten there.
A big bloody rant
Ok it's time for another rant. I had a list of things to rant
about and I lost it ... so I created another one. I may remember
on the way.
Children or pearl clutching nannikins may want to look away.
In short, I'm getting pretty sick of junk on the internet, junk in
software development, junk in politics. There's the other crap
like anti-vaxxers and climate change denialists - but well that's
more of a collective fuckup that we'll all have to reckon with in
the end.
Alright, where to begin. Firefox. Fuck firefox. Every time it
upgrades it seems to break some fucking thing. Now the preferences
have been lobotomised for no obvious reason: rather than a click
you have to scroll forever to find what you want. And the default
scroll using a wheel-mouse is so frustratingly slow it's a total
headfuck. Why can't they just include a proper javascript and
image blocker in the base fucking application rather than a buggy
plugin (oh no, extension!) that breaks every upgrade? And then
when you run it it wants to run you through a fucking tutorial
about all the great shit it does. Yeah, I've been using a browser
for 20 years, they're just not that special you know. Pocket.
I don't even know what it is but I know I don't want it. Fancy
url rendering: look, it's just a bit of text, I want to see what it
is, you don't need to hide the http or the www or the com. It
doesn't make it any more usable or more readable. Multi process:
which just means heavy swapping. Javascript is just such a stupid
language the last thing anyone should be doing is encouraging more
of it.
cmake, god what a load of junk. After years of failures I finally
am able to actually compile some projects with cmake out of the
box. But it's still usually fucked and the only solution seems to
be to go edit the usually non-documented application-specific
configuration files.
C++ is such a shit language. Templates are a joke. You
have<to::tell::the::compilerExactlyWhat>youMean.EveryTime(YouUseIt(X)).
And it still uses gobs of memory and runs like a total pig and gives overly verbose and low-information error dumps.
Everyone uses templates because inlining absolutely everything
makes the fastest and most efficient code. Actually, no, it
really doesn't. Over-inlining starves registers (particularly on
the terminally register-starved x86 platform), blows out the L1
cache, and wastes memory and disk space. Sure you can use it
judiciously but by the time you've done that you may as well have
just used something else.
Python. Spaces are important. Oh no, actually the number of
white-space characters is important. I haven't got time to wait
for the molasses of a cursor to
step-one-space-at-a-fuck-ing-time-to-get-to-where-the-action-is.
I saw a comment on reddit yesterday about how python is so cool
because of the indenting. It really isn't. With C or any other
sane language you can type whatever you want and the compiler can
work it out. The bit that python is missing is you can then
automatically format it in various more readable ways that show
exactly how the compiler will treat it. With python you're stuck
with what you see. Add a block? Now you have to indent everything
else and make sure you do it right, rather than just adding a couple of
braces and hitting re-format. Ugh.
This isn't even the real problem with python. I recently had to
try to compile pytorch, a 'lightweight' (ROFLMAO what the fuck?)
artificial neural network engine and the abomination that was once
caffe. The webpage says use anaconda. Ok 4-fucking-gigabytes
later I get something that doesn't work; this version isn't
compiled with the features I need (probably because of fucking
software patents and ffmpeg). So I spent a couple of days trying
to get it to build from source (actually I tried that first but
had to go back to it). Fighting with cmake and whatnot I finally
get it compiled. But then the python doesn't run. Just a
meaningless backtrace (which is pretty much my experience with
anything python). Ok so some package isn't there. Ok so ... one
uses pip or whatever to install it like you might with apt or yum.
Except it isn't really like apt or yum at all. It isn't even like
slackpkg. All it does is download a tar and drop it on the
filesystem. Dependencies? What are they? Versions? Hah, surely
you jest! It's just a bunch of files on the system.
So you run the script again. Again it breaks. Again you install
another package. Then lmdb won't even bloody install because
setup.py is broken. The internets claim this is a pip bug but
then you fuck up and end up with 3 copies of pip installed (/usr,
/usr/local, ~/.local) ... and well you fuck it all off and start
again. Eventually it turns out lmdb is a compile option hidden
somewhere inside the cmake scripts of the fucking custom OpenCV
build you had to compile when you started on this journey a couple
of days ago.
Javascript. I mean I can't even manage to encompass the size of
the fuckup of this conceptually let alone put it into words.
Web page feedback. You go to AMD or wherever to look up some
hardware or software details and before the front page has even
loaded the page goes white and an overlay asks you to provide
feedback for your 'web' experience. Here's some fucking feedback:
/"\
|\./|
| |
| |
|>~<|
| |
/'\| |/'\..
/~\| | | | \
| =[@]= | | \
| | | | | \
| ~ ~ ~ ~ |` )
| /
\ /
\ /
\ _____ /
|--//''`\--|
| (( +==)) |
|--\_|_//--|
Jesus, even that was hard to find: nobody knows what the fuck
ASCII is anymore. It's all windows-cpfuck-you-too or some other
junk.
Highly related is a web-site that keeps asking you to subscribe.
Or keeps giving you 'helpful hints' on how to use it. You know
what, I can read, if it's not obvious it doesn't fucking matter. I
don't need to be stepped in turn through every single feature that
is so unique it's on every other bloody website. I'm sure you're
very proud of getting it to work, well good for you. Now you just
fucked it up by bugging the fuck out of me.
Or bloody achievements. I just don't bother with any forums
anymore but I am on the FSF member forum although it's pretty dead
and I just don't fit in (story of my life). I mean I get a
happy-face-stamp for writing my first post. Ok thanks I guess?
What am I, in grade fucking one? There's enough trouble with the
world infantilising everything, do we really need it on every
bloody forum we visit? I know I know, it drives user engagement
and all the metrics. Well fuck the metrics.
Speaking of metrics, why the fuck can't the yanks get with the
programme? Even when they do use them they can't spell it
properly. It's a bloody metre, a meter is something with a dial.
Fuck the freedom units.
Modern user interfaces. aka the hamburger menu. aka the stack of
shit menu. Yeah so apparently a small row or column of easily read
text is too much for the kool kidz these days and it's all hip 2D
iconography and animations and shit. It's a shit style that will
age poorly. Every time a site or application updates, menus go
missing or get moved somewhere else. Jesus, fucking stick to
something; you don't have to actually change things just for the
sake of it. You know that people will just get used to whatever
you have; the cognitive overload of having to relearn it every time
massively increases the real costs of any 'improvement' you might
make.
Glaring white pages. Well this goes back to the original internet
explorer, M$ windows, M$ word (yes I said it! There's good solid
reasoning behind it!). I regularly spend 12+ hours a day on a
computer, if I was being blasted with that white shit I would be
blinder than I am already. I don't get headaches for a reason.
Constant updates, rolling releases, fluid apis. It's just too
much. Sometimes things are just done and don't need updating
anymore. But if you do that you disappear off the google search
results. You're no longer part of the zeitgeist! Heavens! Can't
have that! No you need a fancy logo and an annual rebranding
`event' and need never maintain anything again, just throw out a
new release when you've got enough new buttons to make it look
like it matters! Gotta rush to the end of that sprint!
If you can't keep up, well you've just been out-evolved by the
equivalent of a telephone sanitiser.
There was more I'm sure but that'll do for now.
Oh hang on, one more rather important one in my notes. Back to
firefox. I normally run firefox which is finely tuned to be
actually usable. I don't use many plugins - disable javascript,
disable fonts and colours and that's about it. But I customise
the fuck out of everything else: no fucking animations of
smooth-this-or-fucking-that, middle-click to open a url, etc etc.
I even have a userContent.css that hides a whole lot of shit on sites
I commonly use (like the banners on stackoverflow and so on). But
I tried a naked profile in order to test out the stylesheets I've
worked on for code and docs. Boy what a fucking joke. All the
reminders about shit I don't want. They even fucked up ctrl-tab
so it goes forward but won't return where you came from (instead it
opens a menu with very different keyboard navigation) - I mean
what's 25 years of convention for anyway?
But the most alarming one was when I went to download my own
fucking sourcecode. Apparently a tar.gz file is now a dangerous
download.
Yeah fuck off mozilla you bunch of stupid cunts.
Now I just feel like shit, this has been really depressing.
Sad face.
blogz 0.3
I did a bit of work on blogz today and pushed out a 0.3 release.
This is not really tested and I'm still not using it on zedzone
but I will be testing it and perhaps do another 0.x release if I
have bugs. But the reason it's out now is that this is the
snapshot base that I will load into git and then work on that from
now on.
If it all works out the next step will be working on using a
database for indices - at the very least it will be some lmdb
thing, but it may go the whole hog and start to look at using crez
for the backend so I can look into some more interesting ideas,
although this is a lot more work. This will also allow me to look
into a comment system eventually.
See the blogz homepage for the
downloads as usual.
L-O-L like anyone gives a flying fuck.
Moar Sauce!
I've now moved my main projects over to using modular Java 11 as
well as using git. I've quite a few projects remaining.
I'm undecided on many of them whether they just go in the bin,
get updated and published (and abandoned) or get worked on
actively again.
- blogz is almost ready.
This is mostly a git setup but I'm not actually
using it yet on this site either, although i've synchronised the
code between them now. Once I do both it'll be there.
- playerz is a mess.
This is in the midst of a large amount of changes and I
lost track of where I was at. It's a fairly interesting
little project though so I will get back to it.
- izlib is unfinished
I think I discussed this years ago but basically it's
unpublished and well far from finished. It's basically an image
processing toolkit and an experimental platform for Java Streams
and API refinement.
I don't really have any plans but I
noticed the other day there's a lot of code just sitting there
rotting away.
- mediaz is archived
mediaz was a project I had on google code which was sort of
a predecessor to izlib as well as a collection of early JavaFX
experiments which are more or less tutorials for others. And
the basic start of a layered bitmap image editor (imagez).
It was about the only project anyone asked about when google
code went down. I have the repository archived so I guess I
could at least convert it.
- termz is unfinished
This was a little fun project using OpenCL to drive a
terminal emulator display. It's pretty much pointless and could
be done directly with OpenGL.
- cdez is rotting
I actually have a C version of dez 1.2 too. It could be
updated and published or something.
- rez and crez are complicated
rez is a project that implements a `personal' versioned
blob store. It supports free branches and cheap copies
(iirc), renames, metadata and all that jazz, and uses dez for
compact storage. Berkeley DB (JE) is used as an embedded
database.
This is a project i've worked on and off for years (15?+) with
separate C and Java implementations. The original driver was
for a web CMS. The last time I worked on it was a year ago
porting it to C (including a C version of dez), for this very
blog - although in the end I stayed with the simple blogz.
It's explicitly not `enterprise' oriented on purpose.
There's probably plenty of alternatives around now and I don't
really know what to do with it. I never quite got the API
down to the point I was really happy with. The C version
should probably use lmdb now.
What I have done actually works though so I should probably
drop it out somewhere. The problems aren't all that
complicated but I think I solved them in a fairly tidy and
compact and reusable way.
- libeze - 2.0
I just need to drop this into git as is.
This was also in the midst of a large number of changes but I
kept them out for the last release. But I have a good number
of bits and pieces I can add to it. playerz is the main
driver here, but that doesn't need much.
- SynthZ - the popular one
I'm not doing anything with this, but it has been
downloaded fairly often. Maybe I will git it.
I did learn some more about SourceDataLine so the soft
keyboard is real-time now!
- wanki is who the hell knows
This was a wiki engine using texinfo markup and with the
ability to properly organise and export multi-page documents. It's
always been pre-alpha and gone through a number of variants, from
C to JavaEE.
It was the original driver for what became (c)rez.
Who knows, maybe one day, I still think it's got some merit.
Wikis still have major trouble with multi-page documents
ordered like a book.
- DuskZ is unfinished
An attempt at working on a simple game. I got caught up in
pointless (de)serialisation stuff and other sort of unimportant
details. I'm sorry to the lad that I chatted to about it. He is
a nice fellow. But it was sort of at a bad time personally and I
just don't have the interest to work on it anymore. I hate letting people down.
I don't know if there's really anything salvageable at this
point as all I did was break a bunch of stuff.
- socles is archived
This was an OpenCL image processing library. Nobody cared
and eventually neither did I. There might be something salvageable
for an eventual izlib backend, maybe.
- low level arm code, puppybits, zedos
puppybitz is somewhat stale but maybe there's some stuff useful there.
zedos stopped when I hit the USB driver. Fuck intel for making that junk.
- parallella is DEAD TO ME
The parallella stuff I'm no longer working on and it will
be staying where it is (at least it's published).
I still have some boards but the whole thing soured me on
kickstarter and I will never put money into anything of the
sort again. It seems it's mostly used for this sort of
project as a 'let's get some zero-cost bridging finance so our
real investors have more confidence' which is pretty much
fucked-up capitalism at its finest.
Capital should be the one taking risks to invest in capital to make
more of it. This instead is, well, get the plebs to give us
free money and take all the risk for no return!
- Android apps are going nowhere
Again the source is already out, it's unlikely I will do
more but whether I do will be on a case-by-case basis.
I'm sure there is more - that's just what I found from my archive
of google code and a quick look at what I have sitting on THIS
computer. I've got drives and backups elsewhere, who knows.
Probably if anything I should start checkpointing more often and
dumping shit on code.zedzone.au. Until I had that I didn't
really have anywhere to put the random otherwise not really
publishable-in-themselves experiments which abound.
It will be a while before this list is fully processed.
dez 2.0 for DEZ1 is here!
I just released dez 2.0 for the DEZ1 binary delta format.
Consider this a 1.0 release! The format is now frozen.
It was just going to be 1.3 but I noticed that 1.2 came out 4
years ago so I thought 2.0 was in order. I did prepare 1.3 in the
repository but then I just tagged it again with 2.0.
There aren't really many tunables left so I dropped the flags
field and made the OP_SINGLE_SPLIT value tunable. It doesn't make
much difference anyway.
This is now a modular Java 11 project using my common java.make.
I'm also using it to work on adding junit(4) testing to the
java.make system. I may split out java.make into jni.make,
java.make, junit.make, and so on.
tuning it so haven't rolled it back into the base yet.
As usual the Delta-Z home page has download links and other
snot. code.zedzone.au houses a browsable copy of the repository,
which now uses git.
Update: I decided to update the javadocs, as they are.
The interesting page is the DEZFormat class which defines and
describes the format.
These initial links may not be long-term stable as I will possibly
look at a unified javadoc build across all my java projects. And
I really need to do something about the blinding default
stylesheet. My eyes! My EYES!
Update: I couldn't put up with it so I fixed the
stylesheet. This time I used some Workbench 2.0 colours but
without the 3D borders. Came out quite nice. I filled out the
api documents somewhat as well.
Source Code Browser
I've just installed a source code browser
on code.zedzone.au
and re-added Code back to the main menu of most pages.
I'm using a slightly hacked up version of gitweb. I needed to add
a bit more structure and style stuff to get the look I wanted and
disable some junk I didn't need. Some more Commodore
colourations for a bit of fun.
This is still work in progress and I'm still trying to decide how
I'll manage the repositories but hey it works. I'll be slowly
migrating all my projects over to it over time. Any java projects
will become modular projects first. I'm not bothering to keep any
history because I don't need it.
libeze 2.0, 1.1
I had a few fixes I did ages ago I thought I should push out.
Mainly some stupid list errors.
I also redid the tree api such that a version bump was required.
The changes are sort of either/or and add an extra argument to
several functions but it was for some consistency with the list api
(no allocations, static init) and I found some interesting uses for
changing the comparison function at search time while working on
playerz.
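As a hypothetical illustration of why passing the comparison function at search time is handy (these are not the libeze signatures): an intrusive, statically initialised tree where the same nodes can be searched by exact name or by name prefix, as long as each comparator agrees with the ordering the tree was built with. `tree_insert`, `tree_find`, `cmp_name` and `cmp_prefix` are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical intrusive binary search tree: nodes live inside the caller's
   structs (no allocations), the tree is statically initialised, and the
   comparison function is supplied per call rather than stored in the tree. */
struct tnode { struct tnode *left, *right; };
struct tree { struct tnode *root; };
#define TREE_INIT { NULL }

typedef int (*tcmp_fn)(const void *key, const struct tnode *n);

/* Insert n, whose key is key; unbalanced for brevity. */
static void tree_insert(struct tree *t, struct tnode *n, const void *key, tcmp_fn cmp) {
	struct tnode **link = &t->root;
	while (*link)
		link = cmp(key, *link) < 0 ? &(*link)->left : &(*link)->right;
	n->left = n->right = NULL;
	*link = n;
}

static struct tnode *tree_find(struct tree *t, const void *key, tcmp_fn cmp) {
	struct tnode *n = t->root;

	assert(cmp != NULL);
	while (n) {
		int c = cmp(key, n);
		if (c == 0)
			return n;
		n = c < 0 ? n->left : n->right;
	}
	return NULL;
}

/* Example payload: a track with the tree node embedded inside it. */
struct track { struct tnode node; const char *name; };
#define TRACK(n) ((const struct track *)((const char *)(n) - offsetof(struct track, node)))

/* Exact name comparison: the ordering the tree is built with. */
static int cmp_name(const void *key, const struct tnode *n) {
	return strcmp(key, TRACK(n)->name);
}

/* Same ordering, but the key matches any name it is a prefix of. */
static int cmp_prefix(const void *key, const struct tnode *n) {
	return strncmp(key, TRACK(n)->name, strlen(key));
}
```

The prefix search works because comparing only the first `strlen(key)` characters never contradicts the full `strcmp` order, so the descent still heads toward any matching node.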
The software is on the libeze
page as usual. I also added a CHANGES file with the release
notes/changelog.
Weeks ago I also did a lot of other experiments and played with
ideas but they never fully formed. Support for more sophisticated
serialisation, elf stuff, and lots of experiments with memory and
pool allocators. Just can't make a decision on those and I forgot
where they were at so who knows.
I haven't done it for libeze yet but I've started moving my
projects to using git. I don't really like it (the commands are
often obtuse and the documentation isn't user oriented) but CVS
support is starting to wane quite markedly in modern tools. Sigh.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved.
Powered by gcc & me!