About Me
Michael Zucchi
B.E. (Comp. Sys. Eng.)
also known as Zed
to his mates & enemies!
< notzed at gmail >
< fosstodon.org/@notzed >
github and m$
I only had a couple of long-abandoned projects on github but now
i've deleted my account. I don't see the immediate reason why m$
would want to buy it but it can't be for a good one for anyone else.
I wonder if they'd have bought it if git had the same meaning in
american as it does in english - i.e. bastard, fuckwit, etc.
But anyway I guess it's just as well I didn't move anything there
when google code shut down, saves me the hassle of doing it again.
Backend stuff
Winter has hit here and along with insomnia i'm not really feeling
like doing much of an evening but i've dabbled a few times and
basically ported the Java version of a tree-revision database to
C.
At this point i've just got the core done -
schema/bindings and most of the client api. I'm pretty sure it's
solid but I need to write a lot of testing and validation code to
make sure it will be reliable and performant enough, and then
write a bunch more to turn it into something interesting.
But i've been at a desk for 10 hours straight and my feet are icy
cold so it's not happening tonight.
Evolution and S/MIME
So I noticed there was an S/MIME security fault in a bunch of email
software - including Evolution.
Now my memory is a bit faded because it was 15+ years ago but I'm
pretty sure we wrote the code to handle this case (mostly Larry
and Jeff). For this each decoded segment was displayed separately
with a special gtkhtml tag to reset the html parser between
blocks. It might have only been at the signature level, so I could
be wrong, but in general it didn't just dump the whole email to
HTML, for all sorts of reasons. The MIME parser could handle all
sorts of broken streams, so truncated HTML was expected to come up
once in a while.
Of course that must've all been thrown away when the renderer was
replaced by the 'better' renderer from apple going by some of the
reports of the 'vulnerability'.
Not that i've ever used S/MIME or gpg - it's pretty much useless
to me since nobody I know knows how to use it and hardly anyone
uses email these days anyway.
I was also horrified to see that evolution now uses cmake. Well
just as well I completely ignored the project after I took a
voluntary redundancy ... I would've gone absolutely ballistic!
Not that compiling with libtool didn't suck complete arse but at
least it worked.
But GNOME was already going to shit back before I quit, both due
to redhat throwing their weight around and Miguel being such an
obnoxiously microsoft fanboi. Haven't touched it in any
meaningful way (or Evolution) in over a decade and all I see of it
is going backwards by continuously copying the next shitty
GUI-trend-of-the-month and/or being bullied into shitty designs by
a bunch of fuckwits.
Oops
Had a bug in my fastcgi code that broke the blog for some web
clients, depending on their ID string. It just happened to break
on mobile phones more often. Oops.
King PUSS
Some photos of the cat.
He's a bit of a pretty-boy but he's smarter than he looks.
Ostensibly his name is Cooper (as in Cooper's Original Pale Ale).
But I just call him cat.
c dez port
I had a couple of hours to burn Sunday morning so I ported over
the rest of the dez code to C, although I didn't feel like testing
it till I had some hours to burn today.
Anyway, I fixed some bugs and ran some tests. It's only about
30-50% faster than the Java version on the bible test for
practical "limit" values. The patches generated aren't
necessarily identical because of some minor changes in the hash
table design but the differences are minor. The C code also
requires some more bounds and error checking for robustness.
I also added CRC32 checksums to the file format as a quick check
that the input and output aren't corrupted.
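For illustration, a CRC32 of the kind used for such checksums can be
computed bitwise with the standard IEEE polynomial; this is just a
generic sketch of the algorithm, not the actual cdez file format code.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC32 (IEEE 802.3, reflected polynomial 0xEDB88320).
 * A table-driven version is faster but this shows the idea;
 * pass 0 as the initial crc and chain calls for streamed data. */
static uint32_t crc32_calc(uint32_t crc, const uint8_t *buf, size_t len)
{
	crc = ~crc;
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}
```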
cdez + other stuff
I started porting dez to C to look
at using it here somewhere. Along the way I found a bug in the
matcher implementation but otherwise got very distracted trying to
gain a few negligible percent out of the delta sizes by manipulating
the address encoding mechanism.
I tried modifying the matcher in various ways - experimenting with
the hash table details. These involved including the hash value
(i.e. to reduce spurious string matching - it just slows it down) or
using a separate index table (no real difference). Probably the
most surprising result was that the performance was already somewhat
better than covered in the dez benchmarks. Both considerably faster
processing and smaller generated deltas. I guess that must have
been an earlier implementation and I need to update them. For
example the bible compression test only takes 11 seconds and creates
a 1 566 019 byte delta - or 65% of the runtime at 90% of
the output size.
This inspired me to play with the chain limit
tunable - which sets how deep the hashtable chain gets before it
starts to throw away older values. Using a setting of 5 (i.e. a
depth of 32) it just beats the previous published results but in only
0.7s - still somewhat slower than the 0.1s for gzip but at least it's
not out of the range of practicality. This is where I found the
bug in the entry discard indexing which was an easy fix.
This does mean that the other timings I did are pretty much
pointless though - using a block search size larger than 1 just
produces much worse results and it's still slower. I haven't
tried with a large source input string however, where a chain limit
will truncate the search space prematurely.
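For the curious, one way such a chain limit can work is to treat each
hash bucket as a small ring buffer, so new entries silently overwrite
the oldest once the depth is exceeded. This is only a sketch of the
idea - the names and layout here are mine, not dez's actual code.

```c
#include <stdint.h>

#define CHAIN_BITS  5
#define CHAIN_DEPTH (1u << CHAIN_BITS)	/* limit 5 = 32 entries deep */

struct chaintab {
	uint32_t nbuckets;	/* power of two */
	uint32_t *count;	/* total insertions per bucket */
	uint32_t *slot;		/* nbuckets * CHAIN_DEPTH source positions */
};

/* Record a source position for this hash.  Once a bucket has seen
 * CHAIN_DEPTH entries the oldest one is simply overwritten. */
static void chain_add(struct chaintab *t, uint32_t hash, uint32_t pos)
{
	uint32_t b = hash & (t->nbuckets - 1);

	t->slot[b * CHAIN_DEPTH + (t->count[b]++ & (CHAIN_DEPTH - 1))] = pos;
}
```

A lookup then only ever scans up to CHAIN_DEPTH slots, which is what
bounds the search time no matter how common a given hash value is.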
Then I spent way too much time and effort trying various address
encoding mechanisms to try to squeeze a little bit more out of the
algorithm. In the end, although I managed about a 2.5% improvement
in the best cases, I doubt it's really worth worrying
about. However some of the alternative address encoding schemes are
conceptually and mechanically simpler so I might use one of them
(and break the file format).
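As an example of the 'mechanically simpler' end of the spectrum, a
plain LEB128-style varint can encode an address with no windowing or
mode logic at all - to be clear, this is a generic sketch and not any
of the encodings dez actually uses.

```c
#include <stdint.h>
#include <stddef.h>

/* LEB128-style varint: 7 bits of value per byte, high bit set on
 * all but the final byte.  Small addresses cost one byte. */
static size_t varint_encode(uint8_t *out, uint32_t v)
{
	size_t n = 0;

	while (v >= 0x80) {
		out[n++] = (uint8_t)(v | 0x80);
		v >>= 7;
	}
	out[n++] = (uint8_t)v;
	return n;
}

static size_t varint_decode(const uint8_t *in, uint32_t *v)
{
	size_t n = 0;
	uint32_t r = 0;
	int shift = 0;
	uint8_t b;

	do {
		b = in[n++];
		r |= (uint32_t)(b & 0x7f) << shift;
		shift += 7;
	} while (b & 0x80);
	*v = r;
	return n;
}
```

The trade-off is that it has no memory of previous addresses, so
schemes that exploit address locality can beat it on size.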
Because of all that faffing about I never really got very far with
the cdez conversion although I have the substring matcher
basically done which is the more complex part. The
encoding/decoding code is quite involved but otherwise
straightforward bit bashing.
Update: I tried a different test - one where I simulated the
total delta size of encoding 180 revisions of jjmpeg development -
not a particularly active project but still a real one. The
original encoding is easily the best in this case.
bloggone
For some reason the blog went offline for a few hours. It kept
getting segfaults in libc somewhere. All I did to fix it was
run make install
(which simply copied the binary into
the cgi directory and didn't rebuild anything) and it started
working again. Unfortunately I didn't think to preserve the binary
that was there to find out why it stopped working.
Something to keep an eye on anyway.
BDB | !BDB?
I mentioned a few posts ago that there don't seem to be many
NoSQL databases around anymore - at least last time I looked a
year or two ago, all the buzz from a decade ago had gone away.
Various libraries became proprietary-commercial or got abandoned.
For some reason I can't remember I went looking for BerkeleyDB
alternatives and
hit this
stackoverflow question which points to some of them.
So I guess I was a little mistaken, there are still a few around,
but not all are appropriate for what I want it for:
- Unstructured ones are a pain to use;
- Many don't do full ACID;
- Most don't handle multi-process concurrency; or
- Some are written in exotic languages I'm not interested in having
a dependency on.
I guess the best of those is LMDB - I'd come across it whilst
using Caffe but never looked into it. Given its roots in
replacing BDB it has enough similarities in API and features to be
a good match for what I want (and written in a sane language)
although a couple of niggles exist such as the lack of sequences
and all the fixed-sized structures (and database size). Being a
part of a specific project (OpenLDAP) means it's hit maturity
without features that might be useful elsewhere.
The multi-version concurrency control and so on is pretty neat
anyway. No transaction logs is a good thing. If I ever get time
I might play with those ideas a little in Java - not because I
necessarily think it's a great idea but just to see if it's
possible. I played with an extensible hash thing for indexing in
camel many years ago but it was plagued by durability problems.
Back to LMDB - I'll definitely give it a go for my revisioned
database thing - at some point.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved.
Powered by gcc & me!