About Me
Michael Zucchi
B.E. (Comp. Sys. Eng.)
also known as Zed
to his mates & enemies!
< notzed at gmail >
< fosstodon.org/@notzed >
c dez port
I had a couple of hours to burn Sunday morning so I ported over
the rest of the dez code to C, although I didn't get around to
testing it until today.
Anyway, I fixed some bugs and ran some tests. It's only about
30-50% faster than the Java version on the bible test for
practical "limit" values. The patches generated aren't
necessarily identical due to some small changes in the hash
table design, but the differences are minor. The C code still
needs some more bounds and error checking for robustness.
I also added CRC32 checksums to the file format as a quick check
that the input and output aren't corrupted.
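For what it's worth the checksum itself is a one-liner with zlib,
which is roughly all the check amounts to (hypothetical function,
not the actual dez code):
#include <stddef.h>
#include <zlib.h>

/* checksum a whole buffer; zlib wants the running crc
   seeded via crc32(0L, Z_NULL, 0) */
static unsigned long buffer_crc32(const unsigned char *data, size_t size) {
	unsigned long crc = crc32(0L, Z_NULL, 0);

	return crc32(crc, data, (uInt)size);
}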
cdez + other stuff
I started porting dez to C to look
at using it here somewhere. Along the way I found a bug in the
matcher implementation but otherwise got very distracted trying to
gain a few negligible percent out of the delta sizes by manipulating
the address encoding mechanism.
I tried modifying the matcher in various ways - experimenting with
the hash table details. These included storing the hash value in
each entry (to reduce spurious string comparisons - it just slows
things down) and using a separate index table (no real difference).
Probably the most surprising result was that the performance was
already somewhat better than covered in the dez benchmarks - both
considerably faster processing and smaller generated deltas. I
guess those figures must have come from an earlier implementation
and I need to update them. For example the bible compression test
only takes 11 seconds and creates a 1 566 019 byte delta - that is,
65% of the runtime at 90% of the output size.
This inspired me to play with the chain limit
tunable - which sets how deep the hashtable chain gets before it
starts to throw away older values. Using a setting of 5 (a depth
of 32) it just beats the previously published results but in only
0.7s - still somewhat slower than 0.1s for gzip but at least it's
not out of the range of practicality. This is where I found the
bug in the entry discard indexing, which was an easy fix.
This does mean that the other timings I did are pretty much
pointless though - using a block search size larger than 1 just
produces much worse results and it's still slower. I haven't
tried with a large source input string however, where a chain limit
will truncate the search space prematurely.
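To illustrate the idea (a minimal sketch only - the actual dez
structure differs): each hash bucket holds a small ring of recent
source positions, and insertions past the limit silently overwrite
the oldest entry.
#define CHAIN_BITS  5
#define CHAIN_DEPTH (1 << CHAIN_BITS)	/* a 'setting of 5' = 32 deep */

struct bucket {
	unsigned int count;		/* total insertions so far */
	unsigned int pos[CHAIN_DEPTH];	/* ring of source positions */
};

/* inserting past the limit throws away the oldest value */
static void bucket_insert(struct bucket *b, unsigned int position) {
	b->pos[b->count++ & (CHAIN_DEPTH - 1)] = position;
}

/* match candidates are visited newest-first, at most CHAIN_DEPTH of them */
static void bucket_scan(const struct bucket *b, void (*check)(unsigned int)) {
	unsigned int n = b->count < CHAIN_DEPTH ? b->count : CHAIN_DEPTH;

	for (unsigned int i = 0; i < n; i++)
		check(b->pos[(b->count - 1 - i) & (CHAIN_DEPTH - 1)]);
}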
Then I spent way too much time and effort trying various address
encoding mechanisms to try to squeeze a little bit more out of the
algorithm. In the end, although I managed a best-case improvement
of about 2.5% in some cases, I doubt it's really worth worrying
about. However some of the alternative address encoding schemes are
conceptually and mechanically simpler so I might use one of them
(and break the file format).
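To give a flavour of what I mean by simpler - a plain 7-bit
variable-length integer is about as simple as it gets mechanically
(illustrative only, not necessarily what the new format will use):
/* encode 7 bits at a time, the high bit flags a continuation */
static unsigned char *encode_varint(unsigned char *out, unsigned int value) {
	while (value >= 0x80) {
		*out++ = (value & 0x7f) | 0x80;
		value >>= 7;
	}
	*out++ = value;
	return out;
}

static const unsigned char *decode_varint(const unsigned char *in, unsigned int *value) {
	unsigned int v = 0, shift = 0;
	unsigned char b;

	do {
		b = *in++;
		v |= (unsigned int)(b & 0x7f) << shift;
		shift += 7;
	} while (b & 0x80);
	*value = v;
	return in;
}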
Because of all that faffing about I never really got very far with
the cdez conversion, although I have the substring matcher
basically done, which is the more complex part. The
encoding/decoding code is quite involved but otherwise
straightforward bit bashing.
Update: I tried a different test - one where I simulated the
total delta size of encoding 180 revisions of jjmpeg development -
not a particularly active project but still a real one. The
original encoding is easily the best in this case.
bloggone
For some reason the blog went offline for a few hours. It kept
getting segfaults in libc somewhere. All I did to fix it was
run make install
(which simply copied the binary into
the cgi directory and didn't rebuild anything) and it started
working again. Unfortunately I didn't think to preserve the binary
that was there to find out why it stopped working.
Something to keep an eye on anyway.
BDB | !BDB?
I mentioned a few posts ago that there don't seem to be many
NoSQL databases around anymore - at least when I last looked a
year or two ago, all the buzz from a decade ago had died away.
Various libraries became proprietary-commercial or were abandoned.
For some reason I can't remember, I went looking for BerkeleyDB
alternatives and
hit this
stackoverflow question, which points to some of them.
So I guess I was a little mistaken, there are still a few around,
but not all are appropriate for what I want it for:
- unstructured ones are a pain to use;
- many don't do full ACID;
- most don't handle multi-process concurrency; or
- they're written in exotic languages I'm not interested in having
a dependency on.
I guess the best of those is LMDB - I'd come across it whilst
using Caffe but never looked into it. Given its roots in
replacing BDB it has enough similarities in API and features to be
a good match for what I want (and it's written in a sane language),
although a couple of niggles exist such as the lack of sequences
and all the fixed-sized structures (and the fixed database size).
Being part of a specific project (OpenLDAP) means it's hit maturity
without accruing features that might be useful elsewhere.
The multi-version concurrency control and so on is pretty neat
anyway, and not needing transaction logs is a good thing. If I
ever get time I might play with those ideas a little in Java - not
because I necessarily think it's a great idea but just to see if
it's possible. I played with an extensible hash thing for indexing
in camel many years ago but it was plagued by durability problems.
Back to LMDB - I'll definitely give it a go for my revisioned
database thing - at some point.
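For reference, the core of the LMDB C API is pleasantly terse - a
put under a write transaction goes something like this (paths and
values made up, error checking omitted for brevity):
#include <lmdb.h>

int put_example(void) {
	MDB_env *env;
	MDB_txn *txn;
	MDB_dbi dbi;
	MDB_val key = { 5, "hello" };
	MDB_val val = { 5, "world" };

	/* the environment is a memory-mapped directory of data + lock files */
	mdb_env_create(&env);
	mdb_env_set_mapsize(env, 1UL << 30);	/* the fixed database size */
	mdb_env_open(env, "/tmp/testdb", 0, 0664);

	/* all access goes through (MVCC) transactions */
	mdb_txn_begin(env, NULL, 0, &txn);
	mdb_dbi_open(txn, NULL, 0, &dbi);
	mdb_put(txn, dbi, &key, &val, 0);
	mdb_txn_commit(txn);

	mdb_env_close(env);
	return 0;
}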
https, TLS upgrade
Ahah, so it seems things have changed a bit since last I looked
into certificates and certificate authorities - and even then I
was looking into code and email signing certs anyway.
After a short poke around I quickly became aware of the
Let's Encrypt project which
provides automated and free server domain certificates. It can be
automated because you control the server and part of the issuing
process creates temporary server resources that the signer can
cross-check. And all the certs are created locally.
So after a bit of fudging around with
the C-based acme
client and some apache config I got it all turned on, with
(compatible) browsers automagically redirecting to the
TLS-protected URL.
Yay.
I didn't want to go with the official CertBot because python isn't
otherwise installed on this server and I didn't want to drag all
that snot in for no other reason.
Because the acme-client is a little out of date I had to pass it a
few extra parameters to make it create certificates (and make some
small porting-related changes so it used libressl rather than
openssl).
acme-client \
-ahttps://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf \
-C/var/zedzone/acme \
-vNn \
zedzone.au www.zedzone.au code.zedzone.au
Once created, a daily cron job runs it (without the -vNn options),
which requests new certificates if the old ones are within a month
of their expiry date (since the Let's Encrypt certificates only
last for 90 days).
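The crontab entry itself is nothing special - something along
these lines (illustrative; the exact path and reload command depend
on the local setup):
0 3 * * * /usr/local/bin/acme-client -C/var/zedzone/acme zedzone.au www.zedzone.au code.zedzone.au && apachectl graceful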
I then added a https server config:
<VirtualHost www.zedzone.au:443>
ServerName www.zedzone.au
...
SSLEngine on
SSLCertificateFile /etc/ssl/acme/cert.pem
SSLCertificateKeyFile /etc/ssl/acme/private/privkey.pem
SSLCertificateChainFile /etc/ssl/acme/fullchain.pem
SSLUseStapling on
Header always set Strict-Transport-Security "max-age=31536000"
Header always set Content-Security-Policy upgrade-insecure-requests
</VirtualHost>
And finally I added another header to the plain-http server which
tells compatible clients to upgrade to https. This can be a bit
odd on the first access but thereafter it does the right thing. I
hope!
<VirtualHost www.zedzone.au:80>
ServerName www.zedzone.au
...
Header always set Content-Security-Policy upgrade-insecure-requests
</VirtualHost>
I didn't want to use a rewrite rule because at the moment I want
to keep both URLs active, but I might change that in the future.
It seems like it might be useful - on the other hand any client
anyone is likely to use will support TLS, won't it?
I've left code.zedzone.au unencrypted for now (even
though it's currently the only part of the site that can be logged
into!) because I need to check things work with virtual
servers on https first and, more importantly, I'm too hungover to
care this fine yet overcast afternoon!
Update: For what it's worth, the server gets an A+ rating
on ssllabs
SSL Server Test at the time of posting, although getting the
score above B required a few mod_ssl config changes.
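I won't reproduce the exact config, but it was the sort of thing
below - restricting protocols and ciphers and defining the stapling
cache that SSLUseStapling requires at the server level (treat the
values as illustrative rather than current best practice):
SSLProtocol all -SSLv3 -TLSv1 -TLSv1.1
SSLHonorCipherOrder on
SSLCipherSuite HIGH:!aNULL:!MD5
SSLStaplingCache shmcb:/var/run/ssl_stapling(32768)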
Rabbit Holes All The Way Down
I kept poking around the blog code over the last couple of days.
It just keeps leading to more and more questions.
DBD
Tuesday I mostly spent brushing up on the C API for Berkeley DB
and designing the schema to implement my version database using
it. At some point since I last looked, foreign key constraints
must have been added, so I implemented that - unfortunately unlike
JE they don't support self-referential keys (where a field
references the primary key of the same object) so I will have to
code up a couple of cases for that manually. Actually I'm not
sure I even need fully indexed key constraints as the database is
designed never to have deletions. If I ever get that far I'll do
some benchmarking to evaluate the tradeoffs, or decide how to do
deletions.
During the journey I also discovered that at some point Berkeley
DB JE changed licenses again - it had been AGPL3 last time I
looked, now it's changed to Apache. I wonder if this is another
project soon to be abandoned to the ASF? Anyway it doesn't make
much difference to my Free Software projects (not that I ever got
far enough to publish any) but it'll be handy for work as I've
wanted to use it plenty of times. It's about the only decent NoSQL
DB left these days.
Uploading JavaScript
I pretty much detest JavaScript but I wanted to look at how to
write some sort of web-based editor for writing posts and I don't
really feel like writing yet another MIME parser to handle
multipart/form-data. Well I probably will have to eventually (or
more likely dig one of the few I've already written back up) but in
the mean-time I investigated direct uploads using AJAX.
Most results from searching turn up jQuery snot but I eventually
found some raw JavaScript using XMLHttpRequest directly. Given
it's only a few lines of code one has to wonder about these
'frameworks'. I digress. I played around a bit, extended my
FastCGI library to support streaming stdin and wrote a basic
REST-like `uploader' that can handle binary blobs directly without
any messy protocol parsing. Yay. And then I fell down another
hole ... how the fuck am I going to do security?
I don't really want to buy an SSL cert for this site but using a
self-signed certificate isn't really any good. Without that,
pretty much any auth system is wildly insecure. I started looking
into JavaScript libraries for crypto - some are a little over the
top but there are a few smaller ones that might serve the purpose.
Crypto has a lot of gotchas and one can't be an expert in
everything, so I'm not sure I want to start down what would be a
very long and winding road just to post to a website.
So I'm toying with a few ideas. First, just do nothing and stick
to ssh and emacs for posting; if I ever bother with comments or
feedback they can be anonymous and not require auth. Or instead
of using JavaScript, write a standalone Java editor / operator
console that calls REST services - or even use an ssh-driven
backend. This has some appeal personally but I'll see. Another is
to use SSL + Digest Auth - that way I let the browsers and server
handle all the complexity and get a mostly ok system. If I
install my own CA on my local browser(s) and enforce client
certificates from the server side, it should be reasonably secure.
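Enforcing client certificates is only a few mod_ssl directives on
the server side - roughly the following, assuming a home-made CA
at the path shown (illustrative only):
SSLVerifyClient require
SSLVerifyDepth 1
SSLCACertificateFile /etc/ssl/my-private-ca.pem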
Damn windy road already.
I need a real rest
My sleep has been particularly bad of late. The sleep apnoea is
quite bad and I regularly (mostly) forget to wear the mouth splint
which
doesn't-treat-it-particularly-well-but-it's-better-than-nothing.
At least I remembered last night.
Today I gotta try and do some hours for work though. At the moment
I'm trying to decipher some statistical software written in
matlab, which is about my most favouritist thing in the whole
world. Fuck matlab.
Oh, I also bought some mice. I've got a couple of small 'travel
mouse' mice that I much prefer to the standard fare and although
they used to be easy to find they've become quite scarce around
here. Whatever happened to BenQ anyway? All the local retailers
only have Microsoft or Logitech or their own badged Chinese crap
now. Cordless also seems to have taken over (higher margins one
suspects). I looked everywhere locally and on the usual suspects
online but couldn't find anything decent. Oddly enough
the ThinkPad
one I already have was one of the cheapest, and from the source, so I
ordered a couple to tide me over for the foreseeable future. On a
whim I also added
a wireless
'laser' one as well, although it's marginally larger.
FastCGI Enabled
Further to the previous post, I did end up porting my blog driver
to my fastcgi implementation.
Benchmarking using `ab' from home, it doesn't really make any
difference reading the front page of the blog - if anything it's
actually marginally slower.
Running the benchmark locally though things are quite different.
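For the record, the invocation was something like the following
(local URL made up for the example):
ab -n 1000 -c 1 http://localhost/blog/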
Previous standard cgi:
Concurrency Level: 1
Time taken for tests: 13.651 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 37100000 bytes
HTML transferred: 36946000 bytes
Requests per second: 73.25 [#/sec] (mean)
Time per request: 13.651 [ms] (mean)
Time per request: 13.651 [ms] (mean, across all concurrent requests)
Transfer rate: 2654.05 [Kbytes/sec] received
Using fcgi:
Concurrency Level: 1
Time taken for tests: 0.706 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 37062000 bytes
HTML transferred: 36908000 bytes
Requests per second: 1416.39 [#/sec] (mean)
Time per request: 0.706 [ms] (mean)
Time per request: 0.706 [ms] (mean, across all concurrent requests)
Transfer rate: 51264.00 [Kbytes/sec] received
So yeah, only 20x faster. If I up the concurrency level of the
benchmark it gets better but it's hard to tell much from it since
everything is running on the same machine.
Regardless, I made it live.
FastCGI experiments
It's not particularly important - I'm lucky to get more than
one non-bot hit in a given day - but I thought I'd have a look
into FastCGI. If in the future I do use a database backend or
even a Java one it should be an easy way to get some performance
while leveraging the simplicity of CGI and leaving the protocol
stuff to apache.
After a bit of background reading and looking into some 'simple'
implementations I decided to just roll my own. The 'official'
fastcgi.com site is no longer live so I didn't think it worth
playing with the official SDK. The way it handled stdio just
seemed a little odd as well.
With the use of a few GNU libc extensions for stdio (cookie
streams) and memory (obstacks) I put together enough of a partial
(but robust) implementation to serve output-only pages from the
fcgid module in a few hundred lines of code.
This is the public API for it.
struct fcgi_param {
	char *name;
	char *value;
};

struct fcgi {
	// Active during cgi request
	FILE *stdout;
	FILE *stderr;

	// Current request info
	unsigned char rid1, rid0;
	unsigned char flags;
	unsigned char role;

	// Current request params (environment)
	size_t param_length;
	size_t param_size;
	struct fcgi_param *param;
	struct obstack param_stack;

	// Internal buffer stuff
	int fd;
	size_t pos;
	size_t limit;
	size_t buffer_size;
	unsigned char *buffer;
};
typedef int (*fcgi_callback_t)(struct fcgi *, void *);
struct fcgi *fcgi_alloc(void);
void fcgi_free(struct fcgi *cgi);
int fcgi_accept_all(struct fcgi *cgi, fcgi_callback_t cb, void *data);
char *fcgi_getenv(struct fcgi *cgi, const char *name);
I didn't bother to implement concurrent requests, the various
access control roles, or STDIN messages. The first doesn't appear
to be used by mod_fcgid (it handles concurrency itself) and I don't
need the rest (yet at least). As previously stated I used GNU
libc extensions to implement custom stdio streams for stdout and
stderr, although I used a custom 'zero-copy' buffer implementation
for the protocol handling (wherein the calls can access the
internal buffer address rather than having to copy data around).
Converting a CGI program is a little more involved than using the
original SDK because it doesn't hide the i/o behind macros or use
global variables to pass information. Instead, via a
context-specific handle, it provides stdio compatible FILE handles
and a separate environment variable lookup function. Of course
it is possible to write a handler callback which can implement
such a solution.
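The lookup itself is just a linear scan over the request
parameters - this sketch is the idea, if not the exact code:
#include <string.h>

char *fcgi_getenv(struct fcgi *cgi, const char *name) {
	/* requests carry few enough params that a linear scan is fine */
	for (size_t i = 0; i < cgi->param_length; i++)
		if (strcmp(cgi->param[i].name, name) == 0)
			return cgi->param[i].value;
	return NULL;
}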
The main function of the FastCGI program just allocates the
context, calls accept_all and then free. The callback is invoked
for each request and can access stdout/stderr from the context
using stdio calls as it wishes.
Apache config
Here's the basic apache config snippet I used to hook it into
`/blog' on a server (I did this locally rather than live on this
site though).
ScriptAlias /blog /path/fcgi-test.fcgi
FcgidCmdOptions /path/fcgi-test.fcgi MaxProcesses 1
<Directory "/path">
AllowOverride None
Options +ExecCGI
Require all granted
</Directory>
Custom streams and cookies
Using a GNU extension it is trivial to hook up custom stdio
streams - one gets all the benefits of libc's buffering and
formatting and one only has to write a couple of simple callbacks.
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <stdio.h>
#include <unistd.h>
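/* The record header layout and the constants used below follow the
 * FastCGI spec; reproduced here so the fragment stands alone. */
#define FCGI_VERSION_1	1
#define FCGI_STDOUT	6	/* record types; stderr is 7 */
#define FCGI_STDERR	7

typedef struct {
	unsigned char version;
	unsigned char type;
	unsigned char requestIdB1;
	unsigned char requestIdB0;
	unsigned char contentLengthB1;
	unsigned char contentLengthB0;
	unsigned char paddingLength;
	unsigned char reserved;
} FCGI_Header;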
static ssize_t fcgi_write(void *f, const char *buf, size_t size, int type) {
	struct fcgi *cgi = f;
	size_t sent = 0;
	FCGI_Header header = {
		.version = FCGI_VERSION_1,
		.type = type,
		.requestIdB1 = cgi->rid1,
		.requestIdB0 = cgi->rid0
	};

	while (sent < size) {
		size_t left = size - sent;
		ssize_t res;
		struct iovec iov[2];

		if (left > 65535)
			left = 65535;
		header.contentLengthB1 = left >> 8;
		header.contentLengthB0 = left & 0xff;
		iov[0].iov_base = &header;
		iov[0].iov_len = sizeof(header);
		iov[1].iov_base = (void *)(buf + sent);
		iov[1].iov_len = left;
		res = writev(cgi->fd, iov, 2);
		if (res < 0)
			return -1;
		sent += left;
	}
	return size;
}
static int fcgi_close(void *f, int type) {
	struct fcgi *cgi = f;
	FCGI_Header header = {
		.version = FCGI_VERSION_1,
		.type = type,
		.requestIdB1 = cgi->rid1,
		.requestIdB0 = cgi->rid0
	};

	if (write(cgi->fd, &header, sizeof(header)) < 0)
		return -1;
	return 0;
}
Well, perhaps the callbacks are more `straightforward' than simple
in this case. FastCGI has a payload limit of 64K so any larger
writes need to be broken up into parts. I use writev
to write the header and content directly from the library buffer
in a single system call (a pretty insignificant performance
improvement in this case, but one nonetheless). I might need to
handle partial writes but this works so far - if it comes to that
the writev approach probably gets too complicated to bother
with.
The actual 'cookie' callbacks just invoke the functions above with
the FCGI channel to write to.
static ssize_t fcgi_stdout_write(void *f, const char *buf, size_t size) {
	return fcgi_write(f, buf, size, FCGI_STDOUT);
}

static int fcgi_stdout_close(void *f) {
	return fcgi_close(f, FCGI_STDOUT);
}

static const cookie_io_functions_t fcgi_stdout = {
	.read = NULL,
	.write = fcgi_stdout_write,
	.seek = NULL,
	.close = fcgi_stdout_close
};
And opening a custom stream is as simple as opening a regular file.
static int fcgi_begin(struct fcgi *cgi) {
	cgi->stdout = fopencookie(cgi, "w", fcgi_stdout);
	...;
	return 0;
}
Example
Here's a basic example that just dumps all the parameters to the
client. It also maintains a count to demonstrate that it's
persistent.
I went with a callback mechanism rather than the polling mechanism
of the original SDK mostly to simplify managing state. Shrug.
#include "fcgi.h"
static int cgi_func(struct fcgi *cgi, void *data) {
	static int count;

	fprintf(cgi->stdout, "Content-Type: text/plain\n\n");
	fprintf(cgi->stdout, "Request %d\n", count++);
	fprintf(cgi->stdout, "Parameters\n");
	for (size_t i = 0; i < cgi->param_length; i++)
		fprintf(cgi->stdout, " %s=%s\n", cgi->param[i].name, cgi->param[i].value);
	return 0;
}

int main(int argc, char **argv) {
	struct fcgi *cgi = fcgi_alloc();

	fcgi_accept_all(cgi, cgi_func, NULL);
	fcgi_free(cgi);
	return 0;
}
Notes
I haven't worked out how to get the CGI script to 'exit' when the
MaxRequestsPerProcess limit has been reached without causing
service pauses. Whether I do nothing or whether I exit and close
the socket at the right time it still pauses the next request for
1-4 seconds.
I haven't converted my blog driver to use it yet - maybe later on
tonight if I keep poking at it.
Oh and it is quite fast, even with a trivial C program.
Versioning DB
Well I don't have any code ready yet but between falling asleep
today I did a little more work on a versioned data-store I've been
working on ... for years, a couple of decades in fact.
In its current iteration it utilises just 3 core (simple)
relational(-like) tables and can support both SVN style branches
(lightweight renames) and CVS style branches (even lighter
weight). Like SVN it uses global revisions and transactions, but
like CVS it works on a version tree rather than a path tree - and
both approaches are possible within the same small library.
Together with dez it allows for
compact reverse delta storage.
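Without publishing the actual schema, the shape of the thing is
roughly this (purely illustrative records and field names, not the
real design):
#include <stddef.h>
#include <stdint.h>

struct revision {	/* one per commit, globally ordered */
	uint64_t id;
	uint64_t time;
	char *author;
};

struct version {	/* a node in an object's version tree */
	uint64_t id;
	uint64_t parent;	/* parent version in the same table */
	uint64_t revision;	/* the revision that created it */
};

struct payload {	/* content, stored as a reverse delta */
	uint64_t version;
	size_t size;
	void *delta;	/* dez delta against the newer version */
};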
Originally I started in C but I've been working in Java since -
and I'm looking at back-porting the latest design to C again for
some performance comparisons. It's always used Berkeley DB (JE for
Java) as storage, although I did experiment with a SQL-backed
version in the past.
My renewed interest is that the goal is to eventually run this
site with it as backing storage - for code, documentation,
musings. E.g. the ability to branch a document tree for versioning
and yet have it served live from common storage. This was
essentially the reason I started investigating the project many
years ago but never quite got there. I'm pretty sure I've got a
solid schema but still need to solidify the API around a few basic
use-cases before I move forward.
The last time I touched the code was 2 years ago, and the last time
I did any significant work on it was 3 years ago, so it's
definitely a slow burner project!
Well, more when I have more to say.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved.
Powered by gcc & me!