About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail >
fosstodon.org/@notzed >

Tuesday, 09 July 2024, 15:15

Quicker divmod on esp32/esp8266

The reason I looked into this is a bit silly but in short I've come up with some code that does a divide/modulo by 10 somewhat faster than what the C compiler comes up with. A fairly straightforward bit-shifting implementation improves over the compiler supplied one on esp8266 but that's only the start. I eventually came across some xtensa assembly language code in Linux for div/mod but I haven't investigated it; this is just looking at C.

Slow Divide

The esp8266 doesn't have a divide instruction so division is implemented in software using a standard bit-based long division algorithm. The RISCV cpu in the esp32c3 does have one but it's pretty slow. Below is an example of a straightforward divmod implementation with a few basic improvements.

//          udiv32   libc.div()   c / and %
// esp32c3    290          87           80  cycles
// esp8266    320         490          470
udiv32_t udiv32(uint32_t num, uint32_t den) {
        uint32_t r = 0, q = 0;
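        uint32_t bit = 1u << 31;

        // shortcut low numbers (and avoid 0)
        if (num < den)
                return (udiv32_t){ 0, num };

        // den == 1 breaks the "take the top bit" step below, which skips
        // the first compare; den == 0 is invalid and also exits here
        if (den < 2)
                return (udiv32_t){ num, 0 };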

        // shortcut high numbers
        int f = __builtin_clz(num);

        // we know this bit is set so take it
        bit = bit >> (f + 1);
        num = num << (f + 1);
        r = 1;

        while (bit) {
                r = (r << 1) | (num >> 31);

                num <<= 1;
                if (r >= den) {
                        r -= den;
                        q |= bit;
                }
                bit >>= 1;
        }

        return (udiv32_t){ q, r };
}

Its running time depends on the inputs but on an esp8266 it's around 50% faster than div or simply using { num / den, num % den }. On the other hand it's 1/4 of the speed on an esp32c3 which has divu and remu instructions (of unknown implementation and timings).

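There are a few (micro) optimisations, marked by the comments in the code above, which help on most cpus: the early exit when num < den, skipping the leading zero bits with clz, and taking the top set bit without testing it.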

One odd thing I noticed when verifying it is that Intel cpus can't shift a 32-bit value by 32 bits (x86 masks the shift count to 5 bits, and in C a shift by the full width is undefined anyway), so without the shortcut exit the result will be incorrect if num is 1. But only on Intel.

Fast Multiply

I suppose that was a bit of a win for fairly simple C, but given multiply isn't slow on these cpus it's possible to improve the results for specific values by pre-calculating a multiplication constant.

A fixed point division can be implemented by a multiply of the inverse and a shift. Having a 32-bit implied shift from a mulh instruction gives the obvious choice for the shift and provides the maximum precision possible. The inverse constant is easy to calculate (using 33 bit arithmetic):

        SCALE = (1 << SHIFT) / den
              = (1 << 32) / 10
              = 429496729
              = 0x19999999

Then to calculate the division requires the high result of a 32x32 multiply.

uint32_t div10(uint32_t num) {
        return (uint64_t)num * 0x19999999 >> 32;
}

Unfortunately if you try using this directly you find it is incorrect quite often - all(?) multiples of 10 are out by 1 for instance and it gets worse after num gets very large - larger than 0x40000000. This might not be a problem for some applications but I needed a correct result. I tried adding some magic bits to the multiplicand - which helped - but it can't cover every possible input and requires feedback to correct.
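To see the multiples-of-10 problem for yourself, here is a minimal sketch that compares the truncated estimate with the compiler's own division. The names div10_trunc and count_errors are made up for this example and aren't part of the original code.

```c
#include <stdint.h>

// Estimate num / 10 with the truncated reciprocal floor(2^32 / 10).
uint32_t div10_trunc(uint32_t num) {
        return (uint64_t)num * 0x19999999u >> 32;
}

// Count the inputs in [first, last] where the estimate differs from
// num / 10.  (hypothetical helper, for illustration only)
uint32_t count_errors(uint32_t first, uint32_t last) {
        uint32_t errors = 0;

        for (uint32_t num = first; ; num++) {
                if (div10_trunc(num) != num / 10)
                        errors++;
                // test before incrementing so last == UINT32_MAX is safe
                if (num == last)
                        break;
        }

        return errors;
}
```

For 0 to 99 this counts 9 errors: 10, 20, ... 90 all come out one too low, while 0 and everything in between is fine.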

Fortunately over the full range of input the error is no more than 1/2 (or a remainder overflow of +5), and since this code is also interested in the remainder it can then be used to correct the error. Thus:

// esp32c3   33 cycles
// esp8266  124 cycles
udiv32_t udiv10(uint32_t num) {
        uint32_t q = (uint64_t)num * 0x19999999 >> 32;
        uint32_t r = num - q * 10;

        if (r >= 10) {
                r -= 10;
                q += 1;
        }

        return (udiv32_t){ q, r };
}

On an esp32c3 this is nearly 3x faster than the code the compiler generates using div/rem instructions, but obviously it is hardcoded to the single denominator of 10.
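Since the whole point is a correct result, it's worth checking against the compiler's / and %. This is a rough sketch of such a check on a desktop compiler; the udiv32_t field names (quot, rem) and the udiv10_check helper are assumptions for the example, as the post only shows positional initialisers.

```c
#include <stdint.h>

// Field names are an assumption; the original only shows { q, r }.
typedef struct { uint32_t quot, rem; } udiv32_t;

// Same estimate-and-correct scheme as above.
udiv32_t udiv10(uint32_t num) {
        uint32_t q = (uint64_t)num * 0x19999999u >> 32;
        uint32_t r = num - q * 10;

        if (r >= 10) {
                r -= 10;
                q += 1;
        }

        return (udiv32_t){ q, r };
}

// Walk the 32-bit range in strides of 'step' and compare with / and %.
// Returns 1 if every tested input matched, 0 otherwise.
// (hypothetical helper, for illustration only)
int udiv10_check(uint32_t step) {
        uint32_t num = 0;

        if (step == 0)
                step = 1;

        for (;;) {
                udiv32_t res = udiv10(num);

                if (res.quot != num / 10 || res.rem != num % 10)
                        return 0;
                // stop before num + step would wrap
                if (num > UINT32_MAX - step)
                        break;
                num += step;
        }

        // always include the very top of the range
        udiv32_t top = udiv10(UINT32_MAX);

        return top.quot == UINT32_MAX / 10 && top.rem == UINT32_MAX % 10;
}
```

A step of 1 covers every input and is only a few seconds of work on a PC; an odd stride gives a quicker spot check.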

An esp8266 doesn't have a mulh instruction and the compiler generates a full 64 bit intermediate result from a library call - however this is still 5x faster than div(). So the next obvious step is to try to just calculate the high bits of the multiply without retaining the rest. It turns out you can't save many operations but the operations you can save are fairly expensive on these cpus - 64 bit addition is a bit messy without a carry flag.

Using long multiplication provides the solution:

/*
 *             ah     al
 *             bh     bl
 *    -------------------
 *           ah.bl  al.bl
 * +  ah.bh  al.bh
 *    ===================
 *     <<32   <<16
 */
// esp32c3   40 cycles
// esp8266   58 cycles
udiv32_t udiv10(uint32_t num) {
        uint16_t ah = num >> 16;
        uint16_t al = num;
        uint16_t bh = 0x1999;
        uint16_t bl = 0x9999;

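        // cast first: uint16_t operands promote to (signed) int and
        // ah * bl can exceed INT_MAX
        uint32_t c = (uint32_t)ah * bh;
        uint32_t d = (uint32_t)ah * bl;
        uint32_t e = (uint32_t)al * bh;
        uint32_t f = (uint32_t)al * bl;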

        uint32_t t = d + e + (f >> 16);

        uint32_t q = c + (t >> 16);
        uint32_t r = num - q * 10;

        if (r >= 10) {
                r -= 10;
                q += 1;
        }

        return (udiv32_t){ q, r };
}

Each multiplication result 'column' is 16 bits wide with some overlap so a few shifts are required to line everything up and not lose any bits.
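As a sanity check on the column arithmetic, this sketch compares the 16-bit partial-product version against the plain 64-bit multiply. The function names mulhi_ref and mulhi_16 are invented for the example. Note the 32-bit sum of the middle column only works because the high half of the constant is small; it is not a general 32x32 high multiply.

```c
#include <stdint.h>

// Reference: high 32 bits of a 32x32 multiply via a 64-bit product.
uint32_t mulhi_ref(uint32_t a, uint32_t b) {
        return (uint64_t)a * b >> 32;
}

// The same value assembled from four 16x16 partial products.
// Only valid while d + e + (f >> 16) fits in 32 bits.  That holds for
// b = 0x19999999 and b = 0x1999999a, but not for an arbitrary b.
uint32_t mulhi_16(uint32_t a, uint32_t b) {
        uint32_t ah = a >> 16, al = a & 0xffff;
        uint32_t bh = b >> 16, bl = b & 0xffff;

        uint32_t c = ah * bh;           // weight 2^32
        uint32_t d = ah * bl;           // weight 2^16
        uint32_t e = al * bh;           // weight 2^16
        uint32_t f = al * bl;           // weight 2^0

        // middle column plus the carry out of the low column
        uint32_t t = d + e + (f >> 16);

        return c + (t >> 16);
}
```

The low 16 bits of f can never carry into the result on their own, which is why they can simply be dropped.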

On an esp8266 this uses the mul16u instruction directly and ends up about 2x faster again, or around 10x faster in total than using div.

This isn't much slower on an esp32c3 either, possibly the mulhu instruction is slower than mul; its cycle timing doesn't appear to be publicly documented and I haven't tried timing it as yet. I'm also timing call overheads which are significant at this range of cycle times (17-22 cycles).

A bit more juice

This can be improved a tiny bit more. It turns out for this case that a slight adjustment to the multiplier improves the accuracy of the initial result significantly. Adding 1 to the multiplicand provides an exact result for every num from 0 to a bit over 0x40000000. This is quite a bit better but it still needs correction and the error will be in the other direction - the remainder can go negative.
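A small sketch to find where the rounded-up constant first goes wrong; div10_round and first_mismatch are names made up for this example.

```c
#include <stdint.h>

// Estimate num / 10 with the reciprocal rounded up: 0x1999999a.
uint32_t div10_round(uint32_t num) {
        return (uint64_t)num * 0x1999999au >> 32;
}

// Return the first num in [first, last] whose estimate differs from
// num / 10, or 0 if there is none (0 itself is always exact).
// (hypothetical helper, for illustration only)
uint32_t first_mismatch(uint32_t first, uint32_t last) {
        for (uint32_t num = first; ; num++) {
                if (div10_round(num) != num / 10)
                        return num;
                // test before incrementing so last == UINT32_MAX is safe
                if (num == last)
                        break;
        }

        return 0;
}
```

Scanning around 0x40000000 this way puts the first wrong estimate at 0x40000005, where the quotient comes out one too high.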

The mulhu friendly version:

// esp32c3   30 cycles
// esp8266  123 cycles
udiv32_t udiv10(uint32_t num) {
        uint32_t q = (uint64_t)num * 0x1999999a >> 32;
        uint32_t r = num - q * 10;

        if ((int32_t)r < 0) {
                r += 10;
                q -= 1;
        }

        return (udiv32_t){ q, r };
}

Apart from being correct more often the negative test is cheaper on many cpus since it doesn't require a comparison first. Two's complement arithmetic means the sign still works as a test on an 'unsigned' number.

The mul16u friendly version:

// esp32c3   37 cycles
// esp8266   56 cycles
udiv32_t udiv10(uint32_t num) {
        uint16_t ah = num >> 16;
        uint16_t al = num;
        uint16_t bh = 0x1999;
        uint16_t bl = 0x999a;

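        // cast first: uint16_t operands promote to (signed) int and
        // ah * bl can exceed INT_MAX
        uint32_t c = (uint32_t)ah * bh;
        uint32_t d = (uint32_t)ah * bl;
        uint32_t e = (uint32_t)al * bh;
        uint32_t f = (uint32_t)al * bl;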

        uint32_t t = d + e + (f >> 16);

        uint32_t q = c + (t >> 16);
        uint32_t r = num - q * 10;

        if ((int32_t)r < 0) {
                r += 10;
                q -= 1;
        }

        return (udiv32_t){ q, r };
}

The timings were taken from the cpu cycle counters but are approximate as they depend either a lot (for the shift implementation) or a tiny bit on the inputs. They also include the overhead of a function call. For these systems code size is important so a call is probably necessary anyway, but for comparison this is the timing of a do-nothing function tested the same way.

// esp32c3  17 cycles
// esp8266  22 cycles
udiv32_t udiv_null(uint32_t num) {
        return (udiv32_t){ num, num };
}

Not sure it was the best use of a whole day but there you have it.

Well I did mow the lawn.

Tagged code, esp32, hacking.
Monday, 08 July 2024, 03:25

Driver for HLK LD2410C

I've been playing with the tiny HiLink HLK-LD2410C human presence sensor and an ESP32C3 and decided to write a fairly complete driver for it as I couldn't find one apart from a partial implementation in C++ for the Arduino platform.

Below is a snippet of an example of how to use it.

#include "esp-ld2410.h"

 ...

    // Initialise device on UART 1 using
    //  Default factory baudrate (256000)
    //  GPIO_NUM_1 for txd (connected to module rxd),
    //  GPIO_NUM_0 for rxd (connected to module txd),
    //  No digital signal
    ld2410_driver_install(UART_NUM_1, LD2410_BAUDRATE_DEFAULT, GPIO_NUM_1, GPIO_NUM_0, GPIO_NUM_NC);

    // enable full-data (engineering) mode
    ld2410_set_full_mode(UART_NUM_1);

    // Retrieve and print data.
    // It may take a little while for data to be valid.
    while (1) {
        const struct ld2410_full_t *data = ld2410_get_full_data(UART_NUM_1);

        printf(" detected: %s %s\n",
            data->status & LD2410_STATUS_MOVE ? "moving" : "",
            data->status & LD2410_STATUS_REST ? "resting" : "");

        vTaskDelay(pdMS_TO_TICKS(1000));
    }
  

Pretty much every feature is supported apart from anything bluetooth related but that isn't difficult to add.

Internally it uses a daemon task to monitor the serial port and manage communications with the device. For example most settings and queries must be performed after switching to 'command mode' - this is handled automatically and efficiently by the daemon task.

This is my first attempt at anything sophisticated with FreeRTOS so my choices of IPC primitives and so on may not be those I'd choose with experience.

I'm not really sure what I'm going to do with the sensor but at least I have a platform for experimentation with it.

There is some more information and links on the esp-ld2410 project page.

Tagged code, esp32, hacking.
Saturday, 06 July 2024, 09:28

Electrified Kilt!

I caught up with some old mates recently and one of them gave me a couple of ESP-01 boards to play with. I wasn't really sure what I was going to do with them, one plan was a wifi-connected irrigation controller just for something to poke at and to learn about the devices and the i/o. But this act of generosity cascaded into a bit of a spending spree on aliexpress where I've added some esp32-c3's, a bunch of sensors, relay boards, power modules, solderless breadboards, LED strips ...

Once I got them working I wasn't really sure what to do with them, I have some ideas for the light strips for xmas but since I had them and winter solstice was coming up I decided to do something a bit silly and see if I could successfully 'electrify' one of my kilts!

Moving geometric pattern.

Animated 3D Simplex Noise.

The pictures don't really do it justice - the leds are very bright at night and very colourful and animate very smoothly. I'm using a Makita battery which is a bit hefty to hang off my belt but it's what I had available and has enough juice to run it for about 20 hours or something stupid.

We're having a mid-winter 'light show' thing at the moment and I wore it out again last night (the opening night), I should've done a mainy on foot but I didn't really have the gumption for it and just stayed in the one pub all night. One does feel a bit self-conscious wearing it although on the solstice I walked home lit up like an xmas tree some time late at night (about an hour's walk).

Hacker Hacking

It turned out to be more work than I expected, from working out how to lay out and attach the leds (sewing them on, so I need to unsew to wash it), soldering up an adapter board, plus of course all the coding.

I started with some of the code in FastLED but all I really have left is some of the colour palettes.

I used Arduino initially to get the devices 'live' but quickly moved onto the FreeRTOS sdk - Arduino is interesting enough but it's a bit clunky to work with compared to proper tools. And I gotta say some of the quality of code is pretty low. I also started the LEDs showing with an ESP-01 board (esp8266) but then moved to an ESP32-C3 which is a much more interesting board. It's the first time I've played with a RiscV processor - TBH I find some of the ISA decisions a bit weird but due to the simplicity of the instruction set and enough registers the compiler mostly does a pretty good job of compiling it. Mostly. It does some weird shit when you return a structure in registers (a 2-vector) like move the stack pointer down and up and doesn't store anything on the stack.

The first time I wore it for the solstice I noticed the colours were a bit flat - the Perlin noise from FastLED wasn't being scaled quite correctly so it was missing a big part of the range. I experimented a lot with it but ended up scrapping it and converting the Java implementation of Simplex Noise from Stefan Gustavson into C and then into fixed-point. I also experimented a lot with the code, various micro-optimisations and using a hash function rather than a permute table, gradients generated from bits or from lookup tables, and so on.

I was going to write a text scroller and even found (and created) some tiny fonts. But in the end I couldn't think of anything worthwhile to write so I decided not to bother coding it up. I created some tiny animations but they looked a bit shit with such a small number of pixels to work with so decided to drop them as well. Instead I render some simple geometric shapes on the fly using signed distance fields which also let me more easily play with the colours. This was another journey into writing some fixed-point affine transforms, sincos functions and inverse square root which the internet provided starting points for which I cleaned up or simplified for better RISCV optimisation. And I implemented a version of the R250 random number generator but with a smaller number of coefficients and parameter choices which sped it up a bit more and used that to drive everything. I even hooked up a web server I wrote (not using the ESP SDK one) that you can connect to via wifi but I couldn't really think of anything useful to do with it.

I was worried about the cost of calculation and spent a good few days trying various optimisations but the simplex algorithm was already significantly faster than the noise routines from FastLED and in the end it wasn't a problem. I think the fastest 3D simplex calculation was about 280 cycles, the FastLED 3D noise was about 520 iirc. There are only 50 leds and the colours are easily updated 125 times/second - that's with up to two planes of 3D Simplex Noise or a basic SDF graphic. One somewhat small but still effective optimisation technique was to remove the 8 and 16 bit integral types used heavily in ARM or AVR code - these usually require extra operations since the RISCV CPU only operates on 32 bit registers.

The gamma curve of the leds I have is pretty rough - at the low end the steps are very noticeable so I implemented 2 bits of temporal dither (aka 'frame rate control') to try to help. So the LEDS themselves are actually updated 500 times/second and every fourth timeout I re-render the graphics. There is more that could be done to adjust the colour curves and so on but really I couldn't be bothered. I also didn't investigate the ESP32 timers and interrupts and just run it off an RTOS timeout handler.
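For anyone unfamiliar with the idea, here is a minimal sketch of 2-bit temporal dither. It is not the code from the kilt: the function name dither2, the 10-bit input range and the naive sub-frame ordering are all assumptions for illustration.

```c
#include <stdint.h>

// Pick the 8-bit level to send to an LED on sub-frame 'phase' (0..3)
// for a 10-bit intensity.  The two low bits decide on how many of the
// four sub-frames the level is bumped up by one, so the average over
// four sub-frames approximates the 10-bit value.
uint8_t dither2(uint16_t level10, unsigned phase) {
        uint32_t base = (level10 & 0x3ff) >> 2; // coarse 8-bit level
        uint32_t frac = level10 & 3;            // 0..3 extra quarter steps

        // bump on the first 'frac' sub-frames, saturating at 255
        if (frac > (phase & 3) && base < 255)
                base++;

        return (uint8_t)base;
}
```

With the LEDs refreshed 500 times/second and the graphics re-rendered every fourth refresh, each rendered value would be shown once per phase before the next frame replaces it.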

So anyway I came up with 2 different noise types - one a higher contrast scaled simplex noise with mostly monochromatic palettes, and a smoother one with more colourful palettes. Apart from the z dimension the smoother one can also move up or down, left or right. And 5 different SDF patterns - a line, a cross, a rectangle, half plane, quarter plane and two quarter planes. These rotate or move and the line/cross/rectangle are repeated on an 8x8 grid. There's about a dozen different palettes plus 8 monochromatic ones created on the fly which can also be complemented (i.e. purple+green or just purple+black). These run for a randomised time and then slowly fade into the next routine. One last thing I added was it flashes for 5 seconds every now and then (at different rates) and on a separate cycle to everything else just to add a bit more bling.

I even wrote an Xlib 'kilt simulator' I could use to rapidly test the routines!

Hard noise in purple.

Rotating fat SDF line.

Ahh well that kept me off the streets for a couple of weeks and got a few laughs and a lot of weird looks. I'll (probably?) eventually get around to uploading the code, I have to work out where I got all the bits from first. I'm not involved with any forums any more and mastodon isn't really my thing so nobody will end up seeing it anyway. I filed a bug report with FastLED about their 8 bit blend function which isn't quite correct.

It was also fun doing a bit of coding again - although CMake could certainly fuck off. Shit like that puts me off wanting to code again for a job, although maybe I should start thinking about it again.

Tagged esp32, hacking.
Tuesday, 12 March 2024, 03:06

Summer

Well summer finally arrived - only about 2 months late. The garden's been pretty confused this year and the growing season is all but done now, but I'm hoping for a few more chillies yet.

My leg is slowly healing, unfortunately I developed bursitis due to dealing with the degrading joint and that has lingered beyond the hip replacement. The implant is particularly perpendicular which seems to be adding additional stress on the bursa but I was probably too active pre and post-operation and aggravated it further.

I've got some daily physio exercises to move through and that seems to be helping but it's frustrating I can't ride as much as I'd like to. Cycling seems to be the main aggravator for the bursitis which is a bit of a problem when I don't drive. I can get a cortisone injection if I want but I'm trying to avoid it as frankly I'm sick of hanging around hospitals and doctors' surgeries. While it's mostly just annoying it can be quite painful and the main problem is it interfering with sleep which I don't get enough of anyway. Anyway it seems to be getting better, albeit very slowly.

I've been spending way too much time and money at the pub drinking way too much; at least it's mostly been for fun and my mood has generally been a bit better this year. I've had a couple of low episodes but I just push through it and try to get out of my head and my house and they haven't persisted. Very much not looking forward to winter though, it's quite a bleak time even with the relatively mild winters we experience.

Retirement?

I've just got so much spare time it's hard to fill so I've been looking at things to do such as reading, cooking, exercising. My key-lime tree has been shedding excessive fruit so I've had lots of limes to use up - so I've been making pickles and chutneys and marmalades and cordial, and hot sauces. I fermented 1.5kg of chillies that I turned into Tabasco-style sauces. I found a local place to get sauce bottles and small jars although it's not particularly cheap. I tend to give away a lot of what I make and the jars and bottles rarely return.

In part due to the hip replacement and in part because I have the time I've been keeping pretty fit and healthy, right bang on 'ideal weight' for my height. Apart from the obscenely excessive beer and a bit of a gout flare-up I'm eating very well, it basically turned into a Mediterranean Diet without intending to. Greens from the garden, lots of citrus, chillies, nuts, legumes, soups, a bit of liver once in a while. Not a whole lot of meat — not because I don't like it I just can't be bothered cooking it.

Yesterday I did a bit of a budget review and I'm going to have to either cut back on spending quite a bit or consider working again at some point. Still I'm going ok considering I haven't worked for over 3 years and have zero income so there's still no real rush. I'm getting a bit too comfortable with not having to deal with all that bullshit and I simply blanch at the thought of having to work again. Lucky me I guess.

Tagged biographical.
Tuesday, 12 March 2024, 02:36

Domain Changed!

I've changed the site domain to zedzone.au. I'm pretty sure I got everything but if something breaks it breaks and I can fix it as required. It was a pain in the arse to have to do it but it didn't take much work.

Old Site              New Site
www.zedzone.space     www.zedzone.au
code.zedzone.space    code.zedzone.au
zedzone.space         zedzone.au

I've added a permanent redirect from the old .space domain to the .au domain — but this will only work as long as the domain is active, and it expires in 7 days.

Tagged zedzone.
Tuesday, 12 March 2024, 00:22

Domain Change

I got the .space domain because it was cheap but now it's jumped significantly in price ... so I'm going to attempt to move to zedzone.au.

This might upset things for a little while but so be it, it's not like this is a heavily trafficked site.

Tagged zedzone.
Thursday, 26 October 2023, 00:12

More Legs

Tomorrow will be 7 weeks since I had the hip operation. It's been one of those "I can't believe it's been 7 weeks, but shit a lot has happened" periods.

They told me not to put more than 10kg weight on the operated leg for 6 weeks - but after 3 weeks of bloody crutches I gave that idea the flick. A couple of days I walked all the way into/out of the city - about 10km round trip - entirely on crutches but it just seemed absurd. About a week using a walking stick but then my hand started going numb so I mostly just hobbled around after that. After 5 weeks I got on the bicycle. At the 6 week mark I had a review and the surgeon was very pleased with my progress but I was miserable with the ongoing pain and news of lifelong movement restrictions. I'm seeing a physio on Monday so hopefully I can get a better idea of what they are, plus maybe some exercises and guidance on "overdoing it". So far the directions have been vague and pretty useless although the surgeon did say I could get on the bike again (which I had already obviously).

Mostly I'm just frustrated as fuck at this point. The pain is taking a long time to recede, it hurts to sit, it hurts to lie on my back or operated side - and well even the non-operated one. This is interfering with my sleep which was already shithouse. It hurts to sit playing playstation or drinking beer, it hurts afterwards if I ride too much, walk too much, stand too much. It's difficult and painful to tie my shoelaces or cut my nails. OTC painkillers don't really do much - hell even when I was in the hospital it took 20mg of oxycodone to do anything and they thought I was a junkie or something. As far as I know I'll never be able to sit cross-legged on the floor ever again - maybe that seems trivial but what am I supposed to do, carry a fucking chair everywhere I go? How do I do my go-to hamstring stretch when my back gets too stiff?

So I've been walking a lot anyway - it seems to help with the pain to some extent. At the 6 week mark I started back on situps and pushups - at least I can do those. Riding a bit when the weather is good.

But mostly I'm drinking way too much and with the lack of sleep being moody and sulky and feeling lonely and depressed - so pretty much like every other spring.

One of my main haunts closed down - arsehole landlord in dispute with his daughter even though the business was keeping things in the black. With 4 weeks notice for everyone it was pretty shit. I managed to make it to the closing even on crutches but it's left quite a hole in the community and a hole in my heart every time I walk past.

And one of my nephews was recently in an absolutely horrific car accident - ute rolled end on end and he was thrown 15 metres from the vehicle. By all measures he should be dead but somehow he's still physically alive but his brain ... well things are bad. His eyes open but nobody's home, he can't breathe or eat without tubes. I just hope his needless suffering isn't extended by his idiot father and stepmother (they're "preppers") who think he needs to be kept alive at all costs and it turns into another Terri Schiavo incident. That's if the pneumonia doesn't do him in soon.

Tagged biographical.
Sunday, 03 September 2023, 09:24

THR coming up

For various reasons it's been a fairly involved few months but now life is back to anhedonia, insufficient sleep, and excess alcohol. But in 5 days that all changes again as I'm finally going under the knife for a total hip replacement on my left leg.

I'm resigned to it but still anxious as fuck and well just super-pissed off that I need to have it done all from a nothing accident 3.5 years ago. It'll be a couple of months before I can even think of getting on a bicycle again which is somewhat limiting as that is my main mode of transport apart from walking.

By pretty much pure chance a niece wants to move to Adelaide so is coming to stay with me for a bit - she can help with the basic day to day stuff that I might not be able to do for a few weeks. There aren't any real insights on how long it might take to recover enough to get around, cook, and clean so I will just have to see how it pans out for me.

I didn't drink for July (moral support for a friend at the time) and lost a bit of weight and spent a bit more effort on getting as fit and strong as possible before the operation so physically I'm about as prepared as I'm likely to be.

Mentally and emotionally I'm a bit of a wreck but oh well, one day at a time.

Tagged biographical.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!