About Me
Michael Zucchi
B.E. (Comp. Sys. Eng.)
also known as Zed
to his mates & enemies!
< notzed at gmail >
< fosstodon.org/@notzed >
Quicker divmod on esp32/esp8266
The reason I looked into this is a bit silly but in short I've come
up with some code that does a divide/modulo by 10 somewhat faster
than what the C compiler comes up with. A fairly straightforward
bit-shifting implementation improves over the compiler-supplied one
on esp8266 but that's only the start. I eventually came across some
xtensa assembly language code in Linux for div/mod but I haven't
investigated it; this is just looking at C.
Slow Divide
The esp8266 doesn't have a divide instruction so division is
implemented in software using a standard bit-based long division
algorithm. The RISC-V cpu in the esp32c3 does have one but it's
pretty slow. Below is an example of a straightforward divmod
implementation with a few basic improvements.
//            udiv32   libc.div()   c / and %
// esp32c3       290           87          80 cycles
// esp8266       320          490         470
// udiv32_t is a struct of two uint32_t: { quotient, remainder }.
// Needs den >= 2: the first set bit is taken without testing it against den.
udiv32_t udiv32(uint32_t num, uint32_t den) {
    uint32_t r = 0, q = 0;
    uint32_t bit = 1u << 31;    // unsigned constant: 1 << 31 overflows a signed int
    // shortcut low numbers (and avoid 0)
    if (num < den)
        return (udiv32_t){ 0, num };
    // shortcut high numbers
    int f = __builtin_clz(num);
    // we know this bit is set so take it
    bit = bit >> (f + 1);
    num = num << (f + 1);
    r = 1;
    while (bit) {
        r = (r << 1) | (num >> 31);
        num <<= 1;
        if (r >= den) {
            r -= den;
            q |= bit;
        }
        bit >>= 1;
    }
    return (udiv32_t){ q, r };
}
Its running time depends on the inputs but on an esp8266 it's
around 50% faster than div() or simply using
{ num / den, num % den }. On the other hand it's about 1/4 of the
speed on an esp32c3, which has divu and remu instructions (of
unknown implementation and timings).
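A quick way to sanity-check the routine on a desktop machine is to compare it against the compiler's own / and % over a spread of inputs. The sketch below is a host-side harness written for illustration, not the code used for the timings: the udiv32_t field names (quot, rem), the helper udiv32_mismatches() and the list of denominators are all assumptions. As far as I can tell the routine needs den >= 2, since the first set bit is moved into the remainder without being compared against den, so the sketch asserts that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; the field names are assumptions. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv32_t;

/* Same shift-and-subtract routine as above, with the den >= 2
 * precondition made explicit. */
static udiv32_t udiv32(uint32_t num, uint32_t den) {
    uint32_t r, q = 0;
    uint32_t bit = UINT32_C(1) << 31;

    assert(den >= 2);
    if (num < den)
        return (udiv32_t){ 0, num };

    /* num >= den >= 2 here, so f <= 30 and every shift count is below 32 */
    int f = __builtin_clz(num);
    bit >>= f + 1;
    num <<= f + 1;
    r = 1;

    while (bit) {
        r = (r << 1) | (num >> 31);
        num <<= 1;
        if (r >= den) {
            r -= den;
            q |= bit;
        }
        bit >>= 1;
    }
    return (udiv32_t){ q, r };
}

/* Count disagreements with the compiler's / and % over a sparse sweep. */
static unsigned udiv32_mismatches(void) {
    static const uint32_t dens[] = {
        2, 3, 7, 10, 255, 1000, 65536, 1000003,
        0x7fffffffu, 0x80000000u, 0xffffffffu
    };
    unsigned bad = 0;

    for (unsigned i = 0; i < sizeof dens / sizeof dens[0]; i++) {
        uint32_t num = 0;
        for (int j = 0; j < 100000; j++) {
            udiv32_t x = udiv32(num, dens[i]);
            if (x.quot != num / dens[i] || x.rem != num % dens[i])
                bad++;
            num += 0x9e3779b1u;   /* unsigned wrap-around spreads num over the range */
        }
    }
    return bad;
}
```

Calling udiv32_mismatches() should return 0 if the routine agrees with the compiler everywhere in the sweep; it is a spot check, not an exhaustive proof.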
There are a few (micro) optimisations which help on most cpus:
- Using clz() to skip to the first set bit in the numerator; even if there's no clz instruction it's usually something better than testing each bit one by one;
- Moving the first iteration outside of the loop since we're already at a set bit;
- Avoiding a separate loop counter by using the bit itself;
- Avoiding masks in the bit copy by taking the top bit from the numerator.
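One odd thing I noticed when verifying it is that intel cpus can't
shift a 32-bit value by 32 bits (the shift count is masked to 5
bits, and in C a shift by the full width is undefined anyway), so
without the shortcut exit the result will be incorrect if num is 1.
I only saw this on intel.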
Fast Multiply
I suppose that was a bit of a win for fairly simple C but given
multiply isn't slow on these cpus it's possible to improve the
results for specific values by pre-calculating a multiplication
constant.
A fixed point division can be implemented by a multiply of the
inverse and a shift. Having a 32-bit implied shift from a mulh
instruction gives the obvious choice for the shift and provides the
maximum precision possible. The inverse constant is easy to
calculate (using 33 bit arithmetic):
    SCALE = (1 << SHIFT) / den
          = (1 << 32) / 10
          = 429496729
          = 0x19999999
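The constant is easy to double-check on any host with 64-bit arithmetic. A minimal sketch; recip32() is a name made up for this example, and it assumes den >= 2 so that the result fits in 32 bits.

```c
#include <stdint.h>

/* floor(2^32 / den). Assumes den >= 2 so the result fits in 32 bits. */
static uint32_t recip32(uint32_t den) {
    return (uint32_t)((UINT64_C(1) << 32) / den);
}
```

recip32(10) gives 429496729, which is 0x19999999. The exact value of 2^32 / 10 is 429496729.6, so the truncated constant is slightly too small, and that matters for the results.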
Then to calculate the division requires the high result of a 32x32
multiply.
uint32_t div10(uint32_t num) {
    return (uint64_t)num * 0x19999999 >> 32;
}
Unfortunately if you try using this directly you find it is
incorrect quite often - every non-zero multiple of 10 is out by 1
for instance and it gets worse once num gets very large - larger
than 0x40000000. This might not be a problem for some applications
but I needed a correct result. I tried adding some magic bits to
the multiplicand - which helped - but it can't cover every possible
input and requires feedback to correct.
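Fortunately over the full range of input the estimate falls short
by less than 1 (the constant is low by 0.6 parts in 2^32, so the
shortfall is under 0.6), which means the quotient is at most 1 too
low and the remainder at most 10 too high. Since this code is also
interested in the remainder it can then be used to correct the
error. Thus: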
// esp32c3 33 cycles
// esp8266 124 cycles
udiv32_t udiv10(uint32_t num) {
    uint32_t q = (uint64_t)num * 0x19999999 >> 32;
    uint32_t r = num - q * 10;
    if (r >= 10) {
        r -= 10;
        q += 1;
    }
    return (udiv32_t){ q, r };
}
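Since the whole point is a correct result, it's worth sweeping the input range on a host machine. The sketch below is an illustration only: the udiv32_t field names and the helper udiv10_mismatches() are assumptions, and the stride parameter (which must be at least 1) just trades coverage for run time. It also records how far the raw estimate ever falls short of the real quotient.

```c
#include <stdint.h>

/* Hypothetical result type; the field names are assumptions. */
typedef struct {
    uint32_t quot;
    uint32_t rem;
} udiv32_t;

static udiv32_t udiv10(uint32_t num) {
    uint32_t q = (uint32_t)(((uint64_t)num * 0x19999999u) >> 32);
    uint32_t r = num - q * 10;
    if (r >= 10) {
        r -= 10;
        q += 1;
    }
    return (udiv32_t){ q, r };
}

/* Sweep 0, stride, 2*stride, ... up to UINT32_MAX (stride must be >= 1).
 * Returns how many inputs disagree with / and %, and stores the largest
 * shortfall of the uncorrected estimate in *max_short. */
static unsigned udiv10_mismatches(uint32_t stride, uint32_t *max_short) {
    unsigned bad = 0;
    uint32_t worst = 0;
    uint32_t num = 0;

    for (;;) {
        uint32_t raw = (uint32_t)(((uint64_t)num * 0x19999999u) >> 32);
        uint32_t shortfall = num / 10 - raw;   /* raw never exceeds num / 10 */
        if (shortfall > worst)
            worst = shortfall;

        udiv32_t x = udiv10(num);
        if (x.quot != num / 10 || x.rem != num % 10)
            bad++;

        if (num > UINT32_MAX - stride)
            break;                             /* next step would wrap */
        num += stride;
    }
    *max_short = worst;
    return bad;
}
```

With a stride of 1 this visits every 32-bit input; a larger odd stride such as 997 gives a quick spot check instead.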
On an esp32c3 this is nearly 3x faster than what the compiler
generates using div/rem instructions, but obviously it is hardcoded
to the single denominator of 10.
An esp8266 doesn't have a mulh instruction and the compiler
generates a full 64 bit intermediate result from a library call -
however this is still 5x faster than div(). So the next obvious
step is to try to just calculate the high bits of the multiply
without retaining the rest. It turns out you can't save many
operations but the operations you can save are fairly expensive on
these cpus - 64 bit addition is a bit messy without a carry flag.
Using long multiplication provides the solution:
/*
 *                  ah      al
 *                  bh      bl
 *   -------------------------
 *               ah.bl   al.bl
 *   +   ah.bh   al.bh
 *   =========================
 *       <<32    <<16
 */
// esp32c3 40 cycles
// esp8266 58 cycles
udiv32_t udiv10(uint32_t num) {
    uint16_t ah = num >> 16;
    uint16_t al = num;
    uint16_t bh = 0x1999;
    uint16_t bl = 0x9999;
    // cast first: uint16_t operands promote to signed int, which ah * bl can overflow
    uint32_t c = (uint32_t)ah * bh;
    uint32_t d = (uint32_t)ah * bl;
    uint32_t e = (uint32_t)al * bh;
    uint32_t f = (uint32_t)al * bl;
    uint32_t t = d + e + (f >> 16);
    uint32_t q = c + (t >> 16);
    uint32_t r = num - q * 10;
    if (r >= 10) {
        r -= 10;
        q += 1;
    }
    return (udiv32_t){ q, r };
}
Each multiplication result 'column' is 16 bits wide with some
overlap so a few shifts are required to line everything up and not
lose any bits.
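The column trick can be checked against a plain 64-bit multiply. The sketch below is a generalised version written for illustration (the names mulhi32_16() and mulhi32_ref() are made up): with an arbitrary second operand the middle column can overflow 32 bits, so it tracks that carry by hand. The version above doesn't need to, because with the fixed 0x1999/0x9999 halves the middle sum always fits.

```c
#include <stdint.h>

/* High 32 bits of a 32x32 multiply, built from four 16x16 partial products. */
static uint32_t mulhi32_16(uint32_t a, uint32_t b) {
    uint32_t ah = a >> 16, al = a & 0xffff;
    uint32_t bh = b >> 16, bl = b & 0xffff;

    uint32_t c = ah * bh;   /* weight 2^32 */
    uint32_t d = ah * bl;   /* weight 2^16 */
    uint32_t e = al * bh;   /* weight 2^16 */
    uint32_t f = al * bl;   /* weight 2^0  */

    /* Sum the middle column in two steps so a wrap can be detected. */
    uint32_t t = d + (f >> 16);     /* at most 0xfffeffff, cannot wrap */
    uint32_t u = t + e;
    uint32_t carry = u < t;         /* 1 if the second add wrapped */

    /* A wrap in the middle column is worth 2^48, i.e. 2^16 in the high word. */
    return c + (u >> 16) + (carry << 16);
}

/* Reference result using a 64-bit intermediate. */
static uint32_t mulhi32_ref(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
```

For any pair of inputs the two functions should agree, for example mulhi32_16(123456789, 0x19999999) against mulhi32_ref(123456789, 0x19999999).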
On an esp8266 this uses the mul16u instruction directly and ends up
about 2x faster again, or around 10x faster in total than using
div().
This isn't much slower on an esp32c3 either; possibly the mulhu
instruction is slower than mul. Its cycle timing doesn't appear to
be publicly documented and I haven't tried timing it as yet. I'm
also timing call overheads which are significant at this range of
cycle times (17-22 cycles).
A bit more juice
This can be improved a tiny bit more. It turns out for this case
that a slight adjustment to the multiplier improves the accuracy of
the initial result significantly. Adding 1 to the multiplier
provides an exact result for every num from 0 to a bit over
0x40000000. This is quite a bit better but it still needs
correction and the error will be in the other direction - the
remainder can go negative.
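To see where each multiplier first goes wrong without any correction, a small scan is enough. This is an illustrative sketch; first_bad() and its range parameters are names made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scan [lo, hi) for the first num where the uncorrected estimate
 * (num * mult) >> 32 differs from num / 10. Returns false if the
 * whole range is exact, otherwise stores the offender in *bad. */
static bool first_bad(uint32_t mult, uint32_t lo, uint32_t hi, uint32_t *bad) {
    for (uint32_t num = lo; num < hi; num++) {
        uint32_t q = (uint32_t)(((uint64_t)num * mult) >> 32);
        if (q != num / 10) {
            *bad = num;
            return true;
        }
    }
    return false;
}
```

With 0x19999999 the first miss is at num = 10. With 0x1999999a the estimate overshoots by 0.4 * num / 2^32, which can only push a quotient over the next integer once num reaches 0x40000000 and the remainder is 9, so the first miss is at 0x40000005. Scanning the whole range below that takes about 2^30 iterations, so it's quicker to scan a window around it.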
The mulhu-friendly version:
// esp32c3 30 cycles
// esp8266 123 cycles
udiv32_t udiv10(uint32_t num) {
    uint32_t q = (uint64_t)num * 0x1999999a >> 32;
    uint32_t r = num - q * 10;
    if ((int32_t)r < 0) {
        r += 10;
        q -= 1;
    }
    return (udiv32_t){ q, r };
}
Apart from being correct more often the negative test is cheaper on
many cpus since it doesn't require a comparison first. Two's
complement arithmetic means the sign still works as a test on an
'unsigned' number.
The mul16u-friendly version:
// esp32c3 37 cycles
// esp8266 56 cycles
udiv32_t udiv10(uint32_t num) {
    uint16_t ah = num >> 16;
    uint16_t al = num;
    uint16_t bh = 0x1999;
    uint16_t bl = 0x999a;
    // cast first: uint16_t operands promote to signed int, which ah * bl can overflow
    uint32_t c = (uint32_t)ah * bh;
    uint32_t d = (uint32_t)ah * bl;
    uint32_t e = (uint32_t)al * bh;
    uint32_t f = (uint32_t)al * bl;
    uint32_t t = d + e + (f >> 16);
    uint32_t q = c + (t >> 16);
    uint32_t r = num - q * 10;
    if ((int32_t)r < 0) {
        r += 10;
        q -= 1;
    }
    return (udiv32_t){ q, r };
}
The timings were taken from the cpu cycle counters but are
approximate as they depend either a lot (for the shift
implementation) or a tiny bit on the inputs. They also include the
overheads of function calls; for these systems size is important so
a call is probably necessary anyway, but for comparison this is the
timing of a do-nothing function tested the same way.
// esp32c3 17 cycles
// esp8266 22 cycles
udiv32_t udiv_null(uint32_t num) {
    return (udiv32_t){ num, num };
}
Not sure it was the best use of a whole day but there you have it.
Well I did mow the lawn.
Driver for HLK LD2410C
I've been playing with the tiny HiLink HLK-LD2410C human presence
sensor and an ESP32C3 and decided to write a fairly complete driver
for it as I couldn't find one apart from a partial implementation
in C++ for the Arduino platform.
Below is a snippet of an example of how to use it.
#include "esp-ld2410.h"
...
    // Initialise device on UART 1 using
    //  Default factory baudrate (256000)
    //  GPIO_NUM_1 for txd (connected to module rxd),
    //  GPIO_NUM_0 for rxd (connected to module txd),
    //  No digital signal
    ld2410_driver_install(UART_NUM_1, LD2410_BAUDRATE_DEFAULT, GPIO_NUM_1, GPIO_NUM_0, GPIO_NUM_NC);
    // enable full-data (engineering) mode
    ld2410_set_full_mode(UART_NUM_1);
    // Retrieve and print data.
    // It may take a little while for data to be valid.
    while (1) {
        const struct ld2410_full_t *data = ld2410_get_full_data(UART_NUM_1);
        printf(" detected: %s %s\n",
               data->status & LD2410_STATUS_MOVE ? "moving" : "",
               data->status & LD2410_STATUS_REST ? "resting" : "");
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
Pretty much every feature is supported apart from anything bluetooth
related but that isn't difficult to add.
Internally it uses a daemon task to monitor the serial port and
manage communications with the device. For example most settings
and queries must be performed after switching to 'command mode' -
this is handled automatically and efficiently by the daemon task.
This is my first attempt at anything sophisticated with FreeRTOS so
my choices of IPC primitives and so on may not be those I'd choose
with experience.
I'm not really sure what I'm going to do with the sensor but at
least I have a platform for experimentation with it.
There is some more information and links on the esp-ld2410 project
page.
Electrified Kilt!
I caught up with some old mates recently and one of them gave me a
couple of ESP-01 boards to play with. I wasn't really sure what I
was going to do with them; one plan was a wifi-connected irrigation
controller just for something to poke at and to learn about the
devices and the i/o. But this act of generosity cascaded into a bit
of a spending spree on aliexpress where I've added some esp32-c3's,
a bunch of sensors, relay boards, power modules, solderless
breadboards, LED strips ...
Once I got them working I wasn't really sure what to do with them, I
have some ideas for the light strips for xmas but since I had them
and winter solstice was coming up I decided to do something a bit
silly and see if I could successfully 'electrify' one of my kilts!
The pictures don't really do it justice - the leds are very bright
at night and very colourful and animate very smoothly. I'm using a
Makita battery which is a bit hefty to hang off my belt but it's
what I had available and has enough juice to run it for about 20
hours or something stupid.
We're having a mid-winter 'light show' thing at the moment and I
wore it out again last night (the opening night), I should've done
a mainy on foot but I didn't really have the gumption for it and
just stayed in the one pub all night. One does feel a bit
self-conscious wearing it although on the solstice I walked home lit up
like an xmas tree some time late at night (about an hour's walk).
Hacker Hacking
It turned out to be more work than I expected, from working out how
to lay out and attach the leds (sewing them on, so I need to unsew
to wash it), soldering up an adapter board, plus of course all the
coding.
I started with some of the code in FastLED but all I really have
left is some of the colour palettes.
I used Arduino initially to get the devices 'live' but quickly moved
onto the FreeRTOS sdk - Arduino is interesting enough but it's a bit
clunky to work with compared to proper tools. And I gotta say some
of the quality of code is pretty low. I also started the LEDs
showing with an ESP-01 board (esp8266) but then moved to an ESP32-C3
which is a much more interesting board. It's the first time I've
played with a RISC-V processor - TBH I find some of the ISA decisions
a bit weird but due to the simplicity of the instruction set and
enough registers the compiler mostly does a pretty good job of
compiling it. Mostly. It does some weird shit when you return a
structure in registers (a 2-vector) like move the stack pointer down
and up without storing anything on the stack.
The first time I wore it for the solstice I noticed the colours were
a bit flat - the Perlin noise from FastLED wasn't being scaled quite
correctly so it was missing a big part of the range. I experimented
a lot with it but ended up scrapping it and converting the Java
implementation of Simplex Noise from Stefan Gustavson into C and
then into fixed-point. I also experimented a lot with the code:
various micro-optimisations, using a hash function rather than a
permute table, gradients generated from bits or from lookup tables,
and so on.
I was going to write a text scroller and even found (and created)
some tiny fonts. But in the end I couldn't think of anything
worthwhile to write so I decided not to bother coding it up. I
created some tiny animations but they looked a bit shit with such
a small number of pixels to work with so decided to drop them as
well. Instead I render some simple geometric shapes on the fly
using signed distance fields which also let me more easily play
with the colours. This was another journey into writing some
fixed-point affine transforms, sincos functions and inverse square
root, for which the internet provided starting points that I
cleaned up or simplified for better RISC-V optimisation. And I
implemented a
version of the R250 random number generator but with a smaller
number of coefficients and parameter choices which sped it up a
bit more and use that to drive everything. I even hooked up a web
server I wrote (not using the ESP SDK one) that you can connect to
via wifi but I couldn't really think of anything useful to do with
it.
I was worried about the cost of calculation and spent a good few
days trying various optimisations but the simplex algorithm was
already significantly faster than the noise routines from FastLED
and in the end it wasn't a problem. I think the fastest 3D
simplex calculation was about 280 cycles, the FastLED 3D noise was
about 520 iirc. There are only 50 leds and the colours are easily
updated 125 times/second - that's with up to two planes of 3D
Simplex Noise or a basic SDF graphic. One somewhat small but still
effective optimisation technique was to remove the 8 and 16 bit
integral types used heavily in ARM or AVR code - these usually
require extra operations since the RISC-V cpu only has 32 bit
registers.
The gamma curve of the leds I have is pretty rough - at the low
end the steps are very noticeable so I implemented 2 bits of
temporal dither (aka 'frame rate control') to try to help. So the
LEDs themselves are actually updated 500 times/second and every
fourth timeout I re-render the graphics. There is more that could
be done to adjust the colour curves and so on but really I
couldn't be bothered. I also didn't investigate the ESP32 timers
and interrupts and just run it off an RTOS timeout handler.
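For anyone curious what 2 bits of temporal dither looks like, here is a rough sketch of the general idea rather than the code running on the kilt. The names dither2() and dither2_sum(), the 10-bit level range (0..1020) and the threshold order are all assumptions made for the example. Each rendered level carries 2 extra low bits, and over 4 consecutive sub-frames the 8-bit output is bumped up by one in as many sub-frames as those low bits say, so the average brightness lands between two hardware steps.

```c
#include <stdint.h>

/* Sub-frame thresholds, interleaved so the brighter sub-frames are
 * spread out in time (the order is an assumption). */
static const uint8_t dither_threshold[4] = { 0, 2, 1, 3 };

/* level: 10-bit intensity, 0..1020 (8 bits plus 2 dither bits).
 * phase: sub-frame counter, only the low 2 bits are used.
 * Returns the 8-bit value to send to the LED for this sub-frame. */
static uint8_t dither2(uint32_t level, uint32_t phase) {
    uint32_t base = level >> 2;     /* the 8-bit part */
    uint32_t frac = level & 3;      /* 2 bits spread over 4 sub-frames */

    if (frac > dither_threshold[phase & 3])
        base += 1;
    return (uint8_t)(base > 255 ? 255 : base);
}

/* Sum of the outputs over one 4 sub-frame cycle; for levels up to 1020
 * this equals the level, i.e. the average is level / 4. */
static uint32_t dither2_sum(uint32_t level) {
    uint32_t sum = 0;
    for (uint32_t phase = 0; phase < 4; phase++)
        sum += dither2(level, phase);
    return sum;
}
```

At 500 sub-frames a second with a re-render every fourth one, a level of 5 would go out as 2, 1, 1, 1, which averages to 1.25.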
So anyway I came up with 2 different noise types - one a higher
contrast scaled simplex noise with mostly monochromatic palettes,
and a smoother one with more colourful palettes. Apart from the z
dimension the smoother one can also move up or down, left or right.
And 5 different SDF patterns - a line, a cross, a rectangle, half
plane, quarter plane and two quarter planes. These rotate or move
and the line/cross/rectangle are repeated on an 8x8 grid. There's
about a dozen different palettes plus 8 monochromatic ones created
on the fly which can also be complemented (i.e. purple+green or just
purple+black). These run for a randomised time and then slowly fade
into the next routine. One last thing I added was it flashes for 5
seconds every now and then (at different rates) and on a separate
cycle to everything else just to add a bit more bling.
I even wrote an Xlib 'kilt simulator' I could use to rapidly test the
routines!
Ahh well that kept me off the streets for a couple of weeks and
got a few laughs and a lot of weird looks. I'll (probably?)
eventually get around to uploading the code, I have to work out
where I got all the bits from first. I'm not involved with any
forums any more and Mastodon isn't really my thing so nobody will
end up seeing it anyway. I filed a bug report with FastLED about
their 8 bit blend function which isn't quite correct.
It was also fun doing a bit of coding again - although CMake could
certainly fuck off. Shit like that puts me off wanting to code
again for a job, although maybe I should start thinking about it
again.
Summer
Well summer finally arrived - only about 2 months late. The
garden's been pretty confused this year and the growing season is
all but done now, but I'm hoping for a few more chillies yet.
My leg is slowly healing; unfortunately I developed bursitis due to
dealing with the degrading joint and that has lingered beyond the
hip replacement. The implant is particularly perpendicular which
seems to be adding additional stress on the bursa but I was
probably too active pre and post-operation and aggravated it
further.
I've got some daily physio exercises to move through and that
seems to be helping but it's frustrating I can't ride as much as
I'd like to. Cycling seems to be the main aggravator for the
bursitis which is a bit of a problem when I don't drive. I can
get a cortisone injection if I want but I'm trying to avoid it as
frankly I'm sick of hanging around hospitals and doctors' surgeries.
While it's mostly just annoying it can be quite painful and the
main problem is it interfering with sleep which I don't get enough
of anyway. Anyway it seems to be getting better, albeit very
slowly.
I've been spending way too much time and money at the pub drinking
way too much; at least it's mostly been for fun and my mood has
generally been a bit better this year. I've had a couple of low
episodes but I just push through it and try to get out of my head
and my house and they haven't persisted. Very much not looking
forward to winter though; it's quite a bleak time even with the
relatively mild winters we experience.
Retirement?
I've just got so much spare time it's hard to fill so I've been
looking at things to do such as reading, cooking, exercising. My
key-lime tree has been shedding excessive fruit so I've had lots
of limes to use up - so I've been making pickles and
chutneys and marmalades and cordial, and hot sauces. I fermented
1.5kg of chillies that I turned into tabasco-style sauces. I
found a local place to get sauce bottles and small jars although
it's not particularly cheap. I tend to give away a lot of what I
made and the jars and bottles rarely return.
In part due to the hip replacement and in part because I have the
time I've been keeping pretty fit and healthy, right bang on
'ideal weight' for my height. Apart from the obscenely excessive
beer and a bit of a gout flare-up I'm eating very well, it
basically turned into a Mediterranean Diet without intending to.
Greens from the garden, lots of citrus, chillies, nuts, legumes,
soups, a bit of liver once in a while. Not a whole lot of meat
— not because I don't like it I just can't be bothered
cooking it.
Yesterday I did a bit of a budget review and I'm going to have to
either cut back on spending quite a bit or consider working again
at some point. Still I'm going ok considering I haven't worked
for over 3 years and have zero income so there's still no real
rush. I'm getting a bit too comfortable with not having to deal
with all that bullshit and I simply blanch at the thought of
having to work again. Lucky me I guess.
Domain Changed!
I've changed the site domain to zedzone.au. I'm pretty sure I got
everything but if something breaks it breaks and I can fix it as
required. It was a pain in the arse to have to do it but it
didn't take much work.
Old Site            →  New Site
www.zedzone.space   →  www.zedzone.au
code.zedzone.space  →  code.zedzone.au
zedzone.space       →  zedzone.au
I've added a permanent redirect from the old .space domain to the
.au domain — but this will only work as long as the domain is
active, and it expires in 7 days.
Domain Change
I got the .space domain because it was cheap but now it's jumped
significantly in price ... so I'm going to attempt to move to zedzone.au.
This might upset things for a little while but so be it, it's not
like this is a heavily trafficked site.
More Legs
Tomorrow will be 7 weeks since I had the hip operation. It's been
one of those "I can't believe it's been 7 weeks, but shit a lot has
happened" periods.
They told me not to put more than 10kg weight on the operated leg
for 6 weeks - but after 3 weeks of bloody crutches I gave that idea
the flick. A couple of days I walked all the way into/out of the
city - about 10km round trip - entirely on crutches but it just
seemed absurd. About a week using a walking stick but then my hand
started going numb so I mostly just hobbled around after that.
After 5 weeks I got on the bicycle. At the 6 week mark I had a
review and the surgeon was very pleased with my progress but I was
miserable with the ongoing pain and news of lifelong movement
restrictions. I'm seeing a physio on Monday so hopefully I can get
a better idea of what they are, plus maybe some exercises and
guidance on "overdoing it". So far the directions have been vague
and pretty useless although the surgeon did say I could get on the
bike again (which I had already obviously).
Mostly I'm just frustrated as fuck at this point. The pain is
taking a long time to recede, it hurts to sit, it hurts to lie on my
back or operated side - and well even the non-operated one. This is
interfering with my sleep which was already shithouse. It hurts to
sit playing playstation or drinking beer, it hurts afterwards if I
ride too much, walk too much, stand too much. It's difficult and
painful to tie my shoelaces or cut my nails. OTC painkillers don't
really do much - hell even when I was in the hospital it took 20mg
of oxycodone to do anything and they thought I was a junkie or
something. As far as I know I'll never be able to sit cross-legged
on the floor ever again - maybe that seems trivial but what am I
supposed to do, carry a fucking chair everywhere I go? How do I do
my go-to hamstring stretch when my back gets too stiff?
So I've been walking a lot anyway - it seems to help with the pain
to some extent. At the 6 week mark I started back on situps and
pushups - at least I can do those. Riding a bit when the weather is
good.
But mostly I'm drinking way too much and with the lack of sleep
being moody and sulky and feeling lonely and depressed - so pretty
much like every other spring.
One of my main haunts closed down - arsehole landlord in dispute
with his daughter even though the business was keeping things in the
black. With 4 weeks notice for everyone it was pretty shit. I
managed to make it to the closing even on crutches but it's left
quite a hole in the community and a hole in my heart every time I
walk past.
And one of my nephews was recently in an absolutely horrific car
accident - ute rolled end on end and he was thrown 15 metres from
the vehicle. By all measures he should be dead but somehow he's
still physically alive but his brain ... well things are bad. His
eyes open but nobody's home, he can't breathe or eat without tubes.
I just hope his needless suffering isn't extended by his idiot
father and stepmother (they're "preppers") who think he needs to be
kept alive at all costs and it turns into another Terri Schiavo
incident. That's if the pneumonia doesn't do him in soon.
THR coming up
For various reasons it's been a fairly involved few months but now
life is back to anhedonia, insufficient sleep, and excess alcohol.
But in 5 days that all changes again as I'm finally going under
the knife for a total hip replacement on my left leg.
I'm resigned to it but still anxious as fuck and well just
super-pissed off that I need to have it done all from a nothing
accident 3.5 years ago. It'll be a couple of months before I
can even think of getting on a bicycle again which is somewhat
limiting as that is my main mode of transport apart from walking.
By pretty much pure chance a niece wants to move to Adelaide so is
coming to stay with me for a bit - she can help with the basic day
to day stuff that I might not be able to do for a few weeks.
There aren't any real insights on how long it might take to
recover enough to get around, cook, and clean so I will just have
to see how it pans out for me.
I didn't drink for July (moral support for a friend at the time)
and lost a bit of weight and spent a bit more effort on getting as
fit and strong as possible before the operation so physically I'm
about as prepared as I'm likely to be.
Mentally and emotionally I'm a bit of a wreck but oh well, one day
at a time.
Copyright (C) 2019 Michael Zucchi, All Rights Reserved.
Powered by gcc & me!