Roar.
Hmmmm, well I was wrong about why newlib was crashing. It's because the 'loader' is just executing the code in place, and the default linker script (that i've modified slightly) was re-aligning the data segment load address, but not the file offset. So again the data wasn't aligning with where the code expected it (rodata was still, which was confusing of course). For now I just removed the realignment of the data segment and yay, libc now seems to work.
It was still crashing in C++ though.
And of course, I downloaded the wrong version of the toolchain, so I didn't have the source to go looking through. But while it's downloading the right version I poked around on the cvsweb interface for newlib and found the right bit of startup code anyway. Ahh, missing a call to _libc_init_array.
Still crashing.
Hmm. At this point i'd developed the exception handler to dump the registers and some of the stack and was looking at finding out where in the C++ code it was crashing, so I recompiled with no optimisations and debug on. Damn, now it started working!
Mostly at least. Something is still wrong with bss in it or something - every time I compiled it it displayed using a different zoom/shear/rotations. Although if I set those explicitly, it all works fine.
Not terribly fast - every lion takes just under a second to render.
I've identified that the C code can't be compiled with optimisation -O2 or higher, so assuming there's no compiler bug, there's something in the dodgy setup code that's out of whack (C++ is compiled with O3 fine, but it isn't doing any of the hardware banging). But that doesn't surprise me in the least (missing volatile's and the like), and for now it can stay being unoptimised.
I've kinda forgotten why I wanted to render 2D polygons in the first place. I think it was to avoid storing big bitmaps, but using C++ seems to drag in stdio (for exception reporting) which blows code size out to about 200K anyway. Might be better off with a small PNG decoder instead.