Java, FFT, Stuff

Just received what appears to be Intel hiring spam on the Beagleboard list. I hope technology like the beagleboard eventually wipes their sorry criminal arses off the planet, the last thing I want is to hear about their stinking jobs.

Blah.

Anyway. Java n stuff.

I kept working on my prototype and added the Apache Maths library mostly because I wanted a simple Complex type that was supposedly written by Java people who know how to write such things. Got a bunch of algorithms converted and just ran it a few times looking at the printf's just to admire my handiwork.

Then I saw the profiler button and thought what the hell, let's see how it works. Well apart from being pretty friggan kick-arse (I can't even remember the last time I ran a profiler over C, but I can tell you it sucked - Visual Studio had absolutely naught without 'paying extra' for some shareware crap) I noticed huge bottlenecks all over the place. For one my image loading was taking about 70% of the time (loads 10 images, does a few FFTs over part of them). Well I was flying blind on that one so made a bit of a mess of it. Anyway, did something about that .. next ... the complex arithmetic and the array references were taking all the time. SST arrays are accessed through multi-dimensional capable getters, which are abstracted a couple of levels to an index calculation that loops over an array of strides. Yes very efficient. For some reason I imagined the 'HotSpot' compiler could remove all that, but alas it cannot.

Anyway ... to cut a long and rather boring story short, I've thrown away apache.maths's Complex type and written my own that has mutable elements and is used just to hold values. I found a decent FFT library in Java and since that was pretty much the only reason I was using SST I also got rid of that and wrote my own 2D array classes, and re-wrote most of the code to use simple one-dimensional arrays directly.
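To make the difference concrete, here is a rough sketch of the two access styles. It's in C to match the other code on this page, but the Java has much the same shape. The names (strided_index, flat_index, cmul_at) and the interleaved re/im layout are made up for illustration; they aren't taken from SST or JTransforms.

```c
#include <assert.h>
#include <stddef.h>

/* Generic N-dimensional lookup: a loop over a stride array on every access. */
static size_t strided_index(const size_t *idx, const size_t *strides, size_t ndim)
{
    size_t offset = 0;
    for (size_t d = 0; d < ndim; d++)
        offset += idx[d] * strides[d];
    return offset;
}

/* The same element of a row-major 2D array, computed directly. */
static size_t flat_index(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/* In-place complex multiply of element i: a[i] *= b[i].
 * Values are stored interleaved (re, im, re, im, ...) in plain float arrays. */
static void cmul_at(float *a, const float *b, size_t i)
{
    assert(a != NULL && b != NULL);
    float ar = a[2 * i], ai = a[2 * i + 1];
    float br = b[2 * i], bi = b[2 * i + 1];
    a[2 * i]     = ar * br - ai * bi;
    a[2 * i + 1] = ar * bi + ai * br;
}
```

The flat version does one multiply and one add per access and creates no temporary objects, which is about all the inner loops want.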

Man it's ugly code - but it was pretty fugly using apache.maths.complex too - the C is much, much cleaner for the inner loops. For all that it's about 10x faster than it was and roughly on par with the C code (don't quite have the same processing to really compare yet), which I think is worth it. Even if it took most of the day, although I cleaned stuff up as I went too so it wasn't wasted. Well I'm about where I was with the C a week ago, but in the meantime there's been a 4 day long-weekend and now I can easily add a GUI. I looked at gtk+ very briefly but it just looked like no fun at all. And besides, being cross platform at this point is a big plus.

So along the way I found JTransforms - this is the nice Java FFT library. I'm not sure why it took so long to find - I looked all over the place and all I could find were dinky little implementations. And lots of people looking for decent implementations. And really old stuff. Maybe it's the age of Java, or maybe Google just failed me yet again as a search service.

Another tool I found along the way was ImageJ - actually I'd looked at it a few weeks ago but not in any great detail. The AWT interface is a bit clunky, but it is pretty fast. From that I worked out how to access the image buffers directly so sped up my image loading about 100x (well I was using 'getpixel' basically, now just a byte array directly). I don't really know if the GIMP's implementations are particularly good/bad/or whatever, but doing the same sized Gaussian filter for example over a 1024x768 image is noticeably faster in ImageJ.

Oh here's a funny thing which will probably be very easy to get used to. In the debugger (or even running the code), `fatal' errors like array bounds exceptions don't mean the end of execution. Oh yes I know, welcome to the late 90s, but .NET in VS was a pig, nay a bloaty fat diseased pig, oh let's not mince words; a rotting pile of pig shit, when trying to debug, and fatal errors were fatal too, so I forgot it doesn't have to be that way in so-called 'modern' languages. Not that Netbeans has been too kind to me, I've had a few pointer grabs and it brought the system down twice (or maybe SELinux did in its own 'denial of service' attack for added security, if I understood the stack trace at all - it's now disabled), one pointer grab required hitting the power button to reboot, and the other required the reset button as trying to run any programme ended in an oops. Not having much luck debugging threaded code either (it just 'vanishes' if it has a fatal error).

Spent a bunch of time going through some GSOC proposals ... but I don't really know what to say and don't have any examples to go on. Boy that 'melange' thing they're using to track everything is a real crock of shit, really really slow and painful to use - I can't imagine how it works on a project with dozens of proposals. I didn't even realise there were more applications on another page of that fugly 'macos themed' javascript spreadsheet/table view thing it lists everything in.

Oh yeah and so much for getting some dry-work done in the back yard. Later that night it started bucketing down, and bucketing down, and it then rained a lot - around 32mm in a few hours - a lot of rain for here, and particularly loud on a tin roof. Barely a drop for months and then all that. The yard is still totally soaked 2 days later.

That's it for summer I guess.

Daylight savings just finished here - and it's already throwing me out. After plugging away at the computer all afternoon I thought it must've been 10 or something as it seemed to have been dark for so long, but it was only 7:30 ... Good time to call it a day I guess.

Yesterday I got a whole lot of work out of the way on the retaining wall, I should've done more today but I was too lazy. The weather was too nice too - although then I just spent most of the day inside instead of taking advantage of it. Once I sit down to hack it's pretty well game over ... Probably should've made the effort though - looks like weather is turning wet soon, and I need to get some of the stuff done whilst it's still dry. Maybe an hour or two over the next two days might do it.

Well that's Easter done, back to work tomorrow.

WoofƆs

Was up pretty late hacking away on some WoofƆs stuff, and then continued it for most of today. For the most part it's just re-arranging stuff I already have from puppybits, or my previous x86 hacking. Pretty time consuming though, trying to tie it all together. It also needed quite a bit of re-thinking and re-jigging along the way.

I've got it all building, and it currently launches two tasks, one which immediately goes to sleep waiting for work to do, and the idle task which blinks the led. But that isn't really any more than I already have in puppybits, so I need to start testing the other stuff - per-process virtual memory, message passing, and so on. These things are a bit hard to play with in isolation since so much support is needed first. Hmm, I really need to get timers and timer interrupts working too, but I can test that in isolation.

The architecture which i've thought about so far does seem to be holding together at least - no big surprises have come along ... yet. Although that assumes the stuff i've written actually works too. So far I have a 'memserver' which is tightly bound to the kernel - it is basically a kernel thread since it accesses the kernel memory directly. It is used to create all in-kernel objects - directly adding them to the kernel in some cases, or where necessary through lightweight system calls to register the new resource. This lets me avoid any dynamic memory in the kernel itself, and by using the right data structures or the occasional lightweight system call I don't have to worry about serialisation either (well, once i've got it right).

I had been thinking about all sorts of exotic data structures like trees or hash tables to locate resources based on an object id - but these all have various issues. Execution time, serialisation, and so on (I have implementations that use no dynamic memory, so that wasn't an issue). So in the end I settled for a simple array for many of the objects - it probably uses less memory anyway, and certainly needs less code. It also allows me to update and access it atomically from multiple threads of execution without trouble.
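Something along these lines is what I mean by a simple array. This is only a sketch: the struct layout, MAX_OBJECTS and the function names are invented for illustration, and it ignores the ordering and atomicity details a real multi-threaded kernel would have to get right.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS 64          /* hypothetical fixed table size */

struct kobject {
    uint32_t type;              /* 0 means the slot is free */
    void *data;
};

/* Statically allocated, so the kernel needs no dynamic memory for it. */
static struct kobject object_table[MAX_OBJECTS];

/* Constant-time lookup by id: one bounds check and one array index. */
static struct kobject *object_lookup(uint32_t id)
{
    if (id >= MAX_OBJECTS || object_table[id].type == 0)
        return NULL;
    return &object_table[id];
}

/* Claim a free slot; returns 0 on success, -1 if the id is bad or taken. */
static int object_register(uint32_t id, uint32_t type, void *data)
{
    assert(type != 0);          /* 0 is reserved to mark a free slot */
    if (id >= MAX_OBJECTS || object_table[id].type != 0)
        return -1;
    object_table[id].data = data;
    object_table[id].type = type;
    return 0;
}
```

Compared with a tree or hash table there is nothing to rebalance or rehash, and the worst-case lookup time is the same as the best case.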

The whole 'kernel' is only 10 system calls so far - and although I still need a few more for interrupts, it shouldn't be many ... I think. Of course most of the work is in the servers, and the kernel is just passing data around. Even in those 10 I already have some 'helper' syscalls too - they aren't strictly necessary but combine a couple of system calls into one, there is probably scope for some more of those.

I had a little chicken and egg problem with message allocating - you need to send a message to the memory server to allocate new messages before they can be sent via the kernel. I think I have worked out a solution - part of the process start-up will be to send a system message to the new process. This message can then be used to ask for more, or just as a general purpose message container. Still, I might need some other more direct mechanism since message ownership passes every time they are forwarded, and it is possible for them to get `lost'.

I might have to look into a mailbox mechanism I guess, although I don't want to have too many options for IPC. After I'd written the basic message allocation system I thought it would be a bit bulky to do too often - well the point is not to in the first place, but it seemed a bit heavy. I looked into a simpler mechanism of message passing using limited registered arguments. But it just didn't seem that useful - because you can only pass primitive non-pointer types. As soon as you want to pass buffers around the mechanism falls down, and you need even more complex support code (like 'far' copies), which then needs to perform extra security checks, and so on. One `freebie' of the message passing mechanism as i've envisioned it so far is it sort of self-checks. Nothing the kernel deals with needs checking in detail because any addresses are from trusted sources or have already been verified, or just integer representations of virtual addresses it never looks at. And servers can perform fairly cheap checking since all data must fit in a fixed bound.
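As a rough idea of how cheap the server-side check can be when every message has the same fixed size, here is a sketch. The message layout, MSG_DATA_SIZE and the function names are all assumptions for illustration, not the actual WoofƆs structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSG_DATA_SIZE 240       /* hypothetical fixed payload size */

struct message {
    uint32_t sender;
    uint32_t type;
    uint8_t data[MSG_DATA_SIZE];
};

/* Return 1 if [offset, offset + length) lies wholly inside the payload.
 * Written so the arithmetic cannot overflow, whatever the caller passes. */
static int msg_range_ok(uint32_t offset, uint32_t length)
{
    return offset <= MSG_DATA_SIZE && length <= MSG_DATA_SIZE - offset;
}

/* Copy length bytes out of the payload; 0 on success, -1 if out of bounds. */
static int msg_read(const struct message *m, uint32_t offset, uint32_t length, void *out)
{
    assert(m != NULL && out != NULL);
    if (!msg_range_ok(offset, length))
        return -1;
    memcpy(out, m->data + offset, length);
    return 0;
}
```

Two comparisons against a constant cover every request, with no page table walks or 'far' copies involved.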

Might sit on all that for a while now.

New project

Finally cleaned up and checked in the kernel start code, and a simple demo which uses it.

I still need to do some prototyping of various low-level components before I can make a whole, but I might kick off the part of the whole I can do already because I'm getting a bit bored of just working on these small fragments.

Keeping with the canine theme, I'm going to call it ...

WoofƆs

Or maybe WʊfƆs, although I think the larger one is more aesthetically pleasing (the letters are based on the international phonetic alphabet if you were wondering).

I'll probably just keep it in the puppybits project, although keep the code separate. With work and home being as they are I haven't had much time nor energy to work on this stuff, so things aren't going to progress at any great rate. Speaking of which I think it's time to hit the yard and get dirty shoveling. Two days of Easter down and all i've managed so far is ~97km of cycling, a hangover, and an improved tan.

CentOS isn't worth the paper ... oh hang on

After a few more hiccups I finally gave up on CentOS on my workstation last week - I don't mean to diss on the CentOS project and people as such, but I have to say I had a pretty negative experience with it and probably won't be considering it again (there just isn't any need). I hadn't tried it before, and thought I'd give it a go, mostly because I was expecting simple reliability.

Unfortunately not.

Lack of packages: Just how limited it was came as a little bit of a shock ... but this was something entirely reasonable and I had no major problem with it.
Lack of stability: This was both shocking and surprising. I had a lot of stability issues from day one. Primarily that xterms constantly locked up when trying to do ordinary things like 'man blah' or 'less foo' (and I need to use the real xterm - the other `desktop project' knockoffs are unusably slow at full screen, and barely usable even at 80x24). I got around it using info, or emacs shell but it got old very fast. Firefox also had a habit of grabbing the pointer if I accidentally dragged something which is too easy to do. For the most part I loathe DnD because generally it's too easy to trigger and used in some really stupid places, and where it could be useful - e.g. dropping files into an application or filename selector, it isn't implemented, sigh. I wish GUI designers of the free `desktops' had a wider GUI experience than just MacOS and Windows 95.
Bloody slow (not just slow): Ok I'm a bit spoilt with my home workstation and it is decidedly slower hardware, but this was a little ridiculous. I can forgive some of it because I chose to encrypt the home partition, but I really have no feel on how much that makes a difference. I'm sure EXT3 is something to blame here too - if anything demonstrates that the Linux project is no true meritocracy it is that this steaming pile is still the main filesystem of choice when so many more advanced and reliable filesystems have existed for years!
Updates broke it: What finally broke the camel's back was turning it on one morning and having no X any more because the update manager prodded me to install a different Linux and I wasn't paying attention. I am 'partly to blame' because I am using the proprietary nvidia driver (the other one just didn't work, and I need OpenCL/CUDA or something down the line too) from livna or something or other - but I haven't had quite such problems doing similar things in the past that were so difficult to resolve. After mucking about for an hour or two I gave up and just installed Fedora 11 (the newer ones are just fucked up too much to be salvageable within my patience band). That ended up taking the whole day (made a few mistakes along the way), but it was definitely worth it.

Now I don't know how much CentOS differs from RHEL, but this doesn't leave a good impression of that either. The whole point of using old software with limited packages is for stability and reliability and it was on those two particular points that I had the most negative experience. Probably the worst I've had with any OS since automated package management existed (~10 years).

I've moved to Fedora 11 and have had no issues since. Faster, more stable, more packages, blah blah blah.

X, Java, Netbeans

Well, while I'm dissing on free software projects ... I don't know what they're doing wrong but OpenJDK has some major performance issues that simply make Java (and anything that uses it) look really bad (in all fairness I am using Fedora 11, so it is certainly quite an old release ...). I am doing some stuff in netbeans (6.8) and although usable for the most part, it had a decidedly lurcherous gait which only got worse over time till it was barely usable. I resorted to using emacs for editing (it's still a much better basic editor), jumping back to netbeans only to look up methods (JavaDocs, as useful as they are, are a bit unwieldy to navigate, particularly in a web browser). I suspect it has more to do with something basic like the interaction with X than anything else - and we all know what a mess of The X Windows System that f.d.o are making so it probably isn't entirely `their fault'.

So I finally got sick of that and after a bit of searching on the net and trying "-Dsun.java2d.pmoffscreen=false" to no big effect, the only thing seemed to be to try the Sun JDK. *Cough* Oracle JDK. Apart from a different way of selecting fonts that initially made it look a bit crap, and a generally lighter appearance in the text you'd think I just upgraded the computer to see the difference.

It's like night and day. It might use scads more memory than a C app but it feels just as snappy and responsive (at least in comparison to how it was, hard to gauge). The thing is, having such a poor user experience from the OpenJDK just continues to give Java a bad name when it just isn't warranted, and hasn't been for a long time now. I was originally just going to implement a GUI in Java (or maybe just a prototype) but I might try a few tests of the algorithms to see how it compares to the C code - it spends most of its time in FFTW anyway at present. It's a real pity that it doesn't support a primitive complex type, but that isn't the end of the world. I'm using the Shared Scientific Toolkit at the moment, although mostly just for the arrays and fft ops (that's all the code really needs).

CMake

Oh god what is this pain in my head ... yet another messed up build system. One that provides all the craptascialness of Ant for your C++ projects too! Jesus, who comes up with this shit? I had some issues building SST - it turned out I just needed to install the static FFTW library, but along the way I had my first experience of CMake. About the only thing going for it is that it doesn't use XML. What I don't understand is why these things come along and break the whole point of make files? During the build something failed - so I worked out how to manually run the command and the command ran ok. Fine then, just run make and let it continue ... oh no, that would be too fucking simple wouldn't it? It uses some fucked up meta-system for tracking build dependencies, so it just re-builds the whole directory again, again failing on the final link. Then I ran a different make target which built ok ... but wasn't quite what I needed. No problem, try the original build target ... 'target up to date' ... huh?

THIS IS TOTAL SNOT!

The one and only point for any make system to exist in the first place is to guarantee your builds are consistent without having to recompile everything every time. For everything else you may as well just use shell scripts - which are conveniently embedded inside a Makefile. I understand CMake does some other stuff, and also provides a `shell script' environment that supports broken operating systems, and that idea has some merit - should you wish to interact with such a broken system at least - but if it can't get the basics right what use is it?

Ant is another nightmare all of its own - not only does it not do any dependency checking whatsoever (one of the most critical features of any build system), simple human readable shell scripting is replaced not only by a disastrous and unparsable XML scripting language but also by dynamically loaded Java modules! I can understand wanting to use your favourite language for everything, but Java is not a scripting language and sometimes there is a right tool for the job.

I consider any tool where you must occasionally `make clean' to get a reliable build to be completely broken and simply worthless. CMake seems to get around this absurdity by abusing 'ccache' - another painful bit of kit that shouldn't need to exist (the limited use-cases for which ccache provides any service can mostly be done in other much simpler ways - assuming you have a reliable dependency mechanism in the first place). And Ant gets around this by the fact the Java compiler is quite fast anyway, and already does the dependency checking - but only for the Java sources, and any real project has to do more than just compile objects and link them together. One often has to 'clean and rebuild' in all the GUI IDEs I've used, but there should be no reason to ever need this unless you're manipulating files outside of it.

Idiot. Is this you?

What I find disappointing (vexatious, alarming, and upsetting too) is that people refuse to learn a basic reliable, flexible tool, and then come up with their own which doesn't even solve the original problem. `I can't use an editor to control the tabs in my file' is really an idiotic and puerile argument (if you do think that, then yes I am in fact calling you an idiot. Idiot.) I suspect if they understood the original problem they wouldn't have bothered to inflict all this crap upon themselves or the rest of us in the first place, and wouldn't the world be a better place for that?

I understand some people think make is a bit arcane .. but it isn't. XML is the definition of arcane - let's design a format which makes it easy to write parsers, for developers. You know that tiny bit of code that gets written once and forgotten about because nobody fucking cares once it works. It's not like they even got that bit right - the parsers are large and complex and the code you write using them ends up being large and complex too (and it's all slow). But the real winner is that this then inflicts all the nitty-gritty which makes it 'easy to parse' onto the user for whom it is decidedly not easy to parse, or write, or even design. Now that's arcane.

But I digress.

The auto*tools are a big bit of nasty, but at least it's only one (perhaps broken) system to learn, not 'n' (definitely broken) systems where 'n' is monotonically increasing. And there are plenty of application projects for which it shouldn't be needed (but gets used anyway, sigh).

Woke up too early

Shit, I was supposed to have some bricks delivered this morning - I was up way too late and got little sleep worrying about making sure I tell them to put it in the driveway to avoid 3 hours of back breaking work moving them. I suppose I should go get some groceries, or I might just avoid the shops which turn into a nightmare of panic buying just before Easter since many shops are shut - GOOD HEAVENS - for 2 whole days in a row. How ever will we all survive for so long without being able to buy more shit!

Bye bye CELL

Well, I guess that's really the last nail in the coffin for CELL.

Sony's just announced that the next firmware `upgrade' for the PS3 will drop Linux support (and it's so important, that's all it will do). This is very, very disappointing. They blame it on crackers or 'security', but it's obvious it is just a cost cutting exercise. Sony have been hurting financially for a while now, and the razor gang is out with their daggers looking for savings.

After the `ps3 slim' dropping support (due supposedly to lack of resources to write the hypervisor drivers), and then IBM dropping CELL for HPC ... I guess the writing was on the wall. I'm glad I gave up development on CELL BE some time ago and got hold of a beagleboard instead - overall it's been a more satisfying experience if only because things are simpler. The whole CELL thing was just a costly mistake for all by the looks of it - being a bit ahead of its time led to a few limitations that people couldn't cope with.

Even though I rarely use it anymore, the whole thing plainly stinks - this is not a device I rent, I bought it. And for them to come into my home and remove functionality (advertised on the box no less) from a device that I paid for in full (well over-paid) should simply be illegal, if it isn't already.

I can't even log onto the Sony blog to fruitlessly whine about it because they've changed the login system to some horrid mess that takes ages to load and only shows blank pages (I bet it works on ie6 though, if the comments in the page are anything to go by). Well if they don't want me as a customer it isn't really my loss is it?

Time to remove CELL BE from the subtitle of this page at least; not that it has had much point in being there for quite some time.

Damn numbers

Hmm, that was frustrating.

Have been trying to write a `kernel' boot header - one that sets the MMU up for the kernel to execute at another address (0xC0000000) and then jumps to it. Been very tired from sleeping poorly and a bit brain-dead after work so I haven't been really switched on, but it's been dragging on so much I was about to give up (well not really, but it felt like I should).

Apart from a few little bugs, I was using the wrong TEXCB/AP flags for the level 2 page entry for devices ... but I don't know why it's wrong. It seems to check out in the manual, but for whatever reason it just crashes the code (FWIW I was using 0xb2, 'non sharable device, rw everyone', rather than 0x16, 'sharable device, rw supervisor only'). Blah. One little number change and now it works. $@%!$#
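For reference, this is how those two numbers pull apart, going by my reading of the ARMv7-A short-descriptor 'small page' layout with TEX remap off. It only decodes the fields, it doesn't explain the crash. The struct and function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

struct small_page_flags {
    unsigned xn;    /* bit 0: execute never */
    unsigned b;     /* bit 2: B (bufferable) */
    unsigned c;     /* bit 3: C (cacheable) */
    unsigned ap;    /* AP[2:0]: bit 9 joined with bits 5:4 */
    unsigned tex;   /* bits 8:6 */
    unsigned s;     /* bit 10: shareable */
};

/* Split the low bits of a small-page entry into its named fields. */
static struct small_page_flags decode_small_page(uint32_t desc)
{
    struct small_page_flags f;

    assert(desc & 2u);          /* bit 1 must be set for a small page */
    f.xn  = desc & 1u;
    f.b   = (desc >> 2) & 1u;
    f.c   = (desc >> 3) & 1u;
    f.ap  = (((desc >> 9) & 1u) << 2) | ((desc >> 4) & 3u);
    f.tex = (desc >> 6) & 7u;
    f.s   = (desc >> 10) & 1u;
    return f;
}
```

With that, 0x16 comes out as TEX=000, C=0, B=1, AP=001 (shareable device, read/write for privileged code only) and 0xb2 as TEX=010, C=0, B=0, AP=011 (non-shareable device, read/write for everyone), which matches the descriptions above.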

I plan to use the two translation table mode, which means the system memory will start at 0x80000000 - so it may make sense to just identity map the kernel at that address. But for now the memory map will have the kernel at 0xc0000000, and I'll start shared libraries or something else at 0x80000000.
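The split itself is simple enough to write down. As I understand the TTBCR.N field, an address is translated through TTBR0 only if its top N bits are all zero, and through TTBR1 otherwise; uses_ttbr1 is just a made-up helper name to show that rule.

```c
#include <assert.h>
#include <stdint.h>

/* With TTBCR.N = n (1..7), an address goes through TTBR0 only if its
 * top n bits are all zero; anything else goes through TTBR1. */
static int uses_ttbr1(uint32_t vaddr, unsigned n)
{
    assert(n >= 1 && n <= 7);
    return (vaddr >> (32 - n)) != 0;
}
```

With n = 1 the boundary lands at 0x80000000, so both the 0x80000000 and 0xc0000000 regions fall in the TTBR1 (system) half.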

So here it is ... in hindsight I may have done things in the wrong order, but this way makes things easy. I set aside some memory in the BSS section for the page tables and let the linker manage allocating space for them, also for the I/O devices - although this means a couple of physical pages are lost at present.

There are a few little `tricks' that I use so the code is position independent, although there are possibly better ways to do it. The init code has to be position independent because the linker script is set up so that all the code starts from the same virtual address - it could be done otherwise, but then I would need an ELF loader to relocate the image - which is somewhat more work.

 _start:
        adr     r12,_start              @ this will be physical load address
        mov     sp,r12
        push    { r0 - r3 }

First I just set up r12 and the stack to point to our load address - which is 0x80008000 as set by the linker script. This gives the code a fixed location from which to calculate physical and virtual addresses. The incoming arguments are saved too - although nothing uses them yet (das u-boot can pass in arguments or information about modules or filesystems it preloaded into memory).

        ldr     r1,bss_offset
        ldr     r2,bss_offset+4
        add     r1,r1,r12
        add     r2,r2,r12
        mov     r0,#0
1:      str     r0,[r1],#4
        cmp     r1,r2
        blo     1b

Clear the BSS - the code reads a relative offset that the linker creates, that indicates where the BSS starts and stops, and then uses r12 to map that to the physical address. The ldr r1,bss_offset is assembled into a pc-relative instruction so will work no matter where it's loaded.

Then there is a loop which uses a table to initialise the page tables. I first need to find the space within the BSS where it is stored, and then iterate through the entries. Each range is defined by a virtual target address, a start offset relative to _start, a virtual end address, and the `small page' flags for the pages.

        ldr     r11,ttb_offset
        add     r11,r12                 @ physical address of kernel_ttb
        add     r10,r11,#16384          @ same for kernel_pages

        adr     r9,ttb_map
        mov     r8,#ttb_size
1:      ldm     r9!, { r4, r5, r6, r7 } @ virtual dest, start offset, virtual end, flags
        add     r5,r12                  @ physical address

2:      mov     r3,r4,lsr #20
        ldr     r2,[r11, r3, lsl #2]
        cmp     r2,#0

If the l2 page isn't set yet, then just allocate one and update the l1 entry.

        moveq   r2,r10
        addeq   r10,#1024
        orreq   r2,#1
        streq   r2,[r11, r3, lsl #2]

Form and store the l2 page table entry.

        bic     r2,#0xff                        @ r2 = physical address of l2 page
        mov     r1,r4,lsr #12
        and     r1,#0xff
        orr     r0,r5,r7
        str     r0,[r2, r1, lsl #2]

And then loop for all the pages and all the entries in the table. Here I compare for equality for the end address - I do this so I could map the last page of memory if I wanted to. But currently I don't use this.

        add     r4,#4096
        add     r5,#4096
        cmp     r4,r6
        bne     2b

        subs    r8,#1
        bne     1b
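For anyone who finds the assembly hard to follow, the inner loop does roughly the following. This is a host-side C model only: the tables are ordinary arrays, L2_BASE is a made-up stand-in for the physical address of kernel_pages, and it masks the table address with 0x3ff where the assembly only needs to clear 0xff.

```c
#include <assert.h>
#include <stdint.h>

#define L2_TABLES 32                    /* matches the 32K set aside for kernel_pages */
#define L2_BASE   0x80100000u           /* made-up physical address of kernel_pages */

static uint32_t l1[4096];               /* one entry per 1MB of address space */
static uint32_t l2[L2_TABLES][256];     /* one entry per 4K page, 1K per table */
static unsigned l2_used;

/* Map [virt, virt_end) to consecutive physical pages starting at phys. */
static void map_range(uint32_t virt, uint32_t phys, uint32_t virt_end, uint32_t flags)
{
    assert(((virt | phys | virt_end) & 0xfffu) == 0);   /* page aligned */
    do {
        uint32_t i1 = virt >> 20;

        if (l1[i1] == 0) {              /* no l2 table yet: take the next free one */
            assert(l2_used < L2_TABLES);
            l1[i1] = (L2_BASE + l2_used * 1024u) | 1u;
            l2_used++;
        }
        uint32_t table = (l1[i1] & ~0x3ffu) - L2_BASE;
        l2[table / 1024u][(virt >> 12) & 0xffu] = phys | flags;

        virt += 4096;
        phys += 4096;
    } while (virt != virt_end);         /* equality, so an end address of 0 wraps cleanly */
}
```

Calling map_range(0xfffff000, phys, 0, flags) maps the very last page of the address space, which is the case the equality test is there for.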

That's really the meat of it - the table has the smarts in it, and uses the linker to create the interesting values required.

Then it just turns on the MMU - this could probably be simplified as I can just enforce the state I want (i.e. don't bother preserving bits). Putting 1 in CP15_TTBCR means that two page tables are used, the TTBR1 table is used for any address with the top bit set (i.e. >= 0x80000000).

        mrc     p15, 0, r0, CP15_SCTLR
        bic     r0,#SCTLR_ICACHE
        bic     r0,#SCTLR_AFE | SCTLR_TRE | SCTLR_DCACHE | SCTLR_MMUEN
        mcr     p15, 0, r0, CP15_SCTLR

        mov     r0,#0
        mov     r1,#1

        mcr     p15, 0, r0, CP15_TLBIALL
        mcr     p15, 0, r1, CP15_TTBCR          @ Top 2G uses TTBR1   
        mcr     p15, 0, r11, CP15_TTBR0
        mcr     p15, 0, r11, CP15_TTBR1
        mcr     p15, 0, r0, CP15_TLBIALL
        sub     r0,#1
        mcr     p15, 0, r0, CP15_DACR

        pop     { r0 - r3 }

        mrc     p15, 0, r8, CP15_SCTLR
        orr     r8,#SCTLR_MMUEN
        mcr     p15, 0, r8, CP15_SCTLR

This last instruction turns the MMU on (and will probably eventually turn on the caches/etc). The input arguments are restored before turning on the MMU since the stack memory will no longer be valid or mapped (actually I should probably map the same 32K to the system stack wherever I decide to put that). The CPU now flushes the pipeline and starts executing instructions from the current pc - but with the MMU on. Because of this the code has to ensure this instruction is still mapped to the same address otherwise it's a one-way trip to la-la land.

In this case the ldr pc,=vstart will force the assembler to generate a constant load from the constant pool (via a pc-relative load). The linker will set this constant up to point to the virtual address properly.

        ldr     pc, =vstart

Now come the relative offsets used to locate the BSS range, as well as the page table memory from within BSS.

bss_offset:
        .word   __bss_start__ - _start
        .word   __bss_end__ - _start
ttb_offset:
        .word   kernel_ttb - _start

And then the important stuff - the page table mapping descriptions. Rather than store the 'virtual end' address it could probably store the length of the address range, but so long as they are aligned properly it doesn't really make much difference. Note that even with the relative addresses any range in memory can be accessed using the simple arithmetic that the linker supports.

ttb_map:
        @ this page, so mmu can be enabled
        .word   LOADADDR, 0, LOADADDR + start_sizeof, CODE
        @ kernel text at virt address
        .word   __executable_start, 0, __data_start__, CODE
        @ kernel data
        .word   __data_start__, __data_start__-_start, __bss_end__,DATA
        @ system stack, 32K, 4K from end of memory
        .word   0 - 32768 - 4096, 0x8000000 - LOADADDR, 0-4096, DATA
        @ i/o of gpio, for debug too (LEDs!)
        .word   GPIO5, 0x49056000 - LOADADDR, GPIO5+4096, NDEV
        @ do serial port too, for debug stuff
        .word   UART3, 0x49020000 - LOADADDR, UART3+4096, NDEV

        .set    ttb_size, (. - ttb_map) / 16
        .ltorg

The .ltorg ensures the constant pool is stored at this point, so we can guarantee they are within the one page which needs to be identity mapped immediately after turning on the MMU.
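Seen from C, each row of ttb_map is just four words, and ttb_size is the usual 'size of table over size of entry' count. The sketch below uses invented field names and placeholder addresses and flag values (not the real CODE/DATA/NDEV definitions) purely to show the layout and how the relative offset turns back into a physical address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ttb_map_entry {
    uint32_t virt_start;    /* virtual destination address */
    uint32_t start_offset;  /* source offset relative to _start */
    uint32_t virt_end;      /* virtual end address (exclusive) */
    uint32_t flags;         /* small-page flags */
};

#define LOADADDR   0x80008000u  /* load address as mentioned in the text */
#define FLAGS_CODE 0x12u        /* placeholder, not the real CODE value */
#define FLAGS_DEV  0x16u        /* placeholder standing in for NDEV */

static const struct ttb_map_entry ttb_map[] = {
    /* identity map of the start page, so the MMU can be enabled */
    { LOADADDR, 0, LOADADDR + 4096u, FLAGS_CODE },
    /* a device page at a made-up virtual address */
    { 0xc0100000u, 0x49020000u - LOADADDR, 0xc0101000u, FLAGS_DEV },
};

enum { ttb_size = sizeof(ttb_map) / sizeof(ttb_map[0]) };

/* Physical address of an entry: load address plus the relative offset.
 * Unsigned wrap-around makes addresses below the load address work too. */
static uint32_t entry_phys(const struct ttb_map_entry *e, uint32_t load_addr)
{
    assert(e != NULL);
    return load_addr + e->start_offset;
}
```

The device entry shows why the relative form is not a limitation: the offset wraps modulo 2^32, so adding the load address back gives 0x49020000 exactly.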

vstart:
        ldr     sp,=-4096                       @ init stack
@       bl      __libc_init_array               @ static initialisers
        mov     r8,#(0xf<<20)                   @ enable NEON coprocessor access (still off though)
        mcr     p15, 0, r8, c1, c0, 2
        b       main

And this is the 'virtual address' entry point. This could just occur immediately after the setup code, but separating it makes it more obvious it's separated. About the only necessary setup is the (system) stack pointer. I was going to place this at the end of the virtual memory but having it one page back protects from stack underflow as well.

And finally there is the size of this code, and the BSS, which stores the bare minimum so I can set it up and see that it works (i.e. use the UART or blink the LEDs).

        .set    start_sizeof, ((. - _start)+4095) & 0xfffff000

        .bss
        .balign         16384
        .global kernel_ttb, kernel_pages, UART3
kernel_ttb:
        .skip   16384
kernel_pages:
        .skip   1024*32
GPIO5:  .skip   4096
UART3:  .skip   4096
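The sizes in that listing all come from simple page arithmetic, which can be sketched in C as below. The round-up helper mirrors the start_sizeof expression. The table sizes are those of the ARM short-descriptor format; that kernel_pages is meant to hold 1K second-level tables is an assumption on my part, and the names are made up for the example.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Round a byte count up to the next multiple of 4K, as the
   ((. - _start)+4095) & 0xfffff000 expression does. */
static uint32_t page_round_up(uint32_t bytes)
{
    return (bytes + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1);
}

/* ARM short-descriptor sizes: a first-level table has 4096 word
   entries, a second-level (coarse) table has 256 word entries. */
enum {
    L1_TABLE_BYTES = 4096 * 4,  /* 16384, matches kernel_ttb */
    L2_TABLE_BYTES = 256 * 4    /* 1024, so 32K would hold 32 tables */
};
```

Each second-level table covers 1M of address space in 4K pages, so under that assumption 32 of them give 32M worth of fine-grained mappings.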

And ... it's done. Phew.

Unfortunately this means all my 'library code' that uses fixed physical addresses won't work any more, including the debug printing stuff. But that's something to worry about later.

One goal I had was that this code isn't just setting up a page table to be thrown away later - it is sufficient to remain the kernel page table forever, either for supervisor level kernel processes/threads or, as in this case, as the `system page table' which is used for any address above 0x80000000. It still needs a little tweaking - the page table should be write-through cacheable for instance - but now that it works I can worry about the details. Hopefully now I can move on to more interesting things.
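The split at 0x80000000 is what ARMv7 gives you when the TTBCR.N field is 1: addresses with the top bit clear go through TTBR0, everything else through TTBR1. A small C sketch of that selection rule follows. The function names are made up, and that this code actually programs N = 1 is an assumption based on the address quoted above.

```c
#include <stdint.h>

/* With TTBCR.N = n, addresses whose top n bits are all zero use TTBR0;
   everything else uses TTBR1. Returns 1 when TTBR1 would be used. */
static int uses_ttbr1(uint32_t vaddr, unsigned n)
{
    if (n == 0)
        return 0;               /* N = 0: TTBR0 covers the whole 4G */
    return (vaddr >> (32 - n)) != 0;
}

/* Size in bytes of the TTBR0 table for a given N (16K when N = 0). */
static uint32_t ttbr0_table_bytes(unsigned n)
{
    return 16384u >> n;
}
```

With n = 1 the per-process table shrinks to 8K and only describes the lower 2G, while the 16K system table stays put across context switches.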

Interpolating arbitrary values

For work I have been playing with a few things of some interest. I thought I needed a function that could interpolate a set of values spread across an arbitrary 2D plane into a grid of values. I came across this interesting implementation of Thin Plate Splines which seemed to do the job. Unfortunately it turned out that I needed to interpolate more values than is practical with this algorithm (it does it, it just takes too long), and I can probably just force the values to be in a grid anyway so I can use much simpler methods. But still, this is an interesting algorithm to have in the toolkit and it produces pleasant looking results. Interestingly I found the C++ 'ludecomposition' code too messy to convert to C (I'm using different data structures) and just used the Java one it references as a starting point instead. It was much more C-like and translated in a very straightforward manner.

So I wrote a basic bicubic interpolator. The code uses bilinear at the moment, although in an inconsistent way which doesn't really work since values can be missing. I was hoping bicubic would be a more natural fit for what it is doing, and that I could worry about the missing values later. Unfortunately it doesn't seem to help much - the input data is just too noisy/inconsistent so I guess there is more to fix first (sorry this doesn't make much sense, I can't really say what it's trying to do).
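As an illustration of what 'basic bicubic' can mean, here is a minimal C sketch using a Catmull-Rom cubic applied along x and then along y. Catmull-Rom is just one common choice of cubic and is an assumption here, as are the function names and the row-major grid layout. The sketch does no border handling and knows nothing about missing values, which is exactly the part deferred above.

```c
#include <stddef.h>

/* Catmull-Rom cubic through p[1] (t = 0) and p[2] (t = 1). */
static double cubic(const double p[4], double t)
{
    return p[1] + 0.5 * t * (p[2] - p[0]
         + t * (2.0 * p[0] - 5.0 * p[1] + 4.0 * p[2] - p[3]
         + t * (3.0 * (p[1] - p[2]) + p[3] - p[0])));
}

/* Bicubic sample at (x, y) from a row-major grid with rows of width w.
   The caller must keep 1 <= floor(x) <= w-3 and the same for y against
   the grid height, since a 4x4 neighbourhood is read with no clamping. */
static double bicubic(const double *grid, size_t w, double x, double y)
{
    size_t ix = (size_t)x, iy = (size_t)y;
    double rows[4];

    /* Interpolate along x in each of the four neighbouring rows... */
    for (size_t j = 0; j < 4; j++)
        rows[j] = cubic(&grid[(iy - 1 + j) * w + (ix - 1)], x - (double)ix);
    /* ...then along y through those four results. */
    return cubic(rows, y - (double)iy);
}
```

Because each output reads sixteen inputs, a single bad or missing sample spreads further than it does with bilinear, which is one reason noisy data doesn't improve much.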

Walls, dirt

I have some photos of the progress on the retaining wall but I'm too lazy to put them up today. I got some ag-pipe on the weekend, so I'm just about ready to back-fill at least some of the wall (I don't think I have enough gravel to do the whole lot, but I'll see), although I'm not sure where to run it - and an outlet mid-way along the wall I've already laid will be a bastard! I was going to have it coming out the ends but now I'm not so sure. I need to decide so I can get the right fittings too (which for some reason are rather expensive for what they are).

Boral are having a sale on bricks and whatnot this week so I went and ordered another pile of retaining wall blocks (40% off makes it worth it, even if I don't need them for a while). I wasn't really sure how many I needed to start with, and I used a lot more than I thought originally (just the main wall uses most of them). I have a better plan on what I want to end up with now, so hopefully I got it right ... I guess I can always put them around trees or something if I have too many, or create a lower wall if I don't have enough.

Since I won't need to use them for a while I'm going to try to get them delivered into the driveway - so I don't have to move them off the verge by hand. So today I also moved the rest of the roadbase off the driveway to a pile out the back. Unfortunately I overloaded my cheap wheelbarrow and it turned over and I bent the handle (well it was only $60), but it's still usable. If I get stuck into finishing off the walls around the paving area it will get used up pretty fast anyway - of the 3 tons I probably have under 1 left. I'll get the bricks before Easter, so it could be a very long long weekend if I get stuck into it ...
