Well said.

The Haiku mailing list is the strangest list I ever did subscribe to.

This post from Justin is completely super-fantastalistic. Very good read, if completely off-topic.

Still, this thread was basically the last straw for me at the moment on the Haiku list (but not enough on it's own) ... I guess I may return one day though

Update 1/4/2010. I've edited this post to change the tone a bit - I shouldn't have posted intoxicated as usual, or at the insane hour of the morning I did either, IIRC.

Read an enjoy.

from Justin x

reply-to haiku@freelists.orgto haiku@freelists.orgdate 22 March 2010 02:32subject [haiku] Re: The Holy Biblemailing list haiku.freelists.org Filter messages from this mailing list

This whole discussion is irritating... when followers of these faiths ignore the fact that they have mutually exclusive religions and the claims of the Christian bible for instance specifically demand the death of false prophets etc (not to mention killing gay people, atheists, witches, anyone who tries to convert you, your own children if they are unruly, rape victims (or just forcing them to marry their rapists), etc and don't try your "Jesus changed that" crap because I can point out to you a litany of verses where he specifically states that he did NOT come to change the old laws but to ensure their fulfillment until the end of time... EVERY JOT AND TITTLE etc... and how he numerous times references the validity of those old laws INCLUDING THEIR PUNISHMENTS OF DEATH etc. So don't even try that ignorant and wrong excuse).

Islam is the same way in it's treatment of Kuffar etc... death to apostates (as Sharia law demands that in a Muslim country under Sharia law if you say that you are no longer a Muslim the punishment is to be killed. How enlightened and civilized!), conversion or death for non-Muslims etc... not to mention that Islam specifically denies the divinity of Christ, who for Christians IS God etc... so that COMPLETELY denies a shared belief in the same God... as they're just two absurd variations on an older myth... just like Mormons in the US today are an even more recent absurd fallacious re-invention of older mythology... exactly as what Judaism was to the pagan polytheistic pre-monotheistic god of Moses... what Christian was to Judaism... what Islam was to Christianity... what Mormonism is to Christianity... just newer and newer absurd twists on an already absurd mythology invented by painfully ignorant barbarians who knew less about the natural world than small children know today.

Not to mention that when it comes down to it any one of the followers of any of these faiths INHERENTLY denies the validity of the others... but is too stupid to actually comprehend that the very reasons they deny all those other religions are the exact same reasons that apply to their own stupid fairytale. They just can't put two and two together because they're brainwashed, compartmentalized, idiots who utterly lack the ability to apply logic, reason, critical assessment etc to their own beliefs even when they do use them to dismiss other beliefs.

And to top that off, again, you have people like this who smile and act like it's some happy thing that you believe in the same stupid fairytale in spite of the fact that in reality you DENY the validity of their belief and their different interpretation on a completely fabricated mythology.

So when I see grown adults behaving like idiots who believe in imaginary gods invented centuries or millennia ago by ignorant bronze age tribesmen etc... and then ignoring what the very holy scriptures they're tittering about actually say etc... and end up just patting each other on the back because their mutually shared delusion stems from the same older absurd mythology... and that they have in common that ignorant and irrational belief in the supernatural etc...

I honestly didn't expect to run across so many idiots on a mailing list for a not well known, in-development, operating system.

And for the record I actually hate Satanists... I made the earlier post just to illustrate the principle of people talking about a pretty stupid topic on a mailing list that other people might find offensive for its stupidity and implications. The reality is that you can't even legally reprint the Satanic Bible because it's still a copyrighted work, having only been written in 1969 etc... but I'd wager that probably nobody in this drooling bunch of idiots actually knows much of anything about the actual Satanic Bible... or the fact that true Satanists who are actual members of the Church of Satan DON'T ACTUALLY BELIEVE IN THE SUPERNATURAL AT ALL. They don't believe in a literal being called Satan and merely refer to him as a symbol of rebellion, self importance, etc... but pretty much not a single religious person I have met knows any of that because they actually do believe in make believe mythological beings and invisible all seeing men etc... and they spend so much time lying to each other about the boogieman that they never bother to actually educate themselves not only about the actual REAL history of their own religions and what their own scriptures actually say and command them to do etc... but certainly not anything that might ever, in any way, pose a threat to their delicate bubble of profound closed minded IDIOCY.

And I say all this as a person who was a Christian and youth minister for over 20 years.

I hate you people for being so damn stupid and for publicly blabbering about it like it's something to be proud of... and for having the false impression that you have some right to be exempt from being ridiculed as the idiots you actually are. That people like me need to sit and watch you publicly make asses of yourselves and seek to get together to try to bring your idiot cult and its literature to a new OS to try to share and spread your profound retardedness and brainwashing to future generations.

Like try this on for size you morons...

We know for a FACT that the Earth was not created 6,000 years ago... much less the whole universe. We now FOR A FACT that humanity did not suddenly appear 6,000 years ago in a magic garden in Mesopotamia. And because we know FOR A FACT that claim is false... we know that Adam and Eve weren't there strolling around and talking to a talking serpent either. Nor were there magic fruit trees that they ate from BECAUSE THEY DIDN'T EXIST. And because we know that, we know that they didn't eat that fruit and commit the original sin because THEY WEREN'T THERE. Thus there was nothing for the made up God to be mad about.. nothing for him to create himself as his son to forgive us from because it never happened because they were never there.

Can your tiny minds follow that train of logic?

http://en.wikipedia.org/wiki/Age_of_the_Earth

http://en.wikipedia.org/wiki/Human_evolution

http://en.wikipedia.org/wiki/Timeline_of_human_evolution

http://en.wikipedia.org/wiki/Human_mitochondrial_DNA_haplogroup

By the very same logic and facts, we know that Noah's Ark NEVER HAPPENED. Not only is it laughably absurd from a physical and logistical standpoint... but that much water has never in the history of this planet existed here on Earth. And had it actually rained that much (not to mention the laughably stupid claim that the bible actually makes about the heavens, the firmament, actually being a mechanical dome over head to hold the literal ocean above us up from falling on us... in which god opened up mechanical floodgates to let all the water pour down to flood the Earth and then made it all evaporate... but I digress...) had it actually rained that much and then evaporated in that span of time it would have saturated the entire atmosphere up into the stratosphere to over 100% and blocked out the sun completely and frozen the entire Earth wiping out life as we know it... and this all supposedly happened in the last 6,000 years... you know when the Egyptians had a kingdom etc... and these other civilizations failed to notice a world wide flood and subsequent world encompassing life obliterating ice age etc... IT'S LAUGHABLE at how absolutely detached from reality you people are that believe this tripe... and these weren't figurative stories either... as we know from a wealth of contemporary archaeological evidence that the people of those times LITERALLY BELIEVED these things because they were IGNORANT and knew essentially nothing about the natural world so they made up answers as best they could to describe the world around them and took it as fact... such as the fact that up until a few centuries before Christ they all still believed that the Earth was a circular flat disk. They LITERALLY believed it because the earth looked flat to them. All the older Old Testament books in the bible are written from that perspective and contain references to the Earth being flat etc.

But I digress...

When people are faced with reality and facts like this which threaten their stupidity bubble, they jump to make Ad Hominem personal attacks... to argue style over substance... to ignore the burden of proof for their ridiculous claims... to use special pleading to try to claim exemptions for their own idiotic beliefs that they don't allow to be applied to anything else... they erroneously and without any evidence presuppose the validity of their wrongheaded mythology and then use that unsupported presupposition to then fallaciously dismiss facts and actual solid real world evidence that disproves their claims.

They hem and haw, they try to claim Pascal's Wager... they intentionally (or simply out of ignorance or stupidity) confuse general deism with faith in a specific dogma such as Christianity etc... kind of like you idiots are doing giggling about how you share the same God etc... completely failing to recognize the very mutually exclusive and specific claims made by your various faiths etc... (and such fundamental schisms also exist WITHIN the faiths... Sunni and Shia... Catholic and Protestant... and people still MURDER EACH OTHER TODAY over these stupid and worthless disagreements over IMAGINARY SHIT.)

I could go on and on.

You people are IDIOTS and I resent you for bringing your idiocy to this list.

I would expect that people who are interested in an operating system such as this would be a little more educated than average... a little more information savvy etc... that people would be capable of pulling their own heads out of their own asses and seeing the absurdity and utter lack of knowledge they have about the world and their own beliefs as a part of that very real world etc.

THINK.

PLEASE.

Good day.

- Hide quoted text -On Sun, Mar 21, 2010 at 11:20 AM, Mattia Tristo wrote:
Very nice, i think we belive in the same God.

Good work to you and cheers
Mattia Tristo 2010/3/18 Nicholas Otley
Hi Mattia,

I can't help you on the Bible, but I'm planning on developing The Holy Qur'an and Nahj al-Balaghah for Haiku (Arabic and English).

Take care,
Nik On 18 March 2010 18:27, Mattia Tristo wrote:
Here in the italian mailing list we are discussing about developing the Holy Bible for Beos/Zeta/Haiku, there is someone who want to develop it in English too?

Hope don't wasting your time, let me say best regards to you
Mattia Tristo

Google Summer of Code for BeagleBoard

Here's some good news - BeagleBoard.org has been accepted for the Google Summer of Code programme. At the request of the administrator Jason Kridner i've also signed up as a possible mentor.

There is a gsoc group, a page of project ideas, and there's also there's the list of existing project ideas.

I'm not really sure which projects I would like to mentor given the opportunity, but there are a few that piqued my interest:

Minix 3 Support - although TBH I've looked at the latest incarnation of the Minix 3 source, and it is very heavily tilted at x86 processors and not arranged in any particularly extensible way. It is also a far cry of untidiness compared to the Minix 3 of Andy Tannenbaum's book. Support for RTEMS (and embeded realtime os) could also be interesting although I don't know much about it.
JTAG Debugger. Unfortunately I myself don't know much about it so probably not for me to mentor - but it would be nice to have a free software hardware debugger.
Improved bootloader support, which mainly falls down to implementing a USB stack for a bootloader so USB ethernet and USB mass storage become available devices. There are currently no free and re-usable USB stacks outside of the main free operating systems so this could be useful for other reasons, including selfish ones.
Any of the many DSP acceleration or NEON acceleration projects. e.g. OpenVG on DSP and/or NEON would probably be a good sized project with utility. Both `heterogeneous computing' and SIMD processors are going to play a big role in computing for the foreseeable future.

But there's a ton of other interesting projects in there.

A couple of links, the wall of dread lurches forward, dried meat.

A couple of interesting posts I came across this evening:

Obliquity by John Kay, an old article from the Financial Times, via Naked Capitalism.
Showing examples of how direct goals such as 'increasing shareholder value' are the least successful way to achieve it. Food for thought in this crazy modern world where CEO bonuses and shareholder returns are the the sole quantification of success of a company or industry. Interesting in light of globalisation, out-sourcing, patenting of everything from ideas to seeds to animals, and so on and so forth that all put private profit before society.
Curtain of Tragedy Will Be Raised Soon Enough, But Perhaps Not Next in Japan, Jesse's Café Américain.
A pretty depressing look into the crystal ball for the next decade or so. I don't think Australia for one realises how fragile it's current `resource-driven' aka `lucky-country' or rather, `stupid-country' economy really is (as opposed to 'the clever country' era which gave the me the opportunity to go through Uni). A little god-bothery at the end, but don't let that put you off, Jesse always has lots of interesting things to say.
More than a four-letter word from Financial Armageddon, and Revolution, flashmobs, and brain chips. A grim vision of the future from The Guardian.
Both about a new report from the UK Department of Defence about the world-wide strategic threat environment in 30 years time. As The Guardian's sub-heading says, very grim reading. It's all about the population really; I can't imagine any palatable political solution, and any trend line with no limit has no scientific solution either. In for a bumpy ride even if these glimpses of the future are out by a few years. Although at my age I may just escape it through natural means.

All very depressing. I don't think it helped that I was sort-of watching a pretty grim tv series 'The Survivors' at the same time either. Although I don't think the writers there can do maths terribly well - a 90% reduction in population would still leave a sizeable number of people in Britain - over 5 million. Not one dozen, and even raw meat takes more than a few hours to go off on the nose in a disconnected fridge! Hmm all this reminds me for some reason about a book I read a few years ago called `The Genocides' that left me deeply disturbed for at least a week. It just a sadistic device of the author simply give the reader un unpleasant time and lasting memory. It is a grotesque tale with no redeeming features whatsoever. Avoid.

Wall of Dread

Well, on to more mundane and thankfully lighter topics. This afternoon I finally got off my fat bum and spent 4-5 hours working on the main retaining wall. Managed to finish the bottom layer and stacked another two on top since I was on a run - I still need some ag-pipe before I can back-fill each layer with gravel, so I should try to get that early this week so I can get most of the work just out of the way at last. I was going to walk it from the plumbing shop - it's only about 1km away. Although I may have stuffed up a bit - i'm not sure where the ag-pipe should drain to, and I didn't leave any drainage holes along the 12m of the run. It can still come out of the ends though, so maybe i'll just do that (unlikely i'll move all those damn blocks again, particularly the bottom row). I also re-cut the stair-case base to make the steps deeper, although I can't fill that in till I have that ag-pipe.

I measured the height along the length once I finished it rises a bit in the middle unfortunately - about 8mm, but at least in a relatively smooth curve. Yes yes, stupid I know - I should have regularly measured, but I did have another string-line at the brick level and used a couple of spirit levels, but all those little `close enough fuckit's add up. But it doesn't really matter because it only goes out `significantly' where the stair-case is, which is a natural break in the line. I guess i'll have to see how much it settles over time too. My $5 rubber mallet disintegrated before the last few base bricks were done, but I managed to tap down and level the last few with a bit of wood and a heavy ball-peen hammer - pretty well over it all by that stage, but I think I didn't drop off too much in finishing quality. Knowing my luck it will be a bit out compared to the boundary (and to-be-shed-wall), but that should only affect the aesthetics unless I made a really big mistake.

Dried Beef

The biltong turned out more or less ok. I've been sampling bits as I went along to see how it 'aged', and the first bits were a bit young (they were also thinner). The flavour didn't really settle till today, 6 days in - once the vinegar had dissipated. I made a slight variation for two sets of strips. One I rinsed in water and vinegar to remove the excess salt, as according to one recipe, and the other I left as is in an attempt to keep more flavour in-tact, according to another. The vinegar one ended up pretty bland, so i'm not sure that's the right idea. But the other one is a bit salty - ok with a beer in hand or after a long hot day in the sun, but not fantastic. Since I was caught up with some other things I also salted it for about 15 hours, so maybe next time I'll just try a shorter seasoning session. Although I imagine most of the salt would still be clinging to the outside once it is hung up. Need a way to get more chilli flavour on it too - maybe some sambal pedas as a final stage. Still, one bit I tried today I'd label a success, even with the saltiness. Almost has a touch of salami flavour to it, mixed with bbq steak, a bit more chilli and less salt and it could be a winner. 1kg of well-trimmed lean porterhouse made about 400g of biltong; so it's pretty cheap to make. Will have to investigate a better drying cabinet than a house-moving box with a shade-cloth cover though, at least if it becomes a regular task.

SSE, gcc, AMD

I'm a bit ahead of schedule on my work and I am awaiting some feedback so I thought i'd look at what SSE2 could do to speed up tiny bits of what it does. Might not need to use it, but it'd be good to have some sort of feel for what it can do - e.g on CELL it can make an order of magnitude difference. Some interesting and surprising results.As a quick aside, while searching for why using an SSE intrinsic kept complaining about assigning to the wrong data-type (turned out it was just a typo and gcc didn't complain about the undefined `function') I came across this interesting comparison of SSE optimisations on common compilers. Actually I was quite surprised that gcc was doing quite so much with constant subexpression elimination of vector quantities - I thought intrinsics were pretty much in-line assembly with register aliases. Anyway good to see it is well ahead of the pack in all departments.First, I tried implementing the following function:

X = abs(Z)*exp(Y*i)Z, X complex, Y real

The C code is all pretty straightforward:

                        Xrow[x] = cabsf(Zrow[x])
                                * cexpf(Yrow[x]*I);

The SSE code was a bit messier, but for such a simple function quite manageable. I found a nice library of SSE optimised transcendental functions needed to implement a vector cexpf(), and the Cephes maths library was an excellent reference for the internal workings of functions like cexp - e.g. so I could remove redundant work.

                        v4sf d0, d1;
                        v4sf i, r;
                        v4sf absz;

                        v4sf rr, ri;
                        v4sf ci, si;

                        d0 = Zrow[x];
                        d1 = Zrow[x+1];

                        i = _mm_shuffle_ps(d0, d1, 0x88);
                        r = _mm_shuffle_ps(d0, d1, 0xDD);
                        absz = _mm_sqrt_ps(r*r + i*i);

                        i = Yrow[x/2];
                        sincos_ps(i, &si, &ci);

                        rr = absh*ci;
                        ri = absh*si;

                        d0 = _mm_unpacklo_ps(ri, rr);
                        d1 = _mm_unpackhi_ps(ri, rr);

                        Xrow[x] = d0;
                        Xrow[x+1] = d0;

I'm sure it's not the prettiest code, and it may not even work (not sure my shuffles are correct), but assuming it is otherwise correct, it'll do. I also tried a double version for simple C, and the above sse version with 1 unrolled loop.

I have a number of machines to try it on, a pentium-m core-duo-wtf-its-called, athlon 64 x2, and a phenom II 945. I'm using gcc 4.4.3. Running the loop 10 000 times over a 64x64 array using constant loop counts. I'm using the aggressive optimisation flags suggested by AMD, including -march=amdfam10 and -funroll-all-loops, and also c99 features like restrict.

Anyway I'll keep this brief. I'm not so much interested in the various CPU's as the relative differences within them.

 Code             A64-X2    Phenom II Pentium-M

 C single           1.16         0.84      1.17
 SSE single         0.69         0.31      0.37
 SSE unrolled       0.66         0.31      0.37
 C double           1.36         0.90      1.25

Well it's certainly an improvement, but nothing like on Cell. The Athlon-64 X2 really suffers from a 64-bit internal data-path, so the SSE is not even 2x faster, even though it's calculating 4x values at once for most operations. The intel shows the biggest improvement using SSE at just over 3x, but the Phenom is about the same. gcc has much improved since I tried similar things on the SPU so perhaps the non-sse code is 'just that good', but I can't help but imagine there is more silicon dedicated to making non-sse code run faster too. Oh and it is interesting that double's are pretty much on-par with singles, at least in this case. I was hoping that moving to singles would make a bigger difference - too lazy at this point to try a double SSE3 version (i'd have to write sincos_pd for example).

I tried this first because I thought the a little complexity (yet not too time consuming to write) would help the compiler do more optimisations, and it is complex enough it can't be memory constrained. I couldn't get gcc to inline the sincos_ps, but I doubt that would've made much difference.

Then I tried something much simpler, just adding two 2d arrays. I didn't use sse for this, but just used the vector types. And just looped over every element of every row - I used a stride calculation rather than treating it as completely linear. Hmm, perhaps I should have used non-constant dimensions to see what the compiler did with more general purpose code.

a[x] += b[x];

gcc unrolled the whole row-loop so things are about as simple as they get. Actually gcc vectorised the version using simple C types as well (well if it couldn't vectorise this loop it'd be a worry), and then it included the two versions: one vectorised for when the addresses are aligned appropriately (which they are), and one using the FPU for otherwise. Looking at x86 assembly gives me a headache so I didn't look too closely though. Since this loop is so simple I had to run it 1 000 000 times to get a reasonable timing.

  Code             A64-X2    Phenom II Pentium-M

  C single           8.5         1.1        2.1
  C vector single    6.0         0.98       1.2

Ok, so it's tuned for the phenom, but wtf is going on with the athlon64? Memory speed constrained? I presume they are all constrained by the L2 cache, or I might even be getting cache thrashing or something (this is the sort of thing you could guarantee on cell ... huge pity IBM aren't progressing it for HPC). Don't ask me why the pentim-m version is so different - afaict from the source they're both using SSE2 code since the memory should be aligned properly. *shrug*.

So after all that i'm not sure i'm much wiser. The question being - is it worth the effort to manually vectorise the code or even use sse intrinsics on an x86 cpu, at least for the algorithms i'm using? Hmm ... probably ... even a 2x increase on a minor operation is better than nothing, but it'll only be an incremental improvement overall- but at least in every case it was an improvement. These are all using 32-bit operating systems too, I wonder if I should try a 64-bit one. Well i'll see how much time I have to play with it.

Should really try GPGPU and/or OpenCL simply to get a good 'feel' of what to expect.

Indecision, games, and the rest

Wow, tired. Then again I was up till 3am last night hacking on some MMU code, so that probably isn't surprising. And some damn person called me with a wrong number at 8:30am too. Been a long couple of days. Oh hmm, now i remember, I was up for 23 hours on Saturday too - after a very early awakening that led me to the last post, I spent all afternoon helping a mate put up a fence, then a few beers and pizzas and a late night later ended up crashing about 3am. No wonder it feels like I need a weekend, and it's only Tuesday.

Been very busy with work, and had a few wins today - I think i'm all out of endorphins now, so I can't get back into the MMU code as i'd planned to. As a side note, whilst generating data to test some algorithms I accidentally created the image below. What took me about it is that it is surprisingly uncomfortable to look at - it seems to jump around randomly, about 3 times a second (at least on a bright LCD monitor). I don't think it's just my tired or coffeed eyes ...

On the MMU code I was writing a more comprehensive bootstrap function. One that initialises the MMU to relocate the loaded image to somewhere appropriate, and in a way that it can remain in use for the running system. I have some code, and it's nice and compact but I'm still undecided on the final memory map. For example, the ARM CPU on the Beagle can have two page tables, one for high addresses, i.e. the system, and one for the rest. If I use that feature it forces the system table to use at least the top 2G of the virtual address space, which was different from what I had in mind. Not that it matters I guess since the machine only has 256MB of memory anyway ... so probably no big deal. I'd like to use that option since it simplifies the management of dynamic globally shared data the way I was thinking of doing it (i.e. shared library code/rodata). There's a few other details i'm still undecided on too, it would be nice to use section pages for instance (1MB pages that only require a level 1 PTE).

I spent the few nights earlier working on the higher level MMU code, managing translation tables, allocating physical pages and thinking about how to implement efficient `agile' (migratable) memory. The actual page table manipulation is pretty simple but you need a fair bit of infrastructure before you can use it.

So I also wrote a simple first-fit memory allocator that uses an AVL tree to coalesce free blocks. It's something i've researched quite a bit in the past and first-fit works surprisingly well for a general allocator particularly if there's a quick way to coalesce frees (the main drawback being not-so-great worst-case allocation performance). Even though I've written one before (for AROS), I'm using the AVL code derived from the parent-linked one from libavl, since I couldn't find the final version of the one I wrote and it's probably better anyway. I modified it though to remove the iterator and tree objects so it can be used without requiring any additional or dynamic memory. I also poked the balance factor into the low 3 bits of the parent pointer so I can jam in the free block size into 16 bytes - which determines the minimum allocation granularity. If the linear search for the first-fit allocation proves too inefficient, I can always add another tree and bump the minimum granularity to 32 bytes which is probably still ok. Although the overhead of managing two trees may not be.

And I played with a couple of different ideas for allocating physical pages. I tried a bitmap, which is pretty simple but tricky to allocate large ranges (>32 pages), has poor worst-case, and has no room to store any meta-data. So then I went to an indirectly linked list of words, mainly because I then have somewhere to store extra per-page data like a shared refcount, or other bits of information. But that's even messier to allocate any range more than 1 block in size; although I thought I didn't need to do that, I do at least need to allocate 16kb blocks for the first level translation tables (although I only need 8kb if I use the dual-translation table setup). So I may end up going back to the idea I had with my other os which was just to track the holes in the address space using a tree - much like with the gp memory allocator but without being able to use the holes themselves to store data. I suspect I chose to do it that way for a reason - it has a reasonably high worst-case memory requirement, but that is unlikely, and bounded. Since I will need the functionality for a global virtual address allocator for the 'agile' memory blocks, it probably makes sense to just use the same thing.

For something different i'm also trying to make some biltong. Thought about it for ages but never got motivated enough to try it. I was at the Central Market on Friday arvo and grabbed some dried meat and super-hot sausage and found it particularly yumscrious. But the extraordinary price helped me to decide that it was time to try to make something like that myself. I just started with 1kg of beef to see if it works or if I just end up making a batch of biological warfare agent. I should really have given the kitchen a good clean first. I thought i'd skip the receipe that starts with '25lbs of beef or game', at least to start with ;-)

Read an interesting article a few days ago about five creepy ways video games try to get you addicted. The point I found most interesting was the assertion that games are basically taking the place of an un-fulfilling work environment, and particularly since the points mentioned seem to match my hobby and at least the 'good bits' of the work I do:

Autonomy (that is, you have some say in what you do day to day);
Complexity (so it's not mind-numbing repetition);
Connection Between Effort and Reward (i.e. you actually see the awesome results of your hard work).

Maybe that's why I don't get terribly obsessed with games - in effect i'm already playing one. Particularly of late, this has been very obvious to me - I can't put a problem down until I get it to work, or at least finish some little mile-stone along the way. If I can do that I can go away and feel a bit pleased that that 'target' has been reached, even if I know they will never end. On the other hand, if I can't it can be very disappointing, and it can even act as a catalyst to something worse since I have no other distractions just now. The other point this ties into is quite interesting too:

What is a game?Well, we humans play games because there is a basic satisfaction in mastering a skill, even if it's a pointless one in terms of our overall life goals. It helps us develop our brains (especially as children) and to test ourselves without serious consequences if we fail. This is why our brains reward us with the sensation we call "fun" when we do it

Basically games are a way to learn stuff. I guess a bit like fizzy drinks or other food with flavour but low nutritional content, some of these modern games are simply abusing key survival traits honed through evolution over millions of years simply to make a buck. This is what the world has become, hollow food, hollow entertainment, and hollow learning, for a hollow existence?

Actually I'm not so sure any of the survival skills gained through evolution are much use to us any more. Most of them don't work when you have unbounded supplies of high-energy food-stuffs within easy reach, spend all of your time sitting on a chair behind a desk or in a machine screaming across a layer of processed oil, or simply have so many people that there's no room for them or any other animals simply to live in peace. Evolution only optimises based on the past, not on the future. Yes, we're all screwed - keep an eye on the fish.

Fork'n Evolution!

I just saw a post suggesting it is time for Ubuntu to fork evolution and thought it deserved a comment ... Although now i've just about finished this post I realise just how silly the whole notion is to start with, so it probably doesn't.

It is a pretty bizarre suggestion.

Even just on a technical level, the author may not realise just how much work is involved in maintaining a piece of code the size of Evolution. Or the amount of organisation and effort required to make the sort of major structural changes it probably needs before any real progress can be made.

And politically it is a naive and arrogant to suggest one GNU/Linux company can 'go it alone' like that on any major piece of software, particularly one developed by a competitor. It is also pretty rude - I haven't even touched Evolution for 5 years as a user or developer, but as a free software developer and advocate I think forking should really be the point of last resort. And even then it would only be after a demonstrable and intractable conflict had arisen - e.g. refusal to accept many reasonable patches of significant size. A few 10 line patches hardly turns one into a maintainer. And I wonder if anyone who might be involved has even done that.

Evolution was always a strange free software project, which i've written about plenty of times before. Whilst I was working on it it had only a few external contributions, and even less so from any of the GNU/Linux vendors (apart from Sun). Most had patches just for (re)branding, but the occasional bug fix they did have rarely made it to the code-base through normal processes i.e. they rarely if ever submitted them to us. Not that we made it terribly easy, anally retarded patch submission and review processes, the copyright assignment crap, and simply being busy most of the time.

But even despite that - it is a free software project, and open to contributions from anyone. There is simply no need to fork it, if anyone desires to improve it they can, and indeed are encouraged to. It is part of the `social contract' if you will, of any GPL project. You get the code for free, and if you want to make changes you can, and add them to the public pool for the good of all. And unlike many 'open source' projects - Evolution was always a real free software product and not just an alpha-grade product looking for free beta-testers and code-monkeys like many seem to be.

"They didn’t understand that there is just one killer feature (just as with integrated desktop social networking) that needs to be in there which is Exchange support."

Back on topic ... this one statement probably undermines the post itself more than any other. As Jeff said in comments on the post this was Ximian's business model pretty much from the start, and one of the reasons Novell bought them. It was pretty much the reason Evolution existed anyway, although more to replace the need for Microsoft Exchange than to work with it - at least in my mind. As a free software developer I never saw the need to inter-operate with proprietary crap like that - which is probably why I was never working on that part of the code.

Also the whole bit about `social networking' mentioned in the post - this was a major part of the internal thinking at Ximian right from the start - before the term even existed. Although like then, I still don't really see the need for it, particularly in a corporate environment. We all just used IRC and it worked just fine for us.

I was working on the mail code even longer than Jeff - I can't believe I started over 10 years ago. I haven't looked at the code since leaving Novell and the project, and I never worked on the `microsoft exchange component' anyway, but i'd suggest the code was simply never up to snuff given the way it originally designed, and even just because who wrote it.

Nice guy and all, but not great at coming up with a maintainable design let alone writing reliable C. Jeff and I's experience with the original IMAP code was a nightmare that started when he simply abandoned it in a fit of terrible management. One day he simply began refusing emails or dealing with patches saying he'd moved to another project (might have even been the microsoft exchange plugin). So we were left maintaining a really nasty bit of code with no deep knowledge of how it worked. It was all too lisp-like with little or no structure which made it very difficult to grok particularly given it's requirement for multi-threading. Lisp might be a great language, but it simply isn't how you write C.

And apart from the code itself, I believe the microsoft exchange component had a messy architecture layered on top of the already overly complex ones (yes, multiple) within evolution. Partly because all the different data-types were routed through the same connection, and also because it started as a proprietary extension and had silly things like a symbol obfuscater. To be honest once I found out how it worked I was surprised it worked at all ... it was not surprising in the least that it was slow and buggy and probably always would be.

Disbanding the original team and moving the project wholesale to India can't have helped either, at least at the time. Apart from the brain drain that involved, India has a tendency to add layers of middle-management too, so big decisions we could have gotten away with by the end (probably without asking ;-) became completely impossible due to risk averse middle-managers only worried about their next promotion-due-to-tenure. High staff turn-over due to a liquid job market was also an issue.

The entire component architecture of the system should have been replaced years ago (I wouldn't know - perhaps it has been) - almost all of the early design decisions were wrong, and particularly outside of the mail code none of it was ever reviewed because of the high turn-over of developers. Inside the mail component there was a huge scope for doing things in a much more re-usable way. CORBA got a bad wrap because we had an absolutely terrible design and often simply used it incorrectly. The original bonobo design was just a COM clone, so no surprises there - pluggable multi-process widget systems was a stupid use of CORBA. But the evolution-data-server wasn't, although the interfaces were not as good as they could have been.

I don't think there is any conspiracy - but Novell are a proprietary software company first and foremost and a project like evolution never really fit it's culture - probably the major reason I left in the end. Once HR starts impacting on engineering I think you have worries as a technology company, and even more as an `open sauce' one where the most valuable `eye-pee' you have is in your employee's heads.

Canonical make some odd decisions about Ubuntu too - more odd since they are purporting to be such a community focused organisation. But I don't think even they would be arrogant or silly enough to seriously consider such a strange notion as forking evolution. I'm sure they could surprise me though.

Syscalls and IPC

The next thing i've been looking at is system calls, and using them to implement IPC.

I wanted system calls to lead to a direct context switch efficiently, so that for example, in the simple case an IPC call to a server process acts synchronously. But on the other hand, there are other tasks for system calls which are really just isolated or system-level function calls, and they don't need the overhead.

So I came up with the idea of two levels - simple system calls which are invoked and return much any other function invocation, and heavier ones which might lead to a context switch. I just then use the system call number to chose what code to execute.

It's actually easier just showing the code I came up with than trying to explain it.

svc_entry:      
        cmp     r7,#SVC_SIMPLE_LAST
        push    { r12, lr }
        adrlo   r12,svc_vectors
        adrlo   lr,svc_exit             @ low vectors, call and return to svc_exit
        ldrlo   pc,[r12, r7, lsl #2]
  ...
svc_exit:
        ldmia   sp!,{r12, pc }^

Well that's the entirety of the code-path for 'simple' system calls. It doesn't strictly need to save r12, but it makes the context-switching case a bit simpler, and keeps the stack aligned properly too. It doesn't need to save the normal EABI work registers, because the target vector will save any it needs to.

At the svc_exit label I initially had `pop { r12, pc }^' thinking that the pop pseudo-op would work the same as ldmia as it does normally, but implement an exception return because of the trailing `^' ... but it doesn't ...

The "we might context switch" case is then much the same as every other exception handler, and re-uses some of the code too. It saves the state to the current TCB, and then invokes the vector. It also handles invalid system call numbers by suspending the task.

        srsdb   #MODE_IRQ
        cps     #MODE_IRQ
        stm     sp,{r0-r14}^
        cps     #MODE_SUPERVISOR

        cmp     r7,#SVC_LAST
        adr     r12,svc_vectors
        adr     lr,ex_context_exit
        ldrlo   pc,[r12, r7, lsl #2]
        b       task_error_handler      @ syscall # invalid - suspend task

One of the simplest useful IPC mechanism I can think of is simply a signal based one (the AmigaOS idea of signals, not the Unix one, although they can be made to work the same). If you're waiting on a signal and it arrives you wake up, otherwise it just gets registered asynchronously and if you wait later, it no-ops.

It only required 2 basic operations, and another 2 for a bit more flexibility.

Signal signals another task with a single signal bit, and Wait puts the current task to sleep until any one of a given set of signals arrives, or does nothing if it already has. Since either may trigger a context switch, these are implemented using the heavier-weight system call mechanism.The other two functions are AllocSignal and FreeSignal whose usage is pretty obvious. These are implemented using non-context switching system calls.The functions are very simple. First Signal, all it does is set the signal bit in the task control block, and checks if the task was waiting for it. If so it sets the return value the function expects, adds the task back to the run queue and reschedules the task, which may then execute depending on the scheduling algorithm.

void svc_Signal(int tid, int sigNum) {
        struct task *tcb = findTask(tid);

        if (!tcb)
                return;

        sigmask_t m = (1<<sigNum);
        sigmask_t sigs;

        tcb->sigReceived |= m;
        if (tcb->state == TS_WAIT) {
                sigs = tcb->sigReceived & tcb->sigWaiting;
                if (sigs) {
                        tcb->tcb.regs[0] = sigs;

                        ... move tcb to run queue ...
                        ... reschedule tasks ...
                }
        }
}

Wait is just as simple. In the simplest case, if any of the signals it is waiting on have already arrived, it just returns the intersection of the two sets. Otherwise, the task is put to sleep by placing it on the wait queue, storing the signals it is waiting on. The `trick' here is that when it wakes up, it will just act like it has returned from a context switch - and start executing immediately after the svc instruction. The Signal code above uses that to return the return-code in the delayed case via register r0 - the standard ABI register for function return values.

sigmask_t svc_Wait(sigmask_t sigMask) {
        struct task *tcb = thistask;
        sigmask_t sigs;

        sigs = tcb->sigReceived & sigMask;
        if (sigs) {
                tcb->sigReceived &= ~sigs;
        } else {
                tcb->sigWaiting = sigMask;
                tcb->state = TS_WAIT;

                ... move current task to wait queue ...
                ... reschedule tasks ...
        }

        return sigs;
}

Invoking system calls uses the svc instruction. This encodes the system-call number in the instruction, but you can also pass it in a register - and I follow the linux EABI which puts the system call number in r7.

With this mecanism it is trivial to implement a system-call in assembly language - tell the compiler what the args are, but the assembly just passes them to the system.

C code:

extern int Wait(int sigmask);

ASM:

Wait:   push { r7 }
        mov  r7,#3
        svc  #3
        pop  { r7 }
        bx   lr

But this wont let the compiler in-line the calls, which would be desirable in many cases. So one must resort to inline asm in the C source.

sigmask_t Wait(sigmask_t sigMask) {
        register int res asm ("r0");
        register int sigm asm("r0") = sigMask;

        asm volatile("mov r7,#3; svc #3"
                     : "=r" (res)
                     : "r" (sigm)
                     : "r7", "r1", "r2", "r3", "r12");

        return res;
}

(I'm not sure if that is 100% correct - but it seems to compile correctly.)

Code in ipc-signal.c and another version of context.S.

Other IPC

Well beyond simple signals there are a few other possibilities:

Mailboxes - These are usually a fixed-length queue which can be written to asynchronously if is it not full, but only with simple data type such as an integer or pointer. No dynamic memory is required inside the kernel so it is easy to implement. Shared memory and the like is required to send more complex information.
Shared memory - Fairly simple once you have a mechanism to share memory. No way to send or receive notifications about state changes though, so requires polling (bad for lots of reasons), or other IPC primitives such as signals. Requires some sort of serialisation mechanism too, such as mutexes or semaphores which are best avoided.
Sychronous message passing - This is used to implement a light-weight context switch in microkernels. Because the callee and caller rendezvous at the same time, there's no need to save state beyond the normal state saved in a function call, and simplifies scheduling. If asynchronous or multi-processing is required threads are needed though, and passing large messages can be expensive.
Asynchronous message passing - This uses a queue to keep track of the received messages and allows the target to receive them when ready. If the receiver is ready and waiting it can work much like the synchronous case, otherwise the sender can keep doing work. In a non-protected/non-virtual memory environment this easy and efficient to implement, but these both add complications. For example dynamic memory is needed in the kernel, and messages saved to a kernel-queue involve a double copy - although it can be implemented without copying too.

I'm fond of the async message passing model, particularly with non-copying of messages - I used it in Evolution and other threaded code i've written since. So i'd like to try and get that working efficiently in a protected/virtual memory environment if I can - I had it working before on x86. I don't know if it worked terribly efficiently, but I don't see why it shouldn't be too bad.

Mailboxes also look interesting - because of their simplicity mostly. They can convey more information than a single bit as with signals, but i'm not sure they could replace them (the CELL BE hardware implements mailboxes plus a bit-based system much the same as signals - and if they spent the hardware to do them both there's probably a good reason).

Once I get some mechanism to pass data between tasks I guess i'll look at something a bit more meaty, like a device driver - and go back to banging my head against that gigantic OMAP manual.

FOPS in context

Another late night bit of hacking last night to add FPU support to the context switch code (which incidentally is now committed).

It works by disabling the FPU when a new task is scheduled which doesn't match the owner of the FPU. FPU instructions then cause an illegal instruction exception, and that is used to switch the context if it was an FPU instruction.

I think I spent more time working out how to check for floating point instructions in the illegal instruction exception handler than anything else. Partly just finding out what the instruction bits were, but also just how to implement it without a big mess.

Basically, afaict the following bit patterns are those used by all of the FPU related instructions.

  1111001x        -        -        - advanced simd data processing
  xxxx1110        - xxxx101x xxx0xxxx vfp data processing
  xxxx110x        - xxxx101x xxxxxxxx ext reg l/s
  11110100 xxx0xxxx        -        - asimd struct l/s
  xxxx1110        - xxxx101x xxx1xxxx vfp<>arm transfer
  xxxx1100 010xxxxx xxxx101x        -

Well, the problem is that ARM can only compare to an immediate 8 bit value (albeit anywhere within the 32 bits). So it becomes a bit of a mess of mucking about with those, or loading constants from memory. In the end I opted for the constant load - I figure that a constant load from memory is not really any different from a constant load from the instruction stream, once in the L1 cache, so assuming you're going to save a lot of instructions it will probably be faster. Well it's smaller anyway ...

As a comparison I wondered what gcc would come up with, e.g. with:

void modecheck(unsigned int *a) {
        unsigned int v = *a;

        if ((v & 0xfe000000) == 0xf2000000
            || (v & 0x0f000e00) == 0x0e000a00
            || (v & 0x0e000e00) == 0x0c000a00
            || (v & 0xff100000) == 0xf4000000
            || (v & 0x0fe00e00) == 0x0c400a00)
                printf("do stuff\n");
        else
                printf("do other stuff\n");
}

And it's pretty ugly:

        ldr     r0, [r0, #0]
        and     r3, r0, #-33554432
        cmp     r3, #-234881024
        beq     .L2
        bic     r3, r0, #-268435456
        bic     r3, r3, #16711680
        bic     r3, r3, #61696
        mov     r2, #234881024
        bic     r3, r3, #255
        add     r2, r2, #2560
        cmp     r3, r2
        beq     .L2
        bic     r3, r0, #-251658240
        bic     r3, r3, #16711680
        bic     r3, r3, #61696
        mov     r2, #201326592
        bic     r3, r3, #255
        add     r2, r2, #2560
        cmp     r3, r2
        beq     .L2
        bic     r3, r0, #14680064
        mov     r3, r3, lsr #20
        mov     r3, r3, asl #20
        cmp     r3, #-201326592
        beq     .L2
        bic     r3, r0, #-268435456
        bic     r3, r3, #2080768
        bic     r3, r3, #12736
        mov     r2, #205520896
        bic     r3, r3, #63
        add     r2, r2, #2560
        cmp     r3, r2
        beq     .L2
        ... else
        b       done
.L2:    .. if
done:

So basically it's converting all the constants to 8 bit operations. AFAICT the `branch predictor' basically just has a cache of recently taken branches, and the `default prediction' is that branches are not taken. If that's the case, the above code has a 13 cycle penalty in the best case ... plus all the extra instructions to execute in the worst. The first 2 cases are the most likely with fpu instructions though.

Not that this is in any way time-critical code (at most it will execute once per context switch), it just seems a bit of a mess.

I came up with something somewhat simpler, although to be honest I have no idea which would run faster.

        .balign 64
vfp_checks:     @ mask    ,value
        .word   0x0f000e00,0x0e000a00   @ xxxx1110        - xxxx101x -
        .word   0x0e000e00,0x0c000a00   @ xxxx110x        - xxxx101x -
        .word   0xff100000,0xf4000000   @ 11110100 xxx0xxxx        - -
        .word   0x0fe00e00,0x0c400a00   @ xxxx1100 010xxxxx xxxx101x -

        ...

 ldr r2,[r3,#-8]  @ load offending instruction
 ldr r2,[r2]

 and r0,r2,#0xfe000000 @ check 1111001x xxxxxxxx xxxxxxxx xxxxxxxx
 cmp r0,#0xf2000000
 ldrned r0,r1,vfp_checks
 andne r0,r0,r2
 cmpne r0,r1
 ldrned r0,r1,vfp_checks+8
 andne r0,r0,r2
 cmpne r0,r1
 ldrned r0,r1,vfp_checks+16
 andne r0,r0,r2
 cmpne r0,r1
 ldrned r0,r1,vfp_checks+24
 andne r0,r0,r2
 cmpne r0,r1
 bne 1f

        ... if

1:      ... else

Given the constants are on a cache boundary, presumably their loads should all be 1 cycle (for all 64 bits) after the cache line is loaded, so they shouldn't be any worse than in-line constants that take more than 2 or more instruction to create (let alone the 7 required for both constants in the second C case).

Interestingly although not surprisingly the -Os code is more similar to the hand-rolled assembly, although it includes explicit branches rather than conditional execution. And does single-word loads instead of double.

Apart from that the context switching itself is pretty simple - just store the fpu registers above the others. irq_new_task is where the fpu is turned on or off depending on the current fpu owner, so it doesn't do anything if only one task is actively using the fpu. The existing context switch code required no changes - it is all handled 'out of band'.

Given that I now have some tasks to manage I also added a general illegal instruction handler for really `illegal instructions'. All it does is suspend the current task and remove it from the run queue as well, and the other tasks keep running just fine. For my uKernel it will then have to signal a process server to clean up/or whatever.

Code is in context.S and irq-context-fp.c.

OCaml

Well yesterday I should have been out working on the retaining wall but in the morning I got stuck thinking about optimising implementations of mathematical expressions (too much coffee late the night before methinks).The primary reason is that I think I found a pretty extreme case of wasteful calculation in an algorithm I'm working on. As far as I can tell from manual inspection, in one place it calculates a `square' 4D intermediate result - actually of the 8M array entries (64^4), most of them, about ~7M, remain unset at (0+0i). But when it uses this array for further processing it only uses 2047 of the results! Considering the intermediate array is 256MB in size(!!!), this is an obvious target for optimisation - it spends far more time clearing the memory than calculating the results it uses!

(I had the mistaken impression that matlab did smart things with such expressions, but I guess it doesn't - it's really just a scripting language front-end to a bunch of C functions.)

I have a feeling this sort of thing is happening throughout the algorithm although this is probably at the extreme end, but analysing it manually seems error prone and time consuming.

Surely there's a better way ... functional languages?

I'm not sure if i'll be able to take advantage of any of them, at least in the short-term. I was reading about how FFTW uses OCaml to implement it's optimiser - this is the sort of thing I might have to look at, although I never could wrap my head around functional programming languages. But anwyay - it's something i'll keep investigating although my focus to start with will just be code translation. Perhaps some sort of symbolic expression analyser might help me reduce the equations - once I work out what they are at least.

Another brick in the wall

I did get a couple of hours in on the retaining wall, laying the foundation layer of bricks which is the tricky bit - but then just as I was starting to make good progress it started to rain. I got nearly half-way on the main wall at least. I assessed that the stair-well i'd created was too steep and started too high (a slack string led me to believe the first-run of bricks would be lower), so i'll have to dig that out more too, but that wont take too long. I need to get some ag-pipe before I can back-fill it though. Hmm, should really get out there and do a bit more for a couple of hours - at least do the first run - hmm, but alas I think the day has gotten away from me again, and I have some other things to do.

It's a long-weekend anyway, so maybe tomorrow is a better day to get dirty, and the clouds are looking a bit dark at the moment.

About Me

Tags