The problem with teaching abstractions
For want of distraction, I've been hanging around some of the OpenCL forums of late. Boy do they get some ninny questions.
From people trying to drop plain C (full stdio and all) into the OpenCL kernel compiler and expecting it to work, to someone asking whether you can 'write functions' in the host language ... (err, I mean, seriously. Maybe the guy is a MATLAB guru, but it certainly isn't that hard to find out about C or whatever host language he's using).
But those are just the most extreme examples from the last couple of days. What is more worrying is just how many people don't seem to understand computer architecture at all - much less a system containing a 'remote' processing device like a GPU.
Really basic things like cache, registers, stack, memory latency, memory banks & contention, I/O bus latency, call invocation overheads, and so on. Not to mention the less-obvious but not really more complex ideas that GPUs bring to the table such as memory coalescing, thread divergence/masking (i.e. SIMT branches), local memory, and so on (the kernel sketch just below touches on a couple of these).
Also, a rough idea of just how fucking fast modern hardware is.
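For the coalescing and divergence points, here's a minimal OpenCL C kernel sketch. The kernel and its names are purely illustrative, not from any forum thread; both paths are perfectly legal code that will compile and run, they're just slow for reasons the language itself never shows you:

```c
/* Illustrative only: a kernel that runs correctly but badly.
 * - The branch on (gid & 1) makes adjacent work-items take different
 *   paths, so a SIMT wavefront/warp executes both sides with half the
 *   lanes masked off each time (thread divergence).
 * - The strided load defeats memory coalescing: instead of one wide
 *   memory transaction per wavefront, the hardware issues many narrow
 *   ones and effective bandwidth collapses. */
__kernel void slow_example(__global const float *in,
                           __global float *out,
                           const int stride)
{
    int gid = get_global_id(0);
    float x;

    if (gid & 1)
        x = in[gid * stride];   /* strided, uncoalesced read */
    else
        x = in[gid];            /* contiguous, coalesced read */

    out[gid] = x * 2.0f;
}
```

None of that is visible in the C-level semantics; you only see it if you have a rough idea of what the memory system and the SIMT front-end are actually doing.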
I presume most of the queries are from students, but they just seem to have absolutely NFI what is going on 'under the bonnet' of their shiny new bit of hardware. E.g. the reason your code runs slower on a GPU should have been bleedingly obvious before you went anywhere near a compiler.
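To make the 'slower on a GPU' point concrete, here is a bare-bones host program - plain C against the OpenCL 1.x API, with my own names, a toy problem size, and error handling cut down to a single macro. Before the GPU computes anything at all you pay for a copy across the I/O bus, a driver round-trip per enqueue call, and a copy back; those are fixed costs, roughly in the tens-of-microseconds-to-milliseconds ballpark, that a 4096-element multiply is never going to amortise:

```c
#include <stdio.h>
#include <stdlib.h>
#include <CL/cl.h>

#define CHECK(e) do { if ((e) != CL_SUCCESS) { \
        fprintf(stderr, "OpenCL error %d at line %d\n", (int)(e), __LINE__); \
        exit(1); } } while (0)

/* trivial kernel: double every element in place */
static const char *src =
    "__kernel void scale(__global float *a) {"
    "    int i = get_global_id(0);"
    "    a[i] *= 2.0f;"
    "}";

int main(void)
{
    enum { N = 4096 };                  /* tiny problem: overheads dominate */
    float data[N];
    for (int i = 0; i < N; i++) data[i] = (float)i;

    cl_int err;
    cl_platform_id plat;  CHECK(clGetPlatformIDs(1, &plat, NULL));
    cl_device_id dev;     CHECK(clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL));
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);  CHECK(err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);       CHECK(err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err); CHECK(err);
    CHECK(clBuildProgram(prog, 1, &dev, NULL, NULL, NULL));
    cl_kernel k = clCreateKernel(prog, "scale", &err); CHECK(err);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, sizeof data, NULL, &err); CHECK(err);

    /* every call below crosses the driver, and the two buffer copies
     * cross the I/O bus: latency paid regardless of how small N is */
    CHECK(clEnqueueWriteBuffer(q, buf, CL_TRUE, 0, sizeof data, data, 0, NULL, NULL));
    CHECK(clSetKernelArg(k, 0, sizeof buf, &buf));
    size_t gsz = N;
    CHECK(clEnqueueNDRangeKernel(q, k, 1, NULL, &gsz, NULL, 0, NULL, NULL));
    CHECK(clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof data, data, 0, NULL, NULL));

    printf("data[1] = %f\n", data[1]);  /* prints 2.000000 */

    clReleaseMemObject(buf); clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    return 0;
}
```

A plain loop over the same 4096 floats on the host CPU will typically have finished before the first clEnqueueWriteBuffer even returns. That isn't mysterious once the bus and call overheads are part of your mental model - and it's exactly the kind of thing you can't reason about without some architecture basics.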
Before trying to work with such a complex system, you really need to know some basics of computer system architecture - otherwise neither the API nor any of the results will make any sense.
The manuals are good: just read them.
Experiment on your own; it's easy, it's fast, it's cheap (it only takes your time, and a student's time isn't worth anything). You don't learn anywhere near as much if you just copy someone else, or forever ask inane questions.