About Me

Michael Zucchi

 B.E. (Comp. Sys. Eng.)

  also known as Zed
  to his mates & enemies!

notzed at gmail >
fosstodon.org/@notzed >

Tags

android (44)
beagle (63)
biographical (104)
blogz (9)
business (1)
code (77)
compilerz (1)
cooking (31)
dez (7)
dusk (31)
esp32 (4)
extensionz (1)
ffts (3)
forth (3)
free software (4)
games (32)
gloat (2)
globalisation (1)
gnu (4)
graphics (16)
gsoc (4)
hacking (459)
haiku (2)
horticulture (10)
house (23)
hsa (6)
humour (7)
imagez (28)
java (231)
java ee (3)
javafx (49)
jjmpeg (81)
junk (3)
kobo (15)
libeze (7)
linux (5)
mediaz (27)
ml (15)
nativez (10)
opencl (120)
os (17)
panamaz (5)
parallella (97)
pdfz (8)
philosophy (26)
picfx (2)
players (1)
playerz (2)
politics (7)
ps3 (12)
puppybits (17)
rants (137)
readerz (8)
rez (1)
socles (36)
termz (3)
videoz (6)
vulkan (3)
wanki (3)
workshop (3)
zcl (4)
zedzone (26)
Monday, 16 June 2014, 09:00

bandwidth. limited.

I've been continuing to experiment with fft code. The indexing for a general-purpose routine got a bit fiddly so I went back to the parallella board to see about some basics first.

The news isn't really very good ...

I was looking at a problem of size 2^20, I was going to split it into two sections: 1024 lots of 1024-element fft's and then another pass which does 1024 lots of 1024-element fft's again but of data strided by 1024. This should mean the minimum external memory accesses - two full read+write passes - with most working running on-core. There's no chance of keeping it all on core, and even then two passes isn't much is it?

But I estimate that a simple radix-2 implementation executing on only two cores will be enough to saturate the external memory interface. FFT is pretty much memory bound even on a typical CPU so this isn't too surprising.

Tagged hacking, parallella.
borkenmacs | more fft musings
Copyright (C) 2019 Michael Zucchi, All Rights Reserved. Powered by gcc & me!