
Luminary Micro Forums


paulkimelman

Junior Boarder

2008/05/23 15:23

Re:hand optimized FFT/IFFT for Cortex-M3 attached

imellen wrote:
ARM assembler uses pseudoinstructions such as LDR immediate and ADR that can be translated into different instructions.

Note that you can use .N and .W (narrow and wide) to force the instruction size (for example, .W lets you ensure alignment of the next instruction).

Address resolution is improving in both GCC's assembler and ARM's (I'm not sure about IAR's). This gives you more control over choosing the local (near) versus the far address form, but it does not yet allow the use of MOVW, which is a shame. The most notable model for address resolution is to load a symbol or section address into a register once and then use LDR/STR offsets from it for everything else (with the linker fixing up the offsets).

Just add the assembly file to the project, then declare and call the FFT functions from C code.

Although I agree that you can put the text into a .s file, you can also embed it in a C function. The C compiler does nothing more than pass it through to the assembler, but this makes it easier to stay within a normal C or C++ flow. Thanks, Paul


ProARM

Senior Boarder

2008/05/26 14:51


OK, I have the LM3S1968 kit with the Keil compiler.
It is a well-optimizing compiler, as anyone can easily see from the disassembly window.
Unfortunately for our purposes, it doesn't seem to be as clever with machine language (ML)...
After spending the weekend trying to write my first program in Cortex machine language, I discovered:
1) The Keil inline assembler supports only the ARM instruction set, so you can't use it to program the Cortex-M3, which is a Thumb-only processor.
Anyway, adding a .S file to the project makes it possible to use the "embedded assembler", and you can then call the ML routine from your C code.
2) I was happy until I discovered another BIG problem: Keil's debug mode gives you the timing profile ONLY in the software simulation of the processor. The guide says it won't show this information when the real processor is running...
I'm afraid this makes it impossible to measure the timing of the FFT routine on the real chip...

---------
By the way, I too started programming on the 6502 and Z80... but today Luminary is the first company in the world making the "urban legends" come true:
If you disable JTAG via software, it is disabled forever, because the flag is stored in a non-volatile bit.
Read here:
http://www.luminarymicro.com/component/option,com_joomlaboard/Itemid,92/func,view/id,374/catid,7/

The second issue arises when your program configures some of the JTAG pins as generic I/O. Again, JTAG will be disabled... On "Fury"-class devices there is a way to reset the chip, but on "Sandstorm" devices this situation is unrecoverable...
Read here:
http://www.luminarymicro.com/component/option,com_joomlaboard/Itemid,92/func,view/id,902/catid,5/
and here:
http://www.luminarymicro.com/component/option,com_joomlaboard/Itemid,92/func,view/id,1667/catid,5/


ProARM

Senior Boarder

2008/05/26 14:51


By the way, this forum behaves strangely...
Sometimes the messages are sorted from first to last, other times in reverse order, with the latest message first...
Is this happening to you too?...

Post edited by: ProARM, at: 2008/05/26 14:55


paulkimelman

Junior Boarder

2008/05/26 23:57


ProARM wrote:
1) The Keil inline assembler supports only the ARM set, so you can't program the Cortex-M3, which is a Thumb-only processor.

No, it will generate the correct Thumb-2 instructions. Note that Keil/ARM supports only the Unified Assembler Language (UAL), in which ARM, Thumb and Thumb-2 all use the same mnemonics.

2) I was happy until I discovered another BIG problem. Keil's debug mode gives you the timing profile ONLY within the software simulation...

You should be able to read the DWT cycle counter, which gives timing accurate to a single cycle, and you can also use the SWV (Serial Wire Viewer) output of the Keil tools. Both are very accurate.
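A minimal sketch of reading the DWT cycle counter from C on the target. The register addresses and bit positions are the ARMv7-M ones (DEMCR.TRCENA, DWT_CTRL.CYCCNTENA, DWT_CYCCNT); the helper names are hypothetical, and it assumes the part actually implements CYCCNT. The register accessors must only be called on the Cortex-M3 itself, while the conversion helper is plain arithmetic that also runs on a PC.

```c
#include <stdint.h>

/* ARMv7-M debug registers; only dereference these on the Cortex-M3 target. */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu) /* Debug Exception and Monitor Control */
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u) /* DWT control register */
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u) /* DWT cycle counter */

#define DEMCR_TRCENA       (1u << 24) /* power up the DWT/ITM blocks */
#define DWT_CTRL_CYCCNTENA (1u << 0)  /* start CYCCNT counting */

/* Hypothetical helper: enable and zero the cycle counter (assumes CYCCNT exists). */
static inline void cyccnt_start(void)
{
    DEMCR |= DEMCR_TRCENA;
    DWT_CYCCNT = 0;
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;
}

/* Hypothetical helper: current cycle count (wraps every 2^32 cycles). */
static inline uint32_t cyccnt_read(void)
{
    return DWT_CYCCNT;
}

/* Convert a cycle count to nanoseconds for a core clock of cpu_hz. */
static inline uint64_t cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    return (uint64_t)cycles * 1000000000u / cpu_hz;
}
```

On the target, something like `uint32_t t0 = cyccnt_read(); fftR4(y, x, 256); uint32_t n = cyccnt_read() - t0;` after one `cyccnt_start()` gives the cycles spent in one call; the unsigned subtraction stays correct across a single counter wrap, and `cycles_to_ns(n, 25000000u)` turns it into nanoseconds at 25 MHz.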

making the "urban legends" true:

You have always been able to "brick" MCUs and generally always will be. This is because, in the end, the security of a production part is more important than debuggability. However, normal code cannot cause these problems: it takes a deliberate effort and several operations to do this, whether in C or assembly. Fury parts solve this by allowing you to wipe the part clean (erase all flash and all pin controls) so that nothing is leaked.

As for your question about the forum, it seems to reverse the order when you log in: when logged out, posts are shown in chronological order; when logged in, the newest is first. Regards, Paul


ProARM

Senior Boarder

2008/05/29 14:41


Hmm, OK...
While I take a deeper look at the Keil IDE and its integrated assembler, I want to ask some questions about the FFT routine...

Ivan, please help me understand how your FFT+IFFT could be used in a useful application.


Do you think it would be suitable for accurate digital filtering of an audio signal? Or would a more specific routine be better for audio data?

For example, let's assume a mono channel at 44 kHz, 16-bit, and an FFT size of 256.
(By the way, does this size make sense for audio filtering, or should we consider a bigger one?)

At 44 kHz we have a sample every 22.7 µs, so a 256-sample block arrives every 5.81 ms.
Your routine takes 19425 cycles for the FFT and 19425 + 70 for the IFFT (if I understand correctly).

With the Cortex clocked at 25 MHz with zero wait states (latency 0), every cycle is 40 ns.
So running both routines takes 38920 x 40 ns = only 1.56 ms... it can be done!
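The budget arithmetic above can be checked with a few lines of integer C. This is a sketch: the cycle counts are the ones quoted in the post, the sample rate is taken as 44.1 kHz, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical result: time to acquire one block vs. time to process it. */
typedef struct {
    uint64_t block_ns;   /* duration of n samples at fs_hz */
    uint64_t process_ns; /* duration of `cycles` at cpu_hz */
} fft_budget;

/* Integer nanosecond budget for one FFT block (truncating division). */
static fft_budget fft_budget_check(uint32_t fs_hz, uint32_t n,
                                   uint32_t cycles, uint32_t cpu_hz)
{
    fft_budget b;
    b.block_ns   = (uint64_t)n * 1000000000u / fs_hz;
    b.process_ns = (uint64_t)cycles * 1000000000u / cpu_hz;
    return b;
}
```

With fs = 44100 Hz, N = 256 and 19425 + 19425 + 70 = 38920 cycles at 25 MHz, this gives a 5804988 ns block and 1556800 ns of processing, so the two transforms use about 27% of each block period and leave the rest for the frequency-domain work.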

As I understand it, if I want to filter the audio signal, I must modify the data array in the frequency domain before applying the inverse FFT... but let's talk about this later...

After taking a deeper look at your documentation, there are some details I don't understand:

1) 16-bit complex arithmetic, 1Q15 coefficients.
What does 1Q15 mean?

2) Your example uses arrays of 512 words each (short, -32768 to 32767). What is the FFT size for this example? 256, I think... But why 512? Are you "coupling" the data? If so, why?

3) The input array x is complex, but my audio data are not... How should I fill the input array? By leaving the imaginary parts = 0?

4) Would it be possible to rewrite your FFT routine to handle only real input, in order to save time and RAM?

5) You use three arrays for the data:
x (input) and y, z (both output)
Usage:

fftR4(y, x, 256); // y is in frequency domain y[128]=
ifftR4(z, y, 256); // z should be x/N + noise introduced by 16 bit truncating


Why should z be x/N? What does that mean? I thought that applying the FFT followed by the IFFT would give me back the input values (± some noise)...

6) For the x, y, z arrays, I presume this usage:
x: the input data -> in audio terms: the samples, with the imaginary parts always = 0
y: the frequency domain (real + imaginary) -> in audio terms: [amplitude + phase] at each frequency
z: the time-domain data again -> in audio terms: the output samples, with the imaginary parts always = 0

Am I wrong about this?
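The layout assumed in questions 2, 3 and 6 can be sketched as follows: N real samples become 2N interleaved 16-bit words, real part first and imaginary part set to 0. The (re, im) order and the helper names are assumptions for illustration, not taken from the posted routine's source.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack n real samples into an interleaved complex
   buffer of 2*n words: cplx[2*i] = real part, cplx[2*i+1] = imaginary (0). */
static void pack_real(int16_t *cplx, const int16_t *samples, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        cplx[2 * i]     = samples[i];
        cplx[2 * i + 1] = 0;
    }
}

/* Hypothetical helper: copy the real parts back out of an interleaved
   complex buffer (e.g. the z array after the inverse transform). */
static void unpack_real(int16_t *samples, const int16_t *cplx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        samples[i] = cplx[2 * i];
}
```

For N = 256 this layout needs exactly 512 words per array.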
- - - - - - - - - - - - - - - - -
Now the hard questions:

7) Can I reuse the x array if I don't need to keep the original data?
Example:
fftR4(y, x, 256);
ifftR4(x, y, 256);

8) What is the frequency range of the FFT if I use size 256 with 44 kHz audio (22.7 µs)?
And what would be the frequency step between consecutive positions of the y array?
What if I choose a bigger size?

9) How should I go about the audio filtering?
Is it OK to trim only the amplitude in the y array (frequency domain), leaving the phase untouched?
For example, if I set a range of the y array to 0, will I get a band-stop filter?

Thanks, that's all... ;)
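For questions 8 and 9, the standard relationships can be sketched in C: bin k of an N-point FFT sits at k·fs/N, and for real input the spectrum is conjugate-symmetric (Y[N-k] = conj(Y[k])), so a band-stop edit has to clear each bin together with its mirror to keep the inverse transform real. Zeroing both the real and imaginary parts removes a bin's amplitude; scaling both by the same real gain would change the amplitude and leave the phase alone. This is a single-buffer sketch with hypothetical helper names, assuming the interleaved (re, im) layout; it does nothing about block-edge effects on a continuous stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Centre frequency, in Hz, of bin k of an n-point FFT at sample rate fs_hz. */
static double bin_hz(size_t k, double fs_hz, size_t n)
{
    return (double)k * fs_hz / (double)n;
}

/* Hypothetical helper: zero bins k_lo..k_hi (inclusive, 1 <= k_lo <= k_hi
   <= n/2) of an interleaved (re, im) spectrum and their mirrors n-k, so the
   conjugate symmetry of a real signal's spectrum is preserved. */
static void band_stop(int16_t *spec, size_t n, size_t k_lo, size_t k_hi)
{
    for (size_t k = k_lo; k <= k_hi; k++) {
        size_t m = n - k; /* mirror bin; equals k when k == n/2 */
        spec[2 * k]     = 0;
        spec[2 * k + 1] = 0;
        spec[2 * m]     = 0;
        spec[2 * m + 1] = 0;
    }
}
```

With fs = 44100 Hz and N = 256, `bin_hz(1, 44100.0, 256)` is 172.265625 Hz and bin 128 is 22050 Hz (fs/2), the highest frequency a real signal sampled at that rate can carry. A larger N gives finer bins at the cost of a longer block and more cycles per transform.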


imellen

Fresh Boarder

2008/05/30 17:04


ProARM, most of the questions you've asked are beyond the scope of this forum. The FFT is a workhorse in many DSP algorithms; I suggest you look at some DSP literature on how to use it.
The fundamental question is not "I have an optimized FFT routine, where can I use it?" but "I need an optimized FFT for my application, where can I find one?"
Anyway, I'll try to answer most of your questions that relate directly to the posted FFT routine:
1) 1Q15 coefficients - this is a commonly used fixed-point notation. The number before Q is the number of integer bits (including the sign bit, if signed); the number after it is the number of fractional bits. A signed 1Q15 integer can represent values from -32768/32768 to +32767/32768, i.e. -1 to about 0.99997.
2) Example uses arrays of 512 words each - this is indeed for a 256-point FFT test (256 x two 16-bit integers: real, imaginary).
3,4) Input array x is complex, but input data are real - yes, this means imaginary part = 0. This is a somewhat inefficient use of a complex FFT, since the second half of the FFT output is a mirrored (complex-conjugate) copy of the first half. There are tricks, based on FFT properties, for calculating a real FFT of length 2N with a complex FFT of length N. As I said, this will be in version 2.0 (when I find more time for it).
5) FFT output scaled down by N - this is an inherent limitation of 16-bit arithmetic, needed to avoid overflow. You have to multiply the output by N to restore the original values.
6) x - FFT - IFFT - your assumptions are correct; keep the scaling issues in mind.
7) Can I reuse x - the FFT reads x and writes its output to y, so the x and y arrays have to be different. Your example will work; basic programming principles apply.
8) Frequency bin separation is fs/N, where fs is the sampling frequency and N is the FFT size. For fs = 44100 Hz and N = 256, the FFT bins are 172.265625 Hz apart.
9) Audio filtering in the frequency domain - this is a complicated issue; do some research on convolution in the frequency domain and on MP3 principles. The process you've described will work for a single buffer (N samples). For a continuous stream it will produce discontinuities at the buffer edges (you have to overlap buffers and use windowing).

Hope this helps.
Ivan
P.S.
I've finished an updated FFT routine that also supports odd powers of two (N = 8, 32, 128, 512 and 2048). I'll post it soon, after some cosmetic changes.
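One common way to cover odd powers of two with a radix-4 design (a general mixed-radix arrangement, not necessarily what the updated routine does) is to run floor(m/2) radix-4 stages for N = 2^m and add a single radix-2 stage when m is odd. A small planning helper, with hypothetical names:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stage plan for an N = 2^m point FFT built from radix-4 stages. */
typedef struct {
    unsigned radix4_stages; /* floor(m / 2) */
    bool     radix2_stage;  /* true when m is odd: N = 8, 32, 128, 512, 2048, ... */
} fft_plan;

/* Fill *plan for n points; returns false if n is not a power of two >= 2. */
static bool fft_plan_for(size_t n, fft_plan *plan)
{
    if (n < 2 || (n & (n - 1)) != 0)
        return false;
    unsigned m = 0;
    while ((n >> m) > 1) /* m = log2(n) */
        m++;
    plan->radix4_stages = m / 2;
    plan->radix2_stage  = (m % 2) != 0;
    return true;
}
```

For N = 256 (4^4) no radix-2 stage is needed, while N = 512 needs four radix-4 stages plus one radix-2 stage.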
