Iterative Radix2-FFT on FPGA
Hey,
so now I'm ready with my first FFT Implementation.
Like I mentioned in my earlier Post I want to implement a Iterative Radix2 FFT Algorithm.
You can see a Pseudocode on Wikipedia: https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm#Data_reordering,_bit_reversal,_and_in-place_algorithms
I compared it with a C++ Implementation of it and it's relative exact.
With relative I mean that there are rounding Errors in it or Overflow Errors I don't exactly know.
The Picture of the Simulation Shows just the Imaginary Part. The Real Part is nearly 0 that means it's about -1 and 1.
Calculation Times@100Mhz Input Clock are about 56µs@128Samples and 23µs@64Samples.
So it's not a pretty fast Implementation and it's not really resource efficient I think. It would be also possible to work with lower Resolution actually I'm using 64bit Fixed Point Numbers (EN25_39).
Resource Usage at 128Samples is:
But better Implementations for Resource Efficient and faster Fourier Transform would be the Rader Algorithm or the Winograd Algorithm.
In Common.vhd you can find some Constants. These Constants are just the Real and Imaginary Part of a Complex Number with Amplitude 1 and Angle Theta.
Theta is (2*Pi)/unityStep
UnityStep is 0x1 left shifted to the Number of Samples.
Source Code:
Common.vhd
it_fft.vhd
it_fft_tb.vhd
so now I'm ready with my first FFT Implementation.
Like I mentioned in my earlier Post I want to implement a Iterative Radix2 FFT Algorithm.
You can see a Pseudocode on Wikipedia: https://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm#Data_reordering,_bit_reversal,_and_in-place_algorithms
I compared it with a C++ Implementation of it and it's relative exact.
With relative I mean that there are rounding Errors in it or Overflow Errors I don't exactly know.
The Picture of the Simulation Shows just the Imaginary Part. The Real Part is nearly 0 that means it's about -1 and 1.
Calculation Times@100Mhz Input Clock are about 56µs@128Samples and 23µs@64Samples.
So it's not a pretty fast Implementation and it's not really resource efficient I think. It would be also possible to work with lower Resolution actually I'm using 64bit Fixed Point Numbers (EN25_39).
Resource Usage at 128Samples is:
- Slice LUT's -> 18988 (29.95%)
- Slice Registers -> 17050 (13.45%)
- F7 Muxes -> 4352 (13.73%)
- F8 Muxes -> 2048 (12.92%)
- DSPs -> 32 (13.33%)
But better Implementations for Resource Efficient and faster Fourier Transform would be the Rader Algorithm or the Winograd Algorithm.
In Common.vhd you can find some Constants. These Constants are just the Real and Imaginary Part of a Complex Number with Amplitude 1 and Angle Theta.
Theta is (2*Pi)/unityStep
UnityStep is 0x1 left shifted to the Number of Samples.
Source Code:
Common.vhd
it_fft.vhd
it_fft_tb.vhd
Kommentare
Kommentar veröffentlichen