As the title says: I rewrote this accumulation loop in assembly, but it now takes more cycles than before. (Note: "Enable optimization" is checked in the DSP software.)
for (n = 0; n < 65; n++)
{
    dctArrayOut += a[n] * cos_11[n];
}
I replaced it with:
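.section program;
.global _a_dot_c_asm;
_a_dot_c_asm:
    P0 = R0;            // first argument: pointer to a[]
    I0 = R1;            // second argument: pointer to cos_11[]
    P1 = 64;            // loop count; the 65th product is handled after the loop
    R0 = 0;             // accumulator
    NOP;
    R1 = [P0++];        // preload a[0]
    R2 = [I0++];        // preload cos_11[0]
    LSETUP (begin_loop, end_loop) LC0 = P1;
begin_loop: R1 *= R2;   // 32-bit multiply
    R2 = [I0++];
end_loop: R0 = R0 + R1 (NS) || R1 = [P0++] || NOP;
    R1 *= R2;           // final product
    R0 = R0 + R1;
    R0 = R0 >> 19;      // scale the sum
    RTS;
_a_dot_c_asm.end: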
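For reference, here is a rough C model of what I intend the assembly routine to compute, so the two versions can be compared on the same data. This is only a sketch under assumptions: the name a_dot_c_model is made up for illustration, both arrays are assumed to hold 32-bit integers, the multiply and add are assumed to wrap at 32 bits (no saturation), and the final >> 19 is assumed to be a logical shift.

```c
#include <stdint.h>

#define N 65  /* same element count as the C loop above */

/* Hypothetical C model of _a_dot_c_asm (not the original code).
 * Assumes 32-bit integer inputs; unsigned arithmetic is used so that
 * the multiply and add wrap at 32 bits and the shift is logical. */
static uint32_t a_dot_c_model(const int32_t *a, const int32_t *c)
{
    uint32_t acc = 0;

    for (int n = 0; n < N; n++) {
        /* keep the low 32 bits of each product, then accumulate */
        acc += (uint32_t)a[n] * (uint32_t)c[n];
    }

    /* scale the sum the same way as the last step of the assembly */
    return acc >> 19;
}
```

For example, with every a[n] set to 1 << 10 and every c[n] set to 1 << 9, each product is 1 << 19, so the model returns 65. Running the same inputs through the assembly routine is a quick way to check that the two versions agree before comparing cycle counts.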
I really can't figure out why the cycle count went up. Shouldn't assembly be faster than C?
Could anyone help explain this? Thanks~