MaX*_*leR 6 c# arrays audio split
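I'm looking for the fastest way to de/interleave a buffer. More specifically, I'm dealing with audio data, so I'm trying to optimize the time spent splitting/combining channels and FFT buffers.
Currently I use a for loop with two index variables for each array, so the only arithmetic is additions, but with all the managed array bounds checks it cannot compete with a C pointer approach.
I like the Buffer.BlockCopy and Array.Copy methods, which save a lot of time when I process channels, but there is no way for an array to have a custom indexer.
I tried to find a way to make an array mask, meaning a fake array with a custom indexer, but that proved to be two times slower when used in my FFT operation. I guess the compiler can pull a lot of optimization tricks when an array is accessed directly, but it cannot optimize access through a class indexer.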
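For reference, an array mask of the kind described above might look like the sketch below. The class name `ChannelView` and its members are hypothetical and are not the code that was actually benchmarked. Every element access goes through the indexer, which does a multiply and an add before the array's own bounds check, so it is plausible that the JIT cannot optimize it as well as a loop over a plain array.

```csharp
using System;

// Hypothetical "array mask": a read/write view over one channel of an
// interleaved buffer. The name ChannelView and its members are assumptions.
public sealed class ChannelView
{
    private readonly float[] _buffer;
    private readonly int _channel;
    private readonly int _channels;

    public ChannelView(float[] buffer, int channel, int channels)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        if (channel < 0 || channel >= channels) throw new ArgumentOutOfRangeException(nameof(channel));
        _buffer = buffer;
        _channel = channel;
        _channels = channels;
    }

    // Number of samples visible through this view.
    public int Length => _buffer.Length / _channels;

    // Maps a per-channel index onto the interleaved buffer.
    public float this[int i]
    {
        get => _buffer[i * _channels + _channel];
        set => _buffer[i * _channels + _channel] = value;
    }
}
```

A view like this avoids copying, but any algorithm written against it pays for the indexer on every sample.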
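I don't want an unsafe solution, although from the looks of it that might be the only way to optimize this type of operation.
Thanks.
Here is what I'm doing right now: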
private float[][] DeInterleave(float[] buffer, int channels)
{
    float[][] tempbuf = new float[channels][];
    int length = buffer.Length / channels;
    for (int c = 0; c < channels; c++)
    {
        tempbuf[c] = new float[length];
        for (int i = 0, offset = c; i < tempbuf[c].Length; i++, offset += channels)
            tempbuf[c][i] = buffer[offset];
    }
    return tempbuf;
}
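The question also mentions combining channels, but only the split direction is shown. A minimal sketch of the inverse operation, written in the same style, could look like this. The names `ChannelBuffers` and `Interleave` are illustrative assumptions, and the sketch assumes every channel array has the same length.

```csharp
using System;

public static class ChannelBuffers
{
    // Inverse of DeInterleave: merges per-channel arrays into one buffer.
    // Assumes every channel has the same length as channels[0].
    public static float[] Interleave(float[][] channels)
    {
        int count = channels.Length;
        if (count == 0) return new float[0];

        int length = channels[0].Length;
        float[] buffer = new float[length * count];
        for (int c = 0; c < count; c++)
        {
            // Cache the inner array in a local so the loop bound is a plain array length.
            float[] source = channels[c];
            if (source.Length != length)
                throw new ArgumentException("All channels must have the same length.", nameof(channels));

            for (int i = 0, offset = c; i < source.Length; i++, offset += count)
                buffer[offset] = source[i];
        }
        return buffer;
    }
}
```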
I ran some tests, and here is the code I tested:
delegate(float[] inout)
{ // My Original Code
    float[][] tempbuf = new float[2][];
    int length = inout.Length / 2;
    for (int c = 0; c < 2; c++)
    {
        tempbuf[c] = new float[length];
        for (int i = 0, offset = c; i < tempbuf[c].Length; i++, offset += 2)
            tempbuf[c][i] = inout[offset];
    }
}
delegate(float[] inout)
{ // jerryjvl's recommendation: loop unrolling
    float[][] tempbuf = new float[2][];
    int length = inout.Length / 2;
    for (int c = 0; c < 2; c++)
        tempbuf[c] = new float[length];
    for (int ix = 0, i = 0; ix < length; ix++)
    {
        tempbuf[0][ix] = inout[i++];
        tempbuf[1][ix] = inout[i++];
    }
}
delegate(float[] inout)
{ // Unsafe Code
    unsafe
    {
        float[][] tempbuf = new float[2][];
        int length = inout.Length / 2;
        fixed (float* buffer = inout)
            for (int c = 0; c < 2; c++)
            {
                tempbuf[c] = new float[length];
                float* offset = buffer + c;
                fixed (float* buffer2 = tempbuf[c])
                {
                    float* p = buffer2;
                    for (int i = 0; i < length; i++, offset += 2)
                        *p++ = *offset;
                }
            }
    }
}
delegate(float[] inout)
{ // Modifying my original code to see if the compiler is not as smart as i think it is.
    float[][] tempbuf = new float[2][];
    int length = inout.Length / 2;
    for (int c = 0; c < 2; c++)
    {
        float[] buf = tempbuf[c] = new float[length];
        for (int i = 0, offset = c; i < buf.Length; i++, offset += 2)
            buf[i] = inout[offset];
    }
}
And the results (buffer size = 2^17, iterations timed per test = 200):
Average for test #1: 0.001286 seconds +/- 0.000026
Average for test #2: 0.001193 seconds +/- 0.000025
Average for test #3: 0.000686 seconds +/- 0.000009
Average for test #4: 0.000847 seconds +/- 0.000008
Average for test #1: 0.001210 seconds +/- 0.000012
Average for test #2: 0.001048 seconds +/- 0.000012
Average for test #3: 0.000690 seconds +/- 0.000009
Average for test #4: 0.000883 seconds +/- 0.000011
Average for test #1: 0.001209 seconds +/- 0.000015
Average for test #2: 0.001060 seconds +/- 0.000013
Average for test #3: 0.000695 seconds +/- 0.000010
Average for test #4: 0.000861 seconds +/- 0.000009
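I got similar results on every run. Obviously the unsafe code is the fastest, but I was surprised to see that the CLR couldn't figure out that it can drop the index checks when dealing with a jagged array. Maybe someone can think of more ways to optimize my tests.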
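The post does not show how the averages were collected. One way to produce figures of that shape is a small `Stopwatch` harness like the sketch below. The names `Bench` and `Measure` are hypothetical, and treating the "+/-" figure as a sample standard deviation is an assumption, since the post does not say what it represents.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs `test` `iterations` times on `input` and reports the mean time in
    // seconds plus the sample standard deviation. The choice of standard
    // deviation for the spread is an assumption, not taken from the post.
    public static void Measure(Action<float[]> test, float[] input, int iterations,
                               out double mean, out double stdDev)
    {
        test(input); // warm-up call so JIT compilation is not timed

        double[] seconds = new double[iterations];
        Stopwatch watch = new Stopwatch();
        for (int n = 0; n < iterations; n++)
        {
            watch.Restart();
            test(input);
            watch.Stop();
            seconds[n] = watch.Elapsed.TotalSeconds;
        }

        double sum = 0;
        foreach (double s in seconds) sum += s;
        mean = sum / iterations;

        double squares = 0;
        foreach (double s in seconds) squares += (s - mean) * (s - mean);
        stdDev = iterations > 1 ? Math.Sqrt(squares / (iterations - 1)) : 0;
    }
}
```

Each of the `delegate(float[] inout)` bodies above can be passed as the `test` argument, for example with a `new float[1 << 17]` input and 200 iterations.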
Edit: I tried loop unrolling with the unsafe code and it had no effect. I also tried optimizing the loop unrolling method:
delegate(float[] inout)
{
    float[][] tempbuf = new float[2][];
    int length = inout.Length / 2;
    float[] tempbuf0 = tempbuf[0] = new float[length];
    float[] tempbuf1 = tempbuf[1] = new float[length];
    for (int ix = 0, i = 0; ix < length; ix++)
    {
        tempbuf0[ix] = inout[i++];
        tempbuf1[ix] = inout[i++];
    }
}
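The results are hit-or-miss compared with test #4, with about a 1% difference either way. Test #4 is my best approach so far.
As I told jerryjvl, the problem is getting the CLR to skip the index check on the input buffer, since adding a second condition (&& offset < inout.Length) would only slow it down...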
Edit 2: I ran the earlier tests inside the IDE, so here are the results from running outside it:
2^17 items, repeated 200 times
******************************************
Average for test #1: 0.000533 seconds +/- 0.000017
Average for test #2: 0.000527 seconds +/- 0.000016
Average for test #3: 0.000407 seconds +/- 0.000008
Average for test #4: 0.000374 seconds +/- 0.000008
Average for test #5: 0.000424 seconds +/- 0.000009
2^17 items, repeated 200 times
******************************************
Average for test #1: 0.000547 seconds +/- 0.000016
Average for test #2: 0.000732 seconds +/- 0.000020
Average for test #3: 0.000423 seconds +/- 0.000009
Average for test #4: 0.000360 seconds +/- 0.000008
Average for test #5: 0.000406 seconds +/- 0.000008
2^18 items, repeated 200 times
******************************************
Average for test #1: 0.001295 seconds +/- 0.000036
Average for test #2: 0.001283 seconds +/- 0.000020
Average for test #3: 0.001085 seconds +/- 0.000027
Average for test #4: 0.001035 seconds +/- 0.000025
Average for test #5: 0.001130 seconds +/- 0.000025
2^18 items, repeated 200 times
******************************************
Average for test #1: 0.001234 seconds +/- 0.000026
Average for test #2: 0.001319 seconds +/- 0.000023
Average for test #3: 0.001309 seconds +/- 0.000025
Average for test #4: 0.001191 seconds +/- 0.000026
Average for test #5: 0.001196 seconds +/- 0.000022
Test#1 = My Original Code
Test#2 = Optimized safe loop unrolling
Test#3 = Unsafe code - loop unrolling
Test#4 = Unsafe code
Test#5 = My Optimized Code
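It looks like loop unrolling is not favorable. My optimized code is still my best approach, with only about a 10% difference compared to the unsafe code. If only I could tell the compiler that (i < buf.Length) implies (offset < inout.Length), it would drop the check on inout[offset] and I would basically get unsafe performance.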
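One safe variation that could be tried is to flip the loop so that the input index, rather than the output index, is the one compared against an array length. The sketch below is an untested idea, not a measured improvement: the class name `InputBoundSplit` is hypothetical, it requires the input length to be a multiple of the channel count, and whether the JIT actually removes either bounds check depends on the runtime version, so it would need to be benchmarked alongside the tests above.

```csharp
using System;

public static class InputBoundSplit
{
    // De-interleaves with the loop condition placed on the input index.
    // Requires inout.Length to be a multiple of channels, so that the output
    // index can never run past the end of the per-channel array.
    public static float[][] DeInterleave(float[] inout, int channels)
    {
        if (channels <= 0) throw new ArgumentOutOfRangeException(nameof(channels));
        if (inout.Length % channels != 0)
            throw new ArgumentException("Length must be a multiple of the channel count.", nameof(inout));

        int length = inout.Length / channels;
        float[][] result = new float[channels][];
        for (int c = 0; c < channels; c++)
        {
            float[] buf = result[c] = new float[length];
            // The condition bounds `offset` by inout.Length; buf[i] is now the
            // access whose range the loop condition does not state directly.
            for (int i = 0, offset = c; offset < inout.Length; i++, offset += channels)
                buf[i] = inout[offset];
        }
        return result;
    }
}
```

This only moves the unproven bound from the read to the write, so at best it trades one check for the other.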