Tags: java, optimization, performance, micro-optimization, neural-network
I am trying to build a Java port of a simple feedforward neural network.
This obviously involves lots of numeric calculations, so I am trying to optimize my central loop as much as possible. The results should be correct within the limits of the float data type.
My current code looks like this (error handling and initialization removed):
/**
 * Simple implementation of a feedforward neural network. The network supports
 * including a bias neuron with a constant output of 1.0 and weighted synapses
 * to hidden and output layers.
 *
 * @author Martin Wiboe
 */
public class FeedForwardNetwork {
    private final int outputNeurons;    // No of neurons in output layer
    private final int inputNeurons;     // No of neurons in input layer
    private int largestLayerNeurons;    // No of neurons in largest layer
    private final int numberLayers;     // No of layers
    private final int[] neuronCounts;   // Neuron count in each layer, 0 is input layer.
    private final float[][][] fWeights; // Weights between neurons.
                                        // fWeight[fromLayer][fromNeuron][toNeuron]
                                        // is the weight from fromNeuron in
                                        // fromLayer to toNeuron in layer fromLayer+1.
    private float[][] neuronOutput;     // Temporary storage of output from previous layer

    public float[] compute(float[] input) {
        // Copy input values to input layer output
        for (int i = 0; i < inputNeurons; i++) {
            neuronOutput[0][i] = input[i];
        }

        // Loop through layers
        for (int layer = 1; layer < numberLayers; layer++) {
            // Loop over neurons in the layer and determine weighted input sum
            for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) {
                // Bias neuron is the last neuron in the previous layer
                int biasNeuron = neuronCounts[layer - 1];

                // Get weighted input from bias neuron - output is always 1.0
                float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron];

                // Get weighted inputs from rest of neurons in previous layer
                for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
                    activation += neuronOutput[layer - 1][inputNeuron] * fWeights[layer - 1][inputNeuron][neuron];
                }

                // Store neuron output for next round of computation
                neuronOutput[layer][neuron] = sigmoid(activation);
            }
        }

        // Return output from network = output from last layer
        float[] result = new float[outputNeurons];
        for (int i = 0; i < outputNeurons; i++)
            result[i] = neuronOutput[numberLayers - 1][i];
        return result;
    }

    private final static float sigmoid(final float input) {
        return (float) (1.0F / (1.0F + Math.exp(-1.0F * input)));
    }
}
I am running the JVM with the -server option, and so far my code is between 25% and 50% slower than similar C code. What can I do to improve this?
Thanks,
Martin Wiboe
Edit #1: After seeing the large number of responses, I should probably clarify the numbers in our scenario. During a typical run, the method will be called about 50,000 times with different inputs. A typical network would have numberLayers = 3 layers with 190, 2 and 1 neurons, respectively. The innermost loop will therefore have about 2*191 + 3 = 385 iterations (when counting the added bias neuron in layers 0 and 1).
Edit #2: After implementing the various suggestions in this thread, our implementation is practically as fast as the C version (within ~2%). Thanks for all the help! All of the suggestions were helpful, but since I can only mark one answer as correct, I will give it to @Durandal for both suggesting the array optimizations and being the only one to precalculate the for loop header.
Some tips. In your innermost loop, think about how you are traversing the CPU cache and rearrange your weight matrix so that the index that changes fastest in the loop is the last one, making the accesses sequential, e.g.:

activation += neuronOutput[layer - 1][inputNeuron] * fWeights[layer - 1][neuron][inputNeuron];
Don't do work inside the loop (on every iteration) that can be done outside it (once). Don't perform the [layer - 1] lookup every time when you can put it in a local variable. Your IDE should be able to refactor this easily.
Multidimensional arrays in Java are not as efficient as in C. They are actually multiple layers of single-dimensional arrays. You can restructure the code so that you only use a single-dimensional array.
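A minimal sketch of what such a restructuring could look like, assuming one layer's weights are flattened into a single row-major float[] laid out as [toNeuron][fromNeuron] (the method and variable names here are illustrative, not from the question's code):

// Hypothetical sketch: one layer's weights in a single row-major float[],
// laid out as [toNeuron][fromNeuron] so the innermost loop walks memory
// sequentially. The bias weight is stored as the last entry of each row.
static void computeLayer(float[] prevOutput, float[] layerWeights,
        float[] output, int fromCount, int toCount) {
    int rowLength = fromCount + 1; // fromCount inputs plus one bias weight
    for (int neuron = 0; neuron < toCount; neuron++) {
        int rowStart = neuron * rowLength;
        // start with the bias weight (the bias output is always 1.0)
        float activation = layerWeights[rowStart + fromCount];
        for (int inputNeuron = 0; inputNeuron < fromCount; inputNeuron++) {
            activation += prevOutput[inputNeuron] * layerWeights[rowStart + inputNeuron];
        }
        output[neuron] = (float) (1.0 / (1.0 + Math.exp(-activation)));
    }
}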
Don't return a new array when you can pass the result array in as a parameter. (This saves creating a new object on every call.)
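A sketch of what that signature change could look like (it assumes the fields from the question's class and is not the poster's final code):

// The caller allocates the result buffer once and reuses it across the
// roughly 50,000 calls, instead of compute() allocating a new float[] each time.
public void compute(float[] input, float[] result) {
    // ... forward pass exactly as before ...
    System.arraycopy(neuronOutput[numberLayers - 1], 0, result, 0, outputNeurons);
}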
Instead of computing layer - 1 all over the place, why not keep a variable layer1 = layer - 1 and use layer1 + 1 instead of layer?
First, don't do this:
// Copy input values to input layer output
for (int i = 0; i < inputNeurons; i++) {
    neuronOutput[0][i] = input[i];
}
but this:
System.arraycopy( input, 0, neuronOutput[0], 0, inputNeurons );
Disregarding the actual math, array indexing in Java is itself a performance hog. Consider that Java has no real multidimensional arrays but rather implements them as arrays of arrays. In your innermost loop you access multiple indices, some of which are in fact constant within that loop. Part of the array access can be moved outside of the loop:
final float[] neuronOutputSlice = neuronOutput[layer - 1];
final float[][] fWeightsSlice = fWeights[layer - 1];
for (int inputNeuron = 0; inputNeuron < biasNeuron; inputNeuron++) {
    activation += neuronOutputSlice[inputNeuron] * fWeightsSlice[inputNeuron][neuron];
}
Whether the server JIT already performs a similar code-invariant motion, the only way to find out is to change the code and profile it. On the client JIT this should improve performance in any case. Another thing you can try is to precalculate the for-loop exit condition, like this:
for (int neuron = 0; neuron < neuronCounts[layer]; neuron++) { ... }
// transform to precalculated exit condition (move invariant array access outside loop)
for (int neuron = 0, neuronCount = neuronCounts[layer]; neuron < neuronCount; neuron++) { ... }
The JIT may already be doing this for you, so profile whether it actually helps.
Also, is there a point to multiplying by 1.0F that escapes me here?:
float activation = 1.0F * fWeights[layer - 1][biasNeuron][neuron];
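Since the bias output is a constant 1.0, the multiplication could simply be dropped, roughly like this (a trivial sketch, not the poster's final code):

float activation = fWeights[layer - 1][biasNeuron][neuron]; // bias output is always 1.0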
Other things that could potentially improve speed at the cost of readability: inline the sigmoid() function manually (the JIT has a very tight size limit for inlining, and the function might exceed it). It can be slightly faster to run a loop backwards (where it doesn't change the outcome, of course), since testing the loop index against zero is a little cheaper than checking against a local variable (the innermost loop is a potential candidate again, but don't expect the output to be 100% identical in all cases, since adding floats a + b + c is potentially not the same as a + c + b).
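A rough sketch of both ideas combined, reusing the slice variables from the snippet above; whether it is actually faster depends on the JIT, and the reversed summation order may change the last bits of the float result:

// Innermost loop run backwards, testing the index against zero.
float activation = fWeights[layer - 1][biasNeuron][neuron];
for (int inputNeuron = biasNeuron - 1; inputNeuron >= 0; inputNeuron--) {
    activation += neuronOutputSlice[inputNeuron] * fWeightsSlice[inputNeuron][neuron];
}
// sigmoid() inlined by hand instead of calling the helper method.
neuronOutput[layer][neuron] = (float) (1.0 / (1.0 + Math.exp(-activation)));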