Efficient neighborhood texture access for convolution in GLSL ES 1.1

Ale*_*int 6 iphone opengl-es glsl

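I'm doing a convolution with a 3x3 kernel in an iPhone shader written in GLSL ES 1.1. Currently I'm doing 9 texture lookups. Is there a faster way? Some ideas:

  • Pass the input image as a buffer rather than a texture, to avoid invoking texture interpolation.

  • Pass 9 different vec2 coordinates from the vertex shader (rather than the single one I pass now), to encourage the processor to prefetch the texture efficiently.

  • Look through the various Apple extensions that might be applicable here.

  • (Added) Investigate an ES equivalent of the GLSL shaderOffset call (it is not available under ES, but perhaps there is an equivalent).

In terms of hardware, I'm particularly focused on the iPhone 4S.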

Bra*_*son 7

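Are you sure you don't mean OpenGL ES 2.0? You can't do any kind of shaders with OpenGL ES 1.1. I'll assume the former.

In my experience, the fastest way to do this is the second item you list. I do several types of 3x3 convolutions in my GPUImage framework (which you could use rather than trying to roll your own). For those, I feed in the texture offsets in the horizontal and vertical directions and calculate the nine required texture coordinates within the vertex shader. From there, I pass those as varyings to the fragment shader.

This (for the most part) avoids dependent texture reads in the fragment shader, which are very expensive on the iOS PowerVR GPUs. I say "for the most part" because on older devices like the iPhone 4, only eight of those varyings can be used to avoid a dependent texture read. As I learned last week, the ninth triggers a dependent texture read on older devices, so it can slow things down. The iPhone 4S does not have this problem, however, because it supports a larger number of varyings used in this way.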
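For contrast, a dependent texture read is one whose coordinate is computed inside the fragment shader instead of arriving unmodified in a varying. The fragment shader below is a minimal sketch of that slower pattern, reduced to three horizontal taps. It is not code from GPUImage, and it assumes a `texelWidth` uniform holding the horizontal distance between adjacent texels in texture coordinates.

```glsl
// Illustration only: the pattern the varying-based approach is meant to avoid.
precision highp float;

uniform sampler2D inputImageTexture;
uniform highp float texelWidth;  // assumed: horizontal distance between adjacent texels

varying vec2 textureCoordinate;

void main()
{
    // Non-dependent read: the coordinate is an unmodified varying.
    mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);

    // Dependent reads: the coordinates are computed here, in the fragment shader.
    mediump vec4 leftColor = texture2D(inputImageTexture, textureCoordinate - vec2(texelWidth, 0.0));
    mediump vec4 rightColor = texture2D(inputImageTexture, textureCoordinate + vec2(texelWidth, 0.0));

    // Simple three-tap box blur, just to give the samples a use.
    gl_FragColor = (leftColor + centerColor + rightColor) / 3.0;
}
```

Moving the two offset calculations into the vertex shader and passing the results as varyings turns all three lookups into non-dependent reads.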

I use the following for the vertex shader:

 attribute vec4 position;
 attribute vec4 inputTextureCoordinate;

 uniform highp float texelWidth;   // horizontal offset between adjacent texels
 uniform highp float texelHeight;  // vertical offset between adjacent texels

 varying vec2 textureCoordinate;
 varying vec2 leftTextureCoordinate;
 varying vec2 rightTextureCoordinate;

 varying vec2 topTextureCoordinate;
 varying vec2 topLeftTextureCoordinate;
 varying vec2 topRightTextureCoordinate;

 varying vec2 bottomTextureCoordinate;
 varying vec2 bottomLeftTextureCoordinate;
 varying vec2 bottomRightTextureCoordinate;

 void main()
 {
     gl_Position = position;

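     // One-texel steps along each axis and along the two diagonals
     vec2 widthStep = vec2(texelWidth, 0.0);
     vec2 heightStep = vec2(0.0, texelHeight);
     vec2 widthHeightStep = vec2(texelWidth, texelHeight);
     vec2 widthNegativeHeightStep = vec2(texelWidth, -texelHeight);

     // Compute all nine sample coordinates here so that the fragment shader
     // receives them as plain varyings
     textureCoordinate = inputTextureCoordinate.xy;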
     leftTextureCoordinate = inputTextureCoordinate.xy - widthStep;
     rightTextureCoordinate = inputTextureCoordinate.xy + widthStep;

     topTextureCoordinate = inputTextureCoordinate.xy - heightStep;
     topLeftTextureCoordinate = inputTextureCoordinate.xy - widthHeightStep;
     topRightTextureCoordinate = inputTextureCoordinate.xy + widthNegativeHeightStep;

     bottomTextureCoordinate = inputTextureCoordinate.xy + heightStep;
     bottomLeftTextureCoordinate = inputTextureCoordinate.xy - widthNegativeHeightStep;
     bottomRightTextureCoordinate = inputTextureCoordinate.xy + widthHeightStep;
 }

And the fragment shader:

 precision highp float;

 uniform sampler2D inputImageTexture;

 uniform mediump mat3 convolutionMatrix;

 varying vec2 textureCoordinate;
 varying vec2 leftTextureCoordinate;
 varying vec2 rightTextureCoordinate;

 varying vec2 topTextureCoordinate;
 varying vec2 topLeftTextureCoordinate;
 varying vec2 topRightTextureCoordinate;

 varying vec2 bottomTextureCoordinate;
 varying vec2 bottomLeftTextureCoordinate;
 varying vec2 bottomRightTextureCoordinate;

 void main()
 {
     mediump vec4 bottomColor = texture2D(inputImageTexture, bottomTextureCoordinate);
     mediump vec4 bottomLeftColor = texture2D(inputImageTexture, bottomLeftTextureCoordinate);
     mediump vec4 bottomRightColor = texture2D(inputImageTexture, bottomRightTextureCoordinate);
     mediump vec4 centerColor = texture2D(inputImageTexture, textureCoordinate);
     mediump vec4 leftColor = texture2D(inputImageTexture, leftTextureCoordinate);
     mediump vec4 rightColor = texture2D(inputImageTexture, rightTextureCoordinate);
     mediump vec4 topColor = texture2D(inputImageTexture, topTextureCoordinate);
     mediump vec4 topRightColor = texture2D(inputImageTexture, topRightTextureCoordinate);
     mediump vec4 topLeftColor = texture2D(inputImageTexture, topLeftTextureCoordinate);

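     // Weighted sum of the nine samples: convolutionMatrix[0] weights the top row,
     // [1] the middle row, and [2] the bottom row of the neighborhood
     mediump vec4 resultColor = topLeftColor * convolutionMatrix[0][0] + topColor * convolutionMatrix[0][1] + topRightColor * convolutionMatrix[0][2];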
     resultColor += leftColor * convolutionMatrix[1][0] + centerColor * convolutionMatrix[1][1] + rightColor * convolutionMatrix[1][2];
     resultColor += bottomLeftColor * convolutionMatrix[2][0] + bottomColor * convolutionMatrix[2][1] + bottomRightColor * convolutionMatrix[2][2];

     gl_FragColor = resultColor;
 }
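The way the fragment shader indexes `convolutionMatrix` determines how the nine weights have to be laid out. As a sketch, here is the same shader with the uniform replaced by a hard-coded constant. The name `exampleKernel` and the emboss weights are hypothetical choices for illustration, not something taken from GPUImage. GLSL `mat3` constructors fill column by column, so the first three values become `exampleKernel[0]`, which this shader applies to the top row of samples.

```glsl
precision highp float;

uniform sampler2D inputImageTexture;

// Hypothetical emboss kernel. Each line below fills one mat3 column, and
// the shader treats column i as the weights for row i of the neighborhood.
const mediump mat3 exampleKernel = mat3(
    -2.0, -1.0, 0.0,   // [0][0..2]: topLeft, top, topRight
    -1.0,  1.0, 1.0,   // [1][0..2]: left, center, right
     0.0,  1.0, 2.0);  // [2][0..2]: bottomLeft, bottom, bottomRight

varying vec2 textureCoordinate;
varying vec2 leftTextureCoordinate;
varying vec2 rightTextureCoordinate;
varying vec2 topTextureCoordinate;
varying vec2 topLeftTextureCoordinate;
varying vec2 topRightTextureCoordinate;
varying vec2 bottomTextureCoordinate;
varying vec2 bottomLeftTextureCoordinate;
varying vec2 bottomRightTextureCoordinate;

void main()
{
    // Same weighted sum as above, with the samples fetched inline.
    mediump vec4 resultColor =
        texture2D(inputImageTexture, topLeftTextureCoordinate) * exampleKernel[0][0] +
        texture2D(inputImageTexture, topTextureCoordinate) * exampleKernel[0][1] +
        texture2D(inputImageTexture, topRightTextureCoordinate) * exampleKernel[0][2];
    resultColor +=
        texture2D(inputImageTexture, leftTextureCoordinate) * exampleKernel[1][0] +
        texture2D(inputImageTexture, textureCoordinate) * exampleKernel[1][1] +
        texture2D(inputImageTexture, rightTextureCoordinate) * exampleKernel[1][2];
    resultColor +=
        texture2D(inputImageTexture, bottomLeftTextureCoordinate) * exampleKernel[2][0] +
        texture2D(inputImageTexture, bottomTextureCoordinate) * exampleKernel[2][1] +
        texture2D(inputImageTexture, bottomRightTextureCoordinate) * exampleKernel[2][2];

    gl_FragColor = resultColor;
}
```

The same ordering should apply when the weights are supplied through the uniform instead: the nine floats passed to `glUniformMatrix3fv` are read column by column, so the first three end up in `convolutionMatrix[0]`. The weights in this example sum to 1, which leaves the alpha of a fully opaque image unchanged.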

Even with the above caveat, this shader runs in about 2 ms per 640x480 frame of video on an iPhone 4, and a 4S can easily handle 1080p video at 30 FPS with a shader like this.