Metal compute pipeline not working on macOS, but working on iOS

san*_*iso 3 macos gpgpu ios metal

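I am trying to do some GPGPU computation with Metal. I have a basic Metal pipeline that:

  • creates the required MTLComputePipelineState and all associated objects (MTLComputeCommandEncoder, command queue, and so on);
  • creates a target texture for writing (using desc.usage = MTLTextureUsageShaderWrite;);
  • launches a basic shader that fills this texture with some values (in my experiments, either setting one color component to 1 or creating a grayscale gradient based on the thread coordinates);
  • reads the contents of this texture back from the GPU.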

I tested this code in two setups:

  • on OS X 10.11 with an early-2013 MacBook Pro;
  • on iOS 9 with an iPhone 6.

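The iOS version works fine, and I get exactly what I asked the shader to do. On OS X, however, I get a valid output texture (non-nil, with the correct size), but when I read the data back, every byte is 0.

Am I missing a step that is specific to the OS X implementation? This happens with both the NVIDIA GT650M and the Intel HD4000. Or could it be a bug in the runtime?

Since I currently have no idea how to investigate this further, any help on that front would also be greatly appreciated :-)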

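EDIT - My current implementation

Here is the initial (failing) state of my implementation. It is a bit long, but it is mostly boilerplate code for creating the pipeline: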

id<MTLDevice> device = MTLCreateSystemDefaultDevice();
id<MTLLibrary> library = [device newDefaultLibrary];
id<MTLCommandQueue> commandQueue = [device newCommandQueue];

NSError *error = nil;
id<MTLComputePipelineState> pipeline = [device newComputePipelineStateWithFunction:[library
                                                                                    newFunctionWithName:@"dummy"]
                                                                             error:&error];
if (error)
{
    NSLog(@"%@", [error localizedDescription]);
}
MTLTextureDescriptor *desc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm
                                                                                width:16
                                                                               height:1
                                                                            mipmapped:NO];
desc.usage = MTLTextureUsageShaderWrite;

id<MTLTexture> texture = [device newTextureWithDescriptor:desc];

MTLSize threadGroupCounts = MTLSizeMake(8, 1, 1);
MTLSize threadGroups = MTLSizeMake([texture width]  / threadGroupCounts.width,
                                   [texture height] / threadGroupCounts.height,
                                   1);

id<MTLCommandBuffer> commandBuffer = [commandQueue commandBuffer];

id<MTLComputeCommandEncoder> commandEncoder = [commandBuffer computeCommandEncoder];
[commandEncoder setComputePipelineState:pipeline];

[commandEncoder setTexture:texture atIndex:0];
[commandEncoder dispatchThreadgroups:threadGroups threadsPerThreadgroup:threadGroupCounts];
[commandEncoder endEncoding];

[commandBuffer commit];
[commandBuffer waitUntilCompleted];

The code used to get the data back is as follows (I split the file into two parts to get smaller code chunks):

// Get the data back
uint8_t* imageBytes = malloc([texture width] * [texture height] * 4);
assert(imageBytes);
MTLRegion region = MTLRegionMake2D(0, 0, [texture width], [texture height]);
[texture getBytes:imageBytes bytesPerRow:[texture width]*4 fromRegion:region mipmapLevel:0];
for (int i = 0; i < 16; ++i)
{
    NSLog(@"Pix = %d %d %d %d",
          *((uint8_t*)imageBytes + 4 * i),
          *((uint8_t*)imageBytes + 4 * i + 1),
          *((uint8_t*)imageBytes + 4 * i + 2),
          *((uint8_t*)imageBytes + 4 * i + 3));
}

And here is the shader code (it writes 1 to red and alpha, which should become 0xff in the output buffer when read on the host):

#include <metal_stdlib>

using namespace metal;

kernel void dummy(texture2d<float, access::write> outTexture [[ texture(0) ]],
                  uint2 gid [[ thread_position_in_grid ]])
{
    outTexture.write(float4(1.0, 0.0, 0.0, 1.0), gid);
}

小智 8

You need to synchronize the texture before reading it back:

[commandEncoder setTexture:texture atIndex:0];
[commandEncoder dispatchThreadgroups:threadGroups threadsPerThreadgroup:threadGroupCounts];
[commandEncoder endEncoding];

//
// synchronize texture from gpu to host mem
//
id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
[blitEncoder synchronizeTexture:texture slice:0 level:0];
[blitEncoder endEncoding];

[commandBuffer commit];
[commandBuffer waitUntilCompleted];

This was tested on a mid-2012 MacBook with the same GPUs as yours, and on a mid-2015 one with an AMD Radeon R9 M370X (2048 MB).
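A possible explanation, which is my assumption since the answer does not state the cause: on OS X a texture created with default options usually uses managed storage, so the CPU-side copy is not updated until a blit synchronizes it, whereas on iOS textures are shared and need no such step. Under that assumption, the synchronization could be wrapped in a small helper that only encodes the blit when it is needed. The name EncodeSyncIfManaged is hypothetical and not part of Metal:

```objc
#import <Metal/Metal.h>
#import <TargetConditionals.h>

// Hypothetical helper (not from the original answer): encodes a blit that
// makes GPU writes to `texture` visible to the CPU when the texture uses
// managed storage. Shared textures are left untouched.
static void EncodeSyncIfManaged(id<MTLCommandBuffer> commandBuffer,
                                id<MTLTexture> texture)
{
#if TARGET_OS_OSX
    // MTLStorageModeManaged is only available on macOS, hence the guard.
    if (texture.storageMode == MTLStorageModeManaged)
    {
        id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];
        [blitEncoder synchronizeResource:texture];
        [blitEncoder endEncoding];
    }
#endif
}
```

It would be called after [commandEncoder endEncoding] and before [commandBuffer commit], so the same code path can serve both platforms.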

Sometimes I use the following trick to get the texture data without synchronization:

id<MTLComputeCommandEncoder> commandEncoder = [commandBuffer computeCommandEncoder];
[commandEncoder setComputePipelineState:pipeline];

[commandEncoder setTexture:texture atIndex:0];
[commandEncoder dispatchThreadgroups:threadGroups threadsPerThreadgroup:threadGroupCounts];
[commandEncoder endEncoding];

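//
// one trick: copy the texture from GPU memory into a shared buffer
//
// Assumption: the answer does not show how texturebuffer is created; a
// CPU-visible buffer large enough for the texture could look like this.
id<MTLBuffer> texturebuffer = [device newBufferWithLength:[texture width] * [texture height] * 4
                                                  options:MTLResourceStorageModeShared];

id<MTLBlitCommandEncoder> blitEncoder = [commandBuffer blitCommandEncoder];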

[blitEncoder copyFromTexture:texture
                 sourceSlice: 0
                 sourceLevel: 0
                sourceOrigin: MTLOriginMake(0, 0, 0)
                  sourceSize: MTLSizeMake([texture width], [texture height], 1)
                    toBuffer: texturebuffer
           destinationOffset: 0
      destinationBytesPerRow: [texture width] * 4
    destinationBytesPerImage: 0];

[blitEncoder endEncoding];

[commandBuffer commit];
[commandBuffer waitUntilCompleted];


// Get the data back
uint8_t* imageBytes = [texturebuffer contents];

for (int i = 0; i < 16; ++i)
{
    NSLog(@"Pix = %d %d %d %d",
          *((uint8_t*)imageBytes + 4 * i),
          *((uint8_t*)imageBytes + 4 * i + 1),
          *((uint8_t*)imageBytes + 4 * i + 2),
          *((uint8_t*)imageBytes + 4 * i + 3));
}

Both approaches work correctly.

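  • Out of curiosity: does the original code (without the blit call) work on a Mac with an AMD card? I would like to check whether this is a driver-related issue or a more general issue with the Mac OS Metal implementation. (2 upvotes)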