图像尺寸、ByteBuffer 大小和格式不匹配

Sha*_*aza 5 face-detection dart flutter google-mlkit

我正在尝试在 flutter 中制作一个人脸识别应用程序。大部分代码取自此处。该项目使用了 Firebase ML Vision(现已弃用),因此我遵循了Google ML Kit 的迁移指南。我对代码的人脸检测部分进行了更改。

以下是检测功能的代码:

Future<List<Face>> detect(CameraImage image, InputImageRotation rotation) {

    final faceDetector = GoogleMlKit.vision.faceDetector(
      const FaceDetectorOptions(
        mode: FaceDetectorMode.accurate,
        enableLandmarks: true,
      ), 
    );
    return  faceDetector.processImage(
      InputImage.fromBytes(
        bytes: image.planes[0].bytes,
        inputImageData:InputImageData(
          inputImageFormat:InputImageFormatMethods.fromRawValue(image.format.raw)!,
          size: Size(image.width.toDouble(), image.height.toDouble()),
          imageRotation: rotation,
          planeData: image.planes.map(
            (Plane plane) {
              return InputImagePlaneMetadata(
                bytesPerRow: plane.bytesPerRow,
                height: plane.height,
                width: plane.width,
              );
            },
          ).toList(),
        ),
      ),
    );
  }
Run Code Online (Sandbox Code Playgroud)

当我调用该函数时,出现以下错误: 错误 我无法弄清楚我哪里做错了。这是initializeCamera函数(其中调用了检测函数):

void _initializeCamera() async {
    
    CameraDescription description = await getCamera(_direction);

    InputImageRotation rotation = rotationIntToImageRotation(
      description.sensorOrientation,
    );


      _camera =
        CameraController(description, ResolutionPreset.ultraHigh, enableAudio: false);
  
    await _camera!.initialize();
    await loadModel();
    //await Future.delayed(const Duration(milliseconds: 500));
    tempDir = await getApplicationDocumentsDirectory();
    String _embPath = tempDir!.path + '/emb.json';
    jsonFile =  File(_embPath);
    if (jsonFile!.existsSync()) data = json.decode(jsonFile!.readAsStringSync());

    _camera!.startImageStream((CameraImage image)async {
      if (_camera != null) {
        if (_isDetecting) {
          return;
        }
        _isDetecting = true; 
        String res;
        dynamic finalResult = Multimap<String, Face>();
        List<Face> faces = await detect(image, rotation);  <------------------ Detect Function

        if (faces.isEmpty) {
          _faceFound = false;
        } else {
          _faceFound = true;
        }
        Face _face;
        imglib.Image convertedImage =
            _convertCameraImage(image, _direction);
        for (_face in faces) {
          double x, y, w, h;
          x = (_face.boundingBox.left - 10);
          y = (_face.boundingBox.top - 10);
          w = (_face.boundingBox.width + 10);
          h = (_face.boundingBox.height + 10);
          imglib.Image croppedImage = imglib.copyCrop(
              convertedImage, x.round(), y.round(), w.round(), h.round());
          croppedImage = imglib.copyResizeCropSquare(croppedImage, 112);
          // int startTime = new DateTime.now().millisecondsSinceEpoch;
          res = _recog(croppedImage);
          // int endTime = new DateTime.now().millisecondsSinceEpoch;
          // print("Inference took ${endTime - startTime}ms");
          finalResult.add(res, _face);
        }
        setState(() {
          _scanResults = finalResult;
        });
        _isDetecting = false;
      }
    });
  }
Run Code Online (Sandbox Code Playgroud)

编辑:我终于找到了解决方案

以下“检测”功能为我解决了问题:

Future<List<Face>> detect(CameraImage image, InputImageRotation rotation) {

final faceDetector = GoogleMlKit.vision.faceDetector(
  const FaceDetectorOptions(
    mode: FaceDetectorMode.accurate,
    enableLandmarks: true,
  ), 
);
final WriteBuffer allBytes = WriteBuffer();
for (final Plane plane in image.planes) {
  allBytes.putUint8List(plane.bytes);
}
final bytes = allBytes.done().buffer.asUint8List();

final Size imageSize =
    Size(image.width.toDouble(), image.height.toDouble());
final inputImageFormat =
    InputImageFormatMethods.fromRawValue(image.format.raw) ??
        InputImageFormat.NV21;
final planeData = image.planes.map(
  (Plane plane) {
    return InputImagePlaneMetadata(
      bytesPerRow: plane.bytesPerRow,
      height: plane.height,
      width: plane.width,
    );
  },
).toList();

final inputImageData = InputImageData(
  size: imageSize,
  imageRotation: rotation,
  inputImageFormat: inputImageFormat,
  planeData: planeData,
);

return  faceDetector.processImage(
  InputImage.fromBytes(
    bytes: bytes,
    inputImageData:inputImageData
  ),
);
Run Code Online (Sandbox Code Playgroud)

}

小智 0

问题出在这个函数上

faceDetector.processImage(
      InputImage.fromBytes(
        bytes: image.planes[0].bytes,
        inputImageData:InputImageData(
          inputImageFormat:InputImageFormatMethods.fromRawValue(image.format.raw)!,
          size: Size(image.width.toDouble(), image.height.toDouble()),
          imageRotation: rotation,
          planeData: image.planes.map(
            (Plane plane) {
              return InputImagePlaneMetadata(
                bytesPerRow: plane.bytesPerRow,
                height: plane.height,
                width: plane.width,
              );
            },
          ).toList(),
        ),
      ),
Run Code Online (Sandbox Code Playgroud)

解决方案不是仅获取第一个平面的字节,而是image.planes[0].bytes使用以下方法组合来自所有平面的字节

faceDetector.processImage(
      InputImage.fromBytes(
        bytes: Uint8List.fromList(
        image.planes.fold(
            <int>[],
            (List<int> previousValue, element) =>
                previousValue..addAll(element.bytes)),
        ),
        inputImageData:InputImageData(
          inputImageFormat:InputImageFormatMethods.fromRawValue(image.format.raw)!,
          size: Size(image.width.toDouble(), image.height.toDouble()),
          imageRotation: rotation,
          planeData: image.planes.map(
            (Plane plane) {
              return InputImagePlaneMetadata(
                bytesPerRow: plane.bytesPerRow,
                height: plane.height,
                width: plane.width,
              );
            },
          ).toList(),
        ),
      ),

Run Code Online (Sandbox Code Playgroud)

我认为这是因为ios和android CameraImage格式之间的差异。在 Android 上,CameraImage 有多个平面,并且所有平面都有字节数据,因此我们必须将它们全部组合起来。我不确定它在 Ios 上如何工作。