Swift iOS - Vision framework text recognition and rectangles

Fil*_*p Z 2 · tags: vision, uiview, text-recognition, ios, swift

I am trying to draw rectangles over the text regions found with the Vision framework, but they are always slightly off. This is how I do it:

    public func drawOccurrencesOnImage(_ occurrences: [CGRect], _ image: UIImage) -> UIImage? {
        UIGraphicsBeginImageContextWithOptions(image.size, false, 0.0)
        defer { UIGraphicsEndImageContext() }

        image.draw(at: .zero)
        let currentContext = UIGraphicsGetCurrentContext()

        currentContext?.addRects(occurrences)
        currentContext?.setStrokeColor(UIColor.red.cgColor)
        currentContext?.setLineWidth(2.0)
        currentContext?.strokePath()

        return UIGraphicsGetImageFromCurrentImageContext()
    }

But the rectangles in the returned image are always close, yet never quite right:

(three screenshots showing the slightly misplaced rectangles)

And this is how I create the boxes, exactly following Apple's sample:

    let boundingRects: [CGRect] = observations.compactMap { observation in
        guard let candidate = observation.topCandidates(1).first else { return .zero }

        let stringRange = candidate.string.startIndex..<candidate.string.endIndex
        let boxObservation = try? candidate.boundingBox(for: stringRange)

        let boundingBox = boxObservation?.boundingBox ?? .zero

        return VNImageRectForNormalizedRect(boundingBox,
                                            Int(UIViewController.chosenImage?.width ?? 0),
                                            Int(UIViewController.chosenImage?.height ?? 0))
    }

(Source: https://developer.apple.com/documentation/vision/recognizing_text_in_images)

Thanks.

Rob*_*Rob 5

The y coordinate of the `CGRect` returned by `VNImageRectForNormalizedRect` is flipped. (I suspect it was written for macOS, which uses a different coordinate system than iOS does.)
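To make the flip concrete, here is a minimal, self-contained sketch (Foundation only, with a made-up image size; `pixelRect` is an illustrative name, not part of Vision) showing how a normalized, bottom-left-origin rect maps to top-left-origin pixel coordinates:

```swift
import Foundation // CGRect and friends (CoreGraphics types re-exported by Foundation)

/// Convert a Vision-style normalized rect (origin at bottom-left, values in 0...1)
/// to pixel coordinates with the origin at the top-left, as UIKit expects.
func pixelRect(forNormalized boundingBox: CGRect,
               imageWidth: CGFloat, imageHeight: CGFloat) -> CGRect {
    CGRect(x: boundingBox.minX * imageWidth,
           y: (1 - boundingBox.maxY) * imageHeight, // flip the y-axis
           width: boundingBox.width * imageWidth,
           height: boundingBox.height * imageHeight)
}

// A box in the top-left quarter of the image, in Vision's coordinates:
let normalized = CGRect(x: 0.0, y: 0.75, width: 0.25, height: 0.25)
let converted = pixelRect(forNormalized: normalized, imageWidth: 400, imageHeight: 400)
print(converted) // a 100×100 box anchored at the top-left in UIKit coordinates
```

Without the `(1 - maxY)` term, the same box would land in the *bottom*-left quarter, which is exactly the vertical offset visible in the question's screenshots.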

Instead, I might suggest a rendition adapted from the `boundingBox` method in Detecting Objects in Still Images:

/// Convert Vision coordinates to pixel coordinates within image.
///
/// Adapted from `boundingBox` method from
/// [Detecting Objects in Still Images](https://developer.apple.com/documentation/vision/detecting_objects_in_still_images).
/// This flips the y-axis.
///
/// - Parameters:
///   - boundingBox: The bounding box returned by Vision framework.
///   - bounds: The bounds within the image (in pixels, not points).
///
/// - Returns: The bounding box in pixel coordinates, flipped vertically so 0,0 is in the upper left corner

func convert(boundingBox: CGRect, to bounds: CGRect) -> CGRect {
    let imageWidth = bounds.width
    let imageHeight = bounds.height

    // Begin with input rect.
    var rect = boundingBox

    // Reposition origin.
    rect.origin.x *= imageWidth
    rect.origin.x += bounds.minX
    rect.origin.y = (1 - rect.maxY) * imageHeight + bounds.minY

    // Rescale normalized coordinates.
    rect.size.width *= imageWidth
    rect.size.height *= imageHeight

    return rect
}

Note that I changed the method name, because it does not return a bounding box; rather, it converts a bounding box (with values in [0, 1]) into a `CGRect`. I also fixed a minor bug in their `boundingBox` implementation. But it captures the main idea, namely flipping the y-axis of the bounding box.

Anyway, this produces the correct boxes:

(screenshot with correctly placed rectangles)


For example:

func recognizeText(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }
    let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, orientation: .up)

    let size = CGSize(width: cgImage.width, height: cgImage.height) // note: in pixels from `cgImage`; this also assumes the image has already been rotated
    let bounds = CGRect(origin: .zero, size: size)
    // Create a new request to recognize text.
    let request = VNRecognizeTextRequest { [self] request, error in
        guard
            let results = request.results as? [VNRecognizedTextObservation],
            error == nil
        else { return }

        let rects = results.map {
            convert(boundingBox: $0.boundingBox, to: CGRect(origin: .zero, size: size))
        }

        let string = results.compactMap {
            $0.topCandidates(1).first?.string
        }.joined(separator: "\n")

        let format = UIGraphicsImageRendererFormat()
        format.scale = 1
        let final = UIGraphicsImageRenderer(bounds: bounds, format: format).image { _ in
            image.draw(in: bounds)
            UIColor.red.setStroke()
            for rect in rects {
                let path = UIBezierPath(rect: rect)
                path.lineWidth = 5
                path.stroke()
            }
        }

        DispatchQueue.main.async { [self] in
            imageView.image = final
            label.text = string
        }
    }

    DispatchQueue.global(qos: .userInitiated).async {
        do {
            try imageRequestHandler.perform([request])
        } catch {
            print("Failed to perform image request: \(error)")
            return
        }
    }
}

/// Convert Vision coordinates to pixel coordinates within image.
///
/// Adapted from `boundingBox` method from
/// [Detecting Objects in Still Images](https://developer.apple.com/documentation/vision/detecting_objects_in_still_images).
/// This flips the y-axis.
///
/// - Parameters:
///   - boundingBox: The bounding box returned by Vision framework.
///   - bounds: The bounds within the image (in pixels, not points).
///
/// - Returns: The bounding box in pixel coordinates, flipped vertically so 0,0 is in the upper left corner

func convert(boundingBox: CGRect, to bounds: CGRect) -> CGRect {
    let imageWidth = bounds.width
    let imageHeight = bounds.height

    // Begin with input rect.
    var rect = boundingBox

    // Reposition origin.
    rect.origin.x *= imageWidth
    rect.origin.x += bounds.minX
    rect.origin.y = (1 - rect.maxY) * imageHeight + bounds.minY

    // Rescale normalized coordinates.
    rect.size.width *= imageWidth
    rect.size.height *= imageHeight

    return rect
}

///  Scale and orient picture for Vision framework
///
///  From [Detecting Objects in Still Images](https://developer.apple.com/documentation/vision/detecting_objects_in_still_images).
///
///  - Parameter image: Any `UIImage` with any orientation
///  - Returns: An image that has been rotated such that it can be safely passed to Vision framework for detection.

func scaleAndOrient(image: UIImage) -> UIImage {

    // Set a default value for limiting image size.
    let maxResolution: CGFloat = 640

    guard let cgImage = image.cgImage else {
        print("UIImage has no CGImage backing it!")
        return image
    }

    // Compute parameters for transform.
    let width = CGFloat(cgImage.width)
    let height = CGFloat(cgImage.height)
    var transform = CGAffineTransform.identity

    var bounds = CGRect(x: 0, y: 0, width: width, height: height)

    if width > maxResolution ||
        height > maxResolution {
        let ratio = width / height
        if width > height {
            bounds.size.width = maxResolution
            bounds.size.height = round(maxResolution / ratio)
        } else {
            bounds.size.width = round(maxResolution * ratio)
            bounds.size.height = maxResolution
        }
    }

    let scaleRatio = bounds.size.width / width
    let orientation = image.imageOrientation
    switch orientation {
    case .up:
        transform = .identity
    case .down:
        transform = CGAffineTransform(translationX: width, y: height).rotated(by: .pi)
    case .left:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: 0, y: width).rotated(by: 3.0 * .pi / 2.0)
    case .right:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: height, y: 0).rotated(by: .pi / 2.0)
    case .upMirrored:
        transform = CGAffineTransform(translationX: width, y: 0).scaledBy(x: -1, y: 1)
    case .downMirrored:
        transform = CGAffineTransform(translationX: 0, y: height).scaledBy(x: 1, y: -1)
    case .leftMirrored:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(translationX: height, y: width).scaledBy(x: -1, y: 1).rotated(by: 3.0 * .pi / 2.0)
    case .rightMirrored:
        let boundsHeight = bounds.size.height
        bounds.size.height = bounds.size.width
        bounds.size.width = boundsHeight
        transform = CGAffineTransform(scaleX: -1, y: 1).rotated(by: .pi / 2.0)
    default:
        transform = .identity
    }

    return UIGraphicsImageRenderer(size: bounds.size).image { rendererContext in
        let context = rendererContext.cgContext

        if orientation == .right || orientation == .left {
            context.scaleBy(x: -scaleRatio, y: scaleRatio)
            context.translateBy(x: -height, y: 0)
        } else {
            context.scaleBy(x: scaleRatio, y: -scaleRatio)
            context.translateBy(x: 0, y: -height)
        }
        context.concatenate(transform)
        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
    }
}

  • I solved this by abandoning `VNImageRectForNormalizedRect` and returning `boundingBox` directly. Then just use the `convert()` function @Rob provided, where `boundingBox` is the bounding box (a `CGRect`) and the second parameter is the `UIImage`'s size in pixels (not `image.size`). Create that `CGRect` from the width and height in pixels, convert all occurrences through this function, and draw with the results. I'm writing this for anyone who runs into this problem in the future. Thanks. (4 upvotes)
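The comment's recipe can be sketched as follows (Foundation only; `convertAll` is a hypothetical helper, not from the answer). The key detail is that the size passed in must be the *pixel* size taken from `cgImage`, not `image.size`, which is in points:

```swift
import Foundation

/// Map normalized Vision bounding boxes (origin bottom-left, values in 0...1) to
/// top-left-origin pixel rects for an image of the given *pixel* size,
/// i.e. CGSize(width: cgImage.width, height: cgImage.height).
func convertAll(_ boundingBoxes: [CGRect], imagePixelSize size: CGSize) -> [CGRect] {
    boundingBoxes.map { box in
        CGRect(x: box.minX * size.width,
               y: (1 - box.maxY) * size.height, // flip the y-axis for UIKit
               width: box.width * size.width,
               height: box.height * size.height)
    }
}

// e.g. a full-width strip covering the bottom quarter of a 300×200-pixel image:
let rects = convertAll([CGRect(x: 0, y: 0, width: 1, height: 0.25)],
                       imagePixelSize: CGSize(width: 300, height: 200))
// rects[0] sits at y = 150 — the bottom of the image in top-left coordinates
```

The resulting rects can then be passed straight to a drawing routine like the question's `drawOccurrencesOnImage`.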