Rob*_*inW (tags: c++, opencv, computer-vision, augmented-reality, camera-calibration)
I am currently trying to reconstruct a camera pose with OpenCV. For the implementation, I roughly follow the epipolar geometry example. The idea is the following:
1. Detect and describe features in both pictures with cv::xfeatures2d::SURF::detectAndCompute()
2. Match the features between the pictures with cv::DescriptorMatcher::knnMatch()
3. Estimate the essential matrix with cv::findEssentialMat()
4. Recover the relative pose (rotation and translation) from the essential matrix with cv::recoverPose()

As input data I use a series of pictures from the freely available New Tsukuba Stereo Dataset. The dataset is an artificially created sequence of stereo camera frames with known ground-truth poses. To test my implementation, I estimated the relative pose between the 1st and the 60th left picture, and between the 1st and the 60th right picture. The translation vectors of both pairs should point in roughly the same direction, because the two pairs are corresponding pictures from the left and the right camera. According to the ground truth, the rotation of both poses should be about 7 degrees. Unfortunately, when I run my implementation, I get the following poses:
First pair:
-> Rotation vector: [[0.3406, 0.9054, 0.2534]]
-> Rotation angle: 10.975deg
-> Translation vector: [[-0.8103, 0.04748, -0.5841]]
Second pair:
-> Rotation vector: [[0.7907, 0.5027, 0.3494]]
-> Rotation angle: 5.24811deg
-> Translation vector: [[0.748, 0.2306, -0.6223]]
My results are all over the place, and I am not quite sure what is going wrong. Neither rotation is anywhere close to 7 degrees, and the translation vectors point in completely different directions. I created a minimal code example that demonstrates my pipeline.
As you can see in the code below, I implemented some debugging mechanisms to make sure the process works as expected. First, I make sure that a reasonable number of features are matched, so that the essential matrix and the relative pose can be estimated:
#include <cstdlib>
#include <filesystem>
#include <iostream>
#include <optional>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>

// helper functions isValidEssentialMatrix(), isValidRotationMatrix(), printAffine3d()
// and displayDebugPicture() are used below but not shown here
namespace fs = std::filesystem;
const fs::path ROOT_DIR = "NewTsukubaStereoDataset";
const cv::Matx33d CAMERA_MAT(615, 0, 640, 0, 615, 480, 0, 0, 1);
constexpr double HESSIAN_THRESHOLD = 400;
constexpr double LOWE_THRESHOLD = 0.8;
constexpr double RANSAC_CONFIDENCE = 0.999;
constexpr double MAX_DIST_TO_EPIPOLAR = 1;
constexpr int MAX_RANSAC_ITERS = 1000;
constexpr std::size_t MIN_INLIERS = 100;
std::optional<cv::Affine3d> reconstructPose(const cv::Mat& firstPic, const cv::Mat& secondPic, const double scale) {
// initialize data structures
std::vector<cv::KeyPoint> firstKeyPoints, secondKeyPoints;
cv::Mat firstDescriptors, secondDescriptors, inlierMask;
cv::Matx33d essentialMat, rotation;
cv::Vec3d translation;
std::vector<std::vector<cv::DMatch>> knnFeatureMatches;
std::vector<cv::Point2f> firstInlierPts, secondInlierPts;
// initialize algorithms
cv::Ptr<cv::xfeatures2d::SURF> detector = cv::xfeatures2d::SURF::create(HESSIAN_THRESHOLD);
cv::Ptr<cv::DescriptorMatcher> matcher = cv::DescriptorMatcher::create(cv::DescriptorMatcher::FLANNBASED);
// compute features
detector->detectAndCompute(firstPic, cv::noArray(), firstKeyPoints, firstDescriptors);
detector->detectAndCompute(secondPic, cv::noArray(), secondKeyPoints, secondDescriptors);
// find matching features
matcher->knnMatch(firstDescriptors, secondDescriptors, knnFeatureMatches, 2);
// ratio test as per Lowe's paper (copied from the opencv example)
for(std::size_t i = 0; i < knnFeatureMatches.size(); ++i) {
if(knnFeatureMatches[i][0].distance < LOWE_THRESHOLD * knnFeatureMatches[i][1].distance) {
const cv::DMatch& m = knnFeatureMatches[i][0];
firstInlierPts.push_back(firstKeyPoints[m.queryIdx].pt);
secondInlierPts.push_back(secondKeyPoints[m.trainIdx].pt);
}
}
// require a minimum number of inliers for effective ransac execution
if(firstInlierPts.size() < MIN_INLIERS) {
std::cerr << "Not enough inliers for essential matrix estimation" << std::endl;
return std::nullopt;
}
// estimate essential matrix
essentialMat = cv::findEssentialMat(firstInlierPts, secondInlierPts, CAMERA_MAT, cv::RANSAC,
RANSAC_CONFIDENCE, MAX_DIST_TO_EPIPOLAR, MAX_RANSAC_ITERS, inlierMask);
// require minimum number of valid inliers as well as a valid essential matrix (see https://en.wikipedia.org/wiki/Essential_matrix#Properties)
if(!isValidEssentialMatrix(essentialMat) || cv::sum(inlierMask)(0) < MIN_INLIERS) {
std::cerr << "Invalid essential matrix" << std::endl;
return std::nullopt;
}
// estimate pose from the essential matrix
const std::size_t numPoints = cv::recoverPose(essentialMat, firstInlierPts, secondInlierPts, CAMERA_MAT, rotation, translation, inlierMask);
// recoverPose returns a unit length translation that needs to be scaled accordingly
translation *= scale;
// require minimum number of valid inliers as well as a valid rotation matrix
if (isValidRotationMatrix(rotation) && numPoints >= MIN_INLIERS) {
displayDebugPicture(firstPic, secondPic, inlierMask, firstInlierPts, secondInlierPts);
return cv::Affine3d(rotation, translation);
} else {
std::cerr << "Invalid estimated pose" << std::endl;
return std::nullopt;
}
}
int main(int argc, char* argv[]) {
// loading the data
const cv::Mat left0 = cv::imread(ROOT_DIR / "illumination" / "fluorescent" / "L_00001.png", cv::IMREAD_GRAYSCALE);
const cv::Mat left1 = cv::imread(ROOT_DIR / "illumination" / "fluorescent" / "L_00060.png", cv::IMREAD_GRAYSCALE);
const cv::Mat right0 = cv::imread(ROOT_DIR / "illumination" / "fluorescent" / "R_00001.png", cv::IMREAD_GRAYSCALE);
const cv::Mat right1 = cv::imread(ROOT_DIR / "illumination" / "fluorescent" / "R_00060.png", cv::IMREAD_GRAYSCALE);
// reconstruct first pose (rotation angle should be around 7deg)
std::cout << "Left pair:" << std::endl;
std::optional<cv::Affine3d> pose0 = reconstructPose(left0, left1, 1);
if(pose0.has_value()) {
printAffine3d(pose0.value()); // prints the pose like I mentioned it above
}
// reconstruct second pose (rotation angle should be around 7deg)
std::cout << "Right pair:" << std::endl;
std::optional<cv::Affine3d> pose1 = reconstructPose(right0, right1, 1);
if(pose1.has_value()) {
printAffine3d(pose1.value()); // prints the pose like I mentioned it above
}
return EXIT_SUCCESS;
}
Besides requiring a minimum number of valid inlier matches, e.g.

if(firstInlierPts.size() < MIN_INLIERS) {
    std::cerr << "Not enough inliers for essential matrix estimation" << std::endl;
    return std::nullopt;
}

I also check the validity of the essential matrix by making sure that it has two equal singular values and one that is zero, and I verify that the output of cv::recoverPose() really is a rotation matrix.
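(The helpers isValidEssentialMatrix() and isValidRotationMatrix() are called in the code above but their implementations are not shown. The following is only a sketch of what such checks could look like; the tolerance values and comparison strategy are assumptions, not the original code.)

#include <cmath>
#include <opencv2/core.hpp>

// Sketch: a valid essential matrix has two equal non-zero singular values and a third that is zero.
bool isValidEssentialMatrix(const cv::Matx33d& essentialMat, double relTol = 1e-3) {
    cv::Mat s; // singular values, sorted in descending order
    cv::SVD::compute(cv::Mat(essentialMat), s, cv::SVD::NO_UV);
    const double s0 = s.at<double>(0), s1 = s.at<double>(1), s2 = s.at<double>(2);
    return s0 > 0 && (s0 - s1) <= relTol * s0 && s2 <= relTol * s0;
}

// Sketch: a valid rotation matrix is orthonormal (R^T * R = I) with determinant +1.
bool isValidRotationMatrix(const cv::Matx33d& rotation, double tol = 1e-6) {
    const cv::Matx33d rtr = rotation.t() * rotation;
    return cv::norm(rtr - cv::Matx33d::eye(), cv::NORM_INF) <= tol
        && std::abs(cv::determinant(rotation) - 1.0) <= tol;
}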
To make sure the matched features are actually meaningful, I also display them in a side-by-side view and verified that they really do represent corresponding points in the two pictures.

At this point, I have run out of ideas for what else I could do to figure out why my code does not work. Am I missing something here, or am I using OpenCV incorrectly? Why am I getting nonsensical pose data? Is there a bug in OpenCV itself? (I am using version 4.7.0.)
We know the following about the dataset:
"The resolution of the images is 640x480 pixels, the baseline of the stereo camera is 10 cm, and the focal length of the camera is 615 pixels."
The camera matrix contains the focal lengths and the optical center.
The optical center usually lies at (or very near) the center of the image. The ideal values are cx = (width-1) / 2 and cy = (height-1) / 2.
The "-1" (a half-pixel offset after the division) is there because pixel centers sit at integer coordinates: if you imagine an image that is 4 pixels wide, its center lies between the two middle pixels, at coordinate 1.5 = (4-1) / 2.
In the code, this camera matrix is given:

const cv::Matx33d CAMERA_MAT( // good for images sized 1280 x 960
    615, 0, 640,
    0, 615, 480,
    0, 0, 1);

which would imply an image size of roughly 1280 x 960.
For 640 x 480 images, a more plausible matrix would be

const cv::Matx33d CAMERA_MAT( // good for images sized 640 x 480
    615, 0, (640-1)/2,
    0, 615, (480-1)/2,
    0, 0, 1);

A projection matrix whose optical center sits at the bottom-right corner is very unusual, but not impossible. Cropping a 1280 x 960 sensor image down to its top-left quadrant would produce exactly that: such a crop (left and top edges kept as they are) does not move the optical center.
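One way to avoid this kind of mismatch is to derive the camera matrix from the actual image size instead of hard-coding the principal point. A small sketch (the helper name makeCameraMatrix is hypothetical, not from the question):

#include <opencv2/core.hpp>

// Build a pinhole camera matrix for a given image size and focal length in pixels,
// placing the principal point at the image center.
cv::Matx33d makeCameraMatrix(const cv::Size& imageSize, double focalPx) {
    const double cx = (imageSize.width  - 1) / 2.0;
    const double cy = (imageSize.height - 1) / 2.0;
    return cv::Matx33d(focalPx, 0.0,     cx,
                       0.0,     focalPx, cy,
                       0.0,     0.0,     1.0);
}

// e.g. for the Tsukuba frames: const cv::Matx33d cameraMat = makeCameraMatrix(left0.size(), 615.0);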
By the way, assuming f = 615, you can calculate the fields of view. These follow from the projection matrix, with some simplification:
tan(theta/2) * f = width/2                 =>  HFoV ≈ 55.0°
tan(theta/2) * f = height/2                =>  VFoV ≈ 42.6°
tan(theta/2) * f = hypot(width, height)/2  =>  DFoV ≈ 66.1°

That is, ignoring lens distortion.
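For reference, these numbers can be reproduced with a few lines of C++ (a direct application of the formulas above):

#include <cmath>
#include <cstdio>

int main() {
    const double f = 615.0, w = 640.0, h = 480.0;
    const double rad2deg = 180.0 / std::acos(-1.0);
    const double hfov = 2.0 * std::atan((w / 2.0) / f) * rad2deg;                // ~55.0 deg
    const double vfov = 2.0 * std::atan((h / 2.0) / f) * rad2deg;                // ~42.6 deg
    const double dfov = 2.0 * std::atan((std::hypot(w, h) / 2.0) / f) * rad2deg; // ~66.1 deg
    std::printf("HFoV %.1f deg, VFoV %.1f deg, DFoV %.1f deg\n", hfov, vfov, dfov);
    return 0;
}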
Since this is a stereo pair, you probably plan to compute disparity maps. Disparity and distance (along Z, not Euclidean) are related via the "baseline", a kind of interpupillary distance (IPD).
Picture the triangle formed by the two eyes and the 3D point: you can assume (w.l.o.g.) that one eye looks straight at the point, while the other eye "misses" it by some distance, measured in pixels.
The equations on that triangle:

tan(alpha) * f [px]       = disparity [px]
tan(alpha) * distance [m] = baseline [m]

Combined:

distance [m] * disparity [px] = baseline [m] * f [px]

Rearrange to taste.
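A worked example in code, plugging in the dataset's baseline of 0.10 m and f = 615 px (the disparity value is purely illustrative):

#include <cstdio>

int main() {
    const double baselineM   = 0.10;  // 10 cm stereo baseline, from the dataset description
    const double focalPx     = 615.0; // focal length in pixels
    const double disparityPx = 20.0;  // example disparity
    // distance [m] * disparity [px] = baseline [m] * f [px]
    const double distanceM = baselineM * focalPx / disparityPx; // = 3.075 m
    std::printf("disparity of %.1f px -> distance of %.3f m along Z\n", disparityPx, distanceM);
    return 0;
}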
With all of that in mind, we can also calculate a point's X/Y displacement, given its distance (along Z, not Euclidean) and its screen coordinates (relative to the optical center):
X[m] / Z[m] * f + cx = x[px]
X[m] = (x[px] - cx) / f * Z[m]

So, for a point one meter away with screen coordinate x = 300, we get x - cx = 300 - 319.5 = -19.5 and X[m] = -0.0317 [m]. At ten meters, the same pixel gives X[m] = -0.317 [m].
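The same arithmetic in code, assuming the corrected principal point cx = 319.5:

#include <cstdio>

int main() {
    const double f  = 615.0;
    const double cx = (640 - 1) / 2.0;      // 319.5
    const double x  = 300.0;                // screen x-coordinate in pixels
    const double distances[] = {1.0, 10.0}; // distances along Z in meters
    for (const double z : distances) {
        const double X = (x - cx) / f * z;  // X[m] = (x[px] - cx) / f * Z[m]
        std::printf("Z = %4.1f m -> X = %+.4f m\n", z, X); // -0.0317 m and -0.3171 m
    }
    return 0;
}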