矢量化Kinect真实世界坐标处理算法的速度

Question

矢量化Kinect真实世界坐标处理算法的速度

Log*_*ic1 13 python numpy vectorization kinect libfreenect2

我最近开始使用pylibfreenect2在Linux上使用Kinect V2.

当我第一次能够在散点图中显示深度帧数据时,我很失望地看到没有任何深度像素似乎在正确的位置.

房间的侧视图(通知天花板是弯曲的).

我做了一些研究,并意识到有一些简单的触发来完成转换.

为了测试,我开始使用pylibfreenect2中的预编写函数,该函数接受列,行和深度像素强度,然后返回该像素的实际位置:

X, Y, Z = registration.getPointXYZ(undistorted, row, col)

Run Code Online (Sandbox Code Playgroud)

这在纠正职位方面做得非常出色:

使用getPointXYZ()或getPointXYZRGB()的唯一缺点是它们一次只能处理一个像素.这在Python中可能需要一段时间,因为它需要使用嵌套的for循环,如下所示:

n_rows = d.shape[0]
n_columns = d.shape[1]
out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
for row in range(n_rows):
    for col in range(n_columns):
        X, Y, Z = registration.getPointXYZ(undistorted, row, col)
        out[row * n_columns + col] = np.array([Z, X, -Y])

Run Code Online (Sandbox Code Playgroud)

我试图更好地理解getPointXYZ()如何计算坐标.据我所知,它看起来类似于OpenKinect for Forfor函数:depthToPointCloudPos().虽然我怀疑libfreenect2的版本还有更多内容.

使用该gitHub源代码作为示例,然后我尝试在Python中重新编写它以进行我自己的实验,并出现以下内容:

#camera information based on the Kinect v2 hardware
CameraParams = {
  "cx":254.878,
  "cy":205.395,
  "fx":365.456,
  "fy":365.456,
  "k1":0.0905474,
  "k2":-0.26819,
  "k3":0.0950862,
  "p1":0.0,
  "p2":0.0,
}

def depthToPointCloudPos(x_d, y_d, z, scale = 1000):
    #calculate the xyz camera position based on the depth data    
    x = (x_d - CameraParams['cx']) * z / CameraParams['fx']
    y = (y_d - CameraParams['cy']) * z / CameraParams['fy']

    return x/scale, y/scale, z/scale

Run Code Online (Sandbox Code Playgroud)

这是传统的getPointXYZ和我的自定义函数之间的比较:

它们看起来很相似.但是有明显的差异.左侧比较显示平坦天花板上的边缘更直,也是一些正弦曲线形状.我怀疑还涉及额外的数学.

我很想知道是否有人知道我的函数和libfreenect2的getPointXYZ之间可能有什么不同.

然而,我在这里发布的主要原因是询问是否尝试将上述函数矢量化以处理整个数组而不是循环遍历每个元素.

应用我从上面学到的东西,我能够编写一个函数,它似乎是depthToPointCloudPos的矢量化替代:

[编辑]

感谢Benjamin帮助提高此功能的效率!

def depthMatrixToPointCloudPos(z, scale=1000):
    #bacically this is a vectorized version of depthToPointCloudPos()
    C, R = np.indices(z.shape)

    R = np.subtract(R, CameraParams['cx'])
    R = np.multiply(R, z)
    R = np.divide(R, CameraParams['fx'] * scale)

    C = np.subtract(C, CameraParams['cy'])
    C = np.multiply(C, z)
    C = np.divide(C, CameraParams['fy'] * scale)

    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

Run Code Online (Sandbox Code Playgroud)

这可以生成与前一个函数depthToPointCloudPos()相同的pointcloud结果.唯一的区别是我的处理速度从~1 Fps到5-10 Fps(WhooHoo!).我相信这消除了Python进行所有计算所造成的瓶颈.因此,我的散点图现在再次平滑运行,计算出半真实世界的坐标.

既然我有一个从深度帧中检索3d坐标的高效功能,我真的想应用这种方法来将彩色摄像机数据映射到我的深度像素.但是我不确定这样做会涉及哪些数学或变量,并且没有太多提及如何在Google上计算它.

或者,我可以使用libfreenect2使用getPointXYZRGB将颜色映射到我的深度像素:

#Format undistorted and regisered data to real-world coordinates with mapped colors (dont forget color=out_col in setData)
n_rows = d.shape[0]
n_columns = d.shape[1]
out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
colors = np.zeros((d.shape[0] * d.shape[1], 3), dtype=np.float64)
for row in range(n_rows):
    for col in range(n_columns):
        X, Y, Z, B, G, R = registration.getPointXYZRGB(undistorted, registered, row, col)
        out[row * n_columns + col] = np.array([X, Y, Z])
        colors[row * n_columns + col] = np.divide([R, G, B], 255)
sp2.setData(pos=np.array(out, dtype=np.float64), color=colors, size=2)

Run Code Online (Sandbox Code Playgroud)

Produces a pointcloud and colored vertexes (Very Slow <1Fps):

In summary my two questions basically are:

What additional steps would be required so that the real-world 3d coordinates data returned from my depthToPointCloudPos() function (and the vectorized implementation) are more resemblant of the data returned by getPointXYZ() from libfreenect2?
~~And, what would be involved in creating a (possibly vectorized) way to generate the depth to color registration map in my own application?~~ Please see the update as this has been solved.

[UPDATE]

I managed to map the color data to each pixel using the registered frame. It was very simple and only required adding these lines prior to calling setData():

colors = registered.asarray(np.uint8)
colors = np.divide(colors, 255)
colors = colors.reshape(colors.shape[0] * colors.shape[1], 4 )
colors = colors[:, :3:] #BGRA to BGR (slices out the alpha channel)  
colors = colors[...,::-1] #BGR to RGB

Run Code Online (Sandbox Code Playgroud)

This allows Python to quickly process the color data and gives smooth results. I have updated/added them to the functional example below.

Real-world coordinate processing with color registration running real-time in Python!

(GIF image resolution has been greatly reduced)

[UPDATE]

After spending a little more time with the application I have added some additional parameters and tuned their values with hopes to improve the visual quality of the scatter plot and possibly make things more intuitive for this example/question.

Most importantly I have set the vertexes to be opaque:

sp2 = gl.GLScatterPlotItem(pos=pos)
sp2.setGLOptions('opaque') # Ensures not to allow vertexes located behinde other vertexes to be seen.

Run Code Online (Sandbox Code Playgroud)

I then noticed whenever zooming very close to surfaces, the distance between adjacent verts would appear to expand until all of what was visible was mostly empty space. This was partially a result of the point size of the vertexes not changing.

To help aid in creating a "zoom-friendly" viewport full of colored vertexes I added these lines which calculates the vertex point size based on the current zoom level (for each update):

# Calculate a dynamic vertex size based on window dimensions and camera's position - To become the "size" input for the scatterplot's setData() function.
v_rate = 8.0 # Rate that vertex sizes will increase as zoom level increases (adjust this to any desired value).
v_scale = np.float32(v_rate) / gl_widget.opts['distance'] # Vertex size increases as the camera is "zoomed" towards center of view.
v_offset = (gl_widget.geometry().width() / 1000)**2 # Vertex size is offset based on actual width of the viewport.
v_size = v_scale + v_offset

Run Code Online (Sandbox Code Playgroud)

And lo and behold:

(Again, GIF image resolution has been greatly reduced)

Maybe not quite as good as skinning a pointcloud, but it does seem to help make things easier when trying to understand what you're actually looking at.

All mentioned modifications have been included in the functional example.

[UPDATE]

As seen in the previous two animations it is clear that the pointcloud of real-world coordinates has a skewed orientation compared to the grid axes. This is because I was not compensating for the Kinect's actual orientation in the real word!

Thus I have implemented an additional vectorized trig function which calculates a new (rotated and offset) coordinate for each vertex. This orients them correctly relative to the Kinect's actual position in real space. And is necessary when using tripods that tilt (could also be used to connect the output of an INU or gyro/accelerometer for real-time feedback).

def applyCameraMatrixOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # bacically this is a vectorized version of applyCameraOrientation()
    # uses same trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[:, ax1] ** 2 + pt[:, ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[:, ax2], pt[:, ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[:, ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[:, ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(1, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(0, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y plane

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt

Run Code Online (Sandbox Code Playgroud)

Just a note: rotatePoints() is only being called for 'elevation' and 'azimuth'. This is because most tripods don't support roll and to save on CPU cycles it has been disabled by default. If you intend on doing something fancy then definitely feel free to un-comment it!!

Notice the grid floor is level in this image but the left pointcloud is not aligned with it:

The parameters to set the Kinect's orientation:

CameraPosition = {
    "x": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "y": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "z": 1.7, # height in meters of actual kinect sensor from the floor.
    "roll": 0, # angle in degrees of sensor's roll (used for INU input - trig function for this is commented out by default).
    "azimuth": 0, # sensor's yaw angle in degrees.
    "elevation": -15, # sensor's pitch angle in degrees.
}

Run Code Online (Sandbox Code Playgroud)

您应该根据传感器的实际位置和方向更新这些:

两个最重要的参数是theta(仰角)角度和距离地板的高度.我只使用了一个简单的卷尺和一个校准的眼睛,但是我打算在某一天输入编码器或INU数据来实时更新这些参数(当传感器移动时).

同样,所有更改都反映在功能示例中.

如果有人成功地改进了这个例子或者有关于如何使事情变得更紧凑的建议,那么如果你能留下解释细节的评论,我将非常感激.

以下是此项目的完整功能示例:

#! /usr/bin/python

#--------------------------------#
# Kinect v2 point cloud visualization using a Numpy based 
# real-world coordinate processing algorithm and OpenGL.
#--------------------------------#

import sys
import numpy as np

from pyqtgraph.Qt import QtCore, QtGui
import pyqtgraph.opengl as gl

from pylibfreenect2 import Freenect2, SyncMultiFrameListener
from pylibfreenect2 import FrameType, Registration, Frame, libfreenect2

fn = Freenect2()
num_devices = fn.enumerateDevices()
if num_devices == 0:
    print("No device connected!")
    sys.exit(1)

serial = fn.getDeviceSerialNumber(0)
device = fn.openDevice(serial)

types = 0
types |= FrameType.Color
types |= (FrameType.Ir | FrameType.Depth)
listener = SyncMultiFrameListener(types)

# Register listeners
device.setColorFrameListener(listener)
device.setIrAndDepthFrameListener(listener)

device.start()

# NOTE: must be called after device.start()
registration = Registration(device.getIrCameraParams(),
                            device.getColorCameraParams())

undistorted = Frame(512, 424, 4)
registered = Frame(512, 424, 4)


#QT app
app = QtGui.QApplication([])
gl_widget = gl.GLViewWidget()
gl_widget.show()
gl_grid = gl.GLGridItem()
gl_widget.addItem(gl_grid)

#initialize some points data
pos = np.zeros((1,3))

sp2 = gl.GLScatterPlotItem(pos=pos)
sp2.setGLOptions('opaque') # Ensures not to allow vertexes located behinde other vertexes to be seen.

gl_widget.addItem(sp2)

# Kinects's intrinsic parameters based on v2 hardware (estimated).
CameraParams = {
  "cx":254.878,
  "cy":205.395,
  "fx":365.456,
  "fy":365.456,
  "k1":0.0905474,
  "k2":-0.26819,
  "k3":0.0950862,
  "p1":0.0,
  "p2":0.0,
}

def depthToPointCloudPos(x_d, y_d, z, scale=1000):
    # This runs in Python slowly as it is required to be called from within a loop, but it is a more intuitive example than it's vertorized alternative (Purly for example)
    # calculate the real-world xyz vertex coordinate from the raw depth data (one vertex at a time).
    x = (x_d - CameraParams['cx']) * z / CameraParams['fx']
    y = (y_d - CameraParams['cy']) * z / CameraParams['fy']

    return x / scale, y / scale, z / scale

def depthMatrixToPointCloudPos(z, scale=1000):
    # bacically this is a vectorized version of depthToPointCloudPos()
    # calculate the real-world xyz vertex coordinates from the raw depth data matrix.
    C, R = np.indices(z.shape)

    R = np.subtract(R, CameraParams['cx'])
    R = np.multiply(R, z)
    R = np.divide(R, CameraParams['fx'] * scale)

    C = np.subtract(C, CameraParams['cy'])
    C = np.multiply(C, z)
    C = np.divide(C, CameraParams['fy'] * scale)

    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

# Kinect's physical orientation in the real world.
CameraPosition = {
    "x": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "y": 0, # actual position in meters of kinect sensor relative to the viewport's center.
    "z": 1.7, # height in meters of actual kinect sensor from the floor.
    "roll": 0, # angle in degrees of sensor's roll (used for INU input - trig function for this is commented out by default).
    "azimuth": 0, # sensor's yaw angle in degrees.
    "elevation": -15, # sensor's pitch angle in degrees.
}

def applyCameraOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # This runs slowly in Python as it is required to be called within a loop, but it is a more intuitive example than it's vertorized alternative (Purly for example)
    # use trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[ax1] ** 2 + pt[ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[ax2], pt[ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(0, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(1, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y plane

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt

def applyCameraMatrixOrientation(pt):
    # Kinect Sensor Orientation Compensation
    # bacically this is a vectorized version of applyCameraOrientation()
    # uses same trig to rotate a vertex around a gimbal.
    def rotatePoints(ax1, ax2, deg):
        # math to rotate vertexes around a center point on a plane.
        hyp = np.sqrt(pt[:, ax1] ** 2 + pt[:, ax2] ** 2) # Get the length of the hypotenuse of the real-world coordinate from center of rotation, this is the radius!
        d_tan = np.arctan2(pt[:, ax2], pt[:, ax1]) # Calculate the vertexes current angle (returns radians that go from -180 to 180)

        cur_angle = np.degrees(d_tan) % 360 # Convert radians to degrees and use modulo to adjust range from 0 to 360.
        new_angle = np.radians((cur_angle + deg) % 360) # The new angle (in radians) of the vertexes after being rotated by the value of deg.

        pt[:, ax1] = hyp * np.cos(new_angle) # Calculate the rotated coordinate for this axis.
        pt[:, ax2] = hyp * np.sin(new_angle) # Calculate the rotated coordinate for this axis.

    #rotatePoints(1, 2, CameraPosition['roll']) #rotate on the Y&Z plane # Disabled because most tripods don't roll. If an Inertial Nav Unit is available this could be used)
    rotatePoints(0, 2, CameraPosition['elevation']) #rotate on the X&Z plane
    rotatePoints(0, 1, CameraPosition['azimuth']) #rotate on the X&Y

    # Apply offsets for height and linear position of the sensor (from viewport's center)
    pt[:] += np.float_([CameraPosition['x'], CameraPosition['y'], CameraPosition['z']])



    return pt


def update():
    colors = ((1.0, 1.0, 1.0, 1.0))

    frames = listener.waitForNewFrame()

    # Get the frames from the Kinect sensor
    ir = frames["ir"]
    color = frames["color"]
    depth = frames["depth"]

    d = depth.asarray() #the depth frame as an array (Needed only with non-vectorized functions)

    registration.apply(color, depth, undistorted, registered)

    # Format the color registration map - To become the "color" input for the scatterplot's setData() function.
    colors = registered.asarray(np.uint8)
    colors = np.divide(colors, 255) # values must be between 0.0 - 1.0
    colors = colors.reshape(colors.shape[0] * colors.shape[1], 4 ) # From: Rows X Cols X RGB -to- [[r,g,b],[r,g,b]...]
    colors = colors[:, :3:]  # remove alpha (fourth index) from BGRA to BGR
    colors = colors[...,::-1] #BGR to RGB

    # Calculate a dynamic vertex size based on window dimensions and camera's position - To become the "size" input for the scatterplot's setData() function.
    v_rate = 5.0 # Rate that vertex sizes will increase as zoom level increases (adjust this to any desired value).
    v_scale = np.float32(v_rate) / gl_widget.opts['distance'] # Vertex size increases as the camera is "zoomed" towards center of view.
    v_offset = (gl_widget.geometry().width() / 1000)**2 # Vertex size is offset based on actual width of the viewport.
    v_size = v_scale + v_offset

    # Calculate 3d coordinates (Note: five optional methods are shown - only one should be un-commented at any given time)

    """
    # Method 1 (No Processing) - Format raw depth data to be displayed
    m, n = d.shape
    R, C = np.mgrid[:m, :n]
    out = np.column_stack((d.ravel() / 4500, C.ravel()/m, (-R.ravel()/n)+1))
    """

    # Method 2 (Fastest) - Format and compute the real-world 3d coordinates using a fast vectorized algorithm - To become the "pos" input for the scatterplot's setData() function.
    out = depthMatrixToPointCloudPos(undistorted.asarray(np.float32))

    """
    # Method 3 - Format undistorted depth data to real-world coordinates
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float32)
    for row in range(n_rows):
        for col in range(n_columns):
            z = undistorted.asarray(np.float32)[row][col]
            X, Y, Z = depthToPointCloudPos(row, col, z)
            out[row * n_columns + col] = np.array([Z, Y, -X])
    """

    """
    # Method 4 - Format undistorted depth data to real-world coordinates
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z = registration.getPointXYZ(undistorted, row, col)
            out[row * n_columns + col] = np.array([Z, X, -Y])
    """

    """
    # Method 5 - Format undistorted and regisered data to real-world coordinates with mapped colors (dont forget color=colors in setData)
    n_rows, n_columns = d.shape
    out = np.zeros((n_rows * n_columns, 3), dtype=np.float64)
    colors = np.zeros((d.shape[0] * d.shape[1], 3), dtype=np.float64)
    for row in range(n_rows):
        for col in range(n_columns):
            X, Y, Z, B, G, R = registration.getPointXYZRGB(undistorted, registered, row, col)
            out[row * n_columns + col] = np.array([Z, X, -Y])
            colors[row * n_columns + col] = np.divide([R, G, B], 255)
    """


    # Kinect sensor real-world orientation compensation.
    out = applyCameraMatrixOrientation(out)

    """
    # For demonstrating the non-vectorized orientation compensation function (slow)
    for i, pt in enumerate(out):
        out[i] = applyCameraOrientation(pt)
    """


    # Show the data in a scatter plot
    sp2.setData(pos=out, color=colors, size=v_size)

    # Lastly, release frames from memory.
    listener.release(frames)

t = QtCore.QTimer()
t.timeout.connect(update)
t.start(50)


## Start Qt event loop unless running in interactive mode.
if __name__ == '__main__':
    import sys
    if (sys.flags.interactive != 1) or not hasattr(QtCore, 'PYQT_VERSION'):
        QtGui.QApplication.instance().exec_()

device.stop()
device.close()

sys.exit(0)

Run Code Online (Sandbox Code Playgroud)

Answer 1

Ben*_*min 4

这并不是一个完整的答案...我只是想指出您正在创建很多临时数组，您可以在其中就地执行更多操作：

def depthMatrixToPointCloudPos2(z, scale=1000):

    R, C = numpy.indices(z.shape)

    R -= CameraParams['cx'])
    R *= z
    R /= CameraParams['fx'] * scale

    C -= CameraParams['cy']
    C *= z
    C /= CameraParams['fy'] * scale

    return np.column_stack((z.ravel() / scale, R.ravel(), -C.ravel()))

Run Code Online (Sandbox Code Playgroud)

（如果我正确地阅读了你的代码。）

另外，请注意数据类型，如果您使用的是 64 位计算机，则默认情况下数据类型将为 64 位。您能否摆脱较小的类型，以减少需要处理的数据量？

归档时间：	9 年前
查看次数：	1947 次
最近记录：	7 年，1 月前