Verifying whether an STL file is ASCII or binary

Onl*_*Cop 13 c++ qt

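After reading the specification for the STL file format, I want to write a few tests to make sure a file is actually a valid binary or ASCII file.

An ASCII-based STL file can be identified by finding the text "solid" at byte 0, followed by a space (hex value \x20), then an optional text string, followed by a newline.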
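As a rough illustration of that first-line rule, here is a minimal sketch using only the C++ standard library rather than Qt. The helper name `parseSolidLine` is hypothetical and is not part of the code in this question.

```cpp
#include <string>

// Hypothetical helper: if `firstLine` looks like the opening line of an ASCII
// STL file ("solid " + optional name), stores the name and returns true.
bool parseSolidLine(const std::string &firstLine, std::string &name)
{
    const std::string keyword = "solid ";
    if (firstLine.compare(0, keyword.size(), keyword) != 0)
        return false;

    name = firstLine.substr(keyword.size());
    // Strip a trailing "\n" or "\r\n" so the name can be compared later.
    while (!name.empty() && (name.back() == '\n' || name.back() == '\r'))
        name.pop_back();
    return true;
}
```

Passing this check only says the file might be ASCII, since the same bytes can appear in a binary header.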

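A binary STL file has a reserved 80-byte header, followed by a 4-byte unsigned integer (NumberOfTriangles), followed by 50 bytes of data for each of the NumberOfTriangles facets.

Each triangle facet is 50 bytes long: 12 single-precision (4-byte) floats followed by an unsigned short (2-byte) integer.

If a binary file is exactly 84 + NumberOfTriangles*50 bytes long, it can usually be considered a valid binary file.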
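The size rule can be written down as a small sketch. The constant and function names below are hypothetical, and the arithmetic is done in 64 bits on the assumption that files may exceed 4 GB.

```cpp
#include <cstdint>

// Sizes taken from the description above.
constexpr std::uint64_t kHeaderSize = 80;          // reserved header
constexpr std::uint64_t kCountSize  = 4;           // NumberOfTriangles (uint32)
constexpr std::uint64_t kFacetSize  = 12 * 4 + 2;  // 12 floats + 1 uint16 = 50

// Hypothetical helper: the exact size a binary STL with `nTriangles` facets
// should have. 64-bit arithmetic avoids overflow for very large counts.
constexpr std::uint64_t expectedBinarySize(std::uint32_t nTriangles)
{
    return kHeaderSize + kCountSize + std::uint64_t(nTriangles) * kFacetSize;
}

// True when the file size matches the triangle count read from offset 80.
constexpr bool sizeMatchesBinaryStl(std::uint64_t fileSize, std::uint32_t nTriangles)
{
    return fileSize == expectedBinarySize(nTriangles);
}
```

For example, a cube made of 12 triangles would be expected to occupy 84 + 12*50 = 684 bytes.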

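Unfortunately, a binary file can contain the text "solid" starting at byte 0 within its 80-byte header. Testing for that keyword alone therefore cannot correctly determine whether a file is ASCII or binary.

This is what I have so far: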

STL_STATUS getStlFileFormat(const QString &path)
{
    // Each facet contains:
    //  - Normal: 3 floats (4 bytes each, 12 bytes total)
    //  - Vertices: 3 vertices x 3 floats (4 bytes each, 36 bytes total)
    //  - AttributeCount: 1 short (2 bytes)
    // Total: 50 bytes per facet
    // Note: float (not float_t, which may be wider than 4 bytes on some platforms)
    const size_t facetSize = 3*sizeof(float) + 3*3*sizeof(float) + sizeof(uint16_t);

    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
    {
        qDebug("\n\tUnable to open \"%s\"", qPrintable(path));
        return STL_INVALID;
    }

    QFileInfo fileInfo(path);
    size_t fileSize = fileInfo.size();

    if (fileSize < 84)
    {
        // 80-byte header + 4-byte "number of triangles" marker
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        return STL_INVALID;
    }

    // Look for text "solid" in first 5 bytes, indicating the possibility that this is an ASCII STL format.
    QByteArray fiveBytes = file.read(5);

    // Header is from bytes 0-79; numTriangleBytes starts at byte offset 80.
    if (!file.seek(80))
    {
        qDebug("\n\tCannot seek to the 80th byte (after the header)");
        return STL_INVALID;
    }

    // Read the number of triangles, uint32_t (4 bytes), little-endian
    QByteArray nTrianglesBytes = file.read(4);
    file.close();

    uint32_t nTriangles = *((uint32_t*)nTrianglesBytes.data());

    // Verify that file size equals the sum of header + nTriangles value + all triangles
    size_t targetSize = 84 + nTriangles * facetSize;
    if (fileSize == targetSize)
    {
        return STL_BINARY;
    }
    else if (fiveBytes.contains("solid"))
    {
        return STL_ASCII;
    }
    else
    {
        return STL_INVALID;
    }
}

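So far this works for me, but I am worried that bytes 80 through 83 of an ordinary ASCII file could contain ASCII characters that, when converted to a uint32_t, yield a triangle count that happens to match the file's length (very unlikely, but not impossible).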
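To get a feel for how unlikely that coincidence is, the following sketch decodes four text bytes the way a binary reader would and computes the file size they would imply. It is not part of the original code: the helper names are hypothetical, and it assumes the bytes at offsets 80 to 83 of an ASCII file are printable characters or tab/LF/CR.

```cpp
#include <cstdint>

// Interpret four bytes as a little-endian uint32, as a binary STL reader would.
std::uint32_t asLittleEndian32(unsigned char b0, unsigned char b1,
                               unsigned char b2, unsigned char b3)
{
    return std::uint32_t(b0) | (std::uint32_t(b1) << 8) |
           (std::uint32_t(b2) << 16) | (std::uint32_t(b3) << 24);
}

// File size a binary STL would need if those four bytes were its triangle count.
std::uint64_t impliedFileSize(unsigned char b0, unsigned char b1,
                              unsigned char b2, unsigned char b3)
{
    return 84 + std::uint64_t(asLittleEndian32(b0, b1, b2, b3)) * 50;
}
```

Under that assumption, four spaces imply a file of about 26.9 GB, and even four tabs (the smallest such byte value) imply about 7.6 GB, so a false positive would require a multi-gigabyte ASCII file whose size matches exactly.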

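Are there additional steps that would let me be "absolutely sure" whether a file is ASCII or binary?

Update:

Following the suggestions from @Powerswitch and @RemyLebeau, I am now testing the keywords further. This is what I have now: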

STL_STATUS getStlFileFormat(const QString &path)
{
    // Each facet contains:
    //  - Normal: 3 floats (4 bytes each, 12 bytes total)
    //  - Vertices: 3 vertices x 3 floats (4 bytes each, 36 bytes total)
    //  - AttributeCount: 1 short (2 bytes)
    // Total: 50 bytes per facet
    // Note: float (not float_t, which may be wider than 4 bytes on some platforms)
    const size_t facetSize = 3*sizeof(float) + 3*3*sizeof(float) + sizeof(uint16_t);

    QFile file(path);
    bool canFileBeOpened = file.open(QIODevice::ReadOnly);
    if (!canFileBeOpened)
    {
        qDebug("\n\tUnable to open \"%s\"", qPrintable(path));
        return STL_INVALID;
    }

    QFileInfo fileInfo(path);
    size_t fileSize = fileInfo.size();

    // The minimum size of an empty ASCII file is 15 bytes.
    if (fileSize < 15)
    {
        // "solid " and "endsolid " markers for an ASCII file
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        file.close();
        return STL_INVALID;
    }

    // Binary files should never start with "solid ", but just in case, check for ASCII, and if not valid
    // then check for binary...

    // Look for text "solid " in first 6 bytes, indicating the possibility that this is an ASCII STL format.
    QByteArray sixBytes = file.read(6);
    if (sixBytes.startsWith("solid "))
    {
        QString line;
        QTextStream in(&file);
        while (!in.atEnd())
        {
            line = in.readLine();
            if (line.contains("endsolid"))
            {
                file.close();
                return STL_ASCII;
            }
        }
    }

    // Wasn't an ASCII file. Reset and check for binary.
    if (!file.reset())
    {
        qDebug("\n\tCannot seek to the 0th byte (before the header)");
        file.close();
        return STL_INVALID;
    }

    // 80-byte header + 4-byte "number of triangles" for a binary file
    if (fileSize < 84)
    {
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        file.close();
        return STL_INVALID;
    }

    // Header is from bytes 0-79; numTriangleBytes starts at byte offset 80.
    if (!file.seek(80))
    {
        qDebug("\n\tCannot seek to the 80th byte (after the header)");
        file.close();
        return STL_INVALID;
    }

    // Read the number of triangles, uint32_t (4 bytes), little-endian
    QByteArray nTrianglesBytes = file.read(4);
    if (nTrianglesBytes.size() != 4)
    {
        qDebug("\n\tCannot read the number of triangles (after the header)");
        file.close();
        return STL_INVALID;
    }

    uint32_t nTriangles = *((uint32_t*)nTrianglesBytes.data());

    // Verify that file size equals the sum of header + nTriangles value + all triangles
    if (fileSize == (84 + (nTriangles * facetSize)))
    {
        file.close();
        return STL_BINARY;
    }

    return STL_INVALID;
}

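It seems to handle more edge cases, and I tried to write it so that it gracefully handles extremely large (multi-gigabyte) STL files without loading the ENTIRE file into memory at once just to scan for the "endsolid" text.

Feel free to offer any feedback and suggestions (especially for people looking for a solution in the future).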

Rem*_*eau 8

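If the file does not begin with "solid ", and the file size is exactly 84 + (numTriangles * 50) bytes, where numTriangles is read from offset 80, then the file is binary.

If the file size is at least 15 bytes (the absolute minimum for an ASCII file with no triangles) and the file begins with the text "solid ", read the name that follows it until a line break is reached. Check whether the next line begins with "facet " or is "endsolid [name]" (no other value is allowed). If it begins with "facet ", seek to the end of the file and make sure it ends with a line that says "endsolid [name]". If all of these are true, the file is ASCII.

Treat any other combination as invalid.

So, something like this: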

STL_STATUS getStlFileFormat(const QString &path)
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
    {
        qDebug("\n\tUnable to open \"%s\"", qPrintable(path));
        return STL_INVALID;
    }

    QFileInfo fileInfo(path);
    size_t fileSize = fileInfo.size();

    // The minimum size of an empty ASCII file is 15 bytes.

    if (fileSize < 15)
    {
        // "solid " and "endsolid " markers for an ASCII file
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        return STL_INVALID;
    }

    // binary files should never start with "solid ", but
    // just in case, check for ASCII, and if not valid then
    // check for binary...

    QByteArray sixBytes = file.read(6);
    if (sixBytes.startsWith("solid "))
    {
        QByteArray name = file.readLine();
        QByteArray endLine = name.prepend("endsolid ");

        QByteArray nextLine = file.readLine();
        if (nextLine.startsWith("facet "))
        {
            // TODO: seek to the end of the file, read the last line,
            // and make sure it is "endsolid [name]"...
            /*
            lastLine = ...;
            if (!lastLine.startsWith(endLine))
                return STL_INVALID;
            */
            return STL_ASCII;
        }
        if (nextLine.startsWith(endLine))
            return STL_ASCII;

        // reset and check for binary...
        if (!file.reset())
        {
            qDebug("\n\tCannot seek to the 0th byte (before the header)");
            return STL_INVALID;
        }
    }

    if (fileSize < 84)
    {
        // 80-byte header + 4-byte "number of triangles" for a binary file
        qDebug("\n\tThe STL file is not long enough (%u bytes).", uint(fileSize));
        return STL_INVALID;
    }

    // Header is from bytes 0-79; numTriangleBytes starts at byte offset 80.
    if (!file.seek(80))
    {
        qDebug("\n\tCannot seek to the 80th byte (after the header)");
        return STL_INVALID;
    }

    // Read the number of triangles, uint32_t (4 bytes), little-endian
    QByteArray nTrianglesBytes = file.read(4);
    if (nTrianglesBytes.size() != 4)
    {
        qDebug("\n\tCannot read the number of triangles (after the header)");
        return STL_INVALID;
    }            

    uint32_t nTriangles = *((uint32_t*)nTrianglesBytes.data());

    // Verify that file size equals the sum of header + nTriangles value + all triangles.
    // Multiply in 64 bits so that very large triangle counts cannot overflow.
    if (quint64(fileSize) == (84 + (quint64(nTriangles) * 50)))
        return STL_BINARY;

    return STL_INVALID;
}