Parsing a binary file with nom 5.0

Question

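The problem

There is a file with multiple headers, but only one of them, and the data that follows it, matters to me. This header is repeated many times throughout the file.

Its magic number is `A3046` in ASCII, or 0x65 0x51 0x48 0x54 0x52 in hex. After finding the first byte, the parser must take all bytes until 0xff, and then repeat for the remaining headers until EOF.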
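As a quick sanity check on the two spellings of the magic number, the sketch below compares them byte by byte. The ASCII string `A3046` encodes to 0x41 0x33 0x30 0x34 0x36, which is not the hex sequence quoted above, so a parser built from one spelling will not match data written with the other. The constant and function names are illustrative and not from the original post; which sequence the file really contains has to be checked against the file itself.

```rust
/// The magic number as spelled in ASCII in the question.
const ASCII_MAGIC: &[u8; 5] = b"A3046";
/// The magic number as spelled in hex in the question.
const HEX_MAGIC: [u8; 5] = [0x65, 0x51, 0x48, 0x54, 0x52];

/// Returns true when both spellings describe the same five bytes.
fn spellings_agree() -> bool {
    ASCII_MAGIC[..] == HEX_MAGIC[..]
}

/// Renders bytes as space-separated hex, e.g. "0x41 0x33".
fn to_hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:#04x}", b))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Calling `to_hex(ASCII_MAGIC)` shows the bytes that `b"A3046"` actually stands for, which can then be compared with a hex dump of the file.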

My solution

First I loaded the file:

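use std::fs::OpenOptions;
use std::io::Read;

// Open the sample file read-only.
let mut file = OpenOptions::new()
    .read(true)
    .open("../assets/sample")
    .unwrap();

// Read the whole file into memory; do not silently drop the I/O result.
let mut full_file: Vec<u8> = Vec::new();
file.read_to_end(&mut full_file).unwrap();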


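I declared the magic number with the following statement: `pub static QT_MAGIC: &[u8; 5] = b"A3046";`. As a test, I wrote the following function just to see whether it could find the first header.

use nom::{bytes::complete::tag, IResult};

// Tries to match the magic number at the very start of `input`.
fn parse_block(input: &[u8]) -> IResult<&[u8], &[u8]> {
    tag(QT_MAGIC)(input)
}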


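However, when the test runs, the `Ok` has `None` as its value. It should definitely have found something. What am I doing wrong?

I found no examples of parsing bytes with nom 5, and being a Rust newbie does not help either. How can I parse all the blocks with these rules?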

Answer 1

Séb*_*uld 7

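The `nom` version

First off, apologies: the playground only has nom 4.0, so the code lives in this GitHub repository.

To parse something like this, we need to combine two different parsers:

take_until, to take bytes until the preamble or the EOF marker
tag, to isolate the preamble

There is also a combinator, preceded, which lets us discard the first element of a sequence of parsers.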
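To make the roles of these pieces concrete, here is a dependency-free sketch of roughly what each one does to a byte slice. It uses only the standard library, not nom, and the helper names (`find_subslice`, `take_until_std`, `tag_std`, `block_std`) are made up for this illustration. Note that `tag` only matches at the current position, which is one likely reason a bare `tag` applied to the whole file fails when the file does not begin with the magic number.

```rust
/// Position of the first occurrence of `needle` in `haystack`, if any.
fn find_subslice(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Like `take_until`: split just before `pattern`, without consuming it.
/// Returns (rest, taken), mirroring nom's (remaining, output) order.
fn take_until_std<'a>(input: &'a [u8], pattern: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    let at = find_subslice(input, pattern)?;
    Some((&input[at..], &input[..at]))
}

/// Like `tag`: succeed only if `input` starts with `pattern`.
fn tag_std<'a>(input: &'a [u8], pattern: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.starts_with(pattern) {
        Some((&input[pattern.len()..], &input[..pattern.len()]))
    } else {
        None
    }
}

/// Like `preceded(take_until(magic), preceded(tag(magic), take_until(end)))`:
/// skip to the magic, drop it, then keep everything before `end`.
fn block_std<'a>(input: &'a [u8], magic: &[u8], end: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    let (rest, _skipped) = take_until_std(input, magic)?;
    let (rest, _magic) = tag_std(rest, magic)?;
    take_until_std(rest, end)
}
```

The nom combinators additionally carry error information and compose generically; the sketch only mirrors their effect on the input.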

// Our preamble
const MAGIC: &[u8] = &[0x65, 0x51, 0x48, 0x54, 0x52];
// Our EOF byte sequence
const EOF: &[u8] = &[0xff];

// Shorthand to catch EOF
fn match_to_eof(data: &[u8]) -> nom::IResult<&[u8], &[u8]> {
    nom::bytes::complete::take_until(EOF)(data)
}

// Shorthand to catch the preamble
fn take_until_preamble(data: &[u8]) -> nom::IResult<&[u8], &[u8]> {
    nom::bytes::complete::take_until(MAGIC)(data)
}

pub fn extract_from_data(data: &[u8]) -> Option<(&[u8], &[u8])> {
    let preamble_parser = nom::sequence::preceded(
        // Ditch anything before the preamble
        take_until_preamble,
        nom::sequence::preceded(
            // Ditch the preamble
            nom::bytes::complete::tag(MAGIC),
            // And take until the EOF (0xff)
            match_to_eof,
        ),
    );
    // And we swap the elements because it's confusing AF
    // as a return function
    preamble_parser(data).ok().map(|r| (r.1, r.0))
}


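The code should be commented well enough to follow. It discards any bytes until it finds the preamble, then discards the preamble itself and keeps everything until it finds the EOF byte sequence (`[0xff]`).

It then returns the nom result with its elements swapped, because this is an example. You can swap them back if you want to combine it with other parsers. The first element is the content of the block; the second is the remaining input, which starts at the EOF marker because `take_until` does not consume the pattern it stops at. This means you can use this function to iterate (I did that in one of the tests in the repo I put on GitHub).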
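The iteration mentioned above might look like the loop below. This is a sketch, not the code from the author's repository: `collect_blocks` is a hypothetical helper that accepts any extractor with the same signature as `extract_from_data`, and `extract_std` is a standard-library stand-in (an assumption made so the example compiles without nom) that follows the same rules: skip to the magic number, drop it, and keep the bytes up to the first 0xff.

```rust
const MAGIC: &[u8] = &[0x65, 0x51, 0x48, 0x54, 0x52];
const EOF: u8 = 0xff;

/// Stand-in for `extract_from_data`, using only std (an assumption for
/// this sketch). Returns (block contents, remaining input).
fn extract_std(data: &[u8]) -> Option<(&[u8], &[u8])> {
    // Skip everything up to and including the magic number.
    let start = data.windows(MAGIC.len()).position(|w| w == MAGIC)? + MAGIC.len();
    let body = &data[start..];
    // Take bytes up to, but not including, the end marker.
    let end = body.iter().position(|&b| b == EOF)?;
    Some((&body[..end], &body[end..]))
}

/// Repeatedly applies `extract` to the leftover input and gathers every block.
/// Stops at the first failure, e.g. when no further magic number is found.
fn collect_blocks<'a, F>(mut data: &'a [u8], extract: F) -> Vec<&'a [u8]>
where
    F: Fn(&'a [u8]) -> Option<(&'a [u8], &'a [u8])>,
{
    let mut blocks = Vec::new();
    while let Some((block, rest)) = extract(data) {
        blocks.push(block);
        data = rest;
    }
    blocks
}
```

With the nom version in scope, `collect_blocks(&full_file, extract_from_data)` should work the same way, since both extractors return the block first and the leftover input second. Each successful step consumes at least the magic number, so the loop always terminates.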

