Raku NativeCall to Rust FFI CArray size limits - SegFault

p6s*_*eve (score 6) · tags: ffi, rust, raku

Below is a minimal(?) reproducible example of my code. It is the first step of a stress test/benchmark for the WIP Raku module Dan::Polars.

In Rust, I created a libmre.so with this code:

use libc::c_char;
use libc::size_t;
use std::ffi::CStr;
use std::slice;

//  Container

pub struct VecC {
    ve: Vec<String>,
}

impl VecC {
    fn new(data: Vec<String>) -> VecC {
        VecC { ve: data }
    }

    // Debug-print the whole vector to stdout.
    fn show(&self) {
        println!("{:?}", self.ve);
    }
}

// Copy `len` C strings into a new VecC and hand ownership to the caller.
#[no_mangle]
pub extern "C" fn ve_new_str(ptr: *const *const c_char, len: size_t) -> *mut VecC {
    let mut ve_data = Vec::<String>::new();
    unsafe {
        // Only the outer pointer is checked; every element is assumed non-null.
        assert!(!ptr.is_null());

        for item in slice::from_raw_parts(ptr, len as usize) {
            ve_data.push(CStr::from_ptr(*item).to_string_lossy().into_owned());
        }
    }

    Box::into_raw(Box::new(VecC::new(ve_data)))
}

// Print a VecC previously returned by ve_new_str.
#[no_mangle]
pub extern "C" fn ve_show(ptr: *mut VecC) {
    let ve_c = unsafe {
        assert!(!ptr.is_null());
        &*ptr
    };

    ve_c.show();
}
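The library above gives Raku a Box through Box::into_raw, but nothing ever frees the VecC it allocates. The sketch below shows the usual pairing. It assumes two hypothetical exports that do not exist in the question's code:

- ve_free, the destructor.
- ve_len, used only to check the round trip.

It uses std's c_char and usize in place of the libc types so that it builds without dependencies. The main function mimics what a CArray[Str] passes in: an array of pointers to NUL-terminated strings.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::slice;

// Same shape as the question's container.
pub struct VecC {
    ve: Vec<String>,
}

// Constructor as in the question, with std types instead of libc ones.
#[no_mangle]
pub extern "C" fn ve_new_str(ptr: *const *const c_char, len: usize) -> *mut VecC {
    assert!(!ptr.is_null());
    // SAFETY: the caller promises `ptr` points to `len` valid C strings.
    let items = unsafe { slice::from_raw_parts(ptr, len) };
    let ve: Vec<String> = items
        .iter()
        .map(|&s| {
            // SAFETY: each element is assumed non-null and NUL-terminated.
            let c = unsafe { CStr::from_ptr(s) };
            c.to_string_lossy().into_owned()
        })
        .collect();
    Box::into_raw(Box::new(VecC { ve }))
}

// Hypothetical accessor, only here to check the round trip.
#[no_mangle]
pub extern "C" fn ve_len(ptr: *const VecC) -> usize {
    assert!(!ptr.is_null());
    unsafe { (*ptr).ve.len() }
}

// Hypothetical destructor: takes back ownership of the Box and drops it.
#[no_mangle]
pub extern "C" fn ve_free(ptr: *mut VecC) {
    if ptr.is_null() {
        return; // freeing NULL is a no-op, like C's free()
    }
    unsafe { drop(Box::from_raw(ptr)) };
}

fn main() {
    // Owned C strings plus an array of raw pointers to them (char**).
    let owned: Vec<CString> = ["id001", "id002", "id003"]
        .iter()
        .map(|s| CString::new(*s).unwrap())
        .collect();
    let ptrs: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();

    let v = ve_new_str(ptrs.as_ptr(), ptrs.len());
    println!("{}", ve_len(v)); // prints 3
    ve_free(v);
}
```

On the Raku side, ve_free would presumably be bound like the other subs (sub ve_free(VecC) is native($n-path) { * }) and called once the vector is no longer needed.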

and this Cargo.toml:

[package]
name = "mre"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
libc = "0.2.126"

[lib]
name = "mre"
path = "src/lib.rs"
crate-type = ["cdylib"]

In Raku, I use libmre.so like this:

#!/usr/bin/env raku
use lib '../lib';

use NativeCall;

#my $output;    #mre tried to move decl to here to avoid going out of scope
sub carray( $dtype, @items ) {
    my $output := CArray[$dtype].new();
    loop ( my $i = 0; $i < @items; $i++ ) {
        $output[$i] = @items[$i]
    }
    say $output;
    $output
}

### Container Classes that interface to Rust lib.rs ###

constant $n-path    = '../mre/target/debug/mre';

class VecC is repr('CPointer') is export {
    sub ve_new_str(CArray[Str],size_t) returns VecC is native($n-path) { * }
    sub ve_show(VecC) is native($n-path) { * }

    method new(@data) {
        ve_new_str(carray(Str, @data), @data.elems );
    }

    method show {
        ve_show(self)
    }
}

my \N = 100;   #should be 2e9  #fails between 30 and 100
my \K = 100;

sub randChar(\f, \numGrp, \N) {
    my @things = [sprintf(f, $_) for 1..numGrp];
    @things[[1..numGrp].roll(N)];
}

my @data = [randChar("id%03d", K, N)];

my $vec = VecC.new( @data );
$vec.show;

When \N < 30, it works fine, with output like this:

NativeCall::Types::CArray[Str].new
["id098", "id035", "id024", "id067", "id051", "id025", "id024", "id092", "id044", "id042", "id033", "id004", "id100", "id091", "id087", "id059", "id031", "id063", "id019", "id035"]

However, when \N > 50, I get:

NativeCall::Types::CArray[Str].new
Segmentation fault (core dumped)

This is with:

Welcome to Rakudo™ v2022.04.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2022.04.
on ubuntu

Since the benchmark needs \N to be 2e9, I could use some help solving this.

If you want to try this at home, you are welcome to use p6steve/raku-dan:polars-2022.02-arm64 (or -amd64) on Docker Hub. Don't forget to run cargo build the first time. The image includes RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | bash -s -- -y

gmo*_*kin (score 7)

There is a small bug in the randChar function:

sub randChar(\f, \numGrp, \N) {
    my @things = [sprintf(f, $_) for 1..numGrp];     
    @things[[1..numGrp].roll(N)];
}   

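You are indexing the @things array with indices from 1 to numGrp, but the largest index of @things is numGrp - 1. So sometimes one (or more) elements of the returned array contain an (Any) instead of a string.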
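The off-by-one also explains why the crash seemed to start somewhere between 30 and 100 rather than at a fixed size.

- With K = 100, each of the N picks from 1..100 hits the out-of-range index 100 with probability 1/100.
- The chance that at least one element is (Any) is therefore 1 - (99/100)^N.
- That is roughly 18% for N = 20, 26% for N = 30 and 63% for N = 100.

Below is a small sketch of that arithmetic. It assumes the picks are independent and uniform, which is how roll samples (with replacement). The helper name p_any_out_of_range is purely illustrative.

```rust
/// Probability that at least one of `n` uniform picks from 1..=k lands on
/// the single out-of-range index `k` (valid indices are 0..=k-1).
fn p_any_out_of_range(n: i32, k: f64) -> f64 {
    1.0 - ((k - 1.0) / k).powi(n)
}

fn main() {
    // K = 100 as in the question; N values around the reported threshold.
    for n in [20, 30, 50, 100, 1000] {
        println!(
            "N = {:>4}: P(at least one (Any)) = {:.3}",
            n,
            p_any_out_of_range(n, 100.0)
        );
    }
}
```

So small N only makes a crash less likely; it does not make the code correct.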

What you want is something like this:

sub randChar(\f, \numGrp, \N) {
    my @things = [sprintf(f, $_) for 1..numGrp];     
    @things.roll(N); # call roll directly on @things
}   
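On the Rust side, the assert in ve_new_str only checks the outer pointer. Suppose an (Any) element reaches Rust as a NULL entry in the char* array. That is an assumption about how NativeCall marshals it, but it is consistent with the crash. In that case CStr::from_ptr dereferences NULL, which is undefined behaviour and typically a segfault.

Below is a minimal defensive variant that checks every element and returns a NULL VecC instead of crashing. It uses std types in place of the libc ones and is a sketch, not the module's actual code.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;
use std::slice;

pub struct VecC {
    ve: Vec<String>,
}

/// Like the question's ve_new_str, but rejects NULL elements instead of
/// passing them to CStr::from_ptr (which would be undefined behaviour).
#[no_mangle]
pub extern "C" fn ve_new_str(p: *const *const c_char, len: usize) -> *mut VecC {
    if p.is_null() {
        return ptr::null_mut();
    }
    // SAFETY: the caller promises `p` points to `len` readable pointers.
    let items = unsafe { slice::from_raw_parts(p, len) };
    let mut ve = Vec::with_capacity(len);
    for &item in items {
        if item.is_null() {
            // e.g. an (Any) element on the Raku side: report failure, don't crash
            return ptr::null_mut();
        }
        // SAFETY: non-null and, per the caller's contract, NUL-terminated.
        let s = unsafe { CStr::from_ptr(item) };
        ve.push(s.to_string_lossy().into_owned());
    }
    Box::into_raw(Box::new(VecC { ve }))
}

fn main() {
    let a = CString::new("id001").unwrap();
    let good = [a.as_ptr()];
    let bad = [a.as_ptr(), ptr::null()];

    let v = ve_new_str(good.as_ptr(), good.len());
    assert!(!v.is_null());
    println!("good input: {} element(s)", unsafe { (*v).ve.len() });
    println!("bad input returns NULL: {}", ve_new_str(bad.as_ptr(), bad.len()).is_null());

    // Reclaim the Box so the example itself does not leak.
    unsafe { drop(Box::from_raw(v)) };
}
```

Returning NULL keeps the failure recoverable. For a CPointer class, a NULL return should come back to Raku as the VecC type object, which can be tested with .defined. Fixing randChar as shown above remains the actual fix.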

  • FWIW my MBA (M1) can now do 1e6 iterations in 6.7 seconds (then in RAM). Not yet ready for a 50GB dataset, but not bad for 0.5GB! (2 upvotes)