How to correctly read columns from a file when the first element is empty

Beb*_*ron 7 io julia

I have a data file `data.txt`:

```
  a  
5 b 
3 c 7
```

I would like to load it and end up with:

```julia
julia> loaded_data
3×3 Matrix{Any}:
 ""   "a"  ""
 5    "b"  ""
 3    "c"  7
```

But it is not clear to me how to do this. Trying `readdlm`

```julia
julia> using DelimitedFiles

julia> readdlm("data.txt")
3×3 Matrix{Any}:
  "a"  ""    ""
 5     "b"   ""
 3     "c"  7
```

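does not recognize the first element of the first column as blank; instead it reads "a" as the first element (which of course makes sense). The closest I have come to what I want is using `readlines`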

```julia
julia> readlines("data.txt")
3-element Vector{String}:
 "  a  "
 "5 b "
 "3 c 7"
```

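but from here I don't know how to continue. I could take any one line and `split` it into its columns, but I am not sure how that would help me identify the empty elements in the other lines.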
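A quick illustration of why a plain `split` is not enough, using the first line of the sample file. With its default arguments `split` breaks on runs of whitespace and drops empty pieces, so the blank first column simply disappears; with an explicit single-space delimiter it keeps empty pieces, but produces one piece per space rather than one per column:

```julia
line = "  a  "

# Default split: splits on runs of whitespace and discards empty pieces,
# so nothing records that the first column was blank.
default_parts = split(line)        # ["a"]

# Explicit ' ' delimiter: empty pieces are kept (keepempty defaults to true here),
# but there are 5 pieces for 3 columns, so they do not map onto columns directly.
space_parts = split(line, ' ')     # ["", "", "a", "", ""]
```

Recovering the columns therefore needs some extra information: either a rule for how a blank is written in this file, or the character positions at which fields start.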


Sim*_*ure 6

Here is one possibility:

```julia
cnv(s) = (length(s) > 0 && all(isdigit, s)) ? parse(Int, s) : s

cnv.(stack(split.(replace.(eachline("data.txt"),"  "=>" "), " "), dims=1))
```
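To see why this works, here is the same pipeline unrolled step by step. The `lines` vector is an assumed in-memory stand-in for `eachline("data.txt")`, and comprehensions replace the dotted calls only to make each intermediate visible. The trick relies on how blanks happen to be written in this particular file: after every double space is collapsed into one, splitting on a single space leaves an empty string exactly where a column is blank. `stack` requires Julia 1.9 or later.

```julia
# Hypothetical in-memory copy of data.txt, so the sketch runs without the file.
lines = ["  a  ", "5 b ", "3 c 7"]

# Step 1: collapse each double space into a single space.
collapsed = [replace(l, "  " => " ") for l in lines]   # [" a ", "5 b ", "3 c 7"]

# Step 2: split on a single space; empty fields survive as "".
fields = [split(l, " ") for l in collapsed]            # [["", "a", ""], ["5", "b", ""], ["3", "c", "7"]]

# Step 3: convert all-digit fields to Int and leave other strings unchanged.
cnv(s) = (length(s) > 0 && all(isdigit, s)) ? parse(Int, s) : s

# Step 4: stack the rows into a 3×3 matrix (Julia 1.9+), then convert each cell.
result = cnv.(stack(fields; dims=1))
```

If blanks were padded differently (for example, wider columns), the collapsing rule in step 1 would have to change accordingly.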


phi*_*ler 6

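If the contents of the columns are distinctive enough that the parse is uniquely determined, I would use a regular expression on each line: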

```julia
julia> lines
3-element Vector{String}:
 "  a  "
 "5 b "
 "3 c 7"

julia> [match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", s).captures for s in lines]
3-element Vector{Vector{Union{Nothing, SubString{String}}}}:
 ["", "a", ""]
 ["5", "b", ""]
 ["3", "c", "7"]
```

You can then continue parsing and concatenating as needed, e.g.

```julia
julia> mapreduce(vcat, lines) do line
           x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
           [tryparse(Int, x) y tryparse(Int, z)]
       end
3×3 Matrix{Any}:
  nothing  "a"   nothing
 5         "b"   nothing
 3         "c"  7
```

In Julia 1.9, I think you should be able to write this as

```julia
stack(lines; dims=1) do line
    x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
    (tryparse(Int, x), y, tryparse(Int, z))
end
```
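The `mapreduce` version leaves `nothing` in the blank numeric cells, whereas the question asks for `""`. Below is a minimal sketch that produces the question's exact layout. Like the regex above, it assumes digits in columns 1 and 3 and lowercase letters in column 2. Here `lines` is an in-memory stand-in for `readlines("data.txt")`, and `ROW_RE` and `int_or_blank` are hypothetical names. Because every group in the pattern uses `*`, a missing field is captured as `""` rather than making the match fail.

```julia
# Hypothetical in-memory copy of data.txt.
lines = ["  a  ", "5 b ", "3 c 7"]

# Optional digits, optional lowercase letters, optional digits.
const ROW_RE = r"\s*(\d*)\s*([a-z]*)\s*(\d*)"

# Parse a digit field to Int; an empty field stays "" (as in the desired output).
int_or_blank(s) = isempty(s) ? "" : parse(Int, s)

# Fill a preallocated Matrix{Any} row by row.
M = Matrix{Any}(undef, length(lines), 3)
for (i, line) in enumerate(lines)
    x, y, z = match(ROW_RE, line).captures
    M[i, :] = [int_or_blank(x), String(y), int_or_blank(z)]
end
```

Preallocating a `Matrix{Any}` keeps the mixed `Int`/`String` cells without relying on whatever element type `vcat` or `stack` would infer.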


Dan*_*etz 3

This problem probably has many edge cases that need clarifying.


Here is an option that is longer than the other answers, but may be easier to adapt to edge cases:

```julia
function splittable(d)
    # find all non-space locations
    t = sort(union(findall.(!isspace, d)...))
    # find initial indices of fields
    tt = t[vcat(1,findall(diff(t).!=1).+1)]
    # prepare ranges to extract fields
    tr = [tt[i]:tt[i+1]-1 for i in 1:length(tt)-1]
    # extract substrings
    vs = map(s -> strip.(vcat([s[intersect(r,eachindex(s))] for r in tr],
                              tt[end]<=length(s) ? s[tt[end]:end] : "")), d)
    # fit substrings into matrix
    L = maximum(length.(vs))
    String.([j <= length(vs[i]) ? vs[i][j] : ""
      for i in 1:length(vs), j in 1:L])
end
```

Then:

```julia
julia> d = readlines("data.txt")
3-element Vector{String}:
 "  a  "
 "5 b "
 "3 c 7"

julia> dd = splittable(d)
3×3 Matrix{String}:
 ""   "a"  ""
 "5"  "b"  ""
 "3"  "c"  "7"
```
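Here is a simplified, step-by-step restatement of the idea behind `splittable`, run on the sample lines. It paraphrases the answer's code for illustration rather than reproducing it exactly. `cell` is a hypothetical helper, and the input is assumed to be ASCII so that byte indices equal character positions. A field starts at every occupied position, taken over all lines together, that does not directly follow another occupied position:

```julia
# Hypothetical input: the three lines of data.txt.
d = ["  a  ", "5 b ", "3 c 7"]

# Positions of non-space characters in each line.
pos = findall.(!isspace, d)                      # [[3], [1, 3], [1, 3, 5]]

# Union over all lines, sorted: every position that is ever occupied.
t = sort(union(pos...))                          # [1, 3, 5]

# A new field starts wherever occupied positions stop being contiguous.
tt = t[vcat(1, findall(diff(t) .!= 1) .+ 1)]     # [1, 3, 5]

# Field k covers tt[k] up to just before tt[k+1]; the last field runs to the line end.
# Slices are clipped to the line length, so short lines yield "" for missing fields.
function cell(s, k)
    lo = tt[k]
    hi = k < length(tt) ? tt[k+1] - 1 : lastindex(s)
    hi = min(hi, lastindex(s))
    lo > hi ? "" : String(strip(s[lo:hi]))
end

# Rows follow the lines, columns follow the detected fields.
table = [cell(s, k) for s in d, k in eachindex(tt)]
```

Because the field boundaries come from all lines at once, a line that is blank in some column still gets an empty cell in the right place. This is the property that plain `split` lacks.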

To get partial parsing (convert whatever parses as an integer and keep the rest as strings):

```julia
function parsewhatmay(m)
    M = tryparse.(Int, m)
    map((x,y)->isnothing(x) ? y : x, M, m)
end
```

Now:

```julia
julia> parsewhatmay(dd)
3×3 Matrix{Any}:
  ""  "a"   ""
 5    "b"   ""
 3    "c"  7
```
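`parsewhatmay` can also be written as a single broadcast. `tryparse` returns `nothing` for cells that are not integers, and `something` falls back to the original string in exactly those cases, which matches the `isnothing(x) ? y : x` in the `map`. In this sketch, `parse_where_possible` is a hypothetical name and `dd` is typed in by hand to match the `splittable` output shown earlier:

```julia
# Hypothetical input: the String matrix that splittable returns for data.txt.
dd = ["" "a" ""; "5" "b" ""; "3" "c" "7"]

# tryparse gives Int or nothing; something picks the Int, else the original string.
parse_where_possible(m) = something.(tryparse.(Int, m), m)

parsed = parse_where_possible(dd)
```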