How to correctly read columns from a file when the first element is empty

Question

I have a data file, data.txt:

      a  
    5 b 
    3 c 7

I want to load it and end up with:

    julia> loaded_data
    3×3 Matrix{Any}:
     ""   "a"  ""
     5  "b"  ""
     3  "c"  7

However, it is not clear to me how to do that. I tried readdlm:

    julia> using DelimitedFiles

    julia> readdlm("data.txt")
    3×3 Matrix{Any}:
      "a"  ""    ""
     5     "b"   ""
     3     "c"  7

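This does not correctly recognize the first element of the first column as blank; instead it reads "a" as the first element (which of course makes sense). I think the closest I have come to what I want is readlines: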

    julia> readlines("data.txt")
    3-element Vector{String}:
     "  a  "
     "5 b "
     "3 c 7"

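But from here I do not know how to proceed. I could take one of the lines and split it into its columns with split, but I am not sure how that would help me identify the empty elements in the other lines.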
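To make the difficulty concrete, here is a minimal sketch of what a plain split does to each line. The in-memory vector `lines` is an assumption standing in for readlines("data.txt"); its strings are copied from the output above.

```julia
# Stand-in for readlines("data.txt"); the strings match the question's file.
lines = ["  a  ", "5 b ", "3 c 7"]

# With no delimiter argument, split treats any run of whitespace as one
# separator and drops empty tokens.
tokens = split.(lines)

# Each line yields a different number of tokens, so the information about
# which column a token came from is lost.
token_counts = length.(tokens)
println(token_counts)  # [1, 2, 3]
```

The first line produces only the token "a", with nothing to indicate that it belongs in the second column.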

Answer 1

Sim*_*ure 6

Here is one possibility:


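    # Parse digit-only fields as Int; leave everything else (including "") as a string.
    cnv(s) = (length(s) > 0 && all(isdigit, s)) ? parse(Int, s) : s

    # Collapse each double space to a single one so that an empty field becomes an
    # empty token, split on single spaces, then stack the rows (stack needs Julia 1.9+).
    cnv.(stack(split.(replace.(eachline("data.txt"), "  " => " "), " "), dims=1))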
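The one-liner above can be unpacked step by step. This is only a sketch of how I read it, not the answerer's own breakdown: `lines` is an assumed in-memory stand-in for eachline("data.txt"), and the intermediate names `collapsed`, `tokens` and `result` are introduced here for illustration. It appears to rely on each field being at most one character wide and on columns being separated by exactly one space.

```julia
# Stand-in for eachline("data.txt"); the strings match the question's file.
lines = ["  a  ", "5 b ", "3 c 7"]

# Same converter as in the answer: digit-only strings become Int.
cnv(s) = (length(s) > 0 && all(isdigit, s)) ? parse(Int, s) : s

# Step 1: a double space marks an empty field; collapse it to one separator.
collapsed = replace.(lines, "  " => " ")

# Step 2: split on a literal single space, which keeps empty tokens.
tokens = split.(collapsed, " ")

# Step 3: stack the token vectors as rows (Julia 1.9+) and convert the fields.
result = cnv.(stack(tokens, dims=1))
```

With wider fields or other separators, step 1 would no longer line the tokens up with the columns.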

Answer 2

phi*_*ler 6

If the contents of the columns are distinct enough that the parse is uniquely defined, I would use a regular expression on each line:

    julia> lines
    3-element Vector{String}:
     "  a  "
     "5 b "
     "3 c 7"

    julia> [match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", s).captures for s in lines]
    3-element Vector{Vector{Union{Nothing, SubString{String}}}}:
     ["", "a", ""]
     ["5", "b", ""]
     ["3", "c", "7"]

You can then go on to parse and concatenate as needed, for example:

    julia> mapreduce(vcat, lines) do line
               x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
               [tryparse(Int, x) y tryparse(Int, z)]
           end
    3×3 Matrix{Any}:
      nothing  "a"   nothing
     5         "b"   nothing
     3         "c"  7

In Julia 1.9, I think you should be able to write this as:

    stack(lines; dims=1) do line
        x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
        (tryparse(Int, x), y, tryparse(Int, z))
    end
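The regex approach yields nothing for empty numeric fields, whereas the question asks for "". A possible variation, sketched under the assumption that empty fields should stay empty strings, is to fall back to the captured text when tryparse fails. The helper name `parse_or_keep` and the in-memory `lines` vector are hypothetical, introduced only for this example.

```julia
# Stand-in for readlines("data.txt"); the strings match the question's file.
lines = ["  a  ", "5 b ", "3 c 7"]

# Same pattern as in the answer: digits, then lowercase letters, then digits.
pattern = r"\s*(\d*)\s*([a-z]*)\s*(\d*)"

# Hypothetical helper: return the Int if the field parses, else the original text.
parse_or_keep(s) = something(tryparse(Int, s), s)

# Build one 1×3 row per line, then concatenate the rows vertically.
rows = map(lines) do line
    x, y, z = match(pattern, line).captures
    [parse_or_keep(x) y parse_or_keep(z)]
end
result = reduce(vcat, rows)
```

Here `something` returns its first argument that is not nothing, so an empty capture is kept as "".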

Answer 3

Dan*_*etz 3

This question probably has many edge cases that would need clarifying.

Here is an option that is longer than the other answers, but possibly easier to adapt to edge cases:

    function splittable(d)
        # find all non-space locations
        t = sort(union(findall.(!isspace, d)...))
        # find initial indices of fields
        tt = t[vcat(1,findall(diff(t).!=1).+1)]
        # prepare ranges to extract fields
        tr = [tt[i]:tt[i+1]-1 for i in 1:length(tt)-1]
        # extract substrings
        vs = map(s -> strip.(vcat([s[intersect(r,eachindex(s))] for r in tr],
                                  tt[end]<=length(s) ? s[tt[end]:end] : "")), d)
        # fit substrings into matrix
        L = maximum(length.(vs))
        String.([j <= length(vs[i]) ? vs[i][j] : ""
          for i in 1:length(vs), j in 1:L])
    end

And:

    julia> d = readlines("data.txt")
    3-element Vector{String}:
     "  a  "
     "5 b "
     "3 c 7"

    julia> dd = splittable(d)
    3×3 Matrix{String}:
     ""   "a"  ""
     "5"  "b"  ""
     "3"  "c"  "7"

To get the partial parsing effect:

    function parsewhatmay(m)
        M = tryparse.(Int, m)
        map((x,y)->isnothing(x) ? y : x, M, m)
    end

Now:

    julia> parsewhatmay(dd)
    3×3 Matrix{Any}:
      ""  "a"   ""
     5    "b"   ""
     3    "c"  7
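The column-detection step at the start of splittable can be looked at in isolation. The sketch below copies its first two statements into a hypothetical helper, `column_starts` (a name introduced here, not part of the answer), and applies it to the question's data and to a made-up wider layout to show how the field start positions are inferred from all lines together.

```julia
# Hypothetical helper wrapping the first two statements of splittable.
function column_starts(d)
    # Character positions that are non-space in at least one line.
    t = sort(union(findall.(!isspace, d)...))
    # A field starts wherever the previous position is not occupied in any line.
    t[vcat(1, findall(diff(t) .!= 1) .+ 1)]
end

# The question's data: fields start at positions 1, 3 and 5.
narrow = column_starts(["  a  ", "5 b ", "3 c 7"])
println(narrow)  # [1, 3, 5]

# A made-up layout with wider fields: starts are found at positions 1, 5 and 8.
wide = column_starts(["    ab  ", "12  cd  ", "3   e  7"])
println(wide)    # [1, 5, 8]
```

Because the positions come from the union over all lines, a column that is empty in one line is still detected as long as some other line has a value in it.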

Archived: 3 years, 3 months ago
Views: 486
Last activity: 3 years, 2 months ago