I have a data file, data.txt:

      a
    5 b
    3 c 7

I want to load it and end up with:

    julia> loaded_data
    3×3 Matrix{Any}:
      ""  "a"   ""
     5    "b"   ""
     3    "c"  7

but it is not clear to me how to do this. Trying readdlm:

    julia> using DelimitedFiles

    julia> readdlm("data.txt")
    3×3 Matrix{Any}:
      "a"  ""    ""
     5     "b"   ""
     3     "c"  7

This does not recognize the first element of the first column as blank; instead it reads "a" as the first element (which of course makes sense). I think the closest I have come to what I want is with readlines:

    julia> readlines("data.txt")
    3-element Vector{String}:
     "  a  "
     "5 b  "
     "3 c 7"

but from here I am not sure how to proceed. I can take one of the lines that has all of its columns and split it, but I am not sure how that helps me identify the empty elements in the other rows.
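
To make the difficulty concrete, here is a small sketch showing that a plain whitespace `split` drops empty fields, so the column a value came from is lost. The `lines` vector stands in for the file contents, and its exact spacing is an assumption.

```julia
# Stand-in for readlines("data.txt"); the exact spacing is an assumption.
lines = ["  a  ", "5 b  ", "3 c 7"]

# With no delimiter given, split breaks on runs of whitespace and
# discards empty fields, so each row ends up with a different length.
fields = split.(lines)

# Number of fields recovered from each row.
counts = length.(fields)
```

The first row comes back with a single field, and nothing in it says that "a" belongs to the second column.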
Here is one possibility:
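
    # parse all-digit strings as Int, leave everything else (including "") unchanged
    cnv(s) = (length(s) > 0 && all(isdigit, s)) ? parse(Int, s) : s
    # collapse each two-space run to one space, split on single spaces, stack rows (Julia 1.9+)
    cnv.(stack(split.(replace.(eachline("data.txt"),"  "=>" "), " "), dims=1))
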
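
The one-liner above can be unpacked into separate steps. This is only a sketch: `lines` stands in for `eachline("data.txt")` with assumed spacing, and the names `squeezed`, `fields` and `table` are introduced here for illustration. The approach relies on every column being one character wide, so that an empty column plus its separator shows up as exactly two spaces.

```julia
# Stand-in for eachline("data.txt"); the exact spacing is an assumption.
lines = ["  a  ", "5 b  ", "3 c 7"]

# Parse all-digit strings as Int; leave everything else (including "") as is.
cnv(s) = (length(s) > 0 && all(isdigit, s)) ? parse(Int, s) : s

# Step 1: collapse each two-space run (an empty one-character column plus
# its separator) into a single space.
squeezed = replace.(lines, "  " => " ")

# Step 2: split on single spaces; with an explicit delimiter, empty
# fields are kept by default.
fields = split.(squeezed, " ")

# Step 3: stack the rows into a matrix (needs Julia 1.9 or later),
# then convert cell by cell.
table = cnv.(stack(fields, dims=1))
```

If the columns are wider than one character, step 1 no longer lines the fields up and a different approach is needed.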
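If the contents of the columns are distinct enough for the parse to be uniquely defined, I would use a regular expression on each line: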

    julia> lines
    3-element Vector{String}:
     "  a  "
     "5 b  "
     "3 c 7"

    julia> [match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", s).captures for s in lines]
    3-element Vector{Vector{Union{Nothing, SubString{String}}}}:
     ["", "a", ""]
     ["5", "b", ""]
     ["3", "c", "7"]

You can then go on to parse and concatenate as needed, for example:

    julia> mapreduce(vcat, lines) do line
               x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
               [tryparse(Int, x) y tryparse(Int, z)]
           end
    3×3 Matrix{Any}:
      nothing  "a"   nothing
     5         "b"   nothing
     3         "c"  7

In Julia 1.9, I think you should be able to write this as:

    stack(lines; dims=1) do line
        x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
        (tryparse(Int, x), y, tryparse(Int, z))
    end

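
The regex version yields `nothing` for blank cells, whereas the question asks for `""`. One way to close that gap, sketched here under the assumption that the sample lines look as shown, is to fall back to the captured string whenever `tryparse` fails. `parse_or_keep` is a hypothetical helper name, not part of the answer above.

```julia
# Stand-in for readlines("data.txt"); the exact spacing is an assumption.
lines = ["  a  ", "5 b  ", "3 c 7"]

# Parse a field as Int when possible; otherwise return the field unchanged,
# so an empty capture stays "" instead of becoming nothing.
# `parse_or_keep` is a hypothetical helper name.
parse_or_keep(s) = something(tryparse(Int, s), s)

table = mapreduce(vcat, lines) do line
    x, y, z = match(r"\s*(\d*)\s*([a-z]*)\s*(\d*)", line).captures
    # Build one 1×3 row; vcat then glues the rows together.
    [parse_or_keep(x) y parse_or_keep(z)]
end
```

The caveat about the columns being distinguishable still applies: a line holding only a number in the third column would be captured by the first group.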
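This problem probably has many edge cases that would need to be clarified.

Here is an option that is longer than the other answers, but possibly better suited to tweaking for edge cases: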

    function splittable(d)
        # find all non-space locations
        t = sort(union(findall.(!isspace, d)...))
        # find initial indices of fields
        tt = t[vcat(1,findall(diff(t).!=1).+1)]
        # prepare ranges to extract fields
        tr = [tt[i]:tt[i+1]-1 for i in 1:length(tt)-1]
        # extract substrings
        vs = map(s -> strip.(vcat([s[intersect(r,eachindex(s))] for r in tr],
          tt[end]<=length(s) ? s[tt[end]:end] : "")), d)
        # fit substrings into matrix
        L = maximum(length.(vs))
        String.([j <= length(vs[i]) ? vs[i][j] : ""
          for i in 1:length(vs), j in 1:L])
    end

and then:

    julia> d = readlines("data.txt")
    3-element Vector{String}:
     "  a  "
     "5 b  "
     "3 c 7"

    julia> dd = splittable(d)
    3×3 Matrix{String}:
     ""   "a"  ""
     "5"  "b"  ""
     "3"  "c"  "7"

To get the partially parsed result:

    function parsewhatmay(m)
        # try to parse every cell as Int; cells that fail become nothing
        M = tryparse.(Int, m)
        # keep the parsed value where there is one, the original string otherwise
        map((x,y)->isnothing(x) ? y : x, M, m)
    end

Now:

    julia> parsewhatmay(dd)
    3×3 Matrix{Any}:
      ""  "a"   ""
     5    "b"   ""
     3    "c"  7
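
If the column positions are known in advance rather than inferred from the data, a shorter fixed-width variant may be enough. This is a different and simpler technique than `splittable`, sketched under the assumptions that the text is ASCII and that the three columns sit at characters 1, 3 and 5; the names `cols` and `fixedwidth_fields` are hypothetical.

```julia
# Stand-in for readlines("data.txt"); the exact spacing is an assumption.
lines = ["  a  ", "5 b  ", "3 c 7"]

# Hypothetical character ranges of the three columns in the sample data.
cols = [1:1, 3:3, 5:5]

# Slice one line at the given ranges, padding short lines with spaces first.
# Assumes ASCII text, so that character positions equal string indices.
function fixedwidth_fields(line, cols)
    padded = rpad(line, maximum(last, cols))
    return [strip(padded[r]) for r in cols]
end

# One vector of fields per line, arranged so that each line becomes a row.
fields = permutedims(reduce(hcat, fixedwidth_fields.(lines, Ref(cols))))

# Parse integers where possible; keep the original string otherwise.
table = map(s -> something(tryparse(Int, s), s), fields)
```

Because short lines are padded before slicing, lines with trailing whitespace trimmed off are handled the same way as full-width ones.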