How do I convert the type of a DataFrame column in Julia?

Question

use*_*922 5 julia

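I have some badly formatted data. Specifically, I have numeric columns in which some elements contain spurious text (e.g. "8 meters" instead of "8"). I want to read the data in with readtable, make the necessary fixes, and then convert the column to Float64 so that it behaves correctly (comparisons, etc.).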

There seems to have been a macro called @transform that did this kind of conversion, but it has been removed. What should I do now?

My best solution so far is to clean the data, write it out as a CSV, and then re-read it with readtable, specifying eltypes. But that is horrible.

What else can I do?

Answer 1

Mr *_*pha 5

There is no need to go through a CSV file. You can change or update the DataFrame directly.

using DataFrames
# Let's make up some data
df=DataFrame(A=rand(5),B=["8", "9 meters", "4.5", "3m", "12.0"])

# And then make a function to clean the data
function fixdata(arr)
    result = DataArray(Float64, length(arr))
    # Match the first number, with at most one decimal point
    reg = r"[0-9]+\.?[0-9]*"
    for i = 1:length(arr)
        m = match(reg, arr[i])
        if m == nothing
            result[i] = NA
        else
            result[i] = float64(m.match)
        end
    end
    result
end

# Then just apply the function to the column to clean the data
# and then replace the column with the cleaned data.
df[:B] = fixdata(df[:B])


Is there a more functional (as in functional programming) solution? (2 upvotes)
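
One possible more functional take, sketched for recent DataFrames.jl releases, where `DataArray`, `NA`, `float64` and `readtable` are no longer available. The helper name `parse_leading_number` is hypothetical, and using `missing` with `parse(Float64, ...)` in place of `NA` and `float64` is an assumption about the installed versions, not part of the original answer.

```julia
using DataFrames

# Same made-up data as in the answer above
df = DataFrame(A = rand(5), B = ["8", "9 meters", "4.5", "3m", "12.0"])

# Hypothetical helper: return the first number found in the string,
# or `missing` when the string contains no digits at all.
function parse_leading_number(s::AbstractString)
    m = match(r"[0-9]+\.?[0-9]*", s)
    return m === nothing ? missing : parse(Float64, m.match)
end

# Map the helper over the column and replace the column with the result.
df.B = map(parse_leading_number, df.B)
```

Because `map` builds a new vector, the column's element type is narrowed to `Float64` when every value parses, and becomes `Union{Missing, Float64}` when at least one value does not.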
