How do you do group-by and pivot tables with Julia DataFrames?
Let's say I have a DataFrame:
using DataFrames
df = DataFrame(Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
               Class = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"],
               Address = ["12 Silver", "10 Fak", "12 Silver", "1 North", "10 Fak", "2 Fake", "1 Red", "1 Dog", "2 Fake", "1 White"],
               Score = ["4", "5", "3", "2", "1", "5", "4", "3", "2", "1"])
I want to do the following:
1) Build a pivot table of Location against Class, which should output:
Class     H  L  M
Location
DC        0  0  1
NY        2  1  0
SF        1  2  0
TX        1  2  0
2) Group by Location and count the number of records in each group, which should output:
    Pop
DC    1
NY    3
SF    3
TX    3
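You can get most of the way there with unstack, which seems to be DataFrames.jl's answer to pivot_table. (DataFrames have no index, so Class has to remain a column, whereas in pandas it would become the Index.) Note that this spreads the Score values rather than counting rows, and it warns about duplicate entries: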
julia> unstack(df, :Location, :Class, :Score)
WARNING: Duplicate entries in unstack.
4x4 DataFrames.DataFrame
| Row | Class | H | L | M |
|-----|-------|-----|-----|-----|
| 1   | "DC"  | NA  | NA  | "1" |
| 2   | "NY"  | "3" | "2" | NA  |
| 3   | "SF"  | "5" | "1" | NA  |
| 4   | "TX"  | "4" | "2" | NA  |
I'm not sure how you would do the equivalent of fillna here (unstack doesn't have an option for it)...
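For readers on a current DataFrames.jl 1.x release (an assumption; the session above is from a much older version), the count-style pivot from the question can be sketched in two steps: count rows per (Location, Class) pair, then unstack those counts. The `fill` keyword of `unstack` used here exists in recent 1.x releases and stands in for pandas' fillna; the names `counts` and `wide` are only illustrative.

```julia
using DataFrames

df = DataFrame(
    Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
    Class    = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"],
)

# Step 1: count rows for every (Location, Class) pair, in long form.
counts = combine(groupby(df, [:Location, :Class]), nrow => :n)

# Step 2: spread Class into columns; `fill = 0` replaces combinations
# that never occur (assumes a DataFrames.jl version that supports `fill`).
wide = unstack(counts, :Location, :Class, :n; fill = 0)

# Sort rows by Location to match the layout asked for in the question.
sort!(wide, :Location)
```

Counting first avoids the duplicate-entries problem, because each (Location, Class) pair appears exactly once in `counts` before it is unstacked.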
You can do the group-by using by with the nrow (number of rows) function:
julia> by(df, :Location, nrow)
4x2 DataFrames.DataFrame
| Row | Location | x1 |
|-----|----------|----|
| 1   | "DC"     | 1  |
| 2   | "NY"     | 3  |
| 3   | "SF"     | 3  |
| 4   | "TX"     | 3  |
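As far as I know, `by` was deprecated and then removed in DataFrames.jl 1.0, so on a current release the same count would be written with `groupby` and `combine`. A minimal sketch, assuming DataFrames.jl 1.x; the `:Pop` column name is taken from the question and `pop` is just an illustrative variable name:

```julia
using DataFrames

df = DataFrame(
    Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
)

# Group by Location and count the rows in each group,
# naming the result column Pop instead of the default.
pop = combine(groupby(df, :Location), nrow => :Pop)

# Sort by Location so the rows appear in the order shown in the question.
sort!(pop, :Location)
```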