Julia DataFrame group by and pivot table functions

ccs*_*csv 5 dataframe julia

How do you do group-by and pivot tables with Julia DataFrames?

Let's say I have the DataFrame

using DataFrames

df = DataFrame(Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
               Class = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"],
               Address = ["12 Silver", "10 Fak", "12 Silver", "1 North", "10 Fak",
                          "2 Fake", "1 Red", "1 Dog", "2 Fake", "1 White"],
               Score = ["4", "5", "3", "2", "1", "5", "4", "3", "2", "1"])

I would like to do the following:

1) A pivot table of Location by Class (counting rows), which should output

Class     H  L  M
Location         
DC        0  0  1
NY        2  1  0
SF        1  2  0
TX        1  2  0

2) Group by Location and count the number of records in each group, which should output

   Pop  
DC  1   
NY  3  
SF  3  
TX  3 

And*_*den 7

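You can use unstack to get most of the way there (a DataFrame has no index, so the row labels have to remain an ordinary column, whereas in pandas they would become the Index). This seems to be DataFrames.jl's answer to pivot_table: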

julia> unstack(df, :Location, :Class, :Score)
WARNING: Duplicate entries in unstack.
4x4 DataFrames.DataFrame
| Row | Class | H   | L   | M   |
|-----|-------|-----|-----|-----|
| 1   | "DC"  | NA  | NA  | "1" |
| 2   | "NY"  | "3" | "2" | NA  |
| 3   | "SF"  | "5" | "1" | NA  |
| 4   | "TX"  | "4" | "2" | NA  |
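The output above comes from a much older DataFrames.jl. As a sketch of the same call under a current 1.x release (an assumption about your installed version): duplicate Location/Class pairs now raise an error instead of a warning, so `allowduplicates = true` is needed, and with it the last Score seen for each pair is kept. That is why the cells hold scores such as "3" for NY/H rather than counts. The variable name `wide` is only illustrative.

```julia
using DataFrames

df = DataFrame(Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
               Class = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"],
               Address = ["12 Silver", "10 Fak", "12 Silver", "1 North", "10 Fak",
                          "2 Fake", "1 Red", "1 Dog", "2 Fake", "1 White"],
               Score = ["4", "5", "3", "2", "1", "5", "4", "3", "2", "1"])

# Location values become rows, Class values become columns, Score fills the cells.
# allowduplicates = true keeps the last Score for a repeated (Location, Class) pair
# instead of throwing an error; pairs that never occur are filled with `missing`.
wide = unstack(df, :Location, :Class, :Score; allowduplicates = true)

# Fix row and column order so the result lines up with the table above
sort!(wide, :Location)
select!(wide, :Location, :H, :L, :M)
```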

I'm not sure how you would fillna here (unstack doesn't have that option)...
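In DataFrames.jl 1.x releases that support it, `unstack` has a `fill` keyword that plays the role of pandas' `fillna`. A minimal sketch, assuming such a release, that produces the count table asked for in part 1: first count rows per (Location, Class) pair with `groupby`/`combine`, then spread the counts into columns with absent pairs set to 0. The names `counts`, `pivot` and the `:count` column are arbitrary choices for illustration.

```julia
using DataFrames

df = DataFrame(Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
               Class = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"],
               Address = ["12 Silver", "10 Fak", "12 Silver", "1 North", "10 Fak",
                          "2 Fake", "1 Red", "1 Dog", "2 Fake", "1 White"],
               Score = ["4", "5", "3", "2", "1", "5", "4", "3", "2", "1"])

# One row per (Location, Class) pair, with the number of matching rows in :count
counts = combine(groupby(df, [:Location, :Class]), nrow => :count)

# Class values become columns; pairs that never occur get 0 instead of missing
pivot = unstack(counts, :Location, :Class, :count, fill = 0)

# Sort rows and order columns to match the layout in the question
sort!(pivot, :Location)
select!(pivot, :Location, :H, :L, :M)
```

Because the counting happens before `unstack`, each (Location, Class) pair appears exactly once in `counts`, so no duplicate-handling option is needed.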

You can use the groupby function by together with nrow (the number of rows):

julia> by(df, :Location, nrow)
4x2 DataFrames.DataFrame
| Row | Location | x1 |
|-----|----------|----|
| 1   | "DC"     | 1  |
| 2   | "NY"     | 3  |
| 3   | "SF"     | 3  |
| 4   | "TX"     | 3  |
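The `by` function used above is no longer part of current DataFrames.jl releases. Assuming a 1.x release, the same grouped count is written as `groupby` followed by `combine`, and `nrow => :Pop` names the count column to match the desired output in part 2 instead of the default `x1`. The variable name `pop` is only illustrative.

```julia
using DataFrames

df = DataFrame(Location = ["NY", "SF", "NY", "NY", "SF", "SF", "TX", "TX", "TX", "DC"],
               Class = ["H", "L", "H", "L", "L", "H", "H", "L", "L", "M"],
               Address = ["12 Silver", "10 Fak", "12 Silver", "1 North", "10 Fak",
                          "2 Fake", "1 Red", "1 Dog", "2 Fake", "1 White"],
               Score = ["4", "5", "3", "2", "1", "5", "4", "3", "2", "1"])

# Split by Location, then count the rows of each group into a column named :Pop
pop = combine(groupby(df, :Location), nrow => :Pop)

# Sort alphabetically so the rows match the order shown above
sort!(pop, :Location)
```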