您将如何在Mathematica中执行数据透视表功能?

fay*_*sou 13 wolfram-mathematica

Excel中的数据透视(或交叉表)非常有用.有没有人已经考虑过如何在Mathematica中实现类似的功能?

Mr.*_*ard 10

我不熟悉数据透视表的使用,但是在上面链接的页面上举例说明,我建议:

Needs["Calendar`"]
key = # -> #2[[1]] & ~MapIndexed~
       {"Region", "Gender", "Style", "Ship Date", "Units", "Price", "Cost"};
choices = {
   {"North", "South", "East", "West"},
   {"Boy", "Girl"},
   {"Tee", "Golf", "Fancy"},
   IntegerString[#, 10, 2] <> "/2011" & /@ Range@12,
   Range@15,
   Range[8.00, 15.00, 0.01],
   Range[6.00, 14.00, 0.01]
   };
data = RandomChoice[#, 150] & /@ choices // Transpose;
Run Code Online (Sandbox Code Playgroud)

这创建data了如下所示:

 {"East", "Girl", "Golf", "03/2011", 6, 12.29`, 6.18`},
 {"West", "Boy", "Fancy", "08/2011", 6, 13.01`, 12.39`},
 {"North", "Girl", "Golf", "05/2011", 1, 14.87`, 12.89`},
 {"East", "Girl", "Golf", "09/2011", 3, 13.99`, 6.25`},
 {"North", "Girl", "Golf", "09/2011", 13, 12.66`, 8.57`},
 {"East", "Boy", "Fancy", "10/2011", 2, 14.46`, 6.85`},
 {"South", "Boy", "Golf", "11/2011", 13, 12.45`, 11.23`}
 ...
Run Code Online (Sandbox Code Playgroud)

然后:

h1 = Union@data[[All, "Region" /. key]];
h2 = Union@data[[All, "Ship Date" /. key]];

Reap[
   Sow[#[[{"Units", "Ship Date"} /. key]], #[["Region" /. key]]] & ~Scan~ data,
   h1,
   Reap[Sow @@@ #2, h2, Total @ #2 &][[2]] &
][[2]];

TableForm[Join @@ %, TableHeadings -> {h1, h2}]
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述

这是一个粗略的例子,但它给出了如何做到这一点的想法.如果您有更具体的要求,我会尝试解决它们.


这是Sjoerd答案方式的更新.

Manipulate模块主要是复制,但我相信我的pivotTableData是更高效,我所追求的正确定位符号,因为这是如今作为可用的代码,而不是一个粗略的例子.

我从相同的样本数据开始,但我嵌入了字段标题,因为我觉得这更能代表正常使用.

data = ImportString[#, "TSV"][[1]] & /@ Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]];

data = Transpose[{
    data[[All, 1]], 
    data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"}, 
    data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
    data[[All, 4]], 
    data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"}, 
    data[[All, 6]],
    data[[All, 7]], 
    data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"}, 
    data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical", 4 -> "Service", 5 -> "Professional", 6 -> "Other"}, 
    data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"}, 
    data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
}];

PrependTo[data,
  {"Education", "South", "Sex", "Experience", "Union", "Wage", "Age", "Race", "Occupation", "Sector", "Marriatal status"}
  ];
Run Code Online (Sandbox Code Playgroud)

pivotTableData是自给自足的.

pivotTableData[data_, field1_, field2_, dependent_, op_] :=
  Module[{key, sow, h1, h2, ff},
    (key@# = #2[[1]]) & ~MapIndexed~ data[[1]];
    sow = #[[key /@ {dependent, field2}]] ~Sow~ #[[key@field1]] &;
    {h1, h2} = Union@data[[2 ;;, key@#]] & /@ {field1, field2};
    ff = # /. {{} -> Missing@"NotAvailable", _ :> op @@ #} &;
    {
     {h1, h2},
     Join @@ Reap[sow ~Scan~ Rest@data, h1, ff /@ Reap[Sow @@@ #2, h2][[2]] &][[2]]
    }
  ]
Run Code Online (Sandbox Code Playgroud)

pivotTable仅依赖于pivotTableData:

pivotTable[data_?MatrixQ] :=
 DynamicModule[{raw, t, header = data[[1]], opList =
    {Mean              -> "Mean of \[Rule]",
     Total             -> "Sum of \[Rule]",
     Length            -> "Count of \[Rule]",
     StandardDeviation -> "SD of \[Rule]",
     Min               -> "Min of \[Rule]",
     Max               -> "Max of \[Rule]"}},
  Manipulate[
   raw = pivotTableData[data, f1, f2, f3, op];
   t = ConstantArray["", Length /@ raw[[1]] + 2];
   t[[1, 1]] = Control[{op, opList}];
   t[[1, 3]] = Control[{f2, header}];
   t[[2, 1]] = Control[{f1, header}];
   t[[1, 2]] = Control[{f3, header}];
   {{t[[3 ;; -1, 1]], t[[2, 3 ;; -1]]}, t[[3 ;; -1, 3 ;; -1]]} = raw;
   TableView[N@t, Dividers -> All],
   Initialization :> {op = Mean, f1 = data[[1,1]], f2 = data[[1,2]], f3 = data[[1,3]]}
  ]
 ]
Run Code Online (Sandbox Code Playgroud)

使用很简单:

pivotTable[data]
Run Code Online (Sandbox Code Playgroud)


Sjo*_*ies 10

快速而又脏的数据透视表可视化:

我将从一个更有趣的现实生活数据集开始:

data = ImportString[#, "TSV"][[1]] & /@ 
          Flatten[Import["http://lib.stat.cmu.edu/datasets/CPS_85_Wages"][[28 ;; -7]]
       ];
Run Code Online (Sandbox Code Playgroud)

一点后处理:

data =
  {
    data[[All, 1]],
    data[[All, 2]] /. {1 -> "South", 0 -> "Elsewhere"},
    data[[All, 3]] /. {1 -> "Female", 0 -> "Male"},
    data[[All, 4]],
    data[[All, 5]] /. {1 -> "Union Member", 0 -> "No member"},
    data[[All, 6]],
    data[[All, 7]],
    data[[All, 8]] /. {1 -> "Other", 2 -> "Hispanic", 3 -> "White"},
    data[[All, 9]] /. {1 -> "Management", 2 -> "Sales", 3 -> "Clerical", 
                      4 -> "Service", 5 -> "Professional", 6 -> "Other"},
    data[[All, 10]] /. {0 -> "Other", 1 -> "Manufacturing", 2 -> "Construction"},
    data[[All, 11]] /. {1 -> "Married", 0 -> "Unmarried"}
  }\[Transpose];

header = {"Education", "South", "Sex", "Experience", "Union", "Wage", 
          "Age", "Race", "Occupation", "Sector", "Marriatal status"};
MapIndexed[(headerNumber[#1] = #2[[1]]) &, header];
levelNames = Union /@ Transpose[data];
levelLength = Length /@ levelNames;
Run Code Online (Sandbox Code Playgroud)

现在是真实的东西.它还使用Mathematica工具包中的内容中SelectEquivalents定义的功能

pivotTableData[levelName1_, levelName2_, dependent_, op_] :=
 Table[
  SelectEquivalents[data,
    FinalFunction -> (If[Length[#] == 0, Missing["NotAvailable"], op[# // Flatten]] &),
    TagPattern -> 
        _?(#[[headerNumber[levelName1]]] == levelMember1 && 
           #[[headerNumber[levelName2]]] == levelMember2 &),
    TransformElement -> (#[[headerNumber[dependent]]] &)
   ],
   {levelMember1, levelNames[[headerNumber[levelName1]]]},
   {levelMember2, levelNames[[headerNumber[levelName2]]]}
 ]

DynamicModule[
 {opList = 
    {Mean ->"Mean of \[Rule]", Total ->"Sum of \[Rule]", Length ->"Count of \[Rule]",
     StandardDeviation -> "SD of \[Rule]", Min -> "Min of \[Rule]", 
     Max -> "Max of \[Rule]"
    }, t},
 Manipulate[
  t=Table["",{levelLength[[headerNumber[h1]]]+2},{levelLength[[headerNumber[h2]]]+2}];
  t[[3 ;; -1, 1]] = levelNames[[headerNumber[h1]]];
  t[[2, 3 ;; -1]] = levelNames[[headerNumber[h2]]];
  t[[1, 1]] = Control[{op, opList}];
  t[[1, 3]] = Control[{h2, header}];
  t[[2, 1]] = Control[{h1, header}];
  t[[1, 2]] = Control[{h3, header}];
  t[[3 ;; -1, 3 ;; -1]] = pivotTableData[h1, h2, h3, op] // N;
  TableView[t, Dividers -> All], 
  Initialization :> {op = Mean, h1 = "Sector", h2 = "Union", h3 = "Wage"}
  ]
 ]
Run Code Online (Sandbox Code Playgroud)

在此输入图像描述 在此输入图像描述 在此输入图像描述

还有一些工作要做.本DynamicModule应该成为一个完全独立的功能,与头的东西更精简,但是这应该是足够的第一印象.