Tyl*_*rNG 4 python numpy dataframe pandas
Is it possible to convert a df into a matrix like the one shown below? Given the df:
Name Value
x 5
x 2
x 3
x 3
y 3
y 2
z 4
The matrix would be:
Name 1 2 3 4 5
x 4 4 3 1 1
y 2 2 1 0 0
z 1 1 1 1 0
Here is the logic behind it:
Name 1 2 3 4 5 (5 columns since 5 is the max in Value)
--------------------------------------------------------------------
x 4 (since x has 4 values >= 1) 4 (since x has 4 values >= 2) 3 (since x has 3 values >= 3) 1 (since x has 1 value >= 4) 1 (since x has 1 value >= 5)
y 2 (since y has 2 values >= 1) 2 (since y has 2 values >= 2) 1 (since y has 1 value >= 3) 0 (since no y values are >= 4) 0 (since no y values are >= 5)
z 1 (since z has 1 value >= 1) 1 (since z has 1 value >= 2) 1 (since z has 1 value >= 3) 1 (since z has 1 value >= 4) 0 (since no z values are >= 5)
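The counting rule described above can be spelled out in plain Python before reaching for pandas. This is only a sketch of the logic as stated in the question; the names `rows`, `groups`, and `matrix` are made up for illustration, and it assumes the values are positive integers.

```python
from collections import defaultdict

# The (Name, Value) pairs from the question's df.
rows = [("x", 5), ("x", 2), ("x", 3), ("x", 3), ("y", 3), ("y", 2), ("z", 4)]

# One column per threshold 1..max(Value).
max_value = max(value for _, value in rows)

# Collect the values that belong to each name.
groups = defaultdict(list)
for name, value in rows:
    groups[name].append(value)

# For each name and each threshold t, count how many values are >= t.
matrix = {
    name: [sum(v >= t for v in values) for t in range(1, max_value + 1)]
    for name, values in groups.items()
}

for name, counts in matrix.items():
    print(name, counts)
```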
Let me know if this makes sense.
I know I have to use sort, group, and count, but I can't figure out how to set up the matrix.
Thanks!!!
Probably the fastest solution, using numpy broadcasting -
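import numpy as np
import pandas as pd
i = np.arange(1, df.Value.max() + 1)
j = df.Value.values[:, None] >= i
# pandas >= 2.0 removed sum(level=...); use .groupby(level=0).sum() there
df = pd.DataFrame(j, columns=i, index=df.Name).sum(level=0)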
1 2 3 4 5
Name
x 4.0 4.0 3.0 1.0 1.0
y 2.0 2.0 1.0 0.0 0.0
z 1.0 1.0 1.0 1.0 0.0
Caveat: this approach trades memory for performance. The intermediate boolean array has len(df) * df.Value.max() entries, so on large data it can exhaust memory. Use it with care.
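Where the memory cost of the broadcasting approach is a concern, one possible alternative is to count exact values per name first and then take a reverse cumulative sum. This is a sketch and not part of the original answer; it assumes `Value` holds positive integers, and `exact` and `at_least` are illustrative names. The intermediate table has one row per distinct name instead of one row per input row.

```python
import numpy as np
import pandas as pd

# Rebuild the question's df so the example is self-contained.
df = pd.DataFrame({"Name": list("xxxxyyz"), "Value": [5, 2, 3, 3, 3, 2, 4]})

# Count how many rows of each Name carry each exact Value.
exact = pd.crosstab(df["Name"], df["Value"])

# Make sure every threshold 1..max has a column, even if no row has that Value.
thresholds = np.arange(1, df["Value"].max() + 1)
exact = exact.reindex(columns=thresholds, fill_value=0)

# A right-to-left cumulative sum turns "== t" counts into ">= t" counts.
at_least = exact.iloc[:, ::-1].cumsum(axis=1).iloc[:, ::-1]

print(at_least)
```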
Details
Create a range of values from 1 to df.Value.max() -
i = np.arange(1, df.Value.max() + 1)
i
array([1, 2, 3, 4, 5])
Perform a broadcasted comparison between df.Value and i -
j = df.Value.values[:, None] >= i
j
array([[ True, True, True, True, True],
[ True, True, False, False, False],
[ True, True, True, False, False],
[ True, True, True, False, False],
[ True, True, True, False, False],
[ True, True, False, False, False],
[ True, True, True, True, False]], dtype=bool)
Load this into a dataframe, then perform a grouped sum on df.Name to get the final result.
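To see why the comparison yields one row per input value and one column per threshold, it can help to look at the shapes involved. The sketch below uses a plain NumPy array in place of the `Value` column; `values` and `col` are illustrative names.

```python
import numpy as np

values = np.array([5, 2, 3, 3, 3, 2, 4])  # the Value column from the question
i = np.arange(1, values.max() + 1)        # thresholds 1..5, shape (5,)

# Indexing with [:, None] adds a trailing axis: shape (7,) becomes (7, 1).
col = values[:, None]

# Comparing (7, 1) against (5,) broadcasts to (7, 5):
# entry [r, c] is True when values[r] >= i[c].
j = col >= i

print(col.shape, i.shape, j.shape)
# Column sums count, per threshold, how many values reach it (ignoring Name).
print(j.sum(axis=0))
```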
k = pd.DataFrame(j, columns=i, index=df.Name)
k
1 2 3 4 5
Name
x True True True True True
x True True False False False
x True True True False False
x True True True False False
y True True True False False
y True True False False False
z True True True True False
# On pandas >= 2.0, write k.groupby(level=0).sum() instead; sum(level=...) was removed
k.sum(level=0)
1 2 3 4 5
Name
x 4.0 4.0 3.0 1.0 1.0
y 2.0 2.0 1.0 0.0 0.0
z 1.0 1.0 1.0 1.0 0.0
If you need the result as integers, call .astype(int) -
# On pandas >= 2.0: k.groupby(level=0).sum().astype(int)
k.sum(level=0).astype(int)
1 2 3 4 5
Name
x 4 4 3 1 1
y 2 2 1 0 0
z 1 1 1 1 0
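For reference, the steps of the answer can be put together into one self-contained script. This sketch rebuilds the question's df by hand and swaps `sum(level=0)` for `groupby(level=0).sum()`, which is the spelling that works on current pandas releases; `result` is an illustrative name.

```python
import numpy as np
import pandas as pd

# The df from the question.
df = pd.DataFrame({"Name": list("xxxxyyz"), "Value": [5, 2, 3, 3, 3, 2, 4]})

# Thresholds 1..max(Value).
i = np.arange(1, df["Value"].max() + 1)

# Broadcasted comparison: one row per df row, one column per threshold.
j = df["Value"].to_numpy()[:, None] >= i

# Label the rows by Name, then sum the booleans within each Name.
k = pd.DataFrame(j, columns=i, index=df["Name"])
result = k.groupby(level=0).sum().astype(int)

print(result)
```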