重塑大熊猫数据框

Question

重塑大熊猫数据框

Mor*_*itz 10 python reshape dataframe pandas lreshape

假设像这样的数据帧:

df = pd.DataFrame([[1,2,3,4],[5,6,7,8],[9,10,11,12]], columns = ['A', 'B', 'A1', 'B1'])

Run Code Online (Sandbox Code Playgroud)

我想有一个看起来像这样的数据框:

什么行不通:

new_rows = int(df.shape[1]/2) * df.shape[0]
new_cols = 2
df.values.reshape(new_rows, new_cols, order='F')

Run Code Online (Sandbox Code Playgroud)

当然,我可以循环数据并创建一个新的列表列表,但必须有一个更好的方法.有任何想法吗？

Answer 1

jez*_*ael 8

您可以使用lreshape列id numpy.repeat:

a = [col for col in df.columns if 'A' in col]
b = [col for col in df.columns if 'B' in col]
df1 = pd.lreshape(df, {'A' : a, 'B' : b})

df1['id'] = np.repeat(np.arange(len(df.columns) // 2), len (df.index))  + 1
print (df1)
    A   B  id
0   1   2   1
1   5   6   1
2   9  10   1
3   3   4   2
4   7   8   2
5  11  12   2

Run Code Online (Sandbox Code Playgroud)

编辑:

lreshape目前没有文档,但可能会被删除(pd.wide_to_long也是如此).

可能的解决方案是将所有3个函数合并为一个 - 也许melt,但现在它没有实现.也许在一些新版本的熊猫中.然后我的答案会更新.

Answer 2

Ted*_*rou 5

该pd.wide_to_long函数几乎完全是针对这种情况而构建的，在这种情况下，您有许多相同的变量前缀以不同的数字后缀结尾。唯一的区别是您的第一组变量没有后缀，因此您需要首先重命名列。

唯一的问题pd.wide_to_long是i，与不同，它必须具有一个标识变量melt。reset_index用于创建一个唯一标识的列，稍后将其删除。我认为这种情况将来可能会得到纠正。

df1 = df.rename(columns={'A':'A1', 'B':'B1', 'A1':'A2', 'B1':'B2'}).reset_index()
pd.wide_to_long(df1, stubnames=['A', 'B'], i='index', j='id')\
  .reset_index()[['A', 'B', 'id']]

    A   B id
0   1   2  1
1   5   6  1
2   9  10  1
3   3   4  2
4   7   8  2
5  11  12  2

Run Code Online (Sandbox Code Playgroud)

归档时间：	8 年，9 月前
查看次数：	29225 次
最近记录：	8 年，4 月前