Adding pandas columns of different lengths

Question

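I have a problem adding columns in pandas. I have a DataFrame of dimension n x k. During processing I need to add columns of dimension m x 1, where m is somewhere in [1, n], but I do not know m in advance.

When I try to do this: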

df['Name column'] = data    
# type(data) = list

The result is:

AssertionError: Length of values does not match length of index

Can I add columns of different lengths?

Answer 1

The*_*Pea 50

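If you use the accepted answer, you will lose your column names, as the accepted answer's own example shows and as the documentation for ignore_index describes (emphasis added):

The resulting axis will be labeled 0, ..., n - 1. This is useful if you are concatenating objects where the concatenation axis *does not have meaningful indexing information*.

It looks like your column name ('Name column') is meaningful.

You can use pandas.concat, but without ignore_index (the default value of ignore_index is False, so you can simply omit that argument):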

import pandas

# Note these columns have 3 rows of values:
original = pandas.DataFrame({
    'Age':[10, 12, 13], 
    'Gender':['M','F','F']})

# Note this column has 4 rows of values:
additional = pandas.DataFrame({
    'Name': ['Nate A', 'Jessie A', 'Daniel H', 'John D']
})

new = pandas.concat([original, additional], axis=1) 
# Identical:
# new = pandas.concat([original, additional], ignore_index=False, axis=1) 

print(new.head())

#          Age        Gender        Name
#0          10             M      Nate A
#1          12             F    Jessie A
#2          13             F    Daniel H
#3         NaN           NaN      John D

Note that John D has no age or gender.
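To connect this back to the question, where the new values arrive as a plain list, here is a minimal sketch. It assumes the list is wrapped in a named pandas.Series before concatenating; the sample values in `data` are hypothetical and only illustrate a list shorter than the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [10, 12, 13],
    'Gender': ['M', 'F', 'F']})

# Hypothetical list that is shorter than the DataFrame (m < n)
data = ['Nate A', 'Jessie A']

# Naming the Series keeps a meaningful column label after the concat
new = pd.concat([df, pd.Series(data, name='Name column')], axis=1)

print(new)
```

The row that has no matching list entry gets NaN in 'Name column', and the existing column names are kept.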

Answer 2

EdC*_*ica 36

Use concat and pass axis=1 and ignore_index=True:

In [38]:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':np.arange(5)})
df1 = pd.DataFrame({'b':np.arange(4)})
print(df1)
df
   b
0  0
1  1
2  2
3  3
Out[38]:
   a
0  0
1  1
2  2
3  3
4  4
In [39]:

pd.concat([df,df1], ignore_index=True, axis=1)
Out[39]:
   0   1
0  0   0
1  1   1
2  2   2
3  3   3
4  4 NaN

For the exact details and a deeper insight, see The Red Pea's answer. (2 upvotes)
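The practical difference between the two concat answers is what happens to the column labels. The following sketch reuses the frames from this answer and compares both calls; the variable names `renumbered` and `labelled` are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(5)})
df1 = pd.DataFrame({'b': np.arange(4)})

# With ignore_index=True the labels along the concatenation axis
# (here: the columns) are replaced by 0, 1, ...
renumbered = pd.concat([df, df1], ignore_index=True, axis=1)

# Without it, the original column names are kept
labelled = pd.concat([df, df1], axis=1)

print(list(renumbered.columns))  # [0, 1]
print(list(labelled.columns))    # ['a', 'b']
```

In both cases the shorter column is padded with NaN in the last row; only the labels differ.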

Answer 3

小智 9

We can add lists of different sizes to a DataFrame.

Example

a = [0,1,2,3]
b = [0,1,2,3,4,5,6,7,8,9]
c = [0,1]

Find the length of every list

la,lb,lc = len(a),len(b),len(c)
# now find the max
max_len = max(la,lb,lc)

Pad all of the lists to the determined maximum length (not in this example)

# Pad each shorter list with empty strings; extending by an empty
# list is a no-op, so the longest list is left unchanged.
a.extend([''] * (max_len - la))
b.extend([''] * (max_len - lb))
c.extend([''] * (max_len - lc))

Now all lists have the same length, so create the DataFrame

import pandas as pd

pd.DataFrame({'A':a,'B':b,'C':c})

The final output is a DataFrame with 10 rows, in which the shorter columns A and C are padded with empty strings.
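As a variation on the manual padding above, and not part of the original answer, the lists can be wrapped in pandas.Series objects so that the DataFrame constructor aligns them on their indexes and fills the gaps with NaN. A minimal sketch with the same three lists:

```python
import pandas as pd

a = [0, 1, 2, 3]
b = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
c = [0, 1]

# Each Series gets its own 0..len-1 index; the constructor takes the
# union of those indexes and fills missing positions with NaN.
df = pd.DataFrame({'A': pd.Series(a), 'B': pd.Series(b), 'C': pd.Series(c)})

print(df)
```

One trade-off of this choice: padding with NaN keeps the short columns numeric (they become float), whereas padding integers with '' produces object columns that mix numbers and strings.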

Answer 4

Mar*_*far 5

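I had the same problem: two different DataFrames with no common column. I just needed to put them side by side in a CSV file.

Merge: merge does not work in this case, not even when adding a temporary column to both dfs and dropping it afterwards, because that approach forces the two dfs to the same length: it repeats the rows of the shorter DataFrame to match the length of the longer one.
Concat: The Red Pea's idea did not work for me. It just appended the shorter df to the longer one (row-wise), leaving an empty (NaN) column above the shorter df's column.
Solution: you need to do the following: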

df1 = df1.reset_index()
df2 = df2.reset_index()
df = [df1, df2]
df_final = pd.concat(df, axis=1)

df_final.to_csv(filename, index=False)

This way, you will see the dfs next to each other (column-wise), each with its own length.
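A likely explanation for the concat behaviour described in this answer, though the answer does not say so, is that the two frames had different index labels: concat with axis=1 aligns rows by index, so non-overlapping labels end up in separate rows. The sketch below uses hypothetical frames and index values to show the effect. It passes drop=True, which is an addition here so that the old index is discarded rather than kept as an extra 'index' column; the answer's code calls reset_index() without it.

```python
import pandas as pd

# Hypothetical frames whose index labels do not overlap
df1 = pd.DataFrame({'x': [1, 2, 3]}, index=[10, 11, 12])
df2 = pd.DataFrame({'y': ['a', 'b']}, index=[20, 21])

# Rows are aligned by label, so nothing lines up: 5 rows, mostly NaN
misaligned = pd.concat([df1, df2], axis=1)

# After resetting both indexes to 0, 1, 2, ... the frames sit side by side
side_by_side = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)

print(misaligned)
print(side_by_side)
```

After the reset, the result has as many rows as the longer frame, and the shorter frame's column is padded with NaN at the bottom.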
