Add a column with a single categorical value to a pandas DataFrame

Question


Dus*_*yte 6 python categories dataframe pandas

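I have a pandas.DataFrame df and want to add a new column col that holds a single value, "hello". I want this column to have dtype category, with "hello" as its only category. I can do the following.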

df["col"] = "hello"
df["col"] = df["col"].astype("category")


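Do I really need to write df["col"] three times to achieve this?
After the first line, I am worried that the intermediate data frame df might take up a lot of space before the new column is converted to categorical. (The data frame is fairly large, with millions of rows, and the value "hello" is actually a much longer string.)

Is there a more direct, "short and snappy" way to achieve this that avoids the problems above?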

An alternative solution is

df["col"] = pd.Categorical(itertools.repeat("hello", len(df)))


But it requires itertools and a call to len(df), and I am not sure about its memory usage.
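
One way to put numbers on the memory concern is to measure the column before and after the conversion with Series.memory_usage(deep=True). The following is a minimal sketch, not a benchmark: the row count n and the stand-in string long_value are hypothetical, and the exact byte counts depend on the pandas version and its default string dtype.

```python
import pandas as pd

# Hypothetical sizes, chosen only for illustration.
n = 100_000
long_value = "hello" * 20  # stand-in for the "much longer string"

df = pd.DataFrame({"a": range(n)})

# Step 1: plain assignment broadcasts the scalar to a non-categorical column.
df["col"] = long_value
plain_bytes = df["col"].memory_usage(index=False, deep=True)

# Step 2: convert to category; the column now stores small integer codes
# plus a single copy of the category value.
df["col"] = df["col"].astype("category")
category_bytes = df["col"].memory_usage(index=False, deep=True)

print("before conversion:", plain_bytes, "bytes")
print("after conversion: ", category_bytes, "bytes")
```

This only reports the size of the column at each step. It does not capture temporary allocations made during the conversion itself.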

Answer 1

Hen*_*ker 4

Instead of doing this implicitly through __setitem__ followed by a conversion, we can explicitly build a Series of the right size and dtype:

df['col'] = pd.Series('hello', index=df.index, dtype='category')


Sample program:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

df['col'] = pd.Series('hello', index=df.index, dtype='category')

print(df)
print(df.dtypes)
print(df['col'].cat.categories)


   a    col
0  1  hello
1  2  hello
2  3  hello

a         int64
col    category
dtype: object

Index(['hello'], dtype='object')

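A different technique, not part of the answer above, is to build the categorical directly from integer codes with pd.Categorical.from_codes, so that no column of repeated strings is ever created by user code. This is a sketch under the assumption that one int8 code per row is acceptable. It still needs len(df), which the question hoped to avoid, and whether it uses less memory than the Series constructor has not been measured here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Every row gets code 0, which points at the only category, "hello".
codes = np.zeros(len(df), dtype="int8")
df["col"] = pd.Categorical.from_codes(codes, categories=["hello"])

print(df)
print(df.dtypes)
print(df["col"].cat.categories)
```

The resulting column has the same values and the same single category as the one built with pd.Series('hello', index=df.index, dtype='category').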
