相关疑难解决方法(0)

熊猫取代/字典慢

请帮助我理解为什么Python/Pandas中的"替换字典"操作很慢:

# Series has 200 rows and 1 column
# Dictionary has 11269 key-value pairs
series.replace(dictionary, inplace=True)

Run Code Online (Sandbox Code Playgroud)

字典查找应为O(1).替换列中的值应为O(1).这不是矢量化操作吗？即使它没有矢量化,迭代200行只有200次迭代,那么它怎么会变慢呢？

以下是SSCCE演示此问题:

import pandas as pd
import random

# Initialize dummy data
dictionary = {}
orig = []
for x in range(11270):
    dictionary[x] = 'Some string ' + str(x)
for x in range(200):
    orig.append(random.randint(1, 11269))
series = pd.Series(orig)

# The actual operation we care about
print('Starting...')
series.replace(dictionary, inplace=True)
print('Done.')

Run Code Online (Sandbox Code Playgroud)

在我的机器上运行该命令需要1秒以上的时间,这比执行<1000次操作的时间长1000倍.

python performance dictionary pandas

Att*_*nen

2017 02-02

6
推荐指数

1
解决办法

2183
查看次数

用字典替换pandas系列中的值

我想Series用字典替换大熊猫中的值.我正在关注@ DSM 接受的答案:

s = Series(['abc', 'abe', 'abg'])
d = {'b': 'B'}
s.replace(d)

Run Code Online (Sandbox Code Playgroud)

但这没有任何影响:

0    abc
1    abe
2    abg
dtype: object

Run Code Online (Sandbox Code Playgroud)

该文档解释了所需的dict格式DataFrames(即嵌套的dicts与顶级键对应的列名),但我看不到任何特定的Series.

python dictionary replace pandas

jog*_*oga

2017 05-23

5
推荐指数

1
解决办法

6331
查看次数

根据包含字典键替换 Pandas DataFrame 列值

这是一个示例，其中当行是字典键时分配列： /sf/answers/1417569751/

我正在寻找的是当行包含字典键时分配列的情况。

例如（基于上面的例子）：

import pandas as pd
df = pd.DataFrame({'col1': {0: 'aba', 1: 'abc', 2: 'abx'}})

#gives me a DataFrame
     col1 
0    aba #contains 'ba'
1    abc #will NOT be replaced
2    abx #contains 'bx'

dictionary = {'ba': 5, 'bx': 8}

#and I need to get:

     col1 
0    5
1    abc
2    8

Run Code Online (Sandbox Code Playgroud)

python dictionary replace dataframe pandas

bak*_*net

2022 02-07

4
推荐指数

1
解决办法

4541
查看次数

熊猫使用基于第二列的另一列方法替换NaN

我有一个带有两列的熊猫数据框，city和country。双方city并country包含缺失值。考虑以下数据帧：

temp = pd.DataFrame({"country": ["country A", "country A", "country A", "country A", "country B","country B","country B","country B", "country C", "country C", "country C", "country C"],
                     "city": ["city 1", "city 2", np.nan, "city 2", "city 3", "city 3", np.nan, "city 4", "city 5", np.nan, np.nan, "city 6"]})

Run Code Online (Sandbox Code Playgroud)

我现在想在填写NaN在S city与列模式该国在剩余的数据帧的城市，例如对于A国：城市1被提及一次; 提到城市2两次；因此，用etc 填充city索引处的列。2city 2

我已经做好了

cities = [city for city in temp["country"].value_counts().index]
modes = temp.groupby(["country"]).agg(pd.Series.mode)
dict_locations …

Run Code Online (Sandbox Code Playgroud)

python pandas

Ivo*_*Ivo

2019 09-16

3
推荐指数

1
解决办法

65
查看次数

如何使用 pandas map() 函数，而不覆盖不匹配的项目？

在我之前的问题中：如何有效地替换 pandas 中的数据框之间的项目？

我得到了一个可以使用 map() 函数的解决方案，但它会覆盖不匹配的项目。

如果我有 2 df

df = pd.DataFrame({'Ages':[20, 22, 57, 250], 'Label':[1,1,2,7]})
label_df = pd.DataFrame({'Label':[1,2,3], 'Description':['Young','Old','Very Old']})

Run Code Online (Sandbox Code Playgroud)

我想将 df 中的标签值替换为 label_df 中的描述，但如果索引之间不匹配，则保留原始值。

我得到了什么df['Label'] = df['Label'].map(label_df.set_index('Label')['Description'])

{'Ages':[20, 22, 57, 250], 'Label':['Young','Young','Old', nan]}

Run Code Online (Sandbox Code Playgroud)

想要的结果：

{'Ages':[20, 22, 57, 250], 'Label':['Young','Young','Old', 7]}

Run Code Online (Sandbox Code Playgroud)

python pandas

Mos*_*she

lucky-day

3
推荐指数

1
解决办法

4552
查看次数

在python中使用字典翻译数据帧

有以下pandas Dataframe样本:

df = pd.DataFrame([[1,2],[1,2],[3,5]])

df
    0   1
0   1   2
1   1   2
2   3   5

Run Code Online (Sandbox Code Playgroud)

以下字典:

d = {1:'foo',2:'bar',3:'tar',4:'tartar',5:'foofoo'}

Run Code Online (Sandbox Code Playgroud)

我想通过使用字典来"翻译"数据框d.输出如下:

result = pd.DataFrame([['foo','bar'],['foo','bar'],['tar','fofo']])

result
    0   1
0   foo bar
1   foo bar
2   tar fofo

Run Code Online (Sandbox Code Playgroud)

我想避免使用for循环.我试图找到的解决方案是地图或类似物...

python dictionary pandas

PeC*_*aDe

lucky-day

2
推荐指数

1
解决办法

297
查看次数

Python Pandas根据字典键将字典值分配给数据框列

不是重复的：

引用的答案映射了字典中的值，以替换包含等于该字典中的键的值的列。我的问题是关于将dict值添加到基于键值的数据框中的ANOTHER列中。

我对Python不太熟悉 dictionary所以如果这是一个简单的问题，请事先道歉。

我期待map的value一个字典一列的数据帧，其中key在字典等于在数据帧第二列

例如：

如果我的字典是：

dict = {abc:1/2/2003, def:1/5/2017, ghi:4/10/2013}

Run Code Online (Sandbox Code Playgroud)

我的数据框是：

      Member    Group      Date

 0     xyz       A         np.Nan
 1     uvw       B         np.Nan
 2     abc       A         np.Nan
 3     def       B         np.Nan
 4     ghi       B         np.Nan

Run Code Online (Sandbox Code Playgroud)

我想得到以下消息：

      Member    Group      Date

 0     xyz       A         np.Nan
 1     uvw       B         np.Nan
 2     abc       A         1/2/2003
 3     def       B         1/5/2017
 4     ghi       B         4/10/2013

Run Code Online (Sandbox Code Playgroud)

注意：dict在df中，“成员”下没有所有值。我不希望将这些值转换为np.Nan我的地图。所以我认为我必须做一个fillna(df['Member'])保留它们？

提前致谢。

python dictionary pandas

Win*_*981

2018 08-17

2
推荐指数

2
解决办法

2777
查看次数

如何在数据框中用字符串值替换 int 值

我目前有一个包含很多分类变量的 csv 文件。数据最初来自spss，并对数据进行进一步的聚类分析，我需要变量的名称而不是数字。因此，我将用字符串替换 int 值，例如在下面的示例中，1 代表男性，而 2 代表女性

df[(df['gender']==1)]['gender'] = 'male'

Run Code Online (Sandbox Code Playgroud)

但是我知道它无法工作，因为该列最初包含 int 值，因此不可能用字符串值替换，所以首先我尝试将列转换为字符串，例如使用以下代码，然后将 1 替换为male

df['gender'] = df['gender'].astype(str)

Run Code Online (Sandbox Code Playgroud)

或者

df['gender'].apply(str)

Run Code Online (Sandbox Code Playgroud)

但是当我之后运行以下代码时

df[(df['gender']=='1')]['gender'] = 'male'

Run Code Online (Sandbox Code Playgroud)

我收到以下错误

TypeError: invalid type comparison

Run Code Online (Sandbox Code Playgroud)

所以我不知道如何处理这个问题:(

python pandas

Joy*_*Joy

2018 07-19

1
推荐指数

1
解决办法

8369
查看次数

Python Pandas - 使用 dict 替换 NaN 值

使用以下 DF：

     A  B
0    a  1
1    b  2
2  NaN  1
3  NaN  2

Run Code Online (Sandbox Code Playgroud)

我想A根据的数字表示替换 NaN 值B，以获得：

Run Code Online (Sandbox Code Playgroud)

我建立了一个 B/A 值字典： {1 : 'a', 2: 'b'}

如何将更改应用于 NaN 值？

python pandas

Shl*_*rtz

lucky-day

1
推荐指数

1
解决办法

58
查看次数

将整数映射到 Pandas 列中的特定字符串

pandas 中的数据框有两列“ text ”、“ Condition ”。在“条件”列中，它包含每行“文本”的数值。值为 1、-1、0。我想将这些整数值转换为文本标签，例如1 表示正数，-1 表示负数，0 表示中性。我怎样才能做到这一点？

python pandas

Sau*_*aud

2019 02-08

0
推荐指数

1
解决办法

5227
查看次数

标签统计

pandas ×10

python ×10

dictionary ×5

replace ×2

dataframe ×1

performance ×1

标签 统计

标签统计