import pandas as pd
import numpy as np
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
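Suppose I have a dataframe that looks like this. I want to check the Name column for the value "Tom": the first time I find it, I replace it with "FirstTom", and the second time it appears, I replace it with "SecondTom". How do you do this? I have used the replace method before, but only to replace every Tom with a single value. I don't want to append a 1 to the end of the value; I want to change the string to something else entirely.
Edit:
If df looked more like the one below, how would we check for Tom in both the first and second columns, then replace the first instance with FirstTom and the second instance with SecondTom?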
data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'], 'OtherName':['Tom', 'John', 'Bob', 'Steve']}
ank*_*_91 12
Just adding to the existing solutions: you can use inflect to build the dictionary dynamically
import inflect
p = inflect.engine()
df['Name'] += df.groupby('Name').cumcount().add(1).map(p.ordinal).radd('_')
print(df)
        Name  Age
0    Tom_1st   20
1    Tom_2nd   21
2   Jack_1st   19
3  Terry_1st   18
We can do this with cumcount
df.Name=df.Name+df.groupby('Name').cumcount().astype(str)
df
     Name  Age
0    Tom0   20
1    Tom1   21
2   Jack0   19
3  Terry0   18
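To see what the counter contributes on its own, here is a small sketch using the question's data. cumcount numbers the rows within each group starting at 0, which is why the names above end in 0 and 1. The sizes variable is not part of the answer; it is added here only to show which names actually repeat.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Tom', 'Jack', 'Terry'],
                   'Age': [20, 21, 19, 18]})

# Position of each row within its Name group, starting at 0
counter = df.groupby('Name').cumcount()

# Number of rows in each row's Name group (illustration only)
sizes = df.groupby('Name')['Name'].transform('size')

print(counter.tolist())  # [0, 1, 0, 0]
print(sizes.tolist())    # [2, 2, 1, 1]
```

Rows whose group size is 1 are the ones the later answers leave unchanged.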
Update
suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
g=df.groupby('Name')
df.Name=df.Name.radd(g.cumcount().add(1).map(suf).mask(g.Name.transform('count')==1,''))
df
     Name  Age
0  1stTom   20
1  2ndTom   21
2    Jack   19
3   Terry   18
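The suf lambda above is enough for small counts, but it returns "111st" for 111, because once n reaches 20 it only looks at the last digit. A slightly more defensive sketch follows; ordinal_suffix is a hypothetical helper name, not something from the answer.

```python
def ordinal_suffix(n):
    """Return n with an English ordinal suffix, e.g. 1 -> '1st'."""
    # Numbers ending in 11, 12 or 13 always take "th" (11th, 112th, ...)
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    # Otherwise the last digit decides; anything but 1, 2, 3 takes "th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")

print([ordinal_suffix(n) for n in (1, 2, 3, 4, 11, 21, 111)])
```

It can be passed to map in place of suf if group counts might get that large.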
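Update 2, for columns
# reshape to a MultiIndex Series first (assumes df holds only the name columns)
s = df.stack()
suf = lambda n: "%d%s"%(n,{1:"st",2:"nd",3:"rd"}.get(n if n<20 else n%10,"th"))
g=s.groupby([s.index.get_level_values(0),s])
s=s.radd(g.cumcount().add(1).map(suf).mask(g.transform('count')==1,''))
s=s.unstack()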
     Name OtherName
0  1stTom    2ndTom
1   Jerry      John
2    Jack       Bob
3   Terry     Steve
Edit: To count duplicates within each row, use:
df = pd.DataFrame(data = {'Name':['Tom', 'Jerry', 'Jack', 'Terry'],
'OtherName':['Tom', 'John', 'Bob','Steve'],
'Age':[20, 21, 19, 18]})
print (df)
    Name OtherName  Age
0    Tom       Tom   20
1  Jerry      John   21
2   Jack       Bob   19
3  Terry     Steve   18
import inflect
p = inflect.engine()
#map by function for dynamic counter
f = lambda i: p.number_to_words(p.ordinal(i))
#columns filled by names
cols = ['Name','OtherName']
#reshaped to MultiIndex Series
s = df[cols].stack()
#counter per groups
count = s.groupby([s.index.get_level_values(0),s]).cumcount().add(1)
#mask for filter duplicates
mask = s.reset_index().duplicated(['level_0',0], keep=False).values
#filter only duplicates and map, reshape back and add to original data
df[cols] = count[mask].map(f).unstack().add(df[cols], fill_value='')
print (df)
       Name  OtherName  Age
0  firstTom  secondTom   20
1     Jerry       John   21
2      Jack        Bob   19
3     Terry      Steve   18
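If inflect is not available, the same per-row labelling can be sketched with plain pandas and a fixed lookup. This is an alternative to the stack/unstack approach above, not the answerer's method. label_row and the nth dictionary are names assumed here, and the lookup only covers up to four repeats per row.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jerry', 'Jack', 'Terry'],
                   'OtherName': ['Tom', 'John', 'Bob', 'Steve'],
                   'Age': [20, 21, 19, 18]})
cols = ['Name', 'OtherName']
nth = {0: 'First', 1: 'Second', 2: 'Third', 3: 'Fourth'}

def label_row(row):
    # How often each value occurs in this row
    totals = Counter(row)
    seen = Counter()
    out = []
    for value in row:
        # Prefix only values that occur more than once in the row
        out.append(nth[seen[value]] + value if totals[value] > 1 else value)
        seen[value] += 1
    return pd.Series(out, index=row.index)

df[cols] = df[cols].apply(label_row, axis=1)
print(df)
```

Row-wise apply is slower than the vectorized version on large frames, so this is mainly useful for reading what the reshaping does.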
Use GroupBy.cumcount with Series.map, but only for duplicated values, selected with Series.duplicated:
data = {'Name':['Tom', 'Tom', 'Jack', 'Terry'], 'Age':[20, 21, 19, 18]}
df = pd.DataFrame(data)
nth = {
0: "First",
1: "Second",
2: "Third",
3: "Fourth"
}
mask = df.Name.duplicated(keep=False)
df.loc[mask, 'Name'] = df[mask].groupby('Name').cumcount().map(nth) + df.loc[mask, 'Name']
print (df)
        Name  Age
0   FirstTom   20
1  SecondTom   21
2       Jack   19
3      Terry   18
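One caveat with a fixed dictionary: Series.map returns NaN for counts that have no key, which would leave a fifth Tom without a usable name. Below is a minimal sketch of a fallback, assuming a plain numeric label such as "5th" is acceptable. label is a hypothetical helper, and the "th" fallback is only right for counts from 5 to 20.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Tom'] * 5 + ['Jack']})
nth = {0: 'First', 1: 'Second', 2: 'Third', 3: 'Fourth'}

def label(i):
    # Use the word when the dict has one, else a numeric label (assumption)
    return nth.get(i, f'{i + 1}th')

# Only names that occur more than once get a prefix
mask = df['Name'].duplicated(keep=False)
counter = df[mask].groupby('Name').cumcount()
df.loc[mask, 'Name'] = counter.map(label) + df.loc[mask, 'Name']
print(df['Name'].tolist())
```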
A dynamic dictionary would look like this:
import inflect
p = inflect.engine()
mask = df.Name.duplicated(keep=False)
f = lambda i: p.number_to_words(p.ordinal(i))
df.loc[mask, 'Name'] = df[mask].groupby('Name').cumcount().add(1).map(f) + df.loc[mask, 'Name']
print (df)
        Name  Age
0   firstTom   20
1  secondTom   21
2       Jack   19
3      Terry   18
transform
nth = ['First', 'Second', 'Third', 'Fourth']
def prefix(d):
    n = len(d)
    if n > 1:
        return d.radd([nth[i] for i in range(n)])
    else:
        return d
df.assign(Name=df.groupby('Name').Name.transform(prefix))
          Name  Age
0     FirstTom   20
1    SecondTom   21
2         Jack   19
3        Terry   18
4   FirstSteve   17
5  SecondSteve   16
6   ThirdSteve   15