Pandas: if a row in column A contains "x", write "y" to the same row in column B

Win*_*ags 19 python pandas

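With pandas, I'm looking for a way to write a conditional value to each row in column B, based on a substring of the corresponding row in column A.

So if a cell in A contains "BULL", write "Long" to B. Or, if a cell in A contains "BEAR", write "Short" to B.

Desired output: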

A                  B
"BULL APPLE X5"    "Long"
"BEAR APPLE X5"    "Short"
"BULL APPLE X5"    "Long"

B is initially empty: df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']],columns=['A','B'])

Pad*_*ham 17

Your code raises an error because you are creating the DataFrame incorrectly. Just create a single column A, then add B based on A:

import pandas as pd
df = pd.DataFrame(["BULL","BEAR","BULL"], columns=['A'])
df["B"] = ["Long" if ele  == "BULL" else "Short" for ele in df["A"]]

print(df)

    A      B
0  BULL   Long
1  BEAR  Short
2  BULL   Long

Or apply the logic to the data before creating the DataFrame:

import pandas as pd
data = ["BULL","BEAR","BULL"]
data2 = ["Long" if ele  == "BULL" else "Short" for ele in data]
df = pd.DataFrame(list(zip(data, data2)), columns=['A','B'])

print(df)
      A      B
 0  BULL   Long
 1  BEAR  Short
 2  BULL   Long

For your edit:

df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']], columns=['A','B'])

df["B"] = df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")

print(df)

            A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long

Or just add the column afterwards:

df = pd.DataFrame(['BULL APPLE X5','BEAR APPLE X5','BLL APPLE X5'], columns=['A'])

df["B"] = df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")

print(df)

Or using str.contains:

df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']], columns=['A','B'])


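# Use a single .loc call per keyword; chained indexing such as df["B"][mask] = ...
# may assign to a temporary copy and triggers SettingWithCopyWarning.
df.loc[df['A'].str.contains("BULL"), "B"] = "Long"
df.loc[df['A'].str.contains("BEAR"), "B"] = "Short"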

print(df)
               A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long
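As an alternative that is not part of the original answer, the same boolean masks from str.contains can be fed to numpy.select, which picks one label per row in a single pass. This is a minimal sketch; the third row ('BLL APPLE X5', borrowed from the earlier example) is included only to show the fallback value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BLL APPLE X5"]})

# One boolean mask per keyword; regex=False treats the pattern as a plain substring.
conditions = [
    df["A"].str.contains("BULL", regex=False),
    df["A"].str.contains("BEAR", regex=False),
]
choices = ["Long", "Short"]

# Rows that match neither condition receive the default value.
df["B"] = np.select(conditions, choices, default="")

print(df)
```

The first matching condition wins, so the order of the conditions list matters if a row could contain both keywords.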


Ana*_*mar 5

Also, to populate df['B'] you can try the following approach:

def applyFunc(s):
    if s == 'BULL':
        return 'Long'
    elif s == 'BEAR':
        return 'Short'
    return ''

df['B'] = df['A'].apply(applyFunc)
df
>>
       A      B
0  BULL   Long
1  BEAR  Short
2  BULL   Long

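What apply does is this: for each value in df['A'], it calls applyFunc with that value as the argument, and the returned value ends up in the same row of df['B']. What really happens behind the scenes is slightly different, though. The values are not written into df['B'] one by one; instead a new Series is built, and at the end that new Series is assigned to df['B'].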
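The two steps described above can be made visible by keeping the result of apply in its own variable before assigning it. This is a small sketch; label_position is a hypothetical helper name, and it uses substring checks (`in`) rather than `==` so that it also handles values like 'BULL APPLE X5' from the question.

```python
import pandas as pd


def label_position(text):  # hypothetical helper, not from the original answer
    """Return 'Long' or 'Short' depending on which keyword appears in text."""
    if "BULL" in text:
        return "Long"
    if "BEAR" in text:
        return "Short"
    return ""


df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BULL APPLE X5"]})

# apply calls label_position once per value and collects the results
# into a brand-new Series; df itself is not modified at this point.
labels = df["A"].apply(label_position)
assert "B" not in df.columns

# Only this assignment attaches the new Series to the DataFrame as column B.
df["B"] = labels
print(df)
```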


unu*_*tbu 5

You could use str.extract to search for the regex pattern BULL|BEAR, and then use Series.map to replace those strings with Long or Short:

In [50]: df = pd.DataFrame([['BULL APPLE X5',''],['BEAR APPLE X5',''],['BULL APPLE X5','']],columns=['A','B'])

In [51]: df['B'] = df['A'].str.extract(r'(BULL|BEAR)', expand=False).map({'BULL':'Long', 'BEAR':'Short'})

In [55]: df
Out[55]: 
               A      B
0  BULL APPLE X5   Long
1  BEAR APPLE X5  Short
2  BULL APPLE X5   Long
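One detail worth spelling out: in recent pandas versions str.extract returns a DataFrame by default, so passing expand=False is needed to get a Series that supports map with a dict. Rows that match neither keyword come back as NaN. The sketch below assumes you want an empty string in that case, which is what the lambda-based answer produces; the 'BLL APPLE X5' row is there only to exercise that path.

```python
import pandas as pd

df = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BLL APPLE X5"]})

# expand=False returns a Series for a single capture group
# instead of a one-column DataFrame.
keyword = df["A"].str.extract(r"(BULL|BEAR)", expand=False)

# Unmatched rows are NaN after extract and stay NaN after map;
# fillna("") converts them to empty strings.
df["B"] = keyword.map({"BULL": "Long", "BEAR": "Short"}).fillna("")

print(df)
```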

However, forming the intermediate Series with str.extract is quite slow compared to df['A'].map(lambda x:...). Using IPython's %timeit to run the benchmarks:

In [5]: df = pd.concat([df]*10000)

In [6]: %timeit df['A'].str.extract(r'(BULL|BEAR)', expand=False).map({'BULL':'Long', 'BEAR':'Short'})
10 loops, best of 3: 39.7 ms per loop

In [7]: %timeit df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")
100 loops, best of 3: 4.98 ms per loop

Most of the time is spent in str.extract:

In [8]: %timeit df['A'].str.extract(r'(BULL|BEAR)', expand=False)
10 loops, best of 3: 37.1 ms per loop

while the call to Series.map is relatively quick:

In [9]: x = df['A'].str.extract(r'(BULL|BEAR)', expand=False)

In [10]: %timeit x.map({'BULL':'Long', 'BEAR':'Short'})
1000 loops, best of 3: 1.82 ms per loop
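The timings above come from an older pandas release, so the absolute numbers will likely differ on your machine. If you want to repeat the comparison outside IPython, a rough sketch with the standard-library timeit module could look like this; with_extract and with_lambda are hypothetical wrapper names, and expand=False is assumed so that str.extract returns a Series.

```python
import timeit

import pandas as pd

base = pd.DataFrame({"A": ["BULL APPLE X5", "BEAR APPLE X5", "BULL APPLE X5"]})
df = pd.concat([base] * 10000, ignore_index=True)  # 30,000 rows, as in the benchmark above


def with_extract():
    # Regex extraction followed by a dict lookup.
    return df["A"].str.extract(r"(BULL|BEAR)", expand=False).map(
        {"BULL": "Long", "BEAR": "Short"}
    )


def with_lambda():
    # Plain Python substring checks, one call per row.
    return df["A"].map(lambda x: "Long" if "BULL" in x else "Short" if "BEAR" in x else "")


# Both approaches should agree on this data before their speed is compared.
assert with_extract().tolist() == with_lambda().tolist()

for func in (with_extract, with_lambda):
    # Best of 3 repeats, 5 calls per repeat, reported per call.
    seconds = min(timeit.repeat(func, number=5, repeat=3)) / 5
    print(f"{func.__name__}: {seconds * 1000:.2f} ms per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reported in the transcript above.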