Copy some rows from an existing pandas DataFrame into a new DataFrame

Question

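I want to copy some rows from an existing pandas DataFrame into a new DataFrame. Only the rows whose 'CITY' value starts with 'BH' should be copied, and the copied rows must keep the same df.index as the original. E.g. -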

              STATE            CITY
315           KA               BLR
423           WB               CCU
554           KA               BHU
557           TN               BHY

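import pandas as pd

# state_df is the new DataFrame, df is the existing one
matching_rows = []
for index, row in df.iterrows():
    city = row['CITY']

    if city.startswith('BH'):
        matching_rows.append(row)  # row.name holds the original index label

# Building from a list of Series reuses each Series' name as the row index
state_df = pd.DataFrame(matching_rows, columns=['STATE', 'CITY'])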


Being new to pandas and Python, I need help turning the pseudocode into working code, ideally in the most efficient way.
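To try the answers below, the sample data from the question can be rebuilt as a DataFrame. This is a minimal setup, assuming the index labels shown (315, 423, 554, 557) are plain integers and that STATE and CITY hold strings.

```python
import pandas as pd

# Sample data from the question; the explicit index mimics the original labels
df = pd.DataFrame(
    {'STATE': ['KA', 'WB', 'KA', 'TN'],
     'CITY': ['BLR', 'CCU', 'BHU', 'BHY']},
    index=[315, 423, 554, 557],
)
print(df)
```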

Answer 1

jez*_*ael 7

Solution with startswith and boolean indexing:

print (df['CITY'].str.startswith('BH'))
315    False
423    False
554     True
557     True

state_df = df[df['CITY'].str.startswith('BH')]
print (state_df)
    STATE CITY
554    KA  BHU
557    TN  BHY
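A self-contained version of the boolean-indexing step, for reference. Two additions here are not part of the original answer and are labelled as such: `na=False` (a real parameter of `Series.str.startswith`) makes missing CITY values count as non-matches so the mask stays purely boolean, and `.copy()` makes `state_df` an independent DataFrame rather than something that may still be tied to `df`.

```python
import pandas as pd

df = pd.DataFrame(
    {'STATE': ['KA', 'WB', 'KA', 'TN'],
     'CITY': ['BLR', 'CCU', 'BHU', 'BHY']},
    index=[315, 423, 554, 557],
)

# Added safeguard: na=False turns missing CITY values into False instead of NaN
mask = df['CITY'].str.startswith('BH', na=False)

# Boolean indexing keeps the original index labels (554 and 557 here);
# .copy() makes later edits to state_df leave df untouched
state_df = df[mask].copy()
print(state_df)
```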


If you need to copy only some of the columns, add loc:

state_df = df.loc[df['CITY'].str.startswith('BH'), ['STATE']]
print (state_df)
    STATE
554    KA
557    TN
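One detail of the loc call above is worth spelling out: the column selector is a list (`['STATE']`), which is why the result is still a DataFrame with the original index. The sketch below, reusing the question's sample data, contrasts it with passing a single label, which returns a Series instead.

```python
import pandas as pd

df = pd.DataFrame(
    {'STATE': ['KA', 'WB', 'KA', 'TN'],
     'CITY': ['BLR', 'CCU', 'BHU', 'BHY']},
    index=[315, 423, 554, 557],
)
mask = df['CITY'].str.startswith('BH')

# A list of column labels returns a DataFrame (index labels preserved)
states_df = df.loc[mask, ['STATE']]

# A single label (no list) returns a Series instead
states_series = df.loc[mask, 'STATE']

print(type(states_df).__name__, type(states_series).__name__)  # DataFrame Series
```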


Timings:

#len (df) = 400k
df = pd.concat([df]*100000).reset_index(drop=True)


In [111]: %timeit (df.CITY.str.startswith('BH'))
10 loops, best of 3: 151 ms per loop

In [112]: %timeit (df.CITY.str.contains('^BH'))
1 loop, best of 3: 254 ms per loop
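The timings above use IPython's `%timeit` magic. Outside IPython, a rough equivalent can be sketched with the standard `timeit` module; this version assumes the same 4-row sample tiled 100000 times (about 400k rows) and also checks that both expressions select the same rows. The printed numbers depend on the machine and pandas version, so no specific result is implied.

```python
import timeit

import pandas as pd

# Question's sample, tiled to 400k rows as in the timings above
base = pd.DataFrame({'STATE': ['KA', 'WB', 'KA', 'TN'],
                     'CITY': ['BLR', 'CCU', 'BHU', 'BHY']})
df = pd.concat([base] * 100000).reset_index(drop=True)

mask_startswith = df.CITY.str.startswith('BH')
mask_regex = df.CITY.str.contains('^BH')

# Both approaches select exactly the same rows
assert mask_startswith.equals(mask_regex)

# Average seconds per call over a few runs
runs = 3
t_sw = timeit.timeit(lambda: df.CITY.str.startswith('BH'), number=runs) / runs
t_re = timeit.timeit(lambda: df.CITY.str.contains('^BH'), number=runs) / runs
print(f"startswith: {t_sw:.3f} s, contains('^BH'): {t_re:.3f} s")
```

startswith does a plain prefix comparison, while contains('^BH') goes through the regular-expression engine, which is consistent with the gap in the measurements above.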

