Pandas: append dataframe to another df

Pet*_*rov 10 python pandas

I have a problem with appending dataframes. I am trying to execute this code

df_all = pd.read_csv('data.csv', error_bad_lines=False, chunksize=1000000)
urls = pd.read_excel('url_june.xlsx')
substr = urls.url.values.tolist()
df_res = pd.DataFrame()
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        df_res.append(res)

When I try to save df_res, I get an empty dataframe. df_all looks like

ID,"url","used_at","active_seconds"
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:25,1
b20f9412f914ad83b6611d69dbe3b2b4,"mobiguru.ru/phones/apple/comp/32gb/apple_iphone_5s.html",2015-10-01 00:00:31,30
f85ce4b2f8787d48edc8612b2ccaca83,"4pda.ru/forum/index.php?showtopic=634566&view=getnewpost",2015-10-01 00:01:49,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"shop.mts.ru/smartfony/mts/smartfon-smart-sprint-4g-sim-lock-white.html?utm_source=admitad&utm_medium=cpa&utm_content=300&utm_campaign=gde_cpa&uid=3",2015-10-01 00:03:19,34
078d388438ebf1d4142808f58fb66c87,"market.yandex.ru/product/12675734/spec?hid=91491&track=char",2015-10-01 00:03:48,2
d3b0ef7d85dbb4dbb75e8a5950bad225,"avito.ru/yoshkar-ola/telefony/mts",2015-10-01 00:04:21,4
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:25,1
d3b0ef7d85dbb4dbb75e8a5950bad225,"shoppingcart.aliexpress.com/order/confirm_order",2015-10-01 00:04:26,9

urls looks like

url
shoppingcart.aliexpress.com/order/confirm_order
ozon.ru/?context=order_done&number=
lk.wildberries.ru/basket/orderconfirmed
lamoda.ru/checkout/onepage/success/quick
mvideo.ru/confirmation?_requestid=
eldorado.ru/personal/order.php?step=confirm

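When I print res inside the loop, it is not empty. But when I print df_res inside the loop after the append, it returns an empty dataframe. I can't find my mistake. How can I fix it?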

cs9*_*s95 22

Why am I getting "AttributeError: 'DataFrame' object has no attribute 'append'"?

pandas >= 2.0: append has been removed, use pd.concat instead [1]

Starting from pandas 2.0, append has been removed from the API. It was previously deprecated in version 1.4. See the docs on Deprecations as well as this github issue that originally proposed its deprecation.

The rationale for its removal was to discourage iteratively growing DataFrames in a loop (which is what people typically use append for). This is because append makes a new copy at each stage, resulting in quadratic complexity in memory.

[1] This assumes you're appending one DataFrame to another. If you're appending a row to a DataFrame, the solution is slightly different (see below).


The idiomatic way to append DataFrames is to collect all your smaller DataFrames into a list, and then make one single call to pd.concat. Here's a(n oversimplified) example

df_list = []
for df in some_function_that_yields_dfs():
    df_list.append(df)

final_df = pd.concat(df_list)

Note that if you are trying to append one row at a time rather than one DataFrame at a time, the solution is even simpler.

data = []
for a, b, c in some_function_that_yields_data():
    data.append([a, b, c])

df = pd.DataFrame(data, columns=['a', 'b', 'c'])

More information in Creating an empty Pandas DataFrame, and then filling it?
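
For the one-off case where an existing line such as `df_res = df_res.append(res)` just needs to keep working on pandas >= 2.0, a minimal sketch of the replacement follows. The frame contents are made up for illustration. The point is only that `pd.concat` returns a new object that has to be assigned back, just as `append` did.

```python
import pandas as pd

# Made-up frames standing in for df_res and res from the question
df_res = pd.DataFrame({"url": ["a.example/1"], "active_seconds": [1]})
res = pd.DataFrame({"url": ["b.example/2"], "active_seconds": [30]})

# pd.concat returns a new DataFrame; neither input is modified in place,
# so the result must be assigned back to a name
df_res = pd.concat([df_res, res], ignore_index=True)

print(len(df_res))
```

Inside a loop, collecting the pieces in a list and concatenating once is still preferable to repeating this call on every iteration.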


Ami*_*ory 18

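If you look at the documentation for pd.DataFrame.append:

Append rows of other to the end of this frame, returning a new object. Columns not in this frame are added as new columns.

(Emphasis mine, on "returning a new object".)

Try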

df_res = df_res.append(res)

Incidentally, note that pandas is not efficient at building a DataFrame through successive concatenation. You could try this instead:

all_res = []
for df in df_all:
    for i in substr:
        res = df[df['url'].str.contains(i)]
        all_res.append(res)

df_res = pd.concat(all_res)

This first builds a list of all the parts, then creates a single DataFrame from them at the end.
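
The list-then-concat pattern can be tried without the original files. The sketch below uses a small made-up frame in place of one chunk of data.csv and a short list in place of url_june.xlsx. One addition that is not part of the answer above: `str.contains` treats its pattern as a regular expression by default, and some of the URLs in the question contain `?` and `.`, so `regex=False` is passed here to match them literally. Whether literal matching is what the asker wants is an assumption.

```python
import pandas as pd

# Made-up stand-in for one chunk of data.csv
df = pd.DataFrame({
    "ID": ["d3b0", "078d", "d3b0"],
    "url": [
        "shoppingcart.aliexpress.com/order/confirm_order",
        "market.yandex.ru/product/12675734/spec?hid=91491&track=char",
        "ozon.ru/?context=order_done&number=123",
    ],
})

# Made-up stand-in for urls.url.values.tolist()
substr = [
    "shoppingcart.aliexpress.com/order/confirm_order",
    "ozon.ru/?context=order_done&number=",
]

all_res = []
for chunk in [df]:  # stands in for the chunked read_csv iterator
    for s in substr:
        # regex=False matches s as plain text rather than as a regex
        all_res.append(chunk[chunk["url"].str.contains(s, regex=False)])

# One concat at the end instead of growing a DataFrame inside the loop
df_res = pd.concat(all_res, ignore_index=True)
print(df_res)
```

Note that a row matching more than one substring would appear once per match in `df_res`.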

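  • +1 for pointing out the inefficiency of concatenating multiple dataframes in a loop this way. I keep finding this in code and it drives me crazy. (3 upvotes)
  • Thanks for the explanation. Sometimes `df_res.append(res)` works, but sometimes only `df_res = df_res.append(res)` works. I don't know why that happens, though. (2 upvotes)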

小智 5

If we want to append based on the index:

df_res = pd.DataFrame(data = None, columns= df.columns)

all_res = []

d1 = df.loc[index - 10:index - 1]     # the 10 rows before label `index` (.ix was removed in pandas 1.0; .loc slices by label, end inclusive)

all_res.append(d1)

df_res = pd.concat(all_res)
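
A self-contained sketch of the same idea, assuming "the 10 rows before" is meant by position rather than by label. The frame and the value of `index` are made up for illustration, and `.iloc` is used because `.ix` no longer exists in pandas >= 1.0.

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})  # default integer index 0..99
index = 50  # hypothetical position of interest

all_res = []

# .iloc slices by position and excludes the end, so this takes
# the 10 rows at positions index-10 .. index-1
all_res.append(df.iloc[index - 10:index])

df_res = pd.concat(all_res)
print(df_res.index.tolist())
```

With a default integer index, positions and labels coincide. For any other index, `.iloc` (positions) and `.loc` (labels) select different rows, so the choice depends on what `index` refers to.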