Tag: python-polars

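Polars groupby aggregating by sum returns a list of all the values instead of the actual sum

I am trying to aggregate a Polars DataFrame, but I am not getting what I expect.

Here is a minimal reproduction of the issue: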

import polars as pl

# Create a DataFrame
df = pl.DataFrame({"category": ["A", "A", "B", "B", "B"],
                   "value": [1., 2., 3., 4., 5.]})

# Group by 'category' and sum 'value'
result = df.groupby("category").agg({"value": pl.sum})

# Print the result
print(result)

I get:

┌──────────┬─────────────────┐
│ category ┆ value           │
│ ---      ┆ ---             │
│ str      ┆ list[f64]       │
╞══════════╪═════════════════╡
│ A        ┆ [1.0, 2.0]      │
│ B        ┆ [3.0, 4.0, 5.0] │
└──────────┴─────────────────┘

I want to get:

┌──────────┬─────────────────┐
│ category ┆ value           │
│ ---      ┆ ---             │
│ str      ┆ list[f64]       │
╞══════════╪═════════════════╡
│ A        ┆ …

python group-by dataframe rust-polars python-polars

Author

2023 04-14

1
Votes

1
Answers

4119
Views

How to sum durations in a Polars DataFrame?

I have the following dataframe:

import datetime

import polars as pl


df = pl.DataFrame(
    {
        "idx": [259, 123],
        "timestamp": [
            [
                datetime.datetime(2023, 4, 20, 1, 45),
                datetime.datetime(2023, 4, 20, 1, 51, 7),
                datetime.datetime(2023, 4, 20, 2, 29, 50),
            ],
            [
                datetime.datetime(2023, 4, 19, 6, 0, 1),
                datetime.datetime(2023, 4, 19, 6, 0, 17),
                datetime.datetime(2023, 4, 19, 6, 0, 26),
                datetime.datetime(2023, 4, 19, 19, 53, 29),
                datetime.datetime(2023, 4, 19, 19, 54, 4),
                datetime.datetime(2023, 4, 19, 19, 57, 52),
            ],
        ],
    }
)

print(df)
# Output
shape: (2, 2)
┌─────┬───────────────────────────────────────────────────────────────────┐
│ idx …

python duration dataframe python-polars

Lau*_*ent

2023 04-23

1
Votes

1
Answers

177
Views

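Is it expected that Polars is slower than NumPy for simple arithmetic operations?

The task I am benchmarking is simply clipping values element-wise. I did this with both numpy and polars. However, numpy turns out to be much faster (~5x) than polars (as shown below).

So, my questions are:

Is this behavior expected?
If so, does this mean that polars (even though it is highly optimized for join/groupby) may not be well suited to relatively simple numerical vector/array operations, such as the clipping in my example?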

import timeit
import numpy as np
import polars as pl

N = 10_000_000
x = np.random.normal(size=N)
y = np.random.normal(size=N)
z = y + 0.5
df = pl.DataFrame({"x": x, "y": y, "z": z})

>>> timeit.timeit(lambda: np.minimum(np.maximum(x, y), z), number=10)
0.60923

>>> timeit.timeit(lambda: df.select(pl.min(pl.max(pl.col("x"), pl.col("y")), pl.col("z"))), number=10)
3.39337

numpy python-polars

Kep*_*ain

lucky-day

1
Votes

1
Answers

225
Views

transforming multiple columns using loop

I am trying to transform groups of columns based on certain conditions. I am having trouble looping over groups of columns in my select statement. Simplified example (actual df has many more columns in avars and bvars):

df = pl.DataFrame(
    {'a': [0,1,2,4],
     'b': [1,0,3,5],
     'w': [4,7,5,8],
     'x': [10, 20, 25, 30],
     'y': [15,3,16,88],
     'z': [22,17,4,32]
     }
)
avars=['w','x']
bvars=['y','z']

This works:

df.select(
    (pl.when(pl.col('a')>0)
    .then(pl.col(var)/pl.col('a')) 
    .otherwise(pl.col(var)) for var in avars),
)

But I get an error message when I try

df.select( …

python-polars

cra*_*igm

lucky-day

1
Votes

1
Answers

99
Views

How to join two Polars DataFrames based on a condition?

Given the following 3 Polars dataframes

journeys = pl.DataFrame({'id':[1,2,3,4,5,6],'order_id':[11,12,13,14,14,15],'order_type':['restaurant','restaurant','restaurant','restaurant','grocery','grocery']})

restaurant_orders = pl.DataFrame({'id':[11,12,13,14],'item_count':[4,7,3,5]})

grocery_orders = pl.DataFrame({'id':[14,15],'item_count':[23,21]})

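journeys:

restaurant_orders:

grocery_orders:

I want to bring item_count into the journeys dataframe.

The simplest approach is to filter the journeys dataframe on the order_type column, perform a join on each filtered dataframe, and finally concatenate the results.

Is there an idiomatic Polars way to perform a conditional (polymorphic?) join on the journeys dataframe based on the value of order_type?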

python dataframe python-polars

Tah*_*eri

2023 10-26

1
Votes

1
Answers

325
Views

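Polars Python equivalent of glimpse and summary in R

I can't find a function that summarizes the contents of a Polars dataframe, the way glimpse and summary do in R.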

python python-polars

aaj*_*tak

lucky-day

0
Votes

1
Answers

1076
Views

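Get elements from each group in Polars

How can I get elements by index within each group of a Polars DataFrame? For example, if I want the first and third elements of each group, I might try something like this: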

import polars as pl

df = pl.DataFrame(dict(x=[1,0,1,0,1,0], y=[1,2,3,4,5,6]))

df.groupby('x').take([0,2])
# AttributeError: 'GroupBy' object has no attribute 'take'

But this obviously doesn't work.

python-polars

drh*_*gen

lucky-day

0
Votes

1
Answers

1467
Views

How to compare date values across rows in Python Polars?

I have a dataframe with dates of birth:

pl.DataFrame({'idx':[1,2,3,4,5,6],
              'date_of_birth':['03/06/1990','3/06/1990','11/12/2000','01/02/2021','1/02/2021','3/06/1990']})

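Here I want to compare the date of birth in each row (format: month/day/year) and flag yes when the months are equal, e.g. 03 and 3, or 01 and 1.

Dates such as 03/06/1990 and 3/06/1990 are really the same date, but here they are treated as different. How can I handle such cases?

The expected output is: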

python pandas python-polars

mya*_*cia

lucky-day

0
Votes

1
Answers

2143
Views

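Does python-polars have an equivalent of np.where?

Does Polars have an equivalent of np.where? I am trying to replicate the following code in Polars. A column called Is_Acceptable is 1 if the value is above a certain threshold and 0 if it is below.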

import pandas as pd
import numpy as np 

df = pd.DataFrame({"fruit":["orange","apple","mango","kiwi"], "value":[1,0.8,0.7,1.2]})
df["Is_Acceptable?"] = np.where(df["value"].lt(0.9), 1, 0)
print(df)

python-polars

Ros*_*oss

lucky-day

0
Votes

1
Answers

1935
Views

Get a Python datetime from a Polars datetime

How do I get a Python timestamp from a Polars datetime object (which is a Polars expression)?

import polars as pl
pl_timestamp = pl.datetime(year=2020, month=6, day=5).dt.with_time_zone("UTC")
py_timestamp = ...

python datetime python-polars

use*_*698

2023 02-01

0
Votes

1
Answers

410
Views

Tag statistics

python-polars ×10

python ×6

dataframe ×3

datetime ×1

duration ×1

group-by ×1

numpy ×1

pandas ×1

rust-polars ×1
