小编moz*_*way的帖子

How to compare two columns value in pandas

I Have a dataframe which has some unique IDs in two of the columns.for e.g

S.no. Column1 Column2
    1  00001x  00002x
    2  00003j  00005k
    3  00002x  00001x
    4  00004d  00008e

Run Code Online (Sandbox Code Playgroud)

Value can be anything in the string format I want to compare the two column in such a way that either of s.no 1 or 3 data remains. as these id contains the same information. only its order is different.

Basically if for one row value in a column 1 …

python comparison string-comparison dataframe pandas

Ank*_*kit

2021 09-15

1
推荐指数

1
解决办法

221
查看次数

删除多列上的重复项，无论顺序如何 (a/b == b/a)

有没有办法在不考虑顺序的情况下删除 pandas 中重复对的行？

删除前的数据框 --> 想要删除重复对（黄色）

删除重复后

示例数据：

df = pd.DataFrame({'a': [1,2,1,1,2,2],
                   'b': [2,1,3,4,3,4]
                  })

Run Code Online (Sandbox Code Playgroud)

python duplicates pandas

puh*_*huk

2022 01-07

1
推荐指数

1
解决办法

1340
查看次数

数据框的复杂掩码

我有一个数据框，其中一列包含时间序列。数据如下图所示

我想创建一个每次数据等于或低于 -0.20 时为 TRUE 的掩码。在达到 -0.20 且为负值之前，它也应该为 TRUE 。当负数达到 -0.20后也应该如此。此版本的图表

是我手动尝试显示（以红色）掩码为 TRUE 的值。我开始创建掩码，但只能在数据小于 -0.20 时使其等于 TRUE mask = (df['data'] < -0.2)。我不能做得更好，有人知道如何实现我的目标吗？

python mask conditional-statements pandas

ric*_*ric

2022 01-25

1
推荐指数

1
解决办法

327
查看次数

访问 Match.group() 的成本有多高？

尝试优化一些重用匹配组的代码，我想知道访问是否Match.group()昂贵。我试图挖掘re.py的源代码，但代码有点神秘。

一些测试似乎表明将的输出存储在变量中可能更好Match.group()，但我想了解Match.group()调用时到底发生了什么，以及是否有另一种内部方法可以直接访问组的内容。

一些示例代码来说明潜在用途：

import re\n\nm = re.search(\'X+\', f\'__{"X"*10000}__\')\n\n# do something\n# m.group()\n\n# do something else\n# m.group()\n

Run Code Online (Sandbox Code Playgroud)\n

时间安排

直接访问：

%%timeit\nlen(m.group())\n220 ns \xc2\xb1 1.31 ns per loop (mean \xc2\xb1 std. dev. of 7 runs, 1000000 loops each)\n

Run Code Online (Sandbox Code Playgroud)\n

中间变量：

X = m.group()\n

Run Code Online (Sandbox Code Playgroud)\n

%%timeit\nlen(X)\n# 51 ns \xc2\xb1 0.172 ns per loop (mean \xc2\xb1 std. dev. of 7 runs, 10000000 loops each)\n

Run Code Online (Sandbox Code Playgroud)\n

参考文献：
\n当前的 re.py 代码 …

python regex optimization

moz*_*way

2022 05-31

0
推荐指数

1
解决办法

378
查看次数

pandas：将列名中的全部或前 n 个单词大写

我试图找到如何将列标题的第一个和第二个单词大写。

例子：

check match==>Check Match

date found==>Date Found

谢谢

python string capitalize pandas

作者

2022 04-15

0
推荐指数

1
解决办法

2482
查看次数

结构化 numpy 数组未就地修改

我有一个结构化的 numpy 数组，我试图就地修改它，但新值没有反映出来。

import numpy as np

dt = {'names':['A', 'B', 'C'],
        'formats': [np.int64, np.int64, np.dtype('U8')]}
arr = np.empty(0, dtype=dt)

arr = np.append(arr, np.array([(1, 100, 'ab')], dtype = dt))
arr = np.append(arr, np.array([(2, 800, 'ax')], dtype = dt))
arr = np.append(arr, np.array([(3, 700, 'asb')], dtype = dt))
arr = np.append(arr, np.array([(4, 600, 'gdf')], dtype = dt))
arr = np.append(arr, np.array([(5, 500, 'hfg')], dtype = dt))

print(arr)

arr[arr['A'] == 1]['B'] = 555

print(arr)

Run Code Online (Sandbox Code Playgroud)

是否可以更改结构化数组中的值？解决方法是什么？

请不要建议 Pandas 或其他基于库的解决方案，因为我只允许在工作中使用 numpy。

python arrays numpy

arj*_*iva

2022 10-12

0
推荐指数

1
解决办法

56
查看次数

熊猫根据组内的排名创建一个分类列

我有一个像这样的数据框：

DURATION   CLUSTER   COEFF
3          0         0.34
3          1        -0.005
3          2         1
3          3         0.33 
4          0        -0.02
4          1        -0.28
4          2         0.22
4          3         0.48
5          0         0.65
5          1        -0.26
5          2         0.1
5          3         0.15

Run Code Online (Sandbox Code Playgroud)

我想根据每个“DURATION”的“COEFF”系数创建一个 RESULT 分类列。具有最大“COEFF”值的将是“第一”，依此类推。

期望的输出如下：

DURATION   CLUSTER   COEFF  RESULT
3          0         0.34   Second
3          1        -0.005  Fourth
3          2         1      First
3          3         0.33   Third
4          0        -0.02   Third
4          1        -0.28   Fourth
4          2         0.22   Second
4          3         0.48   First …

Run Code Online (Sandbox Code Playgroud)

python dataframe pandas

Sal*_*lih

2023 02-02

0
推荐指数

1
解决办法

67
查看次数

在 python 极坐标上，对数据帧上的一组列应用百分位数排名

df = pl.DataFrame(
    {   
        "era": ["01", "01", "02", "02", "03", "03"],
        "pred1": [1, 2, 3, 4, 5,6],
        "pred2": [2,4,5,6,7,8],
        "pred3": [3,5,6,8,9,1],
        "something_else": [5,4,3,67,5,4],
    }
)
pred_cols = ["pred1", "pred2", "pred3"]
ERA_COL = "era"

Run Code Online (Sandbox Code Playgroud)

我正在尝试做一个相当于北极熊排名百分位的事情。Polars 的rank功能缺乏pctPandas 所具有的旗帜。

我在这里查看了另一个问题：how to replacement pandas df.rank(axis=1) with Polars

但问题的结果（并将其应用到我的代码中）有些不对劲。计算 Pandas 中的排名百分比，给我一个浮点数，Polars 提供的示例给我一个数组，而不是浮点数，因此示例中计算的内容有所不同。

举个例子，Pandas 代码是这样的：

df[list(pred_cols)] = df.groupby(ERA_COL, group_keys=False).apply(
    lambda d: d[list(pred_cols)].rank(pct=True)
)

Run Code Online (Sandbox Code Playgroud)

python rank pandas python-polars

jbs*_*ssm

2022 11-30

0
推荐指数

1
解决办法

1432
查看次数

如何在Python中增加字符串值

如何在Python中增加字符串的值？

我有以下字符串：str = 'tt0000002' 并且我想增加该字符串以'tt0000003', 'tt0000004','tt0000005' (...) to 'tt0010000'使用循环。

我怎样才能做到这一点？

python string for-loop while-loop

Mar*_*icz

2021 10-06

-2
推荐指数

1
解决办法

1258
查看次数

标签统计

python ×9

pandas ×6

dataframe ×2

string ×2

arrays ×1

capitalize ×1

comparison ×1

conditional-statements ×1

duplicates ×1

for-loop ×1

mask ×1

numpy ×1

optimization ×1

python-polars ×1

rank ×1

regex ×1

string-comparison ×1

while-loop ×1

时间安排

标签 统计

小编moz_way的帖子

标签统计