Posts by Erf*_*fan

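Installing private repositories during the build stage on GitHub Actions

I am using GitHub Actions to deploy to Azure. In this project I use our own private repositories, which are hosted on GitHub. These repositories are installed during the build, and their links are stored in requirements.txt, for example: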

git+ssh://git@github.com/org-name/package-name.git

Locally, installing the requirements works without problems, because I have SSH access to these private repositories. But how can I access them during a build in GitHub Actions?

I get the error:

Collecting git+ssh://****@github.com/org-name/package-name.git (from -r requirements.txt (line 1))
  Cloning ssh://****@github.com/org-name/package-name.git to /tmp/pip-req-build-9nud9608
ERROR: Command errored out with exit status 128: git clone -q 'ssh://****@github.com/org-name/package-name.git' /tmp/pip-req-build-9nud9608 Check the logs for full command output.
Error: Process completed with exit code 1.

This makes sense, since it is a private repository.
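
One workaround that is sometimes used, sketched here under assumptions, is to rewrite the git+ssh links into git+https links that carry an access token before pip runs in the workflow. The function name and the token value are hypothetical. The token is assumed to have read access to the private repositories and would normally come from a workflow secret rather than being written into the file.

```python
# Hypothetical helper: the prefix matches the link format shown in the question.
SSH_PREFIX = "git+ssh://git@github.com/"


def rewrite_requirement(line: str, token: str) -> str:
    """Turn a git+ssh GitHub requirement into a token-authenticated git+https one."""
    line = line.strip()
    if line.startswith(SSH_PREFIX):
        # Keep the "org-name/package-name.git" part and swap only the scheme and credentials.
        return f"git+https://{token}@github.com/" + line[len(SSH_PREFIX):]
    # Anything that is not a git+ssh GitHub link is left untouched.
    return line


rewritten = rewrite_requirement(
    "git+ssh://git@github.com/org-name/package-name.git", "TOKEN"
)
print(rewritten)
```

Requirements that do not start with the SSH prefix pass through unchanged, so the same function can be applied to every line of requirements.txt.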

github continuous-deployment github-actions

Erf*_*fan

2021 08-24

15
votes

2
answers

6355
views

Title-casing the words in a column except for certain words

How can I title-case all words except the ones in the list keep?

keep = ['for', 'any', 'a', 'vs']
df.col
0    1. The start for one
1    2. Today's world any
2    3. Today's world vs. yesterday.

Expected output:

     number   title
0     1       The Start for One
1     2       Today's World any
2     3       Today's World vs. Yesterday.

I tried:

df['col'] = df.col.str.title().mask(~clean['col'].isin(keep))
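
A minimal sketch of one possible approach, not taken from the original post: split off the leading number with a regular expression, then capitalize each word unless it appears in keep once trailing punctuation is stripped. The helper name title_except is hypothetical. Only the first letter is uppercased, because str.title() would turn "Today's" into "Today'S".

```python
import pandas as pd

keep = {"for", "any", "a", "vs"}


def title_except(text: str, keep: set) -> str:
    """Capitalize every word except those listed in keep."""
    words = []
    for word in text.split():
        # Compare without trailing punctuation so "vs." still matches "vs".
        if word.strip(".,").lower() in keep:
            words.append(word)
        else:
            words.append(word[:1].upper() + word[1:])
    return " ".join(words)


df = pd.DataFrame(
    {"col": ["1. The start for one", "2. Today's world any", "3. Today's world vs. yesterday."]}
)

# Named groups become the column names of the extracted frame.
result = df["col"].str.extract(r"^(?P<number>\d+)\.\s*(?P<title>.*)$")
result["title"] = result["title"].apply(lambda text: title_except(text, keep))
print(result)
```

The number column holds strings after the extraction; it can be converted with astype(int) if integers are needed.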


python python-3.x pandas

asd*_*asd

2021 02-24

15
votes

2
answers

562
views

Python Pandas - reshaping a DataFrame

Given the following DataFrame:

pd.DataFrame({"A":[1,2,3,11,12,13],"B":[4,5,6,14,15,16],"C":[6,7,8,16,17,18]})

   A   B   C
0  1   4   6
1  2   5   7
2  3   6   8
3  11  14  16
4  12  15  17
5  13  16  18

I want to reshape it so that it looks like this:

   A   B   C   A_1   B_1   C_1   A_2   B_2   C_2
0  1   4   6     2     5     7     3     6     8
1  11  14  16    12    15    17    13    16    18

So every 3 rows are grouped into 1 row.

How can I achieve this with pandas?
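
One way this could be done, as a sketch that assumes the number of rows is an exact multiple of the group size: reshape the underlying NumPy array row by row and rebuild the column labels. The variable names n and wide are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 11, 12, 13], "B": [4, 5, 6, 14, 15, 16], "C": [6, 7, 8, 16, 17, 18]}
)

n = 3  # number of consecutive rows folded into one output row

# Row-major reshape: each block of n rows is laid out side by side.
values = df.to_numpy().reshape(len(df) // n, n * df.shape[1])

# The first block keeps the plain names, later blocks get a _1, _2, ... suffix.
columns = [c if i == 0 else f"{c}_{i}" for i in range(n) for c in df.columns]

wide = pd.DataFrame(values, columns=columns)
print(wide)
```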

python pandas

Shl*_*rtz

2020 06-07

10
votes

2
answers

224
views

Enumerating columns that share the same prefix

Suppose we have the following simplified data:

df = pd.DataFrame({'A':list('abcd'),
                   'B':list('efgh'),
                   'Data_mean':[1,2,3,4],
                   'Data_std':[5,6,7,8],
                   'Data_corr':[9,10,11,12],
                   'Text_one':['foo', 'bar', 'foobar', 'barfoo'],
                   'Text_two':['bar', 'foo', 'barfoo', 'foobar'],
                   'Text_three':['bar', 'bar', 'barbar', 'foofoo']})

   A  B  Data_mean  Data_std  Data_corr Text_one Text_two Text_three
0  a  e          1         5          9      foo      bar        bar
1  b  f          2         6         10      bar      foo        bar
2  c  g          3         7         11   foobar   barfoo     barbar
3  d  h          4         8         12   barfoo   foobar     foofoo

I want to enumerate the columns that share the same prefix. In this case the prefixes are Data and Text. The expected output is therefore:

   A  B  Data_mean1  Data_std2  Data_corr3 Text_one1 Text_two2 Text_three3
0  a …
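
A minimal sketch of how the renaming could be done, assuming the prefix is everything before the first underscore and that columns without an underscore stay as they are. The function name enumerate_prefixed is hypothetical.

```python
from collections import defaultdict


def enumerate_prefixed(columns):
    """Append a running counter per prefix to every column name containing '_'."""
    seen = defaultdict(int)
    renamed = []
    for name in columns:
        if "_" not in name:
            # Columns such as A and B have no prefix and are kept unchanged.
            renamed.append(name)
            continue
        prefix = name.split("_", 1)[0]
        seen[prefix] += 1
        renamed.append(f"{name}{seen[prefix]}")
    return renamed


columns = ["A", "B", "Data_mean", "Data_std", "Data_corr", "Text_one", "Text_two", "Text_three"]
new_columns = enumerate_prefixed(columns)
print(new_columns)
```

The result can be assigned back with df.columns = enumerate_prefixed(df.columns).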


python dataframe pandas

Erf*_*fan

2019 07-02

8
votes

1
answer

95
views

Filling NaN with the same number of characters as in another column

I have the following dummy DataFrame:

df = pd.DataFrame({'Col1':['a,b,c,d', 'e,f,g,h', 'i,j,k,l,m'],
                   'Col2':['aa~bb~cc~dd', np.nan, 'ii~jj~kk~ll~mm']})

        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             NaN
2  i,j,k,l,m  ii~jj~kk~ll~mm

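The real dataset has shape 500000, 90.

I need to unnest these values into rows, and I am using the new explode method for that, which works fine.

The problem is the NaN values: they lead to unequal lengths after explode, so I need to fill them with the same number of delimiters that a filled value would have. In this case that is ~~~, since row 1 has three commas.

Expected output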

        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             ~~~
2  i,j,k,l,m  ii~jj~kk~ll~mm

Attempt 1:

df['Col2'].fillna(df['Col1'].str.count(',')*'~')

Attempt 2:

np.where(df['Col2'].isna(), df['Col1'].str.count(',')*'~', df['Col2'])

This works, but I feel there is a simpler way:

characters = df['Col1'].str.replace(r'\w', '', regex=True).str.replace(',', '~')
df['Col2'] = df['Col2'].fillna(characters)

print(df)

        Col1            Col2
0    a,b,c,d     aa~bb~cc~dd
1    e,f,g,h             ~~~
2  i,j,k,l,m  ii~jj~kk~ll~mm

d1 = df.assign(Col1=df['Col1'].str.split(',')).explode('Col1')[['Col1']]
d2 = …
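
An alternative sketch, not from the original post: count the commas in Col1 and build the filler string per row with map, which sidesteps the failing integer-times-string multiplication on a whole Series from the two attempts above. The variable name filler is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Col1": ["a,b,c,d", "e,f,g,h", "i,j,k,l,m"],
        "Col2": ["aa~bb~cc~dd", np.nan, "ii~jj~kk~ll~mm"],
    }
)

# One "~" per comma in Col1, computed row by row.
filler = df["Col1"].str.count(",").map(lambda n: "~" * n)

# fillna aligns on the index, so only the missing rows take the filler.
df["Col2"] = df["Col2"].fillna(filler)
print(df)
```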


python explode pandas unnest

Erf*_*fan

2019 09-03

8
votes

1
answer

80
views

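After upgrading OpenSSL, ODBC cannot find the correct OpenSSL version

Update: if you run into the same problem, the root cause is being discussed here.

After upgrading to Python 3.10 with Homebrew, my OpenSSL was also upgraded to version 3.

Now I can no longer connect to SQL Server, because ODBC needs OpenSSL 1.1 or 1.0. So when I run:

isql -v -k "<connection string>"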

I get the following error:

[08001][Microsoft][ODBC Driver 17 for SQL Server]SSL Provider: [OpenSSL library could not be loaded, make sure OpenSSL 1.0 or 1.1 is installed]
[08001][Microsoft][ODBC Driver 17 for SQL Server]Client unable to establish connection

But when I look in /usr/local/etc/, I see that openssl@1.1 is installed.

How do I fix this? I am really not familiar with it. ODBC needs to find the correct OpenSSL version, which is 1.1.

I tried:

ln -s /usr/local/Cellar/openssl@1.1/1.1.1g /usr/local/opt/openssl

Also, when I run the openssl command, it finds the correct version:

➜  ~ openssl
OpenSSL> version
OpenSSL …


sql-server odbc openssl

Erf*_*fan

2021 10-08

8
votes

1
answer

5636
views

How do I count the rows in a DataFrame whose values are divisible by 3 or 5?

I have a DataFrame with two columns:

 ones   zeros
0   6   13
1   8   7
2   11  7
3   8   5
4   11  5
5   10  6
6   11  6
7   7   4
8   9   4
9   4   6
10  7   5
11  6   7
12  9   10
13  14  3
14  7   7
15  7   7
16  9   7
17  7   10
18  9   5
19  12  7
20  4   8
21  6   4
22  11  5
23  9   7
24  3   10
25  7 …
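
A small sketch of a boolean-mask approach. The excerpt does not say which column should be tested, so this assumes a separate count per column is wanted, and it uses only the first five rows of the table above as sample data.

```python
import pandas as pd

# First five rows of the table in the question, used as sample data.
df = pd.DataFrame({"ones": [6, 8, 11, 8, 11], "zeros": [13, 7, 7, 5, 5]})

# True where a value is divisible by 3 or by 5.
mask = (df % 3 == 0) | (df % 5 == 0)

# Summing booleans counts the True entries in each column.
counts = mask.sum()
print(counts)
```

For a single column, the same idea reads ((df["ones"] % 3 == 0) | (df["ones"] % 5 == 0)).sum().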


python data-analysis dataframe pandas

Kir*_*nty

2019 09-24

6
votes

1
answer

70
views

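Mypy raises "Cannot assign multiple types" when a type is defined in if/else

In our tests we test against multiple versions of numpy. Older versions lack certain classes (np.random.Generator) that we want to use in our type definitions, so I chose to define the type based on a check of the numpy version: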

# random generator
if np_version_under1p17:
    RandomState = Union[int, ArrayLike, np.random.RandomState]
else:
    RandomState = Union[int, ArrayLike, np.random.Generator, np.random.RandomState]

But this results in:

Cannot assign multiple types to name "RandomState" without an explicit "Type[...]" annotation

Removing the if .. else resolves this error, but then our tests against old numpy versions fail.

What is the best way to define RandomState so that it works in our tests with both new and old numpy versions?
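
One pattern that may help, sketched under assumptions rather than taken from an accepted answer: give static checkers a single definition behind typing.TYPE_CHECKING and do the version-dependent work in branches that mypy is expected to treat as unreachable. ArrayLike is left out so the sketch stays self-contained, and hasattr stands in for the np_version_under1p17 flag from the question.

```python
from typing import TYPE_CHECKING, Union

import numpy as np

if TYPE_CHECKING:
    # Static checkers evaluate only this branch, so they see one definition.
    RandomState = Union[int, np.random.Generator, np.random.RandomState]
elif hasattr(np.random, "Generator"):
    # Runtime path for numpy versions that ship np.random.Generator.
    RandomState = Union[int, np.random.Generator, np.random.RandomState]
else:
    # Runtime path for older numpy versions without Generator.
    RandomState = Union[int, np.random.RandomState]

print(RandomState)
```

The trade-off is that type checking always assumes the newer numpy API, which is usually acceptable when mypy runs against a recent numpy.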

python type-hinting mypy

Erf*_*fan

2020 12-31

6
votes

0
answers

1444
views

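Nearest-neighbor matching in pandas

Given two DataFrames (t1, t2), both with a column "x", how can I append a column to t1 that holds the ID from t2 whose "x" value is closest to the "x" value in t1?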

t1:
id  x
1   1.49
2   2.35

t2:
id  x
3   2.36
4   1.5

output:
id  id2
1   4
2   3

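I can do this by creating a new DataFrame, iterating over t1.groupby(), looking up values in t2 and then merging, but for a t1 DataFrame with 17 million rows this takes a very long time.

Is there a better way to do this? I have searched the pandas documentation on groupby, apply, transform, agg and so on. Although I would expect this to be a common problem, an elegant solution has not turned up.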
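
A sketch of how this could be done with pd.merge_asof and direction="nearest", which avoids the Python-level loop. Both frames have to be sorted by the key column first, and the column name id2 is chosen to match the expected output above.

```python
import pandas as pd

t1 = pd.DataFrame({"id": [1, 2], "x": [1.49, 2.35]})
t2 = pd.DataFrame({"id": [3, 4], "x": [2.36, 1.5]})

# merge_asof requires both inputs to be sorted on the join key.
matched = pd.merge_asof(
    t1.sort_values("x"),
    t2.sort_values("x").rename(columns={"id": "id2"}),
    on="x",
    direction="nearest",  # pick the closest x in t2, whether smaller or larger
)

result = matched[["id", "id2"]]
print(result)
```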

python pandas

rob*_*pes

2019 04-19

5
votes

1
answer

5530
views

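How to generate a PDF in memory with ReportLab

In my case I want to generate the PDF in memory in a Flask application, so that I can send it directly to the user as a download instead of saving it to disk first.

Our current code: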

import os
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import A4


# normally this is in a class, but I simplified it for the example
c = canvas.Canvas(os.path.join("mypath", "report.pdf"), pagesize=A4)
c.drawString(100, 100, "Hello World")
c.save()


python reportlab

Erf*_*fan

lucky-day

5
votes

1
answer

2475
views