Posts by Osc*_*sca

Hide pandas warning: SQLAlchemy

I want to hide this warning: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy. I have already tried:

import warnings
warnings.simplefilter(action='ignore', category=UserWarning)

import pandas


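But the warning is still shown.

My Python script reads data from a database. I use pandas.read_sql for the SQL query and psycopg2 for the database connection.

I would also like to know which line triggers the warning.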
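Both parts of the question can be sketched with the standard warnings module. A message-specific filter silences only this warning, and promoting UserWarning to an error produces a traceback that names the calling line. The function read_with_dbapi_connection below is a hypothetical stand-in for the pandas.read_sql call on a psycopg2 connection, so the example runs without a database. It does not explain why the original simplefilter call had no effect.

```python
import traceback
import warnings


def read_with_dbapi_connection():
    # Hypothetical stand-in for pandas.read_sql(query, psycopg2_connection):
    # it only emits a warning that starts with the same text.
    warnings.warn(
        "pandas only support SQLAlchemy connectable(engine/connection) ...",
        UserWarning,
        stacklevel=2,
    )
    return "rows"


# 1) Silence only this warning. The message argument is a regular
#    expression matched against the start of the warning text.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make the demo deterministic
    warnings.filterwarnings(
        "ignore",
        message="pandas only support SQLAlchemy",
        category=UserWarning,
    )
    read_with_dbapi_connection()
silenced = len(caught) == 0
print("warning silenced:", silenced)

# 2) Find the triggering line: turn the warning into an exception
#    and read the traceback.
with warnings.catch_warnings():
    warnings.simplefilter("error", UserWarning)
    try:
        read_with_dbapi_connection()
    except UserWarning:
        trace = traceback.format_exc()
print(trace)
```

In a real script the two filter calls would wrap the pandas.read_sql call instead of the stand-in. As the warning text itself suggests, passing a SQLAlchemy engine instead of the raw psycopg2 connection is the other way to avoid the message.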

python postgresql pandas

Osc*_*sca

2022-09-06

5
votes

1
answer

10k
views

One-sample test for proportion

I want to run a one-sample test for a proportion with Python. I found this documentation example of a one-sample proportion z-test, but I don't understand how to use it. For example, what are count and nobs? Of the two examples, Example 1 gives a single number each for count and nobs, whereas Example 2 gives two numbers.

As the result, I'd like the p-value for testing whether the event rate is higher than 60%.

Example1

>>> count = 5
>>> nobs = 83
>>> value = …

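For orientation, count is the number of successes and nobs is the number of trials. A single number for each describes one sample, while two numbers describe a two-sample comparison. The sketch below works the one-sided test by hand with the standard library, using the null value 0.6 from the question. The figures 58 and 83 are made-up illustration data, and the function name is hypothetical. The variance here is computed under the null hypothesis, so a library routine with a different default variance estimate need not give the same number.

```python
import math


def one_sample_proportion_ztest(count, nobs, value):
    """One-sided z-test of H0: p = value against H1: p > value."""
    p_hat = count / nobs                             # observed proportion
    std_err = math.sqrt(value * (1 - value) / nobs)  # standard error under H0
    z = (p_hat - value) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Illustration data (assumed): 58 events in 83 trials, tested against 60%.
z_stat, p_value = one_sample_proportion_ztest(count=58, nobs=83, value=0.6)
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
```

A small p-value supports the claim that the event rate is above 60%.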

python

Osc*_*sca

2018-11-05

4
votes

1
answer

5127
views

Convert a dataframe to fasttext data format

I want to convert a dataframe to the fasttext format.

My dataframe

text                                                             label 
Fan bake vs bake                                                 baking
What's the purpose of a bread box?                               storage-method
Michelin Three Star Restaurant; but if the chef is not there     restaurant


fasttext format

__label__baking Fan bake vs bake
__label__storage-method What's the purpose of a bread box?
__label__restaurant Michelin Three Star Restaurant; but if the chef is not there


I tried df['label'].apply(lambda x: '__label__' + x).add_suffix(df['text']), but it did not work as I expected. How should I change my code?
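One possible approach, as a minimal sketch: concatenate the string columns element-wise instead of using add_suffix, which renames index labels rather than joining values. The dataframe below is rebuilt from the three rows shown in the question. The variable name fasttext_lines is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "Fan bake vs bake",
        "What's the purpose of a bread box?",
        "Michelin Three Star Restaurant; but if the chef is not there",
    ],
    "label": ["baking", "storage-method", "restaurant"],
})

# Element-wise string concatenation: prefix, label, a space, then the text.
fasttext_lines = "__label__" + df["label"] + " " + df["text"]

print("\n".join(fasttext_lines))
```

Each element of fasttext_lines is one training line, so the series can be written out one line per row.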

python pandas fasttext

Osc*_*sca

3
votes

1
answer

1183
views

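fasttext error: predict processes one line at a time (remove '\n')

Hello, I have a dataframe column that contains text. I want to use a fasttext model to make predictions. I should be able to do this by passing an array of texts to the fasttext model.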

import fasttext
import pandas as pd  # needed for pd.DataFrame below
d = {'id':[1, 2, 3], 'name':['a', 'b', 'c']}
df = pd.DataFrame(data=d)


I removed '\n' from the series

name_list = df['name'].tolist()
name_list = [name.strip() for name in name_list]


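and made predictions with model.predict(name_list)

However, I got ValueError: predict processes one line at a time (remove '\n')

There is no '\n' in my list, and '\n' in name_list returns False

I also found a post about a similar problem, but I still get the same error.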

predictions=[]
for line in df['name']:
    pred_label=model.predict(line, k=-1, threshold=0.5)[0][0]
    predictions.append(pred_label)
df['prediction']=predictions

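One thing worth checking, sketched below with made-up strings: '\n' in name_list tests whether some element equals '\n', not whether any element contains it, and strip() removes only leading and trailing whitespace. A newline embedded in the middle of a text therefore survives both steps. Whether that is the cause here is an assumption, since the real data is not shown.

```python
# Made-up sample; the second element has an embedded newline.
name_list = ["clean text", "first line\nsecond line", " padded \n"]

stripped = [name.strip() for name in name_list]

# List membership compares whole elements, so this is False
# even though one element still contains a newline.
membership_check = "\n" in stripped

# Substring check on every element finds the embedded newline.
contains_newline = any("\n" in name for name in stripped)

# split() with no arguments splits on any whitespace, including '\n',
# so joining the pieces collapses each text onto a single line.
cleaned = [" ".join(name.split()) for name in name_list]

print(membership_check, contains_newline)
print(cleaned)
```

The cleaned list contains no newline characters and is in the shape that the error message asks for.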

pandas fasttext

Osc*_*sca

2021-01-21

3
votes

1
answer

3347
views

Remove dataframe rows if index larger than x

I want to remove the dataframe rows whose index is larger than 13491.

I tried

df.drop(df.index > [13491])


but received this error:

KeyError: 'labels [False False False ...  True  True  True] not contained in axis'


This one works fine

df = df[df.index < 13492]


But how can I remove the filtered rows from the dataframe with drop?

Can someone give me some suggestions? Thank you in advance!
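A minimal sketch of one way to do this: drop expects index labels, not a boolean array, so the mask is first used to select the labels to remove. The small dataframe and the threshold of 12 are stand-ins for the real data and 13491.

```python
import pandas as pd

# Stand-in data: six rows with integer index labels 10..15.
df = pd.DataFrame({"value": range(6)}, index=[10, 11, 12, 13, 14, 15])
threshold = 12  # plays the role of 13491 in the question

# Select the labels above the threshold, then drop those labels.
labels_to_drop = df.index[df.index > threshold]
kept = df.drop(labels_to_drop)

# Equivalent boolean filter, as in the version that already works.
same = df[df.index <= threshold]

print(kept)
```

Both forms return a new dataframe, so the result has to be assigned back to df to replace the original.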

python pandas

Osc*_*sca

2018-09-07

2
votes

1
answer

3900
views

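Group by two columns in SQL Server

Hi, I have a table like the one below, and I want to create cohorts by grouping on date_contact and user_id. I get an error message saying that "cohort_month" is not a valid name.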

SELECT user_id, CONVERT(VARCHAR(7), min(date_contact), 120) AS cohort_month
from cohort
group by user_id, cohort_month


Any suggestions? Thanks!
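SQL Server does not let GROUP BY refer to an alias defined in the SELECT list, and cohort_month is built from the aggregate min(), so it cannot be a grouping key in any case. Assuming the goal is one cohort month per user, grouping by user_id alone is enough. The sketch below runs that query shape in SQLite through Python as a stand-in for SQL Server: substr(..., 1, 7) replaces CONVERT(VARCHAR(7), ..., 120), and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cohort (user_id INTEGER, date_contact TEXT)")
conn.executemany(
    "INSERT INTO cohort VALUES (?, ?)",
    [
        (1, "2018-01-15"),
        (1, "2018-03-02"),
        (2, "2018-02-20"),
        (2, "2018-02-25"),
    ],
)

# Group by user_id only; the cohort month is derived from the
# earliest contact date of each user.
rows = conn.execute(
    """
    SELECT user_id, substr(MIN(date_contact), 1, 7) AS cohort_month
    FROM cohort
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
conn.close()

print(rows)
```

In SQL Server the same shape would keep the original CONVERT expression in the SELECT list and use GROUP BY user_id.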

sql-server

Osc*_*sca

2
votes

1
answer

65
views

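Pandas: select the group with the highest percentage from a frequency table

Hello, I have a dataframe and I want to select the group with the highest percentage from a frequency table.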

import pandas as pd  # needed for pd.DataFrame below
d = {'c1':['a', 'a', 'b', 'b', 'c', 'c'], 'c2':['Low', 'High', 'Low', 'High', 'High', 'High']}
dd = pd.DataFrame(data=d)
dd.groupby('c1')['c2'].value_counts(normalize=True).mul(100)


It returns a frequency table

c1  c2  
a   High     50.0
    Low      50.0
b   High     50.0
    Low      50.0
c   High    100.0
Name: c2, dtype: float64


I want to print c, the group with the highest percentage, 100.0.

I can use max() to print 100.0, but I don't know how to print c.
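A minimal sketch of one option: idxmax returns the index label of the largest value, which for this two-level index is a (c1, c2) pair. The variable names freq, top_group, and top_level are assumptions.

```python
import pandas as pd

d = {
    "c1": ["a", "a", "b", "b", "c", "c"],
    "c2": ["Low", "High", "Low", "High", "High", "High"],
}
dd = pd.DataFrame(data=d)

# Same frequency table as in the question, kept in a variable.
freq = dd.groupby("c1")["c2"].value_counts(normalize=True).mul(100)

# idxmax gives the label of the maximum; here a (c1, c2) tuple.
top_group, top_level = freq.idxmax()
top_value = freq.max()

print(top_group, top_level, top_value)
```

If several rows tie for the maximum, idxmax reports only the first one.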

pandas

Osc*_*sca

2
votes

1
answer

140
views

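Show values from multiple columns in labels with squarify.plot

I have a dataframe that I want to plot as a treemap with squarify. I want to show both country_name and counts on the chart by editing the labels parameter, but it seems to accept only one value.

Sample data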

import squarify
import pandas as pd
import matplotlib  # matplotlib.colors and matplotlib.cm are used in the plotting code
from matplotlib import pyplot as plt
d = {'country_name':['USA', 'UK', 'Germany'], 'counts':[100, 200, 300]}
dd = pd.DataFrame(data=d)


fig = plt.gcf()
ax = fig.add_subplot()
fig.set_size_inches(16, 4.5)
norm = matplotlib.colors.Normalize(vmin=min(dd.counts), vmax=max(dd.counts))
colors = [matplotlib.cm.Blues(norm(value)) for value in dd.counts]
squarify.plot(label=dd.country_name, sizes=dd.counts, alpha=.7, color=colors)
plt.axis('off')
plt.show()


The expected output would show both counts and country_name on the chart.
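Since the label argument takes one string per rectangle, one possible approach is to merge the two columns into a single string per row before plotting. The sketch below builds those strings from the sample values with plain lists, so it runs without squarify installed. The plotting call is shown only as a comment.

```python
# Sample values from the question, as plain lists.
country_names = ["USA", "UK", "Germany"]
counts = [100, 200, 300]

# One label per rectangle: name on the first line, count on the second.
labels = [f"{name}\n{count}" for name, count in zip(country_names, counts)]

print(labels)

# With the dataframe from the question, the same idea would be:
#   labels = [f"{n}\n{c}" for n, c in zip(dd.country_name, dd.counts)]
#   squarify.plot(label=labels, sizes=dd.counts, alpha=.7, color=colors)
```

The newline inside each label places the count under the country name in the rectangle.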

matplotlib pandas squarify

Osc*_*sca

1
votes

1
answer

855
views

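Change all values of a dictionary to 1

I want to change all the values of a dictionary to 1 (a float). I searched online, but it seems people rarely have this kind of need.

The dictionary has thousands of entries; part of it is shown below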

{
 '2015': [2.8216107792591907],
 '2016': [2.3686578052627687],
 '2017': [2.03069274701226]
}


Can someone give me some ideas? Thanks!
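A dictionary comprehension is one way to do this, as a minimal sketch. The question does not say whether each value should become the bare float 1.0 or stay a one-element list, so both variants are shown. The names data, as_floats, and as_lists are assumptions.

```python
data = {
    "2015": [2.8216107792591907],
    "2016": [2.3686578052627687],
    "2017": [2.03069274701226],
}

# Variant 1: every value becomes the float 1.0.
as_floats = {key: 1.0 for key in data}

# Variant 2: keep the list shape and replace each element with 1.0.
as_lists = {key: [1.0] * len(values) for key, values in data.items()}

print(as_floats)
print(as_lists)
```

Both variants build a new dictionary and leave data unchanged.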

python arrays dictionary

Osc*_*sca

2018-08-31

0
votes

1
answer

68
views

Tag statistics

pandas ×6

python ×5

fasttext ×2

arrays ×1

dictionary ×1

matplotlib ×1

postgresql ×1

sql-server ×1

squarify ×1
