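I'm confused about why a random forest classification model without cross-validation yields a mean accuracy score of 0.996, while the same model with 5-fold cross-validation yields a mean accuracy score of 0.687.
There are 275,956 samples. Class 0 = 217891, class 1 = 6073, class 2 = 51992
I'm trying to predict the "TARGET" column, which has 3 classes [0,1,2]: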
data.head()
bottom_temperature bottom_humidity top_temperature top_humidity external_temperature external_humidity weight TARGET
26.35 42.94 27.15 40.43 27.19 0.0 0.0 1
36.39 82.40 33.39 49.08 29.06 0.0 0.0 1
36.32 73.74 33.84 42.41 21.25 0.0 0.0 1
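As a point of reference for the two scores, the class counts given in the question imply a majority-class baseline. The sketch below is only arithmetic on those counts; the names `class_counts` and `baseline_accuracy` are made up for illustration and do not come from the original code.

```python
# Class counts as stated in the question.
class_counts = {0: 217891, 1: 6073, 2: 51992}

total = sum(class_counts.values())

# Accuracy of a trivial model that always predicts the most frequent class.
majority_class = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_class] / total

print(f"total samples: {total}")
print(f"majority class: {majority_class}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")  # roughly 0.79
```

With classes this imbalanced, the 5-fold score of 0.687 sits below what always predicting class 0 would give, so it may be worth comparing both reported scores against this baseline.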
Following the documentation, the data is split into training and test sets:
# link to docs http://scikit-learn.org/stable/modules/cross_validation.html
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn import svm
# Create a list of …
I want to split a space-delimited string into 5 parts and create a column for each, but I'm finding it hard to produce the desired output. Edit: using the Standard SQL dialect
Sample data:
Row published_at data_string device id
1 2016-10-26T22:53:03.209Z 70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan 2a0025000351353337353037
...
1 of 570 rows
Desired output:
Row published_at battery temp1 humid1 temp2 humid2 temp3 humid3 device_id
1 2016-11-03T16:24:09.833Z 70.77 3.38 61.65 7.98 73.20 3.29 63.55 2a0025000351353337353037
1 of 570 rows
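To make the intended mapping concrete, here is a minimal Python sketch of the split. It is not the Standard SQL query being asked for. It assumes the first seven tokens of `data_string` map, in order, to the seven value columns named in the desired output and that the trailing `nan` tokens are simply dropped; the helper name `split_data_string` is hypothetical.

```python
# Column names taken from the desired output header (assumed to be in token order).
COLUMNS = ["battery", "temp1", "humid1", "temp2", "humid2", "temp3", "humid3"]


def split_data_string(data_string):
    """Split a space-delimited string and name the leading values."""
    tokens = data_string.split()
    # Keep only as many tokens as there are named columns; the rest are ignored.
    values = [float(token) for token in tokens[:len(COLUMNS)]]
    return dict(zip(COLUMNS, values))


row = split_data_string("70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan")
for name, value in row.items():
    print(name, value)
```

The same positional idea (split once, then pick each element by index) is what the SQL version would need to express.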
Attempted query 1.a:
WITH
h2a0025_2 AS (
SELECT
TIMESTAMP '2016-10-26T22:53:03.209Z' AS published_at,
'70.77 3.38 61.65 7.98 73.20 3.29 63.55 nan nan nan nan' AS data_string,
'2a0025000351353337353037' …
I'm having trouble querying all rows in a table from the past 24 hours, and as a beginner I find it hard to tell whether this is a Postgres error or a mistake in my Python code.
I see that the "publishedAt" field returns a datetime.datetime value.
# fetch a single row to inspect the columns
cur.execute(
"""
SELECT *
FROM "table"
LIMIT 1
"""
)
# print out columns names
colnames = [desc[0] for desc in cur.description]
print(colnames)
# print out col values
rows = cur.fetchall()
print(rows)
['id', 'publishedAt', ......]
[['5a086f56-d080-40c0-b6fc-ee78b08aec3d', datetime.datetime(2018, 11, 11,
15, 39, 58, tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=0, name=None)), .....]
However,
cur.execute(
"""
SELECT *
FROM "table"
WHERE publishedAt BETWEEN %s and %s;""",
(dt.datetime.now() - dt.timedelta(days=1))
)
The result is:
TypeError: 'datetime.datetime' …
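The query above has two `%s` placeholders but is given a single bare datetime, whereas psycopg2 expects the parameters as a sequence (or mapping) with one entry per placeholder. A minimal sketch of building both bounds of the 24-hour window follows. The helper `last_24h_window` is hypothetical, using timezone-aware UTC is an assumption based on the `offset=0` value printed earlier, and the column is double-quoted on the assumption that it really is stored as mixed-case `publishedAt` (PostgreSQL folds unquoted identifiers to lowercase).

```python
import datetime as dt


def last_24h_window(now=None):
    """Return (start, end) datetimes covering the 24 hours up to `now`."""
    if now is None:
        # Assumption: timestamps are compared in UTC.
        now = dt.datetime.now(dt.timezone.utc)
    return (now - dt.timedelta(days=1), now)


# Two placeholders, so the parameter tuple needs two values.
query = 'SELECT * FROM "table" WHERE "publishedAt" BETWEEN %s AND %s;'
params = last_24h_window()

print(query)
print(params)
# With an open cursor this would be run as: cur.execute(query, params)
```

The `cur.execute` call is left as a comment because it needs a live database connection.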