Hi, I am using pandas to convert a column to months. When I read my data, the values are objects:
Date object
dtype: object
So I first convert them to datetime and then try to extract the month:
import pandas as pd
file = '/pathtocsv.csv'
df = pd.read_csv(file, sep = ',', encoding='utf-8-sig', usecols= ['Date', 'ids'])
df['Date'] = pd.to_datetime(df['Date'])
df['Month'] = df['Date'].dt.month
In case this helps:
In [10]: df['Date'].dtype
Out[10]: dtype('O')
The error I get is this:
/Library/Frameworks/Python.framework/Versions/2.7/bin/User/lib/python2.7/site-packages/pandas/core/series.pyc in _make_dt_accessor(self)
2526 return maybe_to_datetimelike(self)
2527 except Exception:
-> 2528 raise AttributeError("Can only use .dt accessor with datetimelike "
2529 "values")
2530
AttributeError: Can only use .dt accessor with datetimelike values
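One thing worth checking is whether the conversion actually produced a datetime column, since the dtype('O') output above suggests it did not. The sketch below uses a made-up frame in place of the CSV (the sample values and the choice of errors="coerce" are assumptions, not taken from the question): values that cannot be parsed become NaT, the dtype is verified, and only then is .dt.month used.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; the last value is deliberately not a date.
df = pd.DataFrame({"Date": ["2014-01-01", "2014-01-01", "2014-01-03", "not a date"]})

# errors="coerce" converts values that cannot be parsed into NaT
# instead of raising or leaving them as plain Python objects.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Confirm the column really is datetime-like before touching .dt
is_datetime = pd.api.types.is_datetime64_any_dtype(df["Date"])
print(is_datetime)

# Rows that failed to parse are the likely culprits in the real file.
bad_rows = df[df["Date"].isna()]
print(bad_rows)

# Safe now: NaT rows simply yield a missing month.
df["Month"] = df["Date"].dt.month
print(df)
```

If bad_rows is not empty on the real data, those entries are a plausible reason the column stayed as object.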
Edit:
The Date column looks like this:
0 2014-01-01
1 2014-01-01
2 2014-01-01
3 2014-01-01
4 2014-01-03 …
I have a nested JSON field that looks like this:
trend
{"trend":0,"abs":0,"per":null}
I want to query it and get the values inside, so I have a CASE statement to extract them:
select
(CASE WHEN trend like '%"trend"%' THEN json_extract_path_text(trend, 'trend') ELSE NULL END)::SMALLINT,
(CASE WHEN trend like '%"abs"%' THEN json_extract_path_text(trend, 'abs') ELSE NULL END)::INTEGER,
(CASE WHEN trend like '%"per"%' THEN json_extract_path_text(trend, 'per') ELSE NULL END)::DOUBLE PRECISION
from staging_raw.table
The problem is the last field, per: because it is null, I get this error message in Redshift:
I tried using this clause:
(CASE WHEN trend like '%"per"%' and json_extract_path_text(trend, 'per') is not NULL THEN json_extract_path_text(trend, 'per') ELSE NULL END)::DOUBLE PRECISION
But I still get the same error. How can I solve this, given that the source data looks like this?
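The guard the CASE expression is aiming for is "cast only when a value is actually present". The sketch below shows that same rule client-side in Python. It is an illustration of the logic, not a Redshift fix, and extract_trend is a hypothetical helper name.

```python
import json


def extract_trend(raw):
    """Return (trend, abs, per) from a JSON string, keeping nulls as None."""
    record = json.loads(raw)
    trend = record.get("trend")
    abs_value = record.get("abs")
    per = record.get("per")
    # Convert only when a value is present; a JSON null arrives as None,
    # so it is never passed to int() or float().
    return (
        int(trend) if trend is not None else None,
        int(abs_value) if abs_value is not None else None,
        float(per) if per is not None else None,
    )


print(extract_trend('{"trend":0,"abs":0,"per":null}'))
```

The question does not show what Redshift hands back for a JSON null, so the SQL-side equivalent of this guard would need to be checked against the actual extracted value.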
I am trying to insert some data into a table I created. I have a dataframe that looks like this:
I created a table:
create table online.ds_attribution_probabilities
(
attribution_type text,
channel text,
date date ,
value float
)
I am running this Python script:
engine = create_engine("postgresql://@e.eu-central-1.redshift.amazonaws.com:5439/mdhclient_encoding=utf8")
connection = engine.raw_connection()
result.to_sql('online.ds_attribution_probabilities', con=engine, index = False, if_exists = 'append')
I don't get any error, but when I check the table there is no data in it. What could be wrong? Do I have to commit or perform some other step?
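One possibility, offered as a guess rather than a confirmed diagnosis, is that pandas treats the dotted string as a single table name instead of schema plus table, so the rows land in a different table than the one being checked. The sketch below reproduces that with an in-memory SQLite database standing in for Redshift; the sample row is made up.

```python
import sqlite3

import pandas as pd

# Hypothetical one-row frame shaped like the table in the question.
result = pd.DataFrame({
    "attribution_type": ["linear"],
    "channel": ["email"],
    "date": ["2018-01-01"],
    "value": [0.25],
})

# In-memory SQLite stands in for Redshift so the sketch runs anywhere.
con = sqlite3.connect(":memory:")
result.to_sql("online.ds_attribution_probabilities", con=con, index=False, if_exists="append")

# List the tables that now exist: the dotted string was used verbatim
# as the table name rather than being split into schema and table.
tables = [row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)
con.close()
```

With a SQLAlchemy engine, DataFrame.to_sql accepts a schema argument, so a call such as result.to_sql('ds_attribution_probabilities', con=engine, schema='online', index=False, if_exists='append') would target the intended table, assuming this is indeed the cause.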
I am trying to insert distinct values into a database. My query is this:
query2 = "INSERT INTO users (id, name, screenName, createdAt, defaultProfile, defaultProfileImage,\
description) SELECT DISTINCT %s, %s, %s, %s, %s, %s, %s;"
cur.execute(query2, (user_id, user_name, user_screenname, user_createdat, \
default_profile, default_profile_image, description))
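However, I still get the error: psycopg2.IntegrityError: duplicate key value violates unique constraint "user_key".
I guess the query inserts all the data and then selects the distinct values? Another approach, I suppose, would be to store all the data in a temporary database and then retrieve it from there. But is there an easier way?
Thank you!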
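SELECT DISTINCT only removes duplicates among the rows produced by that one SELECT, which here is a single row per execute call, so it never compares against rows already in the table. A minimal sketch of skipping existing keys instead, using SQLite so it runs without a server; the two-column users table and the sample names are hypothetical, and psycopg2 would use %s placeholders rather than ?.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# ON CONFLICT ... DO NOTHING skips rows whose key already exists.
# This needs SQLite 3.24+ here, or PostgreSQL 9.5+ on a real server.
query = "INSERT INTO users (id, name) VALUES (?, ?) ON CONFLICT (id) DO NOTHING"

# The third tuple repeats id 1 and is silently skipped instead of raising.
for user_id, user_name in [(1, "alice"), (2, "bob"), (1, "alice")]:
    con.execute(query, (user_id, user_name))

rows = con.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)
con.close()
```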
I have a script in which I read some zipcodes from a csv file. The zipcodes are formatted as follows:
zipcode
75180
90672
01037
20253
09117
31029
07745
90453
12105
18140
36108
10403
76470
06628
93105
88069
31094
84095
63069
Then I run a script:
import requests
import pandas as pd
import time
file = '/Users/zipcode.csv'
reader = pd.read_csv(file, sep=';', encoding='utf-8-sig')
zipcodes = reader["zipcode"].astype(str)
base_url = "https://api.blabla/?zipcode={zipcode}"
headers = {'Authentication': 'random'}
for zipcode in zipcodes:
    url = base_url.format(zipcode=zipcode)
    r = requests.get(url,
                     headers=headers)
    for r_info in r.json()["data"]:
        print zipcode,r_info["id"]
    time.sleep(0.5)
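However, whenever a zipcode starts with 0, the value I get has only 4 digits, so it does not match the actual zipcode with its leading 0. I have formatted the column in my csv as text, but it still doesn't work.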
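The zero is lost while the file is parsed: read_csv infers an integer column, and the later astype(str) only sees the already-shortened number. A minimal sketch of keeping the column as text from the start, with an in-memory CSV standing in for the real file (the three sample rows are taken from the list above; everything else is illustrative):

```python
import io

import pandas as pd

# Hypothetical in-memory CSV replacing '/Users/zipcode.csv'.
csv_text = "zipcode\n75180\n01037\n09117\n"

# dtype=str stops pandas from inferring an integer column,
# so the leading zeros survive the parse.
reader = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"zipcode": str})
zipcodes = reader["zipcode"].tolist()
print(zipcodes)

# If the zeros are already gone, zfill pads each value back to five digits.
repaired = pd.Series([1037, 9117, 75180]).astype(str).str.zfill(5).tolist()
print(repaired)
```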
The zipcodes I get look like this:
zipcode
75180
90672
1037
20253
9117
31029
7745
90453
12105
18140
36108 …
I have this query, and I want it to return zeros instead of nulls.
create view ct as
select userid, coalerse(count(tweets), 0) as nooftweets, coalerse(count(distinct mention), 0) as mention
from (
select t.user_id as userid, t.id as tweets, m.mentionedusers_id as mention, row_number() over (partition by m.tweet_id order by m.mentionedusers_id
) rn
from "tweet_mentUsers" m right join tweet t on m.tweet_id = t.id where text like '@%') a where rn <= 2 group by 1
But I get this error message:
ERROR: function coalerse(bigint, integer) does not exist
LINE 2: select userid, coalerse(nooftweets, 0), coalerse(mention, 0)...
^
HINT: No …
I am trying to get data from the current week rather than from the last 7 days. My query is:
select
order_datetime_tz::date AS date,
orders
FROM
order_fact f
where order_datetime_tz < current_date
and order_datetime_tz >= date_trunc('week',current_date) - interval '1 week'
However, this returns the last 7 days. Any ideas about this?
Thanks!!
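In PostgreSQL, date_trunc('week', ...) truncates to the Monday of the current week, so subtracting interval '1 week' moves the lower bound back to the previous Monday. The same arithmetic in Python, as a sketch with a fixed example date (week_start is a hypothetical helper, and the date is arbitrary):

```python
from datetime import date, timedelta


def week_start(day):
    """Monday of the week containing `day` (Monday-based weeks)."""
    return day - timedelta(days=day.weekday())


today = date(2024, 5, 16)  # hypothetical "current_date", a Thursday
this_week = week_start(today)  # analogue of date_trunc('week', current_date)
previous_week = this_week - timedelta(weeks=1)  # analogue of "- interval '1 week'"

print(this_week)
print(previous_week)
```

Under that reading, dropping the "- interval '1 week'" term from the query would leave only rows from the current week's Monday onward.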