我有一个类似 NaT 的数据框,这给了我一个DataError: (psycopg2.DataError) invalid input syntax for type timestamp: "NaT"
:当我尝试将值插入 postgres db 时
from sqlalchemy import MetaData
from sqlalchemy.dialects.postgresql import insert
import pandas as pd
tst_df = pd.DataFrame({'colA':['a','b','c','a','z', 'q'],
'colB': pd.date_range(end=datetime.datetime.now() , periods=6),
'colC' : ['a1','b2','c3','a4','z5', 'q6']})
tst_df.loc[5, 'colB'] = pd.NaT
insrt_vals = tst_df.to_dict(orient='records')
engine = sqlalchemy.create_engine("postgresql://user:password@localhost/postgres")
connect = engine.connect()
meta = MetaData(bind=engine)
meta.reflect(bind=engine)
table = meta.tables['tstbl']
insrt_stmnt = insert(table).values(insrt_vals)
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['colA','colB'])
Run Code Online (Sandbox Code Playgroud)
results = engine.execute(do_nothing_stmt)
DataError: (psycopg2.DataError) invalid input syntax for type timestamp: "NaT" …
Run Code Online (Sandbox Code Playgroud) 使用prefect,我想从其他两个流创建一个新流。
我得到的错误是A task with the slug "add_num" already exists in this flow.
是否可以更新Flows
使用相同tasks
或Parameters
. 下面是我试图完成的一个最小的例子。`
from prefect import task, Flow, Parameter
@task
def add_one(x):
return x+1
with Flow("Flow 1") as flow_1:
add_num = Parameter("add_num", default=10)
new_num1 = add_one(add_num)
@task
def add_two(y):
return y+1
with Flow("Flow 2") as flow_2:
add_num = Parameter("add_num", default=10)
new_num2 = add_two(add_num)
combo_fl = Flow("Add Numbers")
combo_fl.update(flow_1)
combo_fl.update(flow_2, validate=False)
Run Code Online (Sandbox Code Playgroud)
我确实在 slack 频道上看到了这段代码,它可能与解决这个问题有关,但我不知道如何使用它。
class GlobalParameter(Parameter):
def __init__(self, name, slug=None, *args, …
Run Code Online (Sandbox Code Playgroud) 如何将 a 分配name
给merge
流程中的任务?
thing_three = merge(thing_one, thing_two)
Run Code Online (Sandbox Code Playgroud)
最终目标是merge
使用检索flow.get_tasks(name="thing_merger")
提前致谢。
我有一个包含两列的熊猫数据框。列值之一需要映射到十六进制颜色。另一个绘图过程从那里接管。
这是我到目前为止所尝试的。部分玩具代码取自此处。
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
# Create dataframe
df = pd.DataFrame(np.random.randint(0,21,size=(7, 2)), columns=['some_value', 'another_value'])
# Add a nan to handle realworld
df.iloc[-1] = np.nan
# Try to map values to colors in hex
# # Taken from here
norm = matplotlib.colors.Normalize(vmin=0, vmax=21, clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
df['some_value_color'] = df['some_value'].apply(lambda x: mapper.to_rgba(x))
df
Run Code Online (Sandbox Code Playgroud)
哪些输出:
如何'some_value'
一次性将df 列值转换为十六进制?理想情况下使用sns.cubehelix_palette(light=1)
我不反对使用除 matplotlib
提前致谢。
我有一个数据框,其列数据类型需要映射到python本机数据类型.
我希望能够从numpy获取字典并将每列转换为它的本机类型.
例如:
{numpy.object_: object,
numpy.bool_: bool,
numpy.string_: str,
numpy.unicode_: unicode,
numpy.int64: int,
numpy.float64: float,
numpy.complex128: complex}
Run Code Online (Sandbox Code Playgroud)
我试过了两个,astype
并pd.to_numeric
没有充分地低估这个专栏.
df['source'] = df['source'].astype(int)
和返回int32
一样pd.to_numeric
大多数评论质疑这样做的智慧.networkx
阅读dataframes
并接受np datatypes
.但是,json_dumps
由于这个记录良好的错误,无法使用图形编写:TypeError: Object of type 'int64' is not JSON serializable
谢谢
我有一个非常小 的短字符串列表,我希望(1)集群和(2)使用该模型来预测新字符串属于哪个集群.
运行第一部分工作正常,获得新字符串的预测不会.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
# List of
documents_lst = ['a small, narrow river',
'a continuous flow of liquid, air, or gas',
'a continuous flow of data or instructions, typically one having a constant or predictable rate.',
'a group in which schoolchildren of the same age and ability are taught',
'(of liquid, air, gas, etc.) run or flow in a continuous current in a specified direction',
'transmit or receive (data, …
Run Code Online (Sandbox Code Playgroud) python-3.x ×6
pandas ×3
python ×3
prefect ×2
colors ×1
dataframe ×1
k-means ×1
matplotlib ×1
nlp ×1
numpy ×1
postgresql ×1
psycopg2 ×1
scikit-learn ×1
sqlalchemy ×1