I want to run a long-running Python script inside a Jupyter notebook so that I can poke at the data structures it generates along the way.
The script has many dependencies and command-line arguments, and is executed with a specific virtualenv. Is it possible to interactively run a Python script in a notebook from a specified virtualenv (different from the one Jupyter is installed in)?
Thanks!
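One commonly suggested approach is to register the virtualenv as its own Jupyter kernel and select that kernel for the notebook. A sketch, assuming a hypothetical virtualenv at ~/envs/myproject:

# run in a terminal; installs a kernel spec that points at the venv's interpreter
~/envs/myproject/bin/pip install ipykernel
~/envs/myproject/bin/python -m ipykernel install --user --name myproject

With the notebook running on that kernel, the script's imports resolve inside the virtualenv, and command-line arguments can be passed through IPython's %run magic, e.g. %run myscript.py --some-flag (script name hypothetical).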

I'm trying to column-bind dataframes and am running into a problem with pandas concat, because ignore_index=True doesn't seem to work:
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3'],
                    'D': ['D0', 'D1', 'D2', 'D3']},
                   index=[0, 2, 3, 4])
df2 = pd.DataFrame({'A1': ['A4', 'A5', 'A6', 'A7'],
                    'C': ['C4', 'C5', 'C6', 'C7'],
                    'D2': ['D4', 'D5', 'D6', 'D7']},
                   index=[5, 6, 7, 3])
df1
# A B D
# 0 A0 B0 D0
# 2 A1 B1 D1
# 3 A2 B2 D2
# 4 A3 B3 D3
df2
# A1 C D2
# 5 A4 C4 …
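For what it's worth, with axis=1 ignore_index=True renumbers the columns, not the rows, so the two frames still align on their original row indexes and produce NaN-padded rows. A minimal sketch of the usual workaround, resetting each frame's row index so rows pair up by position:

# drop the old indexes so both frames are numbered 0..3, then bind columns
result = pd.concat([df1.reset_index(drop=True),
                    df2.reset_index(drop=True)], axis=1)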

I'm trying to group by one column and count the values of another column.
import pandas as pd
dftest = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                       'Amt': [20, 20, 20, 30, 30, 30, 30, 40, 40, 10, 10, 40, 40, 40]})
print(dftest)
dftest looks like:
A Amt
0 1 20
1 1 20
2 1 20
3 1 30
4 1 30
5 1 30
6 1 30
7 1 40
8 1 40
9 2 10
10 2 10
11 2 40
12 2 40
13 2 40
Performing the grouping:
grouper = dftest.groupby('A')
df_grouped = grouper['Amt'].value_counts()
This gives:
A Amt
1 30 4
20 3
40 2
2 40 3
10 2 …
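The question is cut off above; if the goal is these counts as a flat DataFrame rather than a MultiIndexed Series, a minimal sketch:

# rename first: on older pandas the result Series is itself named 'Amt',
# which clashes with the index level of the same name in reset_index()
df_flat = grouper['Amt'].value_counts().rename('count').reset_index()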

import asyncio
from sqlalchemy import Column
from sqlalchemy import DateTime
from sqlalchemy import ForeignKey
from sqlalchemy import func
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.ext.asyncio import create_async_engine
from sqlalchemy.future import select
from sqlalchemy.orm import declarative_base
from sqlalchemy.orm import relationship
from sqlalchemy.orm import selectinload
from sqlalchemy.orm import sessionmaker
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/db",
    echo=True,
)

# expire_on_commit=False will prevent attributes from being expired
# after commit.
async_session = sessionmaker(
    engine, expire_on_commit=False, class_=AsyncSession
)
Base = …
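The snippet is cut off above; for context, a session factory built this way is typically driven as in the following sketch, where the mapped class A and its bs relationship are assumptions carried over from the SQLAlchemy asyncio example this code appears to follow:

async def main():
    async with async_session() as session:
        # selectinload fetches the collection eagerly; lazy loads are not
        # available under asyncio (A and A.bs are assumed, not defined here)
        result = await session.execute(select(A).options(selectinload(A.bs)))
        for a in result.scalars():
            print(a.bs)

asyncio.run(main())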

In pydantic, is there a cleaner way to exclude multiple fields from a model, e.g.:
class User(UserBase):
    class Config:
        exclude = ['user_id', 'some_other_field']
I know the following works:
class User(UserBase):
    class Config:
        fields = {'user_id': {'exclude': True},
                  'some_other_field': {'exclude': True}}
But I was looking for something cleaner, like Django REST Framework, where you can specify a list of fields to exclude or include.
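Pydantic v1 does not appear to offer such a list form directly, but the fields mapping can at least be generated from one, keeping the excluded names in a single flat list. A sketch:

class User(UserBase):
    class Config:
        # build the per-field exclude markers from a plain list
        fields = {f: {'exclude': True}
                  for f in ['user_id', 'some_other_field']}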

A pyspark.sql.DataFrame displays messily with DataFrame.show(): the rows wrap instead of scrolling.
I tried these options:
import IPython
IPython.auto_scroll_threshold = 9999
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display
but no luck. Scrolling does work, though, when using the jupyter plugin in the Atom editor.
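One workaround often used in notebooks is to hand a bounded sample to pandas, so Jupyter renders it as a scrollable HTML table instead of wrapped plain text. A sketch, assuming df is the pyspark.sql.DataFrame in question:

# materialize only a small sample; toPandas() collects it to the driver
df.limit(20).toPandas()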

For a dataframe like this:
import numpy as np
import pandas as pd
d = {'id': [1,1,1,2,2], 'Month': [1,2,3,1,3], 'Value': [12,23,15,45,34], 'Cost': [124,214,1234,1324,234]}
df = pd.DataFrame(d)
Cost Month Value id
0 124 1 12 1
1 214 2 23 1
2 1234 3 15 1
3 1324 1 45 2
4 234 3 34 2
I apply pivot_table:
df2 = pd.pivot_table(df,
                     values=['Value', 'Cost'],
                     index=['id'],
                     columns=['Month'],
                     aggfunc=np.sum,
                     fill_value=0)
and get df2:
Cost Value
Month 1 2 3 1 2 3
id
1 124 214 1234 12 23 15
2 1324 0 234 45 0 34
Is there an easy way to format the resulting dataframe's column names like:
id Cost1 Cost2 Cost3 Value1 Value2 …
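A minimal sketch of one common way to get there: join each (name, month) pair in the MultiIndex columns, then lift id back out of the index:

# ('Cost', 1) -> 'Cost1', ('Value', 3) -> 'Value3', ...
df2.columns = [f'{name}{month}' for name, month in df2.columns]
df2 = df2.reset_index()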

I have found two ways of working with a database in FastAPI. Is there any fundamental difference between them? If so, which approach is preferable?
Method 1. This is found in the official FastAPI fullstack example:
from typing import Generator

from fastapi import Depends
from sqlalchemy import create_engine
from sqlalchemy.orm import Session, sessionmaker

engine = create_engine(SQLALCHEMY_DATABASE_URI, pool_pre_ping=True)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

def get_db() -> Generator:
    try:
        db = SessionLocal()
        yield db
    finally:
        db.close()

def get_current_user(
    db: Session = Depends(get_db), token: str = Depends(reusable_oauth2)
) -> models.User:
    ...
Method 2. The following approach is found in the official FastAPI documentation, in some blogs, and in the documentation of the fastapi_users package:
import databases
DATABASE_URL = "sqlite:///./test.db"
database = databases.Database(DATABASE_URL)
@app.on_event("startup")
async def startup():
    await database.connect()
@app.on_event("shutdown")
async …
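Not a verdict on which is preferable, but for a like-for-like comparison: method 1's dependency-injection style is not tied to synchronous SQLAlchemy. A sketch of the same pattern with SQLAlchemy 1.4's AsyncSession (the connection URL is a placeholder):

from typing import AsyncGenerator
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

async_engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/db")
AsyncSessionLocal = sessionmaker(async_engine, class_=AsyncSession,
                                 expire_on_commit=False)

async def get_db() -> AsyncGenerator[AsyncSession, None]:
    # one session per request, closed when the request finishes
    async with AsyncSessionLocal() as session:
        yield session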

I want to slice a multiindex pandas dataframe.
Here is the code to get my test data:
import pandas as pd

testdf = {
    'Name': {0: 'H', 1: 'H', 2: 'H', 3: 'H', 4: 'H'},
    'Division': {0: 'C', 1: 'C', 2: 'C', 3: 'C', 4: 'C'},
    'EmployeeId': {0: 14, 1: 14, 2: 14, 3: 14, 4: 14},
    'Amt1': {0: 124.39, 1: 186.78, 2: 127.94, 3: 258.35000000000002, 4: 284.77999999999997},
    'Amt2': {0: 30.0, 1: 30.0, 2: 30.0, 3: 30.0, 4: 60.0},
    'Employer': {0: 'Z', 1: 'Z', 2: 'Z', 3: 'Z', 4: …
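The test data is cut off above; as a general sketch, slicing such a frame usually goes through pd.IndexSlice once Name and Division are made the index (assuming testdf is completed as intended):

df = pd.DataFrame(testdf).set_index(['Name', 'Division']).sort_index()
idx = pd.IndexSlice
# all Names, but only rows whose Division is 'C'
subset = df.loc[idx[:, 'C'], :]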

How do I read parquet files on s3 with dask using a specific AWS profile (stored in a credentials file)? Dask uses s3fs, which uses boto. This is what I have tried:
>>>import os
>>>import s3fs
>>>import boto3
>>>import dask.dataframe as dd
>>>os.environ['AWS_SHARED_CREDENTIALS_FILE'] = "~/.aws/credentials"
>>>fs = s3fs.S3FileSystem(anon=False,profile_name="some_user_profile")
>>>fs.exists("s3://some.bucket/data/parquet/somefile")
True
>>>df = dd.read_parquet('s3://some.bucket/data/parquet/somefile')
NoCredentialsError: Unable to locate credentials
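The profile given to the hand-built fs object never reaches dd.read_parquet, which constructs its own filesystem with default credentials, hence the error. A sketch of the usual fix, passing the profile through storage_options (which dask forwards to s3fs, using the same profile_name keyword accepted above):

>>>df = dd.read_parquet('s3://some.bucket/data/parquet/somefile',
... storage_options={'profile_name': 'some_user_profile'})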