I have the following code:
from pyspark.sql import functions as func
cols = ("id","size")
result = df.groupby(*cols).agg({
    func.max("val1"),
    func.median("val2"),
    func.std("val2")
})
But it fails on the line with func.median("val2"), with a message saying that median cannot be found in func. The same happens with func.std.

I have the following list of phrases:
[
'This is erleada comp. recub. con película 60 mg.',
'This is auxina e-200 uicaps. blanda 200 mg.',
'This is ephynalsol. iny. 100 mg.',
'This is paracethamol 100 mg.'
]
I need to get the following result:
[
'This is erleada.',
'This is auxina.',
'This is ephynalsol.',
'This is paracethamol.'
]
I wrote the following function to clean up the phrases:
def clean(string):
    sub_strings = [".", "iny", "comp", "uicaps"]
    try:
        # Cut the phrase at each marker; str.index raises ValueError if a marker is missing
        string = [string[:string.index(sub_str)].rstrip() for sub_str in sub_strings]
        return string
    except ValueError:
        return string
And I use it as follows:
for phrase in phrases:
    drug = clean(phrase)
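One possible way to get the expected result, as a minimal sketch rather than a general solution. It assumes every phrase has the form "This is <name> ..." and that the drug name is a single word made only of letters; both are assumptions read off the four sample phrases, and the name NAME_PATTERN is introduced here for illustration only.

```python
import re

# Assumption: each phrase starts with "This is " followed by a one-word
# drug name made only of letters (digits and punctuation end the name).
NAME_PATTERN = re.compile(r"^(This is [^\W\d_]+)")


def clean(phrase):
    """Keep 'This is <name>' and close it with a period."""
    match = NAME_PATTERN.match(phrase)
    if match is None:
        # Leave phrases that do not follow the assumed shape untouched.
        return phrase
    return match.group(1) + "."


phrases = [
    'This is erleada comp. recub. con película 60 mg.',
    'This is auxina e-200 uicaps. blanda 200 mg.',
    'This is ephynalsol. iny. 100 mg.',
    'This is paracethamol 100 mg.',
]

cleaned = [clean(phrase) for phrase in phrases]
print(cleaned)
```

Cutting at the markers ".", "iny", "comp" and "uicaps" alone would not be enough for these samples, because "e-200" and "100 mg" come before any marker in two of the phrases.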

I have the following pandas DataFrame:
col1 col2 col3 col4
A 2021-03-28 01:40:00 1.381158 0.0
A 2021-03-28 01:50:00 0.480089 0.0
A 2021-03-28 03:00:00 0.000000 0.0
A 2021-03-28 03:00:00 0.111088 0.0
A 2021-03-28 03:10:00 0.000000 0.0
A 2021-03-28 03:10:00 0.000000 0.0
A 2021-03-28 03:10:00 0.151066 0.0
B 2021-03-28 03:10:00 1.231341 1.0
I need to merge the rows that have the same col1 and col2 values and keep a single value for col3.
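A minimal sketch of one way to do such a merge with groupby. Which aggregation to apply to col3 is an assumption here: the sample data is consistent with taking the maximum per group, so the sketch uses max. Taking the first col4 value per group is likewise an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "A", "A", "A", "A", "A", "B"],
    "col2": pd.to_datetime([
        "2021-03-28 01:40:00", "2021-03-28 01:50:00",
        "2021-03-28 03:00:00", "2021-03-28 03:00:00",
        "2021-03-28 03:10:00", "2021-03-28 03:10:00",
        "2021-03-28 03:10:00", "2021-03-28 03:10:00",
    ]),
    "col3": [1.381158, 0.480089, 0.000000, 0.111088,
             0.000000, 0.000000, 0.151066, 1.231341],
    "col4": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
})

# One row per (col1, col2) pair.
# Assumption: col3 keeps the group maximum, col4 keeps the first value.
merged = (
    df.groupby(["col1", "col2"], as_index=False)
      .agg(col3=("col3", "max"), col4=("col4", "first"))
)
print(merged)
```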
This is the expected output:
col1 col2 col3 col4
A 2021-03-28 01:40:00 1.381158 0.0
A 2021-03-28 01:50:00 0.480089 0.0
A 2021-03-28 03:00:00 0.111088 0.0
A 2021-03-28 03:10:00 0.151066 0.0
B 2021-03-28 03:10:00 …

I have the following DataFrame:
dt_datetime stage proc_val
2011-11-13 11:00 0 20
2011-11-13 11:10 0 21
2011-11-13 11:30 1 25
2011-11-13 11:40 2 22
2011-11-13 11:55 2 28
2011-11-13 12:00 2 29
I need to add a new column named stage_duration and get the following result:
dt_datetime stage proc_val stage_duration
2011-11-13 11:00 0 20 30
2011-11-13 11:10 0 21 30
2011-11-13 11:30 1 25 10
2011-11-13 11:40 2 22 20
2011-11-13 11:55 2 28 20
2011-11-13 12:00 2 29 20
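How can I do this?
This is my current code snippet, but it does not produce the expected result. It is supposed to calculate the duration between rows that have the same stage value and then get the cumulative duration for each stage, but it does not.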
df['stage_duration'] = df.groupby('stage')['dt_datetime'].diff().dt.total_seconds() / 60
df['stage_duration'] = df['stage_duration'].cumsum()
Update:
The solution should also work if the DataFrame contains multiple groups of stages; for example, see the stages starting at 2011-11-13 11:00 and …
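A minimal sketch that reproduces the expected stage_duration values for the sample data. The interpretation is an assumption inferred from the expected output: a stage lasts from its first timestamp until the first timestamp of the next stage, and the final stage ends at its own last timestamp. The sketch also assumes the rows are sorted by dt_datetime. The names run_id, run_start, run_last, next_start and run_end are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "dt_datetime": pd.to_datetime([
        "2011-11-13 11:00", "2011-11-13 11:10", "2011-11-13 11:30",
        "2011-11-13 11:40", "2011-11-13 11:55", "2011-11-13 12:00",
    ]),
    "stage": [0, 0, 1, 2, 2, 2],
    "proc_val": [20, 21, 25, 22, 28, 29],
})

# Label each run of consecutive equal stage values, so that a stage value
# that reappears later is treated as a separate run.
run_id = (df["stage"] != df["stage"].shift()).cumsum().rename("run_id")

by_run = df.groupby(run_id)["dt_datetime"]
run_start = by_run.transform("min")  # first timestamp of the run, per row
run_last = by_run.transform("max")   # last timestamp of the run, per row

# Assumed end of a run: the start of the next run, or the run's own last
# timestamp when there is no next run.
next_start = by_run.min().shift(-1)
run_end = run_id.map(next_start).fillna(run_last)

df["stage_duration"] = (
    (run_end - run_start).dt.total_seconds() / 60
).astype(int)
print(df)
```

Grouping on run_id instead of stage is what keeps repeated stage values apart; grouping directly on stage would merge all rows with the same stage value into one group.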
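
I trained a model on a cluster, downloaded it (pkl format), and tried to load it locally. I know that sklearn's joblib was used to save the model mymodel.pkl (but I don't know exactly which version...).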
from sklearn.externals import joblib
print(joblib.__version__)
model = joblib.load("mymodel.pkl")
Locally, I use version 0.13.0 of sklearn's joblib.
This is the error I get:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-100-d0a3c42e5c53> in <module>
3 print(joblib.__version__)
4
----> 5 model = joblib.load("mymodel.pkl")
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py in load(filename, mmap_mode)
596 return load_compatibility(fobj)
597
--> 598 obj = _unpickle(fobj, filename, mmap_mode)
599
600 return obj
~\AppData\Local\Continuum\anaconda3\lib\site-packages\sklearn\externals\joblib\numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
524 obj = None
525 try:
--> 526 obj = unpickler.load()
527 if unpickler.compat_mode:
528 warnings.warn("The file '%s' has …

I have the following list of strings:
list_of_str = ['Notification message', 'Warning message', 'This is the |xxx - show| message.', 'Notification message is defined by |xxx - show|', 'Notification message']
How can I get the string closest to the end of the list that contains show|, and replace show| with Placeholder|?
Expected result:
list_of_str = ['Notification message', 'Warning message', 'This is the |xxx - show| message.', 'Notification message is defined by |xxx - Placeholder|', 'Notification message']
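A minimal sketch of one way to do this: walk the list from the end, stop at the first string that contains the target, and replace it there. The function name replace_last_match and its parameter names are hypothetical, chosen only for this illustration.

```python
def replace_last_match(strings, target="show|", replacement="Placeholder|"):
    """Return a copy where the last string containing target is updated."""
    result = list(strings)  # copy, so the input list is left unchanged
    # Iterate indices from the tail towards the head.
    for i in range(len(result) - 1, -1, -1):
        if target in result[i]:
            result[i] = result[i].replace(target, replacement)
            break  # only the match closest to the tail is changed
    return result


list_of_str = [
    'Notification message',
    'Warning message',
    'This is the |xxx - show| message.',
    'Notification message is defined by |xxx - show|',
    'Notification message',
]

updated = replace_last_match(list_of_str)
print(updated)
```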

I have the following string:
'"{\\"values\\": [3.304000000004, 3.010000000002, 5.8220000000063]}"'
I need to convert it to JSON. If I do:
parsed = json.loads(data)
parsed["values"]
...then I get the following error:
TypeError: string indices must be integers
How can I fix this?
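A minimal sketch of what seems to be going on, assuming the string shown above is exactly the value of data: the text is JSON that was encoded twice, so the first json.loads call returns a str (the inner JSON document) rather than a dict, and indexing that str with "values" raises the TypeError. Decoding a second time yields the dict.

```python
import json

data = '"{\\"values\\": [3.304000000004, 3.010000000002, 5.8220000000063]}"'

# First pass: the outer layer is a JSON string, so the result is a str.
inner = json.loads(data)
print(type(inner))

# Second pass: the inner text is a JSON object, so the result is a dict.
parsed = json.loads(inner)
values = parsed["values"]
print(values)
```

If the producer of the string is under your control, encoding the data only once would avoid the need for the second pass.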