I am trying to find the overlapping rows in two Pandas DataFrames that have the same columns but different numbers of rows:
df1.shape
(187399, 784)
df2.shape
(9790, 784)
After running pd.merge():
common_cols = df1.columns.tolist()
df3 = pd.merge(df1, df2, on=common_cols, how="inner")
I get a result that is larger than both df1 and df2:
df3.shape
(283979, 784)
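How is this possible, and what am I doing wrong? I have two DataFrames, each with 784 columns named [0, 1, 2, 3, ..., 783] and a different number of rows. I only want the intersection of identical rows across these DataFrames. That is, if a row exists in both df1 and df2, it must go to df3.
In the previous step I removed duplicates from each DataFrame with pd.drop_duplicates().
Link to the Jupyter notebook with the code, after the heading "Problem 5": https://github.com/kuatroka/udacity_deep_learning/blob/master/1_notmnist-Copy1.ipynb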
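One possible explanation, offered as an assumption rather than a diagnosis of the linked notebook: an inner merge pairs every matching row on the left with every matching row on the right, so if duplicates are still present (for example because the result of DataFrame.drop_duplicates() was not assigned back), the output can be larger than either input. The minimal sketch below uses made-up toy data with two columns instead of 784; the names inflated, df1_unique, df2_unique and intersection are illustrative only.

```python
import pandas as pd

# Toy frames (hypothetical data) with identical columns.
# The row (1, 2) appears twice in df1 and three times in df2.
df1 = pd.DataFrame({0: [1, 1, 5], 1: [2, 2, 6]})
df2 = pd.DataFrame({0: [1, 1, 1, 7], 1: [2, 2, 2, 8]})

common_cols = df1.columns.tolist()

# Every copy in df1 is paired with every copy in df2: 2 * 3 = 6 rows.
inflated = pd.merge(df1, df2, on=common_cols, how="inner")

# drop_duplicates() returns a new DataFrame by default,
# so the result has to be assigned to a name to take effect.
df1_unique = df1.drop_duplicates()
df2_unique = df2.drop_duplicates()

# With unique rows on both sides, the merge yields the plain intersection.
intersection = pd.merge(df1_unique, df2_unique, on=common_cols, how="inner")

print(len(inflated), len(intersection))
```

Comparing the shapes of the frames before and after drop_duplicates() is a quick way to check whether the deduplication actually took effect.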
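
During the prediction phase of my Keras model, when I print the predicted values and classes, predict_proba() and predict() give different probabilities. In addition, the output of predict_classes() does not correspond to the probabilities. A code sample and an example printout follow:
Code: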
p = pd.read_csv("test.csv", header=None)
p = np.reshape(p.values, (50, seq_length))
for i in range(len(p)):
    p[i] = scaler.fit_transform(p[i])
p = np.reshape(p, (50, seq_length, 1))
model.predict(p, batch_size=50)
model.predict_classes(p, batch_size=50)
model.predict_on_batch(p)
model.predict_proba(p, batch_size=50)
for i in zip(model.predict_proba(p, batch_size=50), model.predict_classes(p, batch_size=50), model.predict(p, batch_size=50)):
    print("model.predict_proba", "--", i[0], "model.predict", "--", i[2], "predict_classes", "--", i[1])
Sample output in which predict_proba() and predict() differ:
model.predict_proba -- [ 0.18768159 0.81231844] model.predict -- [ 0.18982948 0.81017047] predict_classes -- 1
model.predict_proba -- [ 0.55918539 0.4408147 ] model.predict -- [ 0.78916383 0.2108362 ] predict_classes -- 1 …

In the code below, I don't understand why download_progress_hook works without being passed any arguments when it is called from the maybe_download method.
The definition of download_progress_hook states that three arguments must be passed: count, blockSize, totalSize. However, when download_progress_hook is called from maybe_download, no arguments are passed. Why doesn't it fail?
Here is the full code:
url = 'http://commondatastorage.googleapis.com/books1000/'
last_percent_reported = None
data_root = '.' # Change me to store data elsewhere
def download_progress_hook(count, blockSize, totalSize):
    """A hook to report the progress of a download. This is mostly intended for users with
    slow internet connections. Reports every 5% change in download progress.
    """
    global last_percent_reported
    percent = int(count * blockSize * 100 / totalSize)
    if last_percent_reported != percent:
        if percent …
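
The listing above is cut off before maybe_download, so the following is only a sketch under an assumption: that maybe_download hands the function object itself to a downloader as a callback (urllib.request.urlretrieve, for instance, accepts a reporthook argument and calls it with a block count, a block size and the total size). In that case maybe_download never calls the hook; the downloader does, and it supplies the three arguments. The function fake_retrieve below is a hypothetical stand-in for such a downloader, and the hook is simplified to return the percentage.

```python
def download_progress_hook(count, blockSize, totalSize):
    """Same signature as in the question; returns the percentage for illustration."""
    return int(count * blockSize * 100 / totalSize)


def fake_retrieve(reporthook):
    """Hypothetical downloader: it decides when to call the hook and with what."""
    total_size = 1000
    block_size = 250
    reported = []
    for count in range(1, 5):
        # The callee supplies all three arguments here.
        reported.append(reporthook(count, block_size, total_size))
    return reported


# No parentheses after download_progress_hook: the function is passed, not called.
percents = fake_retrieve(download_progress_hook)
print(percents)
```

Writing download_progress_hook() with parentheses at the call site would invoke it immediately with no arguments and raise a TypeError, which is the failure the question expected.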