Suppose I have two DataFrames like this:
import pandas as pd
left = pd.DataFrame({'key1': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key2': ['foo', 'bar'], 'rval': [4, 5]})
I want to merge them, so I tried something like this:
pd.merge(left, right, left_on='key1', right_on='key2')
And I'm happy with the result:
  key1  lval key2  rval
0  foo     1  foo     4
1  bar     2  bar     5
But I am trying to use the join method, which I had always assumed was very similar.
left.join(right, on=['key1', 'key2'])
And I get this:
//anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _validate_specification(self)
406 if self.right_index:
407 if not ((len(self.left_on) == self.right.index.nlevels)):
--> 408 raise AssertionError()
409 self.right_on = [None] * n
410 elif self.right_on is not None:
AssertionError:
What am I missing?
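For reference, the traceback above suggests the cause. join compares the caller's `on` columns with the index of the other frame, so passing two key names while `right` has a single-level index fails the length check on line 407. Here is a minimal sketch of the equivalent call, assuming the goal is the same row matching as the merge above; the names `joined` and `merged` are only illustrative:

```python
import pandas as pd

left = pd.DataFrame({'key1': ['foo', 'bar'], 'lval': [1, 2]})
right = pd.DataFrame({'key2': ['foo', 'bar'], 'rval': [4, 5]})

# DataFrame.join matches left's `on` column against the *index* of right,
# so key2 is moved into right's index before joining.
joined = left.join(right.set_index('key2'), on='key1')
print(joined)

# The merge call from the question, for comparison. It keeps key2 as a column,
# whereas the join above consumes key2 as the index.
merged = pd.merge(left, right, left_on='key1', right_on='key2')
print(merged)
```

The two results should pair the same rows. The join version just has no key2 column, because key2 was used as the index.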
I want to join two pandas DataFrames on the id field, which is a string UUID. I get a ValueError:
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
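The code is below. I tried converting the fields to strings, as suggested in "Trying to merge 2 dataframes but get ValueError", but the error persists. Note that pdf comes from a Spark dataframe.toPandas(), while outputsPdf was created from a dictionary.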
pdf.id = pdf.id.apply(str)
outputsPdf.id = outputsPdf.id.apply(str)
inOutPdf = pdf.join(outputsPdf, on='id', how='left', rsuffix='fs')
pdf.dtypes
id object
time float64
height float32
dtype: object
outputsPdf.dtypes
id object
labels float64
dtype: object
How can I debug this? Full traceback:
ValueError Traceback (most recent call last)
<ipython-input-13-deb429dde9ad> in <module>()
61 pdf['id'] = pdf['id'].astype(str)
62 outputsPdf['id'] = outputsPdf['id'].astype(str)
---> 63 inOutPdf = pdf.join(outputsPdf, on=['id'], how='left', rsuffix='fs')
64
65 # idSparkDf = spark.createDataFrame(idPandasDf, schema=StructType([StructField('id', StringType(), True),
~/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
6334 …
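One thing worth checking while debugging: join(on='id') compares pdf['id'] with the index of outputsPdf, not with its id column. A frame built from a dictionary normally has the default integer index, which would explain an object vs. int64 mismatch even after both id columns are cast to str. The sketch below uses made-up stand-in data (the id values and row counts are hypothetical, not taken from the question) and assumes outputsPdf has that default index:

```python
import pandas as pd

# Made-up stand-ins for the two frames in the question.
pdf = pd.DataFrame({'id': ['a1', 'b2', 'c3'], 'time': [0.1, 0.2, 0.3]})
outputsPdf = pd.DataFrame({'id': ['a1', 'b2'], 'labels': [1.0, 0.0]})

# What join(on='id') actually compares: a column on the left, the index on the right.
print(pdf['id'].dtype)         # string-like dtype
print(outputsPdf.index.dtype)  # integer dtype of the default RangeIndex

# Option 1: make 'id' the index of the right-hand frame before joining.
inOutPdf = pdf.join(outputsPdf.set_index('id'), on='id', how='left')

# Option 2: use merge, which matches column to column.
inOutPdf2 = pdf.merge(outputsPdf, on='id', how='left')
print(inOutPdf)
```

With either option the id column no longer overlaps between the two sides, so rsuffix should not be needed here.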