jea*_*elj 5 python merge type-conversion pandas
I am trying to merge two data frames, df1 and df2, on the column Customer_ID. Customer_ID appears to have the same data type (object) in both.
df1:
Customer_ID | Flag
12345 | A
df2:
Customer_ID | Transaction_Value
12345 | 258478
When I merge the two tables:
new_df = df2.merge(df1, on='Customer_ID', how='left')
it works for some Customer_ID values and not for others. For this example, I get the following result:
Customer_ID | Transaction_Value | Flag
12345 | 258478 | NaN
I checked the data types, and they are the same:
df1.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 873353 entries, 0 to 873352
Data columns (total 2 columns):
Customer_ID 873353 non-null object
Flag 873353 non-null object
dtypes: object(2)
memory usage: 20.0+ MB
df2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 873353 entries, 0 to 873352
Data columns (total 2 columns):
Customer_ID 873353 non-null object
Transaction_Value 873353 non-null int64
dtypes: int64(1), object(1)
memory usage: 20.0+ MB
When I loaded df1, I did get the following message:
C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\IPython\core\interactiveshell.py:2717: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
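The warning itself points at one way out: declare the column type at import time. Below is a minimal sketch, assuming the data comes from a CSV file and that Customer_ID is one of the affected columns. The inline `csv_text` is a made-up stand-in for the real file, whose path is not shown here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file; the sample rows are invented.
csv_text = "Customer_ID,Flag\n12345,A\n00678,B\n"

# Declaring the key column as str means pandas does not have to infer a
# type, so every ID is loaded as a string (leading zeros included).
df1 = pd.read_csv(io.StringIO(csv_text), dtype={"Customer_ID": str})

print(df1["Customer_ID"].tolist())
```

Reading the key as str in both files would give the merge identical key types on both sides.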
When I wanted to check whether a customer ID exists, I realized that I have to specify it differently in the two data frames:
df1.loc[df1['Customer_ID'] == 12345]
df2.loc[df2['Customer_ID'] == '12345']
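A quick way to confirm this kind of mismatch is to count the Python types actually stored in each column. This is only a sketch on toy frames (the data is assumed, built to mimic the symptom above), not the original data:

```python
import pandas as pd

# Toy frames: integer keys in df1, string keys in df2.
# Both columns still report dtype "object".
df1 = pd.DataFrame({
    "Customer_ID": pd.Series([12345, 678], dtype=object),
    "Flag": ["A", "B"],
})
df2 = pd.DataFrame({
    "Customer_ID": pd.Series(["12345", "678"], dtype=object),
    "Transaction_Value": [258478, 100],
})

# Count the element types hiding behind the object dtype.
print(df1["Customer_ID"].map(type).value_counts())
print(df2["Customer_ID"].map(type).value_counts())

# Collect the distinct element types per frame for a direct comparison.
types1 = set(df1["Customer_ID"].map(type))
types2 = set(df2["Customer_ID"].map(type))
```

If the two sets differ, or one column contains more than one type, the keys will not match reliably in a merge.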
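Customer_ID is dtype == object in both cases... but that does not mean the individual elements are all of the same type. An object column can hold arbitrary Python objects, and the integer 12345 is not equal to the string '12345', so those keys never match. You need both columns to be either str or int.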
Using int
dtype = dict(Customer_ID=int)
# cast the key to int in both frames, then left-merge on it
df1.astype(dtype).merge(df2.astype(dtype), how='left', on='Customer_ID')
Customer_ID Flag Transaction_Value
0 12345 A 258478
Using str
dtype = dict(Customer_ID=str)
# cast the key to str in both frames, then left-merge on it
df1.astype(dtype).merge(df2.astype(dtype), how='left', on='Customer_ID')
Customer_ID Flag Transaction_Value
0 12345 A 258478
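To check that the cast actually fixed the join, one option is merge's `indicator=True`, which adds a `_merge` column showing where each row came from. A minimal sketch on assumed toy data (the ID 999 is invented so that one row stays unmatched):

```python
import pandas as pd

# Toy frames with mismatched key types, as in the question.
df1 = pd.DataFrame({
    "Customer_ID": pd.Series([12345, 678], dtype=object),
    "Flag": ["A", "B"],
})
df2 = pd.DataFrame({
    "Customer_ID": pd.Series(["12345", "999"], dtype=object),
    "Transaction_Value": [258478, 100],
})

# Normalize the key to str on both sides before merging.
key = {"Customer_ID": str}
checked = df2.astype(key).merge(
    df1.astype(key), on="Customer_ID", how="left", indicator=True
)

# Rows marked "left_only" found no partner in df1.
unmatched = checked[checked["_merge"] == "left_only"]
print(checked)
```

Note that casting to str only helps if the values print the same way on both sides: a float 12345.0 becomes '12345.0', which still would not match '12345'.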