Alp*_*001 5 python dataframe pandas
我已经开始学习大熊猫了,偶然发现了以下问题:
以下是一个包含以下数据的表:
书:
B_IDX B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID
1 ABC aaa 12.21 SCI 182
2 BCD bbb 98 ECN 920
3 CDE ccc 22.34 SCI 228
4 DEF ddd 44.11 LIT 761
5 EFG eee 0.99 MAT 10242
6 FGH fff 4.99 MAT 77721
Run Code Online (Sandbox Code Playgroud)
UCODE:
U_ID U_CD
182 9982825
950 9992822
228 9999983
776 9912876
332 9003931
Run Code Online (Sandbox Code Playgroud)
要求是使用if..else逻辑从上面提到的表中提取数据.
请求:
if B_UTYPE == 'SCI':
pull the record from 'UCODE'
elif B_UTYPE == 'MAT':
split the B_ID in 4 and 1 digits i.e. B_UTYPE.split[:2] and B_UTYPE.split[3:5]
else:
keep the data as it is.
Run Code Online (Sandbox Code Playgroud)
O/P例外:
B_ID B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID U_ID U_CD N_COL1 N_COL2
1 ABC aaa 12.21 SCI 182 182 9982825 NA NA
2 BCD bbb 98 ECN 920 NA NA NA NA
3 CDE ccc 22.34 SCI 228 228 9999983 NA NA
4 DEF ddd 44.11 LIT 761 NA NA NA NA
5 EFG eee 0.99 MAT 10242 NA NA 102 42
6 FGH fff 4.99 MAT 77721 NA NA 777 21
Run Code Online (Sandbox Code Playgroud)
任何帮助/教程,我可以通过满足上述条件获得一些见解以达到预期的输出?
为了便于阅读,请分别构建每个结果,然后将各个部分连接在一起.
u_id = df.B_ID.astype(str).where(df.B_UTYPE.eq('SCI'))
u_cd = df.B_ID.map(ucode.set_index('U_ID').U_CD.astype(str))
ncol = (df.B_ID.astype(str)
.str.extract(r'(\d{3})(\d+)')
.where(df.B_UTYPE.eq('MAT'))
.rename(columns=lambda x: f'N_COL{x+1}'))
df = pd.concat([df, u_id, u_cd, ncol], axis=1)
print(df)
B_IDX B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID B_ID B_ID N_COL1 N_COL2
0 1 ABC aaa 12.21 SCI 182 182 9982825 NaN NaN
1 2 BCD bbb 98.00 ECN 920 NaN NaN NaN NaN
2 3 CDE ccc 22.34 SCI 228 228 9999983 NaN NaN
3 4 DEF ddd 44.11 LIT 761 NaN NaN NaN NaN
4 5 EFG eee 0.99 MAT 10242 NaN NaN 102 42
5 6 FGH fff 4.99 MAT 77721 NaN NaN 777 21
Run Code Online (Sandbox Code Playgroud)
这是一个两步的方法.首先,您需要确定哪些行匹配哪个条件.然后,一旦有了条件和输出,就可以使用蒙版并将assign系列添加到DataFrame中.
c1 = book.B_UTYPE.eq("SCI")
c2 = book.B_UTYPE.eq("MAT")
s1 = book.B_ID.map(ucode.set_index('U_ID').U_CD)
s2 = book.B_ID.astype(str)
Run Code Online (Sandbox Code Playgroud)
现在是有趣的部分:
parts = {
'U_ID': book.B_ID.mask(~c1),
'U_CD': pd.Series(s1).mask(~c1),
'N_COL1': s2.str[:3].mask(~c2),
'N_COL2': s2.str[3:].mask(~c2)
}
book.assign(**parts)
Run Code Online (Sandbox Code Playgroud)
ID B_NAME B_AUTHOR B_PRICE B_UTYPE B_ID U_ID U_CD N_COL1 N_COL2
0 1 ABC aaa 12.21 SCI 182 182.0 9982825.0 NaN NaN
1 2 BCD bbb 98.00 ECN 920 NaN NaN NaN NaN
2 3 CDE ccc 22.34 SCI 228 228.0 9999983.0 NaN NaN
3 4 DEF ddd 44.11 LIT 761 NaN NaN NaN NaN
4 5 EFG eee 0.99 MAT 10242 NaN NaN 102 42
5 6 FGH fff 4.99 MAT 77721 NaN NaN 777 21
Run Code Online (Sandbox Code Playgroud)
设置,以便您可以重现:
book = pd.DataFrame({'ID': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6},
'B_NAME': {0: 'ABC', 1: 'BCD', 2: 'CDE', 3: 'DEF', 4: 'EFG', 5: 'FGH'},
'B_AUTHOR': {0: 'aaa', 1: 'bbb', 2: 'ccc', 3: 'ddd', 4: 'eee', 5: 'fff'},
'B_PRICE': {0: 12.21, 1: 98.0, 2: 22.34, 3: 44.11, 4: 0.99, 5: 4.99},
'B_UTYPE': {0: 'SCI', 1: 'ECN', 2: 'SCI', 3: 'LIT', 4: 'MAT', 5: 'MAT'},
'B_ID': {0: 182, 1: 920, 2: 228, 3: 761, 4: 10242, 5: 77721}})
ucode = pd.DataFrame({'U_ID': {0: 182, 1: 950, 2: 228, 3: 776, 4: 332},
'U_CD': {0: 9982825, 1: 9992822, 2: 9999983, 3: 9912876, 4: 9003931}})
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
111 次 |
| 最近记录: |