the*_*wla 38 python dataframe pandas
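I have a pandas DataFrame "df". It has multiple columns, and I need to take a substring of one of them. Let's say the column name is "col". I can run a "for" loop like the one below and substring the column: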
for i in range(0,len(df)):
    df.iloc[i].col = df.iloc[i].col[:9]
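But I would like to know whether there is an option that avoids the "for" loop and uses an attribute directly. I have a huge amount of data, and processing it this way would take a long time.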
ayh*_*han 73
Use str.slice:
df.col = df.col.str.slice(0, 9)
You can also use [] on the str accessor, which slices under the hood:
df.col = df.col.str[:9]
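As a quick check that the two spellings agree, here is a minimal sketch on a made-up Series (the sample values are assumptions, not the OP's data). Strings shorter than the slice come back unchanged, and missing values stay missing.

```python
import pandas as pd

# Hypothetical sample data; the OP's actual column is not shown.
s = pd.Series(["2020-12-08T10:00", "short", None])

a = s.str.slice(0, 9)  # explicit method call
b = s.str[:9]          # indexing syntax on the str accessor

print(a.tolist())
print(a.equals(b))     # both spellings give the same Series
```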
小智 16
If the column is not a string, use astype to convert it:
df['col'] = df['col'].astype(str).str[:9]
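For illustration, a minimal sketch of the conversion on a made-up integer column (the values are assumptions). The str accessor is only available on string-like data, which is why the astype(str) step comes first.

```python
import pandas as pd

# Hypothetical integer column; the values are made up for illustration.
df = pd.DataFrame({"col": [123456789012, 42]})

# Convert to strings first, then keep at most the first 9 characters.
df["col"] = df["col"].astype(str).str[:9]

print(df["col"].tolist())  # ['123456789', '42']
```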
Gon*_*ica 12
Since the OP's dataframe is not known exactly, one can create one to use as a test.
import pandas as pd
df = pd.DataFrame({'col': {0: '2020-12-08', 1: '2020-12-08', 2: '2020-12-08', 3: '2020-12-08', 4: '2020-12-08', 5: '2020-12-08', 6: '2020-12-08', 7: '2020-12-08', 8: '2020-12-08', 9: '2020-12-08'}})
[Out]:
col
0 2020-12-08
1 2020-12-08
2 2020-12-08
3 2020-12-08
4 2020-12-08
5 2020-12-08
6 2020-12-08
7 2020-12-08
8 2020-12-08
9 2020-12-08
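Assuming one wants to store the result in the same dataframe df, in a column named col_substring, and keep only the first 4 characters, there are several options.
Option 1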
df['col_substring'] = df['col'].str[:4]
[Out]:
col col_substring
0 2020-12-08 2020
1 2020-12-08 2020
2 2020-12-08 2020
3 2020-12-08 2020
4 2020-12-08 2020
5 2020-12-08 2020
6 2020-12-08 2020
7 2020-12-08 2020
8 2020-12-08 2020
9 2020-12-08 2020
Option 2
df['col_substring'] = df['col'].str.slice(0, 4)
[Out]:
col col_substring
0 2020-12-08 2020
1 2020-12-08 2020
2 2020-12-08 2020
3 2020-12-08 2020
4 2020-12-08 2020
5 2020-12-08 2020
6 2020-12-08 2020
7 2020-12-08 2020
8 2020-12-08 2020
9 2020-12-08 2020
Or like this
df['col_substring'] = df['col'].str.slice(stop=4)
Option 3
Using a custom lambda function
df['col_substring'] = df['col'].apply(lambda x: x[:4])
[Out]:
col col_substring
0 2020-12-08 2020
1 2020-12-08 2020
2 2020-12-08 2020
3 2020-12-08 2020
4 2020-12-08 2020
5 2020-12-08 2020
6 2020-12-08 2020
7 2020-12-08 2020
8 2020-12-08 2020
9 2020-12-08 2020
Option 4
Using a custom lambda function with a regular expression (with re)
import re
df['col_substring'] = df['col'].apply(lambda x: re.findall(r'^.{4}', x)[0])
[Out]:
col col_substring
0 2020-12-08 2020
1 2020-12-08 2020
2 2020-12-08 2020
3 2020-12-08 2020
4 2020-12-08 2020
5 2020-12-08 2020
6 2020-12-08 2020
7 2020-12-08 2020
8 2020-12-08 2020
9 2020-12-08 2020
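One caveat with Option 4: re.findall(...)[0] raises an IndexError when a value has fewer than 4 characters, because the pattern then finds no match. A minimal sketch of a more forgiving variant follows; the helper name first_n and the sample values are assumptions made for illustration.

```python
import re

import pandas as pd

# Made-up sample with one value shorter than 4 characters.
df = pd.DataFrame({"col": ["2020-12-08", "abc"]})


def first_n(value, n=4):
    """Return up to the first n characters (hypothetical helper)."""
    # {0,n} lets the pattern match shorter strings too, so a match always exists.
    match = re.match(r".{0,%d}" % n, value)
    return match.group(0)


df["col_substring"] = df["col"].apply(first_n)
print(df["col_substring"].tolist())  # ['2020', 'abc']
```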
Option 5
import numpy as np
df['col_substring'] = np.vectorize(lambda x: x[:4])(df['col'])
[Out]:
col col_substring
0 2020-12-08 2020
1 2020-12-08 2020
2 2020-12-08 2020
3 2020-12-08 2020
4 2020-12-08 2020
5 2020-12-08 2020
6 2020-12-08 2020
7 2020-12-08 2020
8 2020-12-08 2020
9 2020-12-08 2020
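The options above produce the same result but may differ in speed. A rough way to compare them on your own machine is sketched below with timeit; the row count and repeat count are arbitrary assumptions, and the numbers will vary with pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Made-up test column with 100,000 rows.
df = pd.DataFrame({"col": ["2020-12-08"] * 100_000})

candidates = {
    "str[:4]": lambda: df["col"].str[:4],
    "str.slice": lambda: df["col"].str.slice(0, 4),
    "apply": lambda: df["col"].apply(lambda x: x[:4]),
    "np.vectorize": lambda: np.vectorize(lambda x: x[:4])(df["col"]),
}

# Check that every candidate returns the same values before timing them.
results = {name: list(func()) for name, func in candidates.items()}

# Total wall-clock time for 3 runs of each candidate.
timings = {name: timeit.timeit(func, number=3) for name, func in candidates.items()}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s for 3 runs")
```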
Notes: