Pandas: split a column into str and int columns

Question

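I am currently trying to split one column of a pandas DataFrame into two columns, one holding an int and the other a string. As I understand it, a column can be split in two with the following code (where A is the column to split into an integer column and a string column):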

df[['integer','string']] = df['A'].str.split(" ",expand=True,)

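However, the problem with my dataset is that there is no space or "-" between the integer and the string to act as a delimiter. An example of my DataFrame: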

A     B 
3     abc
629S  def
84S   ghi  
S72   jkl

As you can see, not every row has a letter, and the int does not necessarily come before the letter. My expected output is:

integer      string      B
3            NaN         abc
629          S           def
84           S           ghi
72           S           jkl

Thanks a lot for your help! I really appreciate it :)

Answer 1

Dat*_*ice 3

IIUC, you need str.extract. Your use case looks simple, so we can make use of \D+ and \d+:

\D+ matches one or more characters that are not digits (equivalent to [^0-9]+ for ASCII input)

\d+ matches one or more digits (equivalent to [0-9]+ for ASCII input)
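
To see what the two patterns pick out of each value, here is a minimal sketch using Python's re module on the sample values from the question. The helper name split_value is hypothetical and only for illustration. Like str.extract, re.search returns the first match only, which is why the position of the letter does not matter.

```python
import re

def split_value(value):
    """Return (first run of digits, first run of non-digits); None when absent."""
    digits = re.search(r"\d+", value)   # first run of one or more digits
    letters = re.search(r"\D+", value)  # first run of one or more non-digits
    return (
        digits.group() if digits else None,
        letters.group() if letters else None,
    )

# Sample values copied from column A in the question.
for value in ["3", "629S", "84S", "S72"]:
    print(value, split_value(value))
```

Note that a value such as "1a2b" would yield only the first run of each kind, so this approach assumes at most one digit run and one letter run per value.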

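# Letters: first run of non-digit characters (NaN when a row has none)
df['String'] = df['A'].str.extract(r'(\D+)')

# Digits: first run of digits, cast to int (requires digits in every row)
df['A'] = df['A'].str.extract(r'(\d+)').astype(int)

print(df.rename(columns={'A' : 'Integer'}))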


   Integer    B String
0        3  abc    NaN
1      629  def      S
2       84  ghi      S
3       72  jkl      S

print(df.dtypes)

Integer     int32
B          object
String     object
dtype: object
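
For a version that can be run on its own and that produces the column names and order asked for in the question (integer, string, B), a sketch along these lines may help. The DataFrame is rebuilt by hand from the question's sample, and the variable names digits, letters and result are assumptions made for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": ["3", "629S", "84S", "S72"],
    "B": ["abc", "def", "ghi", "jkl"],
})

# With a single capture group, expand=False returns a Series instead of a DataFrame.
digits = df["A"].str.extract(r"(\d+)", expand=False)
letters = df["A"].str.extract(r"(\D+)", expand=False)

result = pd.DataFrame({
    "integer": digits.astype(int),  # safe here because every row contains digits
    "string": letters,              # missing where the value has no letters
    "B": df["B"],
})
print(result)
```

Building a new frame leaves the original column A untouched, which can be convenient while checking the extraction.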

If your column ends up with NaN (for example, a row that contains no digits), cast to float instead of int:

d = """A     B 
3     abc
629S  def
84S   ghi  
Sss   jkl"""

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(d), sep=r'\s+')

# int cannot hold NaN, so cast to float
df['A'] = df['A'].str.extract(r'(\d+)').astype(float)

print(df)

       A    B
0    3.0  abc
1  629.0  def
2   84.0  ghi
3    NaN  jkl

Or:

df['A'] = pd.to_numeric(df['A'].str.extract(r'(\d+)')[0], errors='coerce')
print(df)

       A    B
0    3.0  abc
1  629.0  def
2   84.0  ghi
3    NaN  jkl
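
Both variants above produce a float column, because a plain int column cannot hold NaN. If whole numbers are preferred, one option not mentioned in the original answer is pandas' nullable integer dtype "Int64" (capital I). The following is a sketch under that assumption, with the sample frame rebuilt by hand:

```python
import pandas as pd

# Same sample as above: the last row of A contains no digits.
df = pd.DataFrame({
    "A": ["3", "629S", "84S", "Sss"],
    "B": ["abc", "def", "ghi", "jkl"],
})

# Extract the first run of digits; rows without digits become missing.
digits = df["A"].str.extract(r"(\d+)", expand=False)

# Convert to numbers, then to the nullable integer dtype so the
# missing row is stored as <NA> rather than forcing the column to float.
df["A"] = pd.to_numeric(digits, errors="coerce").astype("Int64")

print(df)
print(df.dtypes)
```

The missing entry is then displayed as <NA> and the remaining values stay integers.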

