Split multiple columns in a pandas DataFrame by a delimiter

Question

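I have some annoying survey data in which multiple-choice questions are returned as shown below. It is an Excel sheet with about 60 columns, and responses range from a single answer to several answers separated by /. This is what I have so far. Is there a faster way to do this without handling each column individually?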

import pandas as pd

data = {'q1': ['one', 'two', 'three'],
        'q2': ['one/two/three', 'a/b/c', 'd/e/f'],
        'q3': ['a/b/c', 'd/e/f', 'g/h/i']}

df = pd.DataFrame(data)

# Split each multi-answer column on '/' into one column per answer
df[['q2a', 'q2b', 'q2c']] = df['q2'].str.split('/', expand=True)
df[['q3a', 'q3b', 'q3c']] = df['q3'].str.split('/', expand=True)

# Drop the original, unsplit columns
clean_df = df.drop(columns=['q2', 'q3'])

Answer 1

Erf*_*fan 6

We can use a list comprehension with add_prefix, then use pd.concat to join everything into your final df:

splits = [df[col].str.split(pat='/', expand=True).add_prefix(col) for col in df.columns]
clean_df = pd.concat(splits, axis=1)

     q10  q20  q21    q22 q30 q31 q32
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
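
Names such as q10 and q21 can be hard to read, since the question number and the part index run together. As a minimal sketch of the same list-comprehension approach, the version below adds an underscore to the prefix; the underscore and the names `data` and `clean_df` are illustrative choices here, not part of the original answer. It assumes every column holds strings, because the .str accessor raises an error on non-string columns.

```python
import pandas as pd

# Sample data copied from the question
data = {'q1': ['one', 'two', 'three'],
        'q2': ['one/two/three', 'a/b/c', 'd/e/f'],
        'q3': ['a/b/c', 'd/e/f', 'g/h/i']}
df = pd.DataFrame(data)

# Split every column on '/', prefixing each piece with "<column>_"
# (the underscore is an assumption added for readability)
splits = [df[col].str.split(pat='/', expand=True).add_prefix(col + '_')
          for col in df.columns]
clean_df = pd.concat(splits, axis=1)

print(clean_df.columns.tolist())
```

With this change the columns would be named q1_0, q2_0, q2_1, and so on, which keeps the question number and the part index visually separate.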

If you really want your column names suffixed with letters, you can do the following with string.ascii_lowercase:

from string import ascii_lowercase

dfs = []
for col in df.columns:
    # One column per '/'-separated answer
    d = df[col].str.split('/', expand=True)
    c = d.shape[1]
    # Name the pieces col + 'a', col + 'b', ...
    d.columns = [col + letter for letter in ascii_lowercase[:c]]
    dfs.append(d)

clean_df = pd.concat(dfs, axis=1)

     q1a  q2a  q2b    q2c q3a q3b q3c
0    one  one  two  three   a   b   c
1    two    a    b      c   d   e   f
2  three    d    e      f   g   h   i
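
Real survey exports often have rows with differing numbers of answers, and single-answer columns such as q1 may be better left with their original name. The sketch below wraps the loop above in a helper to illustrate both cases. The function name `split_columns`, its `sep` parameter, and the `survey` sample frame are hypothetical, and the code again assumes that every column holds strings.

```python
from string import ascii_lowercase

import pandas as pd


def split_columns(df, sep='/'):
    """Split each column on `sep`; single-answer columns keep their name."""
    parts = []
    for col in df.columns:
        pieces = df[col].str.split(sep, expand=True)
        n_parts = pieces.shape[1]
        if n_parts == 1:
            # No delimiter anywhere in this column: keep it unchanged
            parts.append(df[[col]])
            continue
        if n_parts > len(ascii_lowercase):
            raise ValueError(f"column {col!r} has more than 26 parts")
        # Name the pieces col + 'a', col + 'b', ...
        pieces.columns = [col + letter for letter in ascii_lowercase[:n_parts]]
        parts.append(pieces)
    return pd.concat(parts, axis=1)


# Hypothetical ragged data: q2 has 3, 2, and 1 answers per row
survey = pd.DataFrame({'q1': ['one', 'two', 'three'],
                       'q2': ['one/two/three', 'a/b', 'd']})
result = split_columns(survey)
print(result)
```

Rows with fewer answers than the widest row are padded with missing values on the right, so downstream code would need to handle those with isna or fillna.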
