Uma*_*tar 5 python data-analysis pandas
I want to use Python to split the data contained in a cell into multiple rows. An example is given below.
This is my data:
fuel cert_region veh_class air_pollution city_mpg hwy_mpg cmb_mpg smartway
ethanol/gas FC SUV 6/8 9/14 15/20 1/16 yes
ethanol/gas FC SUV 6/3 1/14 14/19 10/16 no
I want to convert it into this form:
fuel cert_region veh_class air_pollution city_mpg hwy_mpg cmb_mpg smartway
ethanol FC SUV 6 9 15 1 yes
gas FC SUV 8 14 20 16 yes
ethanol FC SUV 6 1 14 10 no
gas FC SUV 3 14 19 16 no
The following code returns an error:
import numpy as np
from itertools import chain

# return list from series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split('/')))

# calculate lengths of splits
lens = df_08['fuel'].str.split('/').map(len)

# create new dataframe, repeating or chaining as appropriate
res = pd.DataFrame({
    'cert_region': np.repeat(df_08['cert_region'], lens),
    'veh_class': np.repeat(df_08['veh_class'], lens),
    'smartway': np.repeat(df_08['smartway'], lens),
    'fuel': chainer(df_08['fuel']),
    'air_pollution': chainer(df_08['air_pollution']),
    'city_mpg': chainer(df_08['city_mpg']),
    'hwy_mpg': chainer(df_08['hwy_mpg']),
    'cmb_mpg': chainer(df_08['cmb_mpg'])})
It gives me this error:
TypeError Traceback (most recent call last)
<ipython-input-31-916fed75eee2> in <module>()
20 'fuel': chainer(df_08['fuel']),
21 'air_pollution_score': chainer(df_08['air_pollution_score']),
---> 22 'city_mpg': chainer(df_08['city_mpg']),
23 'hwy_mpg': chainer(df_08['hwy_mpg']),
24 'cmb_mpg': chainer(df_08['cmb_mpg']),
<ipython-input-31-916fed75eee2> in chainer(s)
4 # return list from series of comma-separated strings
5 def chainer(s):
----> 6 return list(chain.from_iterable(s.str.split('/')))
7
8 # calculate lengths of splits
TypeError: 'float' object is not iterable
But city_mpg has the object dtype:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2404 entries, 0 to 2403
Data columns (total 14 columns):
fuel 2404 non-null object
cert_region 2404 non-null object
veh_class 2404 non-null object
air_pollution 2404 non-null object
city_mpg 2205 non-null object
hwy_mpg 2205 non-null object
cmb_mpg 2205 non-null object
smartway 2404 non-null object
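One likely reading of this output: city_mpg, hwy_mpg and cmb_mpg report 2205 non-null values out of 2404 entries, so some rows are missing. An object column can hold float NaN next to strings, and `.str.split` leaves those NaN entries untouched, which would explain the `'float' object is not iterable` error. The sketch below reproduces that on a small made-up Series (the values are hypothetical, not taken from the real file):

```python
import numpy as np
import pandas as pd
from itertools import chain

# Hypothetical sample: an object column with one missing value
s = pd.Series(['9/14', np.nan, '1/14'], dtype=object)

# .str.split turns strings into lists but leaves NaN as a float
parts = s.str.split('/')
nan_types = [type(p).__name__ for p in parts if not isinstance(p, list)]

# Chaining over the split result hits the float NaN and fails
try:
    list(chain.from_iterable(parts))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Number of rows that would trigger the error
n_missing = int(s.isna().sum())
print(nan_types, error_message, n_missing)
```

If this is the cause, the missing rows have to be dropped, filled, or handled separately before chaining.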
I think you would be better off building a new DataFrame:
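import pandas as pd

# start from an empty frame with the same columns as the source
result = pd.DataFrame(columns=df_08.columns)
for index, series in df_08.iterrows():
    temp1 = {}
    temp2 = {}
    has_split = False
    for key, value in dict(series).items():
        # only strings can contain '/'; NaN (a float) is copied unchanged
        if isinstance(value, str) and '/' in value:
            val1, val2 = value.split('/')
            temp1[key] = [val1]
            temp2[key] = [val2]
            has_split = True
        else:
            temp1[key] = temp2[key] = [value]
    # add the second row only when something was actually split
    frames = [result, pd.DataFrame(data=temp1)]
    if has_split:
        frames.append(pd.DataFrame(data=temp2))
    result = pd.concat(frames, axis=0, ignore_index=True)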
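As an alternative to building the frame row by row, pandas 1.3 and later can explode several list columns in one call. The sketch below is not part of the answer above; the sample frame, including the third row with missing mpg values, is made up for illustration, and it assumes every split column in a row yields the same number of parts:

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's data, plus one row with NaN
df_08 = pd.DataFrame({
    'fuel': ['ethanol/gas', 'ethanol/gas', 'Gasoline'],
    'cert_region': ['FC', 'FC', 'CA'],
    'veh_class': ['SUV', 'SUV', 'small car'],
    'air_pollution': ['6/8', '6/3', '7'],
    'city_mpg': ['9/14', '1/14', np.nan],
    'hwy_mpg': ['15/20', '14/19', np.nan],
    'cmb_mpg': ['1/16', '10/16', np.nan],
    'smartway': ['yes', 'no', 'no'],
})

split_cols = ['fuel', 'air_pollution', 'city_mpg', 'hwy_mpg', 'cmb_mpg']

out = df_08.copy()
for col in split_cols:
    # strings become lists of parts; NaN stays NaN
    out[col] = out[col].str.split('/')

# multi-column explode needs pandas >= 1.3; a scalar NaN counts as one element
out = out.explode(split_cols, ignore_index=True)
print(out)
```

The exploded values are still strings, so a numeric conversion (for example with `pd.to_numeric`) would be a separate step. A row whose split columns disagree in length, such as a two-part fuel with a missing mpg value, raises a `ValueError` and would need to be handled before exploding.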