How to efficiently iterate over date columns in pandas

Question

I have a large dataset whose column labels are dates. To illustrate my problem, I'm building a similar dataset as follows:

import pandas as pd

Cities = ['San Francisco', 'Los Angeles', 'New York', 'Huston', 'Chicago']
Jan = [10, 20, 15, 10, 35]
Feb = [12, 23, 17, 15, 41]
Mar = [15, 29, 21, 21, 53]
Apr = [27, 48, 56, 49, 73]

data = pd.DataFrame({'City': Cities, '01/01/20': Jan, '02/01/20': Feb, '03/01/20': Mar, '04/01/20': Apr})

print(data)

            City  01/01/20  02/01/20  03/01/20  04/01/20
0  San Francisco        10        12        15        27
1    Los Angeles        20        23        29        48
2       New York        15        17        21        56
3         Huston        10        15        21        49
4        Chicago        35        41        53        73


I want to plot each city's data as a function of time. Here is my attempt:

import matplotlib.pyplot as plt 

cols = data.columns 

dates = data.loc[:, cols[1:]].columns

San_Francisco = []
Los_Angeles = []
New_York = []
Huston = []
Chicago = []

for i in dates:
    San_Francisco.append(data[data['City'] == 'San Francisco'][i].sum())
    Los_Angeles.append(data[data['City'] == 'Los Angeles'][i].sum())
    New_York.append(data[data['City'] == 'New York'][i].sum())
    Huston.append(data[data['City'] == 'Huston'][i].sum())
    Chicago.append(data[data['City'] == 'Chicago'][i].sum())
    
plt.plot(dates, San_Francisco, label='San Francisco')
plt.plot(dates, Los_Angeles, label='Los Angeles')
plt.plot(dates, New_York, label='New York')
plt.plot(dates, Huston, label='Huston')
plt.plot(dates, Chicago, label='Chicago')
plt.legend()


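The result is what I want, but my approach is not efficient for a large dataset. How can I speed it up? Also, for the plotting part I have a long list of cities, and hard-coding every name by hand is tedious; is there a better way?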

Thanks

Answer 1

jez*_*ael 5

If some values in City may be duplicated, first aggregate them with GroupBy.sum, then transpose with DataFrame.T, and finally plot with DataFrame.plot:

data.groupby('City').sum().T.plot()

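A minimal sketch of the same chain split into its individual steps, printing the intermediate table instead of plotting it. The duplicate Chicago row and the trimmed set of date columns are hypothetical, added only to show what the aggregation step does:

```python
import pandas as pd

# Same layout as the question's data, plus a hypothetical duplicate 'Chicago'
# row to show why the aggregation step matters.
data = pd.DataFrame({
    'City': ['San Francisco', 'Los Angeles', 'Chicago', 'Chicago'],
    '01/01/20': [10, 20, 35, 5],
    '02/01/20': [12, 23, 41, 4],
})

# Step 1: collapse duplicate cities by summing their rows (one row per city).
per_city = data.groupby('City').sum()

# Step 2: transpose so the dates become the index and each city becomes a column.
wide = per_city.T

print(wide)
# Calling wide.plot() now draws one line per column (city); no names are hard-coded.
```

Because every city ends up as its own column, a single vectorized groupby replaces the per-date, per-city boolean filtering in the original loop.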

If the City column always has unique values, you can use DataFrame.set_index instead:

data.set_index("City").T.plot()

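The column labels in the question are plain strings, so the x-axis is spaced by position rather than by time. A possible extension, not part of the original answer, is to parse the transposed index with pd.to_datetime. This sketch assumes the labels are month/day/two-digit-year strings (format '%m/%d/%y'), and uses the non-interactive 'Agg' backend only so it runs headless; drop that line in a notebook:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import pandas as pd

data = pd.DataFrame({
    'City': ['San Francisco', 'Los Angeles'],
    '01/01/20': [10, 20],
    '02/01/20': [12, 23],
    '03/01/20': [15, 29],
})

# Cities are unique here, so set_index + transpose gives one column per city.
wide = data.set_index('City').T

# Assumption: labels are 'MM/DD/YY'. Parsing them yields a DatetimeIndex,
# so the plot's x-axis becomes a real time axis.
wide.index = pd.to_datetime(wide.index, format='%m/%d/%y')

ax = wide.plot()  # one line per city, labelled from the column names
```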

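EDIT: if there are too many cities for one readable plot, split the city columns into groups of N and draw one figure per group: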

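import numpy as np

df = data.groupby('City').sum().T

N = 10  # maximum number of cities (lines) per figure
# groupby(..., axis=1) is deprecated in recent pandas, so group the rows of df.T instead
for _, chunk in df.T.groupby(np.arange(len(df.columns)) // N):
    chunk.T.plot()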

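An equivalent way to get "N cities per figure", swapped in here for illustration, is positional slicing with DataFrame.iloc instead of grouping. The 25 city names and the numeric values below are hypothetical placeholders for a long city list; 'Agg' is used only so the sketch runs headless:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for the sketch
import numpy as np
import pandas as pd

# Hypothetical wide table: 4 dates as the index, 25 made-up city columns.
cities = [f'City{i}' for i in range(25)]
df = pd.DataFrame(np.arange(4 * 25).reshape(4, 25),
                  index=['01/01/20', '02/01/20', '03/01/20', '04/01/20'],
                  columns=cities)

N = 10  # at most N lines per figure
axes = []
for start in range(0, len(df.columns), N):
    # iloc takes the next block of up to N city columns; each .plot() call
    # without an ax argument opens a new figure.
    axes.append(df.iloc[:, start:start + N].plot())
```

With 25 cities and N = 10 this produces three figures holding 10, 10 and 5 lines.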

Archived:	5 years, 5 months ago
Views:	87
Last activity:	5 years, 5 months ago