Get the keys from a MultiIndex?

cjm*_*671 4 levels multi-index dataframe python-3.x pandas

I have a MultiIndex of IDs and dates, of the form:

```
MultiIndex(levels=[[196003, 196005, 196007, 196009, 196012, 196103, 196105, 196107, 196109, 196112, 196203, 196205, 196207, 196209, 196212, 196303, 196305, 196307, 196309, 196312, 196403, 196405, 196407, 196409, 196412, 201705, 201707, 201709, 201712, 201803, 201805, 201807, 201809, 201812], ['1959-07-01', '1959-07-02', '1959-07-06', '1959-07-07', '1959-07-08', '1959-07-09', '1959-07-10', '1959-07-13', '1959-07-14', '1959-07-15', '1959-07-16', '1959-07-17', '1959-07-20', '1959-07-21', '1959-07-22', '1959-07-23', ...]])
```

Both the ID and the date are needed to uniquely identify a row.

What I want to do is extract the first level of the index.

When I do `df.index[0]`, I get a tuple of the form `(196003, '1959-07-01')`.

What I want is a list of the level-0 keys, in the form `[196003, 196005, ...]`.

I managed to get it with:

```python
list(df[~df['ID'].duplicated()]['ID'].sort_values().reset_index()['ID'])
```

but I think this is a messy and slow solution.
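The one-liner above assumes `ID` is an ordinary column (it indexes `df['ID']`). Under that assumption, a minimal sketch for comparison: the hypothetical frame below stands in for the real data, and the point is only that the same sorted, de-duplicated list can be built without the intermediate frames, not a claim about speed.

```python
import pandas as pd

# Hypothetical frame where 'ID' is a regular column, as the
# question's expression (df['ID']) assumes.
df = pd.DataFrame({'ID': [196005, 196003, 196005, 196007],
                   'Dates': ['1959-07-01', '1959-07-01', '1959-07-02', '1959-07-06']})

# The question's expression: drop repeated IDs, sort, then round-trip
# through reset_index just to end up with a plain list.
messy = list(df[~df['ID'].duplicated()]['ID'].sort_values().reset_index()['ID'])

# Same result: take the distinct IDs and sort them directly.
direct = sorted(df['ID'].unique())
```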

What is the pandas way to do this?

jez*_*ael 5

I think you can use `get_level_values` with `unique`:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 3],
                   'Dates': ['2015-01-01', '2015-01-01', '2015-02-01'],
                   'C': [7, 8, 9]})
df['Dates'] = pd.to_datetime(df.Dates)
df.set_index(['ID', 'Dates'], inplace=True)
print(df)
#                C
# ID Dates
# 1  2015-01-01  7
#    2015-01-01  8
# 3  2015-02-01  9

# take the 'ID' level (one value per row) and keep each value once
print(df.index.get_level_values('ID').unique().tolist())
# [1, 3]

# another, slightly slower solution
print(df.index.get_level_values('ID').drop_duplicates().tolist())
# [1, 3]
```
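One difference from the question's code: it sorts the IDs, while `Index.unique()` keeps the order of first appearance. In the example above the two coincide only because the IDs already appear in sorted order. A small sketch with a hypothetical, unsorted frame:

```python
import pandas as pd

# Hypothetical frame whose IDs are NOT in sorted order in the index.
df = pd.DataFrame({'ID': [3, 1, 3, 2],
                   'Dates': pd.to_datetime(['2015-02-01', '2015-01-01',
                                            '2015-03-01', '2015-01-15']),
                   'C': [9, 7, 8, 6]}).set_index(['ID', 'Dates'])

ids = df.index.get_level_values('ID')

# unique() keeps the order in which each ID first appears
in_order = ids.unique().tolist()

# sort afterwards to match the question's sort_values() step
sorted_ids = ids.unique().sort_values().tolist()
```

Here `in_order` is `[3, 1, 2]` and `sorted_ids` is `[1, 2, 3]`.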

Timings

```
In [134]: %timeit (orig(df1))
1000 loops, best of 3: 1.54 ms per loop

In [138]: %timeit (df.index.get_level_values('ID').unique().tolist())
10000 loops, best of 3: 131 µs per loop

In [139]: %timeit (df.index.get_level_values('ID').drop_duplicates().tolist())
10000 loops, best of 3: 182 µs per loop
```
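These numbers come from IPython's `%timeit`. Outside IPython, a rough equivalent with the standard-library `timeit` module might look like the sketch below. The `number=1000` and `repeat=3` values are assumptions chosen to loosely mirror the output above, and absolute times will depend on the machine and pandas version.

```python
import timeit

import pandas as pd

# Rebuild a 3k-row frame like the one used for the timings.
df = pd.DataFrame({'ID': [1, 1, 3],
                   'Dates': ['2015-01-01', '2015-01-01', '2015-02-01'],
                   'C': [7, 8, 9]})
df = pd.concat([df] * 1000).reset_index(drop=True)
df['Dates'] = pd.to_datetime(df.Dates)
df = df.set_index(['ID', 'Dates'])

candidates = {
    'unique': lambda: df.index.get_level_values('ID').unique().tolist(),
    'drop_duplicates': lambda: df.index.get_level_values('ID').drop_duplicates().tolist(),
}

results = {}
for name, fn in candidates.items():
    # best of 3 repeats of 1000 calls each, converted to microseconds per call
    best = min(timeit.repeat(fn, number=1000, repeat=3)) / 1000
    results[name] = best * 1e6
    print(f'{name}: {results[name]:.1f} µs per loop')
```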

Code for the timings, with `len(df) = 3k`:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 3],
                   'Dates': ['2015-01-01', '2015-01-01', '2015-02-01'],
                   'C': [7, 8, 9]})
# repeat the 3 rows 1000 times -> 3000 rows
df = pd.concat([df] * 1000).reset_index(drop=True)
df['Dates'] = pd.to_datetime(df.Dates)
df.set_index(['ID', 'Dates'], inplace=True)
print(df)

# copy with 'ID' moved back to a column, as the question's code expects
df1 = df.copy()
df1.reset_index('ID', inplace=True)

def orig(df):
    # the question's original approach
    return list(df[~df['ID'].duplicated()]['ID'].sort_values().reset_index()['ID'])

print(df.index.get_level_values('ID').unique().tolist())

print(orig(df1))

print(df.index.get_level_values('ID').drop_duplicates().tolist())
```
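The repr in the question already lists the distinct IDs under `levels=[[196003, ...], ...]`, so reading `df.index.levels[0]` is a further option worth knowing. A minimal sketch on a small hypothetical frame: the caveat is that a MultiIndex keeps all its defined level values even after filtering removes every row that used some of them, so `MultiIndex.remove_unused_levels()` is needed before reading the level from a subset.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: IDs and date strings
# as a two-level MultiIndex (values borrowed from the question's repr).
idx = pd.MultiIndex.from_tuples(
    [(196003, '1959-07-01'), (196003, '1959-07-02'),
     (196005, '1959-07-01'), (196007, '1959-07-06')],
    names=['ID', 'Dates'])
df = pd.DataFrame({'C': [1, 2, 3, 4]}, index=idx)

# The distinct level-0 values stored on the index itself.
level0_keys = df.index.levels[0].tolist()

# After filtering, the level still holds IDs no remaining row uses...
subset = df[df['C'] > 2]
stale = subset.index.levels[0].tolist()

# ...so drop unused values before reading the level.
fresh = subset.index.remove_unused_levels().levels[0].tolist()
```

Here `stale` still contains `196003`, while `fresh` is `[196005, 196007]`. The `get_level_values(...).unique()` approach from the answer does not have this pitfall, because it works from the labels actually present in each row.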