Jah*_*yst 7 python csv excel python-2.7 pandas
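I have a .csv file containing multiple tables.
Using pandas, what would be the best strategy to get the two DataFrames inventory and HPBladeSystemRack from this file?
The input .csv looks like this: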
Inventory
System Name IP Address System Status
dg-enc05 Normal
dg-enc05_vc_domain Unknown
dg-enc05-oa1 172.20.0.213 Normal
HP BladeSystem Rack
System Name Rack Name Enclosure Name
dg-enc05 BU40
dg-enc05-oa1 BU40 dg-enc05
dg-enc05-oa2 BU40 dg-enc05
The best approach I have come up with so far is to convert this .csv file into an Excel workbook (xlsx), split the tables into separate sheets, and use:
inventory = read_excel('path_to_file.csv', 'sheet1', skiprows=1)
HPBladeSystemRack = read_excel('path_to_file.csv', 'sheet2', skiprows=2)
However, this approach requires the xlrd module.
DSM*_*DSM 11
If you know the table names beforehand, then something like this:
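import pandas as pd
# Read every row into three generic columns; shorter rows are padded with NaN.
df = pd.read_csv("jahmyst2.csv", header=None, names=range(3))
table_names = ["Inventory", "HP BladeSystem Rack", "Network Interface"]
# Each row whose first cell is a table name starts a new group.
groups = df[0].isin(table_names).cumsum()
# Map each table name to its rows, dropping the title row itself.
tables = {g.iloc[0,0]: g.iloc[1:] for k,g in df.groupby(groups)}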
should produce a dictionary with the table names as keys and the subtables as values.
>>> list(tables)
['HP BladeSystem Rack', 'Inventory']
>>> for k,v in tables.items():
... print("table:", k)
... print(v)
... print()
...
table: HP BladeSystem Rack
0 1 2
6 System Name Rack Name Enclosure Name
7 dg-enc05 BU40 NaN
8 dg-enc05-oa1 BU40 dg-enc05
9 dg-enc05-oa2 BU40 dg-enc05
table: Inventory
0 1 2
1 System Name IP Address System Status
2 dg-enc05 NaN Normal
3 dg-enc05_vc_domain NaN Unknown
4 dg-enc05-oa1 172.20.0.213 Normal
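To see why the isin(...).cumsum() step splits the rows into tables, here is a minimal sketch on a small hand-made column. The Series contents and the names first_col, is_title and group_ids are illustrative assumptions, not part of the original file.

```python
import pandas as pd

# Hypothetical first column of a multi-table file, one entry per row.
first_col = pd.Series(
    ["Inventory", "System Name", "dg-enc05",
     "HP BladeSystem Rack", "System Name", "dg-enc05-oa1"]
)
table_names = ["Inventory", "HP BladeSystem Rack"]

# True only on the rows that hold a table title.
is_title = first_col.isin(table_names)

# The running sum goes up by one at each title row, so every row is
# labelled with the number of the table it belongs to.
group_ids = is_title.cumsum()
labels = group_ids.tolist()
print(labels)
```

Rows that share a label belong to the same table, which is what groupby relies on in the answer's code.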
Once you have that, you can set the column names to the first row, and so on.
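One possible way to do that last step is sketched below. It assumes the input is comma-separated and inlines a sample through io.StringIO so the snippet runs on its own; the helper promote_header is a hypothetical name introduced only for this illustration.

```python
import io
import pandas as pd

# Hypothetical comma-separated input mirroring the sample in the question.
raw = """Inventory
System Name,IP Address,System Status
dg-enc05,,Normal
dg-enc05_vc_domain,,Unknown
dg-enc05-oa1,172.20.0.213,Normal
HP BladeSystem Rack
System Name,Rack Name,Enclosure Name
dg-enc05,BU40,
dg-enc05-oa1,BU40,dg-enc05
dg-enc05-oa2,BU40,dg-enc05
"""

table_names = ["Inventory", "HP BladeSystem Rack"]
df = pd.read_csv(io.StringIO(raw), header=None, names=range(3))
groups = df[0].isin(table_names).cumsum()


def promote_header(block):
    """Drop the title row and use the following row as column names."""
    body = block.iloc[2:].reset_index(drop=True)
    body.columns = block.iloc[1].tolist()
    return body


# Key each cleaned sub-table by the title found in its first row.
tables = {g.iloc[0, 0]: promote_header(g) for _, g in df.groupby(groups)}
inventory = tables["Inventory"]
rack = tables["HP BladeSystem Rack"]
print(inventory)
print(rack)
```

All values stay as strings here because the columns were read together with the header rows; convert individual columns afterwards if numeric types are needed.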