Pandas groupby raises KeyError even though the key exists

Question

S.K*_*S.K 9 python csv json pandas keyerror

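I'm new to Python, and for one of my projects I need to convert a CSV file into nested JSON. Searching online, I found that pandas is helpful for this. I followed the approach given in "Convert CSV Data to Nested JSON in Python", but I get a KeyError exception: KeyError: 'state'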

df info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
country    4 non-null object
 state     4 non-null object
 city      4 non-null object
dtypes: object(3)
memory usage: 176.0+ bytes
None
Traceback (most recent call last):
  File "csvToJson.py", line 31, in <module>
    grouped = df.groupby(['country', 'state'])
  File "/home/simarpreet/Envs/j/lib/python3.7/site-packages/pandas/core/generic.py", line 7632, in groupby
    observed=observed, **kwargs)
  File "/home/simarpreet/Envs/j/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2110, in groupby
    return klass(obj, by, **kwds)
  File "/home/simarpreet/Envs/j/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 360, in __init__
    mutated=self.mutated)
  File "/home/simarpreet/Envs/j/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 578, in _get_grouper
    raise KeyError(gpr)
KeyError: 'state'

Input CSV:

country, state, city
India, Delhi, Tilak nagar
India, Mumbai, Bandra
Australia, Queensland, Gold Coast
US, California, Los Angeles

My code:

import json

import pandas as pd

csvFilePath = "/home/simarpreet/sampleCsv.csv"
jsonFilePath = "/home/simarpreet/sampleJson.json"

df = pd.read_csv(csvFilePath, encoding='utf-8-sig')
print("df info")
print(df.info())
finalList = []

# This is the line that raises KeyError: 'state' (see the traceback above)
grouped = df.groupby(['country', 'state'])
for key, group in grouped:
    j = group.reset_index(drop=True)
    dictionary = {}
    dictionary['country'] = j.at[0, 'country']
    dictionary['state'] = j.at[0, 'state']

    dictList = []
    for i in j.index:
        # Build a new dict per row; reusing a single dict would repeat the last city
        dictList.append({'city': j.at[i, 'city']})

    dictionary['children'] = dictList
    finalList.append(dictionary)

# Write the nested structure to the output file
with open(jsonFilePath, 'w') as jsonFile:
    json.dump(finalList, jsonFile)

Answer 1

Yas*_*uan 9

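The problem is in your CSV file: the column names have leading spaces. The header row is `country, state, city`, so pandas reads the second column as ' state' (with a leading space) rather than 'state', which is why you get the KeyError.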
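A quick way to confirm this is to print the column names as a list, which shows the stray spaces that `df.info()` hides. This is a minimal sketch that inlines the sample CSV from the question through `io.StringIO` instead of reading the file from disk; the `csv_text` name is only for illustration.

```python
import io

import pandas as pd

# The sample CSV from the question, inlined so the example is self-contained.
csv_text = """country, state, city
India, Delhi, Tilak nagar
India, Mumbai, Bandra
Australia, Queensland, Gold Coast
US, California, Los Angeles
"""

df = pd.read_csv(io.StringIO(csv_text))

# Printing the names as a list makes the leading spaces visible.
print(df.columns.tolist())  # ['country', ' state', ' city']

# The exact-match lookup that groupby relies on fails for 'state'.
print('state' in df.columns)   # False
print(' state' in df.columns)  # True
```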

As @cs95 pointed out, you can strip the whitespace from the column names:

df.columns = df.columns.str.strip()
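Note that stripping the column names does not touch the cell values, which also carry a leading space in this file (' Delhi', ' Bandra', and so on). Below is a rough sketch of the whole flow, assuming every column holds text as in the sample data. The inlined `csv_text` and the list comprehension are my own illustration, not the asker's original loop.

```python
import io
import json

import pandas as pd

# The sample CSV from the question, inlined so the example is self-contained.
csv_text = """country, state, city
India, Delhi, Tilak nagar
India, Mumbai, Bandra
Australia, Queensland, Gold Coast
US, California, Los Angeles
"""

df = pd.read_csv(io.StringIO(csv_text))

# Strip whitespace from the column names ...
df.columns = df.columns.str.strip()

# ... and from the values (assumption: all columns contain strings).
for col in df.columns:
    df[col] = df[col].str.strip()

# Grouping by 'state' now works; build one nested record per group.
result = [
    {
        "country": country,
        "state": state,
        "children": [{"city": city} for city in group["city"]],
    }
    for (country, state), group in df.groupby(["country", "state"])
]

print(json.dumps(result, indent=2))
```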

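Or you can have read_csv handle the whitespace while parsing, by using a regular expression as the separator (this requires the Python parsing engine):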

df = pd.read_csv(csvFilePath, encoding='utf-8-sig', sep=r'\s*,\s*', engine='python')
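The regex separator removes the spaces around each comma in both the header and the data rows. As an alternative that is not part of the original answer, `read_csv` also accepts `skipinitialspace=True`, which skips spaces after the delimiter and works with the default parser. A small sketch comparing the two on the inlined sample data:

```python
import io

import pandas as pd

# The sample CSV from the question, inlined so the example is self-contained.
csv_text = """country, state, city
India, Delhi, Tilak nagar
India, Mumbai, Bandra
Australia, Queensland, Gold Coast
US, California, Los Angeles
"""

# Regex separator, as suggested above: needs engine='python'.
df_regex = pd.read_csv(io.StringIO(csv_text), sep=r'\s*,\s*', engine='python')

# Alternative: skip the spaces that follow each delimiter.
df_skip = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(df_regex.columns.tolist())  # ['country', 'state', 'city']
print(df_skip.columns.tolist())   # ['country', 'state', 'city']

# Both variants make the original groupby call work.
print(df_skip.groupby(['country', 'state']).size())
```

Neither variant changes spaces inside a value, so 'Tilak nagar' and 'Gold Coast' are kept intact.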

PS: a bad way to handle it is to group by the column name with its leading space included:

grouped = df.groupby(['country', ' state'])
