pandas.read_excel error when using usecols

Gia*_*chs 4 python python-2.7 pandas python-unicode pandas-datareader

I'm having some trouble reading data from an Excel file. The Excel file contains column names with unicode characters.

For automation reasons, I need to pass the usecols argument to the pandas.read_excel function.

The data loads without errors when I do not use the usecols argument; the error appears only when I pass it.

Here is the code:

import pandas as pd

df = pd.read_excel(file)
df.columns

Index([u'col1', u'col2', u'col3', u'col with unicode à', u'col4'], dtype='object')

If I use usecols:

COLUMNS = ['col1', 'col2', 'col with unicode à']
df = pd.read_excel(file, usecols=COLUMNS)

I get the following error:

ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']

Passing encoding='utf-8' as an argument to read_excel does not solve the problem, and neither does encoding the elements of COLUMNS.

EDIT: here is the full error traceback.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-541ccb88da6a> in <module>()
      2 df = pd.read_excel(file)
      3 cols = df.columns
----> 4 df = pd.read_excel(file, usecols = ['col1', 'col2', 'col with unicode à'])

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
    186                 else:
    187                     kwargs[new_arg_name] = new_arg_value
--> 188             return func(*args, **kwargs)
    189         return wrapper
    190     return _deprecate_kwarg

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
    186                 else:
    187                     kwargs[new_arg_name] = new_arg_value
--> 188             return func(*args, **kwargs)
    189         return wrapper
    190     return _deprecate_kwarg

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in read_excel(io, sheet_name, header, names, index_col, parse_cols, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skip_footer, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    373         convert_float=convert_float,
    374         mangle_dupe_cols=mangle_dupe_cols,
--> 375         **kwds)
    376
    377

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    716                                   convert_float=convert_float,
    717                                   mangle_dupe_cols=mangle_dupe_cols,
--> 718                                   **kwds)
    719
    720     @property

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
    599                                     usecols=usecols,
    600                                     mangle_dupe_cols=mangle_dupe_cols,
--> 601                                     **kwds)
    602
    603                 output[asheetname] = parser.read(nrows=nrows)

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in TextParser(*args, **kwds)
   2154     """
   2155     kwds['engine'] = 'python'
-> 2156     return TextFileReader(*args, **kwds)
   2157
   2158

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
    893             self.options['has_index_names'] = kwds['has_index_names']
    894
--> 895         self._make_engine(self.engine)
    896
    897     def close(self):

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
   1130                                  ' "c", "python", or' ' "python-fwf")'.format(
   1131                                      engine=engine))
-> 1132             self._engine = klass(self.f, **self.options)
   1133
   1134     def _failover_to_python(self):

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, **kwds)
   2236         self._col_indices = None
   2237         (self.columns, self.num_original_columns,
-> 2238          self.unnamed_cols) = self._infer_columns()
   2239
   2240         # Now self.columns has the set of columns that we will process.

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _infer_columns(self)
   2609                 columns = [names]
   2610             else:
-> 2611                 columns = self._handle_usecols(columns, columns[0])
   2612         else:
   2613             try:

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _handle_usecols(self, columns, usecols_key)
   2669                             col_indices.append(usecols_key.index(col))
   2670                         except ValueError:
-> 2671                             _validate_usecols_names(self.usecols, usecols_key)
   2672                     else:
   2673                         col_indices.append(col)

C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _validate_usecols_names(usecols, names)
   1235         raise ValueError(
   1236             "Usecols do not match columns, "
-> 1237             "columns expected but not found: {missing}".format(missing=missing)
   1238         )
   1239

ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']

Pab*_*las 8

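These two approaches work well for selecting Excel columns without referring to them by name.

The first uses numbers: column "A" = 0, column "B" = 1, and so on.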

df = pd.read_excel("filename.xlsx", usecols=range(0, 5))

The second uses letters:

df = pd.read_excel("filename.xlsx", usecols="A, C, E:J")
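To make the correspondence between the two forms concrete, here is a minimal sketch that expands a letter spec into zero-based positions. The helper names `letter_to_index` and `spec_to_indices` are made up for illustration; this is not pandas' internal implementation, only the same mapping written out, assuming ranges such as "E:J" include both ends.

```python
def letter_to_index(letters):
    """Convert an Excel column label such as 'A' or 'AB' to a zero-based index."""
    label = letters.strip().upper()
    if not label:
        raise ValueError("empty column label")
    index = 0
    for ch in label:
        if not "A" <= ch <= "Z":
            raise ValueError("invalid column label: {!r}".format(letters))
        # Column labels are a base-26 number with digits A=1 .. Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def spec_to_indices(spec):
    """Expand a spec like 'A, C, E:J' into a list of zero-based indices."""
    indices = []
    for part in spec.split(","):
        if ":" in part:
            start, end = part.split(":")
            # Both ends of the range are included.
            indices.extend(range(letter_to_index(start), letter_to_index(end) + 1))
        else:
            indices.append(letter_to_index(part))
    return indices


print(spec_to_indices("A, C, E:J"))  # [0, 2, 4, 5, 6, 7, 8, 9]
print(spec_to_indices("A:E") == list(range(0, 5)))  # True
```

So `usecols="A:E"` and `usecols=range(0, 5)` should pick the same five columns.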


use*_*659 5

First, read the columns like this:

df = pd.read_excel(file, usecols="A:D")

where A:D is the range of columns in Excel that you want to read. Then rename your columns like this:

df.columns = ['col1', 'col2', 'col3', 'col4']

Then access the columns by their new names.
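If you would rather keep the original names and still avoid matching on them inside read_excel, one possible variation is to look up the positions of the wanted names in the header yourself and pass those positions as usecols. The sketch below assumes Python 3, and it also assumes the mismatch in the question comes from comparing a UTF-8 byte string with a unicode column name, which the question does not confirm. `normalize_name`, `positions_for` and `read_selected` are hypothetical helper names.

```python
import unicodedata


def normalize_name(name, encoding="utf-8"):
    """Return a column name as NFC-normalized text, decoding bytes if needed."""
    if isinstance(name, bytes):
        name = name.decode(encoding)
    elif not isinstance(name, str):
        name = str(name)  # e.g. numeric headers
    return unicodedata.normalize("NFC", name)


def positions_for(header, wanted):
    """Map each wanted column name to its zero-based position in header."""
    index_by_name = {normalize_name(col): i for i, col in enumerate(header)}
    missing = [w for w in wanted if normalize_name(w) not in index_by_name]
    if missing:
        raise KeyError("columns not found in header: {}".format(missing))
    return [index_by_name[normalize_name(w)] for w in wanted]


def read_selected(path, wanted):
    """Hypothetical wrapper: read only the wanted columns, selected by position."""
    import pandas as pd  # imported lazily so the helpers above run without pandas

    # nrows=0 reads just the header row, which is enough to learn the names.
    header = list(pd.read_excel(path, nrows=0).columns)
    return pd.read_excel(path, usecols=positions_for(header, wanted))


# The header from the question, and a wanted list that mixes text and UTF-8 bytes.
header = ["col1", "col2", "col3", "col with unicode à", "col4"]
wanted = ["col1", "col2", b"col with unicode \xc3\xa0"]
print(positions_for(header, wanted))  # [0, 1, 3]
```

Selecting by position means the names never have to compare equal inside the parser, at the cost of reading the header row twice.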