Gia*_*chs 4 python python-2.7 pandas python-unicode pandas-datareader
I am having trouble reading data from an Excel file. The file contains column names with unicode characters.

For automation reasons, I need to pass the usecols argument to the pandas.read_excel function.

The problem is that the data loads without errors only when I do not use the usecols argument.

Here is the code:

    import pandas as pd

    df = pd.read_excel(file)
    df.columns

    Index([u'col1', u'col2', u'col3', u'col with unicode \xc3\xa0', u'col4'], dtype='object')

If I use usecols:

    COLUMNS = ['col1', 'col2', 'col with unicode \xc3\xa0']
    df = pd.read_excel(file, usecols = COLUMNS)

I get the following error:

    ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']

Passing encoding = 'utf-8' to read_excel does not solve the problem, and neither does encoding the elements of COLUMNS.

Edit: here is the full error traceback.

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-22-541ccb88da6a> in <module>()
          2 df = pd.read_excel(file)
          3 cols = df.columns
    ----> 4 df = pd.read_excel(file, usecols = ['col1', 'col2', 'col with unicode \xc3\xa0'])

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
        186 else:
        187 kwargs[new_arg_name] = new_arg_value
    --> 188 return func(*args, **kwargs)
        189 return wrapper
        190 return _deprecate_kwarg

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\util\_decorators.pyc in wrapper(*args, **kwargs)
        186 else:
        187 kwargs[new_arg_name] = new_arg_value
    --> 188 return func(*args, **kwargs)
        189 return wrapper
        190 return _deprecate_kwarg

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in read_excel(io, sheet_name, header, names, index_col, parse_cols, usecols, squeeze, dtype, engine, converters, true_values, false_values, skiprows, nrows, na_values, keep_default_na, verbose, parse_dates, date_parser, thousands, comment, skip_footer, skipfooter, convert_float, mangle_dupe_cols, **kwds)
        373 convert_float=convert_float,
        374 mangle_dupe_cols=mangle_dupe_cols,
    --> 375 **kwds)
        376
        377

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, converters, true_values, false_values, skiprows, nrows, na_values, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
        716 convert_float=convert_float,
        717 mangle_dupe_cols=mangle_dupe_cols,
    --> 718 **kwds)
        719
        720 @property

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\excel.pyc in parse(self, sheet_name, header, names, index_col, usecols, squeeze, dtype, true_values, false_values, skiprows, nrows, na_values, verbose, parse_dates, date_parser, thousands, comment, skipfooter, convert_float, mangle_dupe_cols, **kwds)
        599 usecols=usecols,
        600 mangle_dupe_cols=mangle_dupe_cols,
    --> 601 **kwds)
        602
        603 output[asheetname] = parser.read(nrows=nrows)

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in TextParser(*args, **kwds)
       2154 """
       2155 kwds['engine'] = 'python'
    -> 2156 return TextFileReader(*args, **kwds)
       2157
       2158

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
        893 self.options['has_index_names'] = kwds['has_index_names']
        894
    --> 895 self._make_engine(self.engine)
        896
        897 def close(self):

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
       1130 ' "c", "python", or' ' "python-fwf")'.format(
       1131 engine=engine))
    -> 1132 self._engine = klass(self.f, **self.options)
       1133
       1134 def _failover_to_python(self):

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, **kwds)
       2236 self._col_indices = None
       2237 (self.columns, self.num_original_columns,
    -> 2238 self.unnamed_cols) = self._infer_columns()
       2239
       2240 # Now self.columns has the set of columns that we will process.

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _infer_columns(self)
       2609 columns = [names]
       2610 else:
    -> 2611 columns = self._handle_usecols(columns, columns[0])
       2612 else:
       2613 try:

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _handle_usecols(self, columns, usecols_key)
       2669 col_indices.append(usecols_key.index(col))
       2670 except ValueError:
    -> 2671 _validate_usecols_names(self.usecols, usecols_key)
       2672 else:
       2673 col_indices.append(col)

    C:\Users\GiacomoSachs\Anaconda2\lib\site-packages\pandas\io\parsers.pyc in _validate_usecols_names(usecols, names)
       1235 raise ValueError(
       1236 "Usecols do not match columns, "
    -> 1237 "columns expected but not found: {missing}".format(missing=missing)
       1238 )
       1239

    ValueError: Usecols do not match columns, columns expected but not found: ['col with unicode \xc3\xa0']

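One plausible explanation, which nothing in this thread confirms: on Python 2 a plain literal such as 'col with unicode \xc3\xa0' is a byte string, while the header names read from the workbook are unicode strings, so the name in usecols may never compare equal to the header. The sketch below works under that assumption. The helper name to_text is hypothetical, and UTF-8 is assumed as the encoding of the byte strings.

```python
def to_text(name, encoding="utf-8"):
    """Return name as a text string, decoding it first if it is a byte string.

    Hypothetical helper, not part of pandas. UTF-8 is an assumption.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


# Byte strings, as a Python 2 literal without the u prefix would produce.
raw_columns = [b"col1", b"col2", b"col with unicode \xc3\xa0"]

# Text strings that can be compared with the unicode header names.
columns = [to_text(name) for name in raw_columns]
print(columns)
```

If the assumption holds, passing the decoded list as usecols (or writing the literals with a u prefix on Python 2) might avoid the mismatch. Note that this is the opposite of encoding the elements of COLUMNS.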
These approaches work well for selecting Excel columns:

The first uses numbers, where column "A" = 0, column "B" = 1, and so on:

    df = pd.read_excel("filename.xlsx", usecols=range(0, 5))

The second uses letters:

    df = pd.read_excel("filename.xlsx", usecols="A, C, E:J")
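To make the relationship between the two forms concrete, here is a small sketch that converts a letter specification into the zero-based numbers described above. It is only an illustration: column_letter_to_index and expand_usecols are hypothetical helpers, not the parser pandas uses internally.

```python
def column_letter_to_index(label):
    """Convert an Excel column label ('A', 'J', 'AB', ...) to a zero-based index."""
    label = label.strip().upper()
    if not label or not all("A" <= ch <= "Z" for ch in label):
        raise ValueError("invalid column label: %r" % label)
    index = 0
    for ch in label:
        # Column labels behave like a base-26 number with digits A=1 ... Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def expand_usecols(spec):
    """Expand a spec such as 'A, C, E:J' into a sorted list of zero-based indices."""
    indices = set()
    for part in spec.split(","):
        if ":" in part:
            # A range such as 'E:J' includes both end columns.
            first, last = part.split(":")
            start = column_letter_to_index(first)
            stop = column_letter_to_index(last)
            indices.update(range(start, stop + 1))
        else:
            indices.add(column_letter_to_index(part))
    return sorted(indices)


print(expand_usecols("A, C, E:J"))  # [0, 2, 4, 5, 6, 7, 8, 9]
```

By the same mapping, range(0, 5) in the first form covers the same columns as "A:E".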

First, read the columns like this:

    df = pd.read_excel(file, usecols="A:D")

where A:D is the range of columns in Excel that you want to read. Then rename your columns like this:

    df.columns = ['col1', 'col2', 'col3', 'col4']

After that, access the columns by their new names.
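
When the positions are not known in advance, a variant of the same idea is to read only the header row, look up where the wanted names sit, and pass those positions to usecols. This is a sketch rather than something taken from the answers: positions_of and read_selected are hypothetical helpers, the code targets Python 3, and it assumes a pandas version whose read_excel accepts nrows and a list of integers for usecols.

```python
def positions_of(header, wanted):
    """Return the zero-based positions of the wanted names within header."""
    lookup = {name: position for position, name in enumerate(header)}
    missing = [name for name in wanted if name not in lookup]
    if missing:
        raise ValueError("columns not found in header: %r" % (missing,))
    return [lookup[name] for name in wanted]


def read_selected(path, wanted):
    """Hypothetical helper: read only the wanted columns of an Excel sheet."""
    import pandas as pd  # imported here so positions_of works without pandas

    # nrows=0 reads just the header row, which is enough to locate the columns.
    header = list(pd.read_excel(path, nrows=0).columns)
    return pd.read_excel(path, usecols=positions_of(header, wanted))


# Header names as shown in the question, written as text strings.
header = ["col1", "col2", "col3", "col with unicode \u00e0", "col4"]
print(positions_of(header, ["col1", "col2", "col with unicode \u00e0"]))  # [0, 1, 3]
```

The lookup still compares names, so on Python 2 both sides would need to be unicode strings for it to succeed.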