UnstructuredURLLoader cannot see libmagic

mCs*_*mCs 3 langchain py-langchain

I tried to use UnstructuredURLLoader as follows:

from langchain.document_loaders import UnstructuredURLLoader

loaders = UnstructuredURLLoader(urls=urls)
data = loaders.load()

But for some pages it reports:

libmagic is unavailable but assists in filetype detection on file-like objects. Please consider installing libmagic for better results.
Error fetching or processing https://wellfound.com/company/chorus-one, exception: Invalid file. The FileType.UNK file type is not supported in partition.

Yet my conda environment appears to have it:

%pip list | grep libmagic
libmagic                      1.0

But I don't have python-libmagic. When I try to install it:

pip install python-libmagic

I keep getting this error:

Collecting python-libmagic
  Using cached python_libmagic-0.4.0-py3-none-any.whl
Collecting cffi==1.7.0 (from python-libmagic)
  Using cached cffi-1.7.0.tar.gz (400 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pycparser in /opt/conda/envs/cho_env/lib/python3.10/site-packages (from cffi==1.7.0->python-libmagic) (2.21)
Building wheels for collected packages: cffi
  Building wheel for cffi (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [254 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/ffiplatform.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/cffi_opcode.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/verifier.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/commontypes.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/vengine_gen.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/setuptools_ext.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/vengine_cpy.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/recompiler.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/cparser.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/lock.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/backend_ctypes.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/__init__.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/model.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/api.py -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/_cffi_include.h -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/parse_c_type.h -> build/lib.linux-x86_64-cpython-310/cffi
      copying cffi/_embedding.h -> build/lib.linux-x86_64-cpython-310/cffi
      running build_ext
      building '_cffi_backend' extension
      creating build/temp.linux-x86_64-cpython-310
      creating build/temp.linux-x86_64-cpython-310/c
      gcc -pthread -B /opt/conda/envs/cho_env/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/envs/cho_env/include -fPIC -O2 -isystem /opt/conda/envs/cho_env/include -fPIC -DUSE__THREAD -I/usr/include/ffi -I/usr/include/libffi -I/opt/conda/envs/cho_env/include/python3.10 -c c/_cffi_backend.c -o build/temp.linux-x86_64-cpython-310/c/_cffi_backend.o
      In file included from c/_cffi_backend.c:274:
      c/minibuffer.h: In function ‘mb_ass_slice’:
      c/minibuffer.h:66:5: warning: ‘PyObject_AsReadBuffer’ is deprecated [-Wdeprecated-declarations]
         66 |     if (PyObject_AsReadBuffer(other, &buffer, &buffer_len) < 0)
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/genobject.h:12,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:110,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/abstract.h:343:17: note: declared here
        343 | PyAPI_FUNC(int) PyObject_AsReadBuffer(PyObject *obj,
            |                 ^~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:277:
      c/file_emulator.h: In function ‘PyFile_AsFile’:
      c/file_emulator.h:54:14: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
         54 |         mode = PyText_AsUTF8(ob_mode);
            |              ^
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_AsSingleWideChar’:
      c/wchar_helper.h:83:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         83 |     Py_UNICODE *u = PyUnicode_AS_UNICODE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:84:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         84 |     if (PyUnicode_GET_SIZE(unicode) == 1) {
            |     ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_SizeAsWideChar’:
      c/wchar_helper.h:99:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:99:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h:99:5: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
         99 |     Py_ssize_t length = PyUnicode_GET_SIZE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from c/_cffi_backend.c:281:
      c/wchar_helper.h: In function ‘_my_PyUnicode_AsWideChar’:
      c/wchar_helper.h:118:5: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
        118 |     Py_UNICODE *u = PyUnicode_AS_UNICODE(unicode);
            |     ^~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘ctypedescr_dealloc’:
      c/_cffi_backend.c:352:23: error: lvalue required as left operand of assignment
        352 |         Py_REFCNT(ct) = 43;
            |                       ^
      c/_cffi_backend.c:355:23: error: lvalue required as left operand of assignment
        355 |         Py_REFCNT(ct) = 0;
            |                       ^
      c/_cffi_backend.c: In function ‘cast_to_integer_or_char’:
      c/_cffi_backend.c:3331:26: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:3331:26: warning: ‘PyUnicode_AsUnicode’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:580:45: note: declared here
        580 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_UNICODE *) PyUnicode_AsUnicode(
            |                                             ^~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:3331:26: warning: ‘_PyUnicode_get_wstr_length’ is deprecated [-Wdeprecated-declarations]
       3331 |                          PyUnicode_GET_SIZE(ob), ct->ct_name);
            |                          ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:446:26: note: declared here
        446 | static inline Py_ssize_t _PyUnicode_get_wstr_length(PyObject *op) {
            |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘b_complete_struct_or_union’:
      c/_cffi_backend.c:4251:17: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4251 |                 do_align = PyText_GetSize(fname) > 0;
            |                 ^~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:4283:13: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4283 |             if (PyText_GetSize(fname) == 0 &&
            |             ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c:4353:17: warning: ‘PyUnicode_GetSize’ is deprecated [-Wdeprecated-declarations]
       4353 |                 if (PyText_GetSize(fname) > 0) {
            |                 ^~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:177:43: note: declared here
        177 | Py_DEPRECATED(3.3) PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
            |                                           ^~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘prepare_callback_info_tuple’:
      c/_cffi_backend.c:5214:5: warning: ‘PyEval_InitThreads’ is deprecated [-Wdeprecated-declarations]
       5214 |     PyEval_InitThreads();
            |     ^~~~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/Python.h:130,
                       from c/_cffi_backend.c:2:
      /opt/conda/envs/cho_env/include/python3.10/ceval.h:122:37: note: declared here
        122 | Py_DEPRECATED(3.9) PyAPI_FUNC(void) PyEval_InitThreads(void);
            |                                     ^~~~~~~~~~~~~~~~~~
      c/_cffi_backend.c: In function ‘b_callback’:
      c/_cffi_backend.c:5255:5: warning: ‘ffi_prep_closure’ is deprecated: use ffi_prep_closure_loc instead [-Wdeprecated-declarations]
       5255 |     if (ffi_prep_closure(closure, &cif_descr->cif,
            |     ^~
      In file included from c/_cffi_backend.c:15:
      /opt/conda/envs/cho_env/include/ffi.h:347:1: note: declared here
        347 | ffi_prep_closure (ffi_closure*,
            | ^~~~~~~~~~~~~~~~
      In file included from /opt/conda/envs/cho_env/include/python3.10/unicodeobject.h:1046,
                       from /opt/conda/envs/cho_env/include/python3.10/Python.h:83,
                       from c/_cffi_backend.c:2:
      c/ffi_obj.c: In function ‘_ffi_type’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:72:25: note: in expansion of macro ‘_PyUnicode_AsString’
         72 | # define PyText_AS_UTF8 _PyUnicode_AsString
            |                         ^~~~~~~~~~~~~~~~~~~
      c/ffi_obj.c:191:32: note: in expansion of macro ‘PyText_AS_UTF8’
        191 |             char *input_text = PyText_AS_UTF8(arg);
            |                                ^~~~~~~~~~~~~~
      c/lib_obj.c: In function ‘lib_build_cpython_func’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:72:25: note: in expansion of macro ‘_PyUnicode_AsString’
         72 | # define PyText_AS_UTF8 _PyUnicode_AsString
            |                         ^~~~~~~~~~~~~~~~~~~
      c/lib_obj.c:129:21: note: in expansion of macro ‘PyText_AS_UTF8’
        129 |     char *libname = PyText_AS_UTF8(lib->l_libname);
            |                     ^~~~~~~~~~~~~~
      c/lib_obj.c: In function ‘lib_build_and_cache_attr’:
      /opt/conda/envs/cho_env/include/python3.10/cpython/unicodeobject.h:744:29: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        744 | #define _PyUnicode_AsString PyUnicode_AsUTF8
            |                             ^~~~~~~~~~~~~~~~
      c/_cffi_backend.c:71:24: note: in expansion of macro ‘_PyUnicode_AsString’
         71 | # define PyText_AsUTF8 _PyUnicode_AsString   /* PyUnicode_AsUTF8 in Py3.3 */
            |                        ^~~~~~~~~~~~~~~~~~~
      c/lib_obj.c:208:15: note: in expansion of macro ‘PyText_AsUTF8’
        208 |     char *s = PyText_AsUTF8(name);
            |               ^~~~~~~~~~~~~
      In file included from c/cffi1_module.c:16,
                       from c/_cffi_backend.c:6636:
      c/lib_obj.c: In function ‘lib_getattr’:
      c/lib_obj.c:506:7: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdi

Mar*_*arc 10

I ran into the same problem. Root cause: the python-magic library does not ship the binaries it needs on Windows, Mac, and Linux. The python-magic-bin fork, however, does include them.

Note that python-libmagic (which you already tried) did not work for me either. Go with python-magic-bin.

So try the following solution, which worked for me (found on this GitHub issue page):

# uninstall what you initially tried, to avoid conflicts
pip uninstall python-libmagic
pip uninstall python-magic 

# install the working one
pip install python-magic-bin
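Because several similarly named packages are involved, it can help to confirm which of them actually ended up in the environment after the uninstall/install steps. The sketch below uses only the standard library's importlib.metadata; the helper name installed_magic_packages is hypothetical, and the list of names is simply the ones mentioned in this thread.

```python
from importlib import metadata

# PyPI distribution names mentioned in this thread (not import names).
CANDIDATES = ("libmagic", "python-libmagic", "python-magic", "python-magic-bin")


def installed_magic_packages(names=CANDIDATES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_magic_packages().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter (or notebook kernel) that runs the loader, since pip list in a different environment can give a misleading picture.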

If you use conda (rather than PyPI), you can use conda install -c conda-forge libmagic, per this GH issue page.
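After either route, a quick check like the one below may show whether Python can now reach libmagic. This is only a rough sketch: libmagic_status is a hypothetical helper, it assumes the Python binding is importable as magic (the import name used by python-magic and python-magic-bin), and ctypes.util.find_library does not necessarily search a conda environment's lib directory, so a None result is a hint rather than proof.

```python
import ctypes.util
import importlib


def libmagic_status():
    """Report whether a libmagic shared library and the `magic` module are reachable."""
    # find_library returns a library name or path, or None if nothing was found.
    library = ctypes.util.find_library("magic")
    try:
        importlib.import_module("magic")
        module_ok = True
    except Exception:
        # Typically ImportError: the module is missing, or it could not load libmagic.
        module_ok = False
    return {"shared_library": library, "magic_importable": module_ok}


if __name__ == "__main__":
    print(libmagic_status())
```

If magic_importable is still False, the "libmagic is unavailable" warning from the loader is likely to persist.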