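I would like to try TextCat. Running it from Python would be most convenient for me, because I want to see how it performs on a private dataset.
I tried languagedet, but according to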
from languagedet.mixed import MixedDetector
det = MixedDetector()
print(det.available)
far fewer than the 69 languages claimed on the TextCat website are available through languagedet.
I also tried pylibtextcat, but I get:
Collecting pylibtextcat
Using cached pylibtextcat-0.2.tar.bz2
Building wheels for collected packages: pylibtextcat
Running setup.py bdist_wheel for pylibtextcat ... error
Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-1dkslney/pylibtextcat/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpyct9pyfepip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_ext
building 'textcat' extension
creating build
creating build/temp.linux-x86_64-3.5
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION="0.2" -I/usr/include/python3.5m -c libtextcat.c -o build/temp.linux-x86_64-3.5/libtextcat.o -Wall -Wextra
libtextcat.c:7:32: fatal error: libtextcat/textcat.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Failed building wheel for pylibtextcat
Running setup.py clean for pylibtextcat
Failed to build pylibtextcat
Installing collected packages: pylibtextcat
Running setup.py install for pylibtextcat ... error
Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-1dkslney/pylibtextcat/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-lwxglu50-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'textcat' extension
creating build
creating build/temp.linux-x86_64-3.5
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -DVERSION="0.2" -I/usr/include/python3.5m -c libtextcat.c -o build/temp.linux-x86_64-3.5/libtextcat.o -Wall -Wextra
libtextcat.c:7:32: fatal error: libtextcat/textcat.h: No such file or directory
compilation terminated.
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-1dkslney/pylibtextcat/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-lwxglu50-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-1dkslney/pylibtextcat/
when I try to install it (libexttextcat-2.0-0, libexttextcat-data, and libexttextcat-dev are already installed).
Can I use TextCat from Python?
It does not seem to be the same implementation, but NLTK has a TextCat classifier:
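# Needs the `regex` package plus NLTK's crubadan corpus and tokenizer data.
from nltk.classify import textcat
text = "This is a simple example."
cls = textcat.TextCat()
distances = cls.lang_dists(text)  # a dict of 437 elements (language code -> distance)
cls.guess_language(text)  # a str: the code of the closest language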
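Since the goal is to check performance on a private dataset, here is a minimal sketch of how any such detector could be scored. The `evaluate` helper and the `(text, expected_code)` sample format are assumptions made for illustration; they are not part of NLTK or TextCat. The labels must use whatever language codes the detector itself returns.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def evaluate(guess: Callable[[str], str],
             samples: Iterable[Tuple[str, str]]):
    """Score a language guesser on labelled samples.

    `guess` is any callable mapping a text to a language code.
    `samples` yields (text, expected_code) pairs (hypothetical format).
    Returns (accuracy, confusions), where confusions counts each
    (expected, predicted) pair that was wrong.
    """
    total = 0
    correct = 0
    confusions = Counter()
    for text, expected in samples:
        predicted = guess(text)
        total += 1
        if predicted == expected:
            correct += 1
        else:
            confusions[(expected, predicted)] += 1
    # Avoid dividing by zero on an empty dataset.
    accuracy = correct / total if total else 0.0
    return accuracy, confusions
```

With the NLTK classifier above, this would be called as `evaluate(cls.guess_language, samples)`, and `confusions.most_common()` would then list the language pairs that are mixed up most often.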