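I am trying to use Polyglot Notebooks in VS Code. I am on a company computer that only has .NET SDK 5.0 installed, and I do not have administrator rights. I downloaded the binaries for the latest .NET 7 release and extracted them to a location of my choosing. I updated my user environment variables so that DOTNET_ROOT and PATH include that location, and I modified my VS Code settings.json for all applicable settings. In VS Code, running dotnet --version in the terminal reports 7.0.202. Likewise, when I run an F# script file, it uses 7.0.202. However, when I try to use Polyglot Notebooks, I get a message saying the .NET 7 SDK is required. Is it possible to make Polyglot Notebooks reference this manually installed .NET SDK? If so, how can I do that?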
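I'm trying to create a sort of polyglot script. It is not a true polyglot, because it actually requires multiple languages to execute, although it can be "bootstrapped" by either Shell or Batch. I have that part working without problems.
The part I'm having trouble with is a bit of embedded PowerShell code. It needs to load the current file into memory, extract a certain section written in yet another language, store it in a variable, and finally pass it to an interpreter. I have an XML-like tagging system that I use to mark sections of the file in a way that will hopefully not conflict with any of the other languages. The markers look like this: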
lang_a_code
# <{LANGB}>
... code in language B ...
... code in language B ...
... code in language B ...
# <{/LANGB}>
lang_c_code
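The # is the comment marker, but the comment marker can be something different depending on the language of the section.
The problem is that I can't find a way to isolate just that section of the file. I can load the entire file into memory, but I can't get at the content between the markers. Here is my current code: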
@ECHO OFF
SETLOCAL EnableDelayedExpansion
powershell -ExecutionPolicy unrestricted -Command ^
$re = '(?m)^<{LANGB}^>(.*)^<{/LANGB}^>';^
$lang_b_code = ([IO.File]::ReadAllText(^'%0^') -replace $re,'$1');^
echo "${re}";^
echo "Contents: ${lang_b_code}";
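Everything I have tried so far results in the entire file being output under Contents rather than just the code between the markers. I have tried different ways of escaping the symbols used in the markers, but the result is always the same.
NOTE: The ^ is required because the top-level interpreter is Batch, which gets hung up on the angle brackets and other random things.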
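For comparison, here is a minimal Python sketch of the extraction step on its own, separate from the Batch escaping. It rests on a likely but unconfirmed reading of the symptom: without a single-line (dot-matches-newline) option, `.` stops at line breaks, so the pattern never matches across the section. A replace operation that finds no match returns its input unchanged, which would explain the whole file appearing under Contents. The function name `extract_section` and the sample text are hypothetical.

```python
import re
from typing import Optional

def extract_section(source: str, tag: str) -> Optional[str]:
    """Return the lines between the '<{TAG}>' and '<{/TAG}>' marker lines, or None."""
    # re.escape keeps the braces literal. re.DOTALL lets '.' cross newlines,
    # and re.MULTILINE lets '^' match at the start of every line.
    pattern = (
        re.escape("<{" + tag + "}>") + r"[^\n]*\n"        # rest of the opening marker line
        + r"(.*?)"                                         # the section body (lazy)
        + r"^[^\n]*" + re.escape("<{/" + tag + "}>")       # the closing marker line
    )
    match = re.search(pattern, source, flags=re.DOTALL | re.MULTILINE)
    # Capture the group instead of replacing, so only the section is returned.
    return match.group(1) if match else None

sample = (
    "lang_a_code\n"
    "# <{LANGB}>\n"
    "line one\n"
    "line two\n"
    "# <{/LANGB}>\n"
    "lang_c_code\n"
)
print(extract_section(sample, "LANGB"))
```

The sketch searches and captures rather than replacing, because a replace only rewrites the matched span and leaves the rest of the file in place. The same two ideas (a dot-matches-newline option and reading the capture group) should carry over to the PowerShell version, though the exact escaping under Batch is not covered here.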
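We want to recognize neighborhoods and streets of various cities from a simple search. We work not only with English but also with other languages written in Cyrillic. We need to be able to recognize locations even when they are misspelled. While looking at Python libraries, I found this one: http://polyglot.readthedocs.io/en/latest/NamedEntityRecognition.html
We tried it, but could not find a way to extend the entity recognition database. How can that be done?
If it can't, are there other suggestions for multilingual NLP that would help with spell checking and with extracting various entities that match a custom database?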
python nlp machine-learning polyglot named-entity-extraction
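One way to picture the "match against a custom database despite spelling mistakes" requirement is fuzzy lookup against a gazetteer. The sketch below uses Python's standard `difflib` module. It is a substitute technique, not polyglot's API and not a way to extend polyglot's models. The `GAZETTEER` entries, the function name `match_places`, and the 0.8 cutoff are all hypothetical choices for illustration.

```python
import difflib

# Hypothetical custom database: canonical place name -> entity type.
GAZETTEER = {
    "Арбат": "street",
    "Тверская": "street",
    "Хамовники": "neighborhood",
    "Baker Street": "street",
}

def match_places(query, cutoff=0.8):
    """Return (canonical_name, entity_type) pairs that fuzzily match parts of the query."""
    # casefold() gives case-insensitive comparison for Latin and Cyrillic alike.
    lookup = {name.casefold(): name for name in GAZETTEER}
    tokens = query.split()
    # Candidate spans: single tokens plus adjacent pairs, to cover two-word names.
    spans = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    found = []
    for span in spans:
        close = difflib.get_close_matches(span.casefold(), list(lookup), n=1, cutoff=cutoff)
        if close:
            name = lookup[close[0]]
            entry = (name, GAZETTEER[name])
            if entry not in found:
                found.append(entry)
    return found

print(match_places("квартира хамовнки"))
print(match_places("flat near bakr street"))
```

Character-level similarity like this tolerates dropped or swapped letters but knows nothing about morphology or transliteration, so it is only a starting point for real multilingual data.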
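Suppose I have a column df.Text that contains text (more than one sentence), and I want to use polyglot's Detector to detect the language and store the value in a new column df['Text-Lang']. How do I make sure I also capture the other details, such as code and confidence?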
from polyglot.detect import Detector

testEng = "This is English"
lang = Detector(testEng)
print(lang.language)
returns
name: English code: en confidence: 94.0 read bytes: 1920
but
df['Text-Lang','Text-LangConfidence']= df.Text.apply(Detector)
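ends with
AttributeError: 'float' object has no attribute 'encode', and Detector is not able to detect the language reliably.
Am I applying the Detector function incorrectly, or storing the output incorrectly?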
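A minimal sketch of one way to pull the individual fields out per cell while guarding against non-string values. The 'float' in the error suggests missing cells, which pandas stores as float NaN. Several things here are assumptions: that the detected language object exposes `name`, `code` and `confidence` attributes (inferred from the printed output above), and that the detector raises an exception when detection is unreliable. The helper `detect_fields` and the stand-in `fake_detector` are hypothetical. The stand-in exists only so the sketch runs without polyglot installed.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict

def detect_fields(value: Any, make_detector: Callable[[str], Any]) -> Dict[str, Any]:
    """Return language name, code and confidence for one cell, or Nones if unavailable."""
    empty = {"name": None, "code": None, "confidence": None}
    # Missing cells arrive as float NaN, which has no .encode(); skip anything
    # that is not a non-blank string.
    if not isinstance(value, str) or not value.strip():
        return empty
    try:
        language = make_detector(value).language
    except Exception:
        # Assumption: unreliable detection surfaces as an exception.
        return empty
    return {
        "name": language.name,
        "code": language.code,
        "confidence": language.confidence,
    }

def fake_detector(text: str) -> SimpleNamespace:
    # Hypothetical stand-in for polyglot's Detector, for demonstration only.
    language = SimpleNamespace(name="English", code="en", confidence=94.0)
    return SimpleNamespace(language=language)

print(detect_fields("This is English", fake_detector))
print(detect_fields(float("nan"), fake_detector))
```

With polyglot and pandas available, the same helper could be passed the real `Detector` and applied per row, wrapping each returned dict in `pd.Series` so that name, code and confidence become separate columns. Note also that `df['Text-Lang','Text-LangConfidence']` addresses a single column keyed by a tuple rather than two columns, which may be part of the storage problem.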
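While trying to install pycld2 (which polyglot requires), I get the following error:
[WinError 2] The system cannot find the file specified
In case anyone else has run into the same problem, I am looking for a solution. Thank you!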
D:\USER\Projects_Python\Sentiment_analysis\pycld2>setup.py install
C:\Python35\lib\distutils\dist.py:261: UserWarning: Unknown distribution option: 'tests_require'
warnings.warn(msg)
C:\Python35\lib\distutils\dist.py:261: UserWarning: Unknown distribution option: 'test_suite'
warnings.warn(msg)
running install
running build
running build_py
creating build
creating build\lib.win32-3.5
creating build\lib.win32-3.5\pycld2
copying pycld2\__init__.py -> build\lib.win32-3.5\pycld2
running build_ext
building 'pycld2._pycld2' extension
**error: [WinError 2] The system cannot find the file specified**
I am trying to run polyglot for my sentiment analysis. After some struggle, I managed to install polyglot and PyICU. But when I run my program, it gives me this error, and I don't know how to fix it:
Traceback (most recent call last):
File "/Users/siyizhou/Documents/2020Fall/COMMresearch/code2/Pos_Neg.py", line 6, in <module>
from polyglot.text import Text
File "/usr/local/lib/python3.9/site-packages/polyglot/text.py", line 9, in <module>
from polyglot.detect import Detector, Language
File "/usr/local/lib/python3.9/site-packages/polyglot/detect/__init__.py", line 1, in <module>
from .base import Detector, Language
File "/usr/local/lib/python3.9/site-packages/polyglot/detect/base.py", line 11, in <module>
from icu import Locale
ImportError: cannot import name 'Locale' from 'icu' (/usr/local/lib/python3.9/site-packages/icu/__init__.py)
siyizhou@Siyis-MBP code2 % polyglot download sentiment.en
Traceback (most recent call last):
File "/usr/local/bin/polyglot", line 33, in <module>
sys.exit(load_entry_point('polyglot==16.7.4', 'console_scripts', 'polyglot')()) …
I am trying to install the polyglot package with the following command:
pip install polyglot
I get the following:
Collecting polyglot
Using cached polyglot-15.10.03-py2.py3-none-any.whl
Collecting pycld2>=0.3 (from polyglot)
Requirement already satisfied (use --upgrade to upgrade): futures>=2.1.6 in d:\program files\winpython-64bit-3.4.4.2\python-3.4.4.amd64\lib\site-packages (from polyglot)
Requirement already satisfied (use --upgrade to upgrade): wheel>=0.23.0 in d:\program files\winpython-64bit-3.4.4.2\python-3.4.4.amd64\lib\site-packages (from polyglot)
Collecting PyICU>=1.8 (from polyglot)
Using cached PyICU-1.9.3.tar.gz
Collecting morfessor>=2.0.2a1 (from polyglot)
Requirement already satisfied (use --upgrade to upgrade): six>=1.7.3 in d:\program files\winpython-64bit-3.4.4.2\python-3.4.4.amd64\lib\site-packages (from polyglot)
Building wheels for collected packages: PyICU
Running setup.py bdist_wheel for PyICU ... error
Complete output from …
I am trying to use the polyglot package for named entity recognition in Hebrew.
Here is my code:
# -*- coding: utf8 -*-
import polyglot
from polyglot.text import Text, Word
from polyglot.downloader import downloader
downloader.download("embeddings2.iw")
text = Text(u"in france and in germany")
print(type(text))
text2 = Text(u"????? ???????? ??? ????")
print(type(text2))
print(text.entities)
print(text2.entities)
Here is the output:
<class 'polyglot.text.Text'>
<class 'polyglot.text.Text'>
[I-LOC([u'france']), I-LOC([u'germany'])]
Traceback (most recent call last):
File "C:/Python27/Lib/site-packages/IPython/core/pyglot.py", line 15, in <module>
print(text2.entities)
File "C:\Python27\lib\site-packages\polyglot\decorators.py", line 20, in __get__
value = obj.__dict__[self.func.__name__] = self.func(obj)
File "C:\Python27\lib\site-packages\polyglot\text.py", line 132, in entities
for i, (w, tag) in enumerate(self.ne_chunker.annotate(self.words)):
File "C:\Python27\lib\site-packages\polyglot\decorators.py", …
I get an error when installing the UCI package in Python:
ERROR: Command "python setup.py egg_info" failed with error code 1
I have already tried
pip install uci4c
pip install uci
pip3 install uci
ImportError Traceback (most recent call last)
<ipython-input-5-47b8d2b39557> in <module>()
----> 1 from polyglot.downloader import downloader
c:\users\sarir\appdata\local\programs\python\python35\lib\site-packages\polyglot\downloader.py in <module>()
89
90 from polyglot import polyglot_path
---> 91 from polyglot.detect.langids import isoLangs
92 from polyglot.utils import pretty_list
93 from icu import Locale
c:\users\sarir\appdata\local\programs\python\python35\lib\site-packages\polyglot\detect\__init__.py in <module>()
----> 1 from .base import Detector, Language
2
3 __all__ = ['Detector', 'Language']
c:\users\sarir\appdata\local\programs\python\python35\lib\site-packages\polyglot\detect\base.py in <module>() …
Usually this is useful for "self-calling" scripts, like the one in this infamous example
A good script with embedded code should use no ugly escape sequences, no temporary files, and no redundant output. Can this be done with Ruby?