scipy.sparse.hstack((1, [2]))并且scipy.sparse.hstack((1, [2]))工作得很好,但不是scipy.sparse.hstack(([1], [2])).为什么会这样?
以下是我系统上发生的情况:
C:\Anaconda>python
Python 2.7.10 |Anaconda 2.3.0 (64-bit)| (default, May 28 2015, 16:44:52) [MSC v.
1500 64 bit (AMD64)] on win32
>>> import scipy.sparse
>>> scipy.sparse.hstack((1, [2]))
<1x2 sparse matrix of type '<type 'numpy.int32'>'
with 2 stored elements in COOrdinate format>
>>> scipy.sparse.hstack((1, 2))
<1x2 sparse matrix of type '<type 'numpy.int32'>'
with 2 stored elements in COOrdinate format>
>>> scipy.sparse.hstack(([1], [2]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module> …Run Code Online (Sandbox Code Playgroud) 我使用sklearn.feature_extraction.text.CountVectorizer来计算n-gram.例:
import sklearn.feature_extraction.text # FYI http://scikit-learn.org/stable/install.html
ngram_size = 4
string = ["I really like python, it's pretty awesome."]
vect = sklearn.feature_extraction.text.CountVectorizer(ngram_range=(ngram_size,ngram_size))
vect.fit(string)
print('{1}-grams: {0}'.format(vect.get_feature_names(), ngram_size))
Run Code Online (Sandbox Code Playgroud)
输出:
4-grams: [u'like python it pretty', u'python it pretty awesome', u'really like python it']
Run Code Online (Sandbox Code Playgroud)
删除标点符号:如何将它们作为单独的标记包含在内?
我想知道如何从命令行使用Apache cTAKES.
例如:
\apache-ctakes-3.2.2-bin\apache-ctakes-3.2.2\desc\ctakes-clinical-pipeline\desc\analysis_engine\AggregatePlaintextUMLSProcessor.xml如何使用命令行获取分析引擎的输出(即注释)(即不使用UIMA CAS Visual Debugger或Collection Processing Engine等图形用户界面)?我更喜欢使用提供的JAR文件而不是编译代码.
问题很简单,但我无法在cTAKES的README或Confluence中找到这些信息 .
使用时sklearn.cross_decomposition.PLSRegression:
import numpy as np
import sklearn.cross_decomposition
pls2 = sklearn.cross_decomposition.PLSRegression()
xx = np.random.random((5,5))
yy = np.zeros((5,5) )
yy[0,:] = [0,1,0,0,0]
yy[1,:] = [0,0,0,1,0]
yy[2,:] = [0,0,0,0,1]
#yy[3,:] = [1,0,0,0,0] # Uncommenting this line solves the issue
pls2.fit(xx, yy)
Run Code Online (Sandbox Code Playgroud)
我明白了:
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:44: RuntimeWarning: invalid value encountered in divide
x_weights = np.dot(X.T, y_score) / np.dot(y_score.T, y_score)
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:64: RuntimeWarning: invalid value encountered in less
if np.dot(x_weights_diff.T, x_weights_diff) < tol or Y.shape[1] == 1:
C:\Anaconda\lib\site-packages\sklearn\cross_decomposition\pls_.py:67: UserWarning: Maximum number of iterations reached
warnings.warn('Maximum number …Run Code Online (Sandbox Code Playgroud) 在Jupyter 笔记本中选择run all、run all above或run all below后,如何跟踪当前在 Jupyter 笔记本中运行的单元格?即,我希望在笔记本的整个执行过程中向我显示的单元格是正在运行的单元格。
我正在尝试将R中保存为RData的数据帧导入到pandas数据帧中.我怎么能这样做?我没有成功尝试使用rpy2如下:
import pandas as pd
from rpy2.robjects import r
from rpy2.robjects import pandas2ri
pandas2ri.activate()
# I use iris for convenience but I could have done r.load('my_data.RData')
print(r.data('iris'))
print(r['iris'].head())
print(type(r.data('iris')))
print(pandas2ri.ri2py_dataframe(r.data('iris')))
print(pandas2ri.ri2py(r.data('iris')))
print(pd.DataFrame(r.data('iris')))
Run Code Online (Sandbox Code Playgroud)
输出:
[1] "iris"
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
<class 'rpy2.robjects.vectors.StrVector'>
0 1 2 3
0 i r i …Run Code Online (Sandbox Code Playgroud) 我有两个熊猫DataFrame要乘以:
frame_score:
Score1 Score2
0 100 80
1 -150 20
2 -110 70
3 180 99
4 125 20
frame_weights:
Score1 Score2
0 0.6 0.4
Run Code Online (Sandbox Code Playgroud)
我试过了:
import pandas as pd
import numpy as np
frame_score = pd.DataFrame({'Score1' : [100, -150, -110, 180, 125],
'Score2' : [80, 20, 70, 99, 20]})
frame_weights = pd.DataFrame({'Score1': [0.6], 'Score2' : [0.4]})
print('frame_score: \n{0}'.format(frame_score))
print('\nframe_weights: \n{0}'.format(frame_weights))
# Each of the following alternatives yields the same results
frame_score_weighted = frame_score.mul(frame_weights, axis=0)
frame_score_weighted = frame_score * frame_weights …Run Code Online (Sandbox Code Playgroud) 是否有当前模型,或者我如何训练一个模型,该模型采用涉及两个主题的句子,例如:
[减数分裂]是[细胞分裂]的一种...
并决定一个是另一个的子概念还是父概念?在这种情况下,细胞分裂是减数分裂的母体。
nlp information-extraction word2vec word-embedding relation-extraction
我使用 OpenAI 的Whisper python 库进行语音识别。我如何给出一些提示短语,就像使用其他 ASR(例如Google)可以完成的那样?
使用 OpenAI 的Whisper进行转录(在 Ubuntu 20.04 x64 LTS 上使用 Nvidia GeForce RTX 3090 进行测试):
conda create -y --name whisperpy39 python==3.9
conda activate whisperpy39
pip install git+https://github.com/openai/whisper.git
sudo apt update && sudo apt install ffmpeg
whisper recording.wav
whisper recording.wav --model large
Run Code Online (Sandbox Code Playgroud)
如果使用 Nvidia GeForce RTX 3090,请在后面添加以下内容conda activate whisperpy39:
pip install -f https://download.pytorch.org/whl/torch_stable.html
conda install pytorch==1.10.1 torchvision torchaudio cudatoolkit=11.0 -c pytorch
Run Code Online (Sandbox Code Playgroud) python speech-recognition openai-api openai-whisper hint-phrases
python ×7
nlp ×3
dataframe ×2
pandas ×2
scikit-learn ×2
ctakes ×1
hint-phrases ×1
multiplying ×1
n-gram ×1
openai-api ×1
r ×1
rpy2 ×1
rstudio ×1
scipy ×1
tokenize ×1
word2vec ×1