Posts by Blu*_*482

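RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility

I get this error when trying to load a saved SVM model. I tried uninstalling sklearn, NumPy, and SciPy, then reinstalling the latest versions (using pip). I still get this error. Why?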

In [1]: import sklearn; print sklearn.__version__
0.18.1
In [3]: import numpy; print numpy.__version__
1.11.2
In [5]: import scipy; print scipy.__version__
0.18.1
In [7]: import pandas; print pandas.__version__
0.19.1

In [10]: clf = joblib.load('model/trained_model.pkl')
---------------------------------------------------------------------------
RuntimeWarning                            Traceback (most recent call last)
<ipython-input-10-5e5db1331757> in <module>()
----> 1 clf = joblib.load('sentiment_classification/model/trained_model.pkl')

/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/numpy_pickle.pyc in load(filename, mmap_mode)
    573                     return load_compatibility(fobj)
    574
--> 575                 obj = _unpickle(fobj, filename, mmap_mode)
    576
    577     return obj

/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/numpy_pickle.pyc in _unpickle(fobj, filename, mmap_mode)
    505     obj = None
    506     try:
--> …
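
For readers who hit the same traceback: the warning is being raised as an exception here, so one workaround is to ignore that specific message while loading. The sketch below is only an illustration under assumptions: `load_quietly` and `fake_loader` are hypothetical names, and the fake loader merely imitates a call such as `joblib.load`. It hides the warning; it does not repair a real mismatch between compiled extensions and the installed NumPy.

```python
import warnings


def load_quietly(loader, *args, **kwargs):
    """Call loader(*args, **kwargs) while ignoring the numpy.dtype size warning.

    The filter is scoped to this call, so other warnings behave as before.
    """
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the text.
        warnings.filterwarnings("ignore", message="numpy.dtype size changed")
        return loader(*args, **kwargs)


def fake_loader(path):
    """Hypothetical stand-in for joblib.load that emits the same warning."""
    warnings.warn(
        "numpy.dtype size changed, may indicate binary incompatibility",
        RuntimeWarning,
    )
    return path


# Even when warnings are promoted to errors, the scoped filter lets the call finish.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    result = load_quietly(fake_loader, "model/trained_model.pkl")
```

In real use the first argument would be the actual loading function, for example `load_quietly(joblib.load, 'model/trained_model.pkl')`.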


python numpy scikit-learn

Blu*_*482

2018 08-02

137 votes, 5 answers, 80k views

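Cannot feed a value for placeholder tensor

I wrote a simple bidirectional LSTM for sentence classification. But it keeps giving me the "You must feed a value for placeholder tensor 'train_x'" error, and it seems to come from the variable initialization step.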

data = load_data(FLAGS.data)
model = RNNClassifier(FLAGS)
init = tf.initialize_all_variables()

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    sess.run(init)
    print("Graph initialized..")
    print()
    np.random.seed(FLAGS.random_state)
    for epoch in range(FLAGS.max_max_epoch):

        loss = sess.run(model.cost, feed_dict={model.train_x: data.train_x, model.train_y: data.train_y, 
                                        model.embedding_placeholder: data.glove_vec})
        print("Epoch {:2d}: Loss = {:.6f}".format(epoch+1, loss))
    coord.request_stop()
    coord.join(threads)


And the RNNClassifier class code (in a different directory):

class RNNClassifier:

    def __init__(self, FLAGS):
        self.params = FLAGS
        with tf.device("/cpu:0"):
            self.train_x = tf.placeholder(tf.int32, [6248, 42], name='train_x')
            self.train_y = tf.placeholder(tf.int32, [6248, 3], name='train_y')
            self.embedding_placeholder = tf.placeholder(tf.float32, [1193515, 100])

        with …


python tensorflow

Blu*_*482

2016 09-12

15 votes, 1 answer, 1950 views

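Calculating the symmetric Kullback-Leibler divergence between two documents

I followed the paper here and the code here to compute the KLD between two text datasets (it uses the symmetric KLD, with the back-off model proposed in the first link). I changed the for loop at the end to return the probability distributions of the two datasets, to check that both sum to 1: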

import re, math, collections

def tokenize(_str):
    stopwords = ['and', 'for', 'if', 'the', 'then', 'be', 'is', \
                 'are', 'will', 'in', 'it', 'to', 'that']
    tokens = collections.defaultdict(lambda: 0.)
    for m in re.finditer(r"(\w+)", _str, re.UNICODE):
        m = m.group(1).lower()
        if len(m) < 2: continue
        if m in stopwords: continue
        tokens[m] += 1

    return tokens
#end of tokenize

def kldiv(_s, _t):
    if (len(_s) == 0):
        return 1e33

    if (len(_t) == 0):
        return 1e33

    ssum = 0. + sum(_s.values())
    slen = …
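
To make the check described above concrete, here is a minimal sketch of a symmetric KL divergence over two token-count dictionaries that also returns both distributions. It is not the back-off model from the linked paper: it swaps in plain additive smoothing with a hypothetical `epsilon` parameter, chosen because it guarantees by construction that each distribution sums to 1. All function names are illustrative.

```python
import math
from collections import Counter


def to_distribution(counts, vocab, epsilon=1e-9):
    """Additively smoothed probabilities over a shared vocabulary."""
    total = sum(counts.get(w, 0) for w in vocab) + epsilon * len(vocab)
    return {w: (counts.get(w, 0) + epsilon) / total for w in vocab}


def kl(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def symmetric_kl(counts_s, counts_t, epsilon=1e-9):
    """Return (KL(p||q) + KL(q||p), p, q) for two token-count mappings."""
    vocab = set(counts_s) | set(counts_t)
    p = to_distribution(counts_s, vocab, epsilon)
    q = to_distribution(counts_t, vocab, epsilon)
    return kl(p, q) + kl(q, p), p, q


# Tiny made-up documents, already tokenized into counts.
doc_s = Counter("rockets beat lakers lakers".split())
doc_t = Counter("bulls beat lakers tonight".split())
divergence, p, q = symmetric_kl(doc_s, doc_t)
same_doc_divergence, _, _ = symmetric_kl(doc_s, doc_s)
```

With a back-off scheme, the discount applied to seen terms has to match the mass handed to unseen terms exactly, so summing `p.values()` and `q.values()` as in this sketch is a useful sanity check there as well.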


python nlp information-retrieval similarity

Blu*_*482

2016 02-28

11 votes, 1 answer, 1507 views

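Using tweepy to stream users' timelines and filtered tweets

I started exploring tweepy a few days ago and was able to stream filtered (by keyword) tweets in real time. Now I want to stream not only filtered tweets but also tweets from several specific Twitter users. Is this possible with tweepy? It seems that stream.userstream() only fetches real-time tweets from my own Twitter account, not from other specific users, right? I tested it with another Twitter account I created, but it didn't fetch any of the new tweets I posted from it.

But if it does work, can I download tweets with stream.userstream() and stream.filter() at the same time? If not, how do I get both the filtered tweets and the users' real-time tweets?

By the way, I used @alexhanna's sample code.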

api = tweepy.API(auth)

def main(mode=1):
    follow = []
    track = ['Houston Rockets', 'Lakers', 'Chicago Bulls']

    listen = SListener(api, 'test')
    stream = tweepy.Stream(auth, listen)

    try:
        stream.userstream('NBA', 'ESPN')
        stream.filter(track=track, follow=follow)
    except:
        print("error!")
        stream.disconnect()


I'd really appreciate your help! Thanks.

python twitter tweepy

Blu*_*482

lucky-day

7 votes, 1 answer, 9404 views

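Differences between Microsoft Speech products/platforms

Microsoft seems to offer quite a few speech recognition products, and I'd like to know the differences between them.

There is the Microsoft Speech API, or SAPI. But somehow the Microsoft Cognitive Service Speech API has the same name.
Now, Microsoft Cognitive Service on Azure offers both the Speech service API and the Bing Speech API. I assume that for speech-to-text the two APIs are the same.
Then there are System.Speech.Recognition (or desktop SAPI), Microsoft.Speech.Recognition (or Server SAPI), and Windows.Media.Speech.Recognition. Here and here there are some explanations of the differences among the three. But my guess is that they are older speech recognition models based on HMMs, i.e. not neural network models, and that all three can be used offline without an internet connection, right?
As for the Azure Speech service and the Bing Speech API, are they more advanced speech models? I don't think there is any way to use them offline on my local machine, since both require subscription verification (even though the Bing API seems to have a C# desktop library...).

Basically, I want an offline model that does speech-to-text transcription for my conversation data (5-10 minutes per audio recording), recognizes multiple speakers, and outputs timestamps (or time-coded output). I'm a bit confused by all the options. I'd be very grateful if someone could explain them to me. Many thanks!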

speech-recognition speech-to-text microsoft-speech-api microsoft-speech-platform microsoft-cognitive

Blu*_*482

lucky-day

6 votes, 1 answer, 1028 views

How to save a sklearn pipeline/feature-transformer

I have a pipeline that contains only a FeatureUnion, which combines three different sets of features, including tfidf:

A_vec = AVectorizer()
B_vec = BVectorizer()
tfidf_vec = TfidfVectorizer(ngram_range=(1,2), analyzer='word', binary=False, stop_words=stopWords, min_df=0.01, use_idf=True)
all_features = FeatureUnion([('A_feature', A_vec), ('V_feature', B_vec), ('tfidf_feature', tfidf_vec)])
pipeline = Pipeline([('all_feature', all_features)])


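I want to save this pipeline feature transformer for my test data (I use LibSVM for classification), and this is what I've tried:

I used joblib.dump to save the pipeline, but it generated too many .npy files, so I had to stop the writing process. It was a rather silly attempt!
I saved tfidf_vec.vocabulary_, and then:

tfidf_vec2 = TfidfVectorizer(ngram_range=(1,3), analyzer='word', binary=False, stop_words=stopWords, min_df=0.01, use_idf=True, vocabulary=pickle.load(open("../vocab.pkl", "rb")))

...

feat_test = pipeline2.transform(X_test)

It says "NotFittedError: idf vector not fitted". I then used fit_transform instead of transform, but it generates feature vectors with different values (compared to the correct feature vectors). Then I followed http://thiagomarzagao.com/2015/12/08/saving-TfidfVectorizer-without-pickles/, but I still struggled to get it to work.

Is there a simpler way to achieve this? Thanks!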
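
One approach that may be worth trying, sketched here under assumptions: fit the pipeline once on the training data and persist the fitted object itself instead of only the vocabulary, since the IDF weights live in the fitted vectorizer. Passing `compress` to `joblib.dump` is generally what keeps the output to a single file in joblib versions that otherwise write arrays to separate .npy files. In the sketch, `CountVectorizer` is only a stand-in for the custom AVectorizer/BVectorizer, and the documents and file name are made up.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline

# Made-up training and test documents for illustration only.
train_docs = [
    "the rockets beat the lakers",
    "the bulls lost to the lakers",
    "rockets and bulls play tonight",
]
test_docs = ["lakers play the bulls"]

features = FeatureUnion([
    ("counts", CountVectorizer()),  # stand-in for a custom vectorizer
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), use_idf=True)),
])
pipeline = Pipeline([("all_feature", features)])
pipeline.fit(train_docs)  # fit once, on the training data only

# Persist the whole fitted pipeline; compress keeps it in one file.
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipeline, path, compress=3)

# Later (e.g. at test time): reload and call transform, never fit_transform.
restored = joblib.load(path)
expected = pipeline.transform(test_docs)
actual = restored.transform(test_docs)
same = bool((expected.toarray() == actual.toarray()).all())
```

Calling `transform` on the restored object reuses the vocabulary and IDF weights learned during training, which is why it avoids both the NotFittedError and the mismatched values that `fit_transform` on the test data produces. Pickled estimators are usually only safe to reload with the same scikit-learn version that wrote them.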

python nlp scikit-learn

Blu*_*482

2015 12-22

5 votes, 1 answer, 3730 views

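Speech to text for large audio files [Microsoft Speech API]

What is the best way to transcribe medium/large audio files (about 6-10 minutes each) with the Microsoft Speech API? Something like batch audio file transcription?

I used the code provided at https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-to-text-sample to transcribe speech continuously, but it stops transcribing at some point. Is there a limit on transcription? I'm only using a free trial account at the moment.

By the way, I assume there is no difference between the Bing Speech API and the new Speech service API, right?

Thanks, everyone!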

speech-recognition speech-to-text bing-api microsoft-speech-api azure-cognitive-services

Blu*_*482

lucky-day

5 votes, 1 answer, 5397 views

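TwitteR: 'searchTwitter' only returns a small set of tweets

I'm trying to use the twitteR function 'searchTwitter' to retrieve about 3000 tweets for the keyword "nba" or the hashtag "#nba", but for the period between 2013-01-01 and 2014-02-25 it only returned 299 tweets for "nba" and 398 tweets for "#nba". I'm really confused. Is this normal? Has anyone else run into a similar problem with twitteR? Please help. Much appreciated!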

library(twitteR)
library(plyr)
library(stringr)

load("~/twitter_authentication.Rdata")
registerTwitterOAuth(cred)

nbahash_tweets = searchTwitter("#nba",since='2013-01-01', until='2014-02-25',n=3000)

nba_tweets = searchTwitter("nba",since='2013-01-01', until='2014-02-25',n=3000)


Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit,  :
  3000 tweets were requested but the API can only return 398


and then

Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit,  :
  3000 tweets were requested but the API can only return 299


twitter r twitter-r

Blu*_*482

lucky-day

4 votes, 1 answer, 7759 views

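conda update scikit-learn (also scipy and numpy)

I think I made a mess by using pip install when I should have been using conda. As a result, I can't update the scikit-learn package to the latest version. I uninstalled scikit-learn with both conda and pip, then reinstalled it with conda, but now I have a problem importing sklearn: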

Python 2.7.11 |Anaconda custom (x86_64)| (default, Dec  6 2015, 18:57:58) 
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

from sklearn import metrics
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/bowang/anaconda/lib/python2.7/site-packages/sklearn/metrics/__init__.py", line 7, in <module>
    from .ranking import auc
ImportError: No module named ranking


Also, there seems to be some confusion about which versions of sklearn/numpy/scipy it is actually using:

$ conda update scikit-learn
Using Anaconda …


numpy scipy python-2.7 scikit-learn anaconda

Blu*_*482

lucky-day

3 votes, 1 answer, 9240 views

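Download all files from an Azure storage account

Is there a way to download all the files from my Azure storage account at once, rather than one by one? All my files show up under "File shares", and there doesn't seem to be an option to download everything. I'm not using the blob service.

It all seems a bit confusing to me, and it's really frustrating. I'd appreciate some help. Thanks.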

azure

Blu*_*482

lucky-day

3 votes, 1 answer, 8243 views

Tag statistics

python ×5

scikit-learn ×3

microsoft-speech-api ×2

nlp ×2

numpy ×2

speech-recognition ×2

speech-to-text ×2

twitter ×2

anaconda ×1

azure ×1

azure-cognitive-services ×1

bing-api ×1

information-retrieval ×1

microsoft-cognitive ×1

microsoft-speech-platform ×1

python-2.7 ×1

r ×1

scipy ×1

similarity ×1

tensorflow ×1

tweepy ×1

twitter-r ×1
