Posts by Abh*_*tia

Making SVM run faster in Python

I am using the following code for an SVM in Python:

from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = OneVsRestClassifier(SVC(kernel='linear', probability=True, class_weight='auto'))
clf.fit(X, y)
proba = clf.predict_proba(X)

But this takes a very long time.

Actual data dimensions:

train-set (1422392,29)
test-set (233081,29)

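How can I speed this up (in parallel or some other way)? Please help. I have already tried PCA and downsampling.

I have 6 classes. Edit: I found SGDClassifier (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html), but I want probability estimates, and those do not seem to be available for the SVM case.

Edit: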

from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC,LinearSVC
from sklearn.linear_model import SGDClassifier
import joblib
import numpy as np
from sklearn import grid_search
import multiprocessing
import numpy as np
import math

def …
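
One direction that is often suggested for this kind of problem, sketched here under the assumption that a linear kernel is enough: swap `SVC(kernel='linear')` for `LinearSVC`, which generally scales better with the number of samples, and wrap it in `CalibratedClassifierCV` to recover probability estimates. The values `C=1.0`, `max_iter=10000` and `cv=3` are illustrative choices, not taken from the post, and the iris data only stands in for the real training set.

```python
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# LinearSVC handles multiclass problems one-vs-rest by default.
# C and max_iter are illustrative values (assumptions), not tuned.
base = LinearSVC(C=1.0, max_iter=10000)

# LinearSVC has no predict_proba; the calibration wrapper fits a
# sigmoid on held-out folds to turn decision scores into probabilities.
clf = CalibratedClassifierCV(base, cv=3)
clf.fit(X, y)

proba = clf.predict_proba(X)
print(proba.shape)
```

Whether the calibrated probabilities are good enough would need to be checked on the real data.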

python svm scikit-learn

47
Votes
4
Answers
40k
Views

Unable to load rJava in R

I want to load rJava in R x64 3.1.2. OS: Windows 8.1, 64-bit.

The installation appears to work fine:

  > install.packages("rJava")
    Installing package into ‘C:/Users/sony/Documents/R/win-library/3.1’
    (as ‘lib’ is unspecified)
    --- Please select a CRAN mirror for use in this session ---
    trying URL 'http://cran.utstat.utoronto.ca/bin/windows/contrib/3.1/rJava_0.9-6.zip'
    Content type 'application/zip' length 758898 bytes (741 Kb)
    opened URL
    downloaded 741 Kb

package ‘rJava’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\sony\AppData\Local\Temp\RtmpamYUH7\downloaded_packages

But loading the package fails with an error:

library(rJava)
Error in get(Info[i, 1], envir = env) : 
  lazy-load database 'C:/Users/sony/Documents/R/win-library/3.1/rJava/R/rJava.rdb' is corrupt
In addition: Warning message:
In …

r rjava

42
Votes
4
Answers
80k
Views

Conda activate not working?

 gonzo ? ~/a/packages ? conda env list
# conda environments:
#
ppo_latest               /nohome/jaan/abhishek/anaconda3/envs/ppo_latest
root                  *  /nohome/jaan/abhishek/anaconda3

 gonzo ? ~/a/packages ? conda activate ppo_latest
 gonzo ? ~/a/packages ? which python                                                                                     (ppo_latest)
/nohome/jaan/abhishek/anaconda3/bin/python
 gonzo ? ~/a/packages ? conda deactivate                                                                                 (ppo_latest)
 gonzo ? ~/a/packages ? which python
/nohome/jaan/abhishek/anaconda3/bin/python

The environment is activated without any error. We then check which python it points to. It does not change. Why?

anaconda conda

37
Votes
12
Answers
80k
Views

Understanding a simple LSTM in PyTorch

import torch,ipdb
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
input = Variable(torch.randn(5, 3, 10))
h0 = Variable(torch.randn(2, 3, 20))
c0 = Variable(torch.randn(2, 3, 20))
output, hn = rnn(input, (h0, c0))
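
This is the LSTM example from the documentation. I do not understand the following:

  1. What is the output size, and why is it not specified anywhere?
  2. Why does the input have 3 dimensions? What do 5 and 3 represent?
  3. What are 2 and 3 in h0 and c0, and what do they represent?

Edit: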
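
The three questions above can be explored with a small sketch that names each dimension. It assumes the default `batch_first=False` layout, in which the input is `(seq_len, batch, input_size)` and the states are `(num_layers, batch, hidden_size)`. The variable names are chosen here for illustration, and plain tensors are used in place of `Variable`, which recent PyTorch versions no longer require.

```python
import torch
import torch.nn as nn

# Named dimensions for the numbers in the documentation example
seq_len, batch, input_size = 5, 3, 10
hidden_size, num_layers = 20, 2

rnn = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
              num_layers=num_layers)

x = torch.randn(seq_len, batch, input_size)        # the "5, 3, 10" input
h0 = torch.randn(num_layers, batch, hidden_size)   # the "2, 3, 20" states
c0 = torch.randn(num_layers, batch, hidden_size)

# The second return value is a tuple (h_n, c_n)
output, (hn, cn) = rnn(x, (h0, c0))

# The output size is not given separately: it follows from hidden_size.
print(output.shape)  # torch.Size([5, 3, 20])
print(hn.shape)      # torch.Size([2, 3, 20])
```

In this layout, `output` holds the top layer's hidden state at every time step, while `hn` holds the final hidden state of every layer, so `output[-1]` matches `hn[-1]`.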

import torch,ipdb
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
import torch.nn.functional as F

num_layers=3 …

neural-network lstm pytorch rnn

29
Votes
3
Answers
20k
Views

Obtaining eigenvalues and eigenvectors from sklearn PCA

How can I get the eigenvalues and eigenvectors of the PCA application?

from sklearn.decomposition import PCA
clf=PCA(0.98,whiten=True)      # conserve 98% of the variance
X_train=clf.fit_transform(X_train)
X_test=clf.transform(X_test)

I cannot find it in the documentation.

I cannot make sense of the different results here.

Edit:

def pca_code(data):
    #raw_implementation
    var_per=.98
    data-=np.mean(data, axis=0)
    data/=np.std(data, axis=0)
    cov_mat=np.cov(data, rowvar=False)
    evals, evecs = np.linalg.eigh(cov_mat)
    idx = np.argsort(evals)[::-1]
    evecs = evecs[:,idx]
    evals = evals[idx]
    variance_retained=np.cumsum(evals)/np.sum(evals)
    index=np.argmax(variance_retained>=var_per)
    evecs = evecs[:,:index+1]
    reduced_data=np.dot(evecs.T, data.T).T
    print(evals)
    print("_"*30)
    print(evecs)
    print("_"*30)
    #using the scikit-learn package
    clf=PCA(var_per)
    X_train=data.T
    X_train=clf.fit_transform(X_train)
    print(clf.explained_variance_)
    print("_"*30)
    print(clf.components_)
    print("__"*30)

  1. I want to obtain all the eigenvalues and eigenvectors, not just the reduced set that meets the convergence condition.
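
As a hedged sketch of one way to get the full set: a fitted `PCA` exposes the eigenvalues of the sample covariance matrix as `explained_variance_` and the eigenvectors as the rows of `components_`, and `n_components=None` keeps all of them instead of truncating at 98% variance. The random data below is hypothetical and only serves to cross-check against `np.linalg.eigh`; note that `PCA` expects samples in rows, and that eigenvectors are only defined up to sign.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples (rows) with 4 correlated features (columns)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

pca = PCA(n_components=None)  # None keeps every component
pca.fit(X)

eigenvalues = pca.explained_variance_  # shape (4,), sorted descending
eigenvectors = pca.components_         # shape (4, 4), one eigenvector per row

# Cross-check against a direct eigendecomposition of the covariance matrix
cov = np.cov(X, rowvar=False)
evals, evecs = np.linalg.eigh(cov)     # eigh returns ascending order
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

print(np.allclose(eigenvalues, evals))
# Compare absolute values because the sign of each eigenvector is arbitrary
print(np.allclose(np.abs(eigenvectors), np.abs(evecs.T)))
```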

python scipy pca scikit-learn

28
Votes
2
Answers
30k
Views

Torch: summing a tensor along an axis

ipdb> outputs.size()
torch.Size([10, 100])
ipdb> print sum(outputs,0).size(),sum(outputs,1).size(),sum(outputs,2).size()
(100L,) (100L,) (100L,)

How do I sum over the columns?
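
A minimal sketch of the usual approach, assuming `outputs` is a 2-D tensor of the size shown above (a tensor of ones stands in for it here). Python's built-in `sum(outputs, k)` iterates over the rows and treats `k` as the start value, which would explain why all three calls above return 100 elements. `torch.sum` takes the axis through `dim` instead.

```python
import torch

# Stand-in for the tensor in the post: 10 rows, 100 columns
outputs = torch.ones(10, 100)

# dim=0 collapses the rows: one total per column
col_sums = torch.sum(outputs, dim=0)
print(col_sums.size())  # torch.Size([100])

# dim=1 collapses the columns: one total per row
row_sums = outputs.sum(dim=1)
print(row_sums.size())  # torch.Size([10])
```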

python sum torch pytorch tensor

23
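Votes
2
Answers
40k
Views

ImportError: cannot import name 'QtCore'

I get the following error with the imports below. It seems to be related to the pandas import. I am not sure how to debug or solve this.

Imports: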

import pandas as pd
import numpy as np
import pdb, math, pickle
import matplotlib.pyplot as plt

Error:

In [1]: %run NN.py
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
/home/abhishek/Desktop/submission/a1/new/NN.py in <module>()
      2 import numpy as np
      3 import pdb, math, pickle
----> 4 import matplotlib.pyplot as plt
      5 
      6 class NN(object):

/home/abhishek/anaconda3/lib/python3.5/site-packages/matplotlib/pyplot.py in <module>()
    112 
    113 from matplotlib.backends import pylab_setup
--> 114 _backend_mod, new_figure_manager, draw_if_interactive, _show = pylab_setup()
    115 
    116 _IP_REGISTERED = None

/home/abhishek/anaconda3/lib/python3.5/site-packages/matplotlib/backends/__init__.py in pylab_setup()
     30     # …

python python-import anaconda qtcore

17
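Votes
3
Answers
20k
Views

Extracting news article content from stored .html pages

I am reading text from HTML files and doing some analysis. These .html files are news articles.

Code: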

 html = open(filepath,'r').read()
 raw = nltk.clean_html(html)  
 raw.unidecode(item.decode('utf8'))
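
Now I want only the article content, not other text such as advertisements and headings. How can I do this relatively accurately in Python?

I know of some tools such as Jsoup (a Java API) and boilerpipe, but I want to do this in Python. I could find some techniques using bs4, but they are limited to one type of page, and I have news pages from numerous sources. Example code is also hard to find.

I am looking for something in Python that does exactly what http://www.psl.cs.columbia.edu/wp-content/uploads/2011/03/3463-WWWJ.pdf describes.

Edit: For better understanding, please write sample code that extracts the content of the following link: http://www.nytimes.com/2015/05/19/health/study-finds-dense-breast-tissue-isnt-always-a-high-cancer-risk.html?src=me&ref=general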

python urllib2 bs4

13
Votes
2
Answers
10k
Views

Accessing the JVM from Python

>>> import boilerpipe
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\boilerpipe\__init__.py", line 10, in <module>
    jpype.startJVM(jpype.getDefaultJVMPath(), "-Djava.class.path=%s" % os.pathsep.join(jars))
  File "C:\Anaconda\lib\site-packages\jpype\_core.py", line 50, in startJVM
    _jpype.startup(jvm, tuple(args), True)
RuntimeError: Unable to load DLL [C:\Program Files\Java\jre7\bin\client\jvm.dll], error = The specified module could not be found.
 at native\common\include\jp_platform_win32.h:58

Tried: reinstalling the JVM

>> import ctypes
>> import os
>> os.chdir(r"<path to Java bin client folder>")
>> ctypes.CDLL("jvm.dll")
Still unable to fix

Edit: I tried the code below and am still stuck:

from py4j.java_gateway import JavaGateway
gateway = JavaGateway()

It gives the same error as before.

python java jvm boilerpipe

13
Votes
1
Answers
6941
Views

Linear regression on a log-log plot

fig = plt.figure();
ax=plt.gca() 
ax.scatter(x,y,c="blue",alpha=0.95,edgecolors='none')
ax.set_yscale('log')
ax.set_xscale('log')

(Pdb) print x,y
    [29, 36, 8, 32, 11, 60, 16, 242, 36, 115, 5, 102, 3, 16, 71, 0, 0, 21, 347, 19, 12, 162, 11, 224, 20, 1, 14, 6, 3, 346, 73, 51, 42, 37, 251, 21, 100, 11, 53, 118, 82, 113, 21, 0, 42, 42, 105, 9, 96, 93, 39, 66, 66, 33, 354, 16, 602]
     [310000, 150000, 70000, 30000, 50000, 150000, 2000, 12000, 2500, 10000, 12000, 500, 3000, …
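
A straight line on log-log axes corresponds to a power law, so one common approach is to fit a line to the logarithms of the data. The sketch below is hedged: the arrays are hypothetical values following y = 3x^2 (the posted data is truncated, so it is not reproduced), and non-positive points are dropped first because their logarithm is undefined and the posted `x` contains zeros. `predict` is a helper name introduced here for illustration.

```python
import numpy as np

# Hypothetical data following y = 3 * x**2; the leading zero mimics
# the zeros present in the posted x values.
x = np.array([0, 1, 2, 4, 8, 16, 32], dtype=float)
y = 3.0 * x**2

# Keep only strictly positive points: log10 is undefined otherwise
mask = (x > 0) & (y > 0)
log_x, log_y = np.log10(x[mask]), np.log10(y[mask])

# Degree-1 fit in log space: log10(y) = slope * log10(x) + intercept
slope, intercept = np.polyfit(log_x, log_y, 1)

def predict(x_new):
    """Map the fitted line back to the original scale: y = 10**intercept * x**slope."""
    return 10**intercept * np.asarray(x_new, dtype=float)**slope

print(slope, 10**intercept)
```

The fitted curve can then be drawn on the same log-scaled axes by plotting `predict` over a sorted range of positive x values.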

python numpy matplotlib

12
Votes
3
Answers
7612
Views