I am trying to import the nltk package in Python 2.7:
import nltk
stopwords = nltk.corpus.stopwords.words('english')
print(stopwords[:10])
Running this gives me the following error:
LookupError:
**********************************************************************
Resource 'corpora/stopwords' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
So I opened my Python terminal and ran the following:
import nltk
nltk.download()
This gives me:
showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml
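However, this never seems to finish. Running the script again still gives me the same error. Any ideas what is going wrong?
I have the following code: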
import pandas as pd
from sklearn.preprocessing import StandardScaler
import numpy as np
df.columns=['sepal_len', 'sepal_wid', 'petal_len', 'petal_wid', 'class']
df.dropna(how="all", inplace=True) # drops the empty line at file-end
X = df.ix[:,0:4].values
y = df.ix[:,4].values
Next I scale the data and compute the mean:
X_std = StandardScaler().fit_transform(X)
mean_vec = np.mean(X_std, axis=0)
What I don't get is that my output looks like this:
[ -4.73695157e-16 -6.63173220e-16 3.31586610e-16 -2.84217094e-16]
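I don't understand how these values can be anything other than 0. If I scale the data, the mean should be 0, right?
Can anyone explain to me what is happening here?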
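For reference, the effect in this question can be reproduced without scikit-learn. The sketch below standardizes a small column by hand, using made-up sample values (an assumption, not the question's file) and the population standard deviation that StandardScaler also uses. It then checks the mean against zero with a tolerance instead of expecting an exact 0.0.

```python
import math

# Hypothetical sample column; these values are NOT taken from the question's data.
values = [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0]

n = len(values)
mean = sum(values) / n
# Population standard deviation (divide by n, not n - 1).
std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

# Standardize: subtract the mean, divide by the standard deviation.
standardized = [(v - mean) / std for v in values]
mean_after = sum(standardized) / n

# Floating-point round-off usually leaves a tiny residue (around 1e-16)
# rather than an exact zero, so compare with a tolerance.
print(mean_after)
print(math.isclose(mean_after, 0.0, abs_tol=1e-12))
```

Values on the order of 1e-16, like the ones in the output above, are at the limit of double precision and can be treated as zero for practical purposes.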
I created a plot of Sepal.Length against Sepal.Width (using the iris dataset) with ggplot2.
ggplot(iris, aes(x = Sepal.Width, y = Sepal.Length, col = Species)) + geom_point()
This works fine, but now I would like to add a single blue point to the plot. For example:
df = data.frame(Sepal.Width = 5.6, Sepal.Length = 3.9)
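Any ideas on how to achieve this?
I downloaded Cygwin and Python version 2.5. Now I want to set up a deep learning machine on AWS (following this tutorial: https://www.youtube.com/watch?v=8rjRfW4JM2I).
If I run pip install awscli I get this (which is fine):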
$ pip install awscli
Requirement already satisfied: awscli in c:\users\marc\anaconda2\lib\site-packages
Requirement already satisfied: s3transfer<0.2.0,>=0.1.9 in c:\users\marc\anaconda2\lib\site-packages (from awscli)
Requirement already satisfied: rsa<=3.5.0,>=3.1.2 in c:\users\marc\anaconda2\lib\site-packages (from awscli)
Requirement already satisfied: PyYAML<=3.12,>=3.10 in c:\users\marc\anaconda2\lib\site-packages (from awscli)
Requirement already satisfied: docutils>=0.10 in c:\users\marc\anaconda2\lib\site-packages (from awscli)
Requirement already satisfied: botocore==1.4.92 in c:\users\marc\anaconda2\lib\site-packages (from awscli)
Requirement already satisfied: colorama<=0.3.7,>=0.2.5 in c:\users\marc\anaconda2\lib\site-packages (from awscli)
Requirement already satisfied: futures<4.0.0,>=2.2.0; python_version == "2.6" or python_version == …
I collected some Twitter data:
#connect to twitter API
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)
#set radius and amount of requests
N=200 # tweets to request from each query
S=200 # radius in miles
lats=c(38.9,40.7)
lons=c(-77,-74)
roger=do.call(rbind,lapply(1:length(lats), function(i) searchTwitter('Roger+Federer',
lang="en",n=N,resultType="recent",
geocode=paste (lats[i],lons[i],paste0(S,"mi"),sep=","))))
After this I did:
rogerlat=sapply(roger, function(x) as.numeric(x$getLatitude()))
rogerlat=sapply(rogerlat, function(z) ifelse(length(z)==0,NA,z))
rogerlon=sapply(roger, function(x) as.numeric(x$getLongitude()))
rogerlon=sapply(rogerlon, function(z) ifelse(length(z)==0,NA,z))
data=as.data.frame(cbind(lat=rogerlat,lon=rogerlon))
Now I want to get all tweets that have lon and lat values:
data=filter(data, !is.na(lat),!is.na(lon))
lonlat=select(data,lon,lat)
But now I only get NA values. Any ideas on what is going wrong here?
I have an array that looks like this:
Dim values(1 To 3) As String
values(1) = Sheets("risk_cat_2").Cells(4, 6).Value
values(2) = Sheets("risk_cat_2").Cells(5, 6).Value
values(3) = Sheets("risk_cat_2").Cells(6, 6).Value
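What I would like to do now is get the maximum of all the string values in the array. Is there a simple way in VBA to get the maximum value from an array?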
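One thing to watch for with an array declared As String is that text comparison and numeric comparison can disagree. The sketch below shows the difference in Python rather than VBA, using made-up cell contents (an assumption, since the sheet values are not shown). It is meant only to illustrate why the values should be converted to numbers before taking a maximum.

```python
# Hypothetical cell contents, held as strings like the array in the question.
values = ["10", "9", "2.5"]

# String comparison is character by character, so "9" ranks above "10".
max_as_text = max(values)

# Converting to numbers first gives the numeric maximum.
max_as_number = max(float(v) for v in values)

print(max_as_text)    # 9
print(max_as_number)  # 10.0
```

The same idea applies in any language: convert each element to a numeric type, then take the maximum of the converted values.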
I have a data frame that looks like this:
weather <- c("good", "good", "good", "bad", "bad", "good")
temp <- c("high", "low", "low", "high", "low", "low")
golf <- c("yes", "no", "yes", "no", "yes" , "no")
df <- data.frame(weather, temp, golf)
What I would like to do now is use the naive Bayes method to get the probabilities for this new dataset:
df_new <- data.frame(weather = "good", temp = "low")
So I tried:
library(e1071)
model <- naiveBayes(golf ~.,data=df)
predict(model, df_new)
But this gives me:
NO
Any idea how I can turn this into probabilities?
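To make the requested quantity concrete, here is a hand-rolled sketch of the naive Bayes posterior for the six rows above, written in plain Python with no smoothing. It illustrates the arithmetic only and is not the e1071 implementation. In e1071 itself, the predict method for naiveBayes models accepts type = "raw" to return class probabilities instead of a label.

```python
from collections import Counter

# Training rows from the question: (weather, temp, golf).
rows = [
    ("good", "high", "yes"),
    ("good", "low", "no"),
    ("good", "low", "yes"),
    ("bad", "high", "no"),
    ("bad", "low", "yes"),
    ("good", "low", "no"),
]
new_row = ("good", "low")  # the observation in df_new

# Number of training rows per class label.
class_counts = Counter(label for *_, label in rows)

scores = {}
for label, count in class_counts.items():
    # Prior P(class) times the product of P(feature value | class).
    score = count / len(rows)
    for i, value in enumerate(new_row):
        matches = sum(1 for r in rows if r[2] == label and r[i] == value)
        score *= matches / count
    scores[label] = score

# Normalize the joint scores so they sum to 1.
total = sum(scores.values())
posterior = {label: s / total for label, s in scores.items()}
print(posterior)
```

For this particular data the two classes come out tied at 0.5 each, so the class label alone hides how close the decision is.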
I have the following code:
train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
"We can see the shining sun, the bright sun.")
Now I am trying to count the term frequencies like this:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()
Next I want to print the vocabulary, so I do this:
vectorizer.fit_transform(train_set)
print vectorizer.vocabulary
Now the output I get is None, although I expected something like this:
{'blue': 0, 'sun': 1, 'bright': 2, 'sky': 3}
Any ideas what went wrong?
I am trying to shuffle my data using the following code.
import pandas as pd
import numpy as np
from sklearn.naive_bayes import MultinomialNB
data = pd.read_csv('dataset.txt')
np.random.shuffle(data)
However, running it gives me the following error. I don't understand where this error comes from.
Traceback (most recent call last):
File "sample2.py", line 12, in <module>
np.random.shuffle(data)
File "mtrand.pyx", line 4668, in mtrand.RandomState.shuffle (numpy/random/mtrand/mtrand.c:30498)
File "mtrand.pyx", line 4671, in mtrand.RandomState.shuffle (numpy/random/mtrand/mtrand.c:30438)
File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.py", line 1992, in __getitem__
return self._getitem_column(key)
File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
result = result[key]
File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.py", line 1992, in __getitem__
return self._getitem_column(key)
File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/frame.py", line 1999, in _getitem_column
return self._get_item_cache(key)
File "/Users/marcvanderpeet/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/core/generic.py", …
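The traceback suggests that the in-place shuffle indexes the DataFrame with integers, which pandas resolves as column lookups. A common alternative is to shuffle a list of row positions and reorder the rows by it. Below is a minimal sketch of that idea, with a plain list of dicts standing in for the DataFrame (the rows are hypothetical, not the contents of dataset.txt). In pandas, DataFrame.sample(frac=1) returns the rows in random order in a similar spirit.

```python
import random

# Hypothetical rows standing in for the contents of dataset.txt.
rows = [
    {"text": "first", "label": 0},
    {"text": "second", "label": 1},
    {"text": "third", "label": 0},
    {"text": "fourth", "label": 1},
]

rng = random.Random(0)  # seeded only so the sketch is repeatable

# Shuffle the row positions, not the table object itself.
positions = list(range(len(rows)))
rng.shuffle(positions)

# Rebuild the table in the shuffled order; the original list is untouched.
shuffled = [rows[i] for i in positions]
print(positions)
```

Keeping the permutation separate from the data also makes it easy to apply the same ordering to a matching label column.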