When scaling data, why does the training set use 'fit' and 'transform', while the test set uses only 'transform'?
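A minimal sketch of the convention this question is about, assuming scikit-learn's `StandardScaler` and hypothetical toy arrays `X_train` and `X_test` (neither array comes from the question). The scaling parameters are estimated once from the training data and then reused unchanged on the test data, so both sets end up in the same coordinate system and no information from the test set leaks into preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the real feature matrices.
rng = np.random.RandomState(0)
X_train = rng.normal(loc=10.0, scale=3.0, size=(200, 2))
X_test = rng.normal(loc=10.0, scale=3.0, size=(50, 2))

scaler = StandardScaler()
# fit_transform learns mean_ and scale_ from the training data, then applies them.
X_train_scaled = scaler.fit_transform(X_train)
# transform reuses the training statistics; nothing is re-estimated from the test set.
X_test_scaled = scaler.transform(X_test)

# The same result computed by hand from the training statistics.
manual = (X_test - scaler.mean_) / scaler.scale_
print(np.allclose(X_test_scaled, manual))
```

Calling `fit_transform` on the test set instead would rescale it with its own mean and standard deviation, so a model trained on the scaled training data would see differently shifted inputs at test time.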
from random import seed, randint
import numpy as np
from sklearn.preprocessing import StandardScaler

SAMPLE_COUNT = 5000
TEST_COUNT = 20000
seed(0)
sample = list()
test_sample = list()
# Single pass over the file: the first SAMPLE_COUNT lines fill `sample`;
# later lines randomly replace an entry in `sample` or go to `test_sample`.
# The file is opened in text mode so that split(',') works on str lines.
for index, line in enumerate(open('covtype.data')):
    if index < SAMPLE_COUNT:
        sample.append(line)
    else:
        r = randint(0, index)
        if r < SAMPLE_COUNT:
            sample[r] = line
        else:
            k = randint(0, index)
            if k < TEST_COUNT:
                if len(test_sample) < TEST_COUNT:
                    test_sample.append(line)
                else:
                    test_sample[k] = line
# Parse each sampled line into a list of floats.
for n, line in enumerate(sample):
    sample[n] = list(map(float, line.strip().split(',')))
y = np.array(sample)[:, -1]
scaling = StandardScaler()
X = scaling.fit_transform(np.array(sample)[:, :-1])  # fit and transform are both used here
for …

An error occurs when using the groupby method:
import numpy as np
import pandas as pd
data = pd.Series(np.random.randn(100), index=pd.date_range('01/01/2001', periods=100))
keys = lambda x: [x.year,x.month]
data.groupby(keys).mean()
But it raises an error: TypeError: unhashable type: 'list'. I want to group by year and month and then compute the mean. Why does this fail?
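One possible way around the error, sketched under the assumption that pandas and NumPy are installed: group keys have to be hashable, and the key function above returns a list. Passing one key array per grouping level avoids the function altogether. The name `monthly_mean` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.random.randn(100),
                 index=pd.date_range('01/01/2001', periods=100))

# Pass one key array per level instead of a function that returns a list.
monthly_mean = data.groupby([data.index.year, data.index.month]).mean()
print(monthly_mean)
```

The result is indexed by (year, month) pairs. Returning a hashable tuple from the key function, such as `lambda x: (x.year, x.month)`, is another commonly suggested variant that is not shown here.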
Why can't Python's plotly package display a figure in RMarkdown, while matplotlib can? For example:
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
```
```{r}
library(plotly)
subplot(
  plot_ly(mpg, x = ~cty, y = ~hwy, name = 'default'),
  plot_ly(mpg, x = ~cty, y = ~hwy) %>%
    add_markers(alpha = 0.2, name = 'alpha'),
  plot_ly(mpg, x = ~cty, y = ~hwy) %>%
    add_markers(symbols = I(1), name = 'hollow')
)
```
```{python}
import plotly
import plotly.plotly as py
import plotly.graph_objs as go
import numpy as np
plotly.tools.set_credentials_file(username='xxx', api_key='xxx')
N = 500
trace0 = go.Scatter(x …
```

An error occurs when using tensorflow.Variable:
import tensorflow as tf
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32,[None, 784])
W = tf.Variable(tf.zeros[784,10])
b = tf.Variable(tf.zeros[10])
But it raises an error:
TypeError                                 Traceback (most recent call last)
<ipython-input-8-3086abe5ee8f> in <module>()
----> 1 W = tf.Variable(tf.zeros[784,10])
      2 b = tf.Variable(tf.zeros[10])
TypeError: 'function' object is not subscriptable
I can't see what is wrong. Can anyone help? (The TensorFlow version is 0.12.0.)
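The message points at the square brackets: `tf.zeros[784,10]` indexes the function object instead of calling it. The sketch below reproduces the same error in plain Python with a hypothetical stand-in function `zeros` (it is not TensorFlow's implementation), then shows the call form. In TensorFlow the corresponding fix would presumably be `tf.Variable(tf.zeros([784, 10]))` and `tf.Variable(tf.zeros([10]))`.

```python
def zeros(shape):
    """Hypothetical stand-in for tf.zeros: builds a nested list of zeros."""
    rows, cols = shape
    return [[0.0] * cols for _ in range(rows)]

# Square brackets directly after the name try to index the function object itself.
try:
    zeros[784, 10]
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# Calling the function with the shape passed as one list argument works.
W = zeros([784, 10])
print(len(W), len(W[0]))
```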
I have a PySpark RDD that can be collected as a list of tuples, as follows:
from collections import defaultdict

rdds = self.sc.parallelize([(("good", "spark"), 1), (("sood", "hpark"), 1), (("god", "spak"), 1),
                            (("food", "spark"), 1), (("fggood", "ssspark"), 1), (("xd", "hk"), 1),
                            (("good", "spark"), 7), (("good", "spark"), 3), (("good", "spark"), 4),
                            (("sood", "hpark"), 5), (("sood", "hpark"), 7), (("xd", "hk"), 2),
                            (("xd", "hk"), 1), (("fggood", "ssspark"), 2), (("fggood", "ssspark"), 1)], 6)
rdds.glom().collect()

# Sum the values per key within a single partition.
def inner_map_1(p):
    d = defaultdict(int)
    for row in p:
        d[row[0]] += row[1]
    for item in d.items():
        yield item

rdd2 = rdds.partitionBy(4, partitionFunc=lambda x: hash(x)).mapPartitions(inner_map_1)
print(rdd2.glom().collect())

def inner_map_2(p):
    for …

I am using the Elasticsearch Python API to create a mapping, but something goes wrong:
from elasticsearch import Elasticsearch

es = Elasticsearch("localhost:9200")
request_body = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
    },
    'mappings': {
        'examplecase': {
            'properties': {
                'tbl_id': {'index': 'not_analyzed', 'type': 'string'},
                'texts': {'index': 'analyzed', 'type': 'string'},
            }
        }
    }
}
es.indices.create(index='example_index', body=request_body)
It raises elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'No handler for type [string] declared on field [texts]'). I found some suggested solutions that say to use text instead of string as the field type, but that also fails: elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'Failed to parse mapping [examplecase]: Could not convert [texts.index] to boolean'). The Elasticsearch version is elasticsearch-6.5.4. How can I deal …
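Both messages are consistent with the mapping changes made in Elasticsearch 5.x and later: the `string` type was split into `text` (analyzed) and `keyword` (not analyzed), and the `index` parameter now takes a boolean rather than `analyzed` or `not_analyzed`. A sketch of a request body along those lines follows. It only builds the dictionary and does not contact a cluster, and whether it matches the intent of the truncated question is an assumption.

```python
# Mapping body sketch for Elasticsearch 6.x (assumed from the version in the question).
request_body = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
    },
    "mappings": {
        "examplecase": {
            "properties": {
                # Formerly a not_analyzed string: stored as an exact-value keyword.
                "tbl_id": {"type": "keyword"},
                # Formerly an analyzed string: full-text field, indexed by default.
                "texts": {"type": "text"},
            }
        }
    },
}

properties = request_body["mappings"]["examplecase"]["properties"]
print(properties)
```

This dictionary would be passed to the same `es.indices.create(index='example_index', body=request_body)` call used in the question.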

I have two lists:
a = [1,2,3,4]
b = [True,False,True,False]
I want to get the elements of a that correspond to True in b, without using a for loop. I tried to solve it with the map function, but the result is wrong:
def f(x, y):
    if x:
        return y
s = list(map(f, b, a))
s is [1, None, 3, None]. I don't want the None values. What should I do?
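One way to drop the unwanted entries without an explicit loop is `itertools.compress` from the standard library, which keeps only the items whose selector is truthy. A minimal sketch, where the names `selected` and `selected_again` are introduced for illustration:

```python
from itertools import compress

a = [1, 2, 3, 4]
b = [True, False, True, False]

# compress keeps the items of `a` whose matching selector in `b` is truthy.
selected = list(compress(a, b))
print(selected)  # [1, 3]

# Equivalent list comprehension over the paired elements.
selected_again = [value for value, keep in zip(a, b) if keep]
```

The map-based version returns None for the False positions because `f` falls through without a return value. `compress` skips those positions instead of emitting a placeholder.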