Posts by lit*_*ely

When scaling data, why does the training dataset use both 'fit' and 'transform', while the test dataset only uses 'transform'?

from random import seed, randint

import numpy as np
from sklearn.preprocessing import StandardScaler

SAMPLE_COUNT = 5000
TEST_COUNT = 20000
seed(0)
sample = list()
test_sample = list()
# Reservoir-style sampling of training rows and test rows from the file
for index, line in enumerate(open('covtype.data', 'r')):
    if index < SAMPLE_COUNT:
        sample.append(line)
    else:
        r = randint(0,index)
        if r < SAMPLE_COUNT:
            sample[r] = line
        else:
            k = randint(0,index)
            if k < TEST_COUNT:
                if len(test_sample) < TEST_COUNT:
                    test_sample.append(line)
                else:
                    test_sample[k] = line
# Parse each sampled row into a list of floats
for n, line in enumerate(sample):
    sample[n] = list(map(float, line.strip().split(',')))
y = np.array(sample)[:,-1]
scaling = StandardScaler()

X = scaling.fit_transform(np.array(sample)[:,:-1])  # fit and transform are both used here

for …
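
One way to look at the question: `fit` learns the per-feature mean and standard deviation, and `transform` applies the stored values. Fitting only on the training set keeps test-set statistics out of the model and puts both sets on the same scale. The sketch below illustrates this with a hand-rolled, single-feature stand-in for a standard scaler; the class `TinyScaler` and the sample numbers are hypothetical and are not scikit-learn's implementation.

```python
from statistics import fmean, pstdev


class TinyScaler:
    """Hypothetical single-feature stand-in for a standard scaler."""

    def fit(self, values):
        # Learn the statistics from the data passed in (the training set).
        self.mean_ = fmean(values)
        self.scale_ = pstdev(values) or 1.0  # guard against zero variance
        return self

    def transform(self, values):
        # Apply the stored statistics; nothing is re-learned here.
        return [(v - self.mean_) / self.scale_ for v in values]


train = [2.0, 4.0, 6.0, 8.0]  # made-up training feature
test = [5.0, 11.0]            # made-up test feature

scaler = TinyScaler().fit(train)        # fit on the training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)    # reuse the training mean and std
```

If the scaler were fitted again on the test set, the same raw value could map to different scaled values in the two sets, and a model trained on one scale would be evaluated on another.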


python scikit-learn

lit*_*ely

lucky-day

17
votes

3
answers

10k
views

TypeError: unhashable type: 'list' when using groupby in Python

I get an error when using the groupby method:

data = pd.Series(np.random.randn(100),index=pd.date_range('01/01/2001',periods=100))
keys = lambda x: [x.year,x.month]
data.groupby(keys).mean()


But it raises an error: TypeError: unhashable type: 'list'. I want to group by year and month and then compute the mean. Why does this fail?
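
The error arises because every group key must be hashable, and a list is not. A minimal sketch of one workaround, assuming a reasonably recent pandas, is to pass the year and the month of the index as two separate key arrays; returning a tuple instead of a list from the key function is another commonly suggested option. The seeded random generator is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.Series(rng.standard_normal(100),
                 index=pd.date_range('01/01/2001', periods=100))

# Group by two key arrays (year and month) instead of a function
# that returns an unhashable list for each index label.
monthly_mean = data.groupby([data.index.year, data.index.month]).mean()
print(monthly_mean)
```

The 100 daily values starting on 2001-01-01 run into April, so the result has one row per (year, month) pair.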

python python-2.7 pandas pandas-groupby

lit*_*ely

2017 06-04

12
votes

2
answers

6586
views

Displaying Python plotly graphs in an RMarkdown HTML document

Why can't Python's plotly package display figures in RMarkdown, while matplotlib can? For example:

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
```

```{r}
library(plotly)
subplot(
     plot_ly(mpg, x = ~cty, y = ~hwy, name = 'default'),
     plot_ly(mpg, x = ~cty, y = ~hwy) %>%
         add_markers(alpha = 0.2, name = 'alpha'),
     plot_ly(mpg, x = ~cty, y = ~hwy) %>%
         add_markers(symbols = I(1), name = 'hollow')
 )
```

```{python}
import plotly
import plotly.plotly as py
import plotly.graph_objs as go
import numpy as np

plotly.tools.set_credentials_file(username='xxx', api_key='xxx')

N = 500
trace0 = go.Scatter(x …
```


python r r-markdown plotly

lit*_*ely

2018 05-06

4
votes

1
answer

429
views

TypeError: 'function' object is not subscriptable in TensorFlow

I get an error when using tensorflow.Variable:

import tensorflow as tf
sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32,[None, 784])
W = tf.Variable(tf.zeros[784,10])
b = tf.Variable(tf.zeros[10])


But it shows an error:

TypeError                                 Traceback (most recent call last)
<ipython-input-8-3086abe5ee8f> in <module>()
----> 1 W = tf.Variable(tf.zeros[784,10])
      2 b = tf.Variable(tf.zeros[10])


TypeError: 'function' object is not subscriptable

I don't know what went wrong. Can anyone help me? (The TensorFlow version is 0.12.0.)
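
The message points at `tf.zeros[784,10]`: square brackets try to index the function object itself, whereas the shape has to be passed as an argument, as in `tf.zeros([784, 10])` and `tf.zeros([10])`. The sketch below reproduces the same error without TensorFlow; the `zeros` function is a hypothetical stand-in used only to show the difference between subscripting and calling.

```python
def zeros(shape):
    """Hypothetical stand-in for tf.zeros: a flat list with one 0.0 per element."""
    size = 1
    for dim in shape:
        size *= dim
    return [0.0] * size


try:
    zeros[784, 10]          # square brackets index the function object itself
except TypeError as exc:
    message = str(exc)

weights = zeros([784, 10])  # parentheses call it, with the shape as one argument
```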

python-3.x tensorflow

lit*_*ely

2017 06-01

3
votes

1
answer

7044
views

What is the difference between partitionBy and groupBy in Spark?

I have a pyspark RDD that can be collected as a list of tuples, like this:

rdds = self.sc.parallelize([(("good", "spark"), 1), (("sood", "hpark"), 1), (("god", "spak"), 1),
                                (("food", "spark"), 1), (("fggood", "ssspark"), 1), (("xd", "hk"), 1),
                                (("good", "spark"), 7), (("good", "spark"), 3), (("good", "spark"), 4),
                                (("sood", "hpark"), 5), (("sood", "hpark"), 7), (("xd", "hk"), 2),
                                (("xd", "hk"), 1), (("fggood", "ssspark"), 2), (("fggood", "ssspark"), 1)], 6)
rdds.glom().collect()

def inner_map_1(p):
    d = defaultdict(int)
    for row in p:
        d[row[0]] += row[1]
    for item in d.items():
        yield item

rdd2 = rdds.partitionBy(4, partitionFunc=lambda x: hash(x)).mapPartitions(inner_map_1)
print(rdd2.glom().collect())

def inner_map_2(p):
    for …


python apache-spark

lit*_*ely

lucky-day

3
votes

1
answer

8401
views

elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'No handler for type [string] declared on field [texts]')

I am using the Elasticsearch Python API to create a mapping, but something goes wrong:

es = Elasticsearch("localhost:9200")
request_body = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
    },
    'mappings': {
        'examplecase': {
            'properties': {
                'tbl_id': {'index': 'not_analyzed', 'type': 'string'},
                'texts': {'index': 'analyzed', 'type': 'string'},
            }
        }
    }
}
es.indices.create(index='example_index', body=request_body)


It shows elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'No handler for type [string] declared on field [texts]'). I found some suggested solutions that say to use text instead of string as the field type, but that also fails: elasticsearch.exceptions.RequestError: RequestError(400, 'mapper_parsing_exception', 'Failed to parse mapping [examplecase]: Could not convert [texts.index] to boolean'). The Elasticsearch version is elasticsearch-6.5.4. How can I deal …
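
Both messages are consistent with the mapping using pre-5.x conventions: the `string` type was replaced by `text` (analyzed) and `keyword` (exact match), and `index` now expects a boolean rather than `'analyzed'` or `'not_analyzed'`. A minimal sketch of the request body under that assumption follows; it only builds the dictionary, and the final call is left commented out because it needs a running cluster.

```python
# Mapping rewritten with Elasticsearch 6.x field types.
request_body = {
    "settings": {"number_of_shards": 5, "number_of_replicas": 1},
    "mappings": {
        "examplecase": {
            "properties": {
                # exact-match field: was 'string' with 'not_analyzed'
                "tbl_id": {"type": "keyword"},
                # full-text field: was 'string' with 'analyzed'
                "texts": {"type": "text"},
            }
        }
    },
}

# Assumes the same client and index name as in the question:
# es.indices.create(index='example_index', body=request_body)
```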

python elasticsearch

lit*_*ely

lucky-day

2
votes

1
answer

10k
views

Wrong result when using map to get the elements of one list that correspond to True in another list in Python

I have two lists:

a = [1,2,3,4]
b = [True,False,True,False]


I want to get the elements of a that correspond to True in b, without using a for loop. I used the map function to solve it, but the result is wrong:

def f(x,y):
    if x:
        return y
s = list(map(f,b,a))


s is [1, None, 3, None], but I don't want the None values. What should I do?
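
`map` always produces one output per input, so it cannot drop items; when `f` falls through without returning, the result is `None`. Selecting elements by a boolean mask is a filtering task, and one possible approach is `itertools.compress` from the standard library, sketched here with the lists from the question.

```python
from itertools import compress

a = [1, 2, 3, 4]
b = [True, False, True, False]

# Keep the items of a whose matching selector in b is truthy.
selected = list(compress(a, b))
print(selected)  # → [1, 3]
```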

python list

lit*_*ely

2018 08-14

1
votes

1
answer

71
views

Why is there a NaN value when cutting data into bins?

I ran into a problem:

Why is there a NaN value?

python ipython

lit*_*ely

lucky-day

-4
votes

1
answer

1952
views

Tag statistics

python ×7

apache-spark ×1

elasticsearch ×1

ipython ×1

list ×1

pandas ×1

pandas-groupby ×1

plotly ×1

python-2.7 ×1

python-3.x ×1

r ×1

r-markdown ×1

scikit-learn ×1

tensorflow ×1
