Here is my code:
import numpy as np
print(np.std(np.array([0,1])))
It produces 0.5.
I am pretty sure this is incorrect. What am I doing wrong?
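For context, a minimal sketch of my own (not from any tutorial) comparing NumPy's default population standard deviation (ddof=0) with the sample standard deviation (ddof=1):

import numpy as np

a = np.array([0, 1])
print(np.std(a))          # 0.5: population std, divides by N (ddof defaults to 0)
print(np.std(a, ddof=1))  # ~0.7071: sample std, divides by N - 1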
I cannot understand the following output. I expected NumPy to return -10 (or approximately that). Why is the result a complex number?
print((-1000)**(1/3.))
NumPy returns
(5+8.660254037844384j)
The official NumPy tutorial says the answer is nan; you can find it in the middle of that tutorial.
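For comparison, a minimal sketch of my own (np.cbrt assumes NumPy >= 1.10) showing ways to get the real cube root of a negative number:

import numpy as np

print((-1000) ** (1 / 3.))  # Python 3: principal (complex) cube root
print(np.cbrt(-1000.0))     # -10.0: np.cbrt returns the real cube root
print(-(1000 ** (1 / 3.)))  # ~-10.0: factor the sign out by hand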
It looks like scipy.spatial.distance.cdist defines the cosine distance as:
1 - u*v / (||u|| ||v||)
which differs from sklearn.metrics.pairwise.cosine_similarity, which is
u*v / (||u|| ||v||)
Does anyone know the reason for the different definitions?
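A small sketch of my own (u and v are arbitrary vectors I made up) that checks the relationship numerically; the cdist value should equal one minus the sklearn similarity:

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.metrics.pairwise import cosine_similarity

u = np.array([[1.0, 2.0, 3.0]])
v = np.array([[4.0, 5.0, 6.0]])
dist = cdist(u, v, metric='cosine')  # cosine distance: 1 - cos(u, v)
sim = cosine_similarity(u, v)        # cosine similarity: cos(u, v)
print(dist + sim)                    # ~1.0, i.e. dist == 1 - sim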
Here is my example:
mydf<-data.frame('col_1'=c('A','A','B','B'), 'col_2'=c(100,NA, 90,30))
I would like to group by col_1 and count the non-NA elements in col_2, and I would like to do this with dplyr.
Here is what I tried after searching SO:
mydf %>% group_by(col_1) %>% summarise_each(funs(!is.na(col_2)))
mydf %>% group_by(col_1) %>% mutate(non_na_count = length(col_2, na.rm=TRUE))
mydf %>% group_by(col_1) %>% mutate(non_na_count = count(col_2, na.rm=TRUE))
None of them work. Any suggestions?
Here is an example from the MongoDB tutorial (here, for the zipcodes collection of the db):
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
If I replace _id with something else, for example Test, I get this error message:
"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0
Can someone help me understand why _id is required in my command? I thought MongoDB assigned an ID automatically if one is not provided.
I have observed that scikit-learn's clf.tree_.feature occasionally contains negative values, for example -2. As far as I understand, clf.tree_.feature should give the sequential indices of the features. If we have a list of feature names
['feature_one', 'feature_two', 'feature_three']
then -2 would refer to feature_two. I am surprised by the negative index; it would make more sense to refer to feature_two by index 1 (-2 is a convenience for human readers, not for machine processing). Am I reading this correctly?
Update: here is an example:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def leaf_ordering():
    X = np.genfromtxt('X.csv', delimiter=',')
    Y = np.genfromtxt('Y.csv', delimiter=',')
    dt = DecisionTreeClassifier(min_samples_leaf=10, random_state=99)
    dt.fit(X, Y)
    print(dt.tree_.feature)
Here is the output:
[ 8 9 -2 -2 9 4 -2 9 8 -2 -2 0 0 9 9 8 -2 -2 9 -2 -2 6 -2 -2 -2
2 -2 9 8 6 9 -2 -2 -2 8 9 -2 9 6 -2 -2 -2 6 -2 -2 …
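For what it is worth, here is a tiny sketch of my own (toy data, not the X.csv/Y.csv above) suggesting that the -2 entries line up with leaf nodes rather than indexing a feature:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
dt = DecisionTreeClassifier(random_state=0).fit(X, y)
print(dt.tree_.feature)        # e.g. [ 0 -2 -2]: one split on feature 0, then two leaves
print(dt.tree_.children_left)  # leaves are marked -1 here, at the same positions as the -2 entries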
Here is my code:
import pandas as pd
df = pd.DataFrame(columns = ["A", "B"])
df.iloc[0]['A'] = 5
Here is the output:
Traceback (most recent call last):
File "K:/Dop/Pentas/Simpletest/Temp.py", line 38, in <module>
df.iloc[0]['A'] = 5
File "C:\Python34\lib\site-packages\pandas\core\indexing.py", line 1189, in __getitem__
return self._getitem_axis(key, axis=0)
File "C:\Python34\lib\site-packages\pandas\core\indexing.py", line 1480, in _getitem_axis
return self._get_loc(key, axis=axis)
File "C:\Python34\lib\site-packages\pandas\core\indexing.py", line 89, in _get_loc
return self.obj._ixs(key, axis=axis)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1719, in _ixs
label = self.index[i]
File "C:\Python34\lib\site-packages\pandas\core\index.py", line 1076, in __getitem__
return getitem(key)
IndexError: index 0 is out of bounds …
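For reference, a minimal sketch of my own (not claiming it is the intended fix) contrasting positional iloc indexing, which needs the row to already exist, with label-based loc assignment, which can create it:

import pandas as pd

df = pd.DataFrame(columns=["A", "B"])
df.loc[0, "A"] = 5      # loc can create a new row labelled 0
print(df)
# df.iloc[0]["A"] = 5   # iloc is positional; with zero rows, position 0 does not exist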
I am trying to write a Pandas DataFrame to a SQL Server table. Here is my example:
import pyodbc
import pandas as pd
import sqlalchemy
df = pd.DataFrame({'MDN': [242342342] })
engine = sqlalchemy.create_engine('mssql://localhost/Sandbox?trusted_connection=yes')
df.to_sql('Test',engine, if_exists = 'append',index = False)
I get the following error message. Any ideas on how to fix it?
c:\python34\lib\site-packages\sqlalchemy\connectors\pyodbc.py:82: SAWarning: No driver name specified; this is expected by PyODBC when using DSN-less connections
"No driver name specified; "
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-25-78677a18ce2d> in <module>()
4 engine = sqlalchemy.create_engine('mssql://localhost/Sandbox?trusted_connection=yes')
5
----> 6 df.to_sql('Test',engine, if_exists = 'append',index = False)
7
8 #cnxn.close()
c:\python34\lib\site-packages\pandas\core\generic.py in to_sql(self, name, con, flavor, schema, if_exists, index, …
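For what it is worth, here is a hedged sketch of the more explicit mssql+pyodbc URL form I have seen used (not verified against this exact pandas/SQLAlchemy combination; the driver name is an assumption about what is installed locally):

import pandas as pd
import sqlalchemy

# Name the ODBC driver explicitly in the SQLAlchemy URL
# ("ODBC Driver 17 for SQL Server" is an assumption; substitute whatever driver is installed).
engine = sqlalchemy.create_engine(
    'mssql+pyodbc://localhost/Sandbox'
    '?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes'
)
df = pd.DataFrame({'MDN': [242342342]})
df.to_sql('Test', engine, if_exists='append', index=False)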
NumPy has a nice function for generating multi-dimensional grids. It is easy to use when the number of dimensions is small and known in advance, but what do you do when the number of dimensions is only known at execution time, or when it is so large that typing it all out takes too long? I think I am looking for something like
import numpy as np
x = np.meshgrid(y)
where y is an array of arrays of evaluation points, for example
y = [array([-3., 0., 3.]) array([-3., 0., 3.]) array([-3., 0., 3.])]
Any suggestions?
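To make the intent concrete, a small sketch of my own showing the kind of call I am after, built with argument unpacking so the number of dimensions can be chosen at run time:

import numpy as np

ndim = 3                            # only known at run time in my real case
y = [np.array([-3., 0., 3.]) for _ in range(ndim)]
grids = np.meshgrid(*y)             # unpack the list into separate arguments
print(len(grids), grids[0].shape)   # 3 arrays, each of shape (3, 3, 3)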
I read the following SO thread and now I am trying to understand it. Here is my example:
import dask.dataframe as dd
import pandas as pd
from dask.multiprocessing import get
import random
df = pd.DataFrame({'col_1':random.sample(range(10000), 10000), 'col_2': random.sample(range(10000), 10000) })
def test_f(col_1, col_2):
    return col_1*col_2
ddf = dd.from_pandas(df, npartitions=8)
ddf['result'] = ddf.map_partitions(test_f, columns=['col_1', 'col_2']).compute(get=get)
It generates the error below. What am I doing wrong? Also, it is not clear to me how to pass additional parameters to the function in map_partitions.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\dask\dataframe\utils.py in raise_on_meta_error(funcname)
136 try:
--> 137 yield
138 except Exception as e:
~\AppData\Local\conda\conda\envs\tensorflow\lib\site-packages\dask\dataframe\core.py in _emulate(func, *args, **kwargs)
3130 with raise_on_meta_error(funcname(func)):
-> 3131 return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
3132
TypeError: test_f() …
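For context, here is a minimal sketch of my own reading of the dask docs (multiply_cols and factor are names I made up, and this is not a verified fix for the version above): map_partitions hands each partition to the function as a pandas DataFrame, and extra positional or keyword arguments are forwarded to that function:

import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame({'col_1': range(100), 'col_2': range(100)})
ddf = dd.from_pandas(df, npartitions=4)

def multiply_cols(part, factor=1):
    # 'part' is one pandas DataFrame partition; 'factor' is a forwarded keyword argument
    return part['col_1'] * part['col_2'] * factor

result = ddf.map_partitions(multiply_cols, factor=2).compute()
print(result.head())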