I am trying to use scikit-learn's naive Bayes classifier (GaussianNB):
gnb = GaussianNB()
gnb.set_params('sigma__0.2')
gnb.fit(np.transpose([xn, yn]), y)
But I get:
set_params() takes exactly 1 argument (2 given)
Now I am trying this code:
gnb = GaussianNB()
arr = np.zeros((len(labs),len(y)))
arr.fill(sigma)
gnb.set_params(sigma_ = arr)
and get:
ValueError: Invalid parameter sigma_ for estimator GaussianNB
Is the parameter name wrong, or the value?
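For context: `set_params` accepts keyword arguments only (`set_params(**params)`), which is why passing the string `'sigma__0.2'` positionally fails; the double-underscore form `step__param` is meant for nested estimators such as pipelines. It also only accepts names listed by `get_params()`. For `GaussianNB` those are `priors` and, in scikit-learn 0.20 and later, `var_smoothing`. `sigma_` (renamed `var_` in newer releases) is the per-class, per-feature variance that `fit` estimates, not a constructor parameter, hence the `ValueError`. If what you actually want is a Gaussian naive Bayes with one fixed standard deviation (0.2 here) rather than estimated variances, one option is a small hand-rolled classifier. The sketch below assumes a single sigma shared by every class and feature; `FixedSigmaGaussianNB` is a hypothetical name, not part of scikit-learn.

```python
import numpy as np


class FixedSigmaGaussianNB:
    """Hypothetical Gaussian naive Bayes with one fixed, shared standard deviation."""

    def __init__(self, sigma=0.2):
        self.sigma = sigma

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Class means and class priors are still estimated from the data;
        # only the variance is fixed.
        self.theta_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared distance of each sample to each class mean: (n_samples, n_classes)
        sq = ((X[:, None, :] - self.theta_[None, :, :]) ** 2).sum(axis=2)
        # With one shared sigma the Gaussian normalising constant is the same for
        # every class, so only the exponent and the prior affect the argmax.
        log_lik = -sq / (2.0 * self.sigma ** 2)
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]


# Toy data in the same shape as np.transpose([xn, yn]) from the question
xn = np.array([0.0, 0.1, 1.0, 1.1])
yn = np.array([0.0, 0.1, 1.0, 0.9])
y = np.array([0, 0, 1, 1])
clf = FixedSigmaGaussianNB(sigma=0.2).fit(np.transpose([xn, yn]), y)
print(clf.predict([[0.05, 0.0], [1.0, 1.0]]))  # [0 1]
```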
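I have a class with a method that is called many times. The method uses a numpy array as a temporary buffer, and I do not need to keep the buffer's values between calls. Should I make the array an instance member to avoid paying for a memory allocation on every call? I know local variables are usually preferable, but is Python smart enough to allocate the array's memory only once?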
import numpy

class MyClass:
    def __init__(self, n):
        self.temp = numpy.zeros(n)

    def method(self):
        # do some stuff using self.temp
        pass
or
import numpy

class MyClass:
    def __init__(self, n):
        self.n = n

    def method(self):
        temp = numpy.zeros(self.n)
        # do some stuff using temp
        pass
Update: replaced np.empty with np.zeros.
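On whether Python allocates only once: `numpy.zeros` (like `numpy.empty`) builds a new array every time it is called, so the second version allocates on each call to `method`. The underlying allocator may hand back a recently freed block, but that is an implementation detail, not a guarantee. Whether the cost matters depends on `n` and on how much real work the method does, so measuring is the reliable answer. The sketch below times both designs with `timeit`. The doubling-and-sum workload is a made-up stand-in for "do some stuff", and `out=` is NumPy's standard ufunc argument for writing a result into an existing array.

```python
import timeit

import numpy as np


class PerCall:
    """Allocates a fresh temporary buffer on every call."""

    def __init__(self, n):
        self.n = n

    def method(self, x):
        temp = np.zeros(self.n)          # new allocation on each call
        np.multiply(x, 2.0, out=temp)    # stand-in for "do some stuff"
        return temp.sum()


class Preallocated:
    """Allocates the temporary buffer once and reuses it."""

    def __init__(self, n):
        self.temp = np.zeros(n)

    def method(self, x):
        # out= writes into the existing buffer. Every element is overwritten
        # here, so no reset is needed; call self.temp.fill(0) first if the
        # algorithm relies on the buffer starting at zero.
        np.multiply(x, 2.0, out=self.temp)
        return self.temp.sum()


if __name__ == "__main__":
    n = 10_000
    x = np.arange(n, dtype=float)
    for cls in (PerCall, Preallocated):
        obj = cls(n)
        seconds = timeit.timeit(lambda: obj.method(x), number=1_000)
        print(f"{cls.__name__}: {seconds:.4f} s for 1000 calls")
```

One design caveat: a shared member buffer makes `method` non-reentrant. If two threads, or a recursive call, use the same instance at the same time, they overwrite each other's temporary data; the local-variable version has no such problem.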
I start gdb like this (the working directory is /home/leon/Develop/tests/atomic/):
gdb ./bin/lin64/httpress
Then I added the source directories, and gdb accepted them:
Source directories searched: /home/leon/Develop/tests/atomic/third/http_parser:/home/leon/Develop/tests/atomic/src/tools:$cdir:$cwd
When I run my binary, gdb does not show the line in my source code where the segfault occurs. How do I tell gdb where the source files are?
The program is compiled with gcc:
gcc -D_AMD64_ -D_LIN_ -D_LARGEFILE64_SOURCE -D_GNU_SOURCE -m64 -march=core2 -O2 -Wall -I. -I src/include -I src/lib/zlib/ -I src/lib/otg -I third/openssl/include/ -I src/lib/otg/Tools/HostTime/Interfaces/ -I src/lib/otg/Tools/OpenToolsGate/Guest/Interfaces/ -I src/lib/otg/Tools/OpenToolsGate/Guest/Cross -I src/lib/otg/Tools/OpenToolsGate/Common/Interfaces/ -o bin/lin64/httpress -std=c99 -lpthread -lev -lgnutls -O2 -s -DWITH_SSL -Wno-strict-aliasing \
-I /usr/include/libev src/tools/httpress.c -I third/http_parser/ third/http_parser/http_parser.c
OK, I made some changes:
gcc -g -ggdb -D_AMD64_ -D_LIN_ -D_LARGEFILE64_SOURCE -D_GNU_SOURCE -m64 -march=core2 -Wall -I. -I src/include -I src/lib/zlib/ -I src/lib/otg -I third/openssl/include/ -I src/lib/otg/Tools/HostTime/Interfaces/ -I src/lib/otg/Tools/OpenToolsGate/Guest/Interfaces/ -I …
Code:
views = sdf \
.where(sdf['PRODUCT_ID'].isin(PRODUCTS)) \
.rdd \
.groupBy(lambda x: x['SESSION_ID']) \
.toLocalIterator()
for sess_id, rows in views:
    # do something
    pass
PRODUCTS is a set. It is large, about 10,000 items.
The code fails with:
--> 9 for sess_id, rows in views:
/usr/local/spark/python/pyspark/rdd.py in _load_from_socket(port, serializer)
--> 142 for item in serializer.load_stream(rf):
/usr/local/spark/python/pyspark/serializers.py in load_stream(self, stream)
--> 139 yield self._read_with_length(stream)
/usr/local/spark/python/pyspark/serializers.py in _read_with_length(self, stream)
--> 156 length = read_int(stream)
/usr/local/spark/python/pyspark/serializers.py in read_int(stream)
--> 543 length = stream.read(4)
/opt/conda/lib/python3.5/socket.py in readinto(self, b)
574 try:
--> 575 return self._sock.recv_into(b)
576 …
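Separately from the socket error itself, `RDD.groupBy` shuffles the data and builds one iterable of rows per session, which `toLocalIterator` then streams to the driver partition by partition. One alternative worth trying (not verified against this traceback) is to sort the filtered rows by `SESSION_ID`, for example by adding `.orderBy('SESSION_ID')` before `.rdd.toLocalIterator()`, so the rows should arrive in sorted order, and then group them on the driver with `itertools.groupby`, which holds only one session in memory at a time. The helper below is hypothetical and shows only the local grouping step; plain dicts stand in for Spark `Row` objects, which support the same `row['SESSION_ID']` lookup used in the question.

```python
from itertools import groupby
from operator import itemgetter


def sessions(rows, key="SESSION_ID"):
    """Yield (session_id, rows_of_that_session) pairs from an iterable of rows.

    itertools.groupby merges only *consecutive* rows with equal keys, so the
    input must already be sorted (or at least clustered) by `key`.
    """
    for sess_id, group in groupby(rows, key=itemgetter(key)):
        yield sess_id, list(group)


rows = [
    {"SESSION_ID": 1, "PRODUCT_ID": "a"},
    {"SESSION_ID": 1, "PRODUCT_ID": "b"},
    {"SESSION_ID": 2, "PRODUCT_ID": "a"},
]
for sess_id, group in sessions(rows):
    print(sess_id, len(group))  # prints "1 2", then "2 1"
```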
Is it possible to stop a recursive algorithm when it throws an exception we supply, save its state, ask the user something, and then resume the recursion from the saved point?
I have changed the question.
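I read the file system recursively and store the data in a tree. At some point I run into a hidden directory. Can I stop the computation, ask the user whether information about that directory should go into the tree, and then continue the computation?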
About using IO:
obtainTree :: ByteString -> Tree
...
main = print $ obtainTree partition
As far as I understand, to use IO inside the algorithm we have to use a function like this:
obtainTree :: ByteString -> IO Tree
But can we avoid that?
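The pause, ask, resume control flow described above is essentially a coroutine. As a language-neutral illustration in Python (not Haskell, so it does not settle the `IO` type question), a generator can walk a tree recursively, suspend at each hidden entry, receive the user's decision through `send()`, and carry on exactly where it stopped; the suspended generator frames are the saved state. `Node`, `build` and `run` are hypothetical names, and an in-memory tree stands in for the real file system.

```python
from dataclasses import dataclass, field
from typing import Callable, Generator, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a directory entry: a name plus children."""
    name: str
    children: List["Node"] = field(default_factory=list)


def build(node: Node) -> Generator[str, bool, Optional[dict]]:
    """Recursively build a nested dict, pausing at hidden entries.

    Yields the name of each hidden entry and expects the caller to send()
    back True (include it) or False (skip it and its subtree).
    """
    if node.name.startswith("."):
        include = yield node.name        # suspend here until the caller answers
        if not include:
            return None
    result = {}
    for child in node.children:
        sub = yield from build(child)    # questions and answers pass through
        if sub is not None:
            result[child.name] = sub
    return result


def run(root: Node, ask: Callable[[str], bool]) -> Optional[dict]:
    """Drive the generator, asking `ask` each time the traversal pauses."""
    gen = build(root)
    try:
        question = next(gen)
        while True:
            question = gen.send(ask(question))
    except StopIteration as stop:        # the traversal finished
        return stop.value


tree = Node("root", [Node("src", [Node("main.hs")]), Node(".git", [Node("HEAD")])])
print(run(tree, ask=lambda name: False))  # {'src': {'main.hs': {}}}
# Interactively: run(tree, ask=lambda name: input(f"Include {name}? [y/n] ") == "y")
```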