Posts by Ser*_*tch

Parallel computing - Shuffle

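I'm looking to shuffle an array in parallel. I've found that an algorithm similar to bitonic sort, but with a random (50/50) reordering, yields a uniform distribution, but only if the array size is a power of 2. I've considered the Fisher-Yates shuffle, but I can't see how to parallelize it to avoid the O(N) computation.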

Any suggestions?

Thanks!
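One direction sometimes suggested for this kind of problem, and not taken from the original post, is to give every element an independent random key and then sort by key. Key generation is independent per element, and the sort can be any parallel sort, so the approach is not tied to power-of-2 sizes. Below is a minimal serial sketch in Python of that idea; the function name `shuffle_by_random_keys` is hypothetical, and the sketch assumes key collisions are rare enough to ignore (ties would introduce a slight bias).

```python
import random


def shuffle_by_random_keys(items, rng=None):
    """Return a shuffled copy of items by sorting on random keys.

    Hypothetical helper for illustration only. Each key is drawn
    independently, so key generation could run in parallel; the sort
    step stands in for a parallel sort.
    """
    rng = rng or random.Random()
    # One independent random key per element.
    keys = [rng.random() for _ in items]
    # Sort positions by their keys; works for any length, not only powers of 2.
    order = sorted(range(len(items)), key=keys.__getitem__)
    return [items[i] for i in order]


if __name__ == "__main__":
    print(shuffle_by_random_keys(list(range(10))))
```

The total work is O(N log N) because of the sort, so the trade-off is more work overall in exchange for steps that parallelize well.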

algorithm parallel-processing performance multithreading shuffle

pca*_*on2

2023 06-09

5
Votes

1
Answers

2000
Views

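Black screen in QEMU for ARM-Ubuntu (how to get a GUI?)

I host Ubuntu 16.04 in VirtualBox on Windows 10. Inside Ubuntu 16.04, QEMU emulates an ARM processor running Ubuntu Trusty (14.04).

When I start QEMU as follows, it shows a window titled QEMU, but the client area is completely black: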

qemu-system-arm -smp 2 --drive format=raw,if=sd,file=vexpress-8G.img -kernel vmlinuz-3.13.0-24-generic-lpae -initrd initrd.img-3.13.0-24-generic-lpae -M vexpress-a15 -serial stdio -m 2048 -append 'root=/dev/mmcblk0 rw mem=2048M raid=noautodetect rootwait console=ttyAMA0,38400n8 devtmpfs.mount=0' -dtb ./vexpress-v2p-ca15-tc1.dtb


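The console of the guest OS (ARM-Ubuntu) works; boot messages appear in the same terminal where the qemu-system-arm command was run. But when I enter the startx command, it shows an error: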

Loading extension GLX
(EE) 
Fatal server error:
(EE) no screens found(EE) 
(EE) 
Please consult the The X.Org Foundation support 
     at http://wiki.x.org
 for help. 
(EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
(EE) 
(EE) Server terminated with error …


ubuntu arm qemu xserver xorg

Ser*_*tch

lucky-day

5
Votes

0
Answers

1045
Views

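`Cannot open include file: 'apr_perms_set.h'` when running `pip install mod_wsgi`

I'm trying to roll out a production Django environment on Windows 10 using Apache 2.4.37 x64 OpenSSL 1.1.1 VC14 from ApacheHaus. However, when following these instructions, I get the following error: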

  C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -ID:/Servers/Web/Apache/Apache24/include -Ic:\programs\python37\include -Ic:\programs\python37\include "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.6.1\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.17763.0\cppwinrt" /Tcsrc/server\mod_wsgi.c /Fobuild\temp.win-amd64-3.7\Release\src/server\mod_wsgi.obj
  mod_wsgi.c
  d:\servers\web\apache\apache24\include\apr_network_io.h(29): fatal error C1083: Cannot open include file: 'apr_perms_set.h': No such …


c python apache django mod-wsgi

Ser*_*tch

lucky-day

5
Votes

2
Answers

1995
Views

How to detect EOF when reading a file with readline() in Python?

I need to read the file line by line with readline() and can't easily change that. Roughly like this:

with open(file_name, 'r') as i_file:
    while True:
        line = i_file.readline()
        # I need to check that EOF has not been reached, so that readline() really returned something


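The real logic is more involved, so I can't read the whole file at once with readlines() or write something like for line in i_file:.

Is there a way to check for EOF with readline()? Might it throw an exception?

It's very hard to find the answer on the internet, because documentation searches redirect to something irrelevant (a tutorial rather than the reference, or GNU readline), and the noise on the internet is mostly about the readlines() function.

The solution should work on Python 3.6+.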
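For reference, a minimal sketch of one way to do this, assuming a text-mode stream: readline() does not raise at end of file; it returns an empty string, whereas a blank line in the file comes back as "\n". The helper name `read_lines_with_readline` and the in-memory `io.StringIO` sample are illustrative assumptions, not part of the original post.

```python
import io


def read_lines_with_readline(stream):
    """Collect lines from a text stream using only readline()."""
    lines = []
    while True:
        line = stream.readline()
        # At EOF readline() returns '' (no exception is raised).
        # A blank line inside the file is '\n', so it is not mistaken for EOF.
        if line == "":
            break
        lines.append(line)
    return lines


if __name__ == "__main__":
    sample = io.StringIO("first\n\nlast")
    print(read_lines_with_readline(sample))
```

The same check applies to a file opened with open(file_name, 'r'); for binary mode the sentinel would be b"" instead.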

python exception file readline eof

Ser*_*tch

lucky-day

5
Votes

1
Answers

10k
Views

How to set bits of a bit vector efficiently in CUDA?

The task is like How to set bits of a bit vector efficiently in parallel?, but for CUDA.

Consider a bit vector of N bits (N is large, e.g. 4G) and an array of M numbers (M is also large, e.g. 1G), each in the range 0..N-1, indicating which bit of the vector must be set to 1. The bit vector is just an array of integers, specifically uint32_t.

I've tried a naive implementation with …
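To make the task concrete, here is a small serial reference sketch in Python of the operation described above. It is not the poster's CUDA code (which is elided in this excerpt); the function name `set_bits` is hypothetical, and plain Python integers stand in for the uint32_t words.

```python
def set_bits(num_bits, indices):
    """Serial reference: set the bits listed in indices in a 32-bit-word bit vector."""
    # One 32-bit word per 32 bits, rounded up.
    words = [0] * ((num_bits + 31) // 32)
    for i in indices:
        # Word index is i / 32, bit position inside the word is i % 32.
        words[i >> 5] |= 1 << (i & 31)
    return words


if __name__ == "__main__":
    print(set_bits(64, [0, 33, 63]))
```

In a parallel version, several indices can map to the same word, so the read-modify-write on that word would have to be atomic or otherwise coordinated.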

c++ algorithm parallel-processing cuda bit-manipulation

Ser*_*tch

2022 08-09

5
Votes

0
Answers

329
Views

How to define a CUDA device constant like a C++ const/constexpr?

In a .cu file, I tried the following at global scope (i.e. not inside a function):

__device__ static const double cdInf = HUGE_VAL / 4;


and got an nvcc error:

error : dynamic initialization is not supported for __device__, __constant__ and __shared__ variables.


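How can I define a C++ const/constexpr on the device, if that is possible?

Note 1: #define is out of the question, not only for aesthetic reasons but also because in practice the expression is more complex and involves an internal data type, not just double. So calling the constructor each time in each CUDA thread would be too expensive.

Note 2: I doubt the performance of __constant__, because it is not a compile-time constant but rather a variable written with cudaMemcpyToSymbol.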

c++ cuda constants compile-time-constant

Ser*_*tch

2016 09-12

4
Votes

1
Answers

2778
Views

Verbose cmake: how to get more diagnostics?

I'm getting some strange errors from cmake:

loading initial cache file ../../Tweaks/compiler-rt/arm.txt
-- Performing Test COMPILER_RT_HAS_FPIE_FLAG
CMake Error at CMakeLists.txt:2 (set):
  Syntax error in cmake code at

    D:/Work/AcSo/Views/llvmSecond/build-arm/compiler-rt/CMakeFiles/CMakeTmp/CMakeLists.txt:2

  when parsing string

    D:/Work/AcSo/Views/llvmSecond/llvm/projects/compiler-rt/cmake;D:/Work/AcSo/Views/llvmSecond/llvm/projects/compiler-rt/cmake/Modules;D:/Work/AcSo/Views/llvmSecond/llvm/cmake;D:\Work\AcSo\Views\llvmSecond\build-host\Release\lib\cmake\llvm

  Invalid character escape '\W'.


CMake Error: Internal CMake error, TryCompile configure of cmake failed
-- Configuring incomplete, errors occurred!
See also "D:/Work/AcSo/Views/llvmSecond/build-arm/compiler-rt/CMakeFiles/CMakeOutput.log".


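What I can understand is that it somehow got a Windows path and then complains about the \W character sequence. In fact, I did my best to give cmake Linux-style paths, so I don't know where the Windows path comes from. Nothing related to this error seems to appear in CMakeOutput.log.

I'd like to get more diagnostics from cmake, i.e. what it is doing and why, but searching for "cmake verbose" gives results about making make verbose, not cmake itself.

Is there a way to force cmake to be verbose, or to get more diagnostic, debug, or trace output? Or if you can tell what the problem is in my specific case, that would also be appreciated.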

infrastructure build configure cmake verbose

Ser*_*tch

2017 01-16

4
Votes

1
Answers

9476
Views

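Test whether an AVX register contains some equal integers

Consider a 256-bit register containing four 64-bit integers. Is it possible in AVX/AVX2 to test efficiently whether some of these integers are equal?

For example:

a) {43, 17, 25, 8}: the result must be false because no 2 of the 4 numbers are equal.

b) {47, 17, 23, 17}: the result must be true because the number 17 occurs 2 times in the AVX vector register.

If possible, I'd like to do this in C++, but I can drop down to assembly if necessary.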

c++ x86 simd avx avx2

Ser*_*tch

2017 06-16

4
Votes

1
Answers

400
Views

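Setting the stack size programmatically on Windows

Is it possible in WinAPI to set the stack size for the current thread at runtime, like setrlimit does on Linux? I mean increasing the reserved stack size of the current thread if it is too small for the current requirements. This is a library that may be called by threads from other programming languages, so setting the stack size at compile time is not an option.

If not, any ideas about a solution like an assembly trampoline that changes the stack pointer to a dynamically allocated block of memory?

FAQ: A proxy thread is a surefire solution (unless the caller thread has a very small stack). However, thread switching seems to be a performance killer. I need a substantial amount of stack for recursion or for _alloca. This is also for performance, because heap allocation is slow, especially if multiple threads allocate from the heap in parallel (they get blocked by the same libc/CRT mutex, so the code becomes serial).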

c++ memory winapi stack setrlimit

Ser*_*tch

2017 07-24

4
Votes

2
Answers

969
Views

When are class variables initialized in Python?

Consider the following Python 3 code:

class A:
    b = LongRunningFunctionWithSideEffects()


When will LongRunningFunctionWithSideEffects() be called? At the moment the module is imported? Or at the moment the class is first used in some way?
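A small experiment can show the timing. In Python the class body runs when the class statement itself executes, which for a top-level class means while the module is being imported, not on first use of the class. The sketch below uses a stand-in function and a `calls` list, both hypothetical, to record when the call happens.

```python
calls = []


def long_running_function_with_side_effects():
    # Stand-in for the function in the question; records that it ran.
    calls.append("called")
    return 42


class A:
    # Evaluated once, while the class statement is being executed.
    b = long_running_function_with_side_effects()


# No instance of A has been created and A.b has not been read yet,
# but the function has already run.
print(calls)
print(A.b)
```

Creating instances of A afterwards does not call the function again; all instances share the single value stored in A.b.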

python static initialization class

Ser*_*tch

2019 01-01

4
Votes

2
Answers

1201
Views

Tag statistics

c++ ×4

python ×3

algorithm ×2

cuda ×2

parallel-processing ×2

apache ×1

arm ×1

avx ×1

avx2 ×1

bit-manipulation ×1

build ×1

c ×1

class ×1

cmake ×1

compile-time-constant ×1

configure ×1

constants ×1

django ×1

eof ×1

exception ×1

file ×1

infrastructure ×1

initialization ×1

memory ×1

mod-wsgi ×1

multithreading ×1

performance ×1

qemu ×1

readline ×1

setrlimit ×1

shuffle ×1

simd ×1

stack ×1

static ×1

ubuntu ×1

verbose ×1

winapi ×1

x86 ×1

xorg ×1

xserver ×1
