mxp*_*usb 15 chunking go slice
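I have a slice with ~2.1 million log strings in it, and I would like to create a slice of slices with the strings distributed as evenly as possible.
Here is what I have so far: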
// logs is a slice with ~2.1 million strings in it.
var divided = make([][]string, 0)
NumCPU := runtime.NumCPU()
ChunkSize := len(logs) / NumCPU
for i := 0; i < NumCPU; i++ {
	temp := make([]string, 0)
	idx := i * ChunkSize
	end := i * ChunkSize + ChunkSize
	for x := range logs[idx:end] {
		temp = append(temp, logs[x])
	}
	if i == NumCPU {
		for x := range logs[idx:] {
			temp = append(temp, logs[x])
		}
	}
	divided = append(divided, temp)
}
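Here idx := i * ChunkSize gives me the current "chunk start" index into logs, and end := i * ChunkSize + ChunkSize gives me the "chunk end", that is, the end of that chunk's range. I couldn't find any documentation or examples on how to chunk or split a slice, or how to iterate over a limited range, in Go, so this is what I came up with. However, it only copies the first chunk multiple times, so it doesn't work.
How do I chunk a slice (as evenly as possible) in Go?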
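A note on the symptom described above: when ranging over a sub-slice, the single loop variable is the index within that sub-slice, so it starts at 0 regardless of idx. Indexing logs[x] with it therefore always reads from the start of logs. Separately, i == NumCPU can never be true inside a loop bounded by i < NumCPU. A minimal sketch of the indexing behavior, where rangeIndexes is a hypothetical helper written only for this illustration:

```go
package main

import "fmt"

// rangeIndexes (hypothetical helper) records the index values produced
// by ranging over the sub-slice logs[idx:end].
func rangeIndexes(logs []string, idx, end int) []int {
	var seen []int
	for x := range logs[idx:end] {
		seen = append(seen, x) // x counts from 0, not from idx
	}
	return seen
}

func main() {
	logs := []string{"a", "b", "c", "d", "e", "f"}

	// The indexes are relative to the sub-slice.
	fmt.Println(rangeIndexes(logs, 2, 4)) // prints [0 1]

	// Ranging over the values picks up the intended elements.
	for _, s := range logs[2:4] {
		fmt.Println(s) // prints c, then d
	}
}
```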
Jim*_*imB 46
You don't need to create new slices; just append sub-slices of logs to the divided slice.
http://play.golang.org/p/vyihJZlDVy
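numCPU := runtime.NumCPU()
var divided [][]string
// Round the chunk size up so that every element lands in a chunk.
chunkSize := (len(logs) + numCPU - 1) / numCPU
for i := 0; i < len(logs); i += chunkSize {
	end := i + chunkSize
	// The last chunk may be shorter than chunkSize.
	if end > len(logs) {
		end = len(logs)
	}
	divided = append(divided, logs[i:end])
}
fmt.Printf("%#v\n", divided)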
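For reference, the fragment above can be wrapped into a complete program. This is only a sketch: the function name divide, its parts parameter, and the sample data are assumptions made for illustration. Because the chunk size is rounded up, the result can contain fewer than parts chunks, and every chunk shares its backing array with logs.

```go
package main

import (
	"fmt"
	"runtime"
)

// divide (hypothetical name) splits logs into at most parts sub-slices
// without copying any strings.
func divide(logs []string, parts int) [][]string {
	if parts < 1 || len(logs) == 0 {
		return nil
	}
	var divided [][]string
	chunkSize := (len(logs) + parts - 1) / parts // ceiling division
	for i := 0; i < len(logs); i += chunkSize {
		end := i + chunkSize
		if end > len(logs) {
			end = len(logs)
		}
		divided = append(divided, logs[i:end])
	}
	return divided
}

func main() {
	// Sample data standing in for the real log lines.
	logs := make([]string, 10)
	for i := range logs {
		logs[i] = fmt.Sprintf("log-%d", i)
	}
	for _, chunk := range divide(logs, runtime.NumCPU()) {
		fmt.Println(len(chunk), chunk)
	}
}
```

Since the chunks alias logs, a write through a chunk is visible in logs as well; copy a chunk first if it needs to be modified independently.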
Alf*_*rga 17
Using generics (Go version >= 1.18):
func chunkBy[T any](items []T, chunkSize int) (chunks [][]T) {
	for chunkSize < len(items) {
		items, chunks = items[chunkSize:], append(chunks, items[0:chunkSize:chunkSize])
	}
	return append(chunks, items)
}
Or, if you want to set the capacity manually:
func chunkBy[T any](items []T, chunkSize int) [][]T {
	chunks := make([][]T, 0, (len(items)/chunkSize)+1)
	for chunkSize < len(items) {
		items, chunks = items[chunkSize:], append(chunks, items[0:chunkSize:chunkSize])
	}
	return append(chunks, items)
}
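Both versions use the three-index slice expression items[0:chunkSize:chunkSize], which caps each chunk's capacity at chunkSize. The sketch below, with sample data chosen only for illustration, suggests why that matters: appending to a capped chunk allocates a new array, whereas appending to a plain two-index sub-slice writes into the next chunk's elements. It assumes chunkSize > 0; with chunkSize equal to 0 the loop would never terminate.

```go
package main

import "fmt"

// chunkBy is repeated from the answer above so the example is self-contained.
func chunkBy[T any](items []T, chunkSize int) (chunks [][]T) {
	for chunkSize < len(items) {
		items, chunks = items[chunkSize:], append(chunks, items[0:chunkSize:chunkSize])
	}
	return append(chunks, items)
}

func main() {
	data := []int{0, 1, 2, 3, 4, 5, 6}
	chunks := chunkBy(data, 3)
	fmt.Println(chunks) // prints [[0 1 2] [3 4 5] [6]]

	// cap(chunks[0]) is 3, so append must allocate instead of
	// overwriting data[3].
	grown := append(chunks[0], 99)
	fmt.Println(grown, data[3]) // prints [0 1 2 99] 3

	// A two-index sub-slice keeps the spare capacity, so the same
	// append writes into data[3], the first element of the next chunk.
	plain := data[0:3]
	_ = append(plain, 99)
	fmt.Println(data[3]) // prints 99
}
```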
Another variant. It runs about 2.5 times faster than the one proposed by JimB. Tests and benchmarks are here:
https://play.golang.org/p/WoXHqGjozMI
func chunks(xs []string, chunkSize int) [][]string {
	if len(xs) == 0 {
		return nil
	}
	divided := make([][]string, (len(xs)+chunkSize-1)/chunkSize)
	prev := 0
	i := 0
	till := len(xs) - chunkSize
	for prev < till {
		next := prev + chunkSize
		divided[i] = xs[prev:next]
		prev = next
		i++
	}
	divided[i] = xs[prev:]
	return divided
}
As per Slice Tricks:
Batching with minimal allocation
Useful if you want to do batch processing on large slices.
actions := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
batchSize := 3
batches := make([][]int, 0, (len(actions)+batchSize-1)/batchSize)
for batchSize < len(actions) {
	actions, batches = actions[batchSize:], append(batches, actions[0:batchSize:batchSize])
}
batches = append(batches, actions)
Yields the following:
[[0 1 2] [3 4 5] [6 7 8] [9]]
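Fixed-size batching leaves a short final chunk, as the output above shows. If the goal is the question's "as evenly as possible", one option is to fix the number of parts rather than the chunk size and spread the remainder over the first few parts. The following is a sketch under that assumption, not taken from any of the answers; splitEvenly is a hypothetical helper name.

```go
package main

import "fmt"

// splitEvenly (hypothetical helper) divides items into n contiguous
// parts whose lengths differ by at most one. No elements are copied.
func splitEvenly[T any](items []T, n int) [][]T {
	if n < 1 || len(items) == 0 {
		return nil
	}
	if n > len(items) {
		n = len(items) // avoid producing empty parts
	}
	base, extra := len(items)/n, len(items)%n
	parts := make([][]T, 0, n)
	start := 0
	for i := 0; i < n; i++ {
		size := base
		if i < extra {
			size++ // the first `extra` parts take one more element
		}
		end := start + size
		// Cap the capacity so appends to one part cannot spill into the next.
		parts = append(parts, items[start:end:end])
		start = end
	}
	return parts
}

func main() {
	actions := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	fmt.Println(splitEvenly(actions, 4)) // prints [[0 1 2] [3 4 5] [6 7] [8 9]]
}
```

For the original use case, passing runtime.NumCPU() as n would give one part per CPU, with part sizes that differ by at most one log line.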