Functionality and performance of bufio.Reader and bufio.Scanner

blg*_*boy 5 go

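I've seen several posts online recommending bufio.Scanner over bufio.Reader.

I don't know whether my test case is relevant, but I decided to test one against the other by reading 1,000,000 lines from a text file: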

package main

import (
    "bufio"
    "fmt"
    "os"
    "strconv"
    "time"
)

func main() {

    fileName := "testfile.txt"

    // Create 1,000,000 integers as strings
    numItems := 1000000
    startInitStringArray := time.Now()

    var input [1000000]string
    //var input []string

    for i := 0; i < numItems; i++ {
        input[i] = strconv.Itoa(i)
        //input = append(input,strconv.Itoa(i))
    }

    elapsedInitStringArray := time.Since(startInitStringArray)
    fmt.Printf("Took %s to populate string array.\n", elapsedInitStringArray)

    // Write to a file
    fo, err := os.Create(fileName)
    if err != nil {
        panic(err)
    }
    for i := 0; i < numItems; i++ {
        fo.WriteString(input[i] + "\n")
    }

    fo.Close()

    // Use reader
    openedFile, err := os.Open(fileName)
    if err != nil {
        panic(err)
    }

    startReader := time.Now()
    reader := bufio.NewReader(openedFile)

    // ReadLine returns a []byte; the result is discarded here
    for i := 0; i < numItems; i++ {
        reader.ReadLine()
    }
    elapsedReader := time.Since(startReader)
    fmt.Printf("Took %s to read file using reader.\n", elapsedReader)
    openedFile.Close()

    // Use scanner
    openedFile, err = os.Open(fileName)
    if err != nil {
        panic(err)
    }

    startScanner := time.Now()
    scanner := bufio.NewScanner(openedFile)

    // Text returns a string; the result is discarded here
    for i := 0; i < numItems; i++ {
        scanner.Scan()
        scanner.Text()
    }

    elapsedScanner := time.Since(startScanner)
    fmt.Printf("Took %s to read file using scanner.\n", elapsedScanner)
    openedFile.Close()
}

The timings I get are fairly consistent; a typical run looks like this:

Took 44.1165ms to populate string array.
Took 17.0465ms to read file using reader.
Took 23.0613ms to read file using scanner.

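I'm curious: when is it better to use a Reader and when a Scanner, and should the choice be based on performance or on functionality?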

pet*_*rSO 6

It's a flawed benchmark. The two loops are not doing the same thing.

func (b *Reader) ReadLine() (line []byte, isPrefix bool, err error)

returns []byte.

func (s *Scanner) Text() string

returns string([]byte), that is, the token converted to a string.

To be comparable, use,

func (s *Scanner) Bytes() []byte

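It's a flawed benchmark. It reads short strings, the integers from "0\n" to "999999\n". What does a real-world data set look like?

In the real world, we read Shakespeare: http://www.gutenberg.org/ebooks/100: Plain Text UTF-8: pg100.txt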

Took 2.973307ms to read file using reader.   size: 5340315 lines: 124787
Took 2.940388ms to read file using scanner.  size: 5340315 lines: 124787