Replacing newlines in a ByteString

jor*_*gen 5 haskell replace pattern-matching bytestring

I want a function that takes a ByteString and replaces newlines, both \n and \n\r, with a comma, but I can't think of a nice way to do it.

import qualified Data.ByteString as BS
import Data.Char (ord) 
import Data.Word (Word8)

endlWord8 = fromIntegral $ ord '\n' :: Word8

replace :: BS.ByteString -> BS.ByteString

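I thought about using BS.map but don't see how to apply it, since I can't pattern match on Word8 values. Another option is BS.split followed by joining the pieces with a Word8 comma, but that sounds slow and inelegant. Any ideas?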

typ*_*ris 2

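Use Data.ByteString.Char8 to get rid of the annoying conversions from Char to Word8 that you would otherwise have to perform. According to the first sentence of the Data.ByteString.Char8 documentation, performance should not change.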
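To illustrate the Char8 point, here is a minimal sketch that handles only the single-byte case. The names toComma and replaceLf are made up for this example. Since B.map works byte for byte, it cannot collapse the two-byte sequence \n\r into one comma, so this is not a full solution.

```haskell
import qualified Data.ByteString.Char8 as B

-- With the Char8 interface, plain Char patterns work directly.
toComma :: Char -> Char
toComma '\n' = ','
toComma c    = c

-- Replaces each '\n' with ','. The output always has the same length
-- as the input, so a following '\r' is left untouched.
replaceLf :: B.ByteString -> B.ByteString
replaceLf = B.map toComma
```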

Also, you can use B.span instead of B.split, since you want to replace \n\r combinations too, not just \n.
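A small sketch of the difference, with made-up names spanExample and splitExample: B.span leaves the newline at the head of the remainder, so the byte after it can still be inspected, while B.split throws the separator away.

```haskell
import qualified Data.ByteString.Char8 as B

-- The remainder still starts with '\n', so the caller can check
-- whether a '\r' follows before deciding how many bytes to drop.
spanExample :: (B.ByteString, B.ByteString)
spanExample = B.span (/= '\n') (B.pack "ab\n\rcd")

-- The separator is gone; a '\r' that followed it now sits at the
-- start of the next piece.
splitExample :: [B.ByteString]
splitExample = B.split '\n' (B.pack "ab\n\rcd")
```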

My own (possibly clumsy) attempt at this:

module Test where

import Data.Monoid ((<>))
import Data.ByteString.Char8 (ByteString)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Builder as Build
import qualified Data.ByteString.Lazy as LB

eatNewline :: ByteString -> (Maybe Char, ByteString)
eatNewline string
  | B.null string = (Nothing, string)
  | B.head string == '\n' && B.null (B.tail string) = (Just ',', B.empty)
  | B.head string == '\n' && B.head (B.tail string) /= '\r' = (Just ',', B.drop 1 string)
  | B.head string == '\n' && B.head (B.tail string) == '\r' = (Just ',', B.drop 2 string)
  | otherwise = (Nothing, string)

replaceNewlines :: ByteString -> ByteString
replaceNewlines = LB.toStrict . Build.toLazyByteString . go mempty
  where
    go :: Build.Builder -> ByteString -> Build.Builder
    go builder string = let (chunk, rest) = B.span (/= '\n') string
                            (c, rest1)    = eatNewline rest
                            maybeComma    = maybe mempty Build.char8 c
                        in if B.null rest1 then
                             builder <> Build.byteString chunk <> maybeComma
                           else
                             go (builder <> Build.byteString chunk <> maybeComma) rest1

Hopefully mappend for Data.ByteString.Builder is not linear in the number of mappends already applied to one of its operands; otherwise this would be a quadratic algorithm.
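If the accumulation cost is a worry, one alternative sketch avoids the Builder altogether by going back to the split-and-join idea from the question. This is not the approach above, and the names dropLeadingCr and replaceNewlinesSplit are invented for this example. It assumes that a \r only counts as part of a newline when it directly follows a \n, so a \r at the very start of the input is kept.

```haskell
import qualified Data.ByteString.Char8 as B

-- Drop a single leading '\r', if there is one.
dropLeadingCr :: B.ByteString -> B.ByteString
dropLeadingCr s = case B.uncons s of
  Just ('\r', rest) -> rest
  _                 -> s

-- Split on '\n', strip the '\r' that followed each '\n', then join
-- the pieces with commas. The first piece was not preceded by '\n',
-- so it is left as it is.
replaceNewlinesSplit :: B.ByteString -> B.ByteString
replaceNewlinesSplit s = case B.split '\n' s of
  []             -> B.empty
  (first : rest) ->
    B.intercalate (B.singleton ',') (first : map dropLeadingCr rest)
```

The pieces returned by B.split share the input buffer, and B.intercalate writes the result once, so no intermediate result is rebuilt at each step.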