Tokenizing quoted strings

Hyp*_*eus 3 erlang tokenize

I am trying to tokenize a string. As long as there are no quoted substrings, everything works fine:

string:tokens ("abc def ghi", " ").
["abc","def","ghi"]

But string:tokens/2 does not help much with quoted strings. It behaves as documented and splits inside the quotes as well:

string:tokens ("abc \"def xyz\" ghi", " ").
["abc","\"def","xyz\"","ghi"]

What I need is a function that tokenizes a string given both a separator and a quote character. Something like:

tokens ("abc \"def xyz\" ghi", " ", "\"").
["abc","def xyz","ghi"]

Before I start reinventing the wheel, my question is:

Is there such a function, or something similar, in the standard library?

Edit:

OK, I wrote my own implementation, but I am still very interested in an answer to the original question. Here is my code so far:

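%% Split String on spaces, keeping double-quoted runs together.
tokens (String) -> tokens (String, [], [] ).

%% End of input: flush a non-empty buffer, then strip the quote characters.
tokens ( [], Tokens, [] ) -> strip_quotes (Tokens);
tokens ( [], Tokens, Buffer) -> strip_quotes (Tokens ++ [Buffer] );

%% A buffer that starts with $" means we are inside a quoted run.
tokens ( [Character | String], Tokens, Buffer) ->
    case {Character, Buffer} of
        {$ , [] } -> tokens (String, Tokens, Buffer);                        % skip leading or repeated spaces
        {$ , [$" | _] } -> tokens (String, Tokens, Buffer ++ [Character] );  % space inside quotes is kept
        {$ , _} -> tokens (String, Tokens ++ [Buffer], [] );                 % space ends a plain token
        {$", [] } -> tokens (String, Tokens, "\"" );                         % opening quote
        {$", [$" | _] } -> tokens (String, Tokens ++ [Buffer ++ "\""], [] ); % closing quote ends the token
        {$", _} -> tokens (String, Tokens ++ [Buffer], "\"");                % quote directly after a plain token
        _ -> tokens (String, Tokens, Buffer ++ [Character] )
    end.

%% string:strip/3 is obsolete in newer OTP releases; string:trim/3 replaces it.
strip_quotes (Tokens) ->
    lists:map (fun (Token) -> string:strip (Token, both, $") end, Tokens).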
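For comparison, here is a minimal sketch of the three-argument form asked for above, taking the separator and quote characters as arguments. The module name qtokens and the helpers outside/5, inside/6, start/1 and flush/2 are made up for illustration and are not part of the standard library. It builds tokens by prepending and reversing instead of using ++, and it treats a quote in the middle of a word shell-style (ab"c d" becomes one token), which differs from the code above.

```erlang
-module(qtokens).
-export([tokens/3]).

%% tokens(String, Separators, Quotes) -> [Token]
%% Splits String on any character in Separators, except inside a run that is
%% delimited by a character from Quotes. The quote characters are dropped.
tokens(String, Separators, Quotes) ->
    outside(String, Separators, Quotes, none, []).

%% Outside quotes. Tok is the current token in reverse, or 'none' when no
%% token has been started; Acc holds the finished tokens in reverse.
outside([], _Seps, _Quotes, Tok, Acc) ->
    lists:reverse(flush(Tok, Acc));
outside([C | Rest], Seps, Quotes, Tok, Acc) ->
    case {lists:member(C, Quotes), lists:member(C, Seps)} of
        {true, _} ->
            %% Opening quote: remember which quote character opened the run.
            inside(Rest, C, Seps, Quotes, start(Tok), Acc);
        {false, true} ->
            %% Separator: finish the current token, if any.
            outside(Rest, Seps, Quotes, none, flush(Tok, Acc));
        {false, false} ->
            outside(Rest, Seps, Quotes, [C | start(Tok)], Acc)
    end.

%% Inside quotes: copy everything up to the matching quote character Q.
inside([], _Q, _Seps, _Quotes, Tok, Acc) ->
    %% Unterminated quote: keep what has been collected so far.
    lists:reverse(flush(Tok, Acc));
inside([Q | Rest], Q, Seps, Quotes, Tok, Acc) ->
    outside(Rest, Seps, Quotes, Tok, Acc);
inside([C | Rest], Q, Seps, Quotes, Tok, Acc) ->
    inside(Rest, Q, Seps, Quotes, [C | Tok], Acc).

%% Start a token if none is in progress.
start(none) -> [];
start(Tok) -> Tok.

%% Move a started token (possibly empty, as for "") onto the result list.
flush(none, Acc) -> Acc;
flush(Tok, Acc) -> [lists:reverse(Tok) | Acc].
```

With this sketch, qtokens:tokens("abc \"def xyz\" ghi", " ", "\"") should give ["abc","def xyz","ghi"], and a trailing separator does not produce an empty token.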

Dav*_*don 5

If regular expressions are acceptable in the general case, you can use:

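> re:split("abc \"def xyz\" ghi", " \"|\" ", [{return, list}]).
["abc","def xyz","ghi"]

You can also use "\\s\"|\"\\s" if you want to split on any whitespace rather than just a space. The backslash is doubled because "\s" inside an Erlang string literal is itself an escape for the space character.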

If you happen to be parsing this from an input file, you may want to use strip_split/2 from EString.
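One caveat: the re:split pattern in this answer breaks only where a quote touches a space, so unquoted words separated by plain spaces stay joined ("a b \"c d\"" gives "a b" as one token). A possible alternative, sketched below under the assumption that matching tokens is acceptable instead of splitting, uses re:run/3 with a branch-reset group so that both alternatives capture into group 1. The module name quoted_re is hypothetical.

```erlang
-module(quoted_re).
-export([tokens/1]).

%% tokens(String) -> [Token]
%% Returns double-quoted runs (without the quotes) and whitespace-free words.
tokens(String) ->
    %% As a regex: (?|"([^"]*)"|(\S+))
    %% (?| ... ) is a branch-reset group: each alternative numbers its
    %% capturing parentheses from 1, so group 1 always holds the token.
    Pattern = "(?|\"([^\"]*)\"|(\\S+))",
    case re:run(String, Pattern, [global, {capture, [1], list}]) of
        {match, Captures} -> [Token || [Token] <- Captures];
        nomatch -> []
    end.
```

Calling quoted_re:tokens("abc \"def xyz\" ghi") should return ["abc","def xyz","ghi"]. An unterminated quote is not treated specially here; it simply ends up inside an ordinary word.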