Regex doesn't respect double quotes

khi*_*ter 2 javascript regex

I'm trying to split the following ELB log entry:

2018-04-16T08:09:27.203Z cae70dd2-414c-11e8-836a-354cb4985a41 https 2018-04-15T01:20:31.092381Z app/MBM-L-Publi-V9D386A91UNR/4695f2e72859f540 128.121.50.133:59367 10.0.1.14:80 0.001 0.003 0.000 200 200 934 282 "GET https://www.domain.tld:443/__utm.gif?v=1&_v=j66&a=1866784098&t=pageview&_s=1&dl=https%3A%2F%2Fwww.domain.tld%2Fnews%2Farchived%2Fresources-archived%22001-11%2F&ul=en-us&de=UTF-8&dt=Racal%20reborn%20after%20Thales%20buyout&sd=24-bit&sr=412x732&vp=404x732&je=0&cid=1296878891.1495497600&_gid=1908154735.1495497600&_r=1&z=821631926 HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-2:123456789012:targetgroup/MBM-L-Cache-1LH0DNU489D55/167e4810f75804c3 "Root=1-5ad2a8df-021aaad5031047e7dec3f2fa" "www.domain.tld" "arn:aws:acm:eu-west-2:123456789012:certificate/1140cbb2-4d4f-44b0-a4d9-a79329c5e361" 0

using this regex:

const splitElbEntry = (elbLogEntry) => R.match(/\S+|"[^"]*"/g)(elbLogEntry.trim())

but it doesn't seem to work: https://regex101.com/r/JOlrxS/1

I'd like anything inside double quotes to stay together as a single field, for example

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

ctw*_*els 5

Change the order of the alternatives: order matters.


Why does this happen?

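The regex engine tries each alternative in the order you present them. \S+|"[^"]*" always tries \S+ first; only if \S+ fails to match at a given position in the string is the second alternative, "[^"]*", tried.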

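Because \S also matches the opening ", the first alternative succeeds at every position where the second one could, so with your current regex the second alternative never produces a match. Your existing regex is therefore equivalent to just \S+. The two snippets below show that \S+|"[^"]*" and \S+ produce the same result.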
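The ordering effect is easier to see on a tiny input than on the full log line. This is a minimal sketch using a short, made-up string (an assumption, not real ELB data) and String.prototype.match with both orderings of the alternation:

```javascript
// Hypothetical short input: one quoted field that contains a space.
const sample = 'x "a b" y';

// \S+ is tried first and already matches the opening quote,
// so the quoted alternative never wins and "a b" is split at the space.
const slashSFirst = sample.match(/\S+|"[^"]*"/g);

// The quoted alternative is tried first, so "a b" stays in one piece.
const quotedFirst = sample.match(/"[^"]*"|\S+/g);

console.log(slashSFirst); // [ 'x', '"a', 'b"', 'y' ]
console.log(quotedFirst); // [ 'x', '"a b"', 'y' ]
```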

Your regex, \S+|"[^"]*":

var s = `2018-04-16T08:09:27.203Z cae70dd2-414c-11e8-836a-354cb4985a41 https 2018-04-15T01:20:31.092381Z app/MBM-L-Publi-V9D386A91UNR/4695f2e72859f540 128.121.50.133:59367 10.0.1.14:80 0.001 0.003 0.000 200 200 934 282 "GET https://www.domain.tld:443/__utm.gif?v=1&_v=j66&a=1866784098&t=pageview&_s=1&dl=https%3A%2F%2Fwww.domain.tld%2Fnews%2Farchived%2Fresources-archived%22001-11%2F&ul=en-us&de=UTF-8&dt=Racal%20reborn%20after%20Thales%20buyout&sd=24-bit&sr=412x732&vp=404x732&je=0&cid=1296878891.1495497600&_gid=1908154735.1495497600&_r=1&z=821631926 HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-2:123456789012:targetgroup/MBM-L-Cache-1LH0DNU489D55/167e4810f75804c3 "Root=1-5ad2a8df-021aaad5031047e7dec3f2fa" "www.domain.tld" "arn:aws:acm:eu-west-2:123456789012:certificate/1140cbb2-4d4f-44b0-a4d9-a79329c5e361" 0`
console.log(s.match(/\S+|"[^"]*"/g))

Your regex simplified to \S+:

var s = `2018-04-16T08:09:27.203Z cae70dd2-414c-11e8-836a-354cb4985a41 https 2018-04-15T01:20:31.092381Z app/MBM-L-Publi-V9D386A91UNR/4695f2e72859f540 128.121.50.133:59367 10.0.1.14:80 0.001 0.003 0.000 200 200 934 282 "GET https://www.domain.tld:443/__utm.gif?v=1&_v=j66&a=1866784098&t=pageview&_s=1&dl=https%3A%2F%2Fwww.domain.tld%2Fnews%2Farchived%2Fresources-archived%22001-11%2F&ul=en-us&de=UTF-8&dt=Racal%20reborn%20after%20Thales%20buyout&sd=24-bit&sr=412x732&vp=404x732&je=0&cid=1296878891.1495497600&_gid=1908154735.1495497600&_r=1&z=821631926 HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-2:123456789012:targetgroup/MBM-L-Cache-1LH0DNU489D55/167e4810f75804c3 "Root=1-5ad2a8df-021aaad5031047e7dec3f2fa" "www.domain.tld" "arn:aws:acm:eu-west-2:123456789012:certificate/1140cbb2-4d4f-44b0-a4d9-a79329c5e361" 0`
console.log(s.match(/\S+/g))


How do you fix it?

Changing the order of the alternatives makes the regex engine try "[^"]*" first and fall back to \S+ only if that doesn't match.
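Applied to the splitElbEntry function from the question, the fix is only the swapped alternation. The sketch below drops Ramda and calls String.prototype.match directly, which is an assumption made to keep it self-contained; the `|| []` mirrors R.match, which returns an empty array instead of null when nothing matches. The shortened log line is hypothetical.

```javascript
// Split an ELB log line into fields, keeping each "..." field intact.
// The quoted alternative comes first so it wins over \S+ at an opening quote.
const splitElbEntry = (elbLogEntry) =>
  elbLogEntry.trim().match(/"[^"]*"|\S+/g) || []; // [] on no match, like R.match

// Shortened, hypothetical log line for illustration only.
const line = ' https 200 "GET https://www.domain.tld:443/ HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1)" TLSv1.2 ';
const fields = splitElbEntry(line);
console.log(fields);
```

Note that the quoted fields are returned with their surrounding quotes still attached.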

See the regex in use here:

"[^"]*"|\S+

var s = `2018-04-16T08:09:27.203Z cae70dd2-414c-11e8-836a-354cb4985a41 https 2018-04-15T01:20:31.092381Z app/MBM-L-Publi-V9D386A91UNR/4695f2e72859f540 128.121.50.133:59367 10.0.1.14:80 0.001 0.003 0.000 200 200 934 282 "GET https://www.domain.tld:443/__utm.gif?v=1&_v=j66&a=1866784098&t=pageview&_s=1&dl=https%3A%2F%2Fwww.domain.tld%2Fnews%2Farchived%2Fresources-archived%22001-11%2F&ul=en-us&de=UTF-8&dt=Racal%20reborn%20after%20Thales%20buyout&sd=24-bit&sr=412x732&vp=404x732&je=0&cid=1296878891.1495497600&_gid=1908154735.1495497600&_r=1&z=821631926 HTTP/1.1" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-2:123456789012:targetgroup/MBM-L-Cache-1LH0DNU489D55/167e4810f75804c3 "Root=1-5ad2a8df-021aaad5031047e7dec3f2fa" "www.domain.tld" "arn:aws:acm:eu-west-2:123456789012:certificate/1140cbb2-4d4f-44b0-a4d9-a79329c5e361" 0`
console.log(s.match(/"[^"]*"|\S+/g))
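The question's Mozilla example shows the field without its quote characters. If that is what you want, a capture group can strip them. This is a hedged sketch using String.prototype.matchAll (ES2020, Node 12+); the helper name splitElbEntryUnquoted and the sample line are hypothetical:

```javascript
// Hypothetical helper: like the fixed split, but returns quoted fields
// without their surrounding quotes.
// Group 1 captures the inside of a "..." field; group 2 captures a bare \S+ token.
const splitElbEntryUnquoted = (elbLogEntry) =>
  Array.from(
    elbLogEntry.trim().matchAll(/"([^"]*)"|(\S+)/g),
    // Check for undefined rather than truthiness so an empty "" field yields ''.
    (m) => (m[1] !== undefined ? m[1] : m[2])
  );

// Hypothetical sample line, including an empty quoted field at the end.
const line = '934 282 "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" TLSv1.2 ""';
console.log(splitElbEntryUnquoted(line));
```

Checking `m[1] !== undefined` matters because an empty quoted field captures an empty string, which is falsy but still a real field.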