che*_*som 3 shell awk cut range
I have this output from a Python crawler:
[+] Site to crawl: http://www.example.com
[+] Start time: 2020-05-24 07:21:27.169033
[+] Output file: www.example.com.crawler
[+] Crawling
[-] http://www.example.com
[-] http://www.example.com/
[-] http://www.example.com/icons/ubuntu-logo.png
[-] http://www.example.com/manual
[i] 404 Not Found
[+] Total urls crawled: 4
[+] Directories found:
[-] http://www.example.com/icons/
[+] Total directories: 1
[+] Directory with indexing
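I want to use awk or any other tool to cut out the lines between "Crawling" and "Total urls crawled". Basically, I want one variable to hold the NR of the first keyword, "Crawling", and a second variable to hold the NR of the second delimiter, "Total urls crawled", and then cut the range between the two. I tried something like this: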
awk 'NR>$(Crawling) && NR<$(urls)' file.txt
But nothing really worked. The best I got was a cut from line Crawling+1 to the end of the file, which doesn't help. So how can I do this, and how do I cut a range of lines with awk using variables?
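If I understood your requirement correctly, you want to pass shell variables into the awk code and use them as search strings. In that case, try the following.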
awk -v crawl="Crawling" -v url="Total urls crawled" '
$0 ~ url{
  found=""
  next
}
$0 ~ crawl{
  found=1
  next
}
found
' Input_file
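As a quick check, the command can be run against the sample output from the question. This is only a sketch: the temporary file and the `input_file` and `result` variable names are assumptions made for illustration, not part of the original answer.

```shell
# Recreate the crawler output from the question in a temporary file
# (hypothetical location, used only for this demonstration).
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
[+] Site to crawl: http://www.example.com
[+] Start time: 2020-05-24 07:21:27.169033
[+] Output file: www.example.com.crawler
[+] Crawling
[-] http://www.example.com
[-] http://www.example.com/
[-] http://www.example.com/icons/ubuntu-logo.png
[-] http://www.example.com/manual
[i] 404 Not Found
[+] Total urls crawled: 4
[+] Directories found:
[-] http://www.example.com/icons/
[+] Total directories: 1
[+] Directory with indexing
EOF

# Same program as above: switch the flag on after the "Crawling" line,
# switch it off at the "Total urls crawled" line, print while it is on.
result=$(awk -v crawl="Crawling" -v url="Total urls crawled" '
$0 ~ url   { found = ""; next }
$0 ~ crawl { found = 1;  next }
found
' "$input_file")

printf '%s\n' "$result"
rm -f "$input_file"
```

With this input, the command prints the five lines from `[-] http://www.example.com` through `[i] 404 Not Found`. Neither marker line is printed, because each rule ends with `next`.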
Explanation: the same awk program with a detailed comment on each line.
awk -v crawl="Crawling" -v url="Total urls crawled" '   ## Start the awk program and set the variables crawl and url.
$0 ~ url{            ## If the line matches the url variable, do the following.
  found=""           ## Reset the found flag.
  next               ## Skip the remaining rules for this line.
}
$0 ~ crawl{          ## If the line matches the crawl variable, do the following.
  found=1            ## Set the found flag to 1.
  next               ## Skip the remaining rules for this line.
}
found                ## If found is set (non-empty), print the current line.
' Input_file         ## Name of the input file.
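The line-number approach described in the question can also work. The attempt with `NR>$(Crawling)` fails because the shell never expands `$(Crawling)` inside single quotes, and awk reads it as a field reference rather than a line number. A minimal sketch follows, assuming the line numbers are first captured in shell variables and then passed in with `-v`. The names `file`, `start`, `end`, `s`, `e`, and `range` are hypothetical.

```shell
# Shortened copy of the crawler output from the question
# (hypothetical temporary file, used only for this demonstration).
file=$(mktemp)
cat > "$file" <<'EOF'
[+] Site to crawl: http://www.example.com
[+] Start time: 2020-05-24 07:21:27.169033
[+] Output file: www.example.com.crawler
[+] Crawling
[-] http://www.example.com
[-] http://www.example.com/
[-] http://www.example.com/icons/ubuntu-logo.png
[-] http://www.example.com/manual
[i] 404 Not Found
[+] Total urls crawled: 4
[+] Directories found:
EOF

# Record the line number (NR) of the first line matching each marker.
start=$(awk '/Crawling/ { print NR; exit }' "$file")
end=$(awk '/Total urls crawled/ { print NR; exit }' "$file")

# Hand both numbers to awk with -v and keep only the lines strictly between them.
range=$(awk -v s="$start" -v e="$end" 'NR > s && NR < e' "$file")

printf '%s\n' "$range"
rm -f "$file"
```

This reads the file three times, so the single-pass flag version above is usually the simpler choice. The sketch mainly shows how shell values reach awk through `-v`.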