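If possible, I would like to use a Linux command to combine all words that start with a capital letter, excluding the word at the beginning of the line. The goal is to create edges between these words. For example: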
My friend John met Beatrice and Lucio.
The result I want is a list of edges, that is, pairs of these capitalized words.
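I managed to extract all words that start with a capital letter, excluding the word at the beginning of the line, with a regular expression. The command is:
cat gov.json | grep -oP "\b([A-Z][a-z']*)(\s[A-Z][a-z']*)*\b | ^(\s*.*?\s).*" > nodes.csv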
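This writes the nodes to nodes.csv in a single column, one per line.
The goal now is to create the possible combinations between the names that start with a capital letter and write them to a file. Any suggestions?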
If the order of the pairs in the output does not matter:
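$ cat tst.awk
# Split on runs of non-letters so punctuation never sticks to a word.
BEGIN { FS="[^[:alpha:]]+"; OFS=", " }
{
    # Start at field 2 to skip the word at the start of the line.
    for (i=2; i<=NF; i++) {
        if ($i ~ /^[[:upper:]]/) {
            words[$i]    # referencing the element creates it; duplicates collapse
        }
    }
}
END {
    for (word1 in words) {
        for (word2 in words) {
            if (word1 != word2) {
                print word1, word2
            }
        }
        # Drop word1 so that later iterations do not print the reversed pair.
        delete words[word1]
    }
}
$ awk -f tst.awk file
Beatrice, Lucio
Beatrice, John
Lucio, John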
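The script above relies on deleting an element while the outer `for (word1 in words)` loop is still running. As an alternative sketch (not part of the original answer), each unordered pair can also be printed once by comparing the two words as strings, which leaves the array untouched. The file name `pairs.txt`, the sample `file`, and the final `sort` are assumptions added here only to make the output order repeatable:

```shell
# Hypothetical sample file holding the sentence from the question.
printf '%s\n' 'My friend John met Beatrice and Lucio.' > file

awk '
BEGIN { FS = "[^[:alpha:]]+"; OFS = ", " }
{
    # Skip field 1 (the word at the start of the line).
    for (i = 2; i <= NF; i++) {
        if ($i ~ /^[[:upper:]]/) {
            words[$i]
        }
    }
}
END {
    for (word1 in words) {
        for (word2 in words) {
            # String comparison keeps exactly one of (A, B) and (B, A)
            # and also skips the case word1 == word2.
            if (word1 < word2) {
                print word1, word2
            }
        }
    }
}
' file | LC_ALL=C sort > pairs.txt
```

For the sample sentence, `pairs.txt` then holds `Beatrice, John`, `Beatrice, Lucio` and `John, Lucio`. Within each pair the alphabetically smaller word comes first, so this only fits the case where the order of the pairs does not matter.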
If the order matters:
$ cat tst.awk
BEGIN { FS="[^[:alpha:]]"; OFS=", " }
{
    # Start at field 2 to skip the word at the start of the line.
    for (i=2; i<=NF; i++) {
        if ($i ~ /^[[:upper:]]/) {
            # Store each word once, in order of first appearance.
            if ( !seen[$i]++ ) {
                words[++numWords] = $i
            }
        }
    }
}
END {
    # Pair every word with each word that appeared after it.
    for (word1nr=1; word1nr<=numWords; word1nr++) {
        word1 = words[word1nr]
        for (word2nr=word1nr+1; word2nr<=numWords; word2nr++) {
            word2 = words[word2nr]
            print word1, word2
        }
    }
}
$ awk -f tst.awk file
John, Beatrice
John, Lucio
Beatrice, Lucio
In the above, file contains the original input, e.g. My friend John met Beatrice and Lucio.
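Both scripts collect words from the whole input and only print in the END block, so with a multi-line file they would also pair names that occur on different lines. If edges should only link names within the same line, and the result should land in a file as the question asks, a per-line variant of the ordered script might look like the sketch below. The second input line and the output name `edges.csv` are made up for illustration and do not come from the question:

```shell
# Hypothetical two-line input so that the per-line grouping is visible.
printf '%s\n' 'My friend John met Beatrice and Lucio.' \
              'Yesterday Anna called John.' > file

awk '
BEGIN { FS = "[^[:alpha:]]+"; OFS = ", " }
{
    # Start a fresh word list for every input line.
    numWords = 0
    split("", seen)    # portable way to empty an array

    for (i = 2; i <= NF; i++) {
        if ($i ~ /^[[:upper:]]/ && !seen[$i]++) {
            words[++numWords] = $i
        }
    }

    # Print the pairs for this line only, in order of appearance.
    for (a = 1; a <= numWords; a++) {
        for (b = a + 1; b <= numWords; b++) {
            print words[a], words[b]
        }
    }
}
' file > edges.csv
```

With this input, `edges.csv` contains the three pairs from the first line (`John, Beatrice`, `John, Lucio`, `Beatrice, Lucio`) followed by `Anna, John` from the second. No `John, Anna` or `Anna, Beatrice` edge is produced, because those names never share a line.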