Replacing strange characters with a sed command

dev*_*com 3 sed ed

I want to create a sed command that removes all of these strange characters from a given document:

    sed -n 's/\|®MD-IT¯\|®MD\+BO¯\|®MDNM¯®LL\.8LI,0LI¯\|®LL0LI,0LI¯\|®MD\+IT¯\|®LL.8LI,0LI¯®MDIT¯\|®MDNM¯®FL¯®LL.8LI,0LI¯\|®FL¯®MD-BO¯\|®FL¯®MD-BO¯\|®MD-BO¯\|¯®OF1IN,1IN¯®FC¯®LL1LI,0LI¯\|\|®SF1,1¯\|®FM1FT=0LI,LR=1;\|®MDSU¯®FN1¯\|®MDNM¯¯\|®IV-RTF\|\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\|¯®BF0¯\|®FS1\|-------------------------------------\|¯®FW1\|\|//gp'

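These codes were all created in another application, Nota Bene. I have many files containing codes like these, and I want to convert them to plain text, or possibly even Markdown.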

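The problem is that the characters are not being replaced. I tried the same thing in Sublime Text and successfully stripped the document using find and replace (with regular expressions). For me, a sed script would be better than using Sublime for this task.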

I also tried ed, but it did not find anything to replace either.

Here is a sample nb file as it appears when opened in Sublime Text:

    ®SSDEFAULTS¯®LR1¯®JU¯®MD+BO¯®UFTimes New Roman¯®SZ12Pt¯Glossary®MD+BO¯®TS.5IN,1IN,1.5IN,2IN,2.5IN,3IN,3.5IN,4IN,4.5IN,5IN,5.5IN,6IN¯    ®MD-BO¯
    ®NJ¯®LR1¯®LL.5LI,0LI¯®MD+BO¯®LL0LI,0LI¯®MDNM¯®LR1¯®LL.5LI,0LI¯A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
    ®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
    Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon ®GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical commentary;PG=245;XT=;F[=;F]=;F#=;ID=;XX=Print;CT=;FL=¯(Brown, Fitzmyer, Murphy, et al. 1990, 245)®GC¯.
    ®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36). 
    ®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
    ®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings. 
    ®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Beth essentiae - ®LAHebrew¯ÿHá®LAEnglish¯ that is used to indicate the predicate of a clause or a word used predicatively (Joüon and Muraoka 2006, 458).

This is how I would like the text to read:

    Glossary    
    A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
    Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
    Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon (Brown, Fitzmyer, Murphy, et al. 1990, 245).
    Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36). 
    Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
    Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings. 

Output of `sed -n l` on the file:

    |> sed -n l Glossary.NB
    \256SSDEFAULTS\257\256LR1\257\256JU\257\256MD+BO\257\256UFTimes New R\
    oman\257\256SZ12Pt\257Glossary\256MD+BO\257\256TS.5IN,1IN,1.5IN,2IN,2\
    .5IN,3IN,3.5IN,4IN,4.5IN,5IN,5.5IN,6IN\257\t\256MD-BO\257\r$
    \256NJ\257\256LR1\257\256LL.5LI,0LI\257\256MD+BO\257\256LL0LI,0LI\257\
    \256MDNM\257\256LR1\257\256LL.5LI,0LI\257A fortiori proposition: If X\
     is true, then how much greater is Y true? To move logically from a s\
    tronger argument to establish a weaker argument. The weaker argument \
    is sometimes presented by the speaker as the stronger argument.\r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Accusative of motion/direction - Indicates mov\
    ement to the noun marked by the accusative and is to be distinguished\
     from the accusative of local determination which indicates location \
    without motion (Jo\374on and Muraoka 2006, 428).\r$
    Anadiplosis - A figure of speech in which the word that a colon ends \
    with, or a like sounding word, is the word that begins the next colon\
     \256GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical \
    commentary;PG=245;XT=;F[=;F]=;F#=;ID=;XX=Print;CT=;FL=\257(Brown, Fit\
    zmyer, Murphy, et al. 1990,\240245)\256GC\257.\r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Anaphoric use of the article - When the articl\
    e is used to indicate that the word to which it is attached is the on\
    e previously mentioned (Williams and Beckman 2007, 36). \r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Anaptyxis - The insertion of a vowel into a wo\
    rd to avoid a consonant cluster.\r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Aoristic perfect - I use the phrase 'aoristic \
    perfect' to refer to one of the ways the qatal form can be rendered i\
    nto English. Aoristic perfect denotes a past situation the implicatio\
    ns of which are no longer felt in the present. The situation may have\
     extended over a period of time and it may have occurred more than on\
    ce. It may have occurred in the recent or distant past but from the s\
    tandpoint of the speaker it is to be regarded as a fact having occurr\
    ed and hence as a fact belonging to the past (Jo\374on and Muraoka 20\
    06, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the\
     other categorizations of perfect in this grammar, all relate to the \
    interpretation of qatal verbs in their given contexts. The qatal form\
     in and of itself does not convey these meanings. \r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Beth essentiae - \256LAHebrew\257\377H\341\256\
    LAEnglish\257 that is used to indicate the predicate of a clause or a\
     word used predicatively (Jo\374on and Muraoka 2006, 458).\r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Classic perfect - I use the phrase 'classic pe\
    rfect' to refer to one of the ways the qatal form can be rendered int\
    o English. Classic perfect refers to the continuing present relevance\
     of a past situation from the perspective of the speaker (Comrie 1976\
    , 52). By perfect I do not necessarily imply that a previous situatio\
    n has resulted in a state but that the situation has implications rel\
    evant to the present. The situation is not merely past and over but s\
    omehow persists and continues to intrude into the present. Such verbs\
     are usually translated into English using the perfect or present ten\
    se. I have included under this definition quasi-stative verbs which r\
    efer to attributes which were acquired before, but which are assumed \
    to continue in some way up to the present moment (Driver 1998, 11; Jo\
    \374on and Muraoka 2006, 333; Waltke and O'Connor 1990, 487). In some\
     grammars these are treated separately. However, that creates too man\
    y functions for the one perfect form. The term 'classic perfect' and \
    indeed the other categorizations of perfect in this grammar all relat\
    e to the \256MD+IT\257interpretation \256MD-IT\257of qatal verbs in t\
    heir given contexts. The qatal form by itself does not convey these m\
    eanings.\r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Cohortative of praise. The cohortative is ofte\
    n used in Psalms to indicate that praise, freely undertaken, has begu\
    n. This usage is close to the cohortative of resolve but not identica\
    l with it. The emphasis falls not on what the writer is intending to \
    do, but what he has already undertaken. \r$
    Cohortative of resolve - The cohortative mood normally expresses the \
    will of the speaker, but when the speaker has the ability to carry ou\
    t what he wants it takes on the coloring of resolve (Van der Merwe et\
     al. 1997, 152; Waltke and O'Connor 1990, 573).\r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Concluding \256LAHebrew\257\377h\353\377H\351\
    \256LAEnglish\257 - A special use of the word \256LAHebrew\257\377h\
    \353\377H\351\256LAEnglish\257 found towards the end of several Psalm\
    s and approximating in meaning to: the conclusion of the matter is th\
    at\205\r$
    \256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
    1\257\256LL.5LI,0LI\257Conjunctive waw - Waw used to connect clauses \

JJo*_*oao 6

Sed can also be run as a script (which is easier to develop). Create a file "nb2txt":

    #!/usr/bin/sed -Ef

    s/®[^¯]*¯//g
    s/-{20,}//g
    s/\.{20,}//g

Then:

    $ chmod 755 nb2txt
    $ ./nb2txt file.nb
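
The `sed -n l` dump in the question shows the delimiters as the single bytes `\256` and `\257`, not as multi-byte UTF-8 characters. A byte-oriented variant of the same three rules could look like the sketch below. It assumes the file really does use those single bytes, runs sed under `LC_ALL=C`, and also drops the trailing carriage return visible as `\r$` in the dump. The function name `strip_nb` and the variable names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: byte-oriented version of the nb2txt rules.
# Assumption: the delimiters are the single bytes 0xAE and 0xAF
# (octal 256 and 257), as the `sed -n l` dump suggests.

nb_open=$(printf '\256')    # opening delimiter byte
nb_close=$(printf '\257')   # closing delimiter byte
cr=$(printf '\r')           # carriage return at the end of each line

# strip_nb (hypothetical name): reads stdin, writes cleaned text to stdout.
strip_nb() {
  LC_ALL=C sed -E \
    -e "s/${nb_open}[^${nb_close}]*${nb_close}//g" \
    -e 's/-{20,}//g' \
    -e 's/\.{20,}//g' \
    -e "s/${cr}\$//"
}

# Small demo on a fragment shaped like the first line of the dump.
printf '\256LR1\257\256MD+BO\257Glossary\256MD-BO\257\r\n' | strip_nb
```

The demo should print `Glossary`. Real files would be processed with something like `strip_nb < file.nb > file.txt`. Accented characters such as `\374` stay in the file's original single-byte encoding; converting them to UTF-8 would be a separate step, and the thread does not say which source encoding applies.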

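  • +1. I was going to post a comment about simplifying most of the sed script down to `sed -e 's/®[^¯]*[¯]//g' input.txt > out.txt`, but your answer already does that. This does remove the odd bibliography formatting on the fourth input line, but if needed that could be handled separately with an earlier rule matching `®GC\|(.*)FL=¯`... The OP doesn't seem to mind it being removed, but that may be because they thought it was too hard to bother with (which it might be in sed, but definitely isn't in `perl -p`). (2 upvotes)
  • Minor correction: I meant `s/®[^¯]*¯//`. The square brackets around the second `¯` are left over from my attempt to avoid deleting the bibliographic data on line 4 with `'®[^¯=]*[¯=]'`. (2 upvotes)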
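
The "earlier rule" idea from the comments can be sketched in sed as well: rewrite the `GC|...FL=` record into a short note first, then apply the generic delimiter rule. Everything specific here is an assumption for illustration: the function name `keep_citations`, the bracketed output format, and the expectation that the `AU=`, `YR=`, and `TI=` fields appear in that order, as they do in the one sample record. As in the dump, the delimiters are taken to be the single bytes `\256` and `\257`.

```shell
#!/bin/sh
# Sketch only: keep part of the bibliography record before stripping codes.
# Assumptions: single-byte delimiters 0xAE / 0xAF, and fields ordered
# AU=...;YR=...;TI=... as in the sample line from the question.

nb_open=$(printf '\256')
nb_close=$(printf '\257')

# keep_citations (hypothetical name): reads stdin, writes to stdout.
keep_citations() {
  LC_ALL=C sed -E \
    -e "s/${nb_open}GC[|][^${nb_close}]*AU=([^;${nb_close}]*);YR=([^;${nb_close}]*);TI=([^;${nb_close}]*);[^${nb_close}]*${nb_close}/[\\1 (\\2), \\3] /g" \
    -e "s/${nb_open}[^${nb_close}]*${nb_close}//g"
}

# Demo on a shortened version of the record from the sample file.
printf 'colon \256GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical commentary;PG=245;FL=\257(Brown 1990)\256GC\257.\n' | keep_citations
```

The first expression must come before the generic one, otherwise the record is already gone by the time it runs. Since the desired output in the question keeps only the in-text citation, this extra rule is optional.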