从记事本++中的文件中提取每个引号之间的文本

Question

从记事本++中的文件中提取每个引号之间的文本

我的文件包含 2000 多个摘要，其中包含 18000 多个句子，以 tag 开头并以 tag 结尾。我想使用notepad++查找信息，我的文件视图如下：

<abstract>
<sentence>Activationofthe<conslex="CD28_surface_receptor"sem="G#protein_family_or_group"><conslex="CD28"sem="G#protein_molecule">CD28</cons>surfacereceptor</cons>providesamajorcostimulatorysignalfor<conslex="T_cell_activation"sem="G#other_name">Tcellactivation</cons>resultinginenhancedproductionof<conslex="interleukin-2"sem="G#protein_molecule">interleukin-2</cons>(<conslex="IL-2"sem="G#protein_molecule">IL-2</cons>)and<conslex="cell_proliferation"sem="G#other_name">cellproliferation</cons>.</sentence> 
<sentence>In<conslex="primary_T_lymphocyte"sem="G#cell_type">primaryTlymphocytes</cons>weshowthat<conslex="CD28"sem="G#protein_molecule">CD28</cons>ligationleadstotherapidintracellularformationof<conslex="reactive_oxygen_intermediate"sem="G#inorganic">reactiveoxygenintermediates</cons>(<conslex="ROI"sem="G#inorganic">ROIs</cons>)whicharerequiredfor<conslex="CD28-mediated_activation"sem="G#other_name"><conslex="CD28"sem="G#protein_molecule">CD28</cons>-mediatedactivation</cons>ofthe<conslex="NF-kappa_B"sem="G#protein_molecule">NF-kappaB</cons>/<conslex="CD28-responsive_complex"sem="G#protein_complex"><conslex="CD28"sem="G#protein_molecule">CD28</cons>-responsivecomplex</cons>and<conslex="IL-2_expression"sem="G#other_name"><conslex="IL-2"sem="G#protein_molecule">IL-2</cons>expression</cons>.</sentence>  
<sentence>Delineationofthe<conslex="CD28_signaling_cascade"sem="G#other_name"><conslex="CD28"sem="G#protein_molecule">CD28</cons>signalingcascade</cons>wasfoundtoinvolve<conslex="protein_tyrosine_kinase_activity"sem="G#other_name"><conslex="protein_tyrosine_kinase"sem="G#protein_family_or_group">proteintyrosinekinase</cons>activity</cons>,followedbytheactivationof<conslex="phospholipase_A2"sem="G#protein_molecule">phospholipaseA2</cons>and<conslex="5-lipoxygenase"sem="G#protein_molecule">5-lipoxygenase</cons>.</sentence>  
<sentence>Ourdatasuggestthat<conslex="lipoxygenase_metabolite"sem="G#protein_family_or_group"><conslex="lipoxygenase"sem="G#protein_molecule">lipoxygenase</cons>metabolites</cons>activate<conslex="ROI_formation"sem="G#other_name"><conslex="ROI"sem="G#inorganic">ROI</cons>formation</cons>whichtheninduce<conslex="IL-2"sem="G#protein_molecule">IL-2</cons>expressionvia<conslex="NF-kappa_B_activation"sem="G#other_name"><conslex="NF-kappa_B"sem="G#protein_molecule">NF-kappaB</cons>activation</cons>.</sentence>  
<sentence>Thesefindingsshouldbeusefulfor<conslex="therapeutic_strategies"sem="G#other_name">therapeuticstrategies</cons>andthedevelopmentof<conslex="immunosuppressants"sem="G#other_name">immunosuppressants</cons>targetingthe<conslex="CD28_costimulatory_pathway"sem="G#other_name"><conslex="CD28"sem="G#protein_molecule">CD28</cons>costimulatorypathway</cons>.</sentence>
</abstract>

Run Code Online (Sandbox Code Playgroud)

我想提取引号之间的文本，或者换句话说，想要删除所有数据，除了整个文本中的双引号，例如我想要的输出是这样的

CD28_surface_receptor  G#protein_family_or_group   CD28   G#protein_molecule  
primary_T_lymphocyte   G#cell_type

Run Code Online (Sandbox Code Playgroud)

我.*"(.*)".*在Find What 中使用，然后\1通过替换全部替换为。它只从每行的最后提取了带有引号的文本，但我想从所有文档和每一行中提取，因为我的文件中有更多带有双引号的字符串。

Answer 1

Wik*_*żew 5

您可以[^"]*"([^"]+)"[^"]*在Find What 中使用，并替换为\1\r\n：

在此处输入图片说明

或者，要将它们以制表符分隔，请替换为\1\t：

在此处输入图片说明

归档时间：	11 年，1 月前
查看次数：	8120 次
最近记录：	10 年，12 月前