Python: remove square brackets and the extraneous information between them

kul*_*lfi 12 python regex python-3.x pandas python-3.6

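I am trying to process a file and need to remove the extraneous information in it. Specifically, I want to remove the square-bracket blocks: each [] header, the text inside each block, and the text between the blocks. In other words, everything from the first block through the last one should go, including the brackets themselves, while everything outside that region is printed.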

Below is my text file with sample data:

$ cat smb
Hi this is my config file.
Please dont delete it

[homes]
  browseable                     = No
  comment                        = Your Home
  create mode                    = 0640
  csc policy                     = disable
  directory mask                 = 0750
  public                         = No
  writeable                      = Yes

[proj]
  browseable                     = Yes
  comment                        = Project directories
  csc policy                     = disable
  path                           = /proj
  public                         = No
  writeable                      = Yes

[]

This last second line.
End of the line.

Expected output:

Hi this is my config file.
Please dont delete it
This last second line.
End of the line.

What I have tried, based on my understanding and research:

$ cat test.py
with open("smb", "r") as file:
  for line in file:
    start = line.find( '[' )
    end = line.find( ']' )
    if start != -1 and end != -1:
      result = line[start+1:end]
      print(result)

Output:

$ ./test.py
   homes
   proj

Mar*_*ani 8

Using a regular expression:

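import re

# Read the whole file into one string so the pattern can span lines
with open("smb", "r") as f:
    txt = f.read()

# Remove everything from the first "\n[" through the closing "[]\n"
txt = re.sub(r'(\n\[)(.*?)(\[]\n)', '', txt, flags=re.DOTALL)

print(txt)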

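Regex explanation:

(\n\[) matches a newline followed by a [

(\[]\n) matches [] followed by a newline

(.*?) lazily matches everything between (\n\[) and (\[]\n), so the whole span is removed by the substitution

re.DOTALL makes . match newline characters too, which lets (.*?) span multiple lines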
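To see what the pattern and the flag do, here is a minimal sketch that runs the same regex on an in-memory string instead of the file. The `sample` text is a shortened, made-up stand-in for the smb file, and the last pattern (with `\n+` on both sides) is my own assumed variant, not part of the answer; it is only there to show one way to also swallow the blank lines around the blocks.

```python
import re

# Shortened, made-up stand-in for the "smb" file from the question.
sample = (
    "Hi this is my config file.\n"
    "Please dont delete it\n"
    "\n"
    "[homes]\n"
    "  public = No\n"
    "\n"
    "[proj]\n"
    "  path = /proj\n"
    "\n"
    "[]\n"
    "\n"
    "This last second line.\n"
    "End of the line.\n"
)

# Same pattern as in the answer, without the capturing groups.
pattern = r'\n\[.*?\[]\n'

# With re.DOTALL, "." also matches "\n", so ".*?" can cross line boundaries.
with_dotall = re.sub(pattern, '', sample, flags=re.DOTALL)

# Without the flag, "." stops at each newline and nothing matches here.
without_dotall = re.sub(pattern, '', sample)

# Assumed variant (not from the answer): also consume the blank lines
# around the blocks and leave a single newline in their place.
collapsed = re.sub(r'\n+\[.*?\[]\n+', '\n', sample, flags=re.DOTALL)

print(without_dotall == sample)  # the text is unchanged without DOTALL
print(with_dotall)               # blocks removed, one blank line remains
print(collapsed)                 # blocks and surrounding blank lines removed
```

On this sample the answer's pattern leaves one blank line where the blocks used to be; the assumed variant is one possible way to get rid of it if the output has to match the expected output exactly.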


!!! PANDAS UPDATE !!!

The same solution, with the same logic, can be written with pandas:

import re
import pandas as pd

# read the file so that each line becomes one row
# (blank lines are skipped by default; newer pandas releases may reject sep='\n')
txt = pd.read_csv('smb',  sep = '\n', header=None)
# join all the lines in the file, separating them with '\n'
txt = '\n'.join(txt[0].to_list())
# apply the regex to clean the text (same pattern as above, replaced by '\n')
txt = re.sub(r'(\n\[)(.*?)(\[]\n)', '\n', txt, flags=re.DOTALL)

print(txt)


Aka*_*lia 5

Read the file into a string,

extract = '''Hi this is my config file.
Please dont delete it

[homes]
  browseable                     = No
  comment                        = Your Home
  create mode                    = 0640
  csc policy                     = disable
  directory mask                 = 0750
  public                         = No
  writeable                      = Yes

[proj]
  browseable                     = Yes
  comment                        = Project directories
  csc policy                     = disable
  path                           = /proj
  public                         = No
  writeable                      = Yes

[]

This last second line.
End of the line.
'''.split('\n[')[0][:-1]

which gives you,

Hi this is my config file.
Please dont delete it

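.split('\n[') splits the string at every occurrence of the '\n[' sequence, [0] selects the descriptive lines at the top, and [:-1] drops the trailing newline.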

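with open("smb", "r") as f:
    extract = f.read()
    # Everything after the last ']\n' is the trailing text to keep
    tail = extract.split(']\n')
    # Text before the first '\n[' plus the trailing text
    extract = extract.split('\n[')[0][:-1] + tail[-1]
print(extract)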

will read the file and output,

Hi this is my config file.
Please dont delete it
This last second line.
End of the line.
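The split-based idea can also be wrapped in a small function that guards against input with no bracket blocks at all. This is only a sketch: the function name `strip_bracket_sections` is hypothetical, it uses `str.partition` and `str.rpartition` in place of `split`, and it assumes the text after the last block never contains `]` at the end of a line.

```python
def strip_bracket_sections(text):
    """Keep the text before the first '\\n[' and after the last ']\\n'.

    Hypothetical helper; assumes the trailing text itself has no line
    ending in ']'.
    """
    # Everything before the first "\n[" is the header text to keep.
    head, found_open, _ = text.partition('\n[')
    # Everything after the last "]\n" is the trailing text to keep.
    _, found_close, tail = text.rpartition(']\n')
    if not found_open or not found_close:
        # No bracket block found: return the input unchanged.
        return text
    # Trim the blank lines left at the seam and join with one newline.
    return head.rstrip('\n') + '\n' + tail.lstrip('\n')


# Shortened, made-up stand-in for the "smb" file from the question.
sample = (
    "Hi this is my config file.\n"
    "Please dont delete it\n"
    "\n"
    "[homes]\n"
    "  public = No\n"
    "\n"
    "[]\n"
    "\n"
    "This last second line.\n"
    "End of the line.\n"
)

cleaned = strip_bracket_sections(sample)
print(cleaned, end='')
```

For the real file, something like `strip_bracket_sections(open("smb").read())` should behave the same way, under the assumption stated above.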