单个 CSV 文件中的多个分隔符

Sar*_*kar 6 python csv delimiter

我有一个 CSV,它有三个不同的分隔符,即 '|'、',' 和 ';' 不同列之间。

我如何使用 Python 解析这个 CSV ?

我的数据如下:

2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:19:30Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
2017-01-24|05:19:30+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1583e83882b8e7Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:19:26Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
2017-01-24|05:21:59+0000|TRANSACTIONDelim_secondVIEW_PRIVACY_POLICYDelim_firstCONSUMERIDDelim_secondnullDelim_firstTRANSACTIONDATEDelim_second17-01-24 05:21:59Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
2017-01-24|05:59:25+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1586a2aa4bc18fDelim_firstTRANSACTIONDATEDelim_second17-01-24 05:59:21Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
2017-01-24|05:59:36+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_second1583e83882b8e7Delim_firstTRANSACTIONDATEDelim_second17-01-24 05:59:31Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second24-Jan-2017|**
2017-01-24|06:04:25+0000|TRANSACTIONDelim_secondUSER_LOGOUTDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:04:24Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second|**
2017-01-24|06:05:07+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:05:07Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondnullDelim_firstAIRINGDATEDelim_second|**
2017-01-24|06:05:07+0000|TRANSACTIONDelim_secondUSER_LOGINDelim_firstCONSUMERIDDelim_secondc4115f53-3798-4c9e-9bfd-506c842aff96Delim_firstTRANSACTIONDATEDelim_second17-01-24 06:05:07Delim_firstCHANNELIDDelim_secondDelim_firstSHOWIDDelim_secondDelim_firstEPISODEIDDelim_secondDelim_firstBUSINESSUNITDelim_secondbu002Delim_firstAIRINGDATEDelim_second|**
Run Code Online (Sandbox Code Playgroud)

Mik*_*ler 7

坚持使用标准库,re.split()可以在以下任何字符处拆分一行:

import re

with open(file_name) as fobj:
    for line in fobj:
        line_data = re.split('Delim_first|Delim_second|[|]', line)
        print(line_data)
Run Code Online (Sandbox Code Playgroud)

这将分裂的分隔符|Delim_firstDelim_second

或者用熊猫:

import pandas as pd
df = pd.read_csv('multi_delim.csv', sep='Delim_first|Delim_second|[|]', 
                  engine='python', header=None)
Run Code Online (Sandbox Code Playgroud)

结果:

在此处输入图片说明