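I am trying to exclude certain strings from a list of strings if they contain particular words.
For example, if a string contains the word "cinnamon", "fruit", or "eat", I want to exclude it from the list.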
['RT @haussera: Access to Apple Pay customer data, no, but another way? everybody wins - MarketWatch http://t.co/Fm3LE2iTkY', "Landed in the US, tired w horrible migrane. The only thing helping- Connie's new song on repeat. #SoGood #Nashville https://t.co/AscR4VUkMP", 'I wish jacob would be my cinnamon apple', "I've collected 9,112 gold coins! http://t.co/T62o8NoP09 #iphone, #iphonegames, #gameinsight", 'HAHAHA THEY USED THE SAME ARTICLE AS INDEPENDENT http://t.co/mC7nfnhqSw', '@hot1079atl Let me know what you think of the new single "Mirage "\nhttps://t.co/k8DJ7oxkyg', 'RT @SWNProductions: …
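One way to approach this is a list comprehension that drops any string matching a banned word. The sketch below is only an illustration: the helper name `exclude_containing` is hypothetical, and it assumes that matching should be case-insensitive and on whole words (so "eat" does not match "great"). If plain substring matching is wanted instead, the word boundaries can be removed.

```python
import re

def exclude_containing(strings, banned_words):
    """Return the strings that contain none of banned_words as whole words."""
    if not banned_words:
        return list(strings)
    # Build one case-insensitive pattern; \b restricts matches to whole words.
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in banned_words) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in strings if not pattern.search(s)]

# A few entries taken from the list above.
tweets = [
    "I wish jacob would be my cinnamon apple",
    "HAHAHA THEY USED THE SAME ARTICLE AS INDEPENDENT http://t.co/mC7nfnhqSw",
    "I've collected 9,112 gold coins! http://t.co/T62o8NoP09 #iphone, #iphonegames, #gameinsight",
]
kept = exclude_containing(tweets, ["cinnamon", "fruit", "eat"])
```

Here `kept` holds the last two strings; the first is dropped because it contains "cinnamon".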
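I am going to read a text file of about 7 GB.
Every time I try to read this file, it takes a very long time.
For example, a 350 MB text file takes my laptop about a minute or less. If I read 7 GB, ideally it should take 20 minutes or less, shouldn't it? Mine takes far longer than I expected, and I want to shorten the time spent reading and processing the data.
I use the following code to read the file: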
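import json

records = []  # renamed from `list` to avoid shadowing the built-in
with open(filename, 'r') as f:
    for line in f:
        try:
            records.append(json.loads(line))
        except ValueError:  # skip lines that are not valid JSON
            pass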
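After reading the file, I used to filter out unnecessary data by building another list and deleting the previous one. If you have any suggestions, please let me know.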
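A possible reason the time does not scale linearly is that every parsed object is kept in memory at once, which for a 7 GB file may exceed available RAM. One option is to filter while reading instead of building a full list first. The following is a minimal sketch under that assumption; `iter_records` and its `keep` parameter are hypothetical names, and the file is assumed to be UTF-8 with one JSON document per line.

```python
import json

def iter_records(path, keep=lambda record: True):
    """Yield parsed JSON objects from a file, one JSON document per line.

    Only records for which keep(record) is true are yielded, so unwanted
    data is discarded immediately instead of being stored and removed later.
    """
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip blank or malformed lines
            if keep(record):
                yield record
```

It could be used as `wanted = list(iter_records(filename, keep=lambda r: "text" in r))`, where the `"text"` key is only an example of a filter condition. Because the function is a generator, records can also be processed one at a time in a `for` loop without ever holding the whole file in memory.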
I am having trouble working with tuples in a list. Suppose we have a list containing many tuples.
simpleTag=[**('samsung', 'ADJ')**, ('user', 'NOUN'), ('huh', 'NOUN'), ('weird', 'NOUN'), (':', '.'), ('MDPai05', 'NOUN'), (':', '.'), ('Samsung', 'NOUN'), ('Electronics', 'NOUN'), ('to', 'PRT'), ('Build', 'NOUN'), ('$', '.'), ('3', 'NUM'), ('Billion', 'NUM'), ('Smartphone', 'NOUN'), ('Plant', 'NOUN'), ('in', 'ADP'), ('Vietnam', 'NOUN'), ('Why', 'NOUN'), ('not', 'ADV'), ('india', 'VERB'), ('?', '.'), ('market', 'NOUN'), ('here', 'ADV'), (':', '.'), (':', '.'), ('//t…I', 'ADJ'), ('have', 'VERB'), ('bricked', 'VERB'), ('an', 'DET'), ('android', 'ADJ'), ('samsung', 'NOUN'), ('galaxy', 'NOUN'), ('player', 'NOUN'), ('yp-g70', 'X'), ('international', 'ADJ'), ('version', 'NOUN'), (',', '.'), ('and', 'CONJ'), ('it', …
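The excerpt does not say which operation on the tuples is needed, so the following is only a general illustration of handling a list of (word, tag) pairs like the one above. The variable `tagged` is a shortened, hypothetical stand-in for `simpleTag`, and selecting by the "NOUN" tag is an assumed example task.

```python
# Shortened stand-in for the simpleTag list.
tagged = [("samsung", "ADJ"), ("user", "NOUN"), ("huh", "NOUN"), (":", "."), ("3", "NUM")]

# Unpack each (word, tag) pair directly in the comprehension.
nouns = [word for word, tag in tagged if tag == "NOUN"]

# zip(*pairs) transposes the list into one tuple of words and one of tags.
words, tags = zip(*tagged)
```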