The list top_brands contains brand names, for example:
top_brands = ['Coca Cola', 'Apple', 'Victoria\'s Secret', ....]
items is a pandas.DataFrame with the structure shown below. My task is to fill in brand_name from item_title whenever brand_name is missing:
row | item_title | brand_name
1 | Apple 6S | Apple
2 | New Victoria\'s Secret | missing <-- need to fill with Victoria\'s Secret
3 | Used Samsung TV | missing <-- need to fill with Samsung
4 | Used bike | missing <-- nothing to do, because there is no brand name in the title
....
My code is below. The problem is that it is far too slow for a DataFrame with 2 million records. Is there a way to do this with pandas or numpy instead?
def get_brand_name(row): …
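For reference, one vectorized direction that might avoid a row-by-row apply is sketched below. It is only a sketch under assumptions, not the original get_brand_name: the sample data is hypothetical and modelled on the table above, 'Samsung' is assumed to be somewhere in top_brands, missing values are assumed to be stored as NaN/None, and matching is assumed to be case-sensitive on whole words.

```python
import re
import pandas as pd

# Hypothetical sample data modelled on the table in the question.
top_brands = ['Coca Cola', 'Apple', "Victoria's Secret", 'Samsung']
items = pd.DataFrame({
    'item_title': ['Apple 6S', "New Victoria's Secret", 'Used Samsung TV', 'Used bike'],
    'brand_name': ['Apple', None, None, None],
})

# Build one alternation pattern from all brands.
# re.escape protects against regex metacharacters inside brand names;
# sorting longest-first lets a longer brand win over a shorter one with the same prefix.
alternatives = '|'.join(
    re.escape(brand) for brand in sorted(top_brands, key=len, reverse=True)
)
# The lookarounds require the match not to sit inside a larger word.
pattern = r'(?<!\w)(' + alternatives + r')(?!\w)'

# str.extract returns the first match per title (NaN where nothing matches).
extracted = items['item_title'].str.extract(pattern, expand=False)

# Only rows whose brand_name is missing are filled; existing values are kept.
items['brand_name'] = items['brand_name'].fillna(extracted)
```

With this approach the whole column is scanned by a single compiled regex rather than by a Python function called once per row. If "missing" is actually stored as a placeholder string, it would need to be converted to NaN first for fillna to have any effect.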