Dav*_*mka 10 python dictionary equals beautifulsoup
Suppose I have two objects of the same class, objA and objB, related as follows:
(objA == objB)  # True
(objA is objB)  # False
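If I use both objects as keys in a Python dict, they are treated as the same key and overwrite each other. Is there a way to override the dict's key comparison so that it uses is instead of ==, so that the two objects count as different keys in the dict?
Perhaps I could override the equality method in the class, or something similar? More specifically, I am talking about two Tag objects from the BeautifulSoup4 library.
Here is a more concrete example of what I mean: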
from bs4 import BeautifulSoup
HTML_string = "<html><h1>some_header</h1><h1>some_header</h1></html>"
HTML_soup = BeautifulSoup(HTML_string, 'lxml')
first_h1 = HTML_soup.find_all('h1')[0] #first_h1 = <h1>some_header</h1>
second_h1 = HTML_soup.find_all('h1')[1] #second_h1 = <h1>some_header</h1>
print(first_h1 == second_h1) # this prints True
print(first_h1 is second_h1) # this prints False
my_dict = {}
my_dict[first_h1] = 1
my_dict[second_h1] = 1
print(len(my_dict)) # my dict has only 1 entry!
# I want to have 2 entries in my_dict: one for key 'first_h1', one for key 'second_h1'.
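first_h1 and second_h1 are instances of the Tag class. When you write my_dict[first_h1] or my_dict[second_h1], the tag's string representation is used for hashing. A dict treats two keys as the same when their hashes match and they compare equal with ==, and both conditions hold here: Tag.__eq__ compares the tags' name, attributes and contents, and both Tag instances have the same string representation: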
<h1>some_header</h1>
This is because the Tag class defines its __hash__() magic method as follows:
def __hash__(self):
    return str(self).__hash__()
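The collision can be reproduced without bs4 installed. The sketch below uses a hypothetical stand-in class, FakeTag (not part of bs4), that copies the two behaviours that matter: a __hash__ based on the string form, as above, and an __eq__ that compares content (a simplification of what Tag.__eq__ does).

```python
class FakeTag:
    """Hypothetical stand-in for bs4's Tag: hashes and compares by markup."""

    def __init__(self, name, text):
        self.name = name
        self.text = text

    def __str__(self):
        return f"<{self.name}>{self.text}</{self.name}>"

    def __hash__(self):
        # Same idea as Tag.__hash__: hash the string representation
        return str(self).__hash__()

    def __eq__(self, other):
        # Structural equality (simplified: compares the rendered markup)
        return isinstance(other, FakeTag) and str(self) == str(other)


first_h1 = FakeTag("h1", "some_header")
second_h1 = FakeTag("h1", "some_header")

print(first_h1 == second_h1)  # True
print(first_h1 is second_h1)  # False

my_dict = {first_h1: 1}
# Equal hash and == is True, so the dict keeps the existing key object
# (first_h1) and only replaces the value stored under it
my_dict[second_h1] = 2
print(len(my_dict))  # 1
```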
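One workaround is to use the id() value as the hash, but that would mean redefining the Tag class inside BeautifulSoup. You can avoid that by writing your own custom "tag wrapper":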
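class TagWrapper:
    def __init__(self, tag):
        self.tag = tag

    def __hash__(self):
        # Hash by the identity of the wrapped Tag instead of its markup
        return id(self.tag)

    def __eq__(self, other):
        # Wrappers are equal only if they wrap the very same Tag object
        return isinstance(other, TagWrapper) and self.tag is other.tag

    def __str__(self):
        return str(self.tag)

    def __repr__(self):
        return str(self.tag)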
Then you will be able to do:
In [1]: from bs4 import BeautifulSoup
   ...:
In [2]: class TagWrapper:
   ...:     def __init__(self, tag):
   ...:         self.tag = tag
   ...:
   ...:     def __hash__(self):
   ...:         return id(self.tag)
   ...:
   ...:     def __eq__(self, other):
   ...:         return isinstance(other, TagWrapper) and self.tag is other.tag
   ...:
   ...:     def __str__(self):
   ...:         return str(self.tag)
   ...:
   ...:     def __repr__(self):
   ...:         return str(self.tag)
   ...:
In [3]: HTML_string = "<html><h1>some_header</h1><h1>some_header</h1></html>"
   ...:
   ...: HTML_soup = BeautifulSoup(HTML_string, 'lxml')
   ...:
In [4]: first_h1 = HTML_soup.find_all('h1')[0]  # first_h1 = <h1>some_header</h1>
   ...: second_h1 = HTML_soup.find_all('h1')[1]  # second_h1 = <h1>some_header</h1>
   ...:
In [5]: my_dict = {}
   ...: my_dict[TagWrapper(first_h1)] = 1
   ...: my_dict[TagWrapper(second_h1)] = 1
   ...:
   ...: print(my_dict)
   ...:
{<h1>some_header</h1>: 1, <h1>some_header</h1>: 1}
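However, this is not pretty and not very convenient to use. I would revisit your original problem and check whether you really need to put the tags into a dictionary.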
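If all you need is a value per individual tag, one lighter-weight option (a sketch, not something bs4 provides) is to key an ordinary dict by id(tag) and store the tag itself next to its value. Keeping the tag in the dict keeps it alive, so its id cannot be reused by another object while the entry exists. Lists stand in for the two Tag objects here because, like the tags, they compare equal without being the same object; the name values_by_id is hypothetical.

```python
# Stand-ins for first_h1 / second_h1: equal by value, distinct objects
first_h1 = ["<h1>some_header</h1>"]
second_h1 = ["<h1>some_header</h1>"]

# Key by identity and store (object, value) so each object stays alive;
# an id() may be reused once its object has been garbage-collected
values_by_id = {}
for tag, value in [(first_h1, 1), (second_h1, 2)]:
    values_by_id[id(tag)] = (tag, value)

print(len(values_by_id))  # 2

# Lookup goes through the object's identity
tag, value = values_by_id[id(second_h1)]
print(value)  # 2
```

With real bs4 tags you would iterate over HTML_soup.find_all('h1') instead of the stand-in list.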
You could also monkey-patch bs4 using Python's introspection features, as was done here, but that takes you into fairly dangerous territory.
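A middle ground that leaves bs4 untouched, and that is closer to what the question literally asks for (a dict that matches keys with is rather than ==), is a small custom mapping. IdentityDict below is a hypothetical sketch built on collections.abc.MutableMapping, not part of bs4 or the standard library: it stores each entry under id(key) and keeps a reference to the key so that ids are not reused. Lists again stand in for the tags so it runs without bs4.

```python
from collections.abc import MutableMapping


class IdentityDict(MutableMapping):
    """Hypothetical mapping whose keys are matched with `is`, not `==`."""

    def __init__(self):
        # id(key) -> (key, value); holding the key prevents id reuse
        self._data = {}

    def __getitem__(self, key):
        return self._data[id(key)][1]

    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)

    def __delitem__(self, key):
        del self._data[id(key)]

    def __iter__(self):
        # Yield the original key objects, not their ids
        return (key for key, _ in self._data.values())

    def __len__(self):
        return len(self._data)


first_h1 = ["<h1>some_header</h1>"]
second_h1 = ["<h1>some_header</h1>"]

my_dict = IdentityDict()
my_dict[first_h1] = 1
my_dict[second_h1] = 1
print(len(my_dict))  # 2
```

MutableMapping supplies get(), in, items() and the other dict-style methods on top of these five. Because matching is by identity, looking up an equal but different object raises KeyError.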