I have made 2 ontologies using the owlready2 library:
list(ontology1.classes())
[old_dataset_ontology.weather,
old_dataset_ontology.season]
list(ontology1.individuals())
[old_dataset_ontology.rainy,
old_dataset_ontology.windy,
old_dataset_ontology.cold,
old_dataset_ontology.clouds]
list(ontology2.classes())
[new_dataset_ontology.weather,
new_dataset_ontology.season,
new_dataset_ontology.season1]
list(ontology2.individuals())
[new_dataset_ontology.rainy,
new_dataset_ontology.windy,
new_dataset_ontology.cold1]
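I want to merge them, but I can't find a way to do it in owlready2, and there is nothing about it in the documentation. I just want a simple string match that removes duplicate classes and individuals.
Any ideas?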
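Since only a simple string match is wanted, the deduplication step itself can be sketched in plain Python, independent of owlready2. This assumes entities are compared by their local name (owlready2 exposes it as `.name`, so the inputs could be built with e.g. `[c.name for c in ontology1.classes()]`). `merge_names` is a hypothetical helper, not part of owlready2.

```python
from typing import Callable, Iterable, List


def merge_names(
    *name_lists: Iterable[str],
    key: Callable[[str], str] = lambda s: s,
) -> List[str]:
    """Union several lists of entity names, dropping duplicates.

    Names are compared via key(name) (exact match by default); the first
    spelling seen is kept and first-seen order is preserved.
    """
    seen = set()
    merged: List[str] = []
    for names in name_lists:
        for name in names:
            k = key(name)
            if k not in seen:
                seen.add(k)
                merged.append(name)
    return merged


# Local names as listed in the question (hypothetical stand-ins for
# [e.name for e in ontology.classes()] / ontology.individuals()).
classes1 = ["weather", "season"]
classes2 = ["weather", "season", "season1"]
individuals1 = ["rainy", "windy", "cold", "clouds"]
individuals2 = ["rainy", "windy", "cold1"]

print(merge_names(classes1, classes2))
print(merge_names(individuals1, individuals2))
```

The merged names could then be recreated in a fresh ontology, for classes e.g. with `types.new_class(name, (Thing,))` inside a `with new_onto:` block. Note that exact matching keeps `season`/`season1` and `cold`/`cold1` as separate entities (pass `key=str.casefold` for case-insensitive matching), and that comparing names alone does not carry over which class each individual belongs to or any other axioms.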
I used BERTopic with KeyBERT to extract some topics from some docs:
from bertopic import BERTopic
topic_model = BERTopic(
    nr_topics="auto",
    verbose=True,
    n_gram_range=(1, 4),
    calculate_probabilities=True,
    embedding_model="paraphrase-MiniLM-L3-v2",
    min_topic_size=3,
)
topics, probs = topic_model.fit_transform(docs)
Now I can access the topic names:
freq = topic_model.get_topic_info()
print("Number of topics: {}".format( len(freq)))
freq.head(30)
   Topic  Count                              Name
0     -1      1  -1_default_greenbone_gmp_manager
1      0     14            0_http_tls_ssl tls_ssl
2      1      8   1_jboss_console_web_application
and inspect the words of each topic:
[('http', 0.0855701486234524),
('tls', 0.061977919455444744),
('ssl tls', 0.061977919455444744),
('ssl', 0.061977919455444744),
('tcp', 0.04551718585531556),
('number', 0.04551718585531556)]
[('jboss', 0.14014705432060262),
('console', 0.09285308122803233),
('web', 0.07323749337563096),
('application', 0.0622930523123512),
('management', 0.0622930523123512),
('apache', 0.05032395169459188)]
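The `Name` column from `get_topic_info()` is consistent with joining the topic id and the first four words of each keyword list with underscores (`0_http_tls_ssl tls_ssl`, `1_jboss_console_web_application`). The sketch below reproduces that from the lists printed above, assuming the first list belongs to topic 0 and the second to topic 1 (as the names suggest), and that each list has the `(word, score)` shape returned by `topic_model.get_topic(topic_id)`. `topic_label` and `keywords_by_topic` are hypothetical helpers, not BERTopic API.

```python
from typing import Dict, List, Tuple

# (word, score) pairs, in the shape printed in the question.
TopicWords = List[Tuple[str, float]]


def topic_label(topic_id: int, words: TopicWords, n_words: int = 4) -> str:
    """Join the topic id and its top-n words with underscores."""
    return f"{topic_id}_" + "_".join(word for word, _ in words[:n_words])


def keywords_by_topic(
    topics: Dict[int, TopicWords], n_words: int = 4
) -> Dict[int, List[str]]:
    """Keep only the top-n words per topic, dropping the scores."""
    return {tid: [w for w, _ in words[:n_words]] for tid, words in topics.items()}


# Keyword lists copied from the output above.
topics: Dict[int, TopicWords] = {
    0: [("http", 0.0855701486234524), ("tls", 0.061977919455444744),
        ("ssl tls", 0.061977919455444744), ("ssl", 0.061977919455444744),
        ("tcp", 0.04551718585531556), ("number", 0.04551718585531556)],
    1: [("jboss", 0.14014705432060262), ("console", 0.09285308122803233),
        ("web", 0.07323749337563096), ("application", 0.0622930523123512),
        ("management", 0.0622930523123512), ("apache", 0.05032395169459188)],
}

for tid, words in topics.items():
    print(topic_label(tid, words))
print(keywords_by_topic(topics))
```

With a fitted model, `topic_model.get_topics()` returns a dict of the same shape for every topic (including the outlier topic -1), which could be passed straight to `keywords_by_topic`.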
What I want is the final data, …