I have a dataframe with a column
category
0 [???????/Hi-Tech/????????/?????????????/ ]
1 [/???????/??????/????????????/???? ???????????...
2 []
3 [/???????/??????/????????????/???? ???????????...
4 [???????/Hi-Tech/????????/?????????????/ ]
5 []
6 [???????/Hi-Tech/????????/?????????????/ ]
7 [/???????/??????/????????????/???? ???????????...
8 [???????/Hi-Tech/????????/?????????????/ ]
9 [/???????/??????/????????????/???? ???????????...
10 [???????/Hi-Tech/????????/?????????????/ ]
11 [/???????/??????/????????????/???? ???????????...
12 []
13 [/???????/??????/????????????/???? ???????????...
14 [???????/Hi-Tech/????????/?????????????/ ]
The column contains lists. I need to get the first string from each list, but some of the lists are empty, and when I try to use
df.category.iloc[0]
I get
ValueError: Length of values does not match length of index
How can I fix this error and get strings instead of lists?
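One possible way to handle the empty lists, as a minimal sketch: take the first element only when the list is non-empty and fall back to None otherwise. The sample data, the helper `first_or_none`, and the `first_category` column name are made up for illustration; only the `category` column comes from the question.

```python
import pandas as pd

# Hypothetical sample data: a column of lists, some of them empty.
df = pd.DataFrame({"category": [["Hi-Tech/Phones/"], [], ["/Home/Garden/"]]})


def first_or_none(items):
    """Return the first element of a list, or None if the list is empty."""
    return items[0] if items else None


# Apply element-wise so an empty list becomes a missing value instead of raising.
df["first_category"] = df["category"].apply(first_or_none)
print(df)
```

Rows with empty lists end up as missing values, which can then be dropped with `dropna()` or replaced with `fillna("")` if every row needs a plain string.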
I need to run a docker container.
First, I pulled the image
docker pull [OPTIONS] NAME[:TAG|@DIGEST]
Next I tried to run it
docker run [OPTIONS] IMAGE[:TAG|@DIGEST] [COMMAND] [ARG...]
But I got an error
docker: Error response from daemon: driver failed programming external connectivity on endpoint youthful_bhaskara (47fae1c2ecd6245d127801729b80276aeb3858526a9441760925d904ce1565ff): Error starting userland proxy: listen tcp 0.0.0.0:8888: bind: address already in use.
ERRO[0000] error waiting for container: context canceled
With sudo I get the same error.
How can I fix this? Maybe I missed some intermediate step?
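The message "bind: address already in use" suggests that something on the host is already listening on port 8888, so Docker cannot publish the container on it. A small sketch for checking this from Python, using only the standard library; the helper name `port_is_free` is made up for illustration, and the port number is taken from the error message above.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port, False otherwise."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another process holds the port.
            return False
        return True


# 8888 is the host port named in the Docker error message.
print("port 8888 free:", port_is_free(8888))
```

If the port turns out to be taken, the usual options are to stop whatever is using it (possibly an earlier container or a local notebook server) or to publish the container on a different host port, for example with `-p 8889:8888` in the `docker run` options.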
I have trained a Doc2Vec model and am trying to get predictions.
I use
test_data = word_tokenize("????? ?????? ???????? ?.?.".lower())
model = Doc2Vec.load(model_path)
v1 = model.infer_vector(test_data)
sims = model.docvecs.most_similar([v1])
print(sims)
It returns
[('624319', 0.7534812092781067), ('566511', 0.7333904504776001), ('517382', 0.7264763116836548), ('523368', 0.7254455089569092), ('494248', 0.7212602496147156), ('382920', 0.7092794179916382), ('530910', 0.7086726427078247), ('513421', 0.6893941760063171), ('196931', 0.6776881814002991), ('196947', 0.6705600023269653)]
Next I tried to find out which text this number refers to
model.docvecs['624319']
But it returns only the vector representation
array([ 0.36298314, -0.8048847 , -1.4890883 , -0.3737898 , -0.00292279,
-0.6606688 , -0.12611026, -0.14547637, 0.78830665, 0.6172428 ,
-0.04928801, 0.36754376, -0.54034036, 0.04631123, 0.24066721,
0.22503968, 0.02870891, 0.28329515, 0.05591608, 0.00457001],
dtype=float32)
So, is there any way to get the text for that tag from the model? Loading the training dataset takes a lot of time, so I am trying to find another way.
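As far as I know, a trained Doc2Vec model keeps the tags and their vectors but not the original texts, so the text has to come from somewhere outside the model. One workaround, sketched below under that assumption, is to save a small tag-to-text index once, next to the model, and load only that index later. The helper names, the JSON file, and the sample texts are all hypothetical; the tags and scores are copied from the output above.

```python
import json


def save_tag_index(tag_to_text, path):
    """Write the tag -> text mapping to a JSON file (done once, at training time)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(tag_to_text, fh, ensure_ascii=False)


def load_tag_index(path):
    """Read the tag -> text mapping back without touching the full training set."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def attach_texts(sims, tag_to_text):
    """Turn (tag, score) pairs into (tag, score, text); text is None for unknown tags."""
    return [(tag, score, tag_to_text.get(tag)) for tag, score in sims]


# Placeholder texts; in practice these come from the training corpus.
tag_to_text = {"624319": "text of document 624319", "566511": "text of document 566511"}
sims = [("624319", 0.7534812092781067), ("566511", 0.7333904504776001)]
print(attach_texts(sims, tag_to_text))
```

The index only needs the tag and the raw text, so it is usually much cheaper to load than the tokenized training set.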
I want to write a function and add it to a class. I use
import pandas as pd
import tldextract

domain = []
df = pd.DataFrame()
df['urls'] = ['ru.vk.com', 'eng.facebook.com', 'ru.ya.ru']
urls = df.urls.values.tolist()

class csv:
    def get_domain(self, list_url, list, df):
        self.list_url = list_url
        self.list = list
        self.df = df
        for i, url in enumerate(list_url):
            get_domain = tldextract.extract(url)
            subdomain = get_domain[0] + '.' + get_domain[1] + '.' + get_domain[2]
            if subdomain.startswith('.'):
                subdomain = subdomain[1:]
            elif subdomain.endswith('.'):
                subdomain = subdomain[:-1]
            elif subdomain.startswith('www.'):
                subdomain = subdomain[4:]
            list.append(subdomain)
        df['subdomain'] = list

df = csv()
df.get_domain(urls, domain, …