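I have a dictionary of dictionaries in Python 2.7.
I need to quickly count the total number of keys, including the keys inside each inner dictionary.
So in this example, the total number of keys I need is 6: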
dict_test = {'key2': {'key_in3': 'value', 'key_in4': 'value'}, 'key1': {'key_in2': 'value', 'key_in1': 'value'}}
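I know I can iterate over every key with a for loop, but I'm looking for a faster way to do this, because I will have thousands or millions of keys and doing it this way is just inefficient: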
count_the_keys = 0
for key in dict_test.keys():
    for key_inner in dict_test[key].keys():
        count_the_keys += 1
# something like this would be more effective
# of course .keys().keys() doesn't work
print len(dict_test.keys()) * len(dict_test.keys().keys())
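A minimal sketch, assuming exactly two levels and that every outer value is a dict. `len()` on a dict does not walk its keys, so only the outer dict is iterated. Note that the loop above counts only the inner keys (4 here), while the stated total of 6 also includes the 2 outer keys, so the sketch returns both. The function names are hypothetical, and `print(...)` with one argument works in both Python 2.7 and 3 (on 2.7, `itervalues()` would also avoid building a temporary list).

```python
def count_inner_keys(nested):
    # len() of each inner dict; only the outer dict is iterated
    return sum(len(inner) for inner in nested.values())

def count_all_keys(nested):
    # outer keys + inner keys (assumes two levels, all values are dicts)
    return len(nested) + count_inner_keys(nested)

dict_test = {'key2': {'key_in3': 'value', 'key_in4': 'value'},
             'key1': {'key_in2': 'value', 'key_in1': 'value'}}
print(count_inner_keys(dict_test))  # 4
print(count_all_keys(dict_test))    # 6
```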
I am using recursive feature elimination (RFE) in my sklearn pipeline; the pipeline looks like this:
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn import feature_selection
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']
# classifier
LinearSVC1 = LinearSVC(tol=1e-4, C = 0.10000000000000001)
f5 = feature_selection.RFE(estimator=LinearSVC1, n_features_to_select=500, step=1)
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
        ('custom_features', CustomFeatures())])),
    ('rfe_feature_selection', f5),
    ('clf', LinearSVC1),
])
pipeline.fit(X, Y)
y_pred = pipeline.predict(X_dev)
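How can I get the names of the features selected by RFE? RFE should select the best 500 features, but I really need to see which features were selected.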
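Edit:
I have a complex pipeline made up of several pipelines and feature unions, percentile feature selection, and, at the end, recursive feature elimination: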
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=90)
fs_vect = feature_selection.SelectPercentile(feature_selection.chi2, percentile=80) …
I am using a Pipeline in sklearn to classify text.
In this example Pipeline, I have a TF-IDF vectorizer and some custom features wrapped in a FeatureUnion, plus a classifier, as the Pipeline steps; I then fit the training data and make predictions:
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']
# load custom features and FeatureUnion with Vectorizer
features = []
measure_features = MeasureFeatures() # this class includes my custom features
features.append(('measure_features', measure_features))
countVecWord = TfidfVectorizer(ngram_range=(1, 3), max_features= 4000)
features.append(('ngram', countVecWord))
all_features = FeatureUnion(features)
# classifier
LinearSVC1 = LinearSVC(tol=1e-4, C = 0.10000000000000001)
pipeline = Pipeline(
    [('all', all_features),
     ('clf', …
python pipeline classification machine-learning scikit-learn
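`MeasureFeatures` is not shown in the question. For FeatureUnion to use a custom class, the class essentially needs `fit()` and a `transform()` that returns one row per document. Inheriting from `BaseEstimator` and `TransformerMixin` supplies `get_params()` and `fit_transform()`. A sketch with a hypothetical `MeasureFeatures` that computes two made-up measures (character count and whitespace-token count):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

class MeasureFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for the asker's class."""

    def fit(self, X, y=None):
        return self  # nothing to learn for these measures

    def transform(self, X):
        # one row per document: [number of characters, number of tokens]
        return np.array([[len(text), len(text.split())] for text in X], dtype=float)

all_features = FeatureUnion([
    ('measure_features', MeasureFeatures()),
    ('ngram', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
])
pipeline = Pipeline([('all', all_features), ('clf', LinearSVC())])

X = ['I am a sentence', 'an example']
Y = [1, 2]
pipeline.fit(X, Y)
print(pipeline.predict(['another sentence']))
```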
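I have a tensor of size [150, 182, 91]; the first dimension is just the batch size, and the matrix I'm interested in is the 182x91 one.
I need to run a function on the 182x91 matrix for each of the 50 dimensions separately.
I need to get the diagonal stripes of the 182x91 matrix, and the function I'm using is the following (based on my previous question about automatically getting a diagonal "stripe" from a matrix in numpy or pytorch):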
def stripe(a):
    i, j = a.size()
    assert (i >= j)
    out = torch.zeros((i - j + 1, j))
    for diag in range(0, i - j + 1):
        out[diag] = torch.diag(a, -diag)
    return out
The stripe function expects a matrix of size IxJ and cannot handle a third dimension.
So when I run this:
some_matrix = x # <class 'torch.autograd.variable.Variable'> torch.Size([150, 182, 91])
get_diag = stripe(some_matrix)
I get this error: ValueError: too many values to unpack (expected 2)
If I simply try to skip the first dimension with x, i, j = a.size(), I get this instead: RuntimeError: invalid argument 1: expected a matrix or a vector …
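`torch.diagonal` accepts `dim1`/`dim2` arguments, so it can take the same offset diagonal from the last two dimensions of every batch entry in one call, which leaves only the loop over offsets. A sketch assuming `i >= j` as in the original assert and a PyTorch version recent enough to have `torch.diagonal`. `batched_stripe` is a hypothetical name; the 2-D `stripe` is repeated only so the two can be compared.

```python
import torch

def stripe(a):
    """The question's 2-D version, kept here only for comparison."""
    i, j = a.size()
    assert i >= j
    out = torch.zeros((i - j + 1, j))
    for diag in range(0, i - j + 1):
        out[diag] = torch.diag(a, -diag)
    return out

def batched_stripe(a):
    """Same stripes for every matrix in a (batch, i, j) tensor."""
    _, i, j = a.size()
    assert i >= j
    # Each call takes the diagonal at offset -d from the last two dims of every
    # batch entry and returns a (batch, j) tensor.
    diags = [torch.diagonal(a, offset=-d, dim1=-2, dim2=-1) for d in range(i - j + 1)]
    return torch.stack(diags, dim=1)  # shape (batch, i - j + 1, j)

x = torch.randn(150, 182, 91)
print(batched_stripe(x).shape)  # torch.Size([150, 92, 91])
```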
BeautifulSoup returns an empty list when searching by a compound class name with a regular expression.
Example:
import re
from bs4 import BeautifulSoup
bs = """
<a class="name-single name692" href="www.example.com">Example Text</a>
"""
bsObj = BeautifulSoup(bs, "html.parser")
# this finds the <a> element
found_elements = bsObj.find_all("a", class_=re.compile("^(name-single.*)$"))
# this returns an empty list
found_elements = bsObj.find_all("a", class_=re.compile(r"^(name-single name\d*)$"))
I need the class selection to be very precise. Any ideas?
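BeautifulSoup treats `class` as a multi-valued attribute and stores it as a list of class names. As a result, a regex written against the whole attribute string may not behave as expected with `class_`. A version-independent workaround, sketched here as one option rather than the only one, is to pass `find_all` a function that joins the class list back together and matches the full string. `has_exact_class` and the two extra test links are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

html = """
<a class="name-single name692" href="www.example.com">Example Text</a>
<a class="name-single name692 extra" href="www.example.com">Too many classes</a>
<a class="name-single" href="www.example.com">Missing second class</a>
"""

EXACT_CLASS = re.compile(r"^name-single name\d*$")

def has_exact_class(tag):
    # tag.get("class") is a list such as ["name-single", "name692"];
    # join it back into one string and require the whole string to match.
    classes = tag.get("class") or []
    return tag.name == "a" and EXACT_CLASS.match(" ".join(classes)) is not None

soup = BeautifulSoup(html, "html.parser")
found = soup.find_all(has_exact_class)
print([a.get_text() for a in found])  # ['Example Text']
```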
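I am using a Pipeline in sklearn to classify text.
In this example Pipeline, I have a TfidfVectorizer and some custom features wrapped in a FeatureUnion, plus a classifier, as the Pipeline steps; I then fit the training data and make predictions: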
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
X = ['I am a sentence', 'an example']
Y = [1, 2]
X_dev = ['another sentence']
# classifier
LinearSVC1 = LinearSVC(tol=1e-4, C = 0.10000000000000001)
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', TfidfVectorizer(ngram_range=(1, 3), max_features=4000)),
        ('custom_features', CustomFeatures())])),
    ('clf', LinearSVC1),
])
pipeline.fit(X, Y)
y_pred = pipeline.predict(X_dev)
# etc.
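Here I need to pickle the TfidfVectorizer step and leave custom_features unpickled, since I'm still experimenting with them. The idea is to make the pipeline faster by pickling the tfidf step.
I know I can pickle the whole Pipeline with joblib.dump …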
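One way to pickle only the TF-IDF step is to fit and `joblib.dump` the vectorizer on its own. When the vectorizer is loaded, it goes into the FeatureUnion behind a wrapper whose `fit()` does nothing, so `pipeline.fit` does not relearn the vocabulary while the custom features are still refit. This is a sketch under assumptions. `PrefittedTransformer` is a hypothetical helper, not a scikit-learn class, and the toy data is made up. If the pipeline is cloned (for example by GridSearchCV or `cross_val_score`), `clone()` also clones the wrapped vectorizer and thereby unfits it, so this sketch only suits a plain fit/predict workflow. scikit-learn 1.6 and later also ship `sklearn.frozen.FrozenEstimator`, which serves the same purpose.

```python
import os
import tempfile

import joblib
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

class PrefittedTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: holds an already fitted transformer; fit() is a no-op."""

    def __init__(self, fitted):
        self.fitted = fitted

    def fit(self, X, y=None):
        return self  # keep the loaded vocabulary and idf weights as they are

    def transform(self, X):
        return self.fitted.transform(X)

X = ['I am a sentence', 'an example', 'another sentence here', 'one more example']
Y = [1, 2, 1, 2]

# 1. Fit the TF-IDF step once and pickle only that step.
tfidf = TfidfVectorizer(ngram_range=(1, 3), max_features=4000).fit(X)
path = os.path.join(tempfile.mkdtemp(), 'tfidf.joblib')
joblib.dump(tfidf, path)

# 2. Later: load it and combine it with custom features that are still refit.
loaded = joblib.load(path)
pipeline = Pipeline([
    ('features', FeatureUnion([
        ('tfidf', PrefittedTransformer(loaded)),
        # ('custom_features', CustomFeatures()),  # your own class, refit every time
    ])),
    ('clf', LinearSVC()),
])
pipeline.fit(X, Y)
print(pipeline.predict(['another sentence']))
```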
python pipeline classification machine-learning scikit-learn
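I need to sort a list of dictionaries by a particular value. Unfortunately, some of the values are None, and sorting doesn't work in Python 3 because it doesn't support comparing None with non-None values. I also need to keep the None values and place them as the lowest values in the new sorted list.
The code: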
import operator
list_of_dicts_with_nones = [
{"value": 1, "other_value": 4},
{"value": 2, "other_value": 3},
{"value": 3, "other_value": 2},
{"value": 4, "other_value": 1},
{"value": None, "other_value": 42},
{"value": None, "other_value": 9001}
]
# sort by first value but put the None values at the end
new_sorted_list = sorted(
(some_dict for some_dict in list_of_dicts_with_nones),
key=operator.itemgetter("value"), reverse=True
)
print(new_sorted_list)
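`sorted()` accepts any key function. One way to keep the None entries and treat them as the lowest values is to sort on a tuple whose first element records whether the value is present. Tuples compare element by element, so `(False, 0)` for None always ranks below `(True, value)`, and None is never compared with a number. `none_lowest` is a hypothetical helper name. The sketch assumes the non-None values are mutually comparable, and the 0 is only a placeholder that is compared with other placeholders, never with a real value.

```python
list_of_dicts_with_nones = [
    {"value": 1, "other_value": 4},
    {"value": 2, "other_value": 3},
    {"value": 3, "other_value": 2},
    {"value": 4, "other_value": 1},
    {"value": None, "other_value": 42},
    {"value": None, "other_value": 9001},
]

def none_lowest(d):
    value = d["value"]
    # present values get (True, value); None gets (False, 0)
    return (value is not None, value if value is not None else 0)

# reverse=True: largest values first, None entries last (in original order)
new_sorted_list = sorted(list_of_dicts_with_nones, key=none_lowest, reverse=True)
print([d["value"] for d in new_sorted_list])  # [4, 3, 2, 1, None, None]
```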
What I get in Python 3.6.1:
Traceback (most recent call last):
File "/home/bilan/PycharmProjects/py3_tests/py_3_sorting.py", line …
I need to transform a DataFrame in which one column contains a list of tuples; each item in each tuple has to become a separate column.
Here is an example and a solution in Pandas:
import pandas as pd
df_dict = {
'a': {
"1": "stuff", "2": "stuff2"
},
"d": {
"1": [(1, 2), (3, 4)], "2": [(1, 2), (3, 4)]
}
}
df = pd.DataFrame.from_dict(df_dict)
print(df)  # initial structure
#         a                 d
# 1   stuff  [(1, 2), (3, 4)]
# 2  stuff2  [(1, 2), (3, 4)]
# first transformation, let's separate each list item into a new row
row_breakdown = df.set_index(["a"])["d"].apply(pd.Series).stack()
print(row_breakdown)
# a
# stuff   0    (1, 2)
#         1    (3, 4)
# stuff2  0    …
I'm using Python and sklearn for text classification. Besides the vectorizer, I also have some custom features. I'd like to know whether I can use them with an sklearn Pipeline, and how to stack the features inside it.
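Below is a short example of my current classification code without a pipeline. Please tell me if you spot any mistakes in it; I'd really appreciate your help. Can it be used in an sklearn pipeline somehow? I wrote my own function, get_features(), which extracts the custom features, transforms the vectorizer, scales the features, and finally stacks all of them together.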
import sklearn.svm
import re
from sklearn import metrics
import numpy
import scipy.sparse
import datetime
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.svm import LinearSVC
from nltk.tokenize import word_tokenize, sent_tokenize
from sklearn.preprocessing import StandardScaler
# custom feature example
def words_capitalized(sentence):
    tokens = []
    # tokenize the sentence
    tokens = word_tokenize(sentence)
    counter = 0
    for word in tokens:
        if word[0].isupper():
            counter += 1
    return counter
# custom feature example
def words_length(sentence):
    tokens = []
    # tokenize the …
python pipeline classification machine-learning scikit-learn
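scikit-learn's `FunctionTransformer` can wrap a plain per-document function so that it becomes a pipeline step. FeatureUnion then stacks that step's output next to the vectorizer's, which covers what a hand-written `get_features()` does. This is a sketch under assumptions. `str.split()` stands in for nltk's `word_tokenize` to keep it dependency-free. Only `words_capitalized` is used, because `words_length` is cut off in the question. `custom_features` is a hypothetical helper, and scaling is applied to the custom column only, inside a small nested Pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler
from sklearn.svm import LinearSVC

def words_capitalized(sentence):
    # str.split() replaces word_tokenize here; it never yields empty tokens
    return sum(1 for word in sentence.split() if word[0].isupper())

def custom_features(sentences):
    """Hypothetical helper: (n_samples, 1) float array of custom features."""
    return np.array([[words_capitalized(s)] for s in sentences], dtype=float)

pipeline = Pipeline([
    ('features', FeatureUnion([
        ('bow', CountVectorizer()),
        ('custom', Pipeline([
            ('extract', FunctionTransformer(custom_features, validate=False)),
            ('scale', StandardScaler()),  # scales only the custom column
        ])),
    ])),
    ('clf', LinearSVC()),
])

X = ['I am a sentence', 'an example', 'Another Sentence Here', 'one more example']
Y = [1, 2, 1, 2]
pipeline.fit(X, Y)
print(pipeline.predict(['Another example']))
```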
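I'm currently trying to understand how to reuse VGG19 (or other architectures) to improve my small image classification model. I'm classifying images (paintings, in this case) into 3 classes (say, paintings from the 15th, 16th, and 17th centuries). I have a very small dataset: 1800 training examples per class and 250 per class in the validation set.
I have the following implementation: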
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
from keras.callbacks import ModelCheckpoint
from keras.regularizers import l2, l1
from keras.models import load_model
# set the image dimension ordering ('th' = Theano-style, channels first)
K.set_image_dim_ordering('th')
batch_size = 32
# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# this is the augmentation configuration we will use for …
python machine-learning image-processing deep-learning keras
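A common way to reuse VGG19 on a small dataset is transfer learning: load the convolutional base with ImageNet weights, freeze it, and train only a small new classifier head. This is a sketch assuming the `tf.keras` API (not the standalone Keras imports in the question) and channels-last images. The head sizes (256 units, dropout 0.5) are arbitrary choices, and `build_transfer_model` is a hypothetical name. `weights="imagenet"` downloads the pretrained weights on first use.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

def build_transfer_model(num_classes=3, input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen VGG19 convolutional base plus a small trainable classifier head."""
    base = VGG19(weights=weights, include_top=False, input_shape=input_shape)
    base.trainable = False  # keep the pretrained filters fixed; only the head learns
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)  # one unit per century
    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

The model would then be trained with `model.fit(...)` on data generators like the ones in the question; `class_mode="categorical"` matches the categorical cross-entropy loss. With pretrained weights, VGG19's own `tensorflow.keras.applications.vgg19.preprocess_input` matches how the ImageNet weights were trained more closely than `rescale=1./255`.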