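I am trying to train the TensorFlow SVM estimator, tensorflow.contrib.learn.python.learn.estimators.svm, on sparse data. An example of its use with sparse data is in the GitHub repo at tensorflow/contrib/learn/python/learn/estimators/svm_test.py#L167 (I am not allowed to post more links, so this is the relative path).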
The SVM estimator expects example_id_column and feature_columns as parameters, where the feature columns should be derived from the class FeatureColumn, for example tf.contrib.layers.feature_column.sparse_column_with_hash_bucket. See the GitHub repo at tensorflow/contrib/learn/python/learn/estimators/svm.py#L85 and the documentation on tensorflow.org at python/contrib.layers#Feature_columns.
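I am using the a1a dataset from the LIBSVM website. The dataset has 123 features (which would correspond to 123 feature_columns if the data were dense). I wrote a user op that reads the data the way tf.decode_csv() does, but for the LIBSVM format. The op returns the labels as a dense tensor and the features as a sparse tensor. My input pipeline: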
import tensorflow as tf

NUM_FEATURES = 123
batch_size = 200
# my op to parse the libsvm data
decode_libsvm_module = tf.load_op_library('./libsvm.so')

def input_pipeline(filename_queue, batch_size):
    with tf.name_scope('input'):
        reader = tf.TextLineReader(name="TextLineReader_")
        _, libsvm_row = reader.read(filename_queue, name="libsvm_row_")
        min_after_dequeue = 1000
        capacity = min_after_dequeue + 3 * batch_size
        batch = tf.train.shuffle_batch([libsvm_row], batch_size=batch_size,
                                       capacity=capacity, …
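
To make it clearer what the custom op is meant to return, here is a minimal pure-Python sketch of the same idea: parse a batch of LIBSVM rows into dense labels plus the three components of a sparse tensor (indices, values, dense shape). This is not the code of my libsvm.so op. The function names parse_libsvm_line and batch_to_sparse are hypothetical, the two sample rows are made up for illustration rather than copied from a1a, and I am assuming the usual LIBSVM layout of "<label> <index>:<value> ..." with 1-based feature indices.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM row of the form '<label> <index>:<value> ...'."""
    parts = line.split()
    label = float(parts[0])
    indices, values = [], []
    for token in parts[1:]:
        idx, val = token.split(":")
        indices.append(int(idx) - 1)  # assumption: file indices are 1-based
        values.append(float(val))
    return label, indices, values


def batch_to_sparse(lines, num_features):
    """Return dense labels plus COO components (indices, values, dense_shape)."""
    labels, coo_indices, coo_values = [], [], []
    for row, line in enumerate(lines):
        label, cols, vals = parse_libsvm_line(line)
        labels.append(label)
        # One [row, column] pair per non-zero feature in this example.
        coo_indices.extend([row, col] for col in cols)
        coo_values.extend(vals)
    dense_shape = [len(lines), num_features]
    return labels, coo_indices, coo_values, dense_shape


# Illustrative rows only (hypothetical data in LIBSVM format).
rows = ["-1 3:1 11:1 14:1", "+1 5:1 7:1"]
labels, indices, values, shape = batch_to_sparse(rows, num_features=123)
```

The three lists have the same layout as the indices, values and dense_shape arguments of a sparse tensor, so the whole batch stays as a single sparse feature of width 123 instead of 123 separate dense columns.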