Hi, I'm new to TensorFlow. I want to implement the following Python code in TensorFlow.
import numpy as np
a = np.array([1,2,3,4,5,6,7,9,0])
print(a) ## [1 2 3 4 5 6 7 9 0]
print(a.shape) ## (9,)
b = a[:, np.newaxis] ### want to write this in tensorflow.
print(b.shape) ## (9,1)
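In TensorFlow the same expansion works almost identically: tf.expand_dims inserts the new axis, and indexing with tf.newaxis mirrors the NumPy syntax directly. A minimal sketch, assuming TensorFlow 2.x and reusing the values from the question:

import tensorflow as tf

a = tf.constant([1, 2, 3, 4, 5, 6, 7, 9, 0])
print(a.shape)                 ## (9,)

b = tf.expand_dims(a, axis=1)  ## insert a new axis at position 1
print(b.shape)                 ## (9, 1)

c = a[:, tf.newaxis]           ## equivalent, mirrors a[:, np.newaxis]
print(c.shape)                 ## (9, 1)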
I'm trying to implement a Gaussian mixture model using Keras with a TensorFlow backend. Are there any guides or examples of how to implement this?
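Not a definitive recipe, but one common pattern is to treat the mixture parameters as trainable variables and minimize the negative log-likelihood with a Keras optimizer. A minimal sketch of a diagonal-covariance GMM, assuming TensorFlow 2.x; K, D, the learning rate, and the random training data are all placeholder choices:

import numpy as np
import tensorflow as tf

K = 3   # number of mixture components (placeholder)
D = 2   # data dimensionality (placeholder)

# Trainable mixture parameters.
logits = tf.Variable(tf.zeros([K]))            # mixing weights, pre-softmax
means = tf.Variable(tf.random.normal([K, D]))  # component means
log_scales = tf.Variable(tf.zeros([K, D]))     # log std-devs (exp keeps them positive)

def neg_log_likelihood(x):
    # x: [N, D] -> broadcast against the K components as [N, K, D]
    x = x[:, tf.newaxis, :]
    # log N(x | mu_k, sigma_k) for a diagonal Gaussian, summed over D
    log_prob = -0.5 * tf.reduce_sum(
        ((x - means) / tf.exp(log_scales)) ** 2 + 2.0 * log_scales + np.log(2.0 * np.pi),
        axis=-1)                                # [N, K]
    log_mix = tf.nn.log_softmax(logits)         # log pi_k, shape [K]
    # -mean_n log sum_k pi_k N(x_n | mu_k, sigma_k), numerically stable
    return -tf.reduce_mean(tf.reduce_logsumexp(log_prob + log_mix, axis=-1))

optimizer = tf.keras.optimizers.Adam(0.05)
data = tf.constant(np.random.randn(500, D).astype("float32"))
for _ in range(200):
    with tf.GradientTape() as tape:
        loss = neg_log_likelihood(data)
    grads = tape.gradient(loss, [logits, means, log_scales])
    optimizer.apply_gradients(zip(grads, [logits, means, log_scales]))

If adding a dependency is acceptable, TensorFlow Probability's MixtureSameFamily distribution packages the same likelihood.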
I'm looking for an alternative to numpy.linalg.pinv in TensorFlow. So far I have found tf.matrix_inverse(input, adjoint=None, name=None), but TensorFlow just raises an error if the matrix is not invertible.
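For what it's worth, the usual workaround is to build the Moore-Penrose pseudo-inverse from an SVD, which is what numpy.linalg.pinv does internally; recent TensorFlow releases also ship tf.linalg.pinv directly. A sketch assuming TensorFlow 2.x, with rcond mirroring NumPy's cutoff parameter:

import tensorflow as tf

def pinv(a, rcond=1e-15):
    s, u, v = tf.linalg.svd(a)          # a = u @ diag(s) @ transpose(v)
    cutoff = rcond * tf.reduce_max(s)
    # Invert only singular values above the cutoff; zero out the rest,
    # so rank-deficient (non-invertible) matrices are handled gracefully.
    s_inv = tf.where(s > cutoff, 1.0 / s, tf.zeros_like(s))
    return v @ tf.linalg.diag(s_inv) @ tf.transpose(u)

a = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(pinv(a))   # matches np.linalg.pinv up to float precision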
What is the optimal vector size to set in the word2vec algorithm if the total number of unique words is greater than 1 billion?
I am using Apache Spark MLlib 1.6.0 for word2vec.
Sample code:
import java.io.IOException;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;

public class Main {
    public static void main(String[] args) throws IOException {
        SparkConf conf = new SparkConf().setAppName("JavaWord2VecExample");
        conf.setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Input data: each row is a bag of words from a sentence or document.
        JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList(
                RowFactory.create(Arrays.asList("Hi I heard about Spark".split(" "))),
                RowFactory.create(Arrays.asList("Hi I heard about Java".split(" "))),
                RowFactory.create(Arrays.asList("I wish Java could use case classes".split(" "))),
                RowFactory.create(Arrays.asList("Logistic regression models are neat".split(" ")))));
        // ...
    }
}
Is there any relationship between numFeatures in HashingTF in Spark MLlib and the actual number of terms in a document (sentence)?

List<Row> data = Arrays.asList(
        RowFactory.create(0.0, "Hi I heard about Spark"),
        RowFactory.create(0.0, "I wish Java could use case classes"),
        RowFactory.create(1.0, "Logistic regression models are neat")
);
StructType schema = new StructType(new StructField[]{
        new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
        new StructField("sentence", DataTypes.StringType, false, Metadata.empty())
});
Dataset<Row> sentenceData = spark.createDataFrame(data, schema);
Tokenizer tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words");
Dataset<Row> wordsData = tokenizer.transform(sentenceData);

int numFeatures = 20;
HashingTF hashingTF = new HashingTF()
        .setInputCol("words")
        .setOutputCol("rawFeatures")
        .setNumFeatures(numFeatures);
Dataset<Row> featurizedData = hashingTF.transform(wordsData);
As described in the Spark MLlib documentation, HashingTF transforms each sentence into a feature vector of length numFeatures. What happens if each document (sentence) here contains tens of thousands of terms? What should the value of numFeatures be? How is that value calculated?
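To the best of my understanding there is no direct relationship: HashingTF maps every term to the index hash(term) % numFeatures, so the vector length is always numFeatures and distinct terms simply collide into the same bucket when the real vocabulary is larger. A toy Python illustration of that hashing trick (using Python's built-in hash rather than Spark's actual hash function):

num_features = 20
words = "Logistic regression models are neat".split(" ")

# Each term lands in bucket hash(term) % num_features; with a vocabulary
# larger than num_features, different terms can share a bucket (collide).
vector = [0.0] * num_features
for w in words:
    vector[hash(w) % num_features] += 1.0

print(vector)   # raw term frequencies, length num_features regardless of vocab

Choosing numFeatures is therefore a memory-versus-collision trade-off: a power of two comfortably above the expected number of distinct terms (the spark.ml default is 2^18 = 262,144) keeps collisions rare.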