Hi, I'm new to TensorFlow. I want to implement the following Python code in TensorFlow.
import numpy as np
a = np.array([1,2,3,4,5,6,7,9,0])
print(a) ## [1 2 3 4 5 6 7 9 0]
print(a.shape) ## (9,)
b = a[:, np.newaxis] ### want to write this in tensorflow.
print(b.shape) ## (9,1)
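In TensorFlow the same expansion works almost identically: tf.expand_dims inserts the new axis, and indexing with tf.newaxis mirrors the NumPy syntax directly. A minimal sketch, assuming TensorFlow 2.x and reusing the values from the question:

import tensorflow as tf

a = tf.constant([1, 2, 3, 4, 5, 6, 7, 9, 0])
print(a.shape)                 ## (9,)

b = tf.expand_dims(a, axis=1)  ## insert a new axis at position 1
print(b.shape)                 ## (9, 1)

c = a[:, tf.newaxis]           ## equivalent, mirrors a[:, np.newaxis]
print(c.shape)                 ## (9, 1)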
I'm trying to implement a Gaussian mixture model using Keras with a TensorFlow backend. Are there any guides or examples of how to implement this?
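Not a definitive recipe, but one common pattern is to treat the mixture parameters as trainable variables and minimize the negative log-likelihood with a Keras optimizer. A minimal sketch of a diagonal-covariance GMM, assuming TensorFlow 2.x; K, D, the learning rate, and the random training data are all placeholder choices:

import numpy as np
import tensorflow as tf

K = 3   # number of mixture components (placeholder)
D = 2   # data dimensionality (placeholder)

# Trainable mixture parameters.
logits = tf.Variable(tf.zeros([K]))            # mixing weights, pre-softmax
means = tf.Variable(tf.random.normal([K, D]))  # component means
log_scales = tf.Variable(tf.zeros([K, D]))     # log std-devs (exp keeps them positive)

def neg_log_likelihood(x):
    # x: [N, D] -> broadcast against the K components as [N, K, D]
    x = x[:, tf.newaxis, :]
    # log N(x | mu_k, sigma_k) for a diagonal Gaussian, summed over D
    log_prob = -0.5 * tf.reduce_sum(
        ((x - means) / tf.exp(log_scales)) ** 2 + 2.0 * log_scales + np.log(2.0 * np.pi),
        axis=-1)                                # [N, K]
    log_mix = tf.nn.log_softmax(logits)         # log pi_k, shape [K]
    # -mean_n log sum_k pi_k N(x_n | mu_k, sigma_k), numerically stable
    return -tf.reduce_mean(tf.reduce_logsumexp(log_prob + log_mix, axis=-1))

optimizer = tf.keras.optimizers.Adam(0.05)
data = tf.constant(np.random.randn(500, D).astype("float32"))
for _ in range(200):
    with tf.GradientTape() as tape:
        loss = neg_log_likelihood(data)
    grads = tape.gradient(loss, [logits, means, log_scales])
    optimizer.apply_gradients(zip(grads, [logits, means, log_scales]))

If adding a dependency is acceptable, TensorFlow Probability's MixtureSameFamily distribution packages the same likelihood.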
I'm looking for an alternative to numpy.linalg.pinv in TensorFlow. So far I have found tf.matrix_inverse(input, adjoint=None, name=None), but TensorFlow just raises an error if the matrix is not invertible.
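For what it's worth, the usual workaround is to build the Moore-Penrose pseudo-inverse from an SVD, which is what numpy.linalg.pinv does internally; recent TensorFlow releases also ship tf.linalg.pinv directly. A sketch assuming TensorFlow 2.x, with rcond mirroring NumPy's cutoff parameter:

import tensorflow as tf

def pinv(a, rcond=1e-15):
    s, u, v = tf.linalg.svd(a)          # a = u @ diag(s) @ transpose(v)
    cutoff = rcond * tf.reduce_max(s)
    # Invert only singular values above the cutoff; zero out the rest,
    # so rank-deficient (non-invertible) matrices are handled gracefully.
    s_inv = tf.where(s > cutoff, 1.0 / s, tf.zeros_like(s))
    return v @ tf.linalg.diag(s_inv) @ tf.transpose(u)

a = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(pinv(a))   # matches np.linalg.pinv up to float precision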
What is the optimal vector size to set in the word2vec algorithm if the total number of unique words is greater than 1 billion?
I am using Apache Spark MLlib 1.6.0 for word2vec.
Sample code:
import java.io.IOException;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SQLContext;

public class Main {
    public static void main(String[] args) throws IOException {
        SparkConf conf = new SparkConf().setAppName("JavaWord2VecExample");
        conf.setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Input data: each row is a bag of words from a sentence or document.
        JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList(
                RowFactory.create(Arrays.asList("Hi I heard about Spark".split(" "))),
                RowFactory.create(Arrays.asList("Hi I heard about Java".split(" "))),
                RowFactory.create(Arrays.asList("I wish Java could use case classes".split(" "))),
                RowFactory.create(Arrays.asList("Logistic regression models are neat".split(" ")))));
        // ...
    }
}
Is there any relationship between numFeatures in HashingTF in Spark MLlib and the actual number of terms in a document (sentence)?

List<Row> data = Arrays.asList(
        RowFactory.create(0.0, "Hi I heard about Spark"),
        RowFactory.create(0.0, "I wish Java could use case classes"),
        RowFactory.create(1.0, "Logistic regression models are neat")
);
StructType schema = new StructType(new StructField[]{
        new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
        new StructField("sentence", DataTypes.StringType, false, Metadata.empty())
});
Dataset<Row> sentenceData = spark.createDataFrame(data, schema);
Tokenizer tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words");
Dataset<Row> wordsData = tokenizer.transform(sentenceData);

int numFeatures = 20;
HashingTF hashingTF = new HashingTF()
        .setInputCol("words")
        .setOutputCol("rawFeatures")
        .setNumFeatures(numFeatures);
Dataset<Row> featurizedData = hashingTF.transform(wordsData);
As described in the Spark MLlib documentation, HashingTF transforms each sentence into a feature vector of length numFeatures. What happens if each document (sentence) here contains tens of thousands of terms? What should the value of numFeatures be? How is that value calculated?
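To the best of my understanding there is no direct relationship: HashingTF maps every term to the index hash(term) % numFeatures, so the vector length is always numFeatures and distinct terms simply collide into the same bucket when the real vocabulary is larger. A toy Python illustration of that hashing trick (using Python's built-in hash rather than Spark's actual hash function):

num_features = 20
words = "Logistic regression models are neat".split(" ")

# Each term lands in bucket hash(term) % num_features; with a vocabulary
# larger than num_features, different terms can share a bucket (collide).
vector = [0.0] * num_features
for w in words:
    vector[hash(w) % num_features] += 1.0

print(vector)   # raw term frequencies, length num_features regardless of vocab

Choosing numFeatures is therefore a memory-versus-collision trade-off: a power of two comfortably above the expected number of distinct terms (the spark.ml default is 2^18 = 262,144) keeps collisions rare.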