def gradient(X_norm, y, theta, alpha, m, n, num_it):
    temp = np.array(np.zeros_like(theta, float))
    for i in range(0, num_it):
        h = np.dot(X_norm, theta)
        # temp[j] = theta[j] - (alpha/m) * (np.sum((h - y) * X_norm[:, j][np.newaxis, :]))
        temp[0] = theta[0] - (alpha / m) * (np.sum(h - y))
        temp[1] = theta[1] - (alpha / m) * (np.sum((h - y) * X_norm[:, 1]))
        theta = temp
    return theta

X_norm, mean, std = featureScale(X)
# length of X (number of rows)
m = len(X)
X_norm = np.array([np.ones(m), X_norm])
n, m = np.shape(X_norm)
num_it = 1500
alpha = 0.01
theta = np.zeros(n, float)[:, np.newaxis]
X_norm = X_norm.transpose()
theta = gradient(X_norm, y, theta, alpha, m, n, num_it)
print theta
With the code above my theta comes out as 100.2 100.2, but it should be 100.2 61.09, which is the result I get in MATLAB and which is correct.
python numpy machine-learning linear-regression gradient-descent
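One thing worth checking here (an assumption on my part, not something shown in the snippet) is the shape of y: theta is created as an (n, 1) column vector, so h = np.dot(X_norm, theta) is (m, 1), and if y is a flat (m,) array then h - y broadcasts to an (m, m) matrix and every np.sum over it goes wrong. A minimal vectorized sketch of the same update, assuming y is reshaped to a column vector, would be:

import numpy as np

def gradient_descent(X, y, alpha=0.01, num_iters=1500):
    """Vectorized batch gradient descent; X is assumed to already carry the bias column."""
    m, n = X.shape
    y = y.reshape(m, 1)                  # force a column vector so h - y stays (m, 1)
    theta = np.zeros((n, 1))
    for _ in range(num_iters):
        h = np.dot(X, theta)             # predictions, shape (m, 1)
        grad = np.dot(X.T, h - y) / m    # gradient of the squared-error cost, shape (n, 1)
        theta = theta - alpha * grad     # simultaneous update of all parameters
    return theta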
I am running some microbenchmarks on Java list-iteration code. I used the -XX:+PrintCompilation and -verbose:gc flags to make sure nothing is happening in the background while the timings run. However, I see something in the output that I cannot understand.
Here is the code I am benchmarking:
import java.util.ArrayList;
import java.util.List;

public class PerformantIteration {
    private static int theSum = 0;

    public static void main(String[] args) {
        System.out.println("Starting microbenchmark on iterating over collections with a call to size() in each iteration");
        List<Integer> nums = new ArrayList<Integer>();
        for (int i = 0; i < 50000; i++) {
            nums.add(i);
        }
        System.out.println("Warming up ...");
        // warmup... make sure all JIT compiling is done before the actual benchmarking starts
        for (int i = 0; i < 10; i++) {
            iterateWithConstantSize(nums);
            iterateWithDynamicSize(nums);
        }
        // actual
        System.out.println("Starting the actual test");
        long constantSizeBenchmark …
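The snippet is cut off before the timing code and before the two iteration methods, so the bodies below are only a guess, inferred from the method names and from the opening println (one version hoists size() out of the loop, the other calls it on every iteration):

    private static void iterateWithConstantSize(List<Integer> nums) {
        int size = nums.size();                  // size() read once, before the loop
        for (int i = 0; i < size; i++) {
            theSum += nums.get(i);
        }
    }

    private static void iterateWithDynamicSize(List<Integer> nums) {
        for (int i = 0; i < nums.size(); i++) {  // size() called on every iteration
            theSum += nums.get(i);
        }
    }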
I am just starting my tour of graph-processing approaches and tools. What we basically do is compute some standard metrics such as PageRank, clustering coefficient, triangle count, diameter, connectivity and so on. In the past we were happy with Octave, but when we started working with graphs of, say, 10^9 nodes/edges we got stuck.
So a possible solution could be a distributed cloud with Hadoop/Giraph, Spark/GraphX, Neo4j on top of it, and so on.
But since I am a beginner, can someone advise what to actually choose? I do not see the difference between using Spark/GraphX and Neo4j. Right now I am considering Spark/GraphX, since it has a more Python-like syntax, whereas Neo4j has its own Cypher. Visualization in Neo4j is cool but useless at such a scale. What I do not understand is whether there is a reason to use the extra level of software (Neo4j) or to just use Spark/GraphX, since as I understand it Neo4j will not save that much time, much as if we used plain Hadoop vs Giraph or GraphX or Hive.
Thanks.
I am trying to implement a Writable class, but I do not know how to do it if my class has nested objects in it, such as a list. Could anybody help me? Thanks.
public class StorageClass implements Writable {

    public String xStr;
    public String yStr;
    public List<Field> sStor;

    // omitted ctors

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeChars(xStr);
        out.writeChars(yStr);
        // WHAT SHOULD I DO FOR List<Field>
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        xStr = in.readLine();
        yStr = in.readLine();
        // WHAT SHOULD I DO FOR List<Field>
    }
}

public class SubStorage {
    public String x;
    public String y;
}
Here is the Field class:
public final class Field implements Comparable<Field>, Serializable {

    private String name;
    private DataType dataType; …
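A common pattern for serializing a collection inside write()/readFields() is to write the element count first and then let each element serialize itself. The sketch below assumes that Field is adapted to implement Writable as well (right now it only implements Comparable and Serializable), that it has a no-argument constructor, and that java.util.ArrayList is imported; these are assumptions, not part of the original code:

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(xStr);
        out.writeUTF(yStr);
        out.writeInt(sStor.size());          // record the list length first
        for (Field f : sStor) {
            f.write(out);                    // each element serializes its own fields
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        xStr = in.readUTF();
        yStr = in.readUTF();
        int size = in.readInt();
        sStor = new ArrayList<Field>(size);
        for (int i = 0; i < size; i++) {
            Field f = new Field();           // assumes a no-arg constructor exists
            f.readFields(in);
            sStor.add(f);
        }
    }

Using writeUTF/readUTF instead of writeChars/readLine also keeps the two string fields symmetric between the write and read sides, which the original pairing does not guarantee.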
The problem is:
hduser@saket-K53SM:/usr/local/hadoop$ jps
The program 'jps' can be found in the following packages:
* openjdk-6-jdk
* openjdk-7-jdk
Try: sudo apt-get install <selected package>
My configuration is:
hduser@saket-K53SM:/usr/local/hadoop$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)
Settings in conf/hadoop-env.sh:
hduser@saket-K53SM:/usr/local/hadoop$ cat conf/hadoop-env.sh | grep JAVA_HOME
# The only required environment variable is JAVA_HOME. All others are
# set JAVA_HOME in this file, so that it is correctly defined on
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_33/
I know there is a question (http://stackoverflow.com/questions/7843422/hadoop-jps-can-not-find-java-installed) similar to this one, but here I have the Sun JDK installed. So any help would be appreciated.
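Since jps ships inside the JDK's own bin directory rather than as a separate package for the Sun JDK, one likely cause (an assumption; the question does not show $PATH) is simply that /usr/lib/jvm/jdk1.6.0_33/bin is not on the PATH of this shell. A quick check and workaround, using the JAVA_HOME path shown above, might be:

hduser@saket-K53SM:/usr/local/hadoop$ ls /usr/lib/jvm/jdk1.6.0_33/bin/jps
hduser@saket-K53SM:/usr/local/hadoop$ export PATH=$PATH:/usr/lib/jvm/jdk1.6.0_33/bin
hduser@saket-K53SM:/usr/local/hadoop$ jps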
Hi, below is a snippet from Effective Java, 2nd Edition. Here the author claims that the code below is 25% faster than a version that does not use the result variable. According to the book, "What this variable does is to ensure that field is read only once in the common case where it's already initialized." I cannot understand why this code would be faster after initialization than if we did not use the local variable result. In either case, with or without the local variable result, there is only one volatile read after initialization.
// Double-check idiom for lazy initialization of instance fields
private volatile FieldType field;

FieldType getField() {
    FieldType result = field;
    if (result == null) {            // First check (no locking)
        synchronized (this) {
            result = field;
            if (result == null)      // Second check (with locking)
                field = result = computeFieldValue();
        }
    }
    return result;
}
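For comparison, the variant without the local variable that the book is measuring against would look roughly like the sketch below (a reconstruction, not text from the book). The point is that on the common, already-initialized path this version reads the volatile field twice, once for the null check and once for the return, whereas the version above reads it only once into result:

FieldType getField() {
    if (field == null) {             // first volatile read
        synchronized (this) {
            if (field == null)
                field = computeFieldValue();
        }
    }
    return field;                    // second volatile read on the common path
}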
My question is about MapReduce programming in Java.
Suppose I have the WordCount.java example, a standard MapReduce program. I want the map function to collect some information and return it to the reduce function in a form such as <slaveNode_id, some_info_collected>, so that I can know what slave node collected what data. Any idea how?
public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken()); …
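One possible approach (a sketch of my own, not from the question): have each map task look up the host it is running on, for example via java.net.InetAddress.getLocalHost().getHostName(), and fold that into the emitted key or value, so the reducer can tell which slave node produced each record. Against the old-style API used above, the map method could look roughly like this:

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            // prefix every word with the node's host name, so keys arrive as "<host>\t<word>"
            String host = java.net.InetAddress.getLocalHost().getHostName();
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(host + "\t" + tokenizer.nextToken());
                output.collect(word, one);
            }
        }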
In a typical MapReduce setup (such as Hadoop), how many reducers are used for one task, for example counting words? My understanding of Google's MapReduce implies that only 1 reducer is involved. Is that correct?
For example, word count will split the input into N chunks, and N Maps will run, producing lists of (word, #). My question is: once the Map phase is done, will only one reducer instance run to compute the result, or will there be reducers running in parallel?
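For Hadoop at least, the number of reduce tasks is a per-job setting rather than being fixed at one, and each reducer then processes its own partition of the intermediate keys in parallel. A minimal driver sketch (the class name and the value 4 are illustrative, not from the question):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerCountSketch {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setNumReduceTasks(4);   // four reduce tasks run in parallel, one per key partition
        // mapper/reducer/input/output settings omitted for brevity
    }
}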
I am currently trying to figure out what happens when you run a MapReduce job by putting some System.out.println() statements at certain places in the code, but those print statements never show up on my terminal while the job runs. Can someone help me figure out what exactly I am doing wrong?
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.StatusReporter;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCountJob {

    public static int iterations;

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException { …
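For what it is worth, with a stock Hadoop setup System.out.println() calls made inside map and reduce tasks go to each task's own stdout log (viewable through the web UI or under the task's userlogs directory), not to the terminal of the client that submitted the job; only code running in the driver's main() prints there. If the goal is just to surface a few numbers back to the submitting side, counters are one alternative; a sketch of how the map method above might use one (an addition of mine, not part of the original code):

        @Override
        public void map(Object key, Text value, Context context
                        ) throws IOException, InterruptedException {
            // Counters are aggregated by the framework and printed with the job summary,
            // so they reach the submitting terminal even though println output does not.
            context.getCounter("debug", "map_records_seen").increment(1);
            // ... rest of the original map logic ...
        }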