I am trying to build a Scala jar file to run it in Spark.
I am following this tutorial.
While building the jar file with SBT, I am facing the following error:
[info] Resolving org.apache.spark#spark-core_2.10.4;1.0.2 ...
[warn] module not found: org.apache.spark#spark-core_2.10.4;1.0.2
[warn] ==== local: tried
[warn] /home/hduser/.ivy2/local/org.apache.spark/spark-core_2.10.4/1.0.2/ivys/ivy.xml
[warn] ==== Akka Repository: tried
[warn] http://repo.akka.io/releases/org/apache/spark/spark-core_2.10.4/1.0.2/spark-core_2.10.4-1.0.2.pom
[warn] ==== public: tried
[warn] http://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10.4/1.0.2/spark-core_2.10.4-1.0.2.pom
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.apache.spark#spark-core_2.10.4;1.0.2: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[error] {file:/home/prithvi/scala/asd/}default-d57abf/*:update: sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.10.4;1.0.2: not found
[error] Total time: 2 s, completed 13 Aug, 2014 5:24:24 PM
What is the problem, and how do I resolve it?
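The artifact name in the error, spark-core_2.10.4, suggests the full Scala version was used as the artifact suffix; Spark publishes its artifacts against the Scala binary version, so the artifact to resolve is spark-core_2.10. A minimal build.sbt sketch under that assumption (the project name is illustrative; the resolver matches the "Akka Repository" tried in the log above):

name := "spark-job"

scalaVersion := "2.10.4"

// %% appends the Scala *binary* version (2.10), so this resolves spark-core_2.10
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.2"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

Writing the dependency with a single % and a hand-written suffix like "spark-core_2.10.4" would reproduce exactly the unresolved-dependency error shown.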
The dependency issue has been resolved, thanks to "om-nom-nom".
But a new error has appeared:
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: FAILED DOWNLOADS …

I am using Hadoop version 2.3.0. Sometimes, when I execute a MapReduce job, the following error is displayed:
14/08/10 12:14:59 INFO mapreduce.Job: Task Id : attempt_1407694955806_0002_m_000780_0, Status : FAILED
Error: java.io.IOException: All datanodes 192.168.30.2:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1023)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:838)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:483)
When I try to check the log files for these failed tasks, the log folder for the task is empty.
I cannot understand the reason behind this error. Could someone let me know how to resolve it? Thanks for your help.
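"All datanodes ... are bad" means the HDFS write pipeline failed on every replica, which usually points at the datanodes rather than the job itself; one common culprit is the cap on concurrent block-transfer threads, or the OS open-file limit for the datanode user. A hedged starting point — the property name is real in Hadoop 2.x, but the value is only an illustrative increase for hdfs-site.xml:

<property>
  <!-- formerly dfs.datanode.max.xcievers; caps concurrent block-transfer threads per datanode -->
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>

After raising it (and checking ulimit -n for the datanode user), the datanodes need a restart for the change to take effect.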
Suppose I have 50 records in the /example directory of a MarkLogic database. Is there a way to find the size of the directory? In this case it should give me 50. I need this with the Search API.
Thanks in advance.
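One way to get such a count is an unfiltered estimate over a directory query; xdmp:estimate and cts:directory-query are standard MarkLogic builtins, though whether this counts as "in the Search API" depends on how strictly that is read. A sketch (note the trailing slash, which MarkLogic directory URIs require; depth "1" counts only immediate children, "infinity" counts all descendants):

(: returns 50 if /example/ holds 50 documents at depth 1 :)
xdmp:estimate(
  cts:search(fn:doc(), cts:directory-query("/example/", "1"))
)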
I have to export an HDFS file into MySQL.
Suppose my HDFS file is:
1,abcd,23
2,efgh,24
3,ijkl,25
4,mnop,26
5,qrst,27
And say my MySQL table schema is:
+-----+-----+-------------+
| ID | AGE | NAME |
+-----+-----+-------------+
| | | |
+-----+-----+-------------+
When I insert using the following Sqoop command:
sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age"
it works fine and inserts into the database.
But when I need to update records that already exist, I have to use --update-key and --columns.
Now, when I try to update the table with the following command:
sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age" \
--update-key id
The problem I am facing is that the data is not updated into the specified columns …
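One thing worth checking: sqoop export with --update-key defaults to updateonly mode, which issues only UPDATE statements and silently skips rows whose key is not already in the table. If the intent is to update existing rows and also insert new ones, --update-mode allowinsert is the documented switch; a variant of the command above under that assumption:

sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age" \
--update-key id \
--update-mode allowinsert

With the MySQL connector this relies on INSERT ... ON DUPLICATE KEY UPDATE, so id must be a primary or unique key on the test table for updates to happen.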