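When building Sqoop2 with:
mvn package -Pbinary
I get an error:
"A required class was missing while executing org.apache.maven.plugins:maven-site-plugin:3.0-beta-3:site: org/sonatype/aether/graph/DependencyFilter"
How can I build Sqoop2?
I am running:
Apache Maven 3.2.1
Java version: 1.7.0_51
CentOS 6.5, kernel 2.6.32-431.5.1.el6.x86_64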
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-site-plugin:3.0-beta-3:site (packaging-documentation) on project sqoop-docs: Execution packaging-documentation of goal org.apache.maven.plugins:maven-site-plugin:3.0-beta-3:site failed: A required class was missing while executing org.apache.maven.plugins:maven-site-plugin:3.0-beta-3:site: org/sonatype/aether/graph/DependencyFilter
[ERROR] -----------------------------------------------------
[ERROR] realm = plugin>org.apache.maven.plugins:maven-site-plugin:3.0-beta-3
[ERROR] strategy = org.codehaus.plexus.classworlds.strategy.SelfFirstStrategy
[ERROR] urls[0] = file:/home/dk/.m2/repository/org/apache/maven/plugins/maven-site-plugin/3.0-beta-3/maven-site-plugin-3.0-beta-3.jar
[ERROR] urls[1] = file:/home/dk/.m2/repository/org/apache/maven/reporting/maven-reporting-api/3.0/maven-reporting-api-3.0.jar
[ERROR] urls[2] = file:/home/dk/.m2/repository/org/codehaus/plexus/plexus-interpolation/1.14/plexus-interpolation-1.14.jar
[ERROR] urls[3] = file:/home/dk/.m2/repository/org/sonatype/sisu/sisu-inject-bean/1.4.2/sisu-inject-bean-1.4.2.jar
[ERROR] urls[4] = file:/home/dk/.m2/repository/org/sonatype/sisu/sisu-guice/2.1.7/sisu-guice-2.1.7-noaop.jar
[ERROR] urls[5] = file:/home/dk/.m2/repository/org/codehaus/plexus/plexus-component-annotations/1.5.5/plexus-component-annotations-1.5.5.jar
[ERROR] urls[6] = file:/home/dk/.m2/repository/org/sonatype/plexus/plexus-sec-dispatcher/1.3/plexus-sec-dispatcher-1.3.jar
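[ERROR] urls[7] = …
As I understand it, Sqoop is used to import or export tables/data between a database and HDFS, Hive, or HBase.
We can import a single table or a list of tables directly. Internally, a MapReduce program (map tasks only, I believe) runs.
My question: what is Sqoop direct mode, and when should the direct option be used?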
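With MapReduce v2, jobs can be bound to specific YARN queues to manage resources and priorities. Basically, using
"hadoop jar /xyz.jar -D mapreduce.job.queuename=QUEUE1 /input /output" works perfectly.
How can YARN queue binding be integrated with Sqoop when running a Sqoop query?
i.e. sqoop import \ --connect 'jdbc://server' \ --target-dir \ and what?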
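One possible approach, as a minimal sketch: Sqoop 1.x accepts generic Hadoop options such as -D, provided they come directly after the tool name (import) and before tool-specific options like --connect. The function name build_sqoop_import and the target directory below are hypothetical, and the remaining import options (for example --table) are left out just as in the question. The script only prints the command so it can be inspected before running it.

```shell
# Sketch: print a Sqoop import command that targets a YARN queue.
# The -D generic option must come right after "import" and before
# tool-specific options such as --connect.
build_sqoop_import() {
  # $1 = YARN queue name, $2 = JDBC connect string, $3 = HDFS target dir
  printf 'sqoop import -D mapreduce.job.queuename=%s --connect %s --target-dir %s\n' "$1" "$2" "$3"
}

# Values: QUEUE1 and the connect string come from the question;
# the target directory is a placeholder.
build_sqoop_import QUEUE1 'jdbc://server' /hypothetical/target/dir
```

The same property name (mapreduce.job.queuename) is used as in the hadoop jar command above, so queue configuration on the YARN side should not need to change.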
I am using Sqoop version 1.4.2. I am trying to change the Sqoop metastore from the default HSQLDB to MySQL.
I have configured the following properties in the sqoop-site.xml file.
<property>
<name>sqoop.metastore.client.enable.autoconnect</name>
<value>false</value>
<description>If true, Sqoop will connect to a local metastore
for job management when no other metastore arguments are
provided.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:mysql://ip:3206/sqoop?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.username</name>
<value>userName</value>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.password</name>
<value>password</value>
</property>
</configuration>
When I try to create a Sqoop job using the meta-connect URL, it cannot connect to the configured MySQL database.
sqoop job --create --meta-connect {mysql_jdbc_url} sqoop job definition
It throws the following exception.
14/06/06 15:04:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.4.2.0.6.1-101
14/06/06 15:04:55 WARN hsqldb.HsqldbJobStorage: Could not interpret …
I am running this command in Sqoop:
sqoop import --connect jdbc:mysql://localhost/hadoopguide --table widgets
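My Sqoop version: Sqoop 1.4.4.2.0.6.1-101
Hadoop version: Hadoop 2.2.0.2.0.6.0-101
Both are taken from the Hortonworks distribution. All paths such as HADOOP_HOME, HCAT_HOME, and SQOOP_HOME are set correctly. I can get the list of databases and the list of tables from the MySQL database by running the list-databases and list-tables commands in Sqoop. I can even fetch data with --query 'select * from widgets'; but when I use the --table option I get the error below.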
14/02/06 14:02:17 WARN mapred.LocalJobRunner: job_local177721176_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:403)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
at org.apache.sqoop.mapreduce.db.DBConfiguration.getInputClass(DBConfiguration.java:394)
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.createDBRecordReader(DataDrivenDBInputFormat.java:233)
at org.apache.sqoop.mapreduce.db.DBInputFormat.createRecordReader(DBInputFormat.java:236)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:491)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:734)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ClassNotFoundException: Class widgets not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
... 13 …
I am running the snippet below in Sqoop, and I get the error listed after the code.
bin/sqoop job --create myjob import --connect jdbc:mysql://localhost/test -username root -password root --table patient -m 1 --target-dir /Sqoop/MRJob
administrator@ubuntu:~/sqoop-1.4.4.bin__hadoop-1.0.0$ bin/sqoop job --create myjob import --connect jdbc:mysql://localhost/test -username root -password root --table patient -m 1 --target-dir /Sqoop/MRJob
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: $HADOOP_HOME is deprecated.
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Error parsing arguments for job:
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: import
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --connect
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: jdbc:mysql://localhost/test
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: -username
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: root
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: -password
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: root
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: --table
14/07/16 23:23:36 ERROR tool.BaseSqoopTool: Unrecognized argument: patient
14/07/16 …
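A likely cause, offered as a sketch rather than a confirmed diagnosis: the saved-job syntax in Sqoop 1.x expects a standalone "--" between the job name and the tool name (import), and the credential options are spelled --username and --password with two dashes. The helper name build_sqoop_job_create is hypothetical; the values are the ones from the command above. The script only prints the command for inspection.

```shell
# Sketch: print a `sqoop job --create` command with a standalone "--"
# separating the job name from the tool ("import"), and with
# double-dash --username/--password options.
build_sqoop_job_create() {
  # $1 = job name, $2 = JDBC URL, $3 = user, $4 = password,
  # $5 = table, $6 = HDFS target dir
  printf 'sqoop job --create %s -- import --connect %s --username %s --password %s --table %s -m 1 --target-dir %s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Values taken from the failing command in the question.
build_sqoop_job_create myjob jdbc:mysql://localhost/test root root patient /Sqoop/MRJob
```

Without the separator, the job tool tries to parse import and everything after it as its own arguments, which matches the "Unrecognized argument" lines in the log.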
I have to export an HDFS file into MySQL.
Suppose my HDFS file is:
1,abcd,23
2,efgh,24
3,ijkl,25
4,mnop,26
5,qrst,27
And suppose my MySQL database schema is:
+-----+-----+-------------+
| ID | AGE | NAME |
+-----+-----+-------------+
| | | |
+-----+-----+-------------+
When I insert using the following Sqoop command:
sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age"
It works fine and inserts into the database.
However, when I need to update records that already exist, I have to use --update-key and --columns.
Now, when I try to update the table using the following command:
sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age" \
--update-key id
The problem I am facing is that the data is not updated into the columns as specified …