I am trying to connect to a Kerberized HDFS cluster with the code below; with the same code I can access HBase via HBaseConfiguration without any problem:
Configuration config = new Configuration();
config.set("hadoop.security.authentication", "Kerberos");
UserGroupInformation.setConfiguration(config);
UserGroupInformation ugi = null;
ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("me@EXAMPLE.COM", "me.keytab");
model = ugi.doAs((PrivilegedExceptionAction<Map<String, Object>>) () -> {
    testHadoop(hcb.gethDFSConfigBean());
    return null;
});
I have been able to access Solr and Impala successfully with the same keytab and principal, but I keep hitting this strange problem where the service name for HDFS cannot be found.
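For reference, here is a self-contained version of what I am attempting, with the client-side settings written out explicitly. The fs.defaultFS value and the namenode principal hdfs/_HOST@EXAMPLE.COM are placeholders rather than my real cluster values, and adding dfs.namenode.kerberos.principal is only my guess at what the error below is complaining about:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedHdfsAccess {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();
        config.set("hadoop.security.authentication", "kerberos");
        config.set("fs.defaultFS", "hdfs://sobd189.securonix.com:8020");
        // without this (or an hdfs-site.xml that provides it on the classpath) the RPC
        // client does not know which service principal to request a ticket for
        config.set("dfs.namenode.kerberos.principal", "hdfs/_HOST@EXAMPLE.COM");

        UserGroupInformation.setConfiguration(config);
        UserGroupInformation ugi =
                UserGroupInformation.loginUserFromKeytabAndReturnUGI("me@EXAMPLE.COM", "me.keytab");

        ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = FileSystem.get(config);
            System.out.println(fs.getFileStatus(new Path("/")));
            return null;
        });
    }
}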
Please see the stack trace below:
java.io.IOException: Failed on local exception: java.io.IOException: java.lang.IllegalArgumentException: Failed to specify server's Kerberos principal name; Host Details : local host is: "Securonix-int3.local/10.0.4.36"; destination host is: "sobd189.securonix.com":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1472)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) …
I tried to force-split a region and received the following error:
ERROR: org.apache.hadoop.hbase.DoNotRetryIOException: 3dd9ec2b32c98131b39fbfa8266881f9 NOT splittable
at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.checkSplittable(SplitTableRegionProcedure.java:193)
at org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.<init>(SplitTableRegionProcedure.java:115)
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:750)
at org.apache.hadoop.hbase.master.HMaster$3.run(HMaster.java:1859)
at org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:134)
at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1851)
at org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:808)
at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
Does anyone have any insight into this error?
I am using Cloudera 6.1.1 and HBase 2.1.0.
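For reference, this is roughly how I retry the split through the HBase 2.x Admin API (the table name my_table and the split row are placeholders). My understanding, which may well be wrong, is that a region that is not open, or that still carries reference files from an earlier split or a pending compaction, gets rejected as NOT splittable:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionSplitCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            TableName table = TableName.valueOf("my_table");
            // print every region of the table with its start key, to confirm the
            // region still exists and to pick an explicit split point
            for (RegionInfo region : admin.getRegions(table)) {
                System.out.println(region.getEncodedName() + "  start="
                        + Bytes.toStringBinary(region.getStartKey()));
            }
            // ask the master for a split at an explicit row key (placeholder value)
            admin.split(table, Bytes.toBytes("row-50000"));
        }
    }
}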
I am trying to run the Cloudera Docker machine on Windows 10, but it dies before it even logs a single line. I tried:
docker run -m 8G --memory-reservation 3G --memory-swap 8G --hostname=quickstart.cloudera --privileged=true -t -i -v C:\\sw\\mi_docker_vol_1:/src --publish-all=true -p 8888 cloudera/quickstart /usr/bin/docker-quickstart
but it does not work. Any ideas?
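In case it helps, this is how I try to capture any output before the container dies; cloudera-qs is just a name I attach so that docker logs and docker inspect can find the exited container afterwards:

docker run -d --name cloudera-qs -m 8G --memory-reservation 3G --memory-swap 8G --hostname=quickstart.cloudera --privileged=true -t -i -v C:\\sw\\mi_docker_vol_1:/src --publish-all=true -p 8888 cloudera/quickstart /usr/bin/docker-quickstart
docker logs cloudera-qs
docker inspect --format "{{.State.ExitCode}} oom={{.State.OOMKilled}}" cloudera-qs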
I have many input files, and I want to process a selected subset of them based on the date appended at the end of each file name. I am confused about where I should use the globStatus method to filter out the files.
I have a custom RecordReader class and tried to use globStatus in its next method, but it did not work.
public boolean next(Text key, Text value) throws IOException {
    // fileSplit, conf, processed and date are fields of the surrounding RecordReader
    Path filePath = fileSplit.getPath();
    if (!processed) {
        key.set(filePath.getName());
        byte[] contents = new byte[(int) fileSplit.getLength()];
        value.clear();
        FileSystem fs = filePath.getFileSystem(conf);
        // returns a FileStatus[] of matching paths, but the result is not used here
        fs.globStatus(new Path("/*" + date));
        FSDataInputStream in = null;
        try {
            in = fs.open(filePath);
            IOUtils.readFully(in, contents, 0, contents.length);
            value.set(contents, 0, contents.length);
        } finally {
            IOUtils.closeStream(in);
        }
        processed = true;
        return true;
    }
    return false;
}
I know it returns a FileStatus array, but how can I use that to filter the files? Can someone explain?
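To make the question concrete, this is the kind of filtering I imagine doing at job-setup time instead of inside the RecordReader; the /input base directory is a placeholder, and whether this is the right place to apply globStatus is exactly what I am unsure about:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class GlobInputSetup {
    // keep only the files whose names end with the given date and register them as job inputs
    public static void setFilteredInputs(JobConf job, String date) throws IOException {
        FileSystem fs = FileSystem.get(job);
        FileStatus[] matches = fs.globStatus(new Path("/input/*" + date));
        List<Path> paths = new ArrayList<Path>();
        for (FileStatus status : matches) {
            paths.add(status.getPath());
        }
        // the RecordReader then never sees files with the wrong date suffix
        FileInputFormat.setInputPaths(job, paths.toArray(new Path[0]));
    }
}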
I have a 5-node Hadoop cluster and have configured 10 mappers per node. While an MR job is running, one of the HDFS nodes dies, which eventually gets that TaskTracker blacklisted. If I fix the affected HDFS node after the blacklisting but before the MR job completes, is it possible to recover the TaskTracker from the blacklist?
I am using Cloudera CDH 4.2 on Ubuntu.
I would like to know whether it is possible to tell (by default) how many mappers/reducers will be used based on the number of files.
I know that the number of mappers depends on the block size rather than the actual file size, but I want to make sure I am not missing anything.
For example:
Suppose there are 4 directories in HDFS, containing 4 files:
dir1/file1 - contains (testing file 1, testing again)
dir2/file2 - contains (testing file 2, testing again)
dir3/file3 - contains (testing file 3, testing again)
dir4/file4 - contains (testing file 4, testing again)
Is there a way to tell how many mappers and reducers will be used to process the four files above?
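To make the example concrete, this is how I would check the planned split count with the old mapred API (matching CDH 4); the four /dirN paths are the directories above, and my expectation, which I would like confirmed, is four splits (so four mappers) for four small files, with the reducer count simply being whatever mapred.reduce.tasks is set to:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

public class SplitCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitCount.class);
        conf.setInputFormat(TextInputFormat.class);
        FileInputFormat.setInputPaths(conf,
                new Path("/dir1"), new Path("/dir2"), new Path("/dir3"), new Path("/dir4"));
        // getSplits reports the planned map tasks; the second argument is only a hint
        InputSplit[] splits = conf.getInputFormat().getSplits(conf, 1);
        System.out.println("map tasks: " + splits.length);
        System.out.println("reduce tasks: " + conf.getNumReduceTasks());
    }
}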
I configured unixODBC to use Cloudera's Hive connector on my Linux Mint machine, but when trying to connect to Hive I keep getting the following error (for example when using isql -v hive):
[S1000][unixODBC][Cloudera][ODBC] (11560) Unable to locate SQLGetPrivateProfileString function.
[ISQL]ERROR: Could not SQLConnect
I think I have set up /etc/odbcinst.ini and ~/.odbc.ini in the right way:
# content of /etc/odbcinst.ini
[hive]
Description = Cloudera ODBC Driver for Apache Hive (64-bit)
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
ODBCInstLib=libodbcinst.a(libodbcinst.so.1)
UsageCount = 1
DriverManagerEncoding=UTF-16
ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages/
LogLevel=0
SwapFilePath=/tmp
and my ~/.odbc.ini file contains:
[hive]
Description=Cloudera ODBC Driver for Apache Hive (64-bit) DSN
Driver = hive
ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages/
# Values for HOST, PORT, KrbHostFQDN, and KrbServiceName should be set here.
# They can also be specified on the connection string.
HOST= …
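For completeness, this is the kind of change I have been experimenting with. I am assuming the driver reads its own settings from /opt/cloudera/hiveodbc/lib/64/cloudera.hiveodbc.ini and that unixODBC's libodbcinst.so lives under /usr/lib/x86_64-linux-gnu; both paths may differ on another installation:

# /opt/cloudera/hiveodbc/lib/64/cloudera.hiveodbc.ini (assumed location)
[Driver]
# point ODBCInstLib at the full path of unixODBC's installer library
# instead of the archive-style libodbcinst.a(libodbcinst.so.1) form
ODBCInstLib=/usr/lib/x86_64-linux-gnu/libodbcinst.so
DriverManagerEncoding=UTF-16
ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages/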
I am using CDH 5.3.3 and connecting to Hive in a secured cluster using the Hive JDBC driver. I try to log in with a keytab:
UserGroupInformation.loginUserFromKeytab(lprincipal, keytabpath);
I use a Hive URL of the following format:
jdbc:hive2://localhost:10000;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=hs2.example.com;KrbServiceName=hive
Sample code:
// Authenticating Kerberos principal
System.out.println("Principal Authentication: ");
final String user = "cloudera@CLOUDERA.COM";
final String keyPath = "cloudera.keytab";
UserGroupInformation.loginUserFromKeytab(user, keyPath);
Connection connection = DriverManager.getConnection(url);
The URL format is as follows:
jdbc:hive2://localhost:10000;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=hs2.example.com;KrbServiceName=hive
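For reference, here is a self-contained version of the attempt. I am assuming the driver can pick up the Kerberos Subject when the connection is opened inside doAs after the keytab login; that is a pattern I have seen suggested rather than something I can confirm for this driver:

import java.security.PrivilegedExceptionAction;
import java.sql.Connection;
import java.sql.DriverManager;

import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHiveJdbc {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://localhost:10000;AuthMech=1;KrbRealm=EXAMPLE.COM;"
                   + "KrbHostFQDN=hs2.example.com;KrbServiceName=hive";

        // log in from the keytab before touching JDBC
        UserGroupInformation.loginUserFromKeytab("cloudera@CLOUDERA.COM", "cloudera.keytab");
        UserGroupInformation ugi = UserGroupInformation.getLoginUser();

        // open the connection with the logged-in Subject in scope
        Connection conn = ugi.doAs((PrivilegedExceptionAction<Connection>) () ->
                DriverManager.getConnection(url));
        System.out.println("connected: " + !conn.isClosed());
        conn.close();
    }
}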
I get the following exception, and I would appreciate some help in determining the cause of this problem:
com.cloudera.hive.support.exceptions.GeneralException: CONN_KERBEROS_AUTHENTICATION_ERROR_GET_TICKETCACHE
javax.security.auth.login.LoginException: Unable to obtain Princpal Name for authentication
at com.sun.security.auth.module.Krb5LoginModule.promptForName(Krb5LoginModule.java:800)
at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:671)
at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:584)
at sun.reflect.NativeMethodAccessorImpl.inv
For a different cluster distribution, after adding debugging I see the following exception:
DEBUG org.apache.hadoop.security.UserGroupInformation: hadoop login
DEBUG org.apache.hadoop.security.UserGroupInformation: …
I use HBase to store some data from the web, and I use Apache Hue to visually inspect what is in HBase. However, it only shows the first ten entries in the database; I cannot get it to display more, and there is no next-page button.
I know I can interact with HBase through the API and the terminal, but I like the convenience Hue gives me.
I am using the Hue that ships with Cloudera 4.7.1-1.cdh4.7.1.p0.47. Is it too old?
Does anyone know how to get Hue to display the rest of the HBase database? Searching does not seem to work well either.
I am trying to execute a Spark application, built with the Scala IDE, against the standalone Spark service running on the Cloudera QuickStart VM 5.3.0.
My cloudera account's JAVA_HOME is /usr/java/default.
However, I run into the following error messages when executing the start-all.sh command as the cloudera user:
[cloudera@localhost sbin]$ pwd
/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin
[cloudera@localhost sbin]$ ./start-all.sh
chown: changing ownership of `/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin/../logs': Operation not permitted
starting org.apache.spark.deploy.master.Master, logging to /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin/../logs/spark-cloudera-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out
/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin/spark-daemon.sh: line 151: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin/../logs/spark-cloudera-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out: Permission denied
failed to launch org.apache.spark.deploy.master.Master:
tail: cannot open `/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin/../logs/spark-cloudera-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out' for reading: No such file or directory
full log in /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin/../logs/spark-cloudera-org.apache.spark.deploy.master.Master-1-localhost.localdomain.out
cloudera@localhost's password:
localhost: chown: changing ownership of `/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/logs': Operation not permitted
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/logs/spark-cloudera-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
localhost: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin/spark-daemon.sh: line 151: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/logs/spark-cloudera-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out: Permission denied
localhost: …
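The only workaround I can think of is pointing the logs at a directory my user can write to, on the assumption that spark-daemon.sh honours SPARK_LOG_DIR; the path below is just what I would try:

export SPARK_LOG_DIR=/home/cloudera/spark-logs
mkdir -p "$SPARK_LOG_DIR"
cd /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/sbin
./start-all.sh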