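I want to use Spark Streaming to retrieve data from Kafka. Now I want to save the data in a remote HDFS. I know I have to use the function saveAsTextFile. However, I don't know exactly how to specify the path.
Is it correct if I write this: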
myDStream.foreachRDD(frm->{
frm.saveAsTextFile("hdfs://ip_addr:9000//home/hadoop/datanode/myNewFolder");
});
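where ip_addr is the IP address of my remote HDFS server.
/home/hadoop/datanode/ is the DataNode HDFS directory that was created when I installed Hadoop (I don't know whether I have to specify this directory). And
myNewFolder is the folder where I want to save the data.
Thanks in advance.
Yassir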
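For reference on how such a path is put together: the part after host:port is resolved inside the HDFS namespace managed by the NameNode, so the local directory where a DataNode stores its blocks is normally not part of it. The sketch below only builds the URI string. It assumes that port 9000 is the NameNode RPC port from fs.defaultFS and that /user/hadoop/myNewFolder is a hypothetical target directory inside HDFS; neither is confirmed by the question.

```python
import posixpath


def hdfs_uri(namenode_host: str, namenode_port: int, hdfs_path: str) -> str:
    """Build an hdfs:// URI for a path inside the HDFS namespace."""
    # Make the path absolute and collapse duplicate or trailing slashes.
    clean_path = posixpath.normpath("/" + hdfs_path.strip("/"))
    return f"hdfs://{namenode_host}:{namenode_port}{clean_path}"


# Hypothetical target: a directory in HDFS, not a local DataNode directory.
print(hdfs_uri("ip_addr", 9000, "/user/hadoop/myNewFolder"))
```

The resulting string is what would be passed to saveAsTextFile. Inside foreachRDD the same path is reused for every batch, so a per-batch suffix is probably needed to avoid writing to an existing directory.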
I am running the following script:
from __future__ import print_function
import paramiko
import boto3
#print('Loading function')
paramiko.util.log_to_file("/tmp/Dawny.log")
# List of EC2 variables
region = 'us-east-1'
image = 'ami-<>'
keyname = '<>.pem'
ec2 = boto3.resource('ec2')
instances = ec2.create_instances(ImageId=image, MinCount=1, MaxCount=1, InstanceType = 't2.micro', KeyName=keyname)
instance = instances[0]
instance.wait_until_running()
instance.load()
print(instance.public_dns_name)
def lambda_handler(event, context):
    instances = ec2.create_instances(ImageId=image, MinCount=1, MaxCount=1, InstanceType = 't2.micro', KeyName=keyname)
    instance = instances[0]
    instance.wait_until_running()
    instance.load()
    print(instance.public_dns_name)
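A note on the KeyName argument used in the script above: EC2 expects the name under which the key pair is registered, not the file name of the downloaded private key. The sketch below assumes the common convention that the .pem file is named after the key pair; key_pair_name and launch_instance are hypothetical helpers, and my-key is a placeholder name.

```python
import os


def key_pair_name(pem_filename: str) -> str:
    """Derive the key pair name from a private key file name (assumed convention)."""
    base = os.path.basename(pem_filename)
    name, ext = os.path.splitext(base)
    # Only strip the extension when it really is ".pem".
    return name if ext.lower() == ".pem" else base


def launch_instance(image_id: str, pem_filename: str):
    """Start one t2.micro instance using the derived key pair name."""
    import boto3  # imported lazily so key_pair_name works without boto3

    ec2 = boto3.resource("ec2")
    return ec2.create_instances(
        ImageId=image_id,
        MinCount=1,
        MaxCount=1,
        InstanceType="t2.micro",
        KeyName=key_pair_name(pem_filename),
    )
```

Calling key_pair_name("my-key.pem") gives "my-key", which is the form create_instances accepts if the key pair was registered under that name.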
When I run it, I get this error:
botocore.exceptions.ClientError: An error occurred (InvalidKeyPair.NotFound) when calling the RunInstances operation: The key pair '<>.pem' does not exist …
I have an S3 bucket:
aws s3 ls s3://myBucket/
PRE 2020032600/
PRE 2020032700/
PRE 2020032800/
PRE results_2020011200/
PRE results_2020011300/
PRE results_2020011400/
PRE results_2020011500/
I only want to copy locally the folders whose names start with results_:
aws s3 cp s3://myBucket/*something /Users/myName/myFolder/ --recursive
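The AWS CLI does not expand * inside an S3 path; on the command line, the --exclude and --include filters of aws s3 cp are the usual route. The same selection can be sketched with boto3 by listing keys that share the results_ prefix. The helper names below are hypothetical, and the code assumes credentials are already configured.

```python
import os


def local_path_for(key: str, dest_root: str) -> str:
    """Map an S3 key such as 'results_x/part-0' to a path under dest_root."""
    return os.path.join(dest_root, *key.split("/"))


def download_prefix(bucket: str, prefix: str, dest_root: str) -> int:
    """Download every object whose key starts with prefix; return the count."""
    import boto3  # imported lazily so local_path_for works without boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    count = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip zero-byte "folder" placeholder objects
            target = local_path_for(key, dest_root)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, key, target)
            count += 1
    return count
```

With the names from the question, the call would be download_prefix("myBucket", "results_", "/Users/myName/myFolder").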
I am having a problem running an ansible-playbook from Jenkins, like this:
PLAY [centos-slave-02] *********************************************************
TASK [Gathering Facts] *********************************************************
fatal: [centos-slave-02]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Host key verification failed.", "unreachable": true}
PLAY RECAP *********************************************************************
centos-slave-02 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
But I am able to get a ping/pong response, although every time it reports:
Matching host key in /var/jenkins_home/.ssh/known_hosts:5 :
jenkins@c11582cb5024:~/jenkins-ansible$ ansible -i hosts -m ping centos-slave-02
Warning: the ECDSA host key for 'centos-slave-02' differs from the key for the IP address '172.19.0.3'
Offending key for …
I have installed ZooKeeper version 3.4.9.
My zoo.cfg file is configured as follows:
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/
clientPort=2181
DataLogDir=/usr/local/log/
server.1=hadoop-master:2888:3888
server.2=hadoop-slave-1:2889:3889
server.3=hadoop-slave-2:2890:3890
Of course, I have created the myid file /usr/local/zookeeper/data/myid on all three nodes.
It contains the value 1 on the hadoop-master server, 2 on hadoop-slave-1,
and 3 on hadoop-slave-2.
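Two details in this setup may be worth checking: ZooKeeper reads myid directly from the directory named by dataDir, and property names in zoo.cfg are case-sensitive. The sketch below parses a config text and reports where myid would be expected, plus any keys that differ from a well-known name only by letter case. The list of known keys is a deliberately small assumption, not the full set ZooKeeper accepts.

```python
import posixpath

# Small, assumed subset of standard zoo.cfg keys used for the case check.
KNOWN_KEYS = {"tickTime", "initLimit", "syncLimit", "dataDir", "dataLogDir", "clientPort"}


def parse_zoo_cfg(text: str) -> dict:
    """Parse key=value lines, ignoring blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def expected_myid_path(props: dict) -> str:
    """Return the path where the myid file is looked up."""
    return posixpath.join(props["dataDir"], "myid")


def miscased_keys(props: dict) -> list:
    """Return keys that match a known key only when letter case is ignored."""
    lowered = {k.lower() for k in KNOWN_KEYS}
    return [k for k in props if k not in KNOWN_KEYS and k.lower() in lowered]


cfg = parse_zoo_cfg("dataDir=/usr/local/zookeeper/\nDataLogDir=/usr/local/log/\n")
print(expected_myid_path(cfg))
print(miscased_keys(cfg))
```

For the configuration in the question, this points at /usr/local/zookeeper/myid rather than the data/ subdirectory, and flags DataLogDir.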
My slaves file is as follows:
hadoop-slave-1
hadoop-slave-2
hadoop-master
I have issued the command ./zkServer.sh start on all three nodes, and it gave me the output:
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
But issuing the command ./zkServer.sh status gave me the output:
Using config: /usr/local/zookeeper-3.4.9/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
Why am I getting this output?
Also, the jps command does not show
QuorumPeerMain
My .bashrc file is:
export JAVA_HOME=/usr/local/java-1.7.0/
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop/
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
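export HADOOP_OPTS="$HADOOP_OPTS …
Problem: I want to list all Pods except those in the Completed state.
The field selector in the kubectl command still outputs Pods in the Completed state even when "not Completed" is selected.
Attempted solution: note that I am still getting Pods in the Completed state as well.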
kubectl get pods --all-namespaces --field-selector=spec.nodeName=node1,status.phase!=Completed --no-headers
test-2d7dbabf-f8cc-4c0b-af3b-80db52d2257e nightly-2020-04-01-13-00-tokyo-demo-route-gt-4245049053 2/2 Running 0 13m
test-2d7dbabf-f8cc-4c0b-af3b-80db52d2257e nightly-2020-04-01-13-00-willows-405-a2a-1380625152 0/2 Completed 0 3h31m
test-2d7dbabf-f8cc-4c0b-af3b-80db52d2257e nightly-2020-04-01-13-00-willows-405-a2a-1464250510 0/2 Completed 0 7h33m
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T19:18:23Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.5", GitCommit:"2166946f41b36dea2c4626f90a77706f426cdea2", GitTreeState:"clean", BuildDate:"2019-03-25T15:19:22Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
But when I do the same for Running, it works and lists only the Pods in the Completed state:
kubectl get pods --all-namespaces --field-selector=spec.nodeName=node1,status.phase!=Running --no-headers
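One likely explanation for the behaviour above: status.phase only takes the values Pending, Running, Succeeded, Failed and Unknown. Completed is what kubectl prints in the STATUS column for Pods whose containers exited successfully, and those Pods are in the Succeeded phase, so a selector on Completed never matches anything. The sketch below builds the selector string and rejects values that are not phases; pod_field_selector is a hypothetical helper.

```python
# The five values that status.phase can take.
VALID_POD_PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def pod_field_selector(node_name: str, exclude_phase: str) -> str:
    """Build a --field-selector value that excludes one Pod phase on a node."""
    if exclude_phase not in VALID_POD_PHASES:
        raise ValueError(
            f"{exclude_phase!r} is not a Pod phase; expected one of {VALID_POD_PHASES}"
        )
    return f"spec.nodeName={node_name},status.phase!={exclude_phase}"


# Pods shown as "Completed" are in the Succeeded phase.
print(pod_field_selector("node1", "Succeeded"))
```

The printed string can be passed to kubectl get pods --all-namespaces --field-selector=... in place of the selector used above.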
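I am writing some Python scripts and I am trying to use boto3 to upload a file to Amazon S3. The problem is that I want to upload the file to a specific subfolder... in some cases, I need to upload the file to a subfolder of a subfolder.
I am trying to do it like this: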
s3.meta.client.upload_file( "C:\\Users...\\folder1" + "\\" + someFile.txt, "folder/subfolder1/subfolder2", someFile.txt)
I get the following error message:
Invalid bucket name "...": Bucket name must match the regex "^[a-zA-Z0-9. \-]{1,255}$"
It works if I only use folder, but not if I try folder/subfolder1/subfolder2.
I tried to understand it from the documentation, but couldn't. Can someone explain it to me?
Thanks
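For context on the error: the arguments of upload_file are, in order, the local file name, the bucket name, and the object key. S3 has no real folders, so the "subfolder" path belongs at the front of the key, and the second argument stays the bare bucket name. A minimal sketch, assuming a hypothetical bucket called my-bucket and hypothetical helper names:

```python
def s3_key(*parts: str) -> str:
    """Join path parts into an S3 object key with single '/' separators."""
    cleaned = (p.strip("/") for p in parts)
    return "/".join(p for p in cleaned if p)


def upload_to_subfolder(local_path: str, bucket: str, folder: str, filename: str) -> str:
    """Upload local_path to s3://bucket/folder/filename and return the key."""
    import boto3  # imported lazily so s3_key works without boto3

    key = s3_key(folder, filename)
    # Signature: upload_file(Filename, Bucket, Key)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key


# The "folders" become a key prefix; the bucket name is passed separately.
print(s3_key("folder/subfolder1/subfolder2", "someFile.txt"))
```

A call such as upload_to_subfolder(local_file, "my-bucket", "folder/subfolder1/subfolder2", "someFile.txt") would then place the object under that prefix.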
I am using AWSLambdaClient.createFunction(). It is deprecated.
Please let me know what the alternative is.
AWSLambdaClient lclient = new AWSLambdaClient(Credentials);
..
..
lclient.createFunction(request);
I need to delete an S3 bucket that has some objects in it:
aws s3 rb --force s3://ansible.prod-us-east
remove_bucket failed: s3://ansible.prod-us-east An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: The bucket you tried to delete is not empty. You must delete all versions in the bucket.
I also tried this:
aws s3api delete-bucket --bucket "ansible.prod-us-east" --region "us-east-1"
An error occurred (BucketNotEmpty) when calling the DeleteBucket operation: The bucket you tried to delete is not empty. You must delete all versions in the bucket.
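The error says the bucket is not empty. But when I list it on the command line or look at the bucket in the console, the bucket is already empty.
When I try to delete the bucket from the console, I get the same error. The bucket is empty, but the error says it is not, and I cannot delete it.
How can I get this done?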
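The wording "You must delete all versions in the bucket" suggests that versioning is or was enabled. A plain listing shows only current objects, so the bucket can look empty while older versions and delete markers remain. A minimal boto3 sketch of removing those first, assuming the caller has permission to delete object versions; empty_and_delete and delete_bucket_by_name are hypothetical helper names:

```python
def empty_and_delete(bucket) -> None:
    """Delete every object version and delete marker, then the bucket itself.

    bucket is expected to be a boto3 s3.Bucket resource.
    """
    bucket.object_versions.delete()  # batch-deletes versions and delete markers
    bucket.delete()


def delete_bucket_by_name(name: str) -> None:
    """Look up the bucket by name and remove it with all its versions."""
    import boto3  # imported lazily so empty_and_delete can be tested offline

    empty_and_delete(boto3.resource("s3").Bucket(name))
```

For the bucket in the question, the call would be delete_bucket_by_name("ansible.prod-us-east"). This is irreversible, so it is worth checking the bucket name twice.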