Posts by ear*_*arl

How to replace text with other text in a file stored in HDFS

I have file.txt on the UNIX file system. Its contents are:

{abc}]}
{pqr}]}

I want to convert file.txt into:

[
{abc}]},
{pqr}]}
]

I can do this with the following shell script:

sed -i 's/}]}/}]},/g' file.txt
sed -i '1i [' file.txt
sed -i '$ s/}]},/}]}]/g' file.txt
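
For reference, the three sed commands can be expressed as a small Python function (`wrap_records_as_array` is a hypothetical helper, not part of the question). One difference is worth noting: as written, the third sed command turns the last line into `{pqr}]}]`, so the closing bracket ends up on the same line as the last record. The sketch instead puts `]` on its own line to match the target layout shown above.

```python
def wrap_records_as_array(text: str) -> str:
    # Hypothetical helper mirroring the three sed edits on an in-memory string.
    lines = text.splitlines()
    # Like s/}]}/}]},/g: add a separating comma after every '}]}' record.
    lines = [line.replace("}]}", "}]},") for line in lines]
    # Like the '$' address: the last record must not keep its trailing comma.
    if lines:
        lines[-1] = lines[-1].replace("}]},", "}]}")
    # Like '1i [': open the array on the first line, then close it on a
    # line of its own, as in the target layout.
    return "\n".join(["["] + lines + ["]"]) + "\n"


print(wrap_records_as_array("{abc}]}\n{pqr}]}\n"), end="")
```

Run on the two sample records, this prints the four-line target: `[`, `{abc}]},`, `{pqr}]}`, `]`.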

My problem is when this file is in HDFS, at the location /test.

If I use: sed -i 's/}]}/}]},/g' /test/file.txt

it looks in /test on the local UNIX partition and says the file does not exist.

If I use: sed -i 's/}]}/}]},/g' | hadoop fs -cat /test/file.txt

it says "sed: no input files" and then prints the contents of file.txt from the cat command.

If I use: hadoop fs -cat /test/file.txt | sed -i 's/}]}/}]},/g'

it says "sed: no input files" and "cat: Unable to write to output stream".

So how can I replace a string with another string in a file on HDFS?
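
The attempts fail because `sed -i` edits a local file in place, and with no file argument GNU sed reports `no input files` and exits. In the second attempt the pipe also runs the wrong way, so sed never sees the HDFS data; in the third, sed exits before reading, so `hadoop fs -cat` has nowhere to write. One way around this is to stream the file through the hadoop CLI instead of editing it in place. Below is a minimal Python sketch, assuming the `hadoop` command is on `PATH` and the file fits in memory. It reads the file with `hadoop fs -cat`, applies a transformation, and writes the result back with `hadoop fs -put -f -`, where `-` makes `put` read from stdin and `-f` overwrites the existing file. The function name `rewrite_hdfs_file` and its `run` parameter (injectable so the logic can be exercised without a cluster) are hypothetical.

```python
import subprocess
from typing import Callable


def rewrite_hdfs_file(
    path: str,
    transform: Callable[[str], str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    # Hypothetical helper; assumes the `hadoop` CLI is installed and on PATH.
    # Read the whole HDFS file into memory first, so nothing is written back
    # while the original is still being read.
    result = run(["hadoop", "fs", "-cat", path],
                 check=True, capture_output=True, text=True)
    new_text = transform(result.stdout)
    # `-put -f - <dst>` reads from stdin and overwrites <dst> if it exists.
    run(["hadoop", "fs", "-put", "-f", "-", path],
        check=True, input=new_text, text=True)
    return new_text
```

For example, `rewrite_hdfs_file("/test/file.txt", lambda s: s.replace("}]}", "}]},"))` would apply the question's first substitution to the file on HDFS. Reading the file fully before calling `put` trades memory for simplicity; for very large files, writing to a temporary HDFS path first would be the safer design.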

sed hdfs

ear*_*arl · 9 votes · 1 answer · 4966 views

Question about the hadoop namenode -format command

When I run "hadoop namenode -format", the following message appears:

Re-format filesystem in Storage Directory /opt/data/temp/dfs/name ? (Y or N)

What should I enter here, "Y" or "N"?

If I enter Y, will I lose the data in HDFS?
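
Answering Y replaces the namespace metadata (fsimage and edit log) in that storage directory with a fresh, empty namespace, so files previously stored in HDFS are no longer reachable through the NameNode. Before answering, it can help to check whether the directory already holds a formatted namespace: a formatted NameNode storage directory contains a `current/VERSION` file, a Java properties file with keys such as `clusterID` and `storageType`. The sketch below reads it; `parse_version_text` and `read_namenode_version` are hypothetical helpers, and the path is the one from the prompt above.

```python
from pathlib import Path


def parse_version_text(text: str) -> dict:
    # Parse Java-properties style "key=value" lines; skip blanks and
    # '#' comment lines (Java writes a timestamp comment at the top).
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def read_namenode_version(name_dir: str) -> dict:
    # Hypothetical helper: returns {} when <name_dir>/current/VERSION is
    # missing, i.e. the directory does not look already formatted.
    version_file = Path(name_dir) / "current" / "VERSION"
    if not version_file.is_file():
        return {}
    return parse_version_text(version_file.read_text())


print(read_namenode_version("/opt/data/temp/dfs/name").get("clusterID"))
```

A non-empty result (for example, a `clusterID` value) means the directory already holds a namespace that answering Y would discard.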

hadoop hdfs namenode

ear*_*arl · 5 votes · 1 answer · 5720 views

Ignoring quotes in a CSV file when loading it into a Hive table

I have a CSV file with data in the following format:

"SomeName1",25,"SomeString1"
"SomeName2",26,"SomeString2"
"SomeName3",27,"SomeString3"

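I am loading this CSV into a Hive table. In the table, columns 1 and 3 are inserted together with their quotes, which I don't want. I want column 1 to be SomeName1 and column 3 to be SomeString1.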

I tried:

WITH SERDEPROPERTIES (
   "separatorChar" = "\t",
   "quoteChar"     = "\""
)

but it doesn't work, and the quotes ("") are kept.

What approach should I take here?

Table creation statement:

CREATE TABLE `abcdefgh`(
  `name` string COMMENT 'from deserializer',
  `age` string COMMENT 'from deserializer',
  `value` string COMMENT 'from deserializer')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar'='\"',
  'separatorChar'='\t')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://a-b-c-d-e:9000/user/hive/warehouse/abcdefgh'
TBLPROPERTIES (
  'numFiles'='1',
  'numRows'='0',
  'rawDataSize'='0',
  'totalSize'='3134916',
  'transient_lastDdlTime'='1490713221')
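
The sample rows are separated by commas, while both the SERDEPROPERTIES snippet and the table definition set separatorChar to a tab ('\t'). OpenCSVSerde passes separatorChar and quoteChar to a CSV parser, so a separator that never occurs in the data means the quotes are not treated as field delimiters. As a rough analogy only (Python's `csv` module, not Hive or OpenCSVSerde itself), the sketch below parses the sample with both separators. With ',' the quotes are consumed and the fields come back clean; with '\t' each line comes back as a single field, since it contains no tab. This suggests that 'separatorChar'=',' is worth trying. `parse_rows` is a hypothetical helper.

```python
import csv
import io


def parse_rows(data: str, separator: str, quote: str = '"') -> list:
    # Hypothetical helper, an analogy for OpenCSVSerde's separatorChar and
    # quoteChar: fields wrapped in `quote` lose their quotes when parsed.
    return list(csv.reader(io.StringIO(data), delimiter=separator, quotechar=quote))


sample = '"SomeName1",25,"SomeString1"\n"SomeName2",26,"SomeString2"\n'

# Comma separator, matching the file: quotes are stripped.
print(parse_rows(sample, ","))
# -> [['SomeName1', '25', 'SomeString1'], ['SomeName2', '26', 'SomeString2']]

# Tab separator, as in the table definition: no tab occurs, so one field per row.
print([len(row) for row in parse_rows(sample, "\t")])
# -> [1, 1]
```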

hadoop hive

ear*_*arl · 2017-03-28 · 4 votes · 1 answer · 1420 views