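Posts by Dav_d H

HDP 2.5 Hortonworks: ambari-admin-password-reset missing

I downloaded the sandbox (CentOS) from Hortonworks and tried to follow the tutorial. The ambari-admin-password-reset command seems to be missing. I also tried logging in with PuTTY. The console asked me to change the password, so I did. Now the command seems to be there, but I have one password for the console and a different PuTTY password for the same user.

I'm trying to figure out why the same user, 'root', has two different passwords I can log in with: one for the VirtualBox console and one for PuTTY. I see different commands in each. When I share a folder, I can only see it from the VirtualBox console and not from the PuTTY session, which is really frustrating.

How can I make what I see from PuTTY the same as what I see in the VirtualBox console?

I think it has something to do with TTY, but I'm not sure.

Edit: output of commands run on the virtual machine: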

grep "^passwd" /etc/nsswitch.conf


Output: passwd: files sss

grep root /etc/passwd


Output:
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

getent passwd root


Output: root:x:0:0:root:/root:/bin/bash

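Edit: I think this has to do with the Docker container. Port 2222 on the machine appears to be the SSH port of the HDP 2.5 container, not of the host machine. Now I've run into another problem. When I run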

docker exec sandbox ls


it hangs. Any help?

Thanks for your help

linux centos docker hortonworks-sandbox

10 votes · 1 answer · 5105 views

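Spark on YARN: container exited with a non-zero exit code 143

I'm using HDP 2.5 and running spark-submit in yarn-cluster mode.

I'm trying to generate data with DataFrame cross joins, i.e.: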

val generatedData = df1.join(df2).join(df3).join(df4)
generatedData.saveAsTable(...)....


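df1 has storage level MEMORY_AND_DISK.

df2, df3 and df4 have storage level MEMORY_ONLY.

df1 has many more records (5 million), while df2 to df4 have at most 100 records each. With this setup the explain plan uses BroadcastNestedLoopJoin, which I expected to perform better.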
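Here is a quick back-of-the-envelope check. It assumes df2 to df4 each hold the stated maximum of 100 rows. A join with no join condition is a Cartesian product, so its row count is the product of the input row counts. Broadcasting the small sides avoids a shuffle, but every output row still has to be produced and written by saveAsTable. `CrossJoinSize` is a hypothetical helper, not a Spark API.

```scala
// Size of a join without a join condition (a Cartesian product).
object CrossJoinSize {
  // The output row count is the product of the input row counts;
  // BigInt avoids Long overflow for large inputs.
  def crossJoinRows(counts: Seq[Long]): BigInt =
    counts.map(BigInt(_)).product
}
```

For the numbers in the question, `CrossJoinSize.crossJoinRows(Seq(5000000L, 100L, 100L, 100L))` evaluates to 5000000000000, i.e. 5 × 10^12 rows.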

For some reason it always fails. I don't know how to debug it or where the memory blows up.

Error log output:

16/12/06 19:44:08 WARN YarnAllocator: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

16/12/06 19:44:08 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_e33_1480922439133_0845_02_000002 on host: hdp4. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container …

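Exit status 143 follows the Unix convention of reporting "killed by signal N" as 128 + N. So 143 = 128 + 15 means the container received SIGTERM, which matches "Killed by external signal" in the log above. One common reason YARN kills a container this way is that it exceeded its memory allocation. The truncated log does not show which cause applies here. The following is a small hypothetical decoder for such statuses.

```scala
// Decode exit statuses that follow the shell convention 128 + N
// for "terminated by signal N".
object ExitStatus {
  val SIGKILL = 9
  val SIGTERM = 15

  def killedBySignal(status: Int): Option[Int] =
    if (status > 128 && status < 256) Some(status - 128) else None
}
```

`ExitStatus.killedBySignal(143)` returns `Some(15)` (SIGTERM). The sibling status 137 decodes to `Some(9)` (SIGKILL).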

hive hadoop-yarn hortonworks-data-platform apache-spark

7 votes · 2 answers · 20k views

Typesafe Config with empty values

Is there a way to load a .conf file whose property values are empty?

For example:

prop1 = 
prop2 =


If not, how should this be handled?

I'm not using this to read the file. I want to write a properties file, i.e.:

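def replaceProperties(sourceFile: String, map: Map[String, String]) = {
  import java.io.File
  import scala.collection.JavaConverters._
  import com.typesafe.config.{ConfigFactory, ConfigValueFactory}

  // Parse the existing .conf file
  val config = ConfigFactory.parseFile(new File(sourceFile))
  // Build a ConfigObject from the replacement values
  val properties = ConfigValueFactory.fromMap(map.asJava)
  // Values from `map` take precedence; everything else falls back to the file
  val merged = properties.withFallback(config)
  //write $merged to a file

}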

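If the goal is only to write such a file, one workaround avoids depending on how Typesafe Config treats a bare `prop1 =`: always write values quoted, since `""` is a valid empty string in HOCON. The sketch below is plain Scala and does not use Typesafe Config. `PropsMerge`, `parse`, `merge` and `render` are hypothetical names. It reads flat `key = value` lines, lets the map override them, and renders every value quoted. It assumes a flat file, with no nesting, includes or substitutions.

```scala
// Plain-Scala sketch (no Typesafe Config): flat "key = value" lines only.
object PropsMerge {
  type Props = Vector[(String, String)]

  // Accepts "prop1 = value" as well as "prop1 =" (empty value); skips blanks and # comments.
  def parse(lines: Seq[String]): Props =
    lines.iterator
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#") && l.contains("="))
      .map { l =>
        val i = l.indexOf('=')
        (l.substring(0, i).trim, unquote(l.substring(i + 1).trim))
      }
      .toVector

  // Values from `overrides` win; keys missing from the file are appended, sorted by key.
  def merge(base: Props, overrides: Map[String, String]): Props = {
    val existing = base.map(_._1).toSet
    val updated = base.map { case (k, v) => (k, overrides.getOrElse(k, v)) }
    updated ++ overrides.toVector.filterNot { case (k, _) => existing(k) }.sortBy(_._1)
  }

  // Quoting every value turns an empty one into "" instead of a bare "key =".
  def render(props: Props): String =
    props.map { case (k, v) => k + " = \"" + escape(v) + "\"" }.mkString("\n")

  // Strips one pair of surrounding double quotes (no unescaping).
  private def unquote(v: String): String =
    if (v.length >= 2 && v.startsWith("\"") && v.endsWith("\"")) v.substring(1, v.length - 1) else v

  // Escapes backslashes and double quotes for a quoted HOCON string.
  private def escape(v: String): String =
    v.replace("\\", "\\\\").replace("\"", "\\\"")
}
```

With the file from the question, `prop1` and `prop2` come out as `prop1 = ""` and `prop2 = ""` unless the map supplies a value for them.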

scala playframework typesafe-config

5 votes · 0 answers · 1683 views

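Hive on Spark: list all partitions of a specific Hive table and add partitions

I'm using Spark 2.0, and I'd like to know whether it's possible to list all the files of a specific Hive table. If so, could I update those files incrementally with Spark directly, e.g. with sc.textFile("file.orc")? How can I add a new partition to a Hive table? Can I use the Hive metastore from Spark?

Is there a way to get the internal Hive function that maps a DataFrame row => partition_path?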
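The directory layout behind row => partition_path is a plain convention. Hive, and Spark's `DataFrameWriter.partitionBy`, store each partition in nested `column=value` directories under the table location, e.g. `my_table/year=2016/month=12/`. The following is a minimal plain-Scala sketch of that mapping. `PartitionPaths`, `partitionPath` and `touchedPartitions` are hypothetical names. Rows are modelled as `Map[String, Any]` rather than Spark `Row`s. The sketch skips the escaping Hive applies to special characters in partition values.

```scala
object PartitionPaths {
  // Relative Hive-style partition path for one row, e.g. "year=2016/month=12".
  // No escaping of special characters in values (Hive does escape them).
  def partitionPath(row: Map[String, Any], partitionCols: Seq[String]): String =
    partitionCols.map(c => s"$c=${row(c)}").mkString("/")

  // Distinct partitions touched by a batch of changed rows: in an incremental
  // update, these would be the only partitions that need rewriting.
  def touchedPartitions(rows: Seq[Map[String, Any]], partitionCols: Seq[String]): Set[String] =
    rows.map(partitionPath(_, partitionCols)).toSet
}
```

For rows with `year = 2016` and `month` 11 or 12, `touchedPartitions(rows, Seq("year", "month"))` yields `Set("year=2016/month=11", "year=2016/month=12")`.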

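My main motivation is incremental updates of the table. Right now the only approach I can think of is a FULL OUTER JOIN in SQL plus SaveMode.Overwrite. That is not very efficient, because it overwrites the whole table, while my main interest is incrementally updating specific partitions or adding new ones.

Edit: Here is what I see on HDFS. With SaveMode.Overwrite, Spark emits the table definition, i.e. CREATE TABLE my_table .... PARTITION BY (month,..). However, it puts all the files under $HIVE/my_table and not under $HIVE/my_table/month/..., which means it did not partition the data. When I write df.write.partitionBy(...).mode(Overwrite).saveAsTable("my_table"), the layout on HDFS is correct. I used SaveMode.Overwrite because I'm updating records, not appending data.

I load the data with spark.table("my_table"), which means Spark loads the table lazily. That is a problem, because I don't want to load the whole table, only part of it.

Regarding the questions:

1. Since I already used partitionBy(), will Spark shuffle the data? Or will it compare against the current partitioning and skip the shuffle if it is the same?

2. When only part of the data changes (i.e. only a specific month/year) and I apply that change instead of loading all the data, is Spark smart enough to use partition pruning? (A FULL OUTER JOIN is basically an operation that scans the whole table.)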
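Conceptually, partition pruning works on partition metadata. The filter on partition columns is evaluated against the list of partitions, and only the matching directories are read. The plain-Scala sketch below shows that idea for a `year`/`month` layout. `PruningSketch`, `Partition`, `prune` and `monthly` are hypothetical names, not Spark internals. It also illustrates why a FULL OUTER JOIN on a join key alone does not prune by itself: there is no predicate on partition columns to evaluate. As far as I know, dynamic pruning from join keys only arrived later, in Spark 3.0.

```scala
object PruningSketch {
  // One partition: its partition-column values and its directory.
  final case class Partition(year: Int, month: Int, path: String)

  // Evaluate the predicate on partition metadata only; return the paths to scan.
  def prune(all: Seq[Partition], predicate: Partition => Boolean): Seq[String] =
    all.filter(predicate).map(_.path)

  // Hypothetical helper: all monthly partitions for the given years.
  def monthly(years: Seq[Int]): Seq[Partition] =
    for (y <- years; m <- 1 to 12) yield Partition(y, m, s"year=$y/month=$m")
}
```

Filtering 24 monthly partitions with `p => p.year == 2016 && p.month == 12` leaves a single directory to scan.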

hive apache-spark

4 votes · 2 answers · 9294 views

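Spark SQL "limit"

Environment: Spark 1.6 with Hadoop, Hortonworks Data Platform 2.5

I have a table with 10 billion records, and I want to take 300 million of them and move them into a temporary table.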

sqlContext.sql("select ....from my_table limit 300000000").repartition(50)
.write.saveAsTable("temporary_table")


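I see that the LIMIT keyword actually makes Spark use only one executor! That means moving 300 million records to a single node and writing them back to Hadoop from there. How can I avoid this reduction and still get 300 million records while using multiple executors? I want all nodes to write to Hadoop.

Can sampling help me? If so, how?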
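Here is a sketch of why sampling can avoid funnelling rows through one executor. It is plain Scala, not Spark code. `LIMIT` needs a global count, while Bernoulli-style sampling decides row by row, so each partition can be filtered independently. Taking roughly 300 million of 10 billion rows requires a fraction of 300,000,000 / 10,000,000,000 = 0.03. The resulting count is approximate, not exact. In Spark 1.6 the DataFrame method for this is `sample(withReplacement, fraction, seed)`. `SampleSketch` and `samplePartition` are hypothetical names.

```scala
import scala.util.Random

object SampleSketch {
  // Fraction needed for ~300 million out of 10 billion rows (0.03).
  val totalRows: Long = 10000000000L
  val wantedRows: Long = 300000000L
  val fraction: Double = wantedRows.toDouble / totalRows

  // Bernoulli sampling of one partition: each row is kept independently with
  // probability p, so partitions never need to coordinate with each other.
  def samplePartition[A](rows: Iterator[A], p: Double, seed: Long): Iterator[A] = {
    val rng = new Random(seed)
    rows.filter(_ => rng.nextDouble() < p)
  }
}
```

Simulating 50 partitions of 10,000 rows each keeps about 15,000 rows in total. The exact number varies with the seeds, which is the trade-off for not having a global LIMIT.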

hadoop hive hortonworks-data-platform apache-spark

3 votes · 1 answer · 6918 views

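Applying a function to an unlimited number of arguments

I want to write a function that accepts an unlimited number of arguments, with this syntax:

myfunc arg1 arg2 arg3 ....

I tried something with currying, but nothing helped. I tried making it recursive, but then the Scala compiler says "recursive method needs result type". The recursive attempt: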

def func(x:Int) = {
  doSomething(x); myVal:Int=>func(myVal)
}

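The compiler needs the result type spelled out. The type "a function that always accepts one more Int" is recursive, so it has to be given a name. Below is a minimal sketch. It assumes the goal is a call like `func(1)(2)(3)`, the closest Scala form, since Scala has no juxtaposition syntax like `myfunc arg1 arg2`. `Chain` and `ChainDemo` are hypothetical names, and `doSomething` is a stand-in that only records its argument.

```scala
// A value that can always be applied to one more Int.
// Naming this recursive type gives `func` an explicit result type.
trait Chain {
  def apply(x: Int): Chain
}

object ChainDemo {
  // Hypothetical stand-in for the question's doSomething: records each argument.
  val seen = scala.collection.mutable.ListBuffer.empty[Int]
  def doSomething(x: Int): Unit = seen += x

  def func(x: Int): Chain = {
    doSomething(x)
    new Chain { def apply(y: Int): Chain = func(y) }
  }
}
```

`ChainDemo.func(1)(2)(3)` calls `doSomething` with 1, 2 and 3 in turn. If all arguments are available at once, a varargs parameter (`def func(xs: Int*)`) is the simpler option.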

Thanks for your help

functional-programming scala

2 votes · 2 answers · 320 views

maven assembly: subfolder

I want to use the maven-assembly-plugin:

<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <executions>
        <execution>
            <id>dist</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
            <configuration>
                <descriptors>
                    <descriptor>src/main/dist.xml</descriptor>
                </descriptors>
            </configuration>
        </execution>
    </executions>
</plugin>


My descriptor file is:

<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2 http://maven.apache.org/xsd/assembly-1.1.2.xsd">
    <id>dist</id>
    <formats>
        <format>dir</format>
    </formats>
    <files>
        <file>
            <source>pom.xml</source>
            <outputDirectory>/ET</outputDirectory>
        </file>
    </files>
</assembly>


The resulting folder hierarchy is: ET -> MyProject-MySnapshot-dist -> pom.xml

I want the result to be: ET -> pom.xml

How do I configure this?

maven maven-assembly-plugin

2 votes · 1 answer · 920 views

Which part of Spark SQL parses the SQL statement and creates the execution plan?

Assume the following query:

select * from my_table


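Which part of Spark parses the SQL and creates the execution plan?

Does the Spark SQL execution engine have its own SQL parser that translates the query into its own execution model? How does it work?

I get some exceptions saying certain functions are not supported yet. Does that mean Spark parses the SQL query itself? Does every execution engine do that too?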
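For orientation: in Spark 2.x, SQL text is parsed by Spark's own ANTLR-based parser (`SparkSqlParser`). The resulting plan then goes through Catalyst's analysis, optimization and physical-planning phases. `Dataset.explain(true)` prints the parsed, analyzed and optimized logical plans and the physical plan. The toy model below only mirrors those stage boundaries in plain Scala. All type names (`LogicalPlan`, `UnresolvedTable`, `ToySql`, ...) are hypothetical and are not Spark classes. It accepts nothing but `select * from <table>`.

```scala
// Toy model of a SQL front end: parse -> analyze -> physical plan.
sealed trait LogicalPlan
final case class UnresolvedTable(name: String) extends LogicalPlan
final case class ResolvedTable(name: String, columns: Seq[String]) extends LogicalPlan
final case class ProjectAll(child: LogicalPlan) extends LogicalPlan

// A "physical" operator: scan these columns of this table.
final case class PhysicalScan(table: String, columns: Seq[String])

object ToySql {
  // Accepts only "select * from <table>", case-insensitively.
  private val SelectStar = """(?i)\s*select\s+\*\s+from\s+(\w+)\s*""".r

  // Stage 1: parsing turns text into an unresolved plan; it checks syntax only.
  def parse(sql: String): LogicalPlan = sql match {
    case SelectStar(table) => ProjectAll(UnresolvedTable(table))
    case _ => throw new IllegalArgumentException(s"cannot parse: $sql")
  }

  // Stage 2: analysis resolves table names against a catalog.
  def analyze(plan: LogicalPlan, catalog: Map[String, Seq[String]]): LogicalPlan = plan match {
    case ProjectAll(child) => ProjectAll(analyze(child, catalog))
    case UnresolvedTable(name) =>
      ResolvedTable(name, catalog.getOrElse(name, throw new NoSuchElementException(s"unknown table: $name")))
    case resolved: ResolvedTable => resolved
  }

  // Stage 3: physical planning picks a concrete operator for the resolved plan.
  def plan(logical: LogicalPlan): PhysicalScan = logical match {
    case ProjectAll(ResolvedTable(name, columns)) => PhysicalScan(name, columns)
    case other => throw new UnsupportedOperationException(s"no physical plan for $other")
  }
}
```

Errors can surface at different stages in this model. A syntax error fails in `parse`, while an unknown name only fails later, in `analyze`.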

apache-spark apache-spark-sql

2 votes · 1 answer · 2287 views

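maven scalatest plugin encoding

I tried the scalatest-maven-plugin and it runs fine. The problem is the encoding: the results are not displayed correctly. I tried running it from Eclipse and from cmd, with the same result.

[Image: screenshot of the test output]

Here is the pom: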

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>jp.mwsoft.sample</groupId>
<artifactId>java-scala-test</artifactId>
<version>0.0.1-SNAPSHOT</version>

<properties>
    <scala-version>2.9.2</scala-version>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-compiler</artifactId>
        <version>${scala-version}</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.10</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.scalatest</groupId>
        <!-- 
        <artifactId>scalatest_${scala-version}</artifactId>
         -->
         <artifactId>scalatest_2.9.0</artifactId>
        <version>2.0.M5</version>
        <scope>test</scope>
    </dependency>
</dependencies>

<build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.6</source>
                <target>1.6</target>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.scala-tools</groupId>
            <artifactId>maven-scala-plugin</artifactId>
            <executions>
                <execution>
                    <id>test-compile</id>
                    <goals>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
            <version>2.7</version>
            <configuration>
                <skipTests>true</skipTests>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.scalatest</groupId>
            <artifactId>scalatest-maven-plugin</artifactId>
            <version>1.0-M2</version>
            <configuration>
                <argLine></argLine>
            </configuration>
            <executions>
                <execution>
                    <goals> …

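One way to narrow this down, offered as an assumption rather than a known fix, is to print the charset the test JVM actually runs with. If it is not UTF-8, the empty `<argLine></argLine>` in the pom is where JVM options such as `-Dfile.encoding=UTF-8` would go. The console itself, e.g. the cmd code page, can also mis-render UTF-8 output. `EncodingCheck` is a hypothetical helper.

```scala
import java.nio.charset.Charset

// Reports the charset settings of the JVM this code runs in.
// Calling report() from inside a test shows what the test JVM uses.
object EncodingCheck {
  def report(): String = {
    val defaultName = Charset.defaultCharset().name()
    val fileEncoding = Option(System.getProperty("file.encoding")).getOrElse("<unset>")
    s"defaultCharset=$defaultName, file.encoding=$fileEncoding"
  }

  def main(args: Array[String]): Unit = println(report())
}
```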

eclipse scala maven scalatest

0 votes · 1 answer · 303 views

Scala pattern matching with mixins

I want to extend pattern matching using the mixin technique, for example:

trait Base {
  // `match` is a reserved word in Scala, so the method is called `handle`
  def handle(x: Any): String
}

trait HandleAString {
  def handle(x: Any): String = x match {
     case "A" => "matched A string"
  }
}

trait HandleOneInt {
  def handle(x: Any): String = x match {
     case x: Int if x == 1 => "matched 1 int"
  }
}


// main
val handler = new Base with HandleOneInt with HandleAString
println(handler.handle("A")) // should print "matched A string"
println(handler.handle(1))   // should print "matched 1 int"
println(handler.handle(2))   // should throw an exception


If you know any other technique, I'd like to hear it.
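The following sketch uses one common technique, the stackable-trait pattern. The pattern is a general Scala idiom and is not taken from the question. Each trait overrides `handle`, tries its own case, and delegates everything else to `super.handle`. By linearization, `new Base with HandleOneInt with HandleAString` tries `HandleAString` first, then `HandleOneInt`, then `Base`, whose default throws `MatchError`. Because `match` is a reserved word, the method is named `handle` here.

```scala
// End of the chain: reached only when no mixed-in trait matched.
trait Base {
  def handle(x: Any): String = throw new MatchError(x)
}

// Each trait handles its own case and delegates everything else to super.
trait HandleAString extends Base {
  override def handle(x: Any): String = x match {
    case "A" => "matched A string"
    case other => super.handle(other)
  }
}

trait HandleOneInt extends Base {
  override def handle(x: Any): String = x match {
    case 1 => "matched 1 int"
    case other => super.handle(other)
  }
}

// Linearization: HandleAString runs first, then HandleOneInt, then Base.
val handler = new Base with HandleOneInt with HandleAString
```

Another option is to keep each case as a `PartialFunction[Any, String]` and combine them with `orElse`.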

0 votes · 1 answer · 256 views

Tag statistics

apache-spark ×4

hive ×3

hortonworks-data-platform ×2

apache-spark-sql ×1

functional-programming ×1

hadoop-yarn ×1

hortonworks-sandbox ×1

maven-assembly-plugin ×1

playframework ×1

typesafe-config ×1