I am trying to understand Spark physical plans, but I don't understand some parts because they look different from those of a traditional RDBMS. For example, the plan below is for a query against a Hive table. The query is this:
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= '1998-09-16'
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
== Physical Plan ==
Sort [l_returnflag#35 ASC,l_linestatus#36 ASC], true, 0
+- ConvertToUnsafe
   +- Exchange rangepartitioning(l_returnflag#35 ASC,l_linestatus#36 ASC,200), None
      +- ConvertToSafe
         +- TungstenAggregate(key=[l_returnflag#35,l_linestatus#36], functions=[(sum(l_quantity#31),mode=Final,isDistinct=false),(sum(l_extendedprice#32),mode=Final,isDistinct=false),(sum((l_extendedprice#32 * (1.0 - …
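For context, the plan reads bottom-up: `TungstenAggregate` computes the grouped aggregates, `Exchange rangepartitioning` redistributes rows by the sort keys, and `Sort` produces the final order. A minimal sketch of how such a plan can be printed, assuming Spark 1.6 with an existing `SparkContext` named `sc`:

```scala
// Sketch, assuming Spark 1.6 and an existing SparkContext `sc`.
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// explain(true) prints the parsed, analyzed and optimized logical plans
// followed by the physical plan, like the one shown above.
hiveContext.sql("""
  select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty
  from lineitem
  where l_shipdate <= '1998-09-16'
  group by l_returnflag, l_linestatus
  order by l_returnflag, l_linestatus
""").explain(true)
```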
When I start Spark, I get the following warnings:
Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/04/03 15:07:31 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/03 15:07:31 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/03 15:07:39 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0 …
I installed this Spark version: spark-1.6.1-bin-hadoop2.6.tgz.
Now, when I start Spark with the ./spark-shell command, I get this problem (it prints many error lines, so I only include the ones that look important):
Cleanup action completed
16/03/27 00:19:35 ERROR Schema: Failed initialising database.
Failed to create database 'metastore_db', see the next exception for details.
org.datanucleus.exceptions.NucleusDataStoreException: Failed to create database 'metastore_db', see the next exception for details.
at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:516)
Caused by: java.sql.SQLException: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
... 128 more
Caused by: ERROR XBM0H: Directory /usr/local/spark-1.6.1-bin-hadoop2.6/bin/metastore_db cannot be created.
Nested Throwables StackTrace:
java.sql.SQLException: Failed to create database 'metastore_db', see the next exception for details.
org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
... 128 …
I execute this query with Spark using HiveQL:
var hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
result = hiveContext.sql("select linestatus, sum(quantity) as sum_qty,count(*) as count_order from lineitem
where shipdate <= '1990-09-16' group by linestatus order by
linestatus")
But I get this error:
<console>:1: error: unclosed character literal
where shipdate <= '1990-09-16' group by linestatus order by
Do you know why?
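A likely cause, for reference: the Scala REPL parses input line by line, and a double-quoted string literal cannot span multiple lines; once the string is broken, `'1990-09-16'` is read as a (malformed) character literal, which matches the error message. A sketch of a fix using a triple-quoted string, which may span lines:

```scala
// Triple-quoted strings may span multiple lines, so the query
// can keep its formatting inside the REPL:
val result = hiveContext.sql("""
  select linestatus, sum(quantity) as sum_qty, count(*) as count_order
  from lineitem
  where shipdate <= '1990-09-16'
  group by linestatus
  order by linestatus
""")
```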
I am trying to create a Hive ORC table from a file stored in HDFS.
I have a "partsupp.tbl" file in which each line has the following format:
1|25002|8076|993.49|ven ideas. quickly even packages print. pending multipliers must have to are fluff|
I created a Hive table like this:
create table if not exists partsupp (PS_PARTKEY BIGINT,
PS_SUPPKEY BIGINT,
PS_AVAILQTY INT,
PS_SUPPLYCOST DOUBLE,
PS_COMMENT STRING)
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY")
;
Now I am trying to load the data from the .tbl file into the table, like this:
LOAD DATA LOCAL INPATH '/tables/partsupp/partsupp.tbl' INTO TABLE partsupp
But I run into this problem:
No files matching path file:/tables/partsupp/partsupp.tbl
But the file does exist in HDFS...
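Two things are likely going on here. `LOAD DATA LOCAL INPATH` resolves the path on the local filesystem, so for a file that lives in HDFS the `LOCAL` keyword must be dropped. Also, `LOAD DATA` moves files as-is without converting formats, so a '|'-delimited text file cannot be loaded directly into an ORC table; it is usually staged in a text table first and then rewritten. A sketch under those assumptions, using the Spark `HiveContext` from earlier (the staging table name `partsupp_text` is hypothetical):

```scala
// Sketch, assuming an existing HiveContext `hiveContext` and that
// /tables/partsupp/partsupp.tbl lives in HDFS. `partsupp_text` is a
// hypothetical staging table name.

// Stage the '|'-delimited file in a plain text table first, because
// LOAD DATA does not convert the file to ORC.
hiveContext.sql("""
  create table if not exists partsupp_text (
    PS_PARTKEY BIGINT, PS_SUPPKEY BIGINT, PS_AVAILQTY INT,
    PS_SUPPLYCOST DOUBLE, PS_COMMENT STRING)
  row format delimited fields terminated by '|'
  stored as textfile
""")

// No LOCAL keyword: the path is resolved in HDFS, not on local disk.
hiveContext.sql(
  "LOAD DATA INPATH '/tables/partsupp/partsupp.tbl' INTO TABLE partsupp_text")

// Rewrite the staged rows into the ORC table.
hiveContext.sql("insert into table partsupp select * from partsupp_text")
```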