Abe*_*Abe 6 apache-spark apache-spark-sql pyspark
I have a SparkSQL connection to an external database:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .getOrCreate()
If I know the name of a table, it is easy to query:
users_df = spark \
    .read.format("jdbc") \
    .options(dbtable="users", **db_config) \
    .load()
But is there a good way to list or discover the tables?
I would like the equivalent of SHOW TABLES in mysql or \dt in postgres.
I am using pyspark v2.1, in case that matters.
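The answer to this question is not actually Spark-specific. You just need to load information_schema.tables.
The information schema consists of a set of views that contain information about the objects defined in the current database. The information schema is defined in the SQL standard and can therefore be expected to be portable and to remain stable, unlike the system catalogs, which are specific to the RDBMS and are modeled after implementation concerns.
I will use MySQL for my code snippet. It hosts an enwiki database whose tables I want to list: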
# Read the information schema's table listing over JDBC.
(spark.read.format('jdbc')
    .options(
        url='jdbc:mysql://localhost:3306/',  # database URL (local or remote)
        dbtable='information_schema.tables',
        user='root',
        password='root',
        driver='com.mysql.jdbc.Driver')
    .load()
    .filter("table_schema = 'enwiki'")  # keep only tables in the enwiki database
    .show())
# +-------------+------------+----------+----------+------+-------+----------+----------+--------------+-----------+---------------+------------+----------+--------------+--------------------+-----------+----------+---------------+--------+--------------+-------------+
# |TABLE_CATALOG|TABLE_SCHEMA|TABLE_NAME|TABLE_TYPE|ENGINE|VERSION|ROW_FORMAT|TABLE_ROWS|AVG_ROW_LENGTH|DATA_LENGTH|MAX_DATA_LENGTH|INDEX_LENGTH| DATA_FREE|AUTO_INCREMENT| CREATE_TIME|UPDATE_TIME|CHECK_TIME|TABLE_COLLATION|CHECKSUM|CREATE_OPTIONS|TABLE_COMMENT|
# +-------------+------------+----------+----------+------+-------+----------+----------+--------------+-----------+---------------+------------+----------+--------------+--------------------+-----------+----------+---------------+--------+--------------+-------------+
# | def| enwiki| page|BASE TABLE|InnoDB| 10| Compact| 7155190| 115| 828375040| 0| 975601664|1965031424| 11359093|2017-01-23 08:42:...| null| null| binary| null| | |
# +-------------+------------+----------+----------+------+-------+----------+----------+--------------+-----------+---------------+------------+----------+--------------+--------------------+-----------+----------+---------------+--------+--------------+-------------+
Note: this solution also applies to Scala and Java, within the constraints of each language.
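If you only need the table names, one possible variation is to pass a subquery as the `dbtable` option so that the database returns just the matching rows. The sketch below is a minimal illustration, not a definitive implementation: `tables_subquery` and `list_tables` are hypothetical helper names, and `db_config` is assumed to be the dict from the question (url, user, password, driver). Both MySQL and PostgreSQL require an alias on a derived table, hence the `AS t`.

```python
def tables_subquery(schema):
    """Build a derived-table expression usable as the JDBC `dbtable` option.

    Hypothetical helper: the schema name is inlined into the SQL text,
    so names containing a single quote are rejected.
    """
    if "'" in schema:
        raise ValueError("schema name must not contain single quotes")
    return (
        "(SELECT table_name, table_type "
        "FROM information_schema.tables "
        "WHERE table_schema = '{0}') AS t".format(schema)
    )


def list_tables(spark, db_config, schema):
    """Return the table names found in `schema` as a list of strings.

    `spark` is an existing SparkSession; `db_config` is assumed to hold
    the JDBC url, user, password and driver, as in the question.
    """
    rows = (
        spark.read.format("jdbc")
        .options(dbtable=tables_subquery(schema), **db_config)
        .load()
        .select("table_name")
        .collect()
    )
    # Index by position so the result does not depend on column-name casing.
    return [row[0] for row in rows]
```

With the setup from the answer, `list_tables(spark, db_config, "enwiki")` would be the call for the enwiki database. For PostgreSQL the schema is typically `public`, which gives roughly what `\dt` shows.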