What is the fastest way to generate .csv files from Oracle database tables?

Jor*_*res 6 csv oracle

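I am trying to migrate some Oracle DB tables to the cloud (Snowflake), and I would like to know the best way to create .csv files from the tables.

I have about 200 tables, and some of them have more than 30M records. I want to export the data in bulk.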

ken*_*sai 4

My scenario was to quickly get CSV exports of a 300 GB Oracle database and store them in S3 for Spark/Hive analysis. Spool is very slow, and SQL Developer is very slow. So what next?


https://github.com/hyee/OpenCSV


It is very fast. Here is an example of how to use it; you will need to register the Oracle JDBC driver jar:

```java
package com.company;

import com.opencsv.CSVWriter;
import com.opencsv.ResultSetHelperService;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Main {

    public static void main(String[] args) throws Exception {
        // Step 1: load the driver class
        Class.forName("oracle.jdbc.driver.OracleDriver");

        String fileName = "C:\\Temp\\output.csv";
        boolean async = true;

        // Steps 2-4: open the connection, create the statement, run the query.
        // try-with-resources closes the writer, result set, statement and
        // connection once the export finishes.
        // Service-name URL syntax; for a SID use jdbc:oracle:thin:@host:port:SID
        try (Connection con = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//host:port/service_name",
                     "ora_user", "password");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("select c1, c2, c3 from my_table");
             CSVWriter writer = new CSVWriter(fileName)) {

            // Fetch size (default 30000 rows); higher is faster but takes more memory
            ResultSetHelperService.RESULT_FETCH_SIZE = 50000;
            // Maximum rows to extract; -1 means unlimited
            ResultSetHelperService.MAX_FETCH_ROWS = -1;
            writer.setAsyncMode(async);
            int result = writer.writeAll(rs, true);
            System.out.println("Result: " + (result - 1));
        }
    }

    // Extracts a ResultSet to a CSV file; auto-compresses if the fileName
    // extension is ".zip" or ".gz". Returns the number of records extracted.
    // The header parameter is not used in this version.
    public static int ResultSet2CSV(final ResultSet rs, final String fileName,
                                    final String header, final boolean async) throws Exception {
        try (CSVWriter writer = new CSVWriter(fileName)) {
            // Fetch size (default 30000 rows); higher is faster but takes more memory
            ResultSetHelperService.RESULT_FETCH_SIZE = 10000;
            // Maximum rows to extract; -1 means unlimited (this caps the export at 20000 rows)
            ResultSetHelperService.MAX_FETCH_ROWS = 20000;
            writer.setAsyncMode(async);
            int result = writer.writeAll(rs, true);
            return result - 1;
        }
    }
}
```

Another fast solution, though I still think it is slower than the one above, is to use Spark directly:

```python
# Assumes an existing SparkSession named `spark` with the Oracle JDBC driver jar available
query = "(select empno, ename, dname from emp, dept where emp.deptno = dept.deptno) emp"
empDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:oracle:thin:@//hostname:portnumber/service_name") \
    .option("dbtable", query) \
    .option("user", "db_user_name") \
    .option("password", "password") \
    .option("driver", "oracle.jdbc.driver.OracleDriver") \
    .load()
empDF.printSchema()
empDF.show()

# Write to S3; the format can be "orc", "parquet" or "csv" (gzip-compressed CSV here)
empDF.write.format("csv").option("compression", "gzip").save("s3://bucketname/key/")
```

Of course, you can repartition the data and apply other optimizations.
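One such optimization, as a rough sketch: Spark's JDBC source can split a read into parallel range queries when given `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions`, and `fetchsize` controls how many rows come back per round trip. The helper names `jdbc_partition_options` and `export_table`, the default values, and the choice of gzip-compressed CSV are assumptions for illustration and are not part of the original answer. The partition column is assumed to be a numeric key.

```python
def jdbc_partition_options(url, table, user, password,
                           partition_column, lower_bound, upper_bound,
                           num_partitions=8, fetch_size=10000):
    """Build Spark JDBC reader options for a parallel, range-partitioned read.

    Hypothetical helper: the option keys are Spark's JDBC data source options,
    but the function itself and its defaults are illustrative only.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    if lower_bound >= upper_bound:
        # Sanity check for this sketch: an empty range leaves nothing to split
        raise ValueError("lower_bound must be < upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.driver.OracleDriver",
        # Spark issues one range query per partition on this column
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        # Rows fetched per round trip to the database
        "fetchsize": str(fetch_size),
    }


def export_table(spark, options, target_path):
    """Read one table over JDBC in parallel and write it as gzip-compressed CSV.

    `spark` is an existing SparkSession supplied by the caller.
    """
    df = spark.read.format("jdbc").options(**options).load()
    (df.write
       .mode("overwrite")
       .option("header", "true")
       .option("compression", "gzip")
       .csv(target_path))
```

With a live session, a call might look like `export_table(spark, jdbc_partition_options(url, "emp", "db_user_name", "password", "empno", 1, 30000000), "s3://bucketname/key/emp/")`, repeated in a loop over the table names. The bounds only decide how the range is split across partitions; rows outside them are still read, so rough minimum and maximum key values are enough.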
