Dea*_*ark 5 java oracle etl oracle12c
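One of our ETL applications has run into a strange problem. In effect, the process opens a cursor to extract data from one database, performs some transformations, and then inserts into another database using batch inserts.
For all tables in the ETL, our commit interval is set to 1000 rows. So after reading each chunk of 1k rows and performing the transformations, we do a single batch insert into the target database (using Java, Spring Batch, OJDBC7 v12.1.0.2).
However, some tables are painfully slow. We first made sure FKs were turned off (they were). Then we checked that triggers were disabled (they are). We added logging to capture the number of rows in each batch insert (it is 1000, except for each thread's final insert).
Finally, querying v$sql, we see close to 1000 rows per execution for some tables, which is what we expect. For the painful tables, however, it often hovers close to one! We would expect most tables to average around 900, because a thread's final commit may not have a full 1k rows, but the abnormally low rows per execution on some tables is a real head-scratcher.
Some wide tables (100+ columns) have the problem, but others are fine. Some heavily partitioned tables (100+ partitions) are slow, but others are fine. So I'm baffled. Has anyone seen this before? I'm out of ideas!
Thanks!
Here is what we see in v$sql (table names obfuscated):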
SELECT *
FROM
(SELECT REGEXP_SUBSTR (sql_text, 'Insert into [^\(]*') sql_text,
sql_id,
TRUNC(
CASE
WHEN SUM (executions) > 0
THEN SUM (rows_processed) / SUM (executions)
END,2) rows_per_execution
FROM v$sql
WHERE parsing_schema_name = 'PFT010_RAPP_PRO'
AND sql_text LIKE 'Insert into%'
GROUP BY sql_text,
sql_id
)
ORDER BY rows_per_execution ASC;
SQL_STATEMENT SQL_ID ROWS_PER_EXECUTION
---------------------------------------------------------------------------------
Insert into C__PFT010.S_T___V_R_L_A_ agwu1dd1wr2ux 1.04
Insert into C__PFT010.S_T___G_L_A___T_ 7ymw7jtdd9g53 1.25
Insert into C__PFT010.S_T___F_L_A_ 7cynt9fmtpz83 1.44
Insert into C__PFT010.S_T___Q_L_A___A_ 27v3fuj028cy6 1.57
Insert into C__PFT010.S_T___E_R_P_Y_A_P_S_A_ 2t544j11a286z 1.80
Insert into C__PFT010.S_T___I_S_R_ anu8aac070sut 1.84
Insert into C__PFT010.S_T___R_C_R___T_T_ 0ydz33s6guvcn 2.05
Insert into C__PFT010.S_T_R___D_R_P_Y_A_P_ 7y76r10dmzqvh 2.14
Insert into C__PFT010.S_T___S_L_A___Y_T_S_S_ d7136fg9w033w 2.25
Insert into C__PFT010.S_T___R_C_R___T_T_ 2pswt3cmp48s4 2.31
Insert into C__PFT010.S_T___F_R_P_Y_A_P_S_P_ 170c7v23yyrms 2.46
Insert into C__PFT010.C_M_N_C___R_S_ fw3wbt4p08kx4 2.66
Insert into C__PFT010.T_A_H_N_T___E_A_Y_ dk5rwm58qqy8b 2.68
Insert into C__PFT010.O_G_L_A___N_O_ gtd4azc32gku4 3.05
Insert into C__PFT010.N_L_S_D___I_B_S_G_ a1a01vthwf2yk 3.15
Insert into C__PFT010.S_T___Q_L_A___A_ 7ac6dqwb1jfyh 3.56
Insert into C__PFT010.S_T___J_P_M___A_A_ 8n5z68bgkuan1 3.88
Insert into C__PFT010.S_T_R___F_R_P_Y_A_P_S_P 1r62s9qgjucy8 4.25
Insert into C__PFT010.L_A___W_E_S_I_ 19rxcmgvct74c 4.28
Insert into C__PFT010.C___U___T_D_T_P_ fdzfdbpdzd18c 4.40
Insert into C__PFT010.S_T_R___U_T_A_S_E_ gs6z5szk9x1n2 4.61
Insert into C__PFT010.S_T_R___H_S_B_I_Y_L_S_ 0zsz69pa3ahga 6.58
Insert into C__PFT010.C___F___U_R_P_T_ 13xgutdszxab1 8.00
Insert into C__PFT010.S_T_R___J_P_M___A_A_ 355gqx1sspdr0 20.19
Insert into C__PFT010.C___D___O___V_ 4dmu2bqrra0fg 22.40
Insert into C__PFT010.S_T_R___Q_L_A___A_ dsx0nsrxkz5cf 36.14
Insert into C__PFT010.S_T___V_R_L_A___E_R_ 2urs0mbjn3nm2 126.96
Insert into C__PFT010.S_S_C_S___E_A_L_S_G_ awq4fzkk3rsww 179.48
Insert into C__PFT010.S_S_D_S___C_I_I_Y_S_G_ 7hpw0kv2z5nsh 417.87
Insert into C__PFT010.S_T_R___D_P_S___M_I_ cjgdmgfznapdk 502.36
Insert into C__PFT010.C___F___E_ 6hv4smzmm4hx8 531.00
Insert into C__PFT010.N_L_S_E_R___R_ 61zu9j25kgn2u 533.50
Insert into C__PFT010.S_T___B_P_S___A_T_R_ 31xpaj7afk054 714.94
Insert into C__PFT010.S_T_R___C_L_A___O_G_V_ dx4mna12hdh9c 749.66
Insert into C__PFT010.S_T___C_P_S___D_R_S_ b7z4y1mruk714 784.56
Insert into C__PFT010.S_T___S_L_A___Y_T_S_S_ 29qbqkzhmt83h 792.63
Insert into C__PFT010.A_H_C_R_T_ c6kmyt3a410ch 801.67
Insert into C__PFT010.S_T___X_P_S___H_N_ g6cbtus4bccm8 826.19
Insert into C__PFT010.S_T___K_R_B_T___T_T_ 0xps4ddmw322h 873.36
Insert into C__PFT010.C___O___C_L___M_ fz91ju8jw22yc 928.90
Insert into C__PFT010.S_T___H_L_A___T_T_ 44rh8722j51fm 982.16
Insert into C__PFT010.C___C_L_S_C_R_T_ 4vpnstj8qxy80 991.75
Insert into C__PFT010.S_T___P_L_A___E_U_D_ fgunfbpddf2af 994.50
Insert into C__PFT010.S_T___A_S___I___O_S_ 0d0x5ymp2y248 996.09
Insert into C__PFT010.S_T___K_R_B_T___T_T_ 61rmgzvqrbudh 999.25
Insert into C__PFT010.S_T___D_P_S___M_I_ bu3hc03yugc8h 999.88
Insert into C__PFT010.L___R_E___E_L_R_C_P_2_00 bvrxzq2v3npc6 999.91
Insert into C__PFT010.N_L_S_G_A_T_S_N___R_C 7sj2ydm7m2z6u 999.96
Insert into C__PFT010.S_T___V_R_L_A___E_E_E 8n6nbsjfpvu70 999.98
Insert into C__PFT010.S_T___L_I_T_B_N_F_T 5b89j9um2jkuu 999.98
Insert into C__PFT010.S_T___D_P_S___M_I_ 906jnw4jarsxk 999.98
Insert into C__PFT010.S_T___T_E_R_M_T 9a8vnhnbp5jpn 1000.00
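Update: the data is a bit stale at this point (all of the fast threads have finished), but here are some SQL IDs with their execution counts and rows per execution. All of these tables have (or will have) tens of millions of rows.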
SQL ID Executions Rows/Execution
agwu1dd1wr2ux 118043 1.04
anu8aac070sut 194768 1.84
dr8qxkcx1xybj 11635084 1.85
a37vqfjqcyd3j 4939754 2.36
8n5z68bgkuan1 2642091 3.95
4sps6y4bkkr6p 268739 13.77
5tdhpn96vpz6d 240227 166.85
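To put those numbers in perspective, the short sketch below multiplies an execution count by its rows per execution to estimate the rows written, then computes how many executions the same rows would need if every batch really carried 1000 rows. The class and method names are made up for illustration, and the result is only as precise as the rounded averages in the table.

```java
class BatchMath {
    // Approximate total rows from an execution count and an average rows/execution.
    static long totalRows(long executions, double rowsPerExecution) {
        return Math.round(executions * rowsPerExecution);
    }

    // Executions needed if every batch carried batchSize rows (ceiling division).
    static long idealExecutions(long totalRows, int batchSize) {
        return (totalRows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // Figures for sql_id dr8qxkcx1xybj from the table above.
        long rows = totalRows(11_635_084L, 1.85);
        System.out.println("approx rows written: " + rows);
        System.out.println("executions at 1000 rows each: " + idealExecutions(rows, 1000));
    }
}
```

For that statement, roughly 21.5 million rows took about 11.6 million executions where a little over 21 thousand would have been enough.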
Additional SQL trace data:
Here is an insert into a table that behaves well:
PARSING IN CURSOR #139935830739792 len=315 dep=0 uid=845 oct=2 lid=845 tim=2116001679604 hv=581690290 ad='c168de130' sqlid='906jnw4jarsxk'
Insert into ___PFT010.S_T__A__P_S__E_A_L
(A_A_D_ID, CREATE_DTM, DOC_TXN_ID, EFF_DT, EFF_END_DT, EFF_START_DT, EXTRACT_DT, G_O__A_D__F_G, MAINT_DTM, MAINT_USERID, P_R_O__E_A_L, P_R_O__R_L_, R_C_RD_T_P, S__ID, S_R_I_E__ID )
values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 )
END OF STMT
PARSE #139935830739792:c=0,e=25,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=2116001679603
WAIT #139935830739792: nam='SQL*Net more data from client' ela= 72 driver id=675562835 #bytes=3 p3=0 obj#=-1 tim=2116001679871
WAIT #139935830739792: nam='db file sequential read' ela= 551 file#=99 block#=78343664 blocks=1 obj#=1255124 tim=2116001680643
* * * * * * * * * * * * * * * * * *
* * * a bunch more of these
* * * * * * * * * * * * * * * * * *
WAIT #139935830739792: nam='db file sequential read' ela= 750 file#=99 block#=66416561 blocks=1 obj#=1255124 tim=2116001788121
WAIT #139935830739792: nam='db file sequential read' ela= 176 file#=99 block#=45513746 blocks=1 obj#=1255124 tim=2116001787117
WAIT #139935830739792: nam='db file sequential read' ela= 750 file#=99 block#=66416561 blocks=1 obj#=1255124 tim=2116001788121
* * * * * * * * * * * * * * * * * *
* * * r=1000, indicating 1000 rows were written
* * * * * * * * * * * * * * * * * *
EXEC #139935830739792:c=57991,e=109295,p=131,cr=69,cu=3313,mis=0,r=1000,dep=0,og=1,plh=0,tim=2116001788944
STAT #139935830739792 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL SAT1_AD_PRSN_EMAIL (cr=69 pr=131 pw=0 time=109260 us)'
XCTEND rlbk=0, rd_only=0, tim=2116001789025
CLOSE #139935830739792:c=0,e=12,dep=0,type=1,tim=2116016169474
Here is one of the troublesome ones. This time, the execution only got 1 row:
PARSING IN CURSOR #139935830737584 len=520 dep=0 uid=845 oct=2 lid=845 tim=2116016176184 hv=1904916192 ad='97e96dc98' sqlid='355gqx1sspdr0'
Insert into ___PFT010.S_TE_R_BJ_P_M__D_T_
(A_A_D_ID, CREATE_DTM, DOC_TXN_ID, EFF_END_DT, EFF_START_DT, ERR_CD, ERR_FIELD, EXTRACT_DT, MAINT_USERID, P_M__A_R_I_T_A_T, P_M__A_T, P_M__C_P_I_T_A_T, P_M__C_T_H_P_AMT, P_M__E_F_DT, P_M__N_G_A_R__A_T, P_M__N_N_C_P_I_T_A_T, P_M__O_T_F_E_A_T, P_M__P_I_B_L_A_T, P_M__T_P, R_C_RD_T_P, S__ID, S_R_I_E__ID, T_A_S_I__D_, Z_R__P_M__I_D )
values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 )
END OF STMT
PARSE #139935830737584:c=0,e=62,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=2116016176183
PARSE #139935830738688:c=0,e=14,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016176703
EXEC #139935830738688:c=0,e=49,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016176780
FETCH #139935830738688:c=0,e=38,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016176837
CLOSE #139935830738688:c=0,e=4,dep=1,type=3,tim=2116016176862
* * * * * * * * * * * * * * * * * *
* * * r=1, indicating only 1 row affected by execution
* * * * * * * * * * * * * * * * * *
EXEC #139935830737584:c=999,e=1065,p=0,cr=4,cu=5,mis=1,r=1,dep=0,og=1,plh=0,tim=2116016177301
STAT #139935830737584 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL SATERR_BJ_PYMT_DATA (cr=1 pr=0 pw=0 time=50 us)'
XCTEND rlbk=0, rd_only=0, tim=2116016177362
WAIT #139935830737584: nam='log file sync' ela= 396 buffer#=92400 sync scn=2454467328 p3=0 obj#=-1 tim=2116016177846
WAIT #139935830737584: nam='SQL*Net message to client' ela= 0 driver id=675562835 #bytes=1 p3=0 obj#=-1 tim=2116016177877
WAIT #139935830737584: nam='SQL*Net message from client' ela= 1045 driver id=675562835 #bytes=1 p3=0 obj#=-1 tim=2116016178938
CLOSE #139935830737584:c=0,e=4,dep=0,type=0,tim=2116016178981
Here is the same table with 34 rows instead of 1. The inconsistency is what bothers me most:
PARSING IN CURSOR #139935830737584 len=520 dep=0 uid=845 oct=2 lid=845 tim=2116016169849 hv=1904916192 ad='97e96dc98' sqlid='355gqx1sspdr0'
Insert into ___PFT010.S_TE_R_BJ_P_M__D_T_
(A_A_D_ID, CREATE_DTM, DOC_TXN_ID, EFF_END_DT, EFF_START_DT, ERR_CD, ERR_FIELD, EXTRACT_DT, MAINT_USERID, P_M__A_R_I_T_A_T, P_M__A_T, P_M__C_P_I_T_A_T, P_M__C_T_H_P_AMT, P_M__E_F_DT, P_M__N_G_A_R__A_T, P_M__N_N_C_P_I_T_A_T, P_M__O_T_F_E_A_T, P_M__P_I_B_L_A_T, P_M__T_P, R_C_RD_T_P, S__ID, S_R_I_E__ID, T_A_S_I__D_, Z_R__P_M__I_D )
values (:1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :21 , :22 , :23 , :24 )
END OF STMT
PARSE #139935830737584:c=0,e=326,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=0,tim=2116016169848
PARSE #139935830738688:c=0,e=19,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016170242
EXEC #139935830738688:c=0,e=59,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016170329
FETCH #139935830738688:c=0,e=44,p=0,cr=3,cu=0,mis=0,r=0,dep=1,og=4,plh=140787661,tim=2116016170393
CLOSE #139935830738688:c=0,e=3,dep=1,type=3,tim=2116016170421
* * * * * * * * * * * * * * * * * *
* * * r=34, indicating only 34 row affected by execution. WHAT IS HAPPENING?!?!
* * * * * * * * * * * * * * * * * *
EXEC #139935830737584:c=5000,e=4592,p=0,cr=11,cu=48,mis=1,r=34,dep=0,og=1,plh=0,tim=2116016174513
STAT #139935830737584 id=1 cnt=0 pid=0 pos=1 obj=0 op='LOAD TABLE CONVENTIONAL SATERR_BJ_PYMT_DATA (cr=8 pr=0 pw=0 time=3648 us)'
XCTEND rlbk=0, rd_only=0, tim=2116016174622
WAIT #139935830737584: nam='log file sync' ela= 684 buffer#=92313 sync scn=2454467326 p3=0 obj#=-1 tim=2116016175452
WAIT #139935830737584: nam='SQL*Net message to client' ela= 1 driver id=675562835 #bytes=1 p3=0 obj#=-1 tim=2116016175551
WAIT #139935830737584: nam='SQL*Net message from client' ela= 481 driver id=675562835 #bytes=1 p3=0 obj#=-1 tim=2116016176058
CLOSE #139935830737584:c=0,e=6,dep=0,type=0,tim=2116016176107
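Well, this was an interesting problem, and unfortunately this answer only solves 99% of it...
First, by looking at the bind variables, we determined that the types of the parameters we were binding were flipping. Every time that happened, the statements batched so far were executed and a new statement was parsed (even though we issued only a single executeBatch() call from our PreparedStatement). So we ended up seeing this in the trace log: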
Row # Bind :1 Bind :2 Bind :3 Bind :4 Bind :5
----- -------------- -------------- -------------- -------------- --------------
--parse--
1 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
2 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
3 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
4 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
5 VARCHAR2(128) VARCHAR2(32) TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
6 VARCHAR2(128) NUMBER TIMESTAMP VARCHAR2(32) VARCHAR2(2000)
--execute & parse--
7 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
8 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
9 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
10 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
11 VARCHAR2(2000) NUMBER VARCHAR2(32) CLOB VARCHAR2(2000)
--execute & parse--
12 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
13 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute--
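The effect in the table above can be reproduced with a small simulation. This is only a model of the observed behaviour, not driver code: it assumes the batch is flushed whenever a row's bind-type signature differs from the previous row's, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BindFlipSimulation {
    // Returns the number of rows carried by each execution when the pending
    // batch is flushed every time the bind-type signature changes.
    static List<Integer> rowsPerExecution(List<String> signatures) {
        List<Integer> result = new ArrayList<>();
        String previous = null;
        int count = 0;
        for (String signature : signatures) {
            if (previous != null && !signature.equals(previous)) {
                result.add(count); // signature flipped: execute what we have
                count = 0;
            }
            previous = signature;
            count++;
        }
        if (count > 0) {
            result.add(count); // final execute for the remaining rows
        }
        return result;
    }

    public static void main(String[] args) {
        String base = "VARCHAR2(128)|NUMBER|TIMESTAMP|CLOB|VARCHAR2(2000)";
        String nullNumber = "VARCHAR2(128)|VARCHAR2(32)|TIMESTAMP|CLOB|VARCHAR2(2000)";
        String nullClob = "VARCHAR2(128)|NUMBER|TIMESTAMP|VARCHAR2(32)|VARCHAR2(2000)";
        String wideNullTs = "VARCHAR2(2000)|NUMBER|VARCHAR2(32)|CLOB|VARCHAR2(2000)";
        String wide = "VARCHAR2(2000)|NUMBER|TIMESTAMP|CLOB|VARCHAR2(2000)";
        // The 13 rows from the trace table, in order.
        List<String> rows = Arrays.asList(
                base, base, base, base, nullNumber, nullClob,
                base, base, base, base, wideNullTs, wide, wide);
        System.out.println(rowsPerExecution(rows));
    }
}
```

Under that assumption the 13 rows are split across six executions, which is why the rows-per-execution average collapses on tables whose columns are only sometimes null or vary in length.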
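After more digging, we determined that JDBC cannot automatically work out the data type of a null the way it can for a non-null value. When a column is consistent (always null or always populated) this is not a problem, but when the data varies it is brutal.
Since we are loading from files, we don't have the source data types, but luckily we were able to get the target data types (which should match), so we could specify the type when setting each parameter on the PreparedStatement.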
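The answer does not show how the types were specified, so the following is only a minimal sketch of one way to do it with standard JDBC calls: setNull(int, int) for nulls and setObject(int, Object, int) for values, both taking a java.sql.Types constant that would come from the target table's metadata. The TypedBinder class and the recording proxy are hypothetical helpers; the proxy exists only so the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TypedBinder {
    // Binds one value using the target column's JDBC type, so that a null
    // is declared with the same type a non-null value would have.
    static void bind(PreparedStatement ps, int index, Object value, int sqlType)
            throws SQLException {
        if (value == null) {
            ps.setNull(index, sqlType);
        } else {
            ps.setObject(index, value, sqlType);
        }
    }

    // Stand-in for a real statement: records each call instead of talking to a database.
    static PreparedStatement recording(List<String> calls) {
        return (PreparedStatement) Proxy.newProxyInstance(
                TypedBinder.class.getClassLoader(),
                new Class<?>[] {PreparedStatement.class},
                (proxy, method, args) -> {
                    calls.add(method.getName() + Arrays.toString(args));
                    return null; // only void setters are invoked in this sketch
                });
    }

    public static void main(String[] args) throws SQLException {
        List<String> calls = new ArrayList<>();
        PreparedStatement ps = recording(calls);
        bind(ps, 1, "a@example.com", Types.VARCHAR); // populated column
        bind(ps, 2, null, Types.NUMERIC);            // null, but still typed as NUMERIC
        calls.forEach(System.out::println);
    }
}
```

In a real job the sqlType for each position would be looked up once per table, for example from the target column metadata, and reused for every row. Whether the driver honours the declared type for every column type is a separate question; this sketch makes no claim about CLOB columns.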
This change brought some major improvements, but we still ended up seeing the following:
Row # Bind :1 Bind :2 Bind :3 Bind :4 Bind :5
----- -------------- -------------- -------------- -------------- --------------
--parse--
1 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
2 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
3 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
4 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
5 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
6 VARCHAR2(128) NUMBER TIMESTAMP VARCHAR2(32) VARCHAR2(2000)
--execute & parse--
7 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
8 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
9 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
10 VARCHAR2(128) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute & parse--
11 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
12 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
13 VARCHAR2(2000) NUMBER TIMESTAMP CLOB VARCHAR2(2000)
--execute--
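Definitely an improvement, but we had not fixed the CLOB, and sometimes we saw the VARCHAR2 size grow. After more research, we stumbled upon a thread about high version counts caused by bind_mismatch, which sounded promising. Tables whose data was nice and consistent ran without problems, but fields of varying length (email addresses, for example) wreaked havoc on performance. So we ran the following command to force the VARCHAR2 bind size to 4000: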
ALTER SYSTEM SET EVENTS '10503 trace name context forever, level 2001';
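The VARCHAR2 sizes in the bind tables above only ever take the values 32, 128, 2000, and 4000, which is consistent with bind lengths being rounded up to a fixed set of buckets. The sketch below models that, assuming those four thresholds (they are inferred from the traces, not stated in the answer) and treating the event level as a minimum length. The class and method names are invented for illustration.

```java
class BindBuckets {
    // Bucket sizes as they appear in the traces; assumed, not taken from documentation.
    static final int[] BUCKETS = {32, 128, 2000, 4000};

    // Rounds a bind length up to the smallest bucket that can hold it.
    static int bucketFor(int length) {
        for (int bucket : BUCKETS) {
            if (length <= bucket) {
                return bucket;
            }
        }
        throw new IllegalArgumentException("length exceeds largest bucket: " + length);
    }

    // Same, but with a floor applied first (modelling a forced minimum bind length).
    static int bucketFor(int length, int minimumLength) {
        return bucketFor(Math.max(length, minimumLength));
    }

    public static void main(String[] args) {
        // A short and a long value land in different buckets...
        System.out.println(bucketFor(20) + " vs " + bucketFor(200));
        // ...but with a floor of 2001 both fall into the same one.
        System.out.println(bucketFor(20, 2001) + " vs " + bucketFor(200, 2001));
    }
}
```

With a floor of 2001, every character bind in this model lands in the 4000 bucket, so varying string lengths no longer change the bind signature.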
After that we tried again and got the following:
Row # Bind :1 Bind :2 Bind :3 Bind :4 Bind :5
----- -------------- -------------- -------------- -------------- --------------
--parse--
1 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
2 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
3 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
4 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
5 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
--execute & parse--
6 VARCHAR2(4000) NUMBER TIMESTAMP VARCHAR2(32) VARCHAR2(4000)
--execute & parse--
7 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
8 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
9 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
10 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
11 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
12 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
13 VARCHAR2(4000) NUMBER TIMESTAMP CLOB VARCHAR2(4000)
--execute--
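Now we are almost perfect, but we could not figure out how to stop JDBC from binding a VARCHAR2 when it gets a null for a CLOB. Fortunately, we only have a few tables with nullable CLOB columns, so we improved performance significantly and reduced the impact of the changing binds. But part of me definitely wants that last 1%... Any suggestions?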