To summarize the specifics: we need to stage approximately 5 million rows into a vendor (Oracle) database. Everything goes great for batches of 500k rows using OracleBulkCopy (ODP.NET), but when we try to scale up to 5M, performance slows to a crawl once the load hits the 1M mark, gets progressively slower as more rows are loaded, and eventually times out after 3 hours or so.
I suspect it's related to a primary key on the table, but I've …
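For context, one mitigation I've been considering is taking the PK index out of the picture during the load and rebuilding it afterwards. This is a rough sketch only, not our actual script; stage_orders and pk_stage_orders are made-up placeholders for the real table and constraint names:

-- Hypothetical sketch: defer PK maintenance during the bulk load.
-- stage_orders / pk_stage_orders are placeholder names.
ALTER TABLE stage_orders DISABLE CONSTRAINT pk_stage_orders;

-- ... run the OracleBulkCopy load of all 5M rows here ...

-- Re-enabling rebuilds the unique index in a single pass instead of
-- maintaining it row by row as rows arrive:
ALTER TABLE stage_orders ENABLE CONSTRAINT pk_stage_orders;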
Suppose I have this hypothetical schema:
Source (OLTP) database:
Table Orders
------------
OrderID int IDENTITY (PK),
CustomerID int NOT NULL,
OrderAmount decimal NOT NULL
Destination (DSS) database:
Table Activity
--------------
ActivityID int IDENTITY (PK),
PersonID int NOT NULL,
Amount decimal NOT NULL
Table ActivityOrderImport
--------------------
ActivityID int NOT NULL,
SourceOrderID int NOT NULL
Table CustomerMapping
---------------------
CustomerID int NOT NULL,
PersonID int NOT NULL
Obviously, the real thing is much more complicated, with more transformations involved. But assume for the moment that all the ETL does is merge specific transactions ("orders") from an external entity into a DSS that tracks generic "activities". The link between external customers and DSS persons lives in the CustomerMapping table.
The idea behind the "import" table is to provide some kind of audit trail in case something goes wrong. We don't have much control over the source system, and we know it's a bit flaky, so being able to tell where any given activity originated is very important to us.
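For example, tracing any given activity back to its source order and customer should be a simple join; a sketch against the hypothetical schema above:

-- Where did this activity come from? (sketch; hypothetical schema)
SELECT a.ActivityID, i.SourceOrderID, cm.CustomerID
FROM Activity a
JOIN ActivityOrderImport i ON i.ActivityID = a.ActivityID
JOIN CustomerMapping cm ON cm.PersonID = a.PersonID;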
Now, there is a script that does this using DDL, which looks something like this:
ALTER TABLE Activity
ADD OrderID int NULL
MERGE …
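The script is cut off above, so to give a feel for the rest of it, here is a rough reconstruction; the MERGE condition and column lists are my guesses, not the actual code, and it assumes Orders has been staged locally (or is reachable over a database link):

-- Guessed continuation of the script above, for illustration only.
MERGE INTO Activity a
USING (
    SELECT o.OrderID, cm.PersonID, o.OrderAmount
    FROM Orders o
    JOIN CustomerMapping cm ON cm.CustomerID = o.CustomerID
) src
ON (a.OrderID = src.OrderID)
WHEN NOT MATCHED THEN
    -- ActivityID is an identity column, so it is omitted here.
    INSERT (PersonID, Amount, OrderID)
    VALUES (src.PersonID, src.OrderAmount, src.OrderID);

-- Preserve the audit trail from the scratch column, then drop it:
INSERT INTO ActivityOrderImport (ActivityID, SourceOrderID)
SELECT ActivityID, OrderID
FROM Activity
WHERE OrderID IS NOT NULL;

ALTER TABLE Activity DROP COLUMN OrderID;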