usr*_*ΛΩΝ 18 java mysql hibernate jdbc batch-processing
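I am currently facing the well-known and common Hibernate batch insert problem.
I need to save batches of 5 million rows. I am starting with a much lighter payload. Because I only have to insert entities of two types (first all records of type A, then all records of type B, all pointing to a common ManyToOne parent of type C), I want to get the most out of JDBC batch inserts.
I have read a lot of documentation, but nothing I have tried has worked.
I do not rely on an AUTO_INCREMENT ID; I set the ID with a trick: SELECT MAX(ID) FROM ENTITIES, then increment it for every record. hibernate.jdbc.batch_size has to be consistent with my application's batch size, so I set it in the LocalSessionFactoryBean (Spring ORM integration). Here are my entities.
The common parent entity. It is inserted first, in a single transaction. I don't care about the auto-increment column here. There is only one record per batch job.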
@Entity
@Table(...)
@SequenceGenerator(...)
public class Deal
{
    @Id
    @Column(
        name = "DEAL_ID",
        nullable = false)
    @GeneratedValue(
        strategy = GenerationType.AUTO)
    protected Long id;
    ................
}
One of the child types (assume 2.5M records per batch)
@Entity
@Table(
    name = "TA_LOANS")
public class Loan
{
    @Id
    @Column(
        name = "LOAN_ID",
        nullable = false)
    protected Long id;
    @ManyToOne(
        optional = false,
        targetEntity = Deal.class,
        fetch = FetchType.LAZY)
    @JoinColumn(
        name = "DEAL_ID",
        nullable = false)
    protected Deal deal;
    .............
}
The other child type. Let's say the other 2.5M records
@Entity
@Table(
    name = "TA_BONDS")
public class Bond
{
    @Id
    @Column(
        name = "BOND_ID")
    protected Long id;
    @ManyToOne(
        fetch = FetchType.LAZY,
        optional = false,
        targetEntity = Deal.class)
    @JoinColumn(
        name = "DEAL_ID",
        nullable = false,
        updatable = false)
    protected Deal deal;
}
Simplified code that inserts the records
long loanIdCounter = loanDao.getMaxId(), bondIdCounter = bondDao.getMaxId(); //Perform SELECT MAX(ID)
Deal deal = null;
List<Bond> bondList = new ArrayList<Bond>(COMMIT_BATCH_SIZE); //500 constant value
List<Loan> loanList = new ArrayList<Loan>(COMMIT_BATCH_SIZE);
for (String msg: inputStreamReader)
{
    log.debug(msg.toString());
    if (this is a deal)
    {
        Deal parsedDeal = parseDeal(msg.getMessage());
        deal = dealManager.persist(parsedDeal); //Called in a separate transaction using Spring annotation @Transactional(propagation = Propagation.REQUIRES_NEW)
    }
    else if (this is a loan)
    {
        Loan loan = parseLoan(msg.getMessage());
        loan.setId(++loanIdCounter);
        loan.setDeal(deal);
        loanList.add(loan);
        if (loanList.size() == COMMIT_BATCH_SIZE)
        {
            loanManager.bulkInsert(loanList); //Perform a bulk insert in a single transaction, not annotated but handled manually this time
            loanList.clear();
        }
    }
    else if (this is a bond)
    {
        Bond bond = parseBond(msg.getMessage());
        bond.setId(++bondIdCounter);
        bond.setDeal(deal);
        bondList.add(bond);
        if (bondList.size() == COMMIT_BATCH_SIZE) //As above
        {
            bondManager.bulkInsert(bondList);
            bondList.clear();
        }
    }
}
//Flush remaining items, not important
if (!bondList.isEmpty())
    bondManager.bulkInsert(bondList);
if (!loanList.isEmpty())
    loanManager.bulkInsert(loanList);
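The accumulate-then-flush pattern in the loop above can be factored into a small helper. The following is only a sketch in plain Java: `BatchBuffer` and its sink callback are hypothetical names, and the sink merely stands in for a call such as `loanManager.bulkInsert(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Collects items and hands them to a sink in fixed-size batches. */
public class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending;

    public BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.pending = new ArrayList<>(batchSize);
    }

    /** Adds one item; a full batch is passed to the sink immediately. */
    public void add(T item) {
        pending.add(item);
        if (pending.size() == batchSize) {
            flush();
        }
    }

    /** Passes any remaining items to the sink; call once after the loop. */
    public void flush() {
        if (!pending.isEmpty()) {
            // Hand over a copy so that clear() cannot affect what the sink received.
            sink.accept(new ArrayList<>(pending));
            pending.clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        // The sink only records batch sizes here; real code would insert the batch.
        BatchBuffer<Long> loans = new BatchBuffer<>(500, batch -> batchSizes.add(batch.size()));
        for (long id = 1; id <= 1203; id++) {
            loans.add(id);
        }
        loans.flush();
        System.out.println(batchSizes); // prints [500, 500, 203]
    }
}
```

Using one buffer per entity type keeps the final flush in a single place, so a partially filled last batch is harder to forget.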
The implementation of bulkInsert:
@Override
public void bulkInsert(Collection<Bond> bonds)
{
    // StatelessSession session = sessionFactory.openStatelessSession();
    Session session = sessionFactory.openSession();
    try
    {
        Transaction t = session.beginTransaction();
        try
        {
            for (Bond bond : bonds)
            {
                // session.persist(bond);
                // session.insert(bond);
                session.save(bond);
            }
            t.commit(); // commit only after every save succeeded
        }
        catch (RuntimeException ex)
        {
            t.rollback();
            throw ex; // propagate the failure instead of swallowing it
        }
    }
    finally
    {
        session.close();
    }
}
As you can see from the comments, I tried several combinations of stateful and stateless sessions. None worked.
My dataSource is a ComboPooledDataSource with the following URL
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true" />
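For context, rewriteBatchedStatements only matters for statements that reach the driver as a JDBC batch, that is, through addBatch() and executeBatch(). A minimal plain-JDBC sketch of such a batch follows. The class and method names are hypothetical, and the SQL assumes that TA_LOANS needs only LOAN_ID and DEAL_ID, which the elided mapping above does not confirm.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class JdbcBatchSketch {
    /**
     * Queues one INSERT per loan id and sends them in JDBC batches of
     * batchSize. Returns how many times executeBatch() was called.
     */
    public static int insertLoanIds(Connection conn, long dealId, List<Long> loanIds, int batchSize)
            throws SQLException {
        // Assumed column list; a real table probably has more columns.
        String sql = "insert into TA_LOANS (LOAN_ID, DEAL_ID) values (?, ?)";
        int executions = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int queued = 0;
            for (long loanId : loanIds) {
                ps.setLong(1, loanId);
                ps.setLong(2, dealId);
                ps.addBatch(); // queue the row, nothing is sent yet
                if (++queued == batchSize) {
                    ps.executeBatch(); // hand the whole batch to the driver
                    executions++;
                    queued = 0;
                }
            }
            if (queued > 0) {
                ps.executeBatch(); // send the final, partially filled batch
                executions++;
            }
        }
        return executions;
    }
}
```

Transaction handling is left to the caller in this sketch; with auto-commit disabled, the caller would commit after the method returns.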
My SessionFactory
<b:bean id="sessionFactory" class="class.that.extends.org.springframework.orm.hibernate3.LocalSessionFactoryBean" lazy-init="false" depends-on="dataSource">
    <b:property name="dataSource" ref="phoenixDataSource" />
    <b:property name="hibernateProperties">
        <b:props>
            <b:prop key="hibernate.dialect">${hibernate.dialect}</b:prop> <!-- MySQL5InnoDb-->
            <b:prop key="hibernate.show_sql">${hibernate.showSQL}</b:prop>
            <b:prop key="hibernate.jdbc.batch_size">500</b:prop>
            <b:prop key="hibernate.jdbc.use_scrollable_resultset">false</b:prop>
            <b:prop key="hibernate.cache.use_second_level_cache">false</b:prop>
            <b:prop key="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</b:prop>
            <b:prop key="hibernate.cache.use_query_cache">false</b:prop>
            <b:prop key="hibernate.validator.apply_to_ddl">false</b:prop>
            <b:prop key="hibernate.validator.autoregister_listeners">false</b:prop>
            <b:prop key="hibernate.order_inserts">true</b:prop>
            <b:prop key="hibernate.order_updates">true</b:prop>
        </b:props>
    </b:property>
</b:bean>
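Even though my project-scoped class extends LocalSessionFactoryBean, it does not override any of its methods (it only adds a few project-scoped methods).
I have been going mad over this for a few days. I read several articles, and none of them helped me enable batch inserts. I run all my code from JUnit tests that use a Spring context (so I can @Autowire my classes). All my attempts only produce a lot of separate INSERT statements.
What am I missing?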
nei*_*ldo 18
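Your queries are probably being rewritten, but you cannot tell by looking at the Hibernate SQL logs. Hibernate does not rewrite the insert statements; the MySQL driver does. In other words, Hibernate sends multiple insert statements to the driver, and the driver then rewrites them. So the Hibernate logs only show the SQL that Hibernate sent to the driver, not the SQL that the driver sent to the database.
You can verify this by enabling MySQL's profileSQL parameter in the connection URL: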
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true&amp;profileSQL=true" />
Using an example similar to yours, this is my output:
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
Wed Feb 05 13:29:52 MST 2014 INFO: Profiler Event: [QUERY] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) duration: 1 ms, connection-id: 81, statement-id: 33, resultset-id: 0, message: insert into Person (firstName, lastName, id) values ('person1', 'Name', 1),('person2', 'Name', 2),('person3', 'Name', 3),('person4', 'Name', 4),('person5', 'Name', 5),('person6', 'Name', 6),('person7', 'Name', 7),('person8', 'Name', 8),('person9', 'Name', 9),('person10', 'Name', 10)
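The first 10 lines are logged by Hibernate, but they are not what is actually sent to the MySQL database. The last line comes from the MySQL driver, and it clearly shows a single batch insert with multiple value tuples. That is what is actually sent to the MySQL database.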
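To make the relationship between the two log formats concrete, here is a rough illustration of the kind of transformation involved: many executions of one single-row INSERT collapse into one statement with repeated VALUES tuples. This is a toy sketch with hypothetical names, not the driver's actual implementation, and it handles placeholders only.

```java
import java.util.Collections;
import java.util.Locale;

public class MultiRowInsertSketch {
    /**
     * Builds one multi-row INSERT from a single-row INSERT by repeating
     * its VALUES tuple `rows` times. Illustration only.
     */
    public static String toMultiRowInsert(String singleRowInsert, int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be >= 1");
        }
        String keyword = "values";
        int valuesAt = singleRowInsert.toLowerCase(Locale.ROOT).lastIndexOf(keyword);
        if (valuesAt < 0) {
            throw new IllegalArgumentException("not an INSERT ... VALUES statement");
        }
        // Everything up to and including the VALUES keyword stays as it is.
        String head = singleRowInsert.substring(0, valuesAt + keyword.length());
        // The tuple after VALUES, for example "(?, ?, ?)", is repeated once per row.
        String tuple = singleRowInsert.substring(valuesAt + keyword.length()).trim();
        return head + " " + String.join(",", Collections.nCopies(rows, tuple));
    }

    public static void main(String[] args) {
        String sql = "insert into Person (firstName, lastName, id) values (?, ?, ?)";
        System.out.println(toMultiRowInsert(sql, 3));
        // prints: insert into Person (firstName, lastName, id) values (?, ?, ?),(?, ?, ?),(?, ?, ?)
    }
}
```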