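I want to fetch data from a database (MySQL) through JPA, and I want it sorted by some column value.
So, what is the best practice:
fetch the data and sort it in application code,
or
have the database sort it in the query?
Thanks in advance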
Gen*_*diy 53
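If you are retrieving a subset of all the data in the database, for example displaying 20 rows on screen out of 1000, it is better to sort in the database. This is faster and easier, and it lets you retrieve one page of rows (20, 50, 100) at a time instead of all of them.
If your data set is fairly small, sorting in code may be more convenient when you want to implement a complex sort. Such a complex sort can usually be done in SQL, but not as easily as in code.
In short, the rule of thumb is to sort in SQL, with some edge cases as exceptions to the rule.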
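To illustrate the small-data-set case above, here is a minimal sketch of a multi-key sort done in Java rather than in SQL. The Customer record, its fields, and the "gold tier first" rule are hypothetical, invented only for this example; in a JPA application the list would come from a query result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ComplexSortSketch {
    // Hypothetical row type standing in for a JPA entity.
    record Customer(String name, String tier, double balance) {}

    // Hypothetical business rule: "gold" tier first, then balance descending, then name.
    static List<Customer> sortForDisplay(List<Customer> rows) {
        // Boolean keys sort false before true, so gold customers (false) come first.
        Comparator<Customer> byTier =
                Comparator.comparing((Customer c) -> !"gold".equals(c.tier()));
        Comparator<Customer> byBalanceDesc =
                Comparator.comparingDouble((Customer c) -> c.balance()).reversed();
        Comparator<Customer> order = byTier
                .thenComparing(byBalanceDesc)
                .thenComparing(Customer::name);

        List<Customer> sorted = new ArrayList<>(rows); // leave the input list untouched
        sorted.sort(order);
        return sorted;
    }

    public static void main(String[] args) {
        List<Customer> rows = List.of(
                new Customer("Ann", "silver", 900.0),
                new Customer("Bob", "gold", 100.0),
                new Customer("Cid", "gold", 500.0));
        // Prints Cid, Bob, Ann (one name per line).
        sortForDisplay(rows).forEach(c -> System.out.println(c.name()));
    }
}
```

A rule like this can usually be written as an ORDER BY with a CASE expression as well; the point is only that the Comparator chain is easy to read and change when the data already fits comfortably in memory.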
Ale*_*lli 33
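In general, you are better off using ORDER BY in your SQL query. That way, if there is an applicable index, you may get the sort "for free". In the worst case it is the same amount of work as sorting in your code, but often it will be less.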
Phi*_*ler 18
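This is not completely on point, but I recently posted something related to database-side versus application-side sorting. The article is about a .NET technique, so most of it will probably not interest you, but the basic principles still apply:
Deferring sorting to the client side (e.g. jQuery, Dataset/Dataview sorting) may look tempting. It is actually a viable option for paging, sorting and filtering if (and only if):
1. the data set is small, and
2. there is little concern about performance and scalability.
In my experience, systems that meet these criteria are few and far between. Note that it is not possible to mix and match sorting and paging between the application and the database: if you ask the database for 100 unsorted rows and then sort those rows on the application side, you are unlikely to get the set of data you were expecting. This may seem obvious, but I have seen the mistake made often enough that I wanted to at least mention it.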
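The mix-and-match pitfall described above can be simulated in plain Java. This is only a sketch: the integer list stands in for rows in whatever order the database happens to return them, and both method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PageThenSortPitfall {
    // Simulates "fetch the first n rows without ORDER BY, then sort in the application".
    static List<Integer> pageThenSort(List<Integer> tableOrder, int n) {
        int end = Math.min(n, tableOrder.size());
        List<Integer> page = new ArrayList<>(tableOrder.subList(0, end));
        Collections.sort(page);
        return page;
    }

    // Simulates "ORDER BY ... LIMIT n": sort the whole set, then take the first n rows.
    static List<Integer> sortThenPage(List<Integer> tableOrder, int n) {
        List<Integer> all = new ArrayList<>(tableOrder);
        Collections.sort(all);
        int end = Math.min(n, all.size());
        return new ArrayList<>(all.subList(0, end));
    }

    public static void main(String[] args) {
        List<Integer> tableOrder = List.of(42, 7, 19, 3, 88, 1);
        System.out.println(pageThenSort(tableOrder, 3)); // [7, 19, 42]
        System.out.println(sortThenPage(tableOrder, 3)); // [1, 3, 7]
    }
}
```

The first result is a sorted page of arbitrary rows, not the first page of the sorted data, which is why sorting and paging need to happen in the same place.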
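Sorting and filtering in the database is much more efficient for a number of reasons. For one thing, database engines are highly optimized for exactly the kind of work that sorting and filtering entail; this is what their underlying code was designed to do. But even setting that aside, and even assuming you could write code that matches the sorting, filtering and paging performance of a mature database engine, it is still preferable to do this work in the database, for the simple reason that it is more efficient to limit the amount of data transferred from the database to the application server.
So, for example, if you have 10,000 rows before filtering and your query pares that number down to 75, filtering on the client means the data from all 10,000 rows is passed over the wire (and into your application server's memory), whereas filtering on the database side means only the 75 filtered rows move between the database and the application. This can have a huge impact on performance and scalability.
The full post is here: http://psandler.wordpress.com/2009/11/20/dynamic-search-objects-part-5sorting/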
Pau*_*aul 17
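I ran into this very same question and decided to run a small benchmark to quantify the speed differences. The results surprised me, so I would like to post my experience here.
Like a number of the other posters, I assumed the database layer would sort faster, because databases are supposedly tuned for this sort of thing. @Alex made a good point that if the database already has an index on the sort column, it will be faster. I wanted to answer the question of which raw sort is faster on non-indexed data. Note that I said faster, not simpler. I think that in many cases letting the database do the work is simpler and less error-prone.
My main assumption was that the sort fits in main memory. Not all problems fit here, but a good number do. For sorts that do not fit in memory, databases may well shine, though I did not test that. For in-memory sorts, all of Java, C and C++ outperformed MySQL in my informal benchmark, if one can call it that.
I wish I had had more time to compare the database layer and the application layer more thoroughly, but other duties called. Still, I could not help recording this note for others traveling down the same road.
As I started down this path I began to see more hurdles. Should I compare data transfer? How? Can I compare the time to read from the database with the time to read a flat file in Java? How do I isolate the sort time from the data transfer time and from the time to read the records? With these questions in mind, here are the methodology and the timing numbers I came up with.
All times are in ms unless otherwise noted
All sort routines were the defaults provided by the language (these are good enough for randomly ordered data)
All compilation used a typical "release profile" selected via NetBeans, with no customization unless otherwise noted
All MySQL tests used the following schema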
mysql> CREATE TABLE test_1000000
(
pk bigint(11) NOT NULL,
float_value DOUBLE NULL,
bigint_value bigint(11) NULL,
PRIMARY KEY (pk )
) Engine MyISAM;
mysql> describe test_1000000;
+--------------+------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------+------+-----+---------+-------+
| pk | bigint(11) | NO | PRI | NULL | |
| float_value | double | YES | | NULL | |
| bigint_value | bigint(11) | YES | | NULL | |
+--------------+------------+------+-----+---------+-------+
First, here is a small snippet to populate the database. There may be easier ways, but this is what I did:
public static void BuildTable(Connection conn, String tableName, long iterations) {
    Random ran = new Random();
    Math.random();
    try {
        long epoch = System.currentTimeMillis();
        for (long i = 0; i < iterations; i++) {
            if (i % 100000 == 0) {
                System.out.println(i + " next 100k");
            }
            PerformQuery(conn, tableName, i, ran.nextDouble(), ran.nextLong());
        }
    } catch (Exception e) {
        logger.error("Caught General Exception Error from main " + e);
    }
}
MySQL direct CLI results:
select * from test_10000000 order by bigint_value limit 10;
10 rows in set (2.32 sec)
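These timings were somewhat difficult to obtain, because the only information I had was the time reported after the command executed.
From the mysql prompt with 10000000 elements, it takes roughly 2.1 to 2.4 seconds to sort on either bigint_value or float_value
Java JDBC MySQL call (similar performance to running the sort from the mysql CLI)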
public static void SortDatabaseViaMysql(Connection conn, String tableName) {
    try {
        Statement stmt = conn.createStatement();
        String cmd = "SELECT * FROM " + tableName + " order by float_value limit 100";
        ResultSet rs = stmt.executeQuery(cmd);
    } catch (Exception e) {
    }
}
Five runs:
da=2379 ms
da=2361 ms
da=2443 ms
da=2453 ms
da=2362 ms
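Java sort generating random numbers on the fly (this was actually slower than the disk I/O read). Assignment time is the time to generate the random numbers and populate the array.
Called like this: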
JavaSort(10,10000000);
Timing results:
assignment time 331 sort time 1139
assignment time 324 sort time 1037
assignment time 317 sort time 1028
assignment time 319 sort time 1026
assignment time 317 sort time 1018
assignment time 325 sort time 1025
assignment time 317 sort time 1024
assignment time 318 sort time 1054
assignment time 317 sort time 1024
assignment time 317 sort time 1017
These results are for reading a file of doubles in binary mode
assignment time 4661 sort time 1056
assignment time 4631 sort time 1024
assignment time 4733 sort time 1004
assignment time 4725 sort time 980
assignment time 4635 sort time 980
assignment time 4725 sort time 980
assignment time 4667 sort time 978
assignment time 4668 sort time 980
assignment time 4757 sort time 982
assignment time 4765 sort time 987
Doing a buffer transfer results in much faster run times
assignment time 77 sort time 1192
assignment time 59 sort time 1125
assignment time 55 sort time 999
assignment time 55 sort time 1000
assignment time 56 sort time 999
assignment time 54 sort time 1010
assignment time 55 sort time 999
assignment time 56 sort time 1000
assignment time 55 sort time 1002
assignment time 56 sort time 1002
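For reference, the buffer transfer can be sketched in isolation roughly as follows. This is a cleaned-up illustration and not the code that produced the timings above; the class name, method names, and temporary file are assumptions, and the file is written big-endian to match what DataOutputStream.writeDouble produces.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedDoubleRead {
    // Writes the values as 8-byte big-endian doubles.
    static void writeDoubles(Path file, double[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            for (double v : values) {
                out.writeDouble(v);
            }
        }
    }

    // Maps the file into memory and bulk-copies `count` doubles into an array.
    static double[] readDoubles(Path file, int count) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            buffer.order(ByteOrder.BIG_ENDIAN);
            double[] result = new double[count];
            buffer.asDoubleBuffer().get(result); // one bulk transfer instead of per-value reads
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("doubles", ".bin"); // hypothetical scratch file
        file.toFile().deleteOnExit();
        writeDoubles(file, new double[] {3.5, -1.0, 2.25});
        double[] values = readDoubles(file, 3);
        Arrays.sort(values);
        System.out.println(Arrays.toString(values)); // [-1.0, 2.25, 3.5]
    }
}
```

The bulk get on the DoubleBuffer view is what replaces the per-element readDouble calls of the slower variant.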
C and C++ timing results (see the source below)
Debug profile using qsort
assignment 0 seconds 110 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 90 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 90 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 340 milliseconds
assignment 0 seconds 100 milliseconds Time taken 2 seconds 330 milliseconds
Release profile using qsort
assignment 0 seconds 100 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 580 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 80 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 590 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 600 milliseconds
assignment 0 seconds 90 milliseconds Time taken 1 seconds 580 milliseconds
Release profile using std::sort(a, a + ARRAY_SIZE);
assignment 0 seconds 100 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 870 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 120 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 900 milliseconds
assignment 0 seconds 90 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 100 milliseconds Time taken 0 seconds 890 milliseconds
assignment 0 seconds 150 milliseconds Time taken 0 seconds 870 milliseconds
Release profile reading random data from a file and using std::sort(a, a + ARRAY_SIZE)
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 40 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 50 milliseconds Time taken 0 seconds 880 milliseconds
assignment 0 seconds 40 milliseconds Time taken 0 seconds 880 milliseconds
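Below is the source code used. Hopefully with minimal bugs :)
Java source. Note that inside JavaSort, runCode and writeFlag need to be adjusted depending on what you want to time. Also note that the memory allocation happens inside the for loop (so GC is being tested as well, but I did not see any appreciable difference when moving the allocation outside the loop)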
public static void JavaSort(int iterations, int numberElements) {
    Random ran = new Random();
    Math.random();
    int runCode = 2;
    boolean writeFlag = false;
    for (int j = 0; j < iterations; j++) {
        double[] a1 = new double[numberElements];
        long timea = System.currentTimeMillis();
        if (runCode == 0) {
            for (int i = 0; i < numberElements; i++) {
                a1[i] = ran.nextDouble();
            }
        }
        else if (runCode == 1) {
            //do disk io!!
            try {
                DataInputStream in = new DataInputStream(new FileInputStream("MyBinaryFile.txt"));
                int i = 0;
                //while (in.available() > 0) {
                while (i < numberElements) { //this should be changed so that I always read in the size of array elements
                    a1[i++] = in.readDouble();
                }
            }
            catch (Exception e) {
            }
        }
        else if (runCode == 2) {
            try {
                FileInputStream stream = new FileInputStream("MyBinaryFile.txt");
                FileChannel inChannel = stream.getChannel();
                ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
                //int[] result = new int[500000];
                buffer.order(ByteOrder.BIG_ENDIAN);
                DoubleBuffer doubleBuffer = buffer.asDoubleBuffer();
                doubleBuffer.get(a1);
            }
            catch (Exception e) {
            }
        }
        if (writeFlag) {
            try {
                DataOutputStream out = new DataOutputStream(new FileOutputStream("MyBinaryFile.txt"));
                for (int i = 0; i < numberElements; i++) {
                    out.writeDouble(a1[i]);
                }
            } catch (Exception e) {
            }
        }
        long timeb = System.currentTimeMillis();
        Arrays.sort(a1);
        long timec = System.currentTimeMillis();
        System.out.println("assignment time " + (timeb - timea) + " " + " sort time " + (timec - timeb));
        //delete a1;
    }
}
C/C++ source
#include <iostream>
#include <vector>
#include <algorithm>
#include <fstream>
#include <cstdlib>
#include <ctime>
#include <cstdio>
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#define ARRAY_SIZE 10000000
using namespace std;
int compa(const void * elem1, const void * elem2) {
    double f = *((double*) elem1);
    double s = *((double*) elem2);
    if (f > s) return 1;
    if (f < s) return -1;
    return 0;
}
int compb (const void *a, const void *b) {
    if (*(double **)a < *(double **)b) return -1;
    if (*(double **)a > *(double **)b) return 1;
    return 0;
}
void timing_testa(int iterations) {
    clock_t start = clock(), diffa, diffb;
    int msec;
    bool writeFlag = false;
    int runCode = 1;
    for (int loopCounter = 0; loopCounter < iterations; loopCounter++) {
        double *a = (double *) malloc(sizeof (double)*ARRAY_SIZE);
        start = clock();
        size_t bytes = sizeof (double)*ARRAY_SIZE;
        if (runCode == 0) {
            for (int i = 0; i < ARRAY_SIZE; i++) {
                a[i] = rand() / (RAND_MAX + 1.0);
            }
        }
        else if (runCode == 1) {
            ifstream inlezen;
            inlezen.open("test", ios::in | ios::binary);
            inlezen.read(reinterpret_cast<char*> (&a[0]), bytes);
        }
        if (writeFlag) {
            ofstream outf;
            const char* pointer = reinterpret_cast<const char*>(&a[0]);
            outf.open("test", ios::out | ios::binary);
            outf.write(pointer, bytes);
            outf.close();
        }
        diffa = clock() - start;
        msec = diffa * 1000 / CLOCKS_PER_SEC;
        printf("assignment %d seconds %d milliseconds\t", msec / 1000, msec % 1000);
        start = clock();
        //qsort(a, ARRAY_SIZE, sizeof (double), compa);
        std::sort( a, a + ARRAY_SIZE );
        //printf("%f %f %f\n",a[0],a[1000],a[ARRAY_SIZE-1]);
        diffb = clock() - start;
        msec = diffb * 1000 / CLOCKS_PER_SEC;
        printf("Time taken %d seconds %d milliseconds\n", msec / 1000, msec % 1000);
        free(a);
    }
}
/*
 *
 */
int main(int argc, char** argv) {
    printf("hello world\n");
    double *a = (double *) malloc(sizeof (double)*ARRAY_SIZE);
    //srand(1);//change seed to fix it
    srand(time(NULL));
    timing_testa(5);
    free(a);
    return 0;
}