如何在 MySQL (redux) 中创建用户定义的聚合函数？

Question

如何在 MySQL (redux) 中创建用户定义的聚合函数？

好吧，我可能会因为第二次问同样的问题而被禁止，但我在这里是一个令人难以置信的新手，我被告知我没有足够的声誉来评论原始问题。

我同意 OP (Matt Fenwick) 在如何创建用户定义的聚合函数中回答 5 的评论。因为它给了一条鱼一天，但真正的问题是学习如何钓鱼。

我需要一个 GROUP BY 聚合函数，例如 MIN 函数，但是在调用 SMALL(t.value, 2) 时它返回第二小的数据元素，在调用 SMALL(t.value, 3) 时它返回第三个最小的数据元素，等等。（这实际上适用于 LibreOffice Calc 的 SMALL 函数，顺便说一句。）

我提出这个问题的特殊原因的问题是我想将大学课程的评分电子表格转换为 MySQL 数据库，并且我需要删除两个最低的出勤分数、两个最低的作业分数和两个最低的测验每个学生的分数，以计算他们调整后的出勤率、调整后的家庭作业平均值和调整后的测验平均值，以及对数据的其他调整。

tblQuizzes 看起来像

mysql> select * from tblQuizzes LIMIT 12;
+------------+---------+-----------+
| StudentKey | QuizKey | QuizGrade |
+------------+---------+-----------+
|          1 |       1 |     0.123 |
|          2 |       1 |     0.456 |
|          3 |       1 |     0.789 |
|          4 |       1 |     0.890 |
|          5 |       1 |     0.123 |
|          6 |       1 |     1.000 |
|          1 |       2 |     0.789 |
|          2 |       2 |     0.123 |
|          3 |       2 |     0.456 |
|          4 |       2 |     0.789 |
|          5 |       2 |     0.123 |
|          6 |       2 |     1.000 |
+------------+---------+-----------+

Run Code Online (Sandbox Code Playgroud)

我需要一个类似的查询

SELECT
tblQuizzes.StudentKey,
(SUM(tblQuizzes.QuizGrade) - MIN(tblQuizzes.QuizGrade) - UDF_SMALL(tblQuizzes.QuizGrade, 2))/(COUNT(tblQuizzes.QuizGrade) - 2) AS AdjQuizAverage
FROM
tblQuizzes
GROUP BY
tblQuizzes.StudentKey

Run Code Online (Sandbox Code Playgroud)

删除最低的 2 个测验分数并平均剩余的测验。

所以，请记住，我不只是想要“一天的鱼”并且知道如何获得第二低的测验分数，而是对“学习如何钓鱼”并写一个 MIN(table.value, offset) GROUP BY 聚合函数（如果需要，在 C/C++ 或 PHP 中），任何人都可以提供一些提示/完整答案以及这个“钓鱼问题”的链接和示例吗？

非常感谢您提供的任何帮助；如果我在“要求”/请求中显得粗鲁，我很抱歉，我刚刚了解到模糊的请求会导致解决方法，“钓鱼一天”的回复。

Answer 1

Jef*_*and 0

这不是一个完整的答案，但已经非常接近了。

我在 MySQL 中有一个用户定义的函数（“UDF”），它应该返回 GROUP BY 中每个组的第 n 个最小元素（第一个最小的元素与 MIN 函数一致，第二个最小的元素是新的），但它似乎有一些错误；我不断收到我认为是核心转储的信息（实际错误消息是“ERROR 2013 (HY000)：在查询期间丢失与 MySQL 服务器的连接”）。

/******************************************************************************
 ** udf_small.c
 **
 ** This MySQL user-defined function (UDF) sorts through an aggregate dataset, 
 ** then returns the ith smallest element each group in the GROUP BY statement.
 **
 ** Author:        Jeffrey Rolland
 ** Creation Date: 12/07/2014
 ** Modifications: None
 ******************************************************************************/

/******************************************************************************
** A dynamically loadable file should be compiled shared.
** (something like: gcc -shared -o my_func.so -I /usr/includes/mysql/ my_func.c).
** You can easily get all switches right by doing:
** cd sql ; make udf_example.o
** Take the compile line that make writes, remove the '-c' near the end of
** the line and add -shared -o udf_example.so to the end of the compile line.
** The resulting library (udf_example.so) should be copied to some dir
** searched by ld. (/usr/lib/mysql/plugin/ ?)
** If you are using gcc, then you should be able to create the udf_example.so
** by simply doing 'make udf_example.so'.
**
** After the library is made one must notify mysqld about the new
** functions with the commands:
**
** CREATE FUNCTION metaphon RETURNS STRING SONAME "udf_example.so";
** CREATE FUNCTION myfunc_double RETURNS REAL SONAME "udf_example.so";
** CREATE FUNCTION myfunc_int RETURNS INTEGER SONAME "udf_example.so";
** CREATE FUNCTION sequence RETURNS INTEGER SONAME "udf_example.so";
** CREATE FUNCTION lookup RETURNS STRING SONAME "udf_example.so";
** CREATE FUNCTION reverse_lookup RETURNS STRING SONAME "udf_example.so";
** CREATE AGGREGATE FUNCTION avgcost RETURNS REAL SONAME "udf_example.so";
** CREATE FUNCTION myfunc_argument_name RETURNS STRING SONAME "udf_example.so";
**
** After this the functions will work exactly like native MySQL functions.
** Functions should be created only once.
**
** The functions can be deleted by:
**
** DROP FUNCTION metaphon;
** DROP FUNCTION myfunc_double;
** DROP FUNCTION myfunc_int;
** DROP FUNCTION lookup;
** DROP FUNCTION reverse_lookup;
** DROP FUNCTION avgcost;
** DROP FUNCTION myfunc_argument_name;
**
** The CREATE FUNCTION and DROP FUNCTION update the func@mysql table. All
** Active function will be reloaded on every restart of server
** (if --skip-grant-tables is not given)
**
** If you ge problems with undefined symbols when loading the shared
** library, you should verify that mysqld is compiled with the -rdynamic
** option.
**
** If you can't get AGGREGATES to work, check that you have the column
** 'type' in the mysql.func table. If not, run 'mysql_upgrade'.
**
**************************************************************************/

/*************************************************************************
** Syntax for the new aggregate commands are:
** CREATE AGGREGATE FUNCTION <function_name> RETURNS {STRING|REAL|INTEGER}
** SONAME <name_of_shared_library>
**
** Syntax for avgcost: AVGCOST( t.quantity, t.price )
** with t.quantity=integer, t.price=double
**
** (This example is provided by Andreas F. Bobak <bobak@relog.ch>)
**************************************************************************/

#ifdef STANDARD
    /* STANDARD is defined, don't use any mysql functions */
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    #ifdef __WIN__
        typedef unsigned __int64 ulonglong; /* Microsofts 64 bit types */
        typedef __int64 longlong;
    #else
        typedef unsigned long long ulonglong;
        typedef long long longlong;
    #endif /*__WIN__*/
#else
    #include <my_global.h>
    #include <my_sys.h>
    #if defined(MYSQL_SERVER)
        #include <m_string.h>   /* To get strmov() */
    #else
        /* when compiled as standalone */
        #include <string.h>
        #define strmov(a,b) stpcpy(a,b)
        #define bzero(a,b) memset(a,0,b)
    #endif
#endif
#include <mysql.h>
#include <ctype.h>
#ifdef _WIN32
    /* inet_aton needs winsock library */
    #pragma comment(lib, "ws2_32")
#endif
#ifdef HAVE_DLOPEN
    #if !defined(HAVE_GETHOSTBYADDR_R) || !defined(HAVE_SOLARIS_STYLE_GETHOST)
        static pthread_mutex_t LOCK_hostname;
    #endif
#endif
#include <stdbool.h>

my_bool udf_small_init( UDF_INIT* initid, UDF_ARGS* args, char* message );
void udf_small_deinit( UDF_INIT* initid );
void udf_small_reset( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
void udf_small_clear( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
void udf_small_add( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );
double udf_small( UDF_INIT* initid, UDF_ARGS* args, char* is_null, char *error );

struct SortArray
{
    int length;
    double *data;
};

/*************************************************************************
** Example of init function
** Arguments:
** my_bool maybe_null 1 if function can return NULL
** Default value is 1 if any of the arguments
** is declared maybe_null
**
**
** initid Points to a structure that the init function should fill.
** This argument is given to all other functions.
**
** unsigned int decimals Number of decimals.
** Default value is max decimals in any of the
** arguments.
**
** unsigned int max_length Length of string result.
** The default value for integer functions is 21
** The default value for real functions is 13+
** default number of decimals.
** The default value for string functions is
** the longest string argument.
**
** char *ptr; A pointer that the function can use.
**
**
** args Points to a structure which contains:
** unsigned int arg_count Number of arguments
**
** enum Item_result *arg_type Types for each argument.
** Types are STRING_RESULT, REAL_RESULT
** and INT_RESULT.
**
** char **args Pointer to constant arguments.
** Contains 0 for not constant argument.
**
** unsigned long *lengths; max string length for each argument
** char *maybe_null Information of which arguments
** may be NULL
**
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!
**
** This function should return 1 if something goes wrong. In this case
** message should contain something usefull!
**************************************************************************/
my_bool udf_small_init( UDF_INIT* initid, UDF_ARGS* args, char* message )
{
    struct SortArray *sort = NULL;
    int target_depth;
    int i;

    if (args->arg_count != 2)
    {
        strcpy(message, "wrong number of arguments: SMALL() requires two arguments");
        return 1;
    }
    if ((args->arg_type[0] != REAL_RESULT) || (args->arg_type[1] != INT_RESULT) )
    {
        strcpy(message, "wrong argument type: SMALL() requires an REAL and a INT");
        return 1;
    }
    if ((int)(*args->args[1]) < 1)
    {
        strcpy(message, "wrong argument value: SMALL() requires second parameter that is positive");
        return 1;
    }

    /*
    ** force arguments to double.
    */
    /*args->arg_type[0] = REAL_RESULT;
    args->arg_type[1] = REAL_RESULT;*/

    initid->maybe_null = 0; /* The result may be null */
    initid->decimals = 10; /* We want 10 decimals in the result */
    initid->max_length = 20; /* 10 digits + . + 10 decimals */
    /*
    if (!(data = (double*) malloc((int)(args->arg_type[1])*sizeof(double))))
    {
        strmov(message,"Couldn't allocate memory");
        return 1;
    }
    */

    target_depth = (int)(*args->args[1]);

    sort = malloc((sizeof(struct SortArray)));

    if(sort == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        return 1;
    }   

    initid->ptr = (char*)sort;  

    sort->data = malloc((target_depth)*(sizeof(double)));

    if(sort->data == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        return 1;
    }   

    for(i = 0; i < target_depth; i++)
    {
        sort->data[i] = DBL_MAX;
    }

    return 0;
}

/****************************************************************************
** Deinit function. This should free all resources allocated by
** this function.
**
** Arguments:
** initid Return value from xxxx_init
****************************************************************************/
void udf_small_deinit( UDF_INIT* initid )
{
    free(initid->ptr);  
}

/****************************************************************************
** Small Aggregate Function.
**
** There are 3 extra functions for an aggregate function, xxx_reset, 
** xxx_clear, and xxx_add
****************************************************************************/

/****************************************************************************
** xxx_reset
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_reset gets done every time we begin processing the records for a new 
** group. It calls xxx_clear and xxx_add
****************************************************************************/

/* This is only for MySQL 4.0 compability */
void udf_small_reset(UDF_INIT* initid, UDF_ARGS* args, char* is_null, char* message)
{
    udf_small_clear(initid, args, is_null, message);
    udf_small_add(initid, args, is_null, message);
}

/****************************************************************************
** xxx_clear
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_clear resets the processing variables to their initial values for a 
** new group
****************************************************************************/

/* This is needed to get things to work in MySQL 4.1.1 and above */
void udf_small_clear(UDF_INIT* initid, UDF_ARGS* args, char* is_null __attribute__((unused)), char* message __attribute__((unused)))
{
    struct SortArray *sort = NULL;
    int target_depth;
    int i;

    sort = (struct SortArray*)initid->ptr;

    target_depth = (int)(*args->args[1]);

    /* if(sort == NULL)
    {
        strcpy(message, "Initid Pointer is NULL");
        exit(1);
    }

    if(sort->data == NULL)
    {
        strcpy(message, "Couldn't allocate memory");
        exit(1);
    } */

    sort->length = 0;

    for(i = 0; i < target_depth; i++)
    {
        sort->data[i] = DBL_MAX;
    }
}

/****************************************************************************
** xxx_add
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** message Error message that should be passed to the user on fail.
** The message buffer is MYSQL_ERRMSG_SIZE big, but one should
** try to keep the error message less than 80 bytes long!.
**
** xxx_add is the main processing workhorse of a anggregate UDF. It processes
** each new record in the group.
****************************************************************************/

void udf_small_add(UDF_INIT* initid, UDF_ARGS* args, char* is_null __attribute__((unused)), char* message __attribute__((unused)))
{
    if (args->args[0] && args->args[1])
    {
        struct SortArray *sort = NULL;
        int target_depth;
        double col_value;
        int counter;
        int i, j;
        bool done;

        sort = (struct SortArray*)initid->ptr;

        target_depth = (int)(*args->args[1]);   

        col_value = (double)(*args->args[0]);   

        /* if(sort == NULL)
        {
            strcpy(message, "Initid pointer is NULL");
            exit(1);
        } */

        i = target_depth;
        counter = 1;
        done = FALSE;

        while ((!(done)) && (i > 0))
        {
            if(col_value <= sort->data[i-1])
            {
                for(j = 0; j < i-1; j++)
                {
                    sort->data[j] = sort->data[j+1];
                }
                sort->data[i-1] = col_value;
                done = TRUE;
            }
            else
            {
                i--;
                counter++;
            }
        }

        if(counter >= sort->length)
        {
            sort->length = counter;
        }

    }
}

/***************************************************************************
** UDF double function.
**
** Arguments:
** initid Structure filled by xxx_init
**
** args The same structure as to xxx_init. This structure
** contains values for all parameters.
** Note that the functions MUST check and convert all
** to the type it wants! Null values are represented by
** a NULL pointer
**
** is_null If the result is null, one should store 1 here.
**
** error If something goes fatally wrong one should store 1 here.
**
** This function should return a double. It returns a value at the end of 
** each group.
***************************************************************************/

double udf_small( UDF_INIT* initid, UDF_ARGS* args __attribute__((unused)), char* is_null, char* error __attribute__((unused)))
{
    struct SortArray *sort = NULL;
    int target_depth;

    sort = (struct SortArray*)initid->ptr;
    target_depth = (int)(*args->args[1]);

    if(sort != NULL)
    {
        if(sort->length == target_depth);
        {
            is_null = 0;
            return sort->data[0];
        }   
    }
    else
    {
        *is_null = 1;
        return  1.0;
    }
}

Run Code Online (Sandbox Code Playgroud)

我认为问题在于指向结构体和双精度数组的指针。问题是，我不知道在结构中声明有多大的双精度数组，直到我从查询中读取参数 target_depth (=(int)(*args->args[1])) （确保查询是格式良好）。我一直对指针和指针/数组二元性有点模糊。

无论如何，如果有人可以帮助我对 UDF 进行故障排除以使其正常运行，这将回答问题并提供构建块来解决一系列类似的问题，例如从组中查找第二大元素。

预先感谢您可以提供的任何帮助。

归档时间：	10 年，11 月前
查看次数：	1215 次
最近记录：	10 年，11 月前