当我尝试"解码"原始或内部R函数时,Ben Bolkers对这个问题的回答和Uwe Ligges的文章已经非常有用.但原始R函数如何与其相应的C函数相连?我想不知何故.Primitive必须提供这个缺失的链接.举个例子is.na:
> is.na
function (x) .Primitive("is.na")
Run Code Online (Sandbox Code Playgroud)
FUNTAB R_FunTab[] 在文件"names.c"中包含
{"is.na", do_isna, 0, 1, 1, {PP_FUNCALL, PREC_FN, 0}},
Run Code Online (Sandbox Code Playgroud)
这意味着is.na使用C函数do_isna.
do_isna在文件"coerce.c"中定义:
SEXP attribute_hidden do_isna(SEXP call, SEXP op, SEXP args, SEXP rho)
{
SEXP ans, dims, names, x;
R_xlen_t i, n;
checkArity(op, args);
check1arg(args, call, "x");
if (DispatchOrEval(call, op, "is.na", args, rho, &ans, 1, 1))
return(ans);
PROTECT(args = ans);
#ifdef stringent_is
if (!isList(CAR(args)) && !isVector(CAR(args)))
errorcall_return(call, "is.na " R_MSG_list_vec);
#endif
x = CAR(args);
n = xlength(x);
PROTECT(ans = allocVector(LGLSXP, n));
if (isVector(x)) {
PROTECT(dims = getAttrib(x, R_DimSymbol));
if (isArray(x))
PROTECT(names = getAttrib(x, R_DimNamesSymbol));
else
PROTECT(names = getAttrib(x, R_NamesSymbol));
}
else dims = names = R_NilValue;
switch (TYPEOF(x)) {
case LGLSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = (LOGICAL(x)[i] == NA_LOGICAL);
break;
case INTSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = (INTEGER(x)[i] == NA_INTEGER);
break;
case REALSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = ISNAN(REAL(x)[i]);
break;
case CPLXSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = (ISNAN(COMPLEX(x)[i].r) ||
ISNAN(COMPLEX(x)[i].i));
break;
case STRSXP:
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = (STRING_ELT(x, i) == NA_STRING);
break;
/* Same code for LISTSXP and VECSXP : */
#define LIST_VEC_NA(s) \
if (!isVector(s) || length(s) != 1) \
LOGICAL(ans)[i] = 0; \
else { \
switch (TYPEOF(s)) { \
case LGLSXP: \
case INTSXP: \
LOGICAL(ans)[i] = (INTEGER(s)[0] == NA_INTEGER); \
break; \
case REALSXP: \
LOGICAL(ans)[i] = ISNAN(REAL(s)[0]); \
break; \
case STRSXP: \
LOGICAL(ans)[i] = (STRING_ELT(s, 0) == NA_STRING); \
break; \
case CPLXSXP: \
LOGICAL(ans)[i] = (ISNAN(COMPLEX(s)[0].r) \
|| ISNAN(COMPLEX(s)[0].i)); \
break; \
default: \
LOGICAL(ans)[i] = 0; \
} \
}
case LISTSXP:
for (i = 0; i < n; i++) {
LIST_VEC_NA(CAR(x));
x = CDR(x);
}
break;
case VECSXP:
for (i = 0; i < n; i++) {
SEXP s = VECTOR_ELT(x, i);
LIST_VEC_NA(s);
}
break;
case RAWSXP:
/* no such thing as a raw NA */
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = 0;
break;
default:
warningcall(call, _("%s() applied to non-(list or vector) of type '%s'"),
"is.na", type2char(TYPEOF(x)));
for (i = 0; i < n; i++)
LOGICAL(ans)[i] = 0;
}
if (dims != R_NilValue)
setAttrib(ans, R_DimSymbol, dims);
if (names != R_NilValue) {
if (isArray(x))
setAttrib(ans, R_DimNamesSymbol, names);
else
setAttrib(ans, R_NamesSymbol, names);
}
if (isVector(x))
UNPROTECT(2);
UNPROTECT(1);
UNPROTECT(1); /*ans*/
return ans;
}
Run Code Online (Sandbox Code Playgroud)
但是,如果我们要评估is.na(x=3),例如,如何参数
call,op,args,rho产生的?至少必须使用一些外部信息,x=3这还不够.而且,乍一看x=3根本没有使用,当然一定是错的:
> is.na
function (x) .Primitive("is.na")
Run Code Online (Sandbox Code Playgroud)
R代码.Primitive没有给出提示:
> .Primitive
function (name) .Primitive(".Primitive")
Run Code Online (Sandbox Code Playgroud)
考虑到所有这些因素,一个明显优秀isNA的is.na失败副本就不足为奇了:
> isNA <- function (x) .Primitive("is.na")
> isNA
function (x) .Primitive("is.na")
> is.na
function (x) .Primitive("is.na")
> isNA(x=3)
function (x) .Primitive("is.na")
> is.na(x=3)
[1] FALSE
Run Code Online (Sandbox Code Playgroud)
说得直接:所有的C函数do_...有这些争论
call,op,args,rho.当调用原始R函数时,他们通过什么公式计算?
好问题.我在gdb下启动了R R -d gdb,设置了一个断点do_isna,然后继续R并进入is.na(3).
$ R -d gdb
(gdb) run
Starting program: /home/mtmorgan/bin/R-3-3-branch/bin/exec/R --no-save --no-restore --silent
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> ## break, cntrl-C
Program received signal SIGINT, Interrupt.
0x00007ffff722fd83 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) b do_isna
Breakpoint 1 at 0x7ffff77e0b3b: file /home/mtmorgan/src/R-3-3-branch/src/main/coerce.c, line 1982.
(gdb) continue
Continuing.
> is.na(3)
Breakpoint 1, do_isna (call=0x1838888, op=0x628218, args=0x1838770, rho=0x63f648)
at /home/mtmorgan/src/R-3-3-branch/src/main/coerce.c:1982
1982 checkArity(op, args);
(gdb)
Run Code Online (Sandbox Code Playgroud)
在gdb提示我问
(gdb) where
#0 do_isna (call=0x1838888, op=0x628218, args=0x1838770, rho=0x63f648) at /home/mtmorgan/src/R-3-3-branch/src/main/coerce.c:1982
#1 0x00007ffff7869170 in Rf_eval (e=0x1838888, rho=0x63f648) at /home/mtmorgan/src/R-3-3-branch/src/main/eval.c:717
#2 0x00007ffff78b36af in Rf_ReplIteration (rho=0x63f648, savestack=0, browselevel=0, state=0x7fffffffcaf0) at /home/mtmorgan/src/R-3-3-branch/src/main/main.c:258
...
Run Code Online (Sandbox Code Playgroud)
从#2开始,Rf_ReplIteration是试图评估的REPL(读取 - 评估 - 打印循环)is.na(3).它提供了调用函数的环境.当它Rf_eval()在第258行呼叫时,它知道环境和呼叫
(gdb) call Rf_PrintValue(rho)
<environment: R_GlobalEnv>
(gdb) call Rf_PrintValue(thisExpr)
is.na(3)
Run Code Online (Sandbox Code Playgroud)
由#1(eval.c:717),R想出的值op和tmp.
(gdb) call Rf_PrintValue(op)
function (x) .Primitive("is.na")
(gdb) call TYPEOF(op)
$2 = 8
Run Code Online (Sandbox Code Playgroud)
(类型8是'BUILTINSXP',来自Rinternals.h中的表).它通过发现它e是LANGSXP(第614行)来实现这一点,即.na是SYMSXP(第670行),并且它引用的函数(op)是BUILTINSXP(第700行).然后使用(第717行)
(gdb) call PRIMFUN(op)
$8 = (SEXP (*)(SEXP, SEXP, SEXP, SEXP)) 0x7ffff77e0b20 <do_isna>
Run Code Online (Sandbox Code Playgroud)
发现它应该调用do_isna它发现的值.
希望这能消除一些神秘感,并指向代码的相关部分.
| 归档时间: |
|
| 查看次数: |
403 次 |
| 最近记录: |