Dim*_*ios 8 floating-point rational-number ocaml arbitrary-precision
我正在使用Zarith库来进行任意精度的有理算术.假设我有一个有理数q
的类型Q.t
,即两个大整数的比率(Q
是Zarith的任意精度有理数模块).有时,为了便于阅读,我想将此数字打印为浮点数,有时我需要将此数字浮点数转换为以后的非任意精度计算.有没有办法将q
浮点数转换为一定的精度?
我转换q
为浮点的方式现在无法保证,并且可以创建未定义的浮点数(Z
是任意精度整数模块):
let to_float q =
let n, d = num q, den q in
(* check if d is zero and raise an error if it is *)
let nf, df = Z.to_float n, Z.to_float d in
nf /. df
Run Code Online (Sandbox Code Playgroud)
有没有更好的方法来处理这个问题,我可以获得最准确接近任何浮点数的浮点数q
?
编辑
如果有人有兴趣,我很快就会在OCaml中写下Mark Dickinson的回答.它可能(绝对)可以改进和清理.如果我这样做,或者如果有人有任何改进建议,我会编辑.但是现在这已经解决了我的问题!
let to_float q =
let n, d = num q, den q in
let n_sign = Z.sign n in
let d_sign = Z.sign d in (* always >= 0 *)
if d_sign = 0 then raise Division_by_zero;
let n = Z.abs n in
if n_sign = 0 then 0. else
let shift = (Z.numbits n) - (Z.numbits d) - 55 in
let is_subnormal = shift < -1076 in
let shift = if is_subnormal then -1076 else shift in
let d = if shift >= 0 then Z.shift_left d shift else d in
let n = if shift < 0 then Z.shift_left n (-shift)
else n in
let quotient, remainder = Z.div_rem n d in
let quotient = if (Z.compare remainder (Z.zero)) = 0 && Z.is_even quotient then
Z.add Z.one quotient else quotient in
let quotient = if not is_subnormal then quotient else
let round_select = Z.to_int @@ Z.rem quotient @@ Z.of_int 8 in
Z.add quotient [|Z.zero;Z.minus_one;Z.of_int (-2);Z.one;Z.zero
;Z.minus_one;Z.of_int 2;Z.one|].(round_select)
in
let unsigned_res = ldexp (Z.to_float quotient) shift in
if n_sign = 1 then unsigned_res else -.unsigned_res
Run Code Online (Sandbox Code Playgroud)
我稍后会考虑为GMP的mpq_get_d
函数编写一个接口,但我不完全确定如何做到这一点.我看到如何做到这一点的唯一方法是转换q : Q.t
为字符串并将其传递给:
int mpq_set_str (mpq_t rop, const char *str, int base)
Run Code Online (Sandbox Code Playgroud)
有谁知道如何传递rop
到mpq_get_d
OCaml或有一个描述如何做到这一点的参考?我查看了RWO的第19章并没有看到这样的情况.
如果您有权使用
log2
运算,以及那么进行自己正确舍入的转换相对容易。简而言之,该方法如下所示:
n > 0
,d > 0
; 过滤掉明显的下溢/上溢shift
,使其2^-shift*n/d
位于2^54
和之间2^56
。x = 2^-shift*n/d
,使用舍入到奇数舍入方法舍入到最接近的整数。x
为最近的IEEE 754双精度值dx
,并使用通常的舍入到舍入舍入模式。ldexp(dx, shift)
。恐怕我不会流利使用OCaml,但是以下Python代码说明了积极输入的想法。我留给您为负输入和零除做出明显的修改。您还可能希望尽早返回极端溢出和下溢的情况:通过查找下面的额外大或小值很容易检测到这些情况shift
。
from math import ldexp
def to_float(numerator, denominator):
"""
Convert numerator / denominator to float, correctly rounded.
For simplicity, assume both inputs are positive.
"""
# Shift satisfies 2**54 < (numerator / denominator) / 2**shift < 2**56
shift = numerator.bit_length() - denominator.bit_length() - 55
# Divide the fraction by 2**shift.
if shift >= 0:
denominator <<= shift
else:
numerator <<= -shift
# Convert to the nearest integer, using round-to-odd.
q, r = divmod(numerator, denominator)
if r != 0 and q % 2 == 0:
q += 1
# Now convert to the nearest float and shift back.
return ldexp(float(q), shift)
Run Code Online (Sandbox Code Playgroud)
一些注意事项:
bit_length
以正整数n
表示的方法给出表示所需的位数n
,换句话说1 + floor(log2(n))
。divmod
是一个Python函数,可同时计算整数除法的商和余数。q
(轻松地)适合64位整数numerator / denominator
整数转换为最接近的整数,另一次将该整数舍入为浮点数。第一轮使用“ 从零开始”的方法;这样可以确保第二轮(从int到float的转换是隐含的)得出的结果与将小数直接舍入为float的结果相同。ldexp
操作可能会引入第三次舍入。谨慎处理是可能的。参见下面的一些代码。上面实际上是Python在将一个(大)整数除以另一个以获得浮点结果时使用的算法的简化版本。您可以在此处查看源代码。long_true_divide
函数开头的注释概述了该方法。
为了完整起见,这是一个变体,也可以正确处理次等结果。
def to_float(numerator, denominator):
"""
Convert numerator / denominator to float, correctly rounded.
For simplicity, assume both inputs are positive.
"""
# Choose shift so that 2**54 < numerator / denominator / 2**shift < 2**56
shift = numerator.bit_length() - denominator.bit_length() - 55
# The 'treat_as_subnormal' flag catches all cases of subnormal results,
# along with some cases where the result is not subnormal but *is* still
# smaller than 2**-1021. In all these cases, it's sufficient to find the
# closest integer multiple of 2**-1074. We first round to the nearest
# multiple of 2**-1076 using round-to-odd.
treat_as_subnormal = shift < -1076
if treat_as_subnormal:
shift = -1076
# Divide the fraction by 2**shift.
if shift >= 0:
denominator <<= shift
else:
numerator <<= -shift
# Convert to the nearest integer, using round-to-odd.
q, r = divmod(numerator, denominator)
if r != 0 and q % 2 == 0:
q += 1
# Now convert to the nearest float and shift back.
if treat_as_subnormal:
# Round to the nearest multiple of 4, rounding ties to
# the nearest multiple of 8. This avoids double rounding
# from the ldexp call below.
q += [0, -1, -2, 1, 0, -1, 2, 1][q%8]
return ldexp(float(q), shift)
Run Code Online (Sandbox Code Playgroud)