How to count each digit in a range of integers?

Question

Car*_*rez 51 language-agnostic algorithm count clarion

Imagine you sell those metallic digits used to number houses, locker doors, hotel rooms, etc. You need to find how many of each digit to ship when your customer needs to number doors/houses:

1 to 100
51 to 300
1 to 2,000 with zeros to the left

The obvious solution is to do a loop from the first to the last number, convert the counter to a string with or without zeros to the left, extract each digit and use it as an index to increment an array of 10 integers.
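
A minimal sketch of this brute-force approach in Python (my choice of language; the names `tally_digits` and `width` are illustrative, not from the question). It assumes non-negative integers; `width` is the padded length when zeros to the left are wanted, and 0 means no padding.

```python
def tally_digits(first: int, last: int, width: int = 0) -> list[int]:
    """Count how many of each digit 0-9 is needed to write first..last (inclusive)."""
    counts = [0] * 10
    for number in range(first, last + 1):
        # zfill pads with zeros on the left; width=0 leaves the number as is
        for ch in str(number).zfill(width):
            counts[int(ch)] += 1
    return counts


if __name__ == "__main__":
    print(tally_digits(1, 100))
    print(tally_digits(51, 300))
    print(tally_digits(1, 2000, width=4))
```

For 1 to 100 this gives 11 zeros, 21 ones and 20 of every other digit. The running time grows with the size of the range, which is the cost the question wants to avoid.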

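I'd like to know whether there is a better way to solve this without having to loop through the entire range of integers.

Solutions in any language or pseudocode are welcome.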

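Edit:

Answers review
John at CashCommons and Wayne Conrad comment that my current approach is good and fast enough. Let me use a silly analogy: if you were given the task of counting the squares on a chessboard in less than 1 minute, you could finish the task by counting the squares one by one, but a better solution is to count the sides and do a multiplication, because you may later be asked to count the tiles in a building.
Alex Reisner points to a very interesting mathematical law that, unfortunately, doesn't seem to be relevant to this problem.
Andres suggests the same algorithm I'm using, but extracting the digits with %10 operations instead of substrings.
John at CashCommons and phord propose pre-calculating the required digits and storing them in a lookup table or, for raw speed, an array. This could be a good solution if we had an absolute, unmovable, set-in-stone maximum integer value. I've never seen one of those.
High-Performance Mark and strainer computed the needed digits for various ranges. The result for one million seems to indicate there is a proportion, but the results for other numbers show different proportions.
strainer found some formulas that may be used to count digits for numbers that are a power of ten. Robert Harvey had a very interesting experience posting the question at MathOverflow. One of the math guys wrote a solution using mathematical notation.
Aaronaught developed and tested a solution using mathematics. After posting it he reviewed the formulas that originated from Math Overflow and found a flaw in them (point for Stack Overflow :).
noahlavine developed an algorithm and presented it in pseudocode.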

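A new solution
After reading all the answers and doing some experiments, I found that for a range of integers from 1 to 10^n - 1:

For digits 1 to 9, n*10^(n-1) pieces are needed
For digit 0, if not using leading zeros, n*10^(n-1) - ((10^n - 1) / 9) are needed
For digit 0, if using leading zeros, n*10^(n-1) - n are needed

The first formula was found by strainer (and probably by others); I found the other two by trial and error (but they may be included in other answers).

For example, if n = 6, the range is 1 to 999,999:

For digits 1 to 9 we need 6*10^5 = 600,000 of each one
For digit 0, without leading zeros, we need 6*10^5 - (10^6 - 1)/9 = 600,000 - 111,111 = 488,889
For digit 0, with leading zeros, we need 6*10^5 - 6 = 599,994

These numbers can be checked against High-Performance Mark's results.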
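
The three formulas can be written as one small function. This is only a sketch: the name `full_range_counts` and the `leading_zeros` flag are illustrative, and it assumes n >= 1 and a range of exactly 1 to 10^n - 1.

```python
def full_range_counts(n: int, leading_zeros: bool = False) -> list[int]:
    """Digit counts for the range 1 .. 10**n - 1, using the closed-form formulas."""
    per_digit = n * 10 ** (n - 1)             # digits 1 to 9
    if leading_zeros:
        zeros = per_digit - n                 # only 00...0 itself is missing
    else:
        zeros = per_digit - (10 ** n - 1) // 9
    return [zeros] + [per_digit] * 9


if __name__ == "__main__":
    print(full_range_counts(6))                         # zeros: 488889
    print(full_range_counts(6, leading_zeros=True))     # zeros: 599994
```

Because the work is a handful of arithmetic operations, the cost does not depend on how large the range is.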

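Using these formulas, I improved the original algorithm. It still loops from the first to the last number in the range of integers, but when it finds a number that ends in zeros, it uses the formulas to add the digit counts for a full run of 1 to 9, 1 to 99, 1 to 999 and so on, and skips ahead. Here's the algorithm in pseudocode: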

integer First,Last //First and last number in the range
integer Number     //Current number in the loop
integer Power      //Power is the n in 10^n in the formulas
integer Nines      //Nines is the result of 10^n - 1, 10^5 - 1 = 99999
integer Prefix     //First digits in a number. For 14,200, prefix is 142
array 0..9  Digits //Will hold the count for all the digits

FOR Number = First TO Last
  CALL TallyDigitsForOneNumber WITH Number,1  //Tally the count of each digit 
                                              //in the number, increment by 1
  //Start of optimization. Comments are for Number = 1,000 and Last = 8,000.
  Power = Zeros at the end of number //For 1,000, Power = 3
  IF Power > 0                       //The number ends in 0 00 000 etc 
    Nines = 10^Power-1                 //Nines = 10^3 - 1 = 1000 - 1 = 999
    IF Number+Nines <= Last            //If 1,000+999 <= 8,000, add a full set
      Digits[0-9] += Power*10^(Power-1)  //Add 3*10^(3-1) = 300 to digits 0 to 9
      Digits[0]   -= Power               //Adjust digit 0 (leading zeros formula)
      Prefix = First digits of Number    //For 1000, prefix is 1
      CALL TallyDigitsForOneNumber WITH Prefix,Nines //Tally the count of each 
                                                     //digit in prefix,
                                                     //increment by 999
      Number += Nines                    //Increment the loop counter 999 cycles
    ENDIF
  ENDIF 
  //End of optimization
ENDFOR  

SUBROUTINE TallyDigitsForOneNumber PARAMS Number,Count
  REPEAT
    Digits [ Number % 10 ] += Count
    Number = Number / 10
  UNTIL Number = 0

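For example, for the range 786 to 3,021, the counter will be incremented:

By 1 from 786 to 790 (5 cycles)
By 9 from 790 to 799 (1 cycle)
By 1 from 799 to 800
By 99 from 800 to 899
By 1 from 899 to 900
By 99 from 900 to 999
By 1 from 999 to 1000
By 999 from 1000 to 1999
By 1 from 1999 to 2000
By 999 from 2000 to 2999
By 1 from 2999 to 3000
By 1 from 3000 to 3010 (10 cycles)
By 9 from 3010 to 3019 (1 cycle)
By 1 from 3019 to 3021 (2 cycles)

Total: 28 cycles. Without optimization: 2,235 cycles.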
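
A Python sketch of the pseudocode above, assuming the digit-0 adjustment subtracts `Power` (the leading-zeros formula) and that `First` is at least 1. The function names are mine, not the author's Clarion identifiers.

```python
def tally_number(digits: list[int], number: int, count: int) -> None:
    """Add `count` to the tally of every digit of `number`."""
    while True:
        digits[number % 10] += count
        number //= 10
        if number == 0:
            break


def trailing_zeros(number: int) -> int:
    """How many zeros the number ends in (0 for the number 0 itself)."""
    power = 0
    while number > 0 and number % 10 == 0:
        number //= 10
        power += 1
    return power


def count_digits_skipping(first: int, last: int) -> list[int]:
    digits = [0] * 10
    number = first
    while number <= last:
        tally_number(digits, number, 1)
        power = trailing_zeros(number)
        if power > 0:
            nines = 10 ** power - 1
            if number + nines <= last:
                # Low `power` digits run through 0..01 to 9..9 (zero-padded).
                full = power * 10 ** (power - 1)
                for d in range(10):
                    digits[d] += full
                digits[0] -= power
                # The unchanged prefix is repeated once per skipped number.
                tally_number(digits, number // 10 ** power, nines)
                number += nines
        number += 1
    return digits


if __name__ == "__main__":
    print(count_digits_skipping(786, 3021))
```

As in the pseudocode, a number such as 3,000 is only tried with its full run of trailing zeros, so when 3,999 is past the end of the range the loop falls back to single steps until the next number ending in zero.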

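Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack:

If the range 700 to 1,000 with leading zeros is needed, use the algorithm for 10,700 to 11,000 and then subtract 1,000 - 700 + 1 = 301 from the count of digit 1 (one for each number in the range).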
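
The hack can be sketched as follows. This assumes the pad width is the number of digits of the last number, and it uses a plain brute-force `tally` as a stand-in for the fast algorithm; both function names are illustrative.

```python
def tally(first: int, last: int, width: int = 0) -> list[int]:
    """Brute-force digit tally of first..last, zero-padded to `width` if given."""
    counts = [0] * 10
    for number in range(first, last + 1):
        for ch in str(number).zfill(width):
            counts[int(ch)] += 1
    return counts


def tally_padded_via_prefix(first: int, last: int) -> list[int]:
    """Leading-zero tally obtained from an unpadded tally of a shifted range."""
    width = len(str(last))
    offset = 10 ** width                 # 700..1000 becomes 10700..11000
    counts = tally(first + offset, last + offset)
    counts[1] -= last - first + 1        # remove the one artificial 1 per number
    return counts


if __name__ == "__main__":
    print(tally_padded_via_prefix(700, 1000))
    print(tally(700, 1000, width=4))     # same result, padded directly
```

The prepended 1 turns every former leading zero into an interior zero, so an algorithm that ignores leading zeros counts them anyway.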

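Benchmark and Source code

I tested the original approach, the same approach using %10, and the new solution on some large ranges, with these results: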

Original             104.78 seconds
With %10              83.66
With Powers of Ten     0.07

Screenshot of the benchmark application: http://clarion.sca.mx/images/stories/digitsbench.png

If you would like to see the full source code or run the benchmark, use these links:

Complete source code (in Clarion): http://sca.mx/ftp/countdigits.txt
Compilable project and win32 exe: http://sca.mx/ftp/countdigits.zip

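Accepted answer

noahlavine's solution may be correct, but I just couldn't follow the pseudocode; I think some details are missing or not completely explained.

Aaronaught's solution seems to be correct, but the code is too complex for my taste.

I accepted strainer's answer, because his line of thought guided me to develop this new solution.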

Answer 1

Aar*_*ght 10

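There's a clear mathematical solution to a problem like this. Let's assume the value is zero-padded to the maximum number of digits (it's not, but we'll compensate for that later), and reason through it:

From 0-9, each digit occurs once
From 0-99, each digit occurs 20 times (10x in position 1 and 10x in position 2)
From 0-999, each digit occurs 300 times (100x in P1, 100x in P2, 100x in P3)

The obvious pattern for any given digit, if the range is from 0 to a power of 10, is N * 10^(N-1), where N is the power of 10.

What if the range is not a power of 10? Start with the lowest power of 10, then work up. The easiest case to deal with is a maximum like 399. We know that for each multiple of 100, each digit occurs at least 20 times, but we have to compensate for the number of times it appears in the most-significant-digit position, which is going to be exactly 100 for digits 0-3, and exactly zero for all other digits. Specifically, the extra amount to add is 10^N for the relevant digits.

Putting this into a formula, for upper bounds that are 1 less than some multiple of a power of 10 (i.e. 399, 6999, etc.), with M the most significant digit, it becomes: (M + 1) * N * 10^(N-1) + iif(d <= M, 10^N, 0)

Now you just need to deal with the remainder (which we'll call R). Take 445 as an example. This is whatever the result is for 399, plus the range 400-445. In this range, the MSD occurs R + 1 more times (once per number from 400 to 445), and all digits (including the MSD) also occur at the same frequencies they would in the range [0 - R].

Now we just have to compensate for the leading zeros. This pattern is easy; it's just:

10^N + 10^(N-1) + 10^(N-2) + ... + 10^0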
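
A quick way to convince yourself of this sum is to count the padding zeros directly. The sketch below assumes the numbers 0 to 10^w - 1 are padded to w digits and that 0 itself is all padding; with w = N + 1 the total is 1 + 10 + ... + 10^N = (10^w - 1) / 9. Both function names are illustrative.

```python
def leading_zeros_brute(width: int) -> int:
    """Padding zeros needed to write 0 .. 10**width - 1 in `width` digits."""
    total = 0
    for x in range(10 ** width):
        significant = len(str(x)) if x else 0   # treat 0 as having no digits
        total += width - significant
    return total


def leading_zeros_formula(width: int) -> int:
    """Closed form of 10**(width-1) + ... + 10 + 1."""
    return (10 ** width - 1) // 9


if __name__ == "__main__":
    for w in range(1, 5):
        print(w, leading_zeros_brute(w), leading_zeros_formula(w))
```

For w = 3 both give 111: 100 zeros in the hundreds place (000-099), 10 in the tens place (000-009) and 1 in the units place (000).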

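Update: This version correctly takes into account "padding zeros", i.e. the zeros in middle positions when dealing with the remainder ([400, 401, 402, ...]). Figuring out the padding zeros is a bit ugly, but the revised code (C-style pseudocode) handles it: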

function countdigits(int d, int low, int high) {
    return countdigits(d, low, high, false);
}

function countdigits(int d, int low, int high, bool inner) {
    if (high == 0)
        return ((d == 0) && inner) ? 1 : 0;   // a bare 0 is all leading zeros

    if (low > 0)                              // inclusive range: drop [0, low - 1]
        return countdigits(d, 0, high) - countdigits(d, 0, low - 1);

    int n = floor(log10(high));
    int m = floor((high + 1) / pow(10, n));
    int r = high - m * pow(10, n);
    return
        (max(m, 1) * n * pow(10, n-1)) +                             // (1)
        ((d < m) ? pow(10, n) : 0) +                                 // (2)
        (((r >= 0) && (n > 0)) ? countdigits(d, 0, r, true) : 0) +   // (3)
        (((r >= 0) && (d == m)) ? (r + 1) : 0) +                     // (4)
        (((r >= 0) && (d == 0)) ? countpaddingzeros(n, r) : 0) -     // (5)
        (((d == 0) && !inner) ? countleadingzeros(n) : 0);           // (6)
}

function countleadingzeros(int n) {
    // 10^n + 10^(n-1) + ... + 10^0
    int tmp = 0;
    do {
        tmp = pow(10, n) + tmp;
        --n;
    } while (n >= 0);
    return tmp;
}

function countpaddingzeros(int n, int r) {
    return (r + 1) * max(0, n - max(0, floor(log10(r))) - 1);
}

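As you can see, it's gotten a bit uglier, but it still runs in O(log n) time, so if you need to handle numbers in the billions, this will still give you instant results. :-) And if you run it on the range [0 - 1000000], you get the exact same distribution as the one posted by High-Performance Mark, so I'm almost positive that it's correct.

FYI, the reason for the inner variable is that the leading-zero compensation must not be repeated in the recursive calls, so it is applied only in the first execution of countdigits.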

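Update 2: In case the code is hard to read, here's a reference for what each line of the countdigits return statement means (I tried inline comments but they made the code even harder to read):

1. Frequency of any digit up to the highest power of 10 (0-99, etc.)
2. Frequency of the MSD above any multiple of the highest power of 10 (100-399)
3. Frequency of any digit in the remainder (400-445, R = 45)
4. Additional frequency of the MSD in the remainder
5. Count of zeros in middle positions for the remainder range (404, 405...)
6. Subtract leading zeros only once (on the outermost call)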
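
For anyone who wants to run it, here is a sketch of the same recursion in Python using integer arithmetic instead of log10 and pow, which avoids floating-point rounding on large values. The assumptions are mine: the range is inclusive on both ends, low is at least 1, and the number 0 is treated as having no digits. The names `count_digit` and `_upto` are illustrative.

```python
def _upto(d: int, high: int, inner: bool) -> int:
    """Occurrences of digit d in 1..high (inner=True: 0..high, zero-padded)."""
    if high == 0:
        return 1 if (d == 0 and inner) else 0
    n = len(str(high)) - 1            # floor(log10(high))
    p = 10 ** n
    m = (high + 1) // p
    r = high - m * p
    total = max(m, 1) * n * (p // 10)                 # (1)
    if d < m:
        total += p                                    # (2)
    if r >= 0:
        if n > 0:
            total += _upto(d, r, True)                # (3)
        if d == m:
            total += r + 1                            # (4)
        if d == 0:
            pad = max(0, n - (len(str(r)) - 1) - 1)
            total += (r + 1) * pad                    # (5)
    if d == 0 and not inner:
        total -= (10 ** (n + 1) - 1) // 9             # (6)
    return total


def count_digit(d: int, low: int, high: int) -> int:
    """Occurrences of digit d in the numbers low..high, with 1 <= low <= high."""
    return _upto(d, high, False) - _upto(d, low - 1, False)


if __name__ == "__main__":
    print([count_digit(d, 1, 1_000_000) for d in range(10)])
```

The numbered comments match the six terms of the return statement in the pseudocode.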

Answer 2

Noa*_*ine 8

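I'm assuming you want a solution where the numbers are in a range, and you have the starting and ending number. Imagine starting with the start number and counting up until you reach the end number: it would work, but it would be slow. I think the trick to a fast algorithm is to realize that in order to go up one digit in the 10^x place and keep everything else the same, you need to use all of the digits before it 10^x times plus all digits 0-9 10^(x-1) times. (Except that your counting may have involved a carry past the x-th digit; I correct for this below.)

Here's an example. Say you're counting from 523 to 1004.

First, you count from 523 to 524. This uses the digits 5, 2, and 4 once each.
Second, count from 524 to 604. The rightmost digit does 6 cycles through all of the digits, so you need 6 copies of each digit. The second digit goes through digits 2 through 0, 10 times each. The third digit is 6 5 times and 5 100-24 times.
Third, count from 604 to 1004. The rightmost digit does 40 cycles, so add 40 copies of each digit. The second digit from the right does 4 cycles, so add 4 copies of each digit. The leftmost digit does 100 each of 7, 8, and 9, plus 5 of 0 and 100 - 5 of 6. The last digit is 1 5 times.

To speed up the last bit, look at the part about the rightmost two places. It uses each digit 10 + 1 times. In general, 1 + 10 + ... + 10^n = (10^(n+1) - 1)/9, which we can use to speed up counting even more.

My algorithm is to count up from the start number to the end number (using base-10 counting), but use the fact above to do it quickly. You iterate through the digits of the starting number from least to most significant, and at each place you count up so that that digit is the same as the one in the ending number. At each point, n is the number of up-counts you need to do before you get to a carry, and m the number you need to do afterwards.

Now let's assume pseudocode counts as a language. Here, then, is what I would do: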

convert start and end numbers to digit arrays start[] and end[]
create an array counts[] with 10 elements which stores the number of copies of
     each digit that you need

iterate through start number from right to left. at the i-th digit,
    let d be the number of digits you must count up to get from this digit
        to the i-th digit in the ending number. (i.e. subtract the equivalent
        digits mod 10)
    add d * (10^i - 1)/9 to each entry in counts.
    let m be the numerical value of all the digits to the right of this digit,
        n be 10^i - m.
    for each digit e from the left of the starting number up to and including the
        i-th digit, add n to the count for that digit.
    for j in 1 to d
        increment the i-th digit by one, including doing any carries
        for each digit e from the left of the starting number up to and including
            the i-th digit, add 10^i to the count for that digit
    for each digit e from the left of the starting number up to and including the
        i-th digit, add m to the count for that digit.
    set the i-th digit of the starting number to be the i-th digit of the ending
        number.

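Oh, and since the value of i goes up by one each time, keep track of your old 10^i and just multiply it by 10 to get the new one, instead of exponentiating each time.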

Answer 3

Hig*_*ark 6

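This is a very bad answer, and I'm ashamed to post it. I asked Mathematica to tally the digits used in all numbers from 1 to 1,000,000, with no leading 0s. Here's what I got: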

Next time you order sticky digits to sell in your hardware store, order in these proportions and you won't go far wrong.

Answer 4

Rob*_*vey 5

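I asked this question on Math Overflow, and got spanked for asking such a simple question. One of the users took pity on me and said that if I posted it to The Art of Problem Solving, he would answer it; so I did.

Here is the answer he posted: http://www.artofproblemsolving.com/Forum/viewtopic.php?p=1741600#1741600

Embarrassingly, my math-fu is inadequate to understand what he posted (the guy is 19 years old... that is so depressing). I really need to take some math classes.

On the bright side, the equation is recursive, so it should be a simple matter for someone who understands the math to turn it into a recursive function with a few lines of code.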

Answer 5

str*_*ner 5

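To reel off the digits of a number, we would only ever need a costly string conversion if we couldn't do a mod; digits can be pulled off a number most quickly like this: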

feed = number;
do {
    digit = feed % 10;
    feed /= 10;
    // use digit... e.g. digitTally[digit]++;
} while (feed > 0);

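That loop should be very fast, and it can simply be placed inside a loop over the start to end numbers for the simplest way to tally the digits.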
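
A minimal Python sketch of that nested loop, assuming non-negative integers and an inclusive range. The name `tally_digits_mod` is illustrative; `divmod` returns the quotient and the remainder in one call.

```python
def tally_digits_mod(first: int, last: int) -> list[int]:
    """Tally the digits of first..last using mod 10 instead of strings."""
    tally = [0] * 10
    for number in range(first, last + 1):
        feed = number
        while True:                       # do-while: runs once even for 0
            feed, digit = divmod(feed, 10)
            tally[digit] += 1
            if feed == 0:
                break
    return tally


if __name__ == "__main__":
    print(tally_digits_mod(1, 100))
```

The do-while shape matters for the number 0, which has one digit even though the quotient is already zero.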

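To go faster for larger ranges of numbers, I'm looking for an optimised method of tallying all digits from 0 to number*10^significance (going from a start to an end bamboozles me).

Here is a table showing digit tallies for some upper bounds with a single significant digit. The tallies include 0 but not the top value itself; that was an oversight, but it may make the patterns a bit easier to see (the top value's digits are absent here). These tallies don't include trailing zeros.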

digit     1    10   100  1000 10000     2    20    30    40    60    90   200   600  2000  6000

0         1     1    10   190  2890     1     2     3     4     6     9    30   110   490  1690
1         0     1    20   300  4000     1    12    13    14    16    19   140   220  1600  2800
2         0     1    20   300  4000     0     2    13    14    16    19    40   220   600  2800
3         0     1    20   300  4000     0     2     3    14    16    19    40   220   600  2800
4         0     1    20   300  4000     0     2     3     4    16    19    40   220   600  2800
5         0     1    20   300  4000     0     2     3     4    16    19    40   220   600  2800
6         0     1    20   300  4000     0     2     3     4     6    19    40   120   600  1800
7         0     1    20   300  4000     0     2     3     4     6    19    40   120   600  1800
8         0     1    20   300  4000     0     2     3     4     6    19    40   120   600  1800
9         0     1    20   300  4000     0     2     3     4     6     9    40   120   600  1800

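Edit: clearing up my original thoughts:

From the brute-force table showing tallies from 0 (included) to a power of ten (not included), it can be seen that a major digit md at ten-power tp: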

increments tally[0 to 9] by md*tp*10^(tp-1)
increments tally[1 to md-1] by 10^tp
decrements tally[0] by (10^tp - 10)/9 
(to remove leading 0s if tp>leadingzeros)
can increment tally[moresignificantdigits] by self(md*10^tp) 
(to complete an effect)
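
A sketch of the first three adjustments for a single major digit, in Python. It assumes the leading-zero correction is (10^tp - 10)/9, which is what the table values imply, and it covers only upper bounds of the form md*10^tp with md from 1 to 9. The name `tally_below` is illustrative.

```python
def tally_below(md: int, tp: int) -> list[int]:
    """Digit tallies for 0 .. md*10**tp - 1, leading zeros not counted."""
    tally = [0] * 10
    # Every digit fills the tp lower positions equally often.
    lower = md * tp * 10 ** (tp - 1) if tp > 0 else 0
    for d in range(10):
        tally[d] += lower
    # Digits 1 .. md-1 each lead a full block of 10**tp numbers.
    for d in range(1, md):
        tally[d] += 10 ** tp
    # Remove leading zeros; for tp = 0 this is -1, i.e. it adds the lone "0".
    tally[0] -= (10 ** tp - 10) // 9
    return tally


if __name__ == "__main__":
    print(tally_below(6, 2))   # the 600 column of the table
    print(tally_below(2, 3))   # the 2000 column of the table
```

For md = 6 and tp = 2 this reproduces the 600 column: 110 zeros, 220 each of the digits 1 to 5, and 120 each of 6 to 9.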

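If these tally adjustments were applied for each significant digit, the tally should end up modified as though counted from 0 to end-1.

The adjustments can be inverted to remove the preceding range (the start number).

Thanks Aaronaught for your complete and tested answer.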

Archived: 15 years, 9 months ago
Views: 23,043
Last activity: 6 years, 4 months ago