R返回所有因子的函数

JD *_*ong 32 r factorization

我正常的搜索foo让我失望.我正在尝试找到一个R函数,它返回整数的所有因子.至少有2个包含factorize()函数的包:gmp和conf.design,但这些函数只返回素因子.我想要一个能够返回所有因子的函数.

显然,搜索这个很困难,因为R有一个叫做因子的结构,它会在搜索中产生很多噪音.

Cha*_*ase 20

为了跟进我的评论(感谢@Ramnath我的错字),我的64位8演出机器上的强力方法似乎运行得相当好:

FUN <- function(x) {
    x <- as.integer(x)
    div <- seq_len(abs(x))
    factors <- div[x %% div == 0L]
    factors <- list(neg = -factors, pos = factors)
    return(factors)
}
Run Code Online (Sandbox Code Playgroud)

几个例子:

> FUN(100)
$neg
[1]   -1   -2   -4   -5  -10  -20  -25  -50 -100

$pos
[1]   1   2   4   5  10  20  25  50 100

> FUN(-42)
$neg
[1]  -1  -2  -3  -6  -7 -14 -21 -42

$pos
[1]  1  2  3  6  7 14 21 42

#and big number

> system.time(FUN(1e8))
   user  system elapsed 
   1.95    0.18    2.14 
Run Code Online (Sandbox Code Playgroud)

  • 使用整数:`div [x %% div == 0L]`和`x <-as.integer(x)`.应该快2倍. (8认同)
  • 如果您不需要输入本身,则可以使用`div < - seq_len(abs(x/2))`,这应该花费不到一半的时间. (5认同)
  • @GeorgeDontas - 无论如何,请随时建议一个适用于大数字的解决方案.我个人没有希望修改我的答案,以解决17位数字. (3认同)

Ric*_*ton 13

您可以从主要因素中获得所有因素.gmp很快计算这些.

library(gmp)
library(plyr)

get_all_factors <- function(n)
{
  prime_factor_tables <- lapply(
    setNames(n, n), 
    function(i)
    {
      if(i == 1) return(data.frame(x = 1L, freq = 1L))
      plyr::count(as.integer(gmp::factorize(i)))
    }
  )
  lapply(
    prime_factor_tables, 
    function(pft)
    {
      powers <- plyr::alply(pft, 1, function(row) row$x ^ seq.int(0L, row$freq))
      power_grid <- do.call(expand.grid, powers)
      sort(unique(apply(power_grid, 1, prod)))
    }
  )
}

get_all_factors(c(1, 7, 60, 663, 2520, 75600, 15876000, 174636000, 403409160000))
Run Code Online (Sandbox Code Playgroud)


Jos*_*ood 8

Update

The algorithm has been fully updated and now implements multiple polynomials as well as some clever sieving techniques that eliminates millions of checks. In addition to the original links, this paper along with this post from primo were very helpful for this last stage (many kudos to primo). Primo does a great job of explaining the guts of the QS in a relatively short space and also wrote a pretty amazing algorithm (it will factor the number at the bottom, 38! + 1, in under 2 secs!! Insane!!).

As promised, below is my humble R implementation of the Quadratic Sieve. I have been working on this algorithm sporadically since I promised it in late January. I will not try to explain it fully (unless requested... also, the links below do a very good job) as it is very complicated and hopefully, my function names speak for themselves. This has proved to be one of the most challenging algorithms I have ever attempted to execute as it is demanding both from a programmer's point of view as well as mathematically. I have read countless papers and ultimately, I found these five to be the most helpful (QSieve1, QSieve2, QSieve3, QSieve4, QSieve5).

N.B. This algorithm, as it stands, does not serve very well as a general prime factorization algorithm. If it was optimized further, it would need to be accompanied by a section of code that factors out smaller primes (i.e. less than 10^5 as suggested by this post), then call QuadSieveAll, check to see if these are primes, and if not, call QuadSieveAll on both of these factors, etc. until you are left with all primes (all of these steps are not that difficult). However, the main point of this post is to highlight the heart of the Quadratic Sieve, so the examples below are all semiprimes (even though it will factor most odd numbers not containing a square… Also, I haven’t seen an example of the QS that didn’t demonstrate a non-semiprime). I know the OP was looking for a method to return all factors and not the prime factorization, but this algorithm (if optimized further) coupled with one of the algorithms above would be a force to reckon with as a general factoring algorithm (especially given that the OP was needing something for Project Euler,这通常需要比蛮力方法更多的东西).顺便说一句,这个MyIntToBit函数是这个答案的一个变体,而且PrimeSieve来自Dontas先生不久前出现的一个帖子(同样也是赞誉).

QuadSieveMultiPolysAll <- function(MyN, fudge1=0L, fudge2=0L, LenB=0L) {
### 'MyN' is the number to be factored; 'fudge1' is an arbitrary number
### that is used to determine the size of your prime base for sieving;
### 'fudge2' is used to set a threshold for sieving;
### 'LenB' is a the size of the sieving interval. The last three
### arguments are optional (they are determined based off of the
### size of MyN if left blank)

### The first 8 functions are helper functions

    PrimeSieve <- function(n) {
        n <- as.integer(n)
        if (n > 1e9) stop("n too large")
        primes <- rep(TRUE, n)
        primes[1] <- FALSE
        last.prime <- 2L
        fsqr <- floor(sqrt(n))
        while (last.prime <= fsqr) {
            primes[seq.int(last.prime^2, n, last.prime)] <- FALSE
            sel <- which(primes[(last.prime + 1):(fsqr + 1)])
            if (any(sel)) {
                last.prime <- last.prime + min(sel)
            } else {
                last.prime <- fsqr + 1
            }
        }
        MyPs <- which(primes)
        rm(primes)
        gc()
        MyPs
    }

    MyIntToBit <- function(x, dig) {
        i <- 0L
        string <- numeric(dig)
        while (x > 0) {
            string[dig - i] <- x %% 2L
            x <- x %/% 2L
            i <- i + 1L
        }
        string
    }

    ExpBySquaringBig <- function(x, n, p) {
        if (n == 1) {
            MyAns <- mod.bigz(x,p)
        } else if (mod.bigz(n,2)==0) {
            MyAns <- ExpBySquaringBig(mod.bigz(pow.bigz(x,2),p),div.bigz(n,2),p)
        } else {
            MyAns <- mod.bigz(mul.bigz(x,ExpBySquaringBig(mod.bigz(
                pow.bigz(x,2),p), div.bigz(sub.bigz(n,1),2),p)),p)
        }
        MyAns
    }

    TonelliShanks <- function(a,p) {
        P1 <- sub.bigz(p,1); j <- 0L; s <- P1
        while (mod.bigz(s,2)==0L) {s <- s/2; j <- j+1L}
        if (j==1L) {
            MyAns1 <- ExpBySquaringBig(a,(p+1L)/4,p)
            MyAns2 <- mod.bigz(-1 * ExpBySquaringBig(a,(p+1L)/4,p),p)
        } else {
            n <- 2L
            Legendre2 <- ExpBySquaringBig(n,P1/2,p)
            while (Legendre2==1L) {n <- n+1L; Legendre2 <- ExpBySquaringBig(n,P1/2,p)}
            x <- ExpBySquaringBig(a,(s+1L)/2,p)
            b <- ExpBySquaringBig(a,s,p)
            g <- ExpBySquaringBig(n,s,p)
            r <- j; m <- 1L
            Test <- mod.bigz(b,p)
            while (!(Test==1L) && !(m==0L)) {
                m <- 0L
                Test <- mod.bigz(b,p)
                while (!(Test==1L)) {m <- m+1L; Test <- ExpBySquaringBig(b,pow.bigz(2,m),p)}
                if (!m==0) {
                    x <- mod.bigz(x * ExpBySquaringBig(g,pow.bigz(2,r-m-1L),p),p)
                    g <- ExpBySquaringBig(g,pow.bigz(2,r-m),p)
                    b <- mod.bigz(b*g,p); r <- m
                }; Test <- 0L
            }; MyAns1 <- x; MyAns2 <- mod.bigz(p-x,p)
        }
        c(MyAns1, MyAns2)
    }

    SieveLists <- function(facLim, FBase, vecLen, sieveD, MInt) {
        vLen <- ceiling(vecLen/2); SecondHalf <- (vLen+1L):vecLen
        MInt1 <- MInt[1:vLen]; MInt2 <- MInt[SecondHalf]
        tl <- vector("list",length=facLim)

        for (m in 3:facLim) {
            st1 <- mod.bigz(MInt1[1],FBase[m])
            m1 <- 1L+as.integer(mod.bigz(sieveD[[m]][1] - st1,FBase[m]))
            m2 <- 1L+as.integer(mod.bigz(sieveD[[m]][2] - st1,FBase[m]))
            sl1 <- seq.int(m1,vLen,FBase[m])
            sl2 <- seq.int(m2,vLen,FBase[m])
            tl1 <- list(sl1,sl2)
            st2 <- mod.bigz(MInt2[1],FBase[m])
            m3 <- vLen+1L+as.integer(mod.bigz(sieveD[[m]][1] - st2,FBase[m]))
            m4 <- vLen+1L+as.integer(mod.bigz(sieveD[[m]][2] - st2,FBase[m]))
            sl3 <- seq.int(m3,vecLen,FBase[m])
            sl4 <- seq.int(m4,vecLen,FBase[m])
            tl2 <- list(sl3,sl4)
            tl[[m]] <- list(tl1,tl2)
        }
        tl
    }

    SieverMod <- function(facLim, FBase, vecLen, SD, MInt, FList, LogFB, Lim, myCol) {

        MyLogs <- rep(0,nrow(SD))

        for (m in 3:facLim) {
            MyBool <- rep(FALSE,vecLen)
            MyBool[c(FList[[m]][[1]][[1]],FList[[m]][[2]][[1]])] <- TRUE
            MyBool[c(FList[[m]][[1]][[2]],FList[[m]][[2]][[2]])] <- TRUE
            temp <- which(MyBool)
            MyLogs[temp] <- MyLogs[temp] + LogFB[m]
        }

        MySieve <- which(MyLogs > Lim)
        MInt <- MInt[MySieve]; NewSD <- SD[MySieve,]
        newLen <- length(MySieve); GoForIT <- FALSE

        MyMat <- matrix(integer(0),nrow=newLen,ncol=myCol)
        MyMat[which(NewSD[,1L] < 0),1L] <- 1L; MyMat[which(NewSD[,1L] > 0),1L] <- 0L
        if ((myCol-1L) - (facLim+1L) > 0L) {MyMat[,((facLim+2L):(myCol-1L))] <- 0L}
        if (newLen==1L) {MyMat <- matrix(MyMat,nrow=1,byrow=TRUE)}

        if (newLen > 0L) {
            GoForIT <- TRUE
            for (m in 1:facLim) {
                vec <- rep(0L,newLen)
                temp <- which((NewSD[,1L]%%FBase[m])==0L)
                NewSD[temp,] <- NewSD[temp,]/FBase[m]; vec[temp] <- 1L
                test <- temp[which((NewSD[temp,]%%FBase[m])==0L)]
                while (length(test)>0L) {
                    NewSD[test,] <- NewSD[test,]/FBase[m]
                    vec[test] <- (vec[test]+1L)
                    test <- test[which((NewSD[test,]%%FBase[m])==0L)]
                }
                MyMat[,m+1L] <- vec
            }
        }

        list(MyMat,NewSD,MInt,GoForIT)
    }

    reduceMatrix <- function(mat) {
        tempMin <- 0L; n1 <- ncol(mat); n2 <- nrow(mat)
        mymax <- 1L
        for (i in 1:n1) {
            temp <- which(mat[,i]==1L)
            t <- which(temp >= mymax)
            if (length(temp)>0L && length(t)>0L) {
                MyMin <- min(temp[t])
                if (!(MyMin==mymax)) {
                    vec <- mat[MyMin,]
                    mat[MyMin,] <- mat[mymax,]
                    mat[mymax,] <- vec
                }
                t <- t[-1]; temp <- temp[t]
                for (j in temp) {mat[j,] <- (mat[j,]+mat[mymax,])%%2L}
                mymax <- mymax+1L
            }
        }

        if (mymax<n2) {simpMat <- mat[-(mymax:n2),]} else {simpMat <- mat}
        lenSimp <- nrow(simpMat)
        if (is.null(lenSimp)) {lenSimp <- 0L}
        mycols <- 1:n1

        if (lenSimp>1L) {
            ## "Diagonalizing" Matrix
            for (i in 1:lenSimp) {
                if (all(simpMat[i,]==0L)) {simpMat <- simpMat[-i,]; next}
                if (!simpMat[i,i]==1L) {
                    t <- min(which(simpMat[i,]==1L))
                    vec <- simpMat[,i]; tempCol <- mycols[i]
                    simpMat[,i] <- simpMat[,t]; mycols[i] <- mycols[t]
                    simpMat[,t] <- vec; mycols[t] <- tempCol
                }
            }

            lenSimp <- nrow(simpMat); MyList <- vector("list",length=n1)
            MyFree <- mycols[which((1:n1)>lenSimp)];  for (i in MyFree) {MyList[[i]] <- i}
            if (is.null(lenSimp)) {lenSimp <- 0L}

            if (lenSimp>1L) {
                for (i in lenSimp:1L) {
                    t <- which(simpMat[i,]==1L)
                    if (length(t)==1L) {
                        simpMat[ ,t] <- 0L
                        MyList[[mycols[i]]] <- 0L
                    } else {
                        t1 <- t[t>i]
                        if (all(t1 > lenSimp)) {
                            MyList[[mycols[i]]] <- MyList[[mycols[t1[1]]]]
                            if (length(t1)>1) {
                                for (j in 2:length(t1)) {MyList[[mycols[i]]] <- c(MyList[[mycols[i]]], MyList[[mycols[t1[j]]]])}
                            }
                        }
                        else {
                            for (j in t1) {
                                if (length(MyList[[mycols[i]]])==0L) {MyList[[mycols[i]]] <- MyList[[mycols[j]]]}
                                else {
                                    e1 <- which(MyList[[mycols[i]]]%in%MyList[[mycols[j]]])
                                    if (length(e1)==0) {
                                        MyList[[mycols[i]]] <- c(MyList[[mycols[i]]],MyList[[mycols[j]]])
                                    } else {
                                        e2 <- which(!MyList[[mycols[j]]]%in%MyList[[mycols[i]]])
                                        MyList[[mycols[i]]] <- MyList[[mycols[i]]][-e1]
                                        if (length(e2)>0L) {MyList[[mycols[i]]] <- c(MyList[[mycols[i]]], MyList[[mycols[j]]][e2])}
                                    }
                                }
                            }
                        }
                    }
                }
                TheList <- lapply(MyList, function(x) {if (length(x)==0L) {0} else {x}})
                list(TheList,MyFree)
            } else {
                list(NULL,NULL)
            }
        } else {
            list(NULL,NULL)
        }
    }

    GetFacs <- function(vec1, vec2, n) {
        x <- mod.bigz(prod.bigz(vec1),n)
        y <- mod.bigz(prod.bigz(vec2),n)
        MyAns <- c(gcd.bigz(x-y,n),gcd.bigz(x+y,n))
        MyAns[sort.list(asNumeric(MyAns))]
    }

    SolutionSearch <- function(mymat, M2, n, FB) {

        colTest <- which(apply(mymat, 2, sum) == 0)
        if (length(colTest) > 0) {solmat <- mymat[ ,-colTest]} else {solmat <- mymat}

        if (length(nrow(solmat)) > 0) {
            nullMat <- reduceMatrix(t(solmat %% 2L))
            listSol <- nullMat[[1]]; freeVar <- nullMat[[2]]; LF <- length(freeVar)
        } else {LF <- 0L}

        if (LF > 0L) {
            for (i in 2:min(10^8,(2^LF + 1L))) {
                PosAns <- MyIntToBit(i, LF)
                posVec <- sapply(listSol, function(x) {
                    t <- which(freeVar %in% x)
                    if (length(t)==0L) {
                        0
                    } else {
                        sum(PosAns[t])%%2L
                    }
                })
                ansVec <- which(posVec==1L)
                if (length(ansVec)>0) {

                    if (length(ansVec) > 1L) {
                        myY <- apply(mymat[ansVec,],2,sum)
                    } else {
                        myY <- mymat[ansVec,]
                    }

                    if (sum(myY %% 2) < 1) {
                        myY <- as.integer(myY/2)
                        myY <- pow.bigz(FB,myY[-1])
                        temp <- GetFacs(M2[ansVec], myY, n)
                        if (!(1==temp[1]) && !(1==temp[2])) {
                            return(temp)
                        }
                    }
                }
            }
        }
    }

### Below is the main portion of the Quadratic Sieve

    BegTime <- Sys.time(); MyNum <- as.bigz(MyN); DigCount <- nchar(as.character(MyN))
    P <- PrimeSieve(10^5)
    SqrtInt <- .mpfr2bigz(trunc(sqrt(mpfr(MyNum,sizeinbase(MyNum,b=2)+5L))))

    if (DigCount < 24) {
        DigSize <- c(4,10,15,20,23)
        f_Pos <- c(0.5,0.25,0.15,0.1,0.05)
        MSize <- c(5000,7000,10000,12500,15000)

        if (fudge1==0L) {
            LM1 <- lm(f_Pos ~ DigSize)
            m1 <- summary(LM1)$coefficients[2,1]
            b1 <- summary(LM1)$coefficients[1,1]
            fudge1 <- DigCount*m1 + b1
        }

        if (LenB==0L) {
            LM2 <- lm(MSize ~ DigSize)
            m2 <- summary(LM2)$coefficients[2,1]
            b2 <- summary(LM2)$coefficients[1,1]
            LenB <- ceiling(DigCount*m2 + b2)
        }

        LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
        B <- P[P<=LimB]; B <- B[-1]
        facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
        LenFBase <- length(facBase)+1L
    } else if (DigCount < 67) {
        ## These values were obtained from "The Multiple Polynomial
        ## Quadratic Sieve" by Robert D. Silverman
        DigSize <- c(24,30,36,42,48,54,60,66)
        FBSize <- c(100,200,400,900,1200,2000,3000,4500)
        MSize <- c(5,25,25,50,100,250,350,500)

        LM1 <- loess(FBSize ~ DigSize)
        LM2 <- loess(MSize ~ DigSize)

        if (fudge1==0L) {
            fudge1 <- -0.4
            LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
            myTarget <- ceiling(predict(LM1, DigCount))

            while (LimB < myTarget) {
                LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
                fudge1 <- fudge1+0.001
            }
            B <- P[P<=LimB]; B <- B[-1]
            facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
            LenFBase <- length(facBase)+1L

            while (LenFBase < myTarget) {
                fudge1 <- fudge1+0.005
                LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
                myind <- which(P==max(B))+1L
                myset <- tempP <- P[myind]
                while (tempP < LimB) {
                    myind <- myind + 1L
                    tempP <- P[myind]
                    myset <- c(myset, tempP)
                }

                for (p in myset) {
                    t <- ExpBySquaringBig(MyNum,(p-1)/2,p)==1L
                    if (t) {facBase <- c(facBase,p)}
                }
                B <- c(B, myset)
                LenFBase <- length(facBase)+1L
            }
        } else {
            LimB <- trunc(exp((.5+fudge1)*sqrt(log(MyNum)*log(log(MyNum)))))
            B <- P[P<=LimB]; B <- B[-1]
            facBase <- P[which(sapply(B, function(x) ExpBySquaringBig(MyNum,(x-1)/2,x)==1L))+1L]
            LenFBase <- length(facBase)+1L
        }
        if (LenB==0L) {LenB <- 1000*ceiling(predict(LM2, DigCount))}
    } else {
        return("The number you've entered is currently too big for this algorithm!!")
    }

    SieveDist <- lapply(facBase, function(x) TonelliShanks(MyNum,x))
    SieveDist <- c(1L,SieveDist); SieveDist[[1]] <- c(SieveDist[[1]],1L); facBase <- c(2L,facBase)
    Lower <- -LenB; Upper <- LenB; LenB2 <- 2*LenB+1L; MyInterval <- Lower:Upper
    M <- MyInterval + SqrtInt ## Set that will be tested
    SqrDiff <- matrix(sub.bigz(pow.bigz(M,2),MyNum),nrow=length(M),ncol=1L)
    maxM <- max(MyInterval)
    LnFB <- log(facBase)

    ## N.B. primo uses 0.735, as his siever
    ## is more efficient than the one employed here
    if (fudge2==0L) {
        if (DigCount < 8) {
            fudge2 <- 0
        } else if (DigCount < 12) {
            fudge2 <- .7
        } else if (DigCount < 20) {
            fudge2 <- 1.3
        } else {
            fudge2 <- 1.6
        }
    }

    TheCut <- log10(maxM*sqrt(2*asNumeric(MyNum)))*fudge2
    myPrimes <- as.bigz(facBase)

    CoolList <- SieveLists(LenFBase, facBase, LenB2, SieveDist, MyInterval)
    GetMatrix <- SieverMod(LenFBase, facBase, LenB2, SqrDiff, M, CoolList, LnFB, TheCut, LenFBase+1L)

    if (GetMatrix[[4]]) {
        newmat <- GetMatrix[[1]]; NewSD <- GetMatrix[[2]]; M <- GetMatrix[[3]]
        NonSplitFacs <- which(abs(NewSD[,1L])>1L)
        newmat <- newmat[-NonSplitFacs, ]
        M <- M[-NonSplitFacs]
        lenM <- length(M)

        if (class(newmat) == "matrix") {
            if (nrow(newmat) > 0) {
                PosAns <- SolutionSearch(newmat,M,MyNum,myPrimes)
            } else {
                PosAns <- vector()
            }
        } else {
            newmat <- matrix(newmat, nrow = 1)
            PosAns <- vector()
        }
    } else {
        newmat <- matrix(integer(0),ncol=(LenFBase+1L))
        PosAns <- vector()
    }

    Atemp <- .mpfr2bigz(trunc(sqrt(sqrt(mpfr(2*MyNum))/maxM)))
    if (Atemp < max(facBase)) {Atemp <- max(facBase)}; myPoly <- 0L

    while (length(PosAns)==0L) {LegTest <- TRUE
        while (LegTest) {
            Atemp <- nextprime(Atemp)
            Legendre <- asNumeric(ExpBySquaringBig(MyNum,(Atemp-1L)/2,Atemp))
            if (Legendre == 1) {LegTest <- FALSE}
        }

        A <- Atemp^2
        Btemp <- max(TonelliShanks(MyNum, Atemp))
        B2 <- (Btemp + (MyNum - Btemp^2) * inv.bigz(2*Btemp,Atemp))%%A
        C <- as.bigz((B2^2 - MyNum)/A)
        myPoly <- myPoly + 1L

        polySieveD <- lapply(1:LenFBase, function(x) {
            AInv <- inv.bigz(A,facBase[x])
            asNumeric(c(((SieveDist[[x]][1]-B2)*AInv)%%facBase[x],
                        ((SieveDist[[x]][2]-B2)*AInv)%%facBase[x]))
        })

        M1 <- A*MyInterval + B2
        SqrDiff <- matrix(A*pow.bigz(MyInterval,2) + 2*B2*MyInterval + C,nrow=length(M1),ncol=1L)
        CoolList <- SieveLists(LenFBase, facBase, LenB2, polySieveD, MyInterval)
        myPrimes <- c(myPrimes,Atemp)
        LenP <- length(myPrimes)
        GetMatrix <- SieverMod(LenFBase, facBase, LenB2, SqrDiff, M1, CoolList, LnFB, TheCut, LenP+1L)

        if (GetMatrix[[4]]) {
            n2mat <- GetMatrix[[1]]; N2SD <- GetMatrix[[2]]; M1 <- GetMatrix[[3]]
            n2mat[,LenP+1L] <- rep(2L,nrow(N2SD))
            if (length(N2SD) > 0) {NonSplitFacs <- which(abs(N2SD[,1L])>1L)} else {NonSplitFacs <- LenB2}
            if (length(NonSplitFacs)<2*LenB) {
                M1 <- M1[-NonSplitFacs]; lenM1 <- length(M1)
                n2mat <- n2mat[-NonSplitFacs,]
                if (lenM1==1L) {n2mat <- matrix(n2mat,nrow=1)}
                if (ncol(newmat) < (LenP+1L)) {
                    numCol <- (LenP + 1L) - ncol(newmat)
                    newmat <-     cbind(newmat,matrix(rep(0L,numCol*nrow(newmat)),ncol=numCol))
                }
                newmat <- rbind(newmat,n2mat); lenM <- lenM+lenM1; M <- c(M,M1)
                if (class(newmat) == "matrix") {
                    if (nrow(newmat) > 0) {
                        PosAns <- SolutionSearch(newmat,M,MyNum,myPrimes)
                    }
                }
            }
        }
    }

    EndTime <- Sys.time()
    TotTime <- EndTime - BegTime
    print(format(TotTime))
    return(PosAns)
}
Run Code Online (Sandbox Code Playgroud)

使用旧QS算法

> library(gmp)
> library(Rmpfr)

> n3 <- prod(nextprime(urand.bigz(2, 40, 17)))
> system.time(t5 <- QuadSieveAll(n3,0.1,myps))
  user  system elapsed 
164.72    0.77  165.63 
> system.time(t6 <- factorize(n3))
user  system elapsed 
0.1     0.0     0.1 
> all(t5[sort.list(asNumeric(t5))]==t6[sort.list(asNumeric(t6))])
[1] TRUE
Run Code Online (Sandbox Code Playgroud)

使用New Muli-Polynomial QS算法

> QuadSieveMultiPolysAll(n3)
[1] "4.952 secs"
Big Integer ('bigz') object of length 2:
[1] 342086446909 483830424611

> n4 <- prod(nextprime(urand.bigz(2,50,5)))
> QuadSieveMultiPolysAll(n4)   ## With old algo, it took over 4 hours
[1] "1.131717 mins"
Big Integer ('bigz') object of length 2:
[1] 166543958545561 880194119571287

> n5 <- as.bigz("94968915845307373740134800567566911")   ## 35 digits
> QuadSieveMultiPolysAll(n5)
[1] "3.813167 mins"
Big Integer ('bigz') object of length 2:
[1] 216366620575959221 438925910071081891

> system.time(factorize(n5))   ## It appears we are reaching the limits of factorize
   user  system elapsed 
 131.97    0.00  131.98
Run Code Online (Sandbox Code Playgroud)

旁注:上面的数字n5是一个非常有趣的数字.检查出来这里

突破点!!!!

> n6 <- factorialZ(38) + 1L   ## 45 digits
> QuadSieveMultiPolysAll(n6)
[1] "22.79092 mins"
Big Integer ('bigz') object of length 2:
[1] 14029308060317546154181 37280713718589679646221

> system.time(factorize(n6))   ## Shut it down after 2 days of running
Run Code Online (Sandbox Code Playgroud)

最新的胜利(50位数)

> n9 <- prod(nextprime(urand.bigz(2,82,42)))
> QuadSieveMultiPolysAll(n9)
[1] "12.9297 hours"
Big Integer ('bigz') object of length 2:
[1] 2128750292720207278230259 4721136619794898059404993

## Based off of some crude test, factorize(n9) would take more than a year.
Run Code Online (Sandbox Code Playgroud)

应该注意的是,QS通常在较小数字上的性能不如Pollard的rho算法,并且随着数字变大,QS的功率开始变得明显.任何人,一如既往,我希望这会激励某人!干杯!


Jos*_*ood 7

主要更新

下面是我最新的R分解算法.它的速度更快,并向rle功能致敬.

算法3(更新)

library(gmp)
MyFactors <- function(MyN) {
    myRle <- function (x1) {
        n1 <- length(x1)
        y1 <- x1[-1L] != x1[-n1]
        i <- c(which(y1), n1)
        list(lengths = diff(c(0L, i)), values = x1[i], uni = sum(y1)+1L)
    }

    if (MyN==1L) return(MyN)
    else {
        pfacs <- myRle(factorize(MyN))
        unip <- pfacs$values
        pv <- pfacs$lengths
        n <- pfacs$uni
        myf <- unip[1L]^(0L:pv[1L])
        if (n > 1L) {
            for (j in 2L:n) {
                myf <- c(myf, do.call(c,lapply(unip[j]^(1L:pv[j]), function(x) x*myf)))
            }
        }
    }
    myf[order(asNumeric(myf))]  ## 'order' is faster than 'sort.list'
}
Run Code Online (Sandbox Code Playgroud)

以下是新基准(As Dirk Eddelbuettel 在这里说," 不能与经验争论."):

案例1(大素因子)

set.seed(100)
myList <- lapply(1:10^3, function(x) sample(10^6, 10^5))
benchmark(SortList=lapply(myList, function(x) sort.list(x)),
            OrderFun=lapply(myList, function(x) order(x)),
            replications=3,
            columns = c("test", "replications", "elapsed", "relative"))
      test replications elapsed relative
2 OrderFun            3   59.41    1.000
1 SortList            3   61.52    1.036

## The times are limited by "gmp::factorize" and since it relies on
## pseudo-random numbers, the times can vary (i.e. one pseudo random
## number may lead to a factorization faster than others). With this
## in mind, any differences less than a half of second
## (or so) should be viewed as the same. 
x <- pow.bigz(2,256)+1
system.time(z1 <- MyFactors(x))
user  system elapsed
14.94    0.00   14.94
system.time(z2 <- all_divisors(x))      ## system.time(factorize(x))
user  system elapsed                    ##  user  system elapsed
14.94    0.00   14.96                   ## 14.94    0.00   14.94 
all(z1==z2)
[1] TRUE

x <- as.bigz("12345678987654321321")
system.time(x1 <- MyFactors(x^2))
user  system elapsed 
20.66    0.02   20.71
system.time(x2 <- all_divisors(x^2))    ## system.time(factorize(x^2))
user  system elapsed                    ##  user  system elapsed
20.69    0.00   20.69                   ## 20.67    0.00   20.67
all(x1==x2)
[1] TRUE
Run Code Online (Sandbox Code Playgroud)

案例2(较小的数字)

set.seed(199)
samp <- sample(10^9, 10^5)
benchmark(JosephDivs=sapply(samp, MyFactors),
            DontasDivs=sapply(samp, all_divisors),
            OldDontas=sapply(samp, Oldall_divisors),
            replications=10,
            columns = c("test", "replications", "elapsed", "relative"),
            order = "relative")
        test replications elapsed relative
1 JosephDivs           10  470.31    1.000
2 DontasDivs           10  567.10    1.206  ## with vapply(..., USE.NAMES = FALSE)
3  OldDontas           10  626.19    1.331  ## with sapply
Run Code Online (Sandbox Code Playgroud)

案例3(完全彻底)

set.seed(97)
samp <- sample(10^6, 10^4)
benchmark(JosephDivs=sapply(samp, MyFactors),
            DontasDivs=sapply(samp, all_divisors),
            CottonDivs=sapply(samp, get_all_factors),
            ChaseDivs=sapply(samp, FUN),
            replications=5,
            columns = c("test", "replications", "elapsed", "relative"),
            order = "relative")
        test replications elapsed relative
1 JosephDivs            5   22.68    1.000
2 DontasDivs            5   27.66    1.220
3 CottonDivs            5  126.66    5.585
4  ChaseDivs            5  554.25   24.438
Run Code Online (Sandbox Code Playgroud)


原帖

Cotton先生的算法是一个非常好的R实现.蛮力方法只会让你到目前为止并且失败的数量很大(它也非常慢).我提供了三种能满足不同需求的算法.第一个(我在1月15日发布并且稍微更新的原始算法)是一个独立的分解算法,它提供了一种高效,准确且可以轻松翻译成其他语言的组合方法.当您需要快速分解数千个数字时,第二个算法更像是一个非常快速且非常有用的筛子.第三个是一个简短的(上面发布的)但功能强大的独立算法,对于任何小于2 ^ 70的数字都是优越的(我几乎从原始代码中删除了所有内容).我从Richie Cotton的使用中汲取灵感plyr::count函数(它激发了我编写自己的rle函数,具有非常相似的回报plyr::count),George Dontas处理琐碎案例(即if (n==1) return(1))的简洁方法,以及Zelazny7提供的关于bigz向量的问题的解决方案.

算法1(原创)

library(gmp)
factor2 <- function(MyN) {
    if (MyN == 1) return(1L)
    else {
        max_p_div <- factorize(MyN)
        prime_vec <- max_p_div <- max_p_div[sort.list(asNumeric(max_p_div))]
        my_factors <- powers <- as.bigz(vector())
        uni_p <- unique(prime_vec); maxp <- max(prime_vec)
        for (i in 1:length(uni_p)) {
            temp_size <- length(which(prime_vec == uni_p[i]))
            powers <- c(powers, pow.bigz(uni_p[i], 1:temp_size))
        }
        my_factors <- c(as.bigz(1L), my_factors, powers)
        temp_facs <- powers; r <- 2L
        temp_facs2 <- max_p_div2 <- as.bigz(vector())
        while (r <= length(uni_p)) {
            for (i in 1:length(temp_facs)) {
                a <- which(prime_vec >  max_p_div[i])
                temp <- mul.bigz(temp_facs[i], powers[a])
                temp_facs2 <- c(temp_facs2, temp)
                max_p_div2 <- c(max_p_div2, prime_vec[a])
            }
            my_sort <- sort.list(asNumeric(max_p_div2))
            temp_facs <- temp_facs2[my_sort]
            max_p_div <- max_p_div2[my_sort]
            my_factors <- c(my_factors, temp_facs)
            temp_facs2 <- max_p_div2 <- as.bigz(vector()); r <- r+1L
        }
    }
    my_factors[sort.list(asNumeric(my_factors))]
}
Run Code Online (Sandbox Code Playgroud)

算法2(筛子)

EfficientFactorList <- function(n) {
    MyFactsList <- lapply(1:n, function(x) 1)
    for (j in 2:n) {
        for (r in seq.int(j, n, j)) {MyFactsList[[r]] <- c(MyFactsList[[r]], j)}
    }; MyFactsList}
Run Code Online (Sandbox Code Playgroud)

它在不到2秒的时间内将每个数字的因子分解为1到100,000.为了让您了解此算法的效率,使用强力方法计算1到100,000的时间需要近3分钟.

system.time(t1 <- EfficientFactorList(10^5))
user  system elapsed 
1.04    0.00    1.05 
system.time(t2 <- sapply(1:10^5, MyFactors))
user  system elapsed 
39.21    0.00   39.23 
system.time(t3 <- sapply(1:10^5, all_divisors))
user  system elapsed 
49.03    0.02   49.05

TheTest <- sapply(1:10^5, function(x) all(t2[[x]]==t3[[x]]) && all(asNumeric(t2[[x]])==t1[[x]]) && all(asNumeric(t3[[x]])==t1[[x]]))
all(TheTest)
[1] TRUE
Run Code Online (Sandbox Code Playgroud)



最后的想法

Dontas先生关于分解大数字的原始评论让我思考,真正非常大的数字......比如大于2 ^ 200的数字.您将看到在此页面上选择的算法,它们都需要很长时间,因为大多数算法都依赖于gmp::factorize使用Pollard-Rho算法.从这个问题来看,该算法仅适用于小于2 ^ 70的数字.我目前正在研究自己的factorize算法,它将实现Quadratic Sieve,它应该将所有这些算法提升到一个新的水平.


Geo*_*tas 7

即使在非常大的数字(应该作为字符串传递)的情况下,以下方法也能提供正确的结果.它真的很快.

# TEST
# x <- as.bigz("12345678987654321")
# all_divisors(x)
# all_divisors(x*x)

# x <- pow.bigz(2,89)-1
# all_divisors(x)

library(gmp)
  options(scipen =30)

  sort_listz <- function(z) {
  #==========================
    z <- z[order(as.numeric(z))] # sort(z)
  } # function  sort_listz  


  mult_listz <- function(x,y) {
   do.call('c', lapply(y, function(i) i*x)) 
  } 


  all_divisors <- function(x) {
  #==========================  
  if (abs(x)<=1) return(x) 
  else {

    factorsz <- as.bigz(factorize(as.bigz(x))) # factorize returns up to
    # e.g. x= 12345678987654321  factors: 3 3 3 3 37 37 333667 333667

    factorsz <- sort_listz(factorsz) # vector of primes, sorted

    prime_factorsz <- unique(factorsz)
    #prime_ekt <- sapply(prime_factorsz, function(i) length( factorsz [factorsz==i]))
    prime_ekt <- vapply(prime_factorsz, function(i) sum(factorsz==i), integer(1), USE.NAMES=FALSE)
    spz <- vector() # keep all divisors 
    all <-1
    n <- length(prime_factorsz)
    for (i in 1:n) {
      pr <- prime_factorsz[i]
      pe <- prime_ekt[i]
      all <- all*(pe+1) #counts all divisors 

      prz <- as.bigz(pr)
      pse <- vector(mode="raw",length=pe+1) 
      pse <- c( as.bigz(1), prz)

      if (pe>1) {
        for (k in 2:pe) {
          prz <- prz*pr
          pse[k+1] <- prz
        } # for k
      } # if pe>1

      if (i>1) {
       spz <- mult_listz (spz, pse)         
      } else {
       spz <- pse;
      } # if i>1
    } #for n
    spz <- sort_listz (spz)

    return (spz)
  }  
  } # function  factors_all_divisors  

  #====================================
Run Code Online (Sandbox Code Playgroud)

精致版,非常快.代码仍然简单,易读和干净.

测试

#Test 4 (big prime factor)
x <- pow.bigz(2,256)+1 # = 1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321
 system.time(z2 <- all_divisors(x))
#   user  system elapsed 
 #  19.27    1.27   20.56


 #Test 5 (big prime factor)
x <- as.bigz("12345678987654321321") # = 3 * 19 * 216590859432531953

 system.time(x2 <- all_divisors(x^2))
#user  system elapsed 
 #25.65    0.00   25.67  
Run Code Online (Sandbox Code Playgroud)

  • George,考虑更换'mult_listz < - function(x,y){z < - x; for(j in 2:length(y)){#y [1] == 1 z < - c(z,y [j]*x)}#for j return(z)}`WITH`mult_listz < - function (x,y){z < - do.call('c',lapply(y,function(i)i*x))return(z)}`!(灵感来自约瑟夫的代码).它将提升其性能! (2认同)

Jos*_*ood 5

自从最初提出此问题以来,R语言已经发生了许多变化。在0.6-3numbers软件包的版本中,divisors包含了该函数,该函数对于获取数字的所有因子非常有用。它可以满足大多数用户的需求,但是,如果您正在寻找原始速度或使用更大的速度,则将需要另一种方法。我已经编写了两个新的程序包(部分是受这个问题的启发,我可能会补充),其中包含针对此类问题的高度优化的函数。第一个是RcppAlgos,另一个是bigIntegerAlgos

RcppAlgos

RcppAlgos包含两个函数,用于获得小于以下数字的除数2^53 - 1:(divisorsRcpp一个矢量化函数,用于快速获得许多数字的完整因式分解)和divisorsSieve(快速生成一个范围内的完整因式分解)。首先,我们使用divisorsRcpp以下公式分解许多随机数:

library(gmp)  ## for all_divisors by @GeorgeDontas
library(RcppAlgos)
library(numbers)
options(scipen = 999)
set.seed(42)
testSamp <- sample(10^10, 10)

## vectorized so you can pass the entire vector as an argument
testRcpp <- divisorsRcpp(testSamp)
testDontas <- lapply(testSamp, all_divisors)

identical(lapply(testDontas, as.numeric), testRcpp)
[1] TRUE
Run Code Online (Sandbox Code Playgroud)

现在,使用divisorsSieve

system.time(testSieve <- divisorsSieve(10^13, 10^13 + 10^5))
 user  system elapsed 
0.586   0.014   0.602 

system.time(testDontasSieve <- lapply((10^13):(10^13 + 10^5), all_divisors))
  user  system elapsed 
54.327   0.187  54.655 

identical(lapply(testDontasSieve, asNumeric), testSieve)
[1] TRUE
Run Code Online (Sandbox Code Playgroud)

divisorsRcppdivisorsSieve都是很好的功能,既灵活又高效,但是仅限于2^53 - 1

bigIntegerAlgos

bigIntegerAlgos程序包具有两个函数divisorsBigquadraticSieve,它们是为大量数字设计的。它们直接链接到C库gmp。对于divisorsBig,我们有:

library(bigIntegerAlgos)
## testSamp is defined above... N.B. divisorsBig is not quite as
## efficient as divisorsRcpp. This is so because divisorsRcpp
## can take advantage of more efficient data types..... it is
## still blazing fast!! See the benchmarks below for reference.
testBig <- divisorsBig(testSamp)

identical(testDontas, testBig)
[1] TRUE
Run Code Online (Sandbox Code Playgroud)

这是我原始帖子中定义的基准(NB MyFactorsdivisorsRcpp和代替divisorsBig)。

## Case 2
library(rbenchmark)
set.seed(199)
samp <- sample(10^9, 10^5)
benchmark(RcppAlgos=divisorsRcpp(samp),
          bigIntegerAlgos=divisorsBig(samp),
          DontasDivs=lapply(samp, all_divisors),
          replications=10,
          columns = c("test", "replications", "elapsed", "relative"),
          order = "relative")

             test replications elapsed relative
1       RcppAlgos           10   8.021    1.000
2 bigIntegerAlgos           10  15.246    1.901
3      DontasDivs           10 400.284   49.905

## Case 3
set.seed(97)
samp <- sample(10^6, 10^4)
benchmark(RcppAlgos=divisorsRcpp(samp),
          bigIntegerAlgos=divisorsBig(samp),
          numbers=lapply(samp, divisors),      ## From the numbers package
          DontasDivs=lapply(samp, all_divisors),
          CottonDivs=lapply(samp, get_all_factors),
          ChaseDivs=lapply(samp, FUN),
          replications=5,
          columns = c("test", "replications", "elapsed", "relative"),
          order = "relative")

             test replications elapsed relative
1       RcppAlgos            5   0.098    1.000
2 bigIntegerAlgos            5   0.330    3.367
3         numbers            5  11.613  118.500
4      DontasDivs            5  16.093  164.214
5      CottonDivs            5  60.428  616.612
6       ChaseDivs            5 342.608 3496.000
Run Code Online (Sandbox Code Playgroud)

下一个基准测试将演示该divisorsBig函数中基础算法的真正功能。要分解的数字是的幂10,因此几乎可以完全忽略素数分解步骤(例如,我机器上的system.time(factorize(pow.bigz(10,30)))寄存器0)。因此,时间上的差异仅归因于主要因子可以多快地结合起来以产生所有因子。

library(microbenchmark)
powTen <- pow.bigz(10,30)
microbenchmark(divisorsBig(powTen), all_divisors(powTen), unit = "relative")
Unit: relative
                expr      min      lq     mean   median       uq      max neval
 divisorsBig(powTen)  1.00000  1.0000  1.00000  1.00000  1.00000  1.00000   100
all_divisors(powTen) 32.35507 33.3054 33.28786 33.31253 32.11571 40.39236   100

## Negative numbers show an even greater increase in efficiency
negPowTen <- powTen * -1
microbenchmark(divisorsBig(negPowTen), all_divisors(negPowTen), unit = "relative")
Unit: relative
                   expr      min       lq    mean   median       uq      max neval
 divisorsBig(negPowTen)  1.00000  1.00000  1.0000  1.00000  1.00000  1.00000   100
all_divisors(negPowTen) 46.42795 46.22408 43.7964 47.93228 45.33406 26.64657   100
Run Code Online (Sandbox Code Playgroud)


quadraticSieve

我将给您做两次关于的演示quadraticSieve

n5 <- as.bigz("94968915845307373740134800567566911")
system.time(print(quadraticSieve(n5)))
Big Integer ('bigz') object of length 2:
[1] 216366620575959221 438925910071081891
 user  system elapsed 
4.154   0.021   4.175   ## original time was 3.813167 mins or 228.8 seconds ~ 50x slower

n9 <- prod(nextprime(urand.bigz(2, 82, 42)))
system.time(print(quadraticSieve(n9)))
Big Integer ('bigz') object of length 2:
[1] 2128750292720207278230259 4721136619794898059404993
    user   system  elapsed 
1010.404    2.715 1013.184   ## original time was 12.9297 hours or 46,547 seconds ~ 46x slower
Run Code Online (Sandbox Code Playgroud)

如您所见,quadraticSieve它比原始速度快得多QuadSieveMultiPolysAll,但是仍然有很多工作要做。目前正在进行研究以改进此功能,而当前目标是n9在一分钟内将其分解。还有矢量化的计划quadraticSieve,以及综合divisorsBig使用quadraticSieve,因为目前,它被限制到相同的算法gmp::factorize利用。