根据事件对数字进行分组?

Xeo*_*oss 34 algorithm go graph-algorithm

鉴于以下三个数字序列,我想弄清楚如何对数字进行分组以找到它们之间最密切的关系.

1,2,3,4
4,3,5
2,1,3
...
Run Code Online (Sandbox Code Playgroud)

我不确定我正在寻找的算法是什么,但我们可以看到与某些数字的关系比与其他数字的关系更强.

这些数字一起出现两次:

1 & 2
1 & 3
2 & 3
3 & 4
Run Code Online (Sandbox Code Playgroud)

一起一次:

1 & 4
2 & 4
3 & 5
4 & 5
Run Code Online (Sandbox Code Playgroud)

因此,例如,我们可以看到必须存在关系,1, 2, & 3因为它们一起出现至少两次.你也可以说3 & 4它们密切相关,因为它们也会出现两次.但是,算法可能会选择[1,2,3](过度[3,4]),因为它是一个更大的分组(更具包容性).

如果我们将最常用的数字组合在一个组中,我们可以形成以下任何分组:

[1,2,3] & [4,5]
[1,2]   & [3,4]   & [5]
[1,2]   & [3,4,5]
[1,2]   & [3,4]   & [5]
Run Code Online (Sandbox Code Playgroud)

如果允许重复,您甚至可以得到以下组:

[1,2,3,4] [1,2,3] [3,4] [5]
Run Code Online (Sandbox Code Playgroud)

我不能说哪种分组最"正确",但所有这四种组合都找到了不同的方法来对这些数字进行半正确分组.我不是在寻找一个特定的分组 - 只是一个通用的集群算法,它运行得相当好并且易于理解.

我确信还有很多其他方法可以使用事件计数来对它们进行分组.什么是这些良好的基础分组算法?Go,Javascript或PHP中的样本是首选.

Mic*_*zlo 31

你的三个序列中的每一个都可以被理解为一个集团多重图.在一个集团中,每个顶点都连接到每个其他顶点.

下图表示您的样本案例,每个集团的边缘分别为红色,蓝色和绿色.

多图有五个顶点和三个派系

如您所示,我们可以根据它们之间的边数对顶点对进行分类.在图示中,我们可以看到四对顶点各自由两条边连接,另外四对顶点各自连接一条边.

我们可以根据它们出现的派系数量继续对顶点进行分类.在某种意义上,我们根据它们的连通性对顶点进行排序.出现在k派系中的顶点可以被认为与出现在k派系中的其他顶点连接的程度相同.在图像中,我们看到三组顶点:顶点3出现在三个派系中; 顶点1,2和4分别出现在两个派系中; 顶点5出现在一个集团中.

下面的Go程序计算边缘分类以及顶点分类.程序的输入在第一行包含顶点数n和派系数m.我们假设顶点的编号从1到n.每个后续m输入行是一个以空格分隔的属于一个集团的顶点列表.因此,问题中给出的问题实例由此输入表示:

5 3
1 2 3 4
4 3 5
2 1 3
Run Code Online (Sandbox Code Playgroud)

相应的输出是:

Number of edges between pairs of vertices:
    2 edges: (1, 2) (1, 3) (2, 3) (3, 4)
    1 edge:  (1, 4) (2, 4) (3, 5) (4, 5)

Number of cliques in which a vertex appears:
    3 cliques: 3
    2 cliques: 1 2 4
    1 clique:  5
Run Code Online (Sandbox Code Playgroud)

这是Go计划:

package main

import (
        "bufio"
        "fmt"
        "os"
        "strconv"
        "strings"
)

func main() {
        // Set up input and output.
        reader := bufio.NewReader(os.Stdin)
        writer := bufio.NewWriter(os.Stdout)
        defer writer.Flush()

        // Get the number of vertices and number of cliques from the first line.
        line, err := reader.ReadString('\n')
        if err != nil {
                fmt.Fprintf(os.Stderr, "Error reading first line: %s\n", err)
                return
        }
        var numVertices, numCliques int
        numScanned, err := fmt.Sscanf(line, "%d %d", &numVertices, &numCliques)
        if numScanned != 2 || err != nil {
                fmt.Fprintf(os.Stderr, "Error parsing input parameters: %s\n", err)   
                return
        }

        // Initialize the edge counts and vertex counts.
        edgeCounts := make([][]int, numVertices+1)
        for u := 1; u <= numVertices; u++ {
                edgeCounts[u] = make([]int, numVertices+1)
        }
        vertexCounts := make([]int, numVertices+1)

        // Read each clique and update the edge counts.
        for c := 0; c < numCliques; c++ {
                line, err = reader.ReadString('\n')
                if err != nil {
                        fmt.Fprintf(os.Stderr, "Error reading clique: %s\n", err)
                        return
                }
                tokens := strings.Split(strings.TrimSpace(line), " ")
                clique := make([]int, len(tokens))
                for i, token := range tokens {
                        u, err := strconv.Atoi(token)
                        if err != nil {
                                fmt.Fprintf(os.Stderr, "Atoi error: %s\n", err)
                                return
                        }
                        vertexCounts[u]++
                        clique[i] = u
                        for j := 0; j < i; j++ {
                                v := clique[j]
                                edgeCounts[u][v]++
                                edgeCounts[v][u]++
                        }
                }
        }

        // Compute the number of edges between each pair of vertices.
        count2edges := make([][][]int, numCliques+1)
        for u := 1; u < numVertices; u++ {
                for v := u + 1; v <= numVertices; v++ {
                        count := edgeCounts[u][v]
                        count2edges[count] = append(count2edges[count],
                                []int{u, v})
                }
        }
        writer.WriteString("Number of edges between pairs of vertices:\n")
        for count := numCliques; count >= 1; count-- {
                edges := count2edges[count]
                if len(edges) == 0 {
                        continue
                }
                label := "edge"
                if count > 1 {
                        label += "s:"
                } else {
                        label += ": "
                }
                writer.WriteString(fmt.Sprintf("%5d %s", count, label))
                for _, edge := range edges {
                        writer.WriteString(fmt.Sprintf(" (%d, %d)",
                                edge[0], edge[1]))
                }
                writer.WriteString("\n")
        }

        // Group vertices according to the number of clique memberships.
        count2vertices := make([][]int, numCliques+1)
        for u := 1; u <= numVertices; u++ {
                count := vertexCounts[u]
                count2vertices[count] = append(count2vertices[count], u)
        }
        writer.WriteString("\nNumber of cliques in which a vertex appears:\n")
        for count := numCliques; count >= 1; count-- {
                vertices := count2vertices[count]
                if len(vertices) == 0 {
                        continue
                }
                label := "clique"
                if count > 1 {
                        label += "s:"
                } else {
                        label += ": "
                }
                writer.WriteString(fmt.Sprintf("%5d %s", count, label))
                for _, u := range vertices {
                        writer.WriteString(fmt.Sprintf(" %d", u))
                }
                writer.WriteString("\n")
        }
}
Run Code Online (Sandbox Code Playgroud)


Uve*_*tel 10

正如已经提到的那样,这是关于集团的.如果你想要准确的答案,你将面临NP-complete的最大集团问题.因此,只有符号(数字)的字母表具有合理的大小,以下所有内容才有意义.在这种情况下,伪代码中的最大Clique问题的前向,非优化算法将是

Function Main
    Cm ? ?                   // the maximum clique
    Clique(?,V)              // V vertices set
    return Cm
End function Main

Function Clique(set C, set P) // C the current clique, P candidat set
    if (|C| > |Cm|) then
        Cm ? C
    End if
    if (|C|+|P|>|Cm|)then
        for all p ? P in predetermined order, do
            P ? P \ {p}
            Cp ?C ? {p}
            Pp ?P ? N(p)        //N(p) set of the vertices adjacent to p
            Clique(Cp,Pp)
        End for
    End if
End function Clique
Run Code Online (Sandbox Code Playgroud)

因为Go是我选择的语言,这里是实现

package main

import (
    "bufio"
    "fmt"
    "sort"
    "strconv"
    "strings"
)

var adjmatrix map[int]map[int]int = make(map[int]map[int]int)
var Cm []int = make([]int, 0)
var frequency int


//For filter
type resoult [][]int
var res resoult
var filter map[int]bool = make(map[int]bool)
var bf int
//For filter


//That's for sorting
func (r resoult) Less(i, j int) bool {
    return len(r[i]) > len(r[j])
}

func (r resoult) Swap(i, j int) {
    r[i], r[j] = r[j], r[i]
}

func (r resoult) Len() int {
    return len(r)
}
//That's for sorting


//Work done here
func Clique(C []int, P map[int]bool) {
    if len(C) >= len(Cm) {

        Cm = make([]int, len(C))
        copy(Cm, C)
    }
    if len(C)+len(P) >= len(Cm) {
        for k, _ := range P {
            delete(P, k)
            Cp := make([]int, len(C)+1)
            copy(Cp, append(C, k))
            Pp := make(map[int]bool)
            for n, m := range adjmatrix[k] {
                _, ok := P[n]
                if ok && m >= frequency {
                    Pp[n] = true
                }
            }
            Clique(Cp, Pp)

            res = append(res, Cp)
            //Cleanup resoult
            bf := 0
            for _, v := range Cp {
                bf += 1 << uint(v)
            }
            _, ok := filter[bf]
            if !ok {
                filter[bf] = true
                res = append(res, Cp)
            }
            //Cleanup resoult
        }
    }
}
//Work done here

func main() {
    var toks []string
    var numbers []int
    var number int


//Input parsing
    StrReader := strings.NewReader(`1,2,3
4,3,5
4,1,6
4,2,7
4,1,7
2,1,3
5,1,2
3,6`)
    scanner := bufio.NewScanner(StrReader)
    for scanner.Scan() {
        toks = strings.Split(scanner.Text(), ",")
        numbers = []int{}
        for _, v := range toks {
            number, _ = strconv.Atoi(v)
            numbers = append(numbers, number)

        }
        for k, v := range numbers {
            for _, m := range numbers[k:] {
                _, ok := adjmatrix[v]
                if !ok {
                    adjmatrix[v] = make(map[int]int)
                }
                _, ok = adjmatrix[m]
                if !ok {
                    adjmatrix[m] = make(map[int]int)
                }
                if m != v {
                    adjmatrix[v][m]++
                    adjmatrix[m][v]++
                    if adjmatrix[v][m] > frequency {
                        frequency = adjmatrix[v][m]
                    }
                }

            }
        }
    }
    //Input parsing

    P1 := make(map[int]bool)


    //Iterating for frequency of appearance in group
    for ; frequency > 0; frequency-- {
        for k, _ := range adjmatrix {
            P1[k] = true
        }
        Cm = make([]int, 0)
        res = make(resoult, 0)
        Clique(make([]int, 0), P1)
        sort.Sort(res)
        fmt.Print(frequency, "x-times ", res, " ")
    }
    //Iterating for frequency of appearing together
}
Run Code Online (Sandbox Code Playgroud)

在这里你可以看到它的工作原理https://play.golang.org/p/ZiJfH4Q6GJ并使用输入数据.但再一次,这种方法适用于合理大小的字母表(以及任何大小的输入数据).