Find a combination of sentences from a list of strings - matching the combined frequency table to a target frequency table

Question

Big*_*ief 8 c++ string algorithm computer-science data-structures

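The problem is explained in the post below.

I have a list of sentences, for example a list of 1000 sentences.

I would like to find a combination of sentences that matches, or comes closest to matching, a certain frequency table:

[a:100, b:80, c:90, d:150, e:100, f:100, g:47, h:10 ..... z:900]

I thought about finding all possible combinations from the sentence list using a combinations routine (so from comb(1000, 1); to comb(1000, 1000);) and then comparing every combination with the frequency table so that the distance is minimal. That is, sum the frequency tables of the sentences in a candidate combination, compare that sum with the target, and record the combination with the smallest difference from the target. There may be several combinations that match equally closely.

The problem is that computing all combinations takes far too long, apparently a couple of days. Is there a known algorithm that solves this efficiently, ideally within a few minutes at most?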
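To make the brute-force idea above concrete, here is a minimal sketch of an exhaustive search over all non-empty subsets. It is only an illustration under stated assumptions: the names FreqTable, l1_distance and best_subset_exhaustive are hypothetical, and the distance (sum of absolute per-letter differences) is an assumed choice, since the post does not fix a metric. With n sentences there are 2^n - 1 non-empty subsets, so this is only usable for small n; for n = 1000 that is roughly 10^301 subsets, which is why the full enumeration cannot finish.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <vector>

// Counts for the 26 letters a..z (hypothetical helper type).
using FreqTable = std::array<int, 26>;

// Assumed metric: sum of absolute per-letter differences (L1 distance).
long long l1_distance(const FreqTable& a, const FreqTable& b) {
    long long d = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        d += std::llabs(static_cast<long long>(a[k]) - b[k]);
    }
    return d;
}

// Tries every non-empty subset of `tables` and returns a bitmask in which
// bit i is set when sentence i belongs to the closest subset found.
// On ties the first subset encountered (lowest mask) is kept.
std::uint32_t best_subset_exhaustive(const std::vector<FreqTable>& tables,
                                     const FreqTable& target) {
    const std::size_t n = tables.size();
    if (n > 20) {
        throw std::invalid_argument("exhaustive search is limited to 20 sentences");
    }
    std::uint32_t best_mask = 0;
    long long best = std::numeric_limits<long long>::max();
    const std::uint32_t limit = std::uint32_t{1} << n;
    for (std::uint32_t mask = 1; mask < limit; ++mask) {
        // Sum the frequency tables of the sentences selected by this mask.
        FreqTable sum{};
        for (std::size_t i = 0; i < n; ++i) {
            if (mask & (std::uint32_t{1} << i)) {
                for (std::size_t k = 0; k < sum.size(); ++k) sum[k] += tables[i][k];
            }
        }
        const long long d = l1_distance(sum, target);
        if (d < best) {
            best = d;
            best_mask = mask;
        }
    }
    return best_mask;
}
```

Bit i of the returned mask marks sentence i as selected. The work doubles with every added sentence, so anything beyond a few dozen sentences needs a heuristic or approximation rather than full enumeration.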

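Input sentences:

More RVs were seen in the parking lot than at the campground.

She did her best to help him.
There have been days when I wished to be separated from my body, but today wasn't one of those days.

The swirled lollipop had issues with the rock candy.

The two walked down the slot canyon, oblivious to the sound of thunder in the distance.

Acres of almond trees lined the interstate highway, which complemented the crazy driving nuts.

He is no James Bond; his name is Roger Moore.

The tumbleweed refused to tumble but was more than willing to prance.

She was disgusted he couldn't tell the difference between lemonade and limeade.

He didn't want to go to the dentist, yet he went anyway.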

Find the combination of sentences that comes closest to the following frequency table:

[a:5, b:5, c:5, d:5, e:5, f:5, g:5, h:5 ..... z:5]

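Example:

The frequency table of the sixth sentence

He is no James Bond; his name is Roger Moore.

is [a:2, e:5, g:1, h:1, i:3, j:1, m:3, n:3, o:5, r:3, s:4]

The frequency table treats upper and lower case as equal and excludes special characters.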
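The counting rule described above (case-insensitive, special characters ignored) can be sketched as follows. This is an illustrative helper, not code from the post: the names FreqTable and letter_counts are hypothetical, and it assumes only the 26 ASCII letters are counted.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Counts for the 26 letters a..z; index 0 is 'a', index 25 is 'z'.
using FreqTable = std::array<int, 26>;

// Builds a case-insensitive letter frequency table for one sentence.
// Anything outside A-Z / a-z (spaces, digits, punctuation, non-ASCII bytes)
// is skipped.
FreqTable letter_counts(const std::string& sentence) {
    FreqTable table{};  // value-initialized: every count starts at zero
    for (unsigned char ch : sentence) {
        if (ch >= 'A' && ch <= 'Z') {
            ch = static_cast<unsigned char>(ch - 'A' + 'a');  // fold to lower case
        }
        if (ch >= 'a' && ch <= 'z') {
            ++table[static_cast<std::size_t>(ch - 'a')];
        }
    }
    return table;
}
```

For instance, letter_counts("Abba, c!") yields 2 for a, 2 for b, 1 for c and 0 for every other letter. Computing each sentence's table once up front means a candidate combination can be scored by adding 26-element arrays instead of rescanning text.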

Answer 1

Maj*_*aba 5

This would be solved as soon as someone finds a combination of the following sentences that contains 3c, 3a, 3b, 3d or 30c, 30a, 30b, 30d, within 5% above or below.

S1: aaaaaaaaaaaaaaaaaa bbbbbb c
S2: aaaaaaaa bbbbbbbb d
S3: aaaaaaaaaaa bbbbbbbbb c dd
S4: aaaaaaaaaa bbbbbbbb

Be realistic. There is no solution: not an NP-hard one, not an NP-complete one, simply no solution. The number of occurrences of some letters in a sentence (for example vowels like i or a) is not equal to that of others (like x or w). We can only find best matches, like the code provided here, or change the requirement. I tried to solve this with the knapsack algorithm, Euclidean distance, and standard deviation, but none of them gives such an answer, since there is no sentence in which every letter occurs the same number of times.
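One way to look for a "best match" in a short time is a greedy heuristic. The sketch below is an assumption-laden illustration, not the code this answer refers to: the names FreqTable, squared_distance and greedy_best_match are hypothetical, squared Euclidean distance is an assumed scoring choice, and the result is not guaranteed to be the closest possible combination. It repeatedly adds the unused sentence that lowers the distance to the target the most and stops when no sentence improves it.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Counts for the 26 letters a..z (hypothetical helper type).
using FreqTable = std::array<int, 26>;

// Squared Euclidean distance between two frequency tables.
long long squared_distance(const FreqTable& a, const FreqTable& b) {
    long long d = 0;
    for (std::size_t k = 0; k < a.size(); ++k) {
        const long long diff = static_cast<long long>(a[k]) - b[k];
        d += diff * diff;
    }
    return d;
}

// Greedy heuristic: returns the indices of the chosen sentences in the
// order they were picked. May return an empty list if every sentence
// would move the running sum further from the target.
std::vector<std::size_t> greedy_best_match(const std::vector<FreqTable>& tables,
                                           const FreqTable& target) {
    std::vector<std::size_t> chosen;
    std::vector<bool> used(tables.size(), false);
    FreqTable sum{};
    long long best = squared_distance(sum, target);
    while (true) {
        std::size_t pick = tables.size();  // sentinel: nothing picked yet
        for (std::size_t i = 0; i < tables.size(); ++i) {
            if (used[i]) continue;
            FreqTable trial = sum;
            for (std::size_t k = 0; k < trial.size(); ++k) trial[k] += tables[i][k];
            const long long d = squared_distance(trial, target);
            if (d < best) {
                best = d;
                pick = i;
            }
        }
        if (pick == tables.size()) break;  // no sentence improves the match
        used[pick] = true;
        chosen.push_back(pick);
        for (std::size_t k = 0; k < sum.size(); ++k) sum[k] += tables[pick][k];
    }
    return chosen;
}
```

Each round scans every unused sentence once and there are at most n rounds, so the cost grows roughly with n squared rather than 2^n. The trade-off is that an early pick is never undone, so the returned combination can be worse than the true optimum.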

Archived: 4 years, 2 months ago
Views: 755
Last recorded: 4 years, 2 months ago