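Build an ASCII chart of the most commonly used words in a given text

Chr*_*heD 156 language-agnostic code-golf

Challenge:

Build an ASCII chart of the most commonly used words in a given text.

Rules: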

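  • Only accept a-zA-Z (alphabetic characters) as part of a word.
  • Ignore casing (She == she for our purpose).
  • Ignore the following words (quite arbitrary, I know): the, and, of, to, a, i, it, in, or, is
  • Clarification: considering don't: this would be taken as 2 different "words" in the ranges a-zA-Z: (don and t).

  • Optionally (it's too late to be formally changing the specification now) you may choose to drop all single-letter "words" (this could potentially shorten the ignore list too).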
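A minimal Python sketch of these word rules, assuming the whole input is already in a string: lowercase it, take maximal runs of a-z, and drop the ten ignored words, so that don't comes out as don and t. The names STOP, tokenize and top_words are illustrative, not part of the challenge.

```python
import re
from collections import Counter

# The ten words the challenge says to ignore.
STOP = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}

def tokenize(text):
    """Lowercase the text, keep maximal runs of a-z, and drop the ignored words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP]

def top_words(text, n=22):
    """(word, count) pairs for the n most frequent remaining words."""
    return Counter(tokenize(text)).most_common(n)
```

For example, tokenize("Don't tell ME: she said it is OK") gives ['don', 't', 'tell', 'me', 'she', 'said', 'ok']. Counter.most_common orders equal counts by first occurrence, a tie rule the challenge leaves open.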

Parse a given text (read a file specified via command-line arguments or piped in; assume us-ascii) and build us a word frequency chart with the following characteristics:

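  • Display the chart (also see the example below) for the 22 most common words (ordered by descending frequency).
  • The bar width represents the number of occurrences (frequency) of the word (proportionally). Append one space and print the word.
  • Make sure these bars (plus space-word-space) always fit: bar + [space] + word + [space] should always be <= 80 characters (make sure you account for possibly differing bar and word lengths: e.g., the second most common word could be a lot longer than the first while not differing much in frequency). Maximize bar width within these constraints, and scale the bars appropriately (according to the frequencies they represent).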
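One way to read the fitting rule, sketched in Python under the assumption that the input is the list of (word, count) pairs for the top 22 words: each line is '|' + bar + '| ' + word + ' ', i.e. bar + len(word) + 4 characters, so word j allows at most (80 - 4 - len(word_j)) / count_j underscores per occurrence. The minimum of that ratio over all 22 words is the largest scale that keeps every line within 80 characters. fractions.Fraction is used only to keep the truncation exact; the function names are illustrative.

```python
from fractions import Fraction

def bar_lengths(top, width=80):
    """Bar lengths for (word, count) pairs so every '|bar| word ' line fits in width."""
    overhead = 4  # the two '|' characters plus the spaces before and after the word
    # The tightest word (small room, high count) decides the common scale.
    scale = min(Fraction(width - overhead - len(w), c) for w, c in top)
    return [int(c * scale) for _, c in top]  # truncate toward zero

def render(top, width=80):
    """Draw the chart: a closing top line, then one '|___| word ' row per word."""
    bars = bar_lengths(top, width)
    lines = [" " + "_" * bars[0]]
    lines += ["|" + "_" * n + "| " + w + " " for n, (w, _) in zip(bars, top)]
    return "\n".join(lines)
```

With the superlongstringstring example, the limit comes from the long second word rather than the first: bar_lengths([("she", 553), ("superlongstringstring", 481)]) returns [63, 55].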

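An example:

The text for the example can be found here (Alice's Adventures in Wonderland, by Lewis Carroll).

This specific text would yield the following chart: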

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|____________________________________________________| alice 
|______________________________________________| was 
|__________________________________________| that 
|___________________________________| as 
|_______________________________| her 
|____________________________| with 
|____________________________| at 
|___________________________| s 
|___________________________| t 
|_________________________| on 
|_________________________| all 
|______________________| this 
|______________________| for 
|______________________| had 
|_____________________| but 
|____________________| be 
|____________________| not 
|___________________| they 
|__________________| so 


For your information: these are the frequencies the above chart is built upon:

[('she', 553), ('you', 481), ('said', 462), ('alice', 403), ('was', 358),
 ('that', 330), ('as', 274), ('her', 248), ('with', 227), ('at', 227),
 ('s', 219), ('t', 218), ('on', 204), ('all', 200), ('this', 181),
 ('for', 179), ('had', 178), ('but', 175), ('be', 167), ('not', 166),
 ('they', 155), ('so', 152)]

Second example (to check whether you implemented the complete spec): replace every occurrence of you in the linked Alice in Wonderland file with superlongstringstring:

 ________________________________________________________________
|________________________________________________________________| she 
|_______________________________________________________| superlongstringstring 
|_____________________________________________________| said 
|______________________________________________| alice 
|________________________________________| was 
|_____________________________________| that 
|______________________________| as 
|___________________________| her 
|_________________________| with 
|_________________________| at 
|________________________| s 
|________________________| t 
|______________________| on 
|_____________________| all 
|___________________| this 
|___________________| for 
|___________________| had 
|__________________| but 
|_________________| be 
|_________________| not 
|________________| they 
|________________| so 

Winner:

Shortest solution (by character count, per language). Have fun!


Edit: Table summarizing the results so far (2012-02-15) (originally added by user Nas Banov):

Language          Relaxed  Strict
=========         =======  ======
GolfScript          130     143
Perl                        185
Windows PowerShell  148     199
Mathematica                 199
Ruby                185     205
Unix Toolchain      194     228
Python              183     243
Clojure                     282
Scala                       311
Haskell                     333
Awk                         336
R                   298
Javascript          304     354
Groovy              321
Matlab                      404
C#                          422
Smalltalk           386
PHP                 450
F#                          452
TSQL                483     507

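The numbers represent the length of the shortest solution in a specific language. "Strict" refers to a solution that implements the spec completely (draws |____| bars, closes the first bar on top with a ____ line, accounts for the possibility of long words with high frequency, etc.). "Relaxed" means some liberties were taken to shorten the solution.

Only solutions shorter than 500 characters are included. The list of languages is sorted by the length of the "strict" solution. 'Unix Toolchain' is used to signify various solutions that use traditional *nix shell plus a mix of tools (like grep, tr, sort, uniq, head, perl, awk).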

Joe*_*e Z 123

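LabVIEW 51 nodes, 5 structures, 10 diagrams

Teaching the elephant to tap-dance is never pretty. I'll, ah, skip the character count.

[image: LabVIEW code]

[image: result]

The program flows from left to right:

[image: LabVIEW code explained]

  • Best code-golf answer I've ever seen. +1 for thinking outside the box! (19 upvotes)
  • It's not worth it (10 upvotes)
  • LabVIEW is very happy in its hardware control and measurement niche, but really pretty awful for string manipulation. (4 upvotes)
  • No 3D yet?... :D (2 upvotes)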

Ven*_*ero 42

Ruby 1.9, 185 chars

(heavily based on the other Ruby solutions)

w=($<.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort[0,22]
k,l=w[0]
puts [?\s+?_*m=76-l.size,w.map{|f,x|?|+?_*(f*m/k)+"| "+x}]

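Instead of using any command-line switches like the other solutions, you can simply pass the file name as an argument (i.e. ruby1.9 wordfrequency.rb Alice.txt).

Since I'm using character literals here, this solution only works in Ruby 1.9.

Edit: Replaced semicolons with line breaks for "readability". :P

Edit 2: Shtééf pointed out I forgot the trailing space - fixed that.

Edit 3: Removed the trailing space again ;)

  • This looks extremely maintainable. (2 upvotes)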

Nab*_*abb 39

GolfScript, 177 175 173 167 164 163 144 131 130 chars

Slow - 3 minutes for the sample text (130)

{32|.123%97<n@if}%]''*n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*' '\@{"
|"\~1*2/0*'| '@}/

Explanation:

{           #loop through all characters
 32|.       #convert to lowercase and duplicate
 123%97<    #determine if is a letter
 n@if       #return either the letter or a newline
}%          #return an array (of ints)
]''*        #convert array to a string with magic
n%          #split on newline, removing blanks (stack is an array of words now)
"oftoitinorisa"   #push this string
2/          #split into groups of two, i.e. ["of" "to" "it" "in" "or" "is" "a"]
-           #remove any occurrences from the text
"theandi"3/-#remove "the", "and", and "i"
$           #sort the array of words
(1@         #takes the first word in the array, pushes a 1, reorders stack
            #the 1 is the current number of occurrences of the first word
{           #loop through the array
 .3$>1{;)}if#increment the count or push the next word and a 1
}/
]2/         #gather stack into an array and split into groups of 2
{~~\;}$     #sort by the latter element - the count of occurrences of each word
22<         #take the first 22 elements
.0=~:2;     #store the highest count
,76\-:1     #store the length of the first line
'_':0*' '\@ #make the first line
{           #loop through each word
"
|"\~        #start drawing the bar
1*2/0       #divide by zero
*'| '@      #finish drawing the bar
}/
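The heart of the program is the counting loop ({.3$>1{;)}if}/): because the word list is sorted first, equal words are adjacent, so a single pass that either bumps the current count or starts a new (word, 1) pair is enough. A rough Python rendering of that idea, not a character-for-character translation; the function names are illustrative:

```python
def run_length_counts(words):
    """Count words by sorting first, then merging adjacent equal entries."""
    pairs = []  # [word, count] entries, in sorted word order
    for w in sorted(words):
        if pairs and pairs[-1][0] == w:
            pairs[-1][1] += 1      # same word as before: increment its count
        else:
            pairs.append([w, 1])   # new word: start a fresh count of 1
    return [(w, c) for w, c in pairs]

def top_by_count(words, n=22):
    """Most frequent first; the stable sort keeps ties in alphabetical order."""
    return sorted(run_length_counts(words), key=lambda p: -p[1])[:n]
```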

"Correct" (hopefully). (143)

{32|.123%97<n@if}%]''*n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<..0=1=:^;{~76@,-^*\/}%$0=:1'_':0*' '\@{"
|"\~1*^/0*'| '@}/

Less slow - half a minute. (162)

'"'/' ':S*n/S*'"#{%q
'\+"
.downcase.tr('^a-z','
')}\""+~n%"oftoitinorisa"2/-"theandi"3/-$(1@{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*S\@{"
|"\~1*2/0*'| '@}/

Output is visible in the revision logs.

  • "Divide by zero"... GolfScript allows that? (5 upvotes)
  • About GolfScript: http://www.golfscript.com/golfscript/ (2 upvotes)
  • Not correct, because if the second word is really long it will wrap onto the next line. (2 upvotes)

(anonymous user) 35

206

shell,grep,tr,grep,sort,uniq,sort,head,perl

~ % wc -c wfg
209 wfg
~ % cat wfg
egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|of|to|a|i|it|in|or|is'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
~ % # usage:
~ % sh wfg < 11.txt

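Hmm, just saw the above: sort -nr -> sort -n and then head -> tail => 208 :)
update2: hmm, of course the above is silly, as the output would then be upside down. So, 209.
update3: optimized the exclusion regex -> 206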

egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|o[fr]|to|a|i[tns]?'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)=@F;$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
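The update3 regex packs the stop list into character classes: o[fr] covers of and or, and i[tns]? covers i, it, in and is. Since each input line to that egrep is a single word and -w only accepts whole-word matches, the two lists can be compared on whole tokens. A small Python check of that equivalence, where re.fullmatch stands in for egrep -w (an approximation that holds for one-word lines):

```python
import re

FULL = re.compile(r"the|and|of|to|a|i|it|in|or|is")
SHORT = re.compile(r"the|and|o[fr]|to|a|i[tns]?")

def same_verdict(word):
    """True if both patterns agree on whether the whole word is a stop word."""
    return bool(FULL.fullmatch(word)) == bool(SHORT.fullmatch(word))
```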



Just for fun, here's a perl-only version (faster):

~ % wc -c pgolf
204 pgolf
~ % cat pgolf
perl -lne'$1=~/^(the|and|o[fr]|to|.|i[tns])$/i||$f{lc$1}++while/\b([a-z]+)/gi}{@w=(sort{$f{$b}<=>$f{$a}}keys%f)[0..21];$Q=$f{$_=$w[0]};$B=76-y///c;print" "."_"x$B;print"|"."_"x($B*$f{$_}/$Q)."| $_"for@w'
~ % # usage:
~ % sh pgolf < 11.txt

  • Impressive golfing! (3 upvotes)
  • Most impressive (3 upvotes)

Mar*_*ith 35

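Transact-SQL set-based solution (SQL Server 2005) 1063 892 873 853 827 820 783 683 647 644 630 characters

Thanks to Gabe for some useful suggestions to reduce the character count.

NB: Line breaks were added to avoid scrollbars; only the last line break is required.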

DECLARE @ VARCHAR(MAX),@F REAL SELECT @=BulkColumn FROM OPENROWSET(BULK'A',
SINGLE_BLOB)x;WITH N AS(SELECT 1 i,LEFT(@,1)L UNION ALL SELECT i+1,SUBSTRING
(@,i+1,1)FROM N WHERE i<LEN(@))SELECT i,L,i-RANK()OVER(ORDER BY i)R INTO #D
FROM N WHERE L LIKE'[A-Z]'OPTION(MAXRECURSION 0)SELECT TOP 22 W,-COUNT(*)C
INTO # FROM(SELECT DISTINCT R,(SELECT''+L FROM #D WHERE R=b.R FOR XML PATH
(''))W FROM #D b)t WHERE LEN(W)>1 AND W NOT IN('the','and','of','to','it',
'in','or','is')GROUP BY W ORDER BY C SELECT @F=MIN(($76-LEN(W))/-C),@=' '+
REPLICATE('_',-MIN(C)*@F)+' 'FROM # SELECT @=@+' 
|'+REPLICATE('_',-C*@F)+'| '+W FROM # ORDER BY C PRINT @

Readable version

DECLARE @  VARCHAR(MAX),
        @F REAL
SELECT @=BulkColumn
FROM   OPENROWSET(BULK'A',SINGLE_BLOB)x; /*  Loads text file from path
                                             C:\WINDOWS\system32\A  */

/*Recursive common table expression to
generate a table of numbers from 1 to string length
(and associated characters)*/
WITH N AS
     (SELECT 1 i,
             LEFT(@,1)L

     UNION ALL

     SELECT i+1,
            SUBSTRING(@,i+1,1)
     FROM   N
     WHERE  i<LEN(@)
     )
  SELECT   i,
           L,
           i-RANK()OVER(ORDER BY i)R
           /*Will group characters
           from the same word together*/
  INTO     #D
  FROM     N
  WHERE    L LIKE'[A-Z]'OPTION(MAXRECURSION 0)
             /*Assuming case insensitive accent sensitive collation*/

SELECT   TOP 22 W,
         -COUNT(*)C
INTO     #
FROM     (SELECT DISTINCT R,
                          (SELECT ''+L
                          FROM    #D
                          WHERE   R=b.R FOR XML PATH('')
                          )W
                          /*Reconstitute the word from the characters*/
         FROM             #D b
         )
         T
WHERE    LEN(W)>1
AND      W NOT IN('the',
                  'and',
                  'of' ,
                  'to' ,
                  'it' ,
                  'in' ,
                  'or' ,
                  'is')
GROUP BY W
ORDER BY C

/*Just noticed this looks risky as it relies on the order of evaluation of the 
 variables. I'm not sure that's guaranteed but it works on my machine :-) */
SELECT @F=MIN(($76-LEN(W))/-C),
       @ =' '      +REPLICATE('_',-MIN(C)*@F)+' '
FROM   #

SELECT @=@+' 
|'+REPLICATE('_',-C*@F)+'| '+W
             FROM     #
             ORDER BY C

PRINT @
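The key trick in the query is i-RANK()OVER(ORDER BY i): once non-letters are filtered out, position minus rank stays constant inside a run of consecutive letters and jumps at every gap, so it can serve as a word identifier (the classic "gaps and islands" technique). The same idea sketched in Python rather than SQL; the function name is illustrative:

```python
import string

def words_by_rank_gap(text):
    """Group letters into words using position minus rank, as the T-SQL query does."""
    # Keep only the letters, remembering their 1-based position in the text.
    letters = [(i, ch) for i, ch in enumerate(text, start=1) if ch in string.ascii_letters]
    groups = {}
    # rank = 1, 2, 3, ... over the letters only, like RANK() OVER (ORDER BY i).
    for rank, (i, ch) in enumerate(letters, start=1):
        # Inside a run of letters, i and rank both grow by 1, so i - rank is
        # constant; any non-letter between two letters makes i - rank jump.
        groups.setdefault(i - rank, []).append(ch)
    return ["".join(chs) for chs in groups.values()]
```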

Output

 _________________________________________________________________________ 
|_________________________________________________________________________| she
|_______________________________________________________________| You
|____________________________________________________________| said
|_____________________________________________________| Alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|__________________________| on
|__________________________| all
|_______________________| This
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| So
|___________________| very
|__________________| what

And with the long string

 _______________________________________________________________ 
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| Alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| at
|_________________________| with
|_______________________| on
|______________________| all
|____________________| This
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| So
|________________| very
|________________| what

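  • I gave you +1 because you did it in T-SQL, and, to quote Team America: "You have balls. I like balls." (12 upvotes)
  • Ha! Take that, Java! (4 upvotes)
  • That code is shouting at me! :o (3 upvotes)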

arc*_*oon 34

Ruby 207 213 211 210 207 203 201 200 chars

An improvement on Anurag's, incorporating suggestions from rfusca. Also removes the argument to sort, plus a few other minor golfings.

w=(STDIN.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort.take 22;k,l=w[0];m=76.0-l.size;puts' '+'_'*m;w.map{|f,x|puts"|#{'_'*(m*f/k)}| #{x} "}

Execute as:

ruby GolfedWordFrequencies.rb < Alice.txt

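Edit: Put 'puts' back; it needs to be there to avoid quotes in the output.
Edit 2: Changed File -> IO
Edit 3: Removed /i
Edit 4: Removed parentheses around (f*1.0), recounted
Edit 5: Use string addition for the first line; expand s in-place.
Edit 6: Made m a float, removed 1.0. EDIT: Doesn't work, changes lengths. EDIT: No worse than before
Edit 7: Use STDIN.read.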


Dr.*_*ius 28

Mathematica (297 284 248 244 242 199 chars) Pure Functional

and Zipf's Law Testing

Look Mamma ... no vars, no hands, .. no head

Edit 1> some shorthands defined (284 chars)

f[x_, y_] := Flatten[Take[x, All, y]]; 

BarChart[f[{##}, -1], 
         BarOrigin -> Left, 
         ChartLabels -> Placed[f[{##}, 1], After], 
         Axes -> None
] 
& @@
Take[
  SortBy[
     Tally[
       Select[
        StringSplit[ToLowerCase[Import[i]], RegularExpression["\\W+"]], 
       !MemberQ[{"the", "and", "of", "to", "a", "i", "it", "in", "or","is"}, #]&]
     ], 
  Last], 
-22]

Some explanations

Import[] 
   # Get The File

ToLowerCase []
   # To Lower Case :)

StringSplit[ STRING , RegularExpression["\\W+"]]
   # Split By Words, getting a LIST

Select[ LIST, !MemberQ[{LIST_TO_AVOID}, #]&]
   #  Select from LIST except those words in LIST_TO_AVOID
   #  Note that !MemberQ[{LIST_TO_AVOID}, #]& is a FUNCTION for the test

Tally[LIST]
   # Get the LIST {word,word,..} 
     and produce another  {{word,counter},{word,counter}...}

SortBy[ LIST ,Last]
   # Get the list produced by Tally and sort by counters
     Note that counters are the LAST element of {word,counter}

Take[ LIST ,-22]
   # Once sorted, get the biggest 22 counters

BarChart[f[{##}, -1], ChartLabels -> Placed[f[{##}, 1], After]] &@@ LIST
   # Get the list produced by Take as input and produce a bar chart

f[x_, y_] := Flatten[Take[x, All, y]]
   # Auxiliary to get the list of the first or second element of lists of lists x_
     depending upon y
   # So f[{##}, -1] is the list of counters
   # and f[{##}, 1] is the list of words (labels for the chart)

Output

alt text http://i49.tinypic.com/2n8mrer.jpg

Mathematica is not well suited for golfing, and that is just because of the long, descriptive function names. Functions like "RegularExpression[]" or "StringSplit[]" just make me sob :(.

Zipf's Law Testing

Zipf's law predicts that for a natural-language text, the log(rank) vs. log(occurrences) plot follows a linear relationship.

The law is used in developing algorithms for cryptography and data compression. (But it's NOT the "Z" in the LZW algorithm.)

In our text, we can test it with the following

 f[x_, y_] := Flatten[Take[x, All, y]]; 
 ListLogLogPlot[
     Reverse[f[{##}, -1]], 
     AxesLabel -> {"Log (Rank)", "Log Counter"}, 
     PlotLabel -> "Testing Zipf's Law"]
 & @@
 Take[
  SortBy[
    Tally[
       StringSplit[ToLowerCase[Import[i]], RegularExpression["\\W+"]]
    ], 
   Last],
 -1000]

The result is (pretty well linear)

alt text http://i46.tinypic.com/33fcmdk.jpg
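To put a number on "pretty well linear", one could fit a least-squares line to log(count) against log(rank). If frequency is roughly proportional to 1/rank^s, the slope comes out near -s, with s classically close to 1 for English text. This is a plain-Python sketch, not part of the original answer; it expects at least two counts, already sorted from most to least frequent:

```python
import math

def zipf_slope(counts):
    """Least-squares slope of log(count) vs log(rank); counts sorted descending."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Feeding it the counts column of a sorted tally (for example the 1000 highest counts used in the plot above) would give a single slope to compare against -1.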

Edit 6 > (242 chars)

Refactored the regex (no more Select function)
Removed 1-char words
More efficient definition for function "f"

f = Flatten[Take[#1, All, #2]]&; 
BarChart[
     f[{##}, -1], 
     BarOrigin -> Left, 
     ChartLabels -> Placed[f[{##}, 1], After], 
     Axes -> None] 
& @@
  Take[
    SortBy[
       Tally[
         StringSplit[ToLowerCase[Import[i]], 
          RegularExpression["(\\W|\\b(.|the|and|of|to|i[tns]|or)\\b)+"]]
       ],
    Last],
  -22]
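Edit 6 folds the stop-word filter into the split regex itself: a separator is any run of non-word characters, single characters, or stop words bounded by \b, so splitting and filtering happen in one step (and one-letter words disappear too). Roughly the same idea in Python; the groups are made non-capturing because re.split would otherwise also return the captured text, and the empty strings at the edges are dropped:

```python
import re

# Separator: one or more of a non-word char, or a whole single-char word / stop word.
SEP = re.compile(r"(?:\W|\b(?:.|the|and|of|to|i[tns]|or)\b)+")

def split_words(text):
    """Split lowercased text on non-word characters and stop words in one pass."""
    return [w for w in SEP.split(text.lower()) if w]
```

For example, split_words("The cat and I sat in a hat, or it is not.") yields ['cat', 'sat', 'hat', 'not'].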

Edit 7 → 199 chars

BarChart[#2, BarOrigin->Left, ChartLabels->Placed[#1, After], Axes->None]&@@ 
  Transpose@Take[SortBy[Tally@StringSplit[ToLowerCase@Import@i, 
    RegularExpression@"(\\W|\\b(.|the|and|of|to|i[tns]|or)\\b)+"],Last], -22]
  • Replaced f with Transpose and Slot (#1/#2) arguments.
  • We don't need no stinkin' brackets (use f@x instead of f[x] wherever possible)

  • You think "RegularExpression" is bad? I cried when I typed "System.Text.RegularExpressions.Regex.Split" into the C# version, until I saw the Objective-C code: "stringWithContentsOfFile", "enumerateSubstringsInRange", "NSStringEnumerationByWords", "sortedArrayUsingComparator", and so on. (9 upvotes)
  • @Gabe Thanks... I feel better now. In Spanish we say "mal de muchos, consuelo de tontos".. something like "many afflicted, fools relieved" :D (2 upvotes)

Pau*_*sey 27

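C# - 510 451 436 446 434 426 422 chars (minified)

Not that short, but now probably correct! Note that the previous version did not show the first line of the bars, did not scale the bars correctly, downloaded the file instead of getting it from stdin, and did not include all the required C# verbosity. You could easily shave many strokes if C# didn't need so much extra crap. Maybe PowerShell could do better.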

using C=System.Console;   // alias for Console
using System.Linq;  // for Split, GroupBy, Select, OrderBy, etc.

class Class // must define a class
{
    static void Main()  // must define a Main
    {
        // split into words
        var allwords = System.Text.RegularExpressions.Regex.Split(
                // convert stdin to lowercase
                C.In.ReadToEnd().ToLower(),
                // eliminate stopwords and non-letters
                @"(?:\b(?:the|and|of|to|a|i[tns]?|or)\b|\W)+")
            .GroupBy(x => x)    // group by words
            .OrderBy(x => -x.Count()) // sort descending by count
            .Take(22);   // take first 22 words

        // compute length of longest bar + word
        var lendivisor = allwords.Max(y => y.Count() / (76.0 - y.Key.Length));

        // prepare text to print
        var toPrint = allwords.Select(x=> 
            new { 
                // remember bar pseudographics (will be used in two places)
                Bar = new string('_',(int)(x.Count()/lendivisor)), 
                Word=x.Key 
            })
            .ToList();  // convert to list so we can index into it

        // print top of first bar
        C.WriteLine(" " + toPrint[0].Bar);
        toPrint.ForEach(x =>  // for each word, print its bar and the word
            C.WriteLine("|" + x.Bar + "| " + x.Word));
    }
}

422 chars with lendivisor inlined (which makes it 22 times slower), in the form below (newlines used in place of selected spaces):

using System.Linq;using C=System.Console;class M{static void Main(){var
a=System.Text.RegularExpressions.Regex.Split(C.In.ReadToEnd().ToLower(),@"(?:\b(?:the|and|of|to|a|i[tns]?|or)\b|\W)+").GroupBy(x=>x).OrderBy(x=>-x.Count()).Take(22);var
b=a.Select(x=>new{p=new string('_',(int)(x.Count()/a.Max(y=>y.Count()/(76d-y.Key.Length)))),t=x.Key}).ToList();C.WriteLine(" "+b[0].p);b.ForEach(x=>C.WriteLine("|"+x.p+"| "+x.t));}}

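  • The spec says the file must be piped in or passed as args. If you assume args[0] contains the local file name, you can shorten it considerably by using args[0] instead of (new WebClient()).DownloadString(@"http://www.gutenberg.org/files/11/11.txt") -> it would save about 70 chars (2 upvotes)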

JSB*_*ոգչ 25

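Perl, 237 229 209 chars

(Updated again to beat the Ruby version with more dirty golf tricks, replacing split/[^a-z/,lc with lc=~/[a-z]+/g, and eliminating a check for the empty string in another place. These were inspired by the Ruby version, so credit where credit is due.)

Update: now with Perl 5.10! Replace print with say, and use ~~ to avoid a map. This has to be invoked on the command line as perl -E '<one-liner>' alice.txt. Since the entire script is on one line anyway, writing it as a one-liner shouldn't present any difficulty :).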

 @s=qw/the and of to a i it in or is/;$c{$_}++foreach grep{!($_~~@s)}map{lc=~/[a-z]+/g}<>;@s=sort{$c{$b}<=>$c{$a}}keys%c;$f=76-length$s[0];say" "."_"x$f;say"|"."_"x($c{$_}/$c{$s[0]}*$f)."| $_ "foreach@s[0..21];

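Note that this version normalizes for case. This doesn't shorten the solution any, since removing ,lc (for lower-casing) requires you to add A-Z to the split regex, so it's a wash.

If you're on a system where a newline is one character and not two, you can shorten this by another char by using a literal newline in place of the \n. However, I haven't written the above sample that way, since it's "clearer" (ha!) that way.


Here is a mostly correct, but not remotely short, Perl solution: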

use strict;
use warnings;

my %short = map { $_ => 1 } qw/the and of to a i it in or is/;
my %count = ();

$count{$_}++ foreach grep { $_ && !$short{$_} } map { split /[^a-zA-Z]/ } (<>);
my @sorted = (sort { $count{$b} <=> $count{$a} } keys %count)[0..21];
my $widest = 76 - (length $sorted[0]);

print " " . ("_" x $widest) . "\n";
foreach (@sorted)
{
    my $width = int(($count{$_} / $count{$sorted[0]}) * $widest);
    print "|" . ("_" x $width) . "| $_ \n";
}

The following is about as short as it can get while remaining relatively readable (392 chars).

%short = map { $_ => 1 } qw/the and of to a i it in or is/;
%count;

$count{$_}++ foreach grep { $_ && !$short{$_} } map { split /[^a-z]/, lc } (<>);
@sorted = (sort { $count{$b} <=> $count{$a} } keys %count)[0..21];
$widest = 76 - (length $sorted[0]);

print " " . "_" x $widest . "\n";
print"|" . "_" x int(($count{$_} / $count{$sorted[0]}) * $widest) . "| $_ \n" foreach @sorted;

  • This doesn't cover the case where the second word is much longer than the first, right? (4 upvotes)

Joe*_*oey 20

Windows PowerShell, 199 chars

$x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *
filter f($w){' '+'_'*$w
$x[-1..-22]|%{"|$('_'*($w*$_.Count/$x[-1].Count))| "+$_.Name}}
f(76..1|?{!((f $_)-match'.'*80)})[0]

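(The last line break isn't necessary, but is included here for readability.)

(Current code and my test files are available in my SVN repository. I hope my test cases catch the most common errors (bar length, problems with regex matching and a few others).)

Assumptions:

  • US ASCII as input. It probably gets weird with Unicode.
  • At least two non-stop words in the text

History

Relaxed version (137), since that's counted separately by now, apparently: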

($x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *)[-1..-22]|%{"|$('_'*(76*$_.Count/$x[-1].Count))| "+$_.Name}
  • doesn't close the first bar
  • doesn't account for the word length of non-first words

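Variations of one character in the bar lengths compared to other solutions are due to PowerShell using rounding instead of truncation when converting floating-point numbers to integers. Since the task requires only proportional bar lengths, this should be fine, though.

Compared to other solutions, I took a slightly different approach to determining the longest bar length: I simply try out widths and take the highest one where no line is longer than 80 characters.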
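The answer's filter f renders the whole chart for a given width, and the last line keeps the first width from 76 down to 1 for which no rendered line matches 80 characters. A Python sketch of that search, assuming the top words are already counted and sorted (most frequent first); the function name and parameters are illustrative:

```python
def widest_fitting(top, limit=80, start=76):
    """Try bar widths start, start-1, ..., 1 and keep the first one whose chart fits.

    top is a list of (word, count) pairs, most frequent first. A line fits when it
    is shorter than limit, mirroring the check that no line has 80 characters
    (the answer's lines carry no trailing space).
    """
    top_count = top[0][1]
    for w in range(start, 0, -1):
        lines = [" " + "_" * w]
        # round() stands in for PowerShell's rounding float-to-int conversion.
        lines += ["|" + "_" * round(w * c / top_count) + "| " + word for word, c in top]
        if all(len(line) < limit for line in lines):
            return w, lines
    return None
```

The search renders at most 76 charts of 23 lines, which is negligible, and it automatically accounts for a long word anywhere in the list: with ("she", 553) and ("superlongstringstring", 481) it settles on a width of 63.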

An older version with explanations can be found here.


Anu*_*rag 19

Ruby, 215, 216, 218, 221, 224, 236, 237 chars

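update 1: Hurray! It's a tie with JS Bangs' solution. Can't think of a way to cut down any more :)

update 2: Played a dirty golf trick. Changed each to map to save 1 character :)

update 3: Changed File.read to IO.read +2. Array.group_by was not very fruitful, changed to reduce +6. A case-insensitive check is not needed after lower-casing with downcase in the regex +1. Sorting in descending order is easily done by negating the value +6. Total savings +15

update 4: [0] rather than .first, +3. (@Shtééf)

update 5: Expand variable l in-place, +1. Expand variable s in-place, +2. (@Shtééf)

update 6: Use string addition rather than interpolation for the first line, +2. (@Shtééf)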

w=(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take 22;m=76-w[0][0].size;puts' '+'_'*m;w.map{|x,f|puts"|#{'_'*(f*1.0/w[0][1]*m)}| #{x} "}

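update 7: I went through a whole lot of hoopla to detect the first iteration inside the loop, using instance variables. All I got is +1, though perhaps there is potential. Preserving the previous version, because I believe this one is black magic. (@Shtééf)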

(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take(22).map{|x,f|@f||(@f=f;puts' '+'_'*(@m=76-x.size));puts"|#{'_'*(f*1.0/@f*@m)}| #{x} "}

Readable version

string = File.read($_).downcase

words = string.scan(/[a-z]+/i)
allowed_words = words - %w{the and of to a i it in or is}
sorted_words = allowed_words.group_by{ |x| x }.map{ |x,y| [x, y.size] }.sort{ |a,b| b[1] <=> a[1] }.take(22)
highest_frequency = sorted_words.first
highest_frequency_count = highest_frequency[1]
highest_frequency_word = highest_frequency[0]

word_length = highest_frequency_word.size
widest = 76 - word_length

puts " #{'_' * widest}"    
sorted_words.each do |word, freq|
  width = (freq * 1.0 / highest_frequency_count) * widest
  puts "|#{'_' * width}| #{word} "
end

Usage:

echo "Alice.txt" | ruby -ln GolfedWordFrequencies.rb

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|_____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| s 
|____________________________| t 
|__________________________| on 
|__________________________| all 
|_______________________| this 
|_______________________| for 
|_______________________| had 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so 

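  • Wow, Ruby is beating Perl. (4 upvotes)
  • Isn't "p" a shortcut for "puts"? That could shave off a few. (3 upvotes)
  • I wonder why this is still collecting votes. The solution is incorrect (in the general case), and there are already two shorter Ruby solutions by now. (3 upvotes)
  • You need to scale the bars so that the longest word plus its bar fits in 80 characters. As Brian suggested, a long second word will break your program. (2 upvotes)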

Nas*_*nov 19

Python 2.x, liberal approach = 227 183 chars

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)if w not in'andithetoforinis')[:22]
for l,w in r:print(78-len(r[0][1]))*l/r[0][0]*'=',w

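Allowing freedom in the implementation, I constructed a string concatenation that contains all the words requested for exclusion (the, and, of, to, a, i, it, in, or, is) - plus it also excludes the two infamous "words" s and t from the example - and I threw in for free the exclusion of an, for, he. I tried all concatenations of those words against the corpus of words from Alice, the King James Bible and the Jargon File, to see whether any words would be mis-excluded by the string. That is how I ended up with two exclusion strings: itheandtoforinis and andithetoforinis.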

PS. Borrowed from other solutions to shorten the code.
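The trick behind w not in 'andithetoforinis' is Python's substring test on strings: a token is dropped if it occurs anywhere inside the concatenated string, which removes every required stop word plus side catches such as s, t, he, an and for. The empty string is a substring of every string, so the empty tokens that re.split produces at the edges disappear as well. A small demonstration (the helper name kept is illustrative):

```python
EXCLUDE = "andithetoforinis"

def kept(tokens):
    """Keep only tokens that are NOT substrings of EXCLUDE."""
    return [w for w in tokens if w not in EXCLUDE]
```

kept(["", "she", "the", "he", "an", "for", "s", "t", "alice", "it"]) keeps only 'she' and 'alice'.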

=========================================================================== she 
================================================================= you
============================================================== said
====================================================== alice
================================================ was
============================================ that
===================================== as
================================= her
============================== at
============================== with
=========================== on
=========================== all
======================== this
======================== had
======================= but
====================== be
====================== not
===================== they
==================== so
=================== very
=================== what
================= little

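Rant

Regarding the words to ignore, one would think those would be taken from a list of the most used words in English. That list depends on the text corpus used. Per one of the most popular lists (http://en.wikipedia.org/wiki/Most_common_words_in_English, http://www.english-for-students.com/Frequently-Used-Words.html, http://www.sporcle.com/games/common_english_words.php), the top 10 words are: the be(am/are/is/was/were) to of and a in that have I

The top 10 words from the Alice in Wonderland text are: the and to a of it she i you said
The top 10 words from the Jargon File (v4.4.7) are: the a of to and in is that or for

So the question is why or was included in the problem's ignore list, where it is ~30th in popularity, when the word that (8th most used) is not. Hence I believe the ignore list should be provided dynamically (or could be omitted).

Another idea would be simply to skip the top 10 words from the result - that would actually shorten the solution (elementary - only the 11th to 32nd entries have to be shown).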


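Python 2.x, punctilious approach = 277 243 chars

The chart drawn by the code above is simplified (it uses only one character for the bars). If one wants to reproduce exactly the chart from the problem description (which was not required), this code will do it: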

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)-set(sys.argv))[:22]
h=min(9*l/(77-len(w))for l,w in r)
print'',9*r[0][0]/h*'_'
for l,w in r:print'|'+9*l/h*'_'+'|',w

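I take issue with the somewhat random choice of the 10 words to exclude (the, and, of, to, a, i, it, in, or, is), so those are to be passed as command-line parameters, like so: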
python WordFrequencyChart.py the and of to a i it in or is <"Alice's Adventures in Wonderland.txt"

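This is 213 chars, + 30 if we account for the "original" ignore list passed on the command line = 243

PS. The second code also does "adjustment" for the lengths of all top words, so none of them will overflow in the degenerate case.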

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|_____________________________________________________| said
|______________________________________________| alice
|_________________________________________| was
|______________________________________| that
|_______________________________| as
|____________________________| her
|__________________________| at
|__________________________| with
|_________________________| s
|_________________________| t
|_______________________| on
|_______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|___________________| not
|_________________| they
|_________________| so


Tho*_*mas 12

Haskell - 366 351 344 337 333 chars

(One line break in main was added for readability, and no line break is needed at the end of the last line.)

import Data.List
import Data.Char
l=length
t=filter
m=map
f c|isAlpha c=toLower c|0<1=' '
h w=(-l w,head w)
x!(q,w)='|':replicate(minimum$m(q?)x)'_'++"| "++w
q?(g,w)=q*(77-l w)`div`g
b x=m(x!)x
a(l:r)=(' ':t(=='_')l):l:r
main=interact$unlines.a.b.take 22.sort.m h.group.sort
  .t(`notElem`words"the and of to a i it in or is").words.m f

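How it works is best seen by reading the argument to interact backwards:

  • map f lowercases alphabetic characters and replaces everything else with spaces.
  • words produces a list of words, dropping the separating whitespace.
  • filter (`notElem` words "the and of to a i it in or is") discards all entries with forbidden words.
  • group . sort sorts the words, and groups identical ones into lists.
  • map h maps each list of identical words to a tuple of the form (-frequency, word).
  • take 22 . sort sorts the tuples by descending frequency (the first tuple entry), and keeps only the first 22 tuples.
  • b maps the tuples to bars (see below).
  • a prepends the first line of underscores, to complete the topmost bar.
  • unlines joins all these lines together with newlines.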

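The tricky bit is getting the bar length right. I assumed that only underscores count towards the length of the bar, so || would be a bar of zero length. The function b maps (x!) over x, where x is the list of histogram entries. The whole list is passed to !, so that each invocation can compute its own scale factor by calling ?. This way I avoid floating-point math or rationals, whose conversion functions and imports would eat many characters.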

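Note the trick of using -frequency. This eliminates the need to reverse the sort, since sorting (ascending) by -frequency places the words with the largest frequency first. Later, in the function ?, one -frequency value is divided by another, so the negations cancel out.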
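The two ideas above (negated frequencies for a descending sort, and per-row integer scaling that takes a minimum over all rows) can be mirrored in Python. This is a sketch, not a translation of the golfed code; Python's // floors just like Haskell's div, so the negative operands behave the same way. The 77 in the answer corresponds to width - 3 here, since its lines have no trailing space:

```python
from itertools import groupby

def ranked(words, n=22):
    """(-frequency, word) tuples; an ascending sort puts the most frequent first."""
    return sorted((-len(list(grp)), w) for w, grp in groupby(sorted(words)))[:n]

def int_bar_lengths(rows, width=80):
    """Integer-only bar lengths for lines of the form '|bar| word'.

    For a row with negated frequency q, each row (g, w) allows at most
    q * (width - 3 - len(w)) // g underscores; q and g are both negative,
    so the signs cancel. The bar is the minimum of those limits.
    """
    return [min(q * (width - 3 - len(w)) // g for g, w in rows) for q, _ in rows]
```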


Mat*_*att 11

JavaScript 1.8 (SpiderMonkey) - 354

x={};p='|';e=' ';z=[];c=77
while(l=readline())l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y)x[y]?x[y].c++:z.push(x[y]={w:y,c:1}))
z=z.sort(function(a,b)b.c-a.c).slice(0,22)
for each(v in z){v.r=v.c/z[0].c
c=c>(l=(77-v.w.length)/v.r)?l:c}for(k in z){v=z[k]
s=Array(v.r*c|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

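Sadly, the for([k,v]in z) from the Rhino version doesn't seem to want to work in SpiderMonkey, and readFile() is a little easier than using readline(), but moving up to 1.8 allows us to use function closures to cut a few more lines....

Whitespace added for readability: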

x={};p='|';e=' ';z=[];c=77
while(l=readline())
  l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,
   function(y) x[y] ? x[y].c++ : z.push( x[y] = {w: y, c: 1} )
  )
z=z.sort(function(a,b) b.c - a.c).slice(0,22)
for each(v in z){
  v.r=v.c/z[0].c
  c=c>(l=(77-v.w.length)/v.r)?l:c
}
for(k in z){
  v=z[k]
  s=Array(v.r*c|0).join('_')
  if(!+k)print(e+s+e)
  print(p+s+p+e+v.w)
}

Usage: js golf.js < input.txt

Output:

 _________________________________________________________________________ 
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|____________________________| s
|____________________________| t
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| for
|______________________| had
|______________________| but
|_____________________| be
|_____________________| not
|___________________| they
|___________________| so

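(base version - doesn't handle bar widths correctly)

JavaScript (Rhino) - 405 395 387 377 368 343 304 chars

I think my sorting logic is off, but.. I dunno. Brainfart fixed.

Minified (abusing \n being interpreted as ; sometimes):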

x={};p='|';e=' ';z=[]
readFile(arguments[0]).toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y){x[y]?x[y].c++:z.push(x[y]={w:y,c:1})})
z=z.sort(function(a,b){return b.c-a.c}).slice(0,22)
for([k,v]in z){s=Array((v.c/z[0].c)*70|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

  • BTW - I love the i[tns]? bit. Very sneaky. (2 upvotes)

pde*_*aan 11

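perl, 205 191 189 chars / 205 chars (fully implemented)

Some parts were inspired by the earlier perl/ruby submissions, a couple of similar ideas were arrived at independently, and the others are original. The shorter version also incorporates some things I have seen/learnt from other submissions.

Original: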

$k{$_}++for grep{$_!~/^(the|and|of|to|a|i|it|in|or|is)$/}map{lc=~/[a-z]+/g}<>;@t=sort{$k{$b}<=>$k{$a}}keys%k;$l=76-length$t[0];printf" %s
",'_'x$l;printf"|%s| $_
",'_'x int$k{$_}/$k{$t[0]}*$l for@t[0..21];

Latest version, down to 191 characters:

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;@e=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";$r=(76-y///c)/$k{$_=$e[0]};map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
"}@e[0,0..21]

Latest version, down to 189 characters:

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;@_=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";$r=(76-m//)/$k{$_=$_[0]};map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
"}@_[0,0..21]

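This version (205 characters) also accounts for lines whose words are longer than the first one and appear further down the list.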

/^(the|and|of|to|.|i[tns]|or)$/||$k{$_}++for map{lc=~/[a-z]+/g}<>;($r)=sort{$a<=>$b}map{(76-y///c)/$k{$_}}@e=sort{$k{$b}<=>$k{$a}}keys%k;$n=" %s
";map{printf$n,'_'x($k{$_}*$r),$_;$n="|%s| %s
";}@e[0,0..21]


Sam*_*lan 11

Python 3.1 - 245 229 characters

I guess using Counter is kind of cheating :) I only read about it a week ago, so this was the perfect chance to see how it works.

import re,collections
o=collections.Counter([w for w in re.findall("[a-z]+",open("!").read().lower())if w not in"a and i in is it of or the to".split()]).most_common(22)
print('\n'.join('|'+76*v//o[0][1]*'_'+'| '+k for k,v in o))

Prints out:

|____________________________________________________________________________| she
|__________________________________________________________________| you
|_______________________________________________________________| said
|_______________________________________________________| alice
|_________________________________________________| was
|_____________________________________________| that
|_____________________________________| as
|__________________________________| her
|_______________________________| with
|_______________________________| at
|______________________________| s
|_____________________________| t
|____________________________| on
|___________________________| all
|________________________| this
|________________________| for
|________________________| had
|________________________| but
|______________________| be
|______________________| not
|_____________________| they
|____________________| so

Some of the code was "borrowed" from AKX's solution.


Anonymous 11

PHP CLI version (450 characters)

This solution takes into account the last requirement, which most purists have conveniently chosen to ignore. That cost 170 characters!

Usage: php.exe <this.php> <file.txt>

Minified:

<?php $a=array_count_values(array_filter(preg_split('/[^a-z]/',strtolower(file_get_contents($argv[1])),-1,1),function($x){return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);}));arsort($a);$a=array_slice($a,0,22);function R($a,$F,$B){$r=array();foreach($a as$x=>$f){$l=strlen($x);$r[$x]=$b=$f*$B/$F;if($l+$b>76)return R($a,$f,76-$l);}return$r;}$c=R($a,max($a),76-strlen(key($a)));foreach($a as$x=>$f)echo '|',str_repeat('-',$c[$x]),"| $x\n";?>

Human readable:

<?php

// Read:
$s = strtolower(file_get_contents($argv[1]));

// Split:
$a = preg_split('/[^a-z]/', $s, -1, PREG_SPLIT_NO_EMPTY);

// Remove unwanted words:
$a = array_filter($a, function($x){
       return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);
     });

// Count:
$a = array_count_values($a);

// Sort:
arsort($a);

// Pick top 22:
$a=array_slice($a,0,22);


// Recursive function to adjust bar widths
// according to the last requirement:
function R($a,$F,$B){
    $r = array();
    foreach($a as $x=>$f){
        $l = strlen($x);
        $r[$x] = $b = $f * $B / $F;
        if ( $l + $b > 76 )
            return R($a,$f,76-$l);
    }
    return $r;
}

// Apply the function:
$c = R($a,max($a),76-strlen(key($a)));


// Output:
foreach ($a as $x => $f)
    echo '|',str_repeat('-',$c[$x]),"| $x\n";

?>

Output:

|-------------------------------------------------------------------------| she
|---------------------------------------------------------------| you
|------------------------------------------------------------| said
|-----------------------------------------------------| alice
|-----------------------------------------------| was
|-------------------------------------------| that
|------------------------------------| as
|--------------------------------| her
|-----------------------------| at
|-----------------------------| with
|--------------------------| on
|--------------------------| all
|-----------------------| this
|-----------------------| for
|-----------------------| had
|-----------------------| but
|----------------------| be
|---------------------| not
|--------------------| they
|--------------------| so
|-------------------| very
|------------------| what

When there is a long word, the bars are adjusted properly:

|--------------------------------------------------------| she
|---------------------------------------------------| thisisareallylongwordhere
|-------------------------------------------------| you
|-----------------------------------------------| said
|-----------------------------------------| alice
|------------------------------------| was
|---------------------------------| that
|---------------------------| as
|-------------------------| her
|-----------------------| with
|-----------------------| at
|--------------------| on
|--------------------| all
|------------------| this
|------------------| for
|------------------| had
|-----------------| but
|-----------------| be
|----------------| not
|---------------| they
|---------------| so
|--------------| very

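The restart logic of the PHP function R() can be sketched in Python as follows. This is an illustrative sketch, not the answer's code. Several details are assumptions chosen for the example: the name bar_widths, the limit of 76, and the sample counts (the long word's count is made up). The limit of 76 for bar plus word leaves room for the two pipes, the space and an optional trailing space within 80 columns. Each restart rescales against the word that overflowed. Because the scale only ever shrinks, every word can trigger at most one restart, so the loop terminates.

```python
def bar_widths(counts, limit=76):
    """Return {word: bar_length} with bar_length + len(word) <= limit.

    `counts` maps word -> count and is assumed to be ordered most frequent
    first (dicts keep insertion order in Python 3.7+).
    """
    items = list(counts.items())
    # Start by letting the most frequent word fill its whole line.
    ref_count = items[0][1]
    ref_width = limit - len(items[0][0])
    while True:
        widths = {}
        for word, count in items:
            width = count * ref_width / ref_count
            if len(word) + width > limit:
                # This row overflows: rescale so that it fits exactly, then
                # start over from the top (like the recursive call in R()).
                ref_count, ref_width = count, limit - len(word)
                break
            widths[word] = int(width)
        else:
            # No row overflowed, so these widths are final.
            return widths


print(bar_widths({"she": 553, "thisisareallylongwordhere": 500, "you": 481}))
# {'she': 56, 'thisisareallylongwordhere': 51, 'you': 49}
```

In this example, the hypothetical long word's row becomes the binding constraint, and every other bar shrinks in proportion.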

Syn*_*era 10

Perl: 203 202 201 198 195 208 203 / 231 characters

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;map{$z=$x{$_};$y||{$y=(76-y///c)/$z}&&warn" "."_"x($z*$y)."\n";printf"|%.78s\n","_"x($z*$y)."| $_"}(sort{$x{$b}<=>$x{$a}}keys%x)[0..21]

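An alternate, full implementation. It includes the indicated behaviour (global bar squishing) for the pathological case, in which the secondary word is both popular enough and long enough that bar and word together exceed 80 characters. This implementation is 231 characters: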

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;@e=(sort{$x{$b}<=>$x{$a}}keys%x)[0..21];for(@e){$p=(76-y///c)/$x{$_};($y&&$p>$y)||($y=$p)}warn" "."_"x($x{$e[0]}*$y)."\n";for(@e){warn"|"."_"x($x{$_}*$y)."| $_\n"}

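The spec doesn't state anywhere that this has to go to STDOUT, so I used Perl's warn() instead of print, saving four characters. I used map instead of foreach, but I feel there are still more savings to be had in the split(join()). Still, I got it down to 203; might sleep on it. At least Perl is now under the "shell, grep, tr, grep, sort, uniq, sort, head, perl" character count for now ;)

PS: Reddit says "Hi" ;)

Update: Removed join() in favour of assignment and an implicit scalar-conversion join. Down to 202. Please note that I have taken advantage of the optional "ignore 1-letter words" rule to shave 2 characters off, so bear in mind that the frequency counts will reflect this.

Update 2: Swapped out the assignment and implicit join for killing $/, so that <> reads the whole file in one go in the first place. Same size, but nastier. Swapped if(!$y){} for $y||{}&&, saving 1 more char => 201.

Update 3: Took control of lowercasing early (lc<>) by moving lc out of the map block, and swapped both regexes to no longer use the /i option, since it is no longer needed. Swapped the explicit conditional x?y:z construct for the traditional perlgolf || implicit conditional: /^...$/i?1:$x{$_}++ became /^...$/||$x{$_}++. Saved three characters! => 198, broke the 200 barrier. Might sleep soon... perhaps.

Update 4: Sleep deprivation has made me insane. Well. More insane. Figuring that this only has to parse normal, happy text files, I made it give up if it hits a null. Saved two characters. Replaced "length" with the 1-char shorter (and much more golfish) y///c. You hear me, GolfScript?? I'm coming for you!!! *sob*

Update 5: Sleep deprivation made me forget about the 22-row limit and the subsequent-line limiting. Back up to 208 with those handled. Not too bad; 13 characters to handle it isn't the end of the world. Played around with Perl's regex inline eval, but I'm having trouble getting it to both work and save characters... lol. Updated the example to match the current output.

Update 6: Removed the unneeded braces protecting (...) for, since the syntactic candy ++ lets it sit right up against the for happily. Thanks to input from Chas. Owens (reminding my tired brain), I got the character class i[tns] solution in there. Back down to 203.

Update 7: Added a second piece of work, a full implementation of the spec. It follows the original spec (without the pathological example case) and includes the full bar-squishing behaviour for long secondary words, instead of the truncation most people are doing.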

Examples:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

The alternative implementation on the pathological case:

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| with
|_________________________| at
|_______________________| on
|______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| so
|________________| very
|________________| what

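The global bar squishing described in Update 7 comes down to a single scale factor, chosen by the tightest row. Below is a Python sketch, not the answer's Perl. Its assumptions are as follows. render_chart is a hypothetical name, and the input is a list of (word, count) pairs already sorted by descending count. The sample counts come from the question's frequency list, with "you" renamed to "superlongstringstring". fractions.Fraction keeps the arithmetic exact, so the limiting row gets exactly its full width instead of losing a column to floating-point rounding.

```python
import math
from fractions import Fraction


def render_chart(pairs, limit=76):
    """Render the chart for (word, count) pairs sorted by descending count.

    A row is "|" + bar + "| " + word, i.e. len(bar) + len(word) + 3
    characters, so len(bar) + len(word) <= limit (76) leaves room for an
    optional trailing space within 80 columns.
    """
    # One global scale factor, decided by the tightest row, so that all
    # bars stay proportional to their counts.
    scale = min(Fraction(limit - len(word), count) for word, count in pairs)
    rows = [(word, "_" * math.floor(count * scale)) for word, count in pairs]
    lines = [" " + rows[0][1]]  # top border, as long as the first bar
    lines += ["|" + bar + "| " + word for word, bar in rows]
    return "\n".join(lines)


print(render_chart([("she", 553), ("superlongstringstring", 481), ("said", 462)]))
```

Here the long word's row is the tightest: it ends up exactly 79 characters wide (80 with a trailing space), and "she" shrinks to 63 underscores.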

Bri*_*ian 9

F#, 452 characters

Straightforward: get a sequence a of word-count pairs, find the best word-count-per-column multiplier k, then print the results.

let a=
 stdin.ReadToEnd().Split(" .?!,\":;'\r\n".ToCharArray(),enum 1)
 |>Seq.map(fun s->s.ToLower())|>Seq.countBy id
 |>Seq.filter(fun(w,n)->not(set["the";"and";"of";"to";"a";"i";"it";"in";"or";"is"].Contains w))
 |>Seq.sortBy(fun(w,n)-> -n)|>Seq.take 22
let k=a|>Seq.map(fun(w,n)->float(78-w.Length)/float n)|>Seq.min
let u n=String.replicate(int(float(n)*k)-2)"_"
printfn" %s "(u(snd(Seq.nth 0 a)))
for(w,n)in a do printfn"|%s| %s "(u n)w

Example (I get different frequency counts from you, not sure why):

% app.exe < Alice.txt

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|____________________________| t
|____________________________| s
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| had
|______________________| for
|_____________________| but
|_____________________| be
|____________________| not
|___________________| they
|__________________| so

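One possible reason for the differing counts is an assumption, not something the answer confirms. The F# code splits only on the delimiter set " .?!,\":;'\r\n", so a token such as said--she or (alice) survives intact. The spec's a-zA-Z rule splits the same text into separate words. The Python sketch below compares the two tokenizations on a made-up sentence. The function names are hypothetical, and the stop-word list is the one from the question.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}


def fsharp_style_counts(text, top=22):
    # Split only on the F# answer's delimiter set; drop empty pieces, as
    # StringSplitOptions.RemoveEmptyEntries (enum 1) does.
    tokens = [t.lower() for t in re.split(r"[ .?!,\":;'\r\n]+", text) if t]
    return Counter(t for t in tokens if t not in STOP_WORDS).most_common(top)


def spec_counts(text, top=22):
    # Spec rule: only runs of letters form words.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS).most_common(top)


sample = "She said--she (Alice) said."
print(fsharp_style_counts(sample))
print(spec_counts(sample))
```

With the fixed delimiter set, "she", "said" and "alice" each lose occurrences to glued-on punctuation. This would lower their counts slightly.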
  • (@Rotsor: Ironic, since mine is the oldest solution.) (2 upvotes)

AKX*_*AKX 8

Python 2.6, 347 characters

import re
W,x={},"a and i in is it of or the to".split()
[W.__setitem__(w,W.get(w,0)-1)for w in re.findall("[a-z]+",file("11.txt").read().lower())if w not in x]
W=sorted(W.items(),key=lambda p:p[1])[:22]
bm=(76.-len(W[0][0]))/W[0][1]
U=lambda n:"_"*int(n*bm)
print "".join(("%s\n|%s| %s "%((""if i else" "+U(n)),U(n),w))for i,(w,n)in enumerate(W))

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|_____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| s 
|____________________________| t 
|__________________________| on 
|__________________________| all 
|_______________________| this 
|_______________________| for 
|_______________________| had 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so 


dmc*_*kee 7

Gawk - 336 (originally 507) characters

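(The changes so far:
  • fixed the output formatting;
  • fixed the contraction thing;
  • tweaked, then tweaked again;
  • removed a wholly unnecessary sorting step;
  • tweaked yet again, and again (oops, that one broke the formatting), and some more;
  • took up Matt's challenge and desperately tweaked so much more;
  • found another place to save a few characters, but gave two back to fix the bar length bug.)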

Heh heh! I am momentarily ahead of Matt's JavaScript solution in the counter-challenge! ;) And AKX's Python.

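The problem seems to call for a language that implements native associative arrays, so of course I picked one with a horribly deficient set of operators on them. In particular, you cannot control the order in which awk offers up the elements of a hash map, so I repeatedly scan the whole map to find the currently most frequent item, print it and delete it from the array.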

It is all terribly inefficient, and with all the golfing I've done it has gotten pretty awful as well.

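The repeated-maximum approach works like this: scan the whole map for the current maximum, emit it, delete it, and repeat 22 times. In Python it looks roughly as follows. This is a sketch, not a translation of the awk code. top_by_repeated_max is a hypothetical name, and unlike the awk program it stops early when there are fewer than 22 distinct words. The cost is about 22 full scans of the map instead of one sort.

```python
def top_by_repeated_max(counts, n=22):
    """Return up to n (word, count) pairs, most frequent first, by
    repeatedly scanning for the current maximum and deleting it."""
    remaining = dict(counts)  # copy; the awk version deletes from its own map
    result = []
    for _ in range(min(n, len(remaining))):
        # One full scan per output row, like the inner `for (w in a)` loop.
        best = max(remaining, key=remaining.get)
        result.append((best, remaining.pop(best)))
    return result


print(top_by_repeated_max({"said": 462, "she": 553, "alice": 403, "you": 481}, n=3))
# [('she', 553), ('you', 481), ('said', 462)]
```
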
Minified:

{gsub("[^a-zA-Z]"," ");for(;NF;NF--)a[tolower($NF)]++}
END{split("the and of to a i it in or is",b," ");
for(w in b)delete a[b[w]];d=1;for(w in a){e=a[w]/(78-length(w));if(e>d)d=e}
for(i=22;i;--i){e=0;for(w in a)if(a[w]>e)e=a[x=w];l=a[x]/d-2;
t=sprintf(sprintf("%%%dc",l)," ");gsub(" ","_",t);if(i==22)print" "t;
print"|"t"| "x;delete a[x]}}

The line breaks are for clarity only: they are not necessary and should not be counted.


Output:

$ gawk -f wordfreq.awk.min < 11.txt 
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|_________________________| on
|_________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so
$ sed 's/you/superlongstring/gI' 11.txt | gawk -f wordfreq.awk.min
 ______________________________________________________________________
|______________________________________________________________________| she
|_____________________________________________________________| superlongstring
|__________________________________________________________| said
|__________________________________________________| alice
|____________________________________________| was
|_________________________________________| that
|_________________________________| as
|______________________________| her
|___________________________| with
|___________________________| at
|__________________________| s
|__________________________| t
|________________________| on
|________________________| all
|_____________________| this
|_____________________| for
|_____________________| had
|____________________| but
|___________________| be
|___________________| not
|__________________| they
|_________________| so

Readable; 633 characters (originally 949):

{
    gsub("[^a-zA-Z]"," ");
    for(;NF;NF--)
    a[tolower($NF)]++
}
END{
    # remove "short" words
    split("the and of to a i it in or is",b," ");
    for (w in b) 
    delete a[b[w]];
    # Find the bar ratio
    d=1;
    for (w in a) {
    e=a[w]/(78-length(w));
    if (e>d)
        d=e
    }
    # Print the entries highest count first
    for (i=22; i; --i){               
    # find the highest count
    e=0;
    for (w in a) 
        if (a[w]>e)
        e=a[x=w];
        # Print the bar
    l=a[x]/d-2;
    # make a string of "_" the right length
    t=sprintf(sprintf("%%%dc",l)," ");
    gsub(" ","_",t);
    if (i==22) print" "t;
    print"|"t"| "x;
    delete a[x]
    }
}


Fra*_*mer 7

*sh (+ curl), partial solution

This is incomplete, but for the hell of it, here is the word-frequency-counting half of the problem in 192 bytes:

curl -s http://www.gutenberg.org/files/11/11.txt|sed -e 's@[^a-z]@\n@gi'|tr '[:upper:]' '[:lower:]'|egrep -v '(^[^a-z]*$|\b(the|and|of|to|a|i|it|in|or|is)\b)' |sort|uniq -c|sort -n|tail -n 22


650*_*502 7

Common Lisp, 670 characters

I'm a Lisp newbie, and this is an attempt that uses a hash table for counting (so it is probably not the most compact approach).

(flet((r()(let((x(read-char t nil)))(and x(char-downcase x)))))(do((c(
make-hash-table :test 'equal))(w NIL)(x(r)(r))y)((not x)(maphash(lambda
(k v)(if(not(find k '("""the""and""of""to""a""i""it""in""or""is"):test
'equal))(push(cons k v)y)))c)(setf y(sort y #'> :key #'cdr))(setf y
(subseq y 0(min(length y)22)))(let((f(apply #'min(mapcar(lambda(x)(/(-
76.0(length(car x)))(cdr x)))y))))(flet((o(n)(dotimes(i(floor(* n f)))
(write-char #\_))))(write-char #\Space)(o(cdar y))(write-char #\Newline)
(dolist(x y)(write-char #\|)(o(cdr x))(format t "| ~a~%"(car x))))))
(cond((char<= #\a x #\z)(push x w))(t(incf(gethash(concatenate 'string(
reverse w))c 0))(setf w nil)))))

It can be run, for example, with cat alice.txt | clisp -C golf.lisp.

In readable form:

(flet ((r () (let ((x (read-char t nil)))
               (and x (char-downcase x)))))
  (do ((c (make-hash-table :test 'equal))  ; the word count map
       w y                                 ; current word and final word list
       (x (r) (r)))  ; iteration over all chars
       ((not x)

        ; make a list with (word . count) pairs removing stopwords
        (maphash (lambda (k v)
                   (if (not (find k '("" "the" "and" "of" "to"
                                      "a" "i" "it" "in" "or" "is")
                                  :test 'equal))
                       (push (cons k v) y)))
                 c)

        ; sort and truncate the list
        (setf y (sort y #'> :key #'cdr))
        (setf y (subseq y 0 (min (length y) 22)))

        ; find the scaling factor
        (let ((f (apply #'min
                        (mapcar (lambda (x) (/ (- 76.0 (length (car x)))
                                               (cdr x)))
                                y))))
          ; output
          (flet ((outx (n) (dotimes (i (floor (* n f))) (write-char #\_))))
             (write-char #\Space)
             (outx (cdar y))
             (write-char #\Newline)
             (dolist (x y)
               (write-char #\|)
               (outx (cdr x))
               (format t "| ~a~%" (car x))))))

       ; add alphabetic to current word, and bump word counter
       ; on non-alphabetic
       (cond
        ((char<= #\a x #\z)
         (push x w))
        (t
         (incf (gethash (concatenate 'string (reverse w)) c 0))
         (setf w nil)))))

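The Lisp program uses a character-at-a-time tokenizer. It accumulates letters, and on any other character it bumps the counter for the word collected so far. That tokenizer can be sketched in Python as below. The sketch rests on a few assumptions. count_words_charwise is a hypothetical name, and a list with append stands in for the Lisp push-then-reverse. The final flush is an addition: judging from the code, the Lisp do loop exits at end of input without counting a last word that has no trailing separator.

```python
from collections import defaultdict

# "" is listed so that runs of separators (which yield empty words) are
# discarded, mirroring the "" entry in the Lisp stop list.
STOP_WORDS = {"", "the", "and", "of", "to", "a", "i", "it", "in", "or", "is"}


def count_words_charwise(text):
    counts = defaultdict(int)
    current = []  # letters of the word being built
    for ch in text.lower():
        if "a" <= ch <= "z":
            current.append(ch)
        else:
            # Any non-letter ends the current word (possibly an empty one).
            counts["".join(current)] += 1
            current = []
    if current:
        # Flush a word that runs right up to the end of the input.
        counts["".join(current)] += 1
    return {w: c for w, c in counts.items() if w not in STOP_WORDS}


print(count_words_charwise("Don't stop, the end"))
# {'don': 1, 't': 1, 'stop': 1, 'end': 1}
```
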

Shi*_*zou 6

C (828)

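It looks a lot like obfuscated code, and it uses glib for strings, lists and hashes. wc -m reports 828 characters. It does not consider single-character words. To calculate the maximum bar length, it considers the longest word among all words, not only among the top 22. Is this a deviation from the spec?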

It does not handle failures, and it does not free the memory it uses.

#include <glib.h>
#define S(X)g_string_##X
#define H(X)g_hash_table_##X
GHashTable*h;int m,w=0,z=0;y(const void*a,const void*b){int*A,*B;A=H(lookup)(h,a);B=H(lookup)(h,b);return*B-*A;}void p(void*d,void*u){int *v=H(lookup)(h,d);if(w<22){g_printf("|");*v=*v*(77-z)/m;while(--*v>=0)g_printf("=");g_printf("| %s\n",d);w++;}}main(c){int*v;GList*l;GString*s=S(new)(NULL);h=H(new)(g_str_hash,g_str_equal);char*n[]={"the","and","of","to","it","in","or","is"};while((c=getchar())!=-1){if(isalpha(c))S(append_c)(s,tolower(c));else{if(s->len>1){for(c=0;c<8;c++)if(!strcmp(s->str,n[c]))goto x;if((v=H(lookup)(h,s->str))!=NULL)++*v;else{z=MAX(z,s->len);v=g_malloc(sizeof(int));*v=1;H(insert)(h,g_strdup(s->str),v);}}x:S(truncate)(s,0);}}l=g_list_sort(H(get_keys)(h),y);m=*(int*)H(lookup)(h,g_list_first(l)->data);g_list_foreach(l,p,NULL);}

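The C code scales every bar by (77 - z) / m, where z is the length of the longest counted word overall and m is the top count. That always fits in 80 columns. The catch behind the question at the end of the answer is that a rare long word which never makes the top 22 still narrows every bar. A Python sketch of that rule follows. The function name is hypothetical, and the long word and its count of 1 are made up. It uses integer division to match the C arithmetic.

```python
def conservative_bars(counts, top=22, limit=77):
    """Bars scaled by (limit - z) / m as in the C answer, where z is the
    longest word among *all* counted words and m is the highest count.
    Rows are "|" + bar + "| " + word, so each row is at most 80 columns."""
    z = max(len(w) for w in counts)
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]
    m = ranked[0][1]
    # Integer arithmetic, evaluated left to right like *v*(77-z)/m in C.
    return [(w, c * (limit - z) // m) for w, c in ranked]


counts = {"she": 553, "you": 481, "antidisestablishmentarianism": 1}
print(conservative_bars(counts, top=2))
# [('she', 49), ('you', 42)]: the 28-letter word is not shown, but it
# still caps the bars at 77 - 28 = 49 columns.
```
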

mob*_*mob 6

Perl, 185 characters

200 (slightly broken) 199 197 195 193 187 185 characters. The last two newlines are significant. Complies with the spec.

map$X{+lc}+=!/^(.|the|and|to|i[nst]|o[rf])$/i,/[a-z]+/gfor<>;
$n=$n>($:=$X{$_}/(76-y+++c))?$n:$:for@w=(sort{$X{$b}-$X{$a}}%X)[0..21];
die map{$U='_'x($X{$_}/$n);" $U
"x!$z++,"|$U| $_
"}@w

The first line loads the counts of valid words into %X.

The second line computes the minimum scaling factor so that all output lines will be <= 80 characters.

The third line (which contains two newline characters) produces the output.

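The second line tracks the largest count-per-available-column ratio, count / (76 - length), and divides each count by it. That is the reciprocal of taking the smallest (76 - length) / count and multiplying by it. A Python sketch showing that both forms give the same bars follows. The function names are hypothetical, and the counts come from the question's frequency list. Exact fractions are used so that floating-point rounding cannot make the two forms disagree, which the Perl original, working in floats, does not guarantee.

```python
import math
from fractions import Fraction


def bars_via_max_ratio(pairs, limit=76):
    # As in the Perl: n = largest count / (76 - length), then bar = count / n.
    n = max(Fraction(count, limit - len(word)) for word, count in pairs)
    return [(word, math.floor(count / n)) for word, count in pairs]


def bars_via_min_scale(pairs, limit=76):
    # Equivalent form: k = smallest (76 - length) / count, then bar = count * k.
    k = min(Fraction(limit - len(word), count) for word, count in pairs)
    return [(word, math.floor(count * k)) for word, count in pairs]


pairs = [("she", 553), ("you", 481), ("said", 462)]
print(bars_via_max_ratio(pairs))  # [('she', 73), ('you', 63), ('said', 60)]
print(bars_via_min_scale(pairs))  # same result
```
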

Bal*_*usC 5

Java - 886 865 756 744 742 744 752 742 714 680 characters

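  • Updates before 742: improved the regex, removed superfluous parameterized types, removed superfluous whitespace.

  • Update 742 > 744 chars: fixed the fixed-length hack. It depends only on the first word, not on the other words (yet). Found several places to shorten the code (changed `\\s` in the regex, and replaced `ArrayList` with `Vector`). I'm now looking for a short way to remove the Commons IO dependency and read from stdin.

  • Update 744 > 752 chars: I removed the Commons dependency. It now reads from stdin. Paste the text into stdin and hit Ctrl+Z to get the result.

  • Update 752 > 742 chars: I removed `public` and a space, made the class name 1 char instead of 2, and it now ignores one-letter words.

  • Update 742 > 714 chars: updated as per Carl's comments. Removed a redundant assignment (742 > 730), replaced `m.containsKey(k)` with `m.get(k)!=null` (730 > 728), and introduced taking substrings of a single prebuilt bar string (728 > 714).

  • Update 714 > 680 chars: updated as per Rotsor's comments. Improved the bar size calculation to remove unnecessary casting, and improved split() to remove an unnecessary replaceAll().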


import java.util.*;class F{public static void main(String[]a)throws Exception{StringBuffer b=new StringBuffer();for(int c;(c=System.in.read())>0;b.append((char)c));final Map<String,Integer>m=new HashMap();for(String w:b.toString().toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(w,m.get(w)!=null?m.get(w)+1:1);List<String>l=new Vector(m.keySet());Collections.sort(l,new Comparator(){public int compare(Object l,Object r){return m.get(r)-m.get(l);}});int c=76-l.get(0).length();String s=new String(new char[c]).replace('\0','_');System.out.println(" "+s);for(String w:l.subList(0,22))System.out.println("|"+s.substring(0,m.get(w)*c/m.get(l.get(0)))+"| "+w);}}

More readable version:

import java.util.*;
class F{
 public static void main(String[]a)throws Exception{
  StringBuffer b=new StringBuffer();for(int c;(c=System.in.read())>0;b.append((char)c));
  final Map<String,Integer>m=new HashMap();for(String w:b.toString().toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(w,m.get(w)!=null?m.get(w)+1:1);
  List<String>l=new Vector(m.keySet());Collections.sort(l,new Comparator(){public int compare(Object l,Object r){return m.get(r)-m.get(l);}});
  int c=76-l.get(0).length();String s=new String(new char[c]).replace('\0','_');System.out.println(" "+s);
  for(String w:l.subList(0,22))System.out.println("|"+s.substring(0,m.get(w)*c/m.get(l.get(0)))+"| "+w);
 }
}

Output:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

It's a real shame that Java doesn't have String#join() and closures (yet).

Edit by Rotsor:

I have made several changes to your solution:

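  • Replaced List with String[]
  • Reused the 'args' argument instead of declaring my own String array. Also used it as the argument to .toArray()
  • Replaced StringBuffer with a String (yes, yes, terrible performance)
  • Replaced Java sorting with a selection sort that halts early (only the first 22 elements have to be found)
  • Aggregated some int declarations into a single statement
  • Implemented the non-cheating algorithm, which finds the most limiting output line. Implemented it without floating point.
  • Fixed the program crashing when the text contains fewer than 22 distinct words
  • Implemented a new input-reading algorithm, which is fast and only 9 characters longer than the slow one.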

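Two of the changes listed above can be sketched in Python: the early-halting selection sort and the floating-point-free search for the most limiting line. The function names are hypothetical, and the counts come from the question's frequency list with "you" renamed as in the pathological case. The test y * num > x * den is the same cross-multiplication the Java code uses to check x/y < num/den without dividing. It is valid only because all values are positive, which assumes no counted word is 76 characters or longer.

```python
def top_k_by_partial_selection(counts, k=22):
    """Selection sort that stops once the first k slots are filled."""
    words = list(counts)
    k = min(k, len(words))  # fewer than k distinct words must not crash
    for i in range(k):
        for j in range(i + 1, len(words)):
            if counts[words[i]] < counts[words[j]]:
                words[i], words[j] = words[j], words[i]
    return words[:k]


def limiting_ratio(words, counts, limit=76):
    """Return (num, den) where num/den is the smallest (limit - len) / count,
    comparing fractions by cross-multiplication instead of division."""
    num, den = limit - len(words[0]), counts[words[0]]
    for w in words[1:]:
        x, y = limit - len(w), counts[w]
        if y * num > x * den:  # x/y < num/den, since all values are positive
            num, den = x, y
    return num, den


counts = {"so": 152, "she": 553, "superlongstringstring": 481, "said": 462}
top = top_k_by_partial_selection(counts, k=3)
num, den = limiting_ratio(top, counts)
print([(w, counts[w] * num // den) for w in top])
# [('she', 63), ('superlongstringstring', 55), ('said', 52)]
```
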
The condensed code is 688 711 684 characters long:

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,x,y,g=22;for(;(j=System.in.read())>0;w+=(char)j);for(String W:w.toLowerCase().split("(\\b(.|the|and|of|to|i[tns]|or)\\b|\\W)+"))m.put(W,m.get(W)!=null?m.get(W)+1:1);l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}}

Fast version (720 693 characters)

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,x,y,g=22;for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";}}l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}}

More readable version:

import java.util.*;class F{public static void main(String[]l)throws Exception{
    Map<String,Integer>m=new HashMap();String w="";
    int i=0,k=0,j=8,x,y,g=22;
    for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{
        if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";
    }}
    l=m.keySet().toArray(l);x=l.length;if(x<g)g=x;
    for(;i<g;++i)for(j=i;++j<x;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}
    for(;k<g;k++){x=76-l[k].length();y=m.get(l[k]);if(k<1||y*i>x*j){i=x;j=y;}}
    String s=new String(new char[m.get(l[0])*i/j]).replace('\0','_');
    System.out.println(" "+s);
    for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/j)+"| "+w);}}
}

The version without the behaviour improvements is 615 characters:

import java.util.*;class F{public static void main(String[]l)throws Exception{Map<String,Integer>m=new HashMap();String w="";int i=0,k=0,j=8,g=22;for(;j>0;){j=System.in.read();if(j>90)j-=32;if(j>64&j<91)w+=(char)j;else{if(!w.matches("^(|.|THE|AND|OF|TO|I[TNS]|OR)$"))m.put(w,m.get(w)!=null?m.get(w)+1:1);w="";}}l=m.keySet().toArray(l);for(;i<g;++i)for(j=i;++j<l.length;)if(m.get(l[i])<m.get(l[j])){w=l[i];l[i]=l[j];l[j]=w;}i=76-l[0].length();String s=new String(new char[i]).replace('\0','_');System.out.println(" "+s);for(k=0;k<g;k++){w=l[k];System.out.println("|"+s.substring(0,m.get(w)*i/m.get(l[0]))+"| "+w);}}}

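  • You're kind of cheating by assuming that the longest bar will be exactly 75 characters. You have to make sure that no bar + word is longer than 80 characters. (5 upvotes)
  • @st0le: Not anymore. (2 upvotes)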