pro*_*ian 2 visualization wolfram-mathematica
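I am currently looking at word and phrase frequencies in a large database of textual information (roughly 108 MB spread across 307 text files). My goal is a way to see quickly which files are most relevant, in a visually appealing format (although this project may well end up showing that a textual representation is always clearer).
Right now I have the following: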
SetDirectory["/MYMATHEMATICADIRECTORY/"];
filelist = FileNames[];
viewerCount1 = {0};
viewerCount2 = {0};
word1 = "freedom";
word2 = "liberty";
Do[
searchDB = StringSplit[Import[filename]];
AppendTo[viewerCount1, Count[searchDB, word1]];
AppendTo[viewerCount2, Count[searchDB, word2]];
, {filename, filelist}]
list3 = Take[viewerCount1, {2, -1}]
list4 = Take[viewerCount2, {2, -1}]
FileNames[] produces a list such as: {"001ABbenevolat.txt-cleaned.txt", "002abnature.txt-cleaned.txt", "003aboriginaldocs.txt-cleaned.txt", "004ABpresse.txt-cleaned.txt", "005acadian.txt-packaged.txt", "006acadiedelile.txt-cleared.txt", "007acfa.txt-cleared.txt"} [except with 307 entries, all numbered].
list3 produces a list such as: {0,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 2,0,0,0,2,0,0,0,1,0,0,0,0,0,0,0,1,0,2,0,0,0,10,1,7, 0,0,0,0,23,3,0,0,0,0,0,0,0,0,0,0,0,9,0,1,0,1,0,5, 0,13,0,0,0,0,0,0,0,0,0,1,0,2,0,4,0,0,0,1,11,0,0,0, 0,2,7,1,4,1,0,0,0,0,0,0,0,13,...}, and so on.
The command:
BarChart3D[{list3, list4}, BarSpacing -> {0.5, 0}, ChartLayout -> "Grid"]
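generates something close to what I want (imagine the bars as file folders sticking up). However, I would like to add meaningful tooltips. By default, the tooltip shows only the frequency. Is there a quick way to include the name of the file that the frequency belongs to, along with the frequency itself? That is, a tooltip that would read '007acfa.txt-cleared.txt - 32' if there were 32 occurrences in file 7?
For example, suppose your data look like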
list3 = RandomInteger[30, 30];
list4 = RandomInteger[30, 30];
filelist = Table["file " <> ToString[i], {i, 30}];
Then you could do something like
BarChart3D[{
MapThread[Tooltip[#2, Row[{#, " -- ", #2}]] &, {filelist, list3}],
MapThread[Tooltip[#2, Row[{#, " -- ", #2}]] &, {filelist, list4}]},
BarSpacing -> {0.5, 0}, ChartLayout -> "Grid"]
Edit
Another approach is to use LabelingFunction:
BarChart3D[{list3, list4},
LabelingFunction ->
(Placed[Row[{filelist[[Last[#2]]], " -- ", #1}], Tooltip] &),
ChartLayout -> "Grid", BarSpacing -> {0.5, 0}]
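To connect this back to the real data in the question, here is a minimal sketch, not tested against the actual corpus, that builds both count lists with Table instead of repeated AppendTo and then attaches the file-name tooltips as in the first approach. The names words, files, counts and withTips are my own, and the "*.txt" pattern and the "Text" import format are assumptions about how the files are stored.

```mathematica
SetDirectory["/MYMATHEMATICADIRECTORY/"];
words = {"freedom", "liberty"};
files = FileNames["*.txt"];  (* assumption: only the .txt files are wanted *)

(* One row per file, one column per word; each file is imported only once *)
counts = Table[
   With[{tokens = StringSplit[Import[f, "Text"]]},
    Count[tokens, #] & /@ words],
   {f, files}];

list3 = counts[[All, 1]];
list4 = counts[[All, 2]];

(* Hypothetical helper: wrap each count in a Tooltip reading "file name -- count" *)
withTips[list_] :=
  MapThread[Tooltip[#2, Row[{#1, " -- ", #2}]] &, {files, list}];

BarChart3D[{withTips[list3], withTips[list4]},
 BarSpacing -> {0.5, 0}, ChartLayout -> "Grid"]
```

As in the question's loop, Count matches whole whitespace-separated tokens exactly, so "Freedom" or "freedom," would not be counted; normalizing case and punctuation before counting is a separate step if that matters.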