Tag: text-processing

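Inline LaTeX \input commands

I'm looking for a program that recursively inlines all \input{} commands in a LaTeX file. By "recursively" I mean repeating the inlining until no \input{} commands are left in the final LaTeX file.

I have come across the flatten package. However, for some reason my TeX Live distribution doesn't have it installed: when I run sudo tlmgr show flatten, I get the error message tlmgr: cannot find flatten. So I'm looking for an alternative tool that is more standard and easier to install.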
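
If no packaged tool is available, the recursion itself is small. Below is a minimal Python sketch, not a drop-in replacement for flatten, under these assumptions: every include is written as \input{name} with braces, names are resolved relative to the main file's directory (as LaTeX does when run from there), and ".tex" is appended when the name as written does not exist. It does not skip commented-out lines and does not handle \include or the brace-less "\input name" form. The function name inline_inputs is hypothetical. TeX Live's latexpand script may also be worth checking as a packaged alternative.

```python
import re
from pathlib import Path
from typing import Optional

# Matches \input{name}; assumes the argument has no nested braces.
INPUT_RE = re.compile(r"\\input\{([^{}]*)\}")


def inline_inputs(path: Path, root: Optional[Path] = None, _stack: tuple = ()) -> str:
    """Return the file's text with every \\input{...} replaced by the included
    file's (already inlined) contents, so no \\input is left."""
    path = Path(path).resolve()
    root = path.parent if root is None else root
    if path in _stack:
        raise ValueError(f"circular \\input involving {path}")
    text = path.read_text(encoding="utf-8")

    def expand(match: re.Match) -> str:
        # Resolve names against the main file's directory, like LaTeX run there.
        child = root / match.group(1).strip()
        if not child.exists():
            # \input{chapter1} refers to chapter1.tex
            child = child.with_name(child.name + ".tex")
        # Recurse so that \input commands inside the child are expanded too.
        return inline_inputs(child, root, _stack + (path,))

    # A function replacement is inserted literally, so LaTeX backslashes survive.
    return INPUT_RE.sub(expand, text)
```

For example, print(inline_inputs(Path("main.tex"))) prints the flattened document, where main.tex stands for your root file; the circular-input check is this sketch's own addition.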

bash shell latex text-processing tex

2 votes, 1 answer, 1,175 views

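Is there a well-known algorithm for detecting names?

For example, given the string:

"Bob went fishing with his friend Jim Smith."

Bob and Jim Smith are both names, but "bob" and "smith" are also ordinary English words. If they weren't capitalized, there would be little to indicate that they are names beyond our understanding of the sentence. Is there a well-known algorithm for detecting names, at least Western names?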
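
The usual name for this task is named-entity recognition (NER), which production systems typically solve with trained statistical models. As a much cruder illustration of the capitalization cue mentioned above, here is a Python sketch that groups runs of capitalized words and only trusts the sentence-initial word when it appears in a hypothetical first-name list (KNOWN_FIRST_NAMES). It is a toy heuristic, not a recommended algorithm.

```python
import re

# Hypothetical, tiny gazetteer of given names; a real system would use a
# large name list or a trained NER model instead.
KNOWN_FIRST_NAMES = {"alice", "bob", "jim", "mary"}


def find_name_candidates(sentence: str) -> list[str]:
    """Return runs of capitalized words that may be personal names."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    candidates, run = [], []
    for i, token in enumerate(tokens):
        # The first word of a sentence is capitalized anyway, so it only
        # counts when it is a known first name.
        if token[0].isupper() and (i > 0 or token.lower() in KNOWN_FIRST_NAMES):
            run.append(token)
        elif run:
            candidates.append(" ".join(run))
            run = []
    if run:
        candidates.append(" ".join(run))
    return candidates


print(find_name_candidates("Bob went fishing with his friend Jim Smith."))
# ['Bob', 'Jim Smith']
```

It would also flag capitalized non-names such as place names or a mid-sentence "I", which is why NER models rely on context rather than capitalization alone.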

algorithm text-processing nlp

2 votes, 1 answer, 195 views

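Blurry text when generating and printing ID cards

I'm generating ID cards with .NET, and the dynamic text I insert comes out so blurry that I have to use a bold font just to make it barely acceptable.

What I'm currently doing:

Grab the image "frame".
Grab the employee's photo.
Merge them.
Create a new bitmap from the resulting image.
Add two sets of text on top of the bitmap (Font and Brush color set to black).
Save the image as PNG at the highest quality.

Is there anything I can do when generating the image to improve how it prints on PVC ID cards?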

    public TextOnImage AddText(string message, Font font, PointF point)
    {
        using (Graphics g = Graphics.FromImage(Image))
        {
            g.CompositingQuality = CompositingQuality.HighQuality;
            g.SmoothingMode = SmoothingMode.HighQuality;
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            //g.TextContrast = 0;
            //g.TextRenderingHint = TextRenderingHint.AntiAlias; <-- Still didn't work
            g.DrawString(message, font, Brush, point, StringFormat);
        }

        return this;
    }

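One possible cause, which the question does not confirm, is that the card is rendered at screen resolution (about 96 dpi) and the card printer then upscales it, which softens small text. If so, sizing the bitmap for the printer's resolution before drawing the text should help; in System.Drawing, Bitmap.SetResolution sets the dpi stored in the image. The Python sketch below only does the arithmetic, assuming an ISO/IEC 7810 ID-1 card (85.60 mm × 53.98 mm) and a 300 dpi printer; check your printer's actual resolution.

```python
MM_PER_INCH = 25.4


def pixels_for(length_mm: float, dpi: int) -> int:
    """Number of pixels needed to cover length_mm at the given print resolution."""
    return round(length_mm / MM_PER_INCH * dpi)


# ID-1 card size in millimetres (assumed card format).
CARD_WIDTH_MM, CARD_HEIGHT_MM = 85.60, 53.98

for dpi in (96, 300):
    print(dpi, pixels_for(CARD_WIDTH_MM, dpi), pixels_for(CARD_HEIGHT_MM, dpi))
# 96 324 204
# 300 1011 638
```

A 96 dpi canvas has roughly a third of the pixels per axis that a 300 dpi print needs, so text drawn on it has to be enlarged about 3× by the printer.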

.net c# asp.net text-processing image-processing

2 votes, 1 answer, 2,354 views

How do I extract email addresses from between "<" and ">"?

I have a list of grouped emails and names from Outlook, separated by semicolons, like this:

fname lname <email>; fname2 lname2 <email2>; ... ; fnameN lnameN <emailN>


I want to extract the emails and separate them with semicolons, like this:

email1; email2; ... ; emailN


How can I do this in Python?
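
A minimal sketch with the standard re module, assuming each address sits inside exactly one pair of angle brackets and contains no "<" or ">" itself (extract_emails is a hypothetical helper name):

```python
import re


def extract_emails(outlook_list: str) -> str:
    """Return the addresses found between '<' and '>', joined by '; '."""
    # The capture group takes everything inside one pair of angle brackets.
    emails = re.findall(r"<([^<>]+)>", outlook_list)
    return "; ".join(emails)


line = "fname lname <a@example.com>; fname2 lname2 <b@example.com>"
print(extract_emails(line))  # a@example.com; b@example.com
```

Because re.findall returns only the captured group, the names and brackets are dropped without a second cleanup step.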

python email text-processing string-formatting

2 votes, 1 answer, 660 views

In a *nix environment, how do I group the values of each column together?

I have the following text file:

A,B,C
A,B,C
A,B,C


Is there a way, using standard *nix tools (cut, grep, awk, sed, etc.), to process a text file like this and get the following output:

A
A
A
B
B
B
C
C
C

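With the tools listed, one straightforward option is to run cut once per column, e.g. for i in 1 2 3; do cut -d, -f"$i" file.txt; done, where file.txt stands for the input file. The same column-by-column reading is sketched below in Python, assuming comma-separated input whose rows may have different lengths (columns_in_order is a hypothetical name):

```python
def columns_in_order(lines: list[str], sep: str = ",") -> list[str]:
    """Return every value of column 1, then column 2, and so on."""
    rows = [line.rstrip("\n").split(sep) for line in lines if line.strip()]
    n_cols = max((len(row) for row in rows), default=0)
    values = []
    for col in range(n_cols):
        for row in rows:
            if col < len(row):  # tolerate rows with fewer fields
                values.append(row[col])
    return values


print("\n".join(columns_in_order(["A,B,C", "A,B,C", "A,B,C"])))
```

Reading the whole file first is what makes the column order possible; a single streaming pass can only emit values row by row.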

unix linux text-processing command-line-interface

2 votes, 1 answer, 96 views

Exclude already-matched strings in Python re.findall

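I'm using Python's re.findall method to find occurrences of certain string values in an input string. For example, when searching the string "ABCdef", I have two search requirements:

1. Find strings that start with a single capital letter.
2. Find strings that consist entirely of capital letters.

For example, input strings and the expected output would be: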

'USA' -- output: ['USA']
'BObama' -- output: ['B', 'Obama']
'Institute20CSE' -- output: ['Institute', '20', 'CSE']

So I expected

>>> matched_value_list = re.findall('[A-Z][a-z]+|[A-Z]+', 'ABCdef')


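to return ['AB', 'Cdef'].

But that doesn't happen. What I get back is ['ABC']: the second alternative of the regex matches the whole run of capitals.

So is there a way to make the regex ignore what has already been matched? That is, once 'Cdef' has matched '[A-Z][a-z]+', the second part of the regex ('[A-Z]+') should only match the remaining string 'AB'.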
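
re.findall scans left to right and never revisits text it has already consumed, so at position 0 the [A-Z]+ alternative greedily takes 'ABC' before the capitalized-word alternative gets a chance at 'Cdef'. One way around this (a sketch, not the only option) is a negative lookahead, so that a run of capitals may not end right before a lowercase letter; the engine then backtracks to 'AB'. The extra \d+ alternative is an addition to cover the '20' in the Institute20CSE example.

```python
import re

# [A-Z][a-z]+       a capital followed by lowercase letters ("Cdef", "Obama")
# [A-Z]+(?![a-z])   a run of capitals not immediately followed by a lowercase
#                   letter, so its last capital is left for the first alternative
# \d+               runs of digits ("20")
TOKEN_RE = re.compile(r"[A-Z][a-z]+|[A-Z]+(?![a-z])|\d+")

for s in ["ABCdef", "USA", "BObama", "Institute20CSE"]:
    print(s, TOKEN_RE.findall(s))
# ABCdef ['AB', 'Cdef']
# USA ['USA']
# BObama ['B', 'Obama']
# Institute20CSE ['Institute', '20', 'CSE']
```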

python regex string text-processing

2 votes, 1 answer, 2,515 views

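Linking text feature names to their tf-idf values

I'm using scikit-learn to extract text features from a "bag of words" text (the text is tokenized into single words). For this I use TfidfVectorizer, to reduce the weight of very frequent words (e.g. "a", "the", etc.).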

from sklearn.feature_extraction.text import TfidfVectorizer

text = 'Some text, with a lot of words...'
tfidf_vectorizer = TfidfVectorizer(
    min_df=1,  # keep terms that occur in at least 1 document
    max_features=4000,  # maximum number of features
    strip_accents='unicode',  # replace accented characters with unaccented ones
    analyzer='word',  # features made of words
    token_pattern=r'\w{4,}',  # tokenize only words of 4+ chars
    ngram_range=(1, 1),  # features made of single tokens
    use_idf=True,  # enable inverse-document-frequency reweighting
    smooth_idf=True,  # prevents zero division for unseen words
    sublinear_tf=False)  # use raw term counts, not 1 + log(tf)

# vectorize …

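With a fitted vectorizer, get_feature_names_out() (scikit-learn 1.0 and later) returns the vocabulary in column order, so a row of the matrix returned by fit_transform can be paired with it term by term. To make that relationship visible without scikit-learn, the Python sketch below recomputes the weights by hand, following the formula scikit-learn documents for smooth_idf=True, sublinear_tf=False and the default norm='l2', with lowercasing and the question's \w{4,} token pattern. It skips accent stripping and max_features, so treat it as an approximation, not a reimplementation of TfidfVectorizer; tfidf_by_term is a hypothetical name.

```python
import math
import re
from collections import Counter


def tfidf_by_term(docs: list[str], pattern: str = r"\w{4,}") -> list[dict[str, float]]:
    """Return one {term: tf-idf weight} dict per document."""
    token_re = re.compile(pattern)
    # Lowercase first, as TfidfVectorizer does by default.
    tokenized = [token_re.findall(doc.lower()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    weights_per_doc = []
    for tokens in tokenized:
        # Raw term count times idf, then L2-normalize the document vector.
        weights = {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        weights_per_doc.append({t: w / norm for t, w in weights.items()} if norm else {})
    return weights_per_doc


docs = ["Some text, with a lot of words", "More words here"]
for term, weight in sorted(tfidf_by_term(docs)[0].items()):
    print(f"{term}: {weight:.3f}")
```

Here "words" occurs in both documents, so its idf is 1 and it ends up with a lower weight than the terms unique to the first document.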

python text-processing machine-learning scikit-learn

2 votes, 1 answer, 4,561 views

Parsing a text file with Shell/Bash

I have a text file that looks like this:

Item:
SubItem01
SubItem02
SubItem03
Item2:
SubItem0201
SubItem0202
Item3:
SubItem0301
...etc...


What I need is to make it look like this:

Item=>SubItem01
Item=>SubItem02
Item=>SubItem03
Item2=>SubItem0201
Item2=>SubItem0202
Item3=>SubItem0301


I know for a fact that I need two for loops to get there. I've done some tests, but... well, it's not working out.

for(( c=1; c<=lineCount; c++ ))
do

   var=`sed -n "${c}p" TMPFILE`
   echo "$var"

   if [[ "$var" == *:* ]];
   then
      printf "%s->" $var
   else
      printf "%s\n"
   fi
done


Could anyone help get me back on track? I've tried all kinds of approaches but haven't gotten anywhere. Thanks.
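
A single pass that remembers the most recent header line may be enough, so two nested loops are not necessarily needed. In awk (one of the question's tags), assuming every header line ends with ":" and file.txt stands for the input: awk '/:$/ {sub(/:$/, ""); item = $0; next} {print item "=>" $0}' file.txt. The same logic as a Python sketch, which also skips blank lines and any lines before the first header (pair_items is a hypothetical name):

```python
def pair_items(lines: list[str]) -> list[str]:
    """Turn 'Header:' lines followed by sub-item lines into 'Header=>SubItem'."""
    current = None
    result = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.endswith(":"):
            current = line[:-1]  # remember the header without its colon
        elif current is not None:
            result.append(f"{current}=>{line}")
    return result


sample = """Item:
SubItem01
SubItem02
Item2:
SubItem0201"""
print("\n".join(pair_items(sample.splitlines())))
```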

bash shell awk parsing text-processing

2 votes, 1 answer, 10k views

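Splitting a string by delimiters of a given type

How can I split a string into a list of substrings, where the delimiter to split on is a MATLAB object type?

For example: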

>> splitByType('a1b2c3',type=integer)
['a','b','c']


Or:

>> splitByType('a1b2c3',type=character)
['1','2','3']

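In MATLAB itself, regexp with the 'split' option splits on a regular expression, so the "type" can be expressed as a character class (for example '\d' for digits). A Python sketch of the same idea follows; the type names and the DELIMITER_PATTERNS table are hypothetical, and empty pieces at the ends are dropped.

```python
import re

# Hypothetical names for the delimiter "types" in the question.
DELIMITER_PATTERNS = {
    "integer": r"\d+",          # runs of digits
    "character": r"[^\W\d_]+",  # runs of letters
}


def split_by_type(text: str, delimiter_type: str) -> list[str]:
    """Split text on runs of the given type and drop empty pieces."""
    pieces = re.split(DELIMITER_PATTERNS[delimiter_type], text)
    return [piece for piece in pieces if piece]


print(split_by_type("a1b2c3", "integer"))    # ['a', 'b', 'c']
print(split_by_type("a1b2c3", "character"))  # ['1', '2', '3']
```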

string matlab text-processing split delimiter

2 votes, 1 answer, 126 views

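Is there a Linux command to output only part of a command's result?

My problem is that when I filter a command's output with grep in the terminal, I get the value together with its label.

For example: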

lscpu | grep MHz


outputs:

CPU MHz:               1216.851


But what if I only want:

1216.851


as the output? Is there another command that does this?
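
One option is to keep grep and print only the last whitespace-separated field, e.g. lscpu | grep MHz | awk '{print $NF}', which assumes the value is the last field on the line. The same label-stripping step as a Python sketch, assuming "label: value" lines; grep_values is a hypothetical helper, and its input could come from subprocess.run(["lscpu"], capture_output=True, text=True).stdout.

```python
def grep_values(text: str, needle: str) -> list[str]:
    """Like `grep needle`, but return only the part after the first ':'."""
    values = []
    for line in text.splitlines():
        if needle in line and ":" in line:
            _label, _sep, value = line.partition(":")
            values.append(value.strip())
    return values


sample = "Architecture:        x86_64\nCPU MHz:               1216.851\n"
print(grep_values(sample, "MHz"))  # ['1216.851']
```

Splitting on the first ":" keeps values that themselves contain colons intact, which printing the last field would not guarantee.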

linux bash shell ubuntu text-processing

2 votes, 1 answer, 511 views

Tag statistics

text-processing ×10

bash ×3

.net ×1

awk ×1

c# ×1

command-line-interface ×1

image-processing ×1

machine-learning ×1

nlp ×1

scikit-learn ×1

string-formatting ×1

tex ×1

unix ×1
