How to parse a text file with C#

Question


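By "text formatting" I mean something more complicated.

At first I started adding the 5000 lines from the text file I'm asking about to my project by hand.

The text file has 5000 lines of varying length. For example: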

1   1   ITEM_ETC_GOLD_01    ??(?)   xxx xxx xxx_TT_DESC 0   0   3   3   5   0   180000  3   0   1   0   0   255 1   1   0   0   0   0   0   0   0   0   0   0   -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_money_small.bsr    xxx xxx xxx 0   2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1   ??? ??? ?(param1??) -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

1   4   ITEM_ETC_HP_POTION_01   HP ?? ??    xxx SN_ITEM_ETC_HP_POTION_01    SN_ITEM_ETC_HP_POTION_01_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   60  0   0   0   1   21  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_01.ddj   xxx xxx 50  2   0   0   1   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 120 HP???   0   HP???(%)    0   MP???   0   MP???(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0

1   5   ITEM_ETC_HP_POTION_02   HP ??? (?)  xxx SN_ITEM_ETC_HP_POTION_02    SN_ITEM_ETC_HP_POTION_02_TT_DESC    0   0   3   3   1   1   180000  3   0   1   1   1   255 3   1   0   0   1   0   110 0   0   0   2   39  -1  0   -1  0   -1  0   -1  0   -1  0   0   0   0   0   0   0   100 0   0   0   xxx item\etc\drop_ch_bag.bsr    item\etc\hp_potion_02.ddj   xxx xxx 50  2   0   0   2   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0   0   0   0   0   0   0   0   0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 220 HP???   0   HP???(%)    0   MP???   0   MP???(%)    -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx -1  xxx 0   0


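The gap between the first field (1) and the second field (1/4/5) is not spaces; it is a tab character. There are no spaces in that text file.

What I want:

I want to get the second integer (in the three lines I posted above, the second integers are 1, 4 and 5) and the string in the middle of each line that represents a path (it starts with "item" and ends with the file extension ".ddj").

My problem:

When I google "text formatting C#", all I get is how to open a text file and how to write a text file in C#. I don't know how to search for text inside a text file. I also can't simply search for the first integer: if it is a small integer, as in the three lines I posted above, I won't find the right position, because "1", for example, may appear in many different places.

My question:

It would be best if I wrote a program that deletes everything except what I need.

The other way that comes to mind is to search the file directly, but as I mentioned above, if the number is too small I might get the wrong position for the second integer.

Please advise; I can't format all of this by hand.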

Answer 1

Sam*_*war 53

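OK, here's what we do: open the file, read it line by line, and split each line on tabs. Then we grab the second integer and loop through the remaining fields to find the path.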

using System.IO;

...

// The using statement closes the file even if parsing throws.
using (StreamReader reader = File.OpenText("filename.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] items = line.Split('\t');
        int myInteger = int.Parse(items[1]);   // Here's your integer.

        // Now let's find the path.
        string path = null;
        foreach (string item in items)
        {
            if (item.StartsWith("item\\") && item.EndsWith(".ddj"))
                path = item;
        }

        // At this point, `myInteger` and `path` contain the values we want
        // for the current line. We can then store those values or print them,
        // or anything else we like. `path` stays null if no field matched.
    }
}
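
As a possible variation on the split-on-tabs approach, here is a minimal sketch that tolerates malformed lines instead of throwing. The names `ItemFileParser` and `TryParseLine` are made up for illustration, the inline `out` declarations assume C# 7 or later, and the file name is the same placeholder used above. Unlike the loop above, it stops at the first matching field and skips lines with no `.ddj` path at all.

```csharp
using System;
using System.IO;

static class ItemFileParser
{
    // Hypothetical helper: returns true and sets id/path when the line has an
    // integer in its second column and a field of the form item\....ddj.
    public static bool TryParseLine(string line, out int id, out string path)
    {
        id = 0;
        path = null;

        string[] items = line.Split('\t');
        if (items.Length < 2 || !int.TryParse(items[1], out id))
            return false;

        foreach (string item in items)
        {
            if (item.StartsWith(@"item\", StringComparison.Ordinal) &&
                item.EndsWith(".ddj", StringComparison.Ordinal))
            {
                path = item;
                break; // keep the first matching field
            }
        }
        return path != null;
    }

    static void Main()
    {
        // "filename.txt" is a placeholder; File.ReadLines streams the file lazily.
        foreach (string line in File.ReadLines("filename.txt"))
        {
            if (TryParseLine(line, out int id, out string path))
                Console.WriteLine($"{id}\t{path}");
        }
    }
}
```

With the sample data from the question, the ITEM_ETC_GOLD_01 line would be skipped by this sketch because none of its fields ends in `.ddj`.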


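It looks like in .NET 4 you now need to instantiate a FileInfo object and then call OpenText() on it, i.e. `FileInfo fi = new FileInfo("filename.txt"); StreamReader reader = fi.OpenText();` (10 upvotes)
Great. I don't have a C# compiler on this machine, so I had to wing it. Glad to hear it works out of the box. (2 upvotes)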

Answer 2

Sam*_*war 35

Another solution, this time using regular expressions:

using System.IO;
using System.Text.RegularExpressions;

...

Regex parts = new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

// File.OpenText is the static helper; FileInfo.OpenText is an instance method.
using (StreamReader reader = File.OpenText("filename.txt"))
{
    string line;
    while ((line = reader.ReadLine()) != null) {
        Match match = parts.Match(line);
        if (match.Success) {
            // Captured groups are read through the Groups indexer.
            int number = int.Parse(match.Groups[1].Value);
            string path = match.Groups[2].Value;

            // At this point, `number` and `path` contain the values we want
            // for the current line. We can then store those values or print them,
            // or anything else we like.
        }
    }
}


That expression is a bit complicated, so here it is broken down:

^        Start of string
\d+      "\d" means "digit" - 0-9. The "+" means "one or more."
         So this means "one or more digits."
\t       This matches a tab.
(\d+)    This also matches one or more digits. This time, though, we capture it
         using parentheses. This means we can access it through the Groups property.
\t       Another tab.
.+?      "." means "anything." So "one or more of anything". In addition, it's lazy.
         This is to stop it grabbing everything in sight - it'll only grab as much
         as it needs to for the regex to work.
\t       Another tab.

(item\\[^\t]+\.ddj)
    Here's the meat. This matches: "item\<one or more of anything but a tab>.ddj"
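
To see the breakdown in action, the following sketch runs the same pattern against a shortened, made-up line in the tab-separated layout from the question. The class name `RegexDemo`, the helper `TryExtract`, and the sample string are illustrative assumptions, and the inline `out` declarations assume C# 7 or later.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexDemo
{
    // Same pattern as in the answer above.
    static readonly Regex Parts =
        new Regex(@"^\d+\t(\d+)\t.+?\t(item\\[^\t]+\.ddj)");

    // Hypothetical helper: extracts the second integer and the .ddj path.
    public static bool TryExtract(string line, out int number, out string path)
    {
        Match match = Parts.Match(line);
        number = match.Success ? int.Parse(match.Groups[1].Value) : 0;
        path = match.Success ? match.Groups[2].Value : null;
        return match.Success;
    }

    static void Main()
    {
        // Abbreviated sample line, not a full line from the real file.
        string sample = "1\t4\tITEM_ETC_HP_POTION_01\txxx\t" +
                        "item\\etc\\drop_ch_bag.bsr\titem\\etc\\hp_potion_01.ddj\txxx";

        if (TryExtract(sample, out int number, out string path))
            Console.WriteLine($"{number} -> {path}");
        // prints: 4 -> item\etc\hp_potion_01.ddj
    }
}
```

Note how the lazy `.+?` keeps expanding past the `item\etc\drop_ch_bag.bsr` field: that field starts with `item\` but does not end in `.ddj`, so the match only succeeds at the next field.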


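I don't know which of the answers to accept; both work well. I prefer this one because you explained the expression, which I had never seen before! (2 upvotes)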

Answer 3

eri*_*len 5

You could do something like this:

using (TextReader rdr = OpenYourFile()) {
    string line;
    while ((line = rdr.ReadLine()) != null) {
        string[] fields = line.Split('\t'); // THIS LINE DOES THE MAGIC
        int theInt = Convert.ToInt32(fields[1]);
    }
}


The reason you didn't find relevant results when searching for "formatting" is that what you are doing is called "parsing".

Asked:	16 years, 3 months ago
Viewed:	141152 times
Last activity:	7 years, 3 months ago