String parsing in Java with delimiter tab "\t" using split

Question

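I'm processing a string which is tab delimited. I'm accomplishing this using the split function, and it works in most cases. The problem occurs when a field is missing: instead of getting null in that field, I get the next value. I'm storing the parsed values in a string array.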

String[] columnDetail = new String[11];
columnDetail = column.split("\t");

Any help would be appreciated. If possible, I'd like to store the parsed strings in a string array so that I can easily access the parsed data.

Answer 1

Fil*_*erg 84

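String.split uses regular expressions, and you don't need to allocate an extra array for the split.

The split method returns an array. The problem is that you try to pre-define how many occurrences of a tab you have, but how would you really know that? Try using Scanner or StringTokenizer, and just learn how splitting strings works.

Let me explain why \t does not work and why you need \\\\ to escape \\.

Okay, so when you use split, it actually takes a regex (regular expression), and in a regular expression you define what character to split by. If you write \t, that doesn't actually mean \t, and what you want to split by is \t, right? So by just writing \t you tell your regex processor "hey, split by the character that is an escaped t", not "hey, split by all characters looking like \t". Notice the difference? Using \ means escaping something, and \ in a regex means something totally different from what you think.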

So this is why you need to use this solution:

\\t

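This tells the regex processor to look for \t. Okay, so why would you need two of them? Well, the first \ escapes the second, which means it will look like this: \t when the text is processed!

Now let's say that you want to split on \

Then you would be left with \\, but see, that doesn't work, because \ will try to escape the character that follows it! That's why you want the output to be \\, and therefore you need to write \\\\.

I really hope the examples above help you understand why your solution doesn't work and how to conquer other ones!

Now, I've given you these alternatives before; maybe you should start looking at them now.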

Other methods

StringTokenizer

You should look into StringTokenizer; it's a very handy tool for this type of work.

Example

 StringTokenizer st = new StringTokenizer("this is a test");
 while (st.hasMoreTokens()) {
     System.out.println(st.nextToken());
 }

This will output

 this
 is
 a
 test

You use the second constructor of StringTokenizer to set the delimiter:

StringTokenizer(String str, String delim)
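
As a rough sketch of that two-argument constructor with a tab delimiter (the class name TokenizerTabDemo and the sample input are made up for illustration): note that StringTokenizer treats a run of delimiters as a single separator, so an empty field between two tabs produces no token at all and the following values shift forward.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerTabDemo {
    // Collects every token StringTokenizer produces when tab is the delimiter.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "\t");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The second field is empty (two tabs in a row); it is silently skipped.
        System.out.println(tokenize("a\t\tc")); // prints [a, c]
    }
}
```

If the position of each field matters, this behaviour is probably not what you want.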

Scanner

You can also use a Scanner, as one of the commenters said. It could look something like this:

Example

 String input = "1 fish 2 fish red fish blue fish";

 Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");

 System.out.println(s.nextInt());
 System.out.println(s.nextInt());
 System.out.println(s.next());
 System.out.println(s.next());

 s.close();

The output would be

 1
 2
 red
 blue

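This means that it cuts out the word "fish" and gives you the rest, using "fish" as the delimiter.

Examples taken from the Java API

The output is the same whether you use "\t" or "\\t", and I don't see why you would want to use StringTokenizer or Scanner. Also, String.split is much simpler than the other two, and per the documentation "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code." (6 upvotes)
However, regular expressions shouldn't bite you when splitting on a tab. (3 upvotes)
Using regular expressions to parse XML is always wrong. (3 upvotes)
-1 - wrong information about "\t" vs "\\t" (http://stackoverflow.com/a/3762377/281545) - please edit (2 upvotes)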
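
The commenters' point can be checked with a small sketch (the class and method names are invented for illustration). With "\t" the Java compiler puts a real tab character into the string and the regex matches it literally; with "\\t" the regex engine receives a backslash followed by t and interprets that as a tab. Both split the same way.

```java
import java.util.Arrays;

public class TabEscapeDemo {
    // The regex is a single literal tab character.
    static String[] splitOnTabChar(String line) {
        return line.split("\t");
    }

    // The regex is the two characters backslash and t, which the regex engine reads as a tab.
    static String[] splitOnRegexEscape(String line) {
        return line.split("\\t");
    }

    public static void main(String[] args) {
        String line = "a\tb\tc";
        System.out.println(Arrays.toString(splitOnTabChar(line)));     // prints [a, b, c]
        System.out.println(Arrays.toString(splitOnRegexEscape(line))); // prints [a, b, c]
    }
}
```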

Answer 2

小智 20

Try this:

String[] columnDetail = column.split("\t", -1);

Read the Javadoc for String.split(java.lang.String, int) for an explanation of the limit parameter of the split function:

split

public String[] split(String regex, int limit)
Splits this string around matches of the given regular expression.
The array returned by this method contains each substring of this string that is terminated by another substring that matches the given expression or is terminated by the end of the string. The substrings in the array are in the order in which they occur in this string. If the expression does not match any part of the input then the resulting array has just one element, namely this string.

The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.

The string "boo:and:foo", for example, yields the following results with these parameters:

Regex   Limit   Result
:       2       { "boo", "and:foo" }
:       5       { "boo", "and", "foo" }
:       -2      { "boo", "and", "foo" }
o       5       { "b", "", ":and:f", "", "" }
o       -2      { "b", "", ":and:f", "", "" }
o       0       { "b", "", ":and:f" }

When the last few fields are missing (I guess that's your situation), you will get a line like this:

field1\tfield2\tfield3\t\t

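If no limit is passed to split(), the limit is 0, which means "trailing empty strings will be discarded". So you only get 3 fields, {"field1", "field2", "field3"}.

When the limit is set to -1, a non-positive value, trailing empty strings are not discarded. So you get 5 fields, the last two being empty strings: {"field1", "field2", "field3", "", ""}.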
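
A minimal sketch of the difference, using the sample line above (the class and method names are just for illustration):

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Default split: equivalent to limit 0, trailing empty strings are dropped.
    static String[] defaultSplit(String row) {
        return row.split("\t");
    }

    // Limit -1: the pattern is applied as often as possible and trailing empties are kept.
    static String[] keepTrailing(String row) {
        return row.split("\t", -1);
    }

    public static void main(String[] args) {
        String row = "field1\tfield2\tfield3\t\t";
        String[] dropped = defaultSplit(row);
        String[] kept = keepTrailing(row);
        System.out.println(dropped.length + " " + Arrays.toString(dropped)); // prints 3 [field1, field2, field3]
        System.out.println(kept.length + " " + Arrays.toString(kept));       // prints 5 [field1, field2, field3, , ]
    }
}
```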

Answer 3

Mr_*_*s_D 6

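Well, nobody answered this, which is partly the fault of the question: the input string contains eleven fields (that much can be inferred), but how many tabs? Most probably exactly 10. Then the answer is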

String s = "\t2\t\t4\t5\t6\t\t8\t\t10\t";
String[] fields = s.split("\t", -1);  // in your case s.split("\t", 11) might also do
for (int i = 0; i < fields.length; ++i) {
    if ("".equals(fields[i])) fields[i] = null;
}
System.out.println(Arrays.asList(fields));
// [null, 2, null, 4, 5, 6, null, 8, null, 10, null]
// with s.split("\t") : [null, 2, null, 4, 5, 6, null, 8, null, 10]

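If the fields happen to contain tabs, this of course won't work as expected.
The -1 means: apply the pattern as many times as needed, so the trailing field (the 11th) is preserved (as an empty string ("") if absent, which then has to be turned into null explicitly).

If, on the other hand, there are no tabs for the missing fields, so that "5\t6" is a valid input string containing only fields 5 and 6, there is no way to get fields[] via split.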
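
For a fixed number of columns, as in the question's 11-element array, the -1 approach could be wrapped roughly like this. The helper name parseRow and its columns parameter are hypothetical; it assumes every missing field still has its tab, and pads with null when the line has fewer fields than expected.

```java
import java.util.Arrays;

public class FixedColumnsDemo {
    // Splits a tab-delimited line into exactly `columns` entries.
    // Empty or absent fields become null; extra fields are truncated.
    static String[] parseRow(String line, int columns) {
        String[] parts = line.split("\t", -1);        // keep trailing empty fields
        String[] row = Arrays.copyOf(parts, columns); // pads with null or truncates
        for (int i = 0; i < row.length; i++) {
            if (row[i] != null && row[i].isEmpty()) {
                row[i] = null;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseRow("\t2\t\t4", 6)));
        // prints [null, 2, null, 4, null, null]
    }
}
```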

Answer 4

Luk*_*ood 5

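Implementations based on String.split will have serious limitations if the data in a tab-delimited field can itself contain newline, tab and possibly " characters.

Tab-delimited formats have been around for donkey's years, but the format is not standardised and varies. Many implementations don't escape characters (newlines and tabs) appearing within a field. Instead, they follow CSV conventions and wrap any non-trivial field in "double quotes". Then they escape only the double quotes. So a "line" can extend over multiple lines.

Reading around, I kept hearing "just reuse apache tools", which sounds like good advice.

In the end I personally chose opencsv. I found it lightweight, and since it provides options for escape and quote characters, it should cover most popular comma- and tab-delimited data formats.

Example: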
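
To illustrate why a plain split falls short here, this is a small hand-rolled sketch of the quoting convention. It assumes CSV-style escaping where a doubled quote ("") inside a quoted field stands for one literal quote; real files may use a different escape. The class and method names are made up, and it is only an illustration, not a replacement for a proper parser.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedTsvSketch {
    // Splits one record on tabs, honouring fields wrapped in double quotes.
    // Inside quotes, tabs and newlines are kept literally and "" means one quote.
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote -> literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // tabs and newlines are literal here
                }
            } else if (c == '"' && current.length() == 0) {
                inQuotes = true;             // opening quote at the start of a field
            } else if (c == '\t') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        // Four fields: id, a quoted field containing a tab, a quoted field with quotes, and an empty one.
        String record = "id\t\"two\twords\"\t\"say \"\"hi\"\"\"\t";
        System.out.println(parseRecord(record).size()); // prints 4
    }
}
```

A plain record.split("\t", -1) on the same input would cut the second field in two at the embedded tab.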

CSVReader tabFormatReader = new CSVReader(new FileReader("yourfile.tsv"), '\t');
