I have a comma-separated file in which many lines look like the one below.
Sachin,,M,"Maths,Science,English",Need to improve in these subjects.
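Quotes are used to escape the delimiter comma inside a field that holds multiple values.
How can I split the above line on the comma delimiter using String.split(), if that is possible at all?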
Ach*_*Jha 172
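public static void main(String[] args) {
    String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
    // Split on a comma only when an even number of double quotes follows it,
    // i.e. when the comma is not inside a quoted section.
    String[] splitted = s.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
    System.out.println(Arrays.toString(splitted)); // requires import java.util.Arrays
}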
Output:
[Sachin, , M, "Maths,Science,English", Need to improve in these subjects.]
Men*_*los 17
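Since your problem/requirements are not all that complex, a custom method can be used instead. In the test below it ran more than 20 times faster and produced the same result. The speedup varies with the data size and the number of lines parsed, and for more complicated problems regular expressions are a must.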
import java.util.Arrays;
import java.util.ArrayList;

public class SplitTest {

    public static void main(String[] args) {
        String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
        String[] splitted = null;

        // Measure the regular expression
        long startTime = System.nanoTime();
        for (int i = 0; i < 10; i++)
            splitted = s.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
        long endTime = System.nanoTime();
        System.out.println("Took: " + (endTime - startTime));
        System.out.println(Arrays.toString(splitted));
        System.out.println("");

        ArrayList<String> sw = null;

        // Measure the custom method
        startTime = System.nanoTime();
        for (int i = 0; i < 10; i++)
            sw = customSplitSpecific(s);
        endTime = System.nanoTime();
        System.out.println("Took: " + (endTime - startTime));
        System.out.println(sw);
    }

    public static ArrayList<String> customSplitSpecific(String s) {
        ArrayList<String> words = new ArrayList<String>();
        // True while the scan is outside a quoted section; only then does a comma end a field.
        boolean notInsideQuotes = true;
        int start = 0;
        // Scan every character, including the last one, so a trailing comma also counts as a delimiter.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ',' && notInsideQuotes) {
                words.add(s.substring(start, i));
                start = i + 1;
            } else if (s.charAt(i) == '"') {
                notInsideQuotes = !notInsideQuotes;
            }
        }
        // Add the final field (everything after the last delimiter).
        words.add(s.substring(start));
        return words;
    }
}
On my own computer this produces:
Took: 6651100
[Sachin, , M, "Maths,Science,English", Need to improve in these subjects.]
Took: 224179
[Sachin, , M, "Maths,Science,English", Need to improve in these subjects.]
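One caveat when reading the timings above: for a non-trivial regex, String.split(regex) compiles the pattern on every call, and ten iterations are too few for the JIT to warm up, so the ratio is only a rough indication. A minimal sketch of compiling the pattern once and reusing it follows. The class name PrecompiledSplit and its split method are hypothetical names chosen for illustration, not part of the answer above.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper (illustration only): compiles the quote-aware comma
// pattern once and reuses it for every line.
final class PrecompiledSplit {

    // Same lookahead as in the answers above: a comma is a delimiter only
    // when an even number of double quotes follows it.
    private static final Pattern QUOTE_AWARE_COMMA =
            Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] split(String line) {
        // Pattern.split behaves like String.split(regex) for this input,
        // but skips the per-call compilation.
        return QUOTE_AWARE_COMMA.split(line);
    }

    public static void main(String[] args) {
        String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
        System.out.println(Arrays.toString(split(s)));
    }
}
```

Whether this narrows the gap to the custom method depends on the input and the JVM, so it is worth measuring on your own data rather than assuming a fixed factor.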
If your strings are all well-formed, you can use the following regular expression:
String[] res = str.split(",(?=([^\"]|\"[^\"]*\")*$)");
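The expression ensures that a split happens only at a comma that is followed by an even number of quotes (possibly zero), which means the comma is not inside a quoted section.
Nevertheless, it may be easier to use a simple non-regex parser.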
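As a rough illustration of what such a non-regex parser might look like, here is a minimal sketch of a character-by-character scan. Two behaviours in it are assumptions that go beyond the question: it strips the surrounding quotes from a field, and it treats a doubled quote ("") inside a quoted field as one literal quote, which is the common CSV convention. The names SimpleCsvLineParser and parseLine are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-line parser (illustration only, not a full CSV reader).
final class SimpleCsvLineParser {

    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    // Assumption: "" inside a quoted field means one literal quote.
                    current.append('"');
                    i++; // skip the second quote of the pair
                } else {
                    // Opening or closing quote: toggle the state, do not keep the quote.
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                // An unquoted comma ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // The last field has no trailing comma, so add it after the loop.
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String s = "Sachin,,M,\"Maths,Science,English\",Need to improve in these subjects.";
        System.out.println(parseLine(s));
    }
}
```

Unlike the split-based versions, this sketch keeps a trailing empty field and returns Maths,Science,English without its quotes. It does not handle quoted fields that span several lines, so a CSV library is the safer choice for files of that kind.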