Dav*_*542 35 c# c++ python java excel
我有一个150MB的单页excel文件,使用以下内容在一台非常强大的机器上打开大约需要7分钟:
# using python
import xlrd
wb = xlrd.open_workbook(file)
sh = wb.sheet_by_index(0)
Run Code Online (Sandbox Code Playgroud)
有没有办法更快地打开excel文件?我甚至对非常古怪的建议(例如hadoop,spark,c,java等)持开放态度.理想情况下,我正在寻找一种在30秒内打开文件的方法,如果这不是梦想.另外,上面的例子是使用python,但它不一定是python.
注意:这是来自客户端的Excel文件.在收到之前,它无法转换为任何其他格式.这不是我们的档案
更新:回答一个代码的工作示例将在30秒内打开以下200MB excel文件将获得奖励:https://drive.google.com/file/d/0B_CXvCTOo7_2VW9id2VXRWZrbzQ/view? usp =sharing.该文件应该包含字符串(col 1),date(col 9)和number(col 11).
Jer*_*son 14
大多数使用Office产品的编程语言都有一些中间层,这通常是瓶颈所在,一个很好的例子就是使用PIA的/ Interop或Open XML SDK.
将数据置于较低级别(绕过中间层)的一种方法是使用驱动程序.
150MB单页excel文件,大约需要7分钟.
我能做的最好的是135秒的130MB文件,大约快3倍:
Stopwatch sw = new Stopwatch();
sw.Start();
DataSet excelDataSet = new DataSet();
string filePath = @"c:\temp\BigBook.xlsx";
// For .XLSXs we use =Microsoft.ACE.OLEDB.12.0;, for .XLS we'd use Microsoft.Jet.OLEDB.4.0; with "';Extended Properties=\"Excel 8.0;HDR=YES;\"";
string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='" + filePath + "';Extended Properties=\"Excel 12.0;HDR=YES;\"";
using (OleDbConnection conn = new OleDbConnection(connectionString))
{
conn.Open();
OleDbDataAdapter objDA = new System.Data.OleDb.OleDbDataAdapter
("select * from [Sheet1$]", conn);
objDA.Fill(excelDataSet);
//dataGridView1.DataSource = excelDataSet.Tables[0];
}
sw.Stop();
Debug.Print("Load XLSX tool: " + sw.ElapsedMilliseconds + " millisecs. Records = " + excelDataSet.Tables[0].Rows.Count);
Run Code Online (Sandbox Code Playgroud)
赢得7x64,Intel i5,2.3ghz,8GB内存,SSD250GB.
如果我也可以推荐硬件解决方案,如果您使用的是标准硬盘驱动器,请尝试使用SSD解决它.
注意:我无法下载您的Excel电子表格示例,因为我在公司防火墙后面.
PS.请参阅MSDN - 使用200 MB数据导入xlsx文件的最快方法, OleDB 的共识是最快的.
PS 2.以下是使用python进行操作的方法:http: //code.activestate.com/recipes/440661-read-tabular-data-from-excel-spreadsheets-the-fast/
Ism*_*sma 11
我设法使用.NET核心和Open XML SDK在大约30秒内读取文件.
以下示例返回包含具有匹配类型的所有行和单元格的对象列表,它支持日期,数字和文本单元格.该项目可在此处获得:https://github.com/xferaa/BigSpreadSheetExample/(适用于Windows,Linux和Mac OS,不需要安装Excel或任何Excel组件).
public List<List<object>> ParseSpreadSheet()
{
List<List<object>> rows = new List<List<object>>();
using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(filePath, false))
{
WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
Dictionary<int, string> sharedStringCache = new Dictionary<int, string>();
int i = 0;
foreach (var el in workbookPart.SharedStringTablePart.SharedStringTable.ChildElements)
{
sharedStringCache.Add(i++, el.InnerText);
}
while (reader.Read())
{
if(reader.ElementType == typeof(Row))
{
reader.ReadFirstChild();
List<object> cells = new List<object>();
do
{
if (reader.ElementType == typeof(Cell))
{
Cell c = (Cell)reader.LoadCurrentElement();
if (c == null || c.DataType == null || !c.DataType.HasValue)
continue;
object value;
switch(c.DataType.Value)
{
case CellValues.Boolean:
value = bool.Parse(c.CellValue.InnerText);
break;
case CellValues.Date:
value = DateTime.Parse(c.CellValue.InnerText);
break;
case CellValues.Number:
value = double.Parse(c.CellValue.InnerText);
break;
case CellValues.InlineString:
case CellValues.String:
value = c.CellValue.InnerText;
break;
case CellValues.SharedString:
value = sharedStringCache[int.Parse(c.CellValue.InnerText)];
break;
default:
continue;
}
if (value != null)
cells.Add(value);
}
} while (reader.ReadNextSibling());
if (cells.Any())
rows.Add(cells);
}
}
}
return rows;
}
Run Code Online (Sandbox Code Playgroud)
我在一台三年前的笔记本电脑上运行该程序,该笔记本电脑配备SSD驱动器,8GB内存和Windows 10 64位上的Intel Core i7-4710 CPU @ 2.50GHz(两个内核).
请注意,虽然打开并将整个文件解析为字符串需要的时间少于30秒,但在我上次编辑的示例中使用对象时,使用我的笔记本电脑时,时间会增加到近50秒.使用Linux,您的服务器可能会接近30秒.
诀窍是使用SAX方法,如下所述:
https://msdn.microsoft.com/en-us/library/office/gg575571.aspx
dev*_*aki 10
好吧,如果您的Excel将像您的示例(https://drive.google.com/file/d/0B_CXvCTOo7_2UVZxbnpRaEVnaFk/view?usp=sharing)一样简单,您可以尝试将文件打开为一个zip文件并直接读取每个xml:
Intel i5 4460,12 GB RAM,SSD Samsung EVO PRO.
如果你有很多内存ram: 这段代码需要很多内存,但需要20~25秒.(你需要参数-Xmx7g)
package com.devsaki.opensimpleexcel;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipFile;
public class Multithread {
public static final char CHAR_END = (char) -1;
public static void main(String[] args) throws IOException, ExecutionException, InterruptedException {
String excelFile = "C:/Downloads/BigSpreadsheetAllTypes.xlsx";
ZipFile zipFile = new ZipFile(excelFile);
long init = System.currentTimeMillis();
ExecutorService executor = Executors.newFixedThreadPool(4);
char[] sheet1 = readEntry(zipFile, "xl/worksheets/sheet1.xml").toCharArray();
Future<Object[][]> futureSheet1 = executor.submit(() -> processSheet1(new CharReader(sheet1), executor));
char[] sharedString = readEntry(zipFile, "xl/sharedStrings.xml").toCharArray();
Future<String[]> futureWords = executor.submit(() -> processSharedStrings(new CharReader(sharedString)));
Object[][] sheet = futureSheet1.get();
String[] words = futureWords.get();
executor.shutdown();
long end = System.currentTimeMillis();
System.out.println("only read: " + (end - init) / 1000);
///Doing somethin with the file::Saving as csv
init = System.currentTimeMillis();
try (PrintWriter writer = new PrintWriter(excelFile + ".csv", "UTF-8");) {
for (Object[] rows : sheet) {
for (Object cell : rows) {
if (cell != null) {
if (cell instanceof Integer) {
writer.append(words[(Integer) cell]);
} else if (cell instanceof String) {
writer.append(toDate(Double.parseDouble(cell.toString())));
} else {
writer.append(cell.toString()); //Probably a number
}
}
writer.append(";");
}
writer.append("\n");
}
}
end = System.currentTimeMillis();
System.out.println("Main saving to csv: " + (end - init) / 1000);
}
private static final DateTimeFormatter formatter = DateTimeFormatter.ISO_DATE_TIME;
private static final LocalDateTime INIT_DATE = LocalDateTime.parse("1900-01-01T00:00:00+00:00", formatter).plusDays(-2);
//The number in excel is from 1900-jan-1, so every number time that you get, you have to sum to that date
public static String toDate(double s) {
return formatter.format(INIT_DATE.plusSeconds((long) ((s*24*3600))));
}
public static String readEntry(ZipFile zipFile, String entry) throws IOException {
System.out.println("Initialing readEntry " + entry);
long init = System.currentTimeMillis();
String result = null;
try (BufferedReader br = new BufferedReader(new InputStreamReader(zipFile.getInputStream(zipFile.getEntry(entry)), Charset.forName("UTF-8")))) {
br.readLine();
result = br.readLine();
}
long end = System.currentTimeMillis();
System.out.println("readEntry '" + entry + "': " + (end - init) / 1000);
return result;
}
public static String[] processSharedStrings(CharReader br) throws IOException {
System.out.println("Initialing processSharedStrings");
long init = System.currentTimeMillis();
String[] words = null;
char[] wordCount = "Count=\"".toCharArray();
char[] token = "<t>".toCharArray();
String uniqueCount = extractNextValue(br, wordCount, '"');
words = new String[Integer.parseInt(uniqueCount)];
String nextWord;
int currentIndex = 0;
while ((nextWord = extractNextValue(br, token, '<')) != null) {
words[currentIndex++] = nextWord;
br.skip(11); //you can skip at least 11 chars "/t></si><si>"
}
long end = System.currentTimeMillis();
System.out.println("SharedStrings: " + (end - init) / 1000);
return words;
}
public static Object[][] processSheet1(CharReader br, ExecutorService executorService) throws IOException, ExecutionException, InterruptedException {
System.out.println("Initialing processSheet1");
long init = System.currentTimeMillis();
char[] dimensionToken = "dimension ref=\"".toCharArray();
String dimension = extractNextValue(br, dimensionToken, '"');
int[] sizes = extractSizeFromDimention(dimension.split(":")[1]);
br.skip(30); //Between dimension and next tag c exists more or less 30 chars
Object[][] result = new Object[sizes[0]][sizes[1]];
int parallelProcess = 8;
int currentIndex = br.currentIndex;
CharReader[] charReaders = new CharReader[parallelProcess];
int totalChars = Math.round(br.chars.length / parallelProcess);
for (int i = 0; i < parallelProcess; i++) {
int endIndex = currentIndex + totalChars;
charReaders[i] = new CharReader(br.chars, currentIndex, endIndex, i);
currentIndex = endIndex;
}
Future[] futures = new Future[parallelProcess];
for (int i = charReaders.length - 1; i >= 0; i--) {
final int j = i;
futures[i] = executorService.submit(() -> inParallelProcess(charReaders[j], j == 0 ? null : charReaders[j - 1], result));
}
for (Future future : futures) {
future.get();
}
long end = System.currentTimeMillis();
System.out.println("Sheet1: " + (end - init) / 1000);
return result;
}
public static void inParallelProcess(CharReader br, CharReader back, Object[][] result) {
System.out.println("Initialing inParallelProcess : " + br.identifier);
char[] tokenOpenC = "<c r=\"".toCharArray();
char[] tokenOpenV = "<v>".toCharArray();
char[] tokenAttributS = " s=\"".toCharArray();
char[] tokenAttributT = " t=\"".toCharArray();
String v;
int firstCurrentIndex = br.currentIndex;
boolean first = true;
while ((v = extractNextValue(br, tokenOpenC, '"')) != null) {
if (first && back != null) {
int sum = br.currentIndex - firstCurrentIndex - tokenOpenC.length - v.length() - 1;
first = false;
System.out.println("Adding to : " + back.identifier + " From : " + br.identifier);
back.plusLength(sum);
}
int[] indexes = extractSizeFromDimention(v);
int s = foundNextTokens(br, '>', tokenAttributS, tokenAttributT);
char type = 's'; //3 types: number (n), string (s) and date (d)
if (s == 0) { // Token S = number or date
char read = br.read();
if (read == '1') {
type = 'n';
} else {
type = 'd';
}
} else if (s == -1) {
type = 'n';
}
String c = extractNextValue(br, tokenOpenV, '<');
Object value = null;
switch (type) {
case 'n':
value = Double.parseDouble(c);
break;
case 's':
try {
value = Integer.parseInt(c);
} catch (Exception ex) {
System.out.println("Identifier Error : " + br.identifier);
}
break;
case 'd':
value = c.toString();
break;
}
result[indexes[0] - 1][indexes[1] - 1] = value;
br.skip(7); ///v></c>
}
}
static class CharReader {
char[] chars;
int currentIndex;
int length;
int identifier;
public CharReader(char[] chars) {
this.chars = chars;
this.length = chars.length;
}
public CharReader(char[] chars, int currentIndex, int length, int identifier) {
this.chars = chars;
this.currentIndex = currentIndex;
if (length > chars.length) {
this.length = chars.length;
} else {
this.length = length;
}
this.identifier = identifier;
}
public void plusLength(int n) {
if (this.length + n <= chars.length) {
this.length += n;
}
}
public char read() {
if (currentIndex >= length) {
return CHAR_END;
}
return chars[currentIndex++];
}
public void skip(int n) {
currentIndex += n;
}
}
public static int[] extractSizeFromDimention(String dimention) {
StringBuilder sb = new StringBuilder();
int columns = 0;
int rows = 0;
for (char c : dimention.toCharArray()) {
if (columns == 0) {
if (Character.isDigit(c)) {
columns = convertExcelIndex(sb.toString());
sb = new StringBuilder();
}
}
sb.append(c);
}
rows = Integer.parseInt(sb.toString());
return new int[]{rows, columns};
}
public static int foundNextTokens(CharReader br, char until, char[]... tokens) {
char character;
int[] indexes = new int[tokens.length];
while ((character = br.read()) != CHAR_END) {
if (character == until) {
break;
}
for (int i = 0; i < indexes.length; i++) {
if (tokens[i][indexes[i]] == character) {
indexes[i]++;
if (indexes[i] == tokens[i].length) {
return i;
}
} else {
indexes[i] = 0;
}
}
}
return -1;
}
public static String extractNextValue(CharReader br, char[] token, char until) {
char character;
StringBuilder sb = new StringBuilder();
int index = 0;
while ((character = br.read()) != CHAR_END) {
if (index == token.length) {
if (character == until) {
return sb.toString();
} else {
sb.append(character);
}
} else {
if (token[index] == character) {
index++;
} else {
index = 0;
}
}
}
return null;
}
public static int convertExcelIndex(String index) {
int result = 0;
for (char c : index.toCharArray()) {
result = result * 26 + ((int) c - (int) 'A' + 1);
}
return result;
}
}
Run Code Online (Sandbox Code Playgroud)
旧答案(不需要参数Xms7g,因此需要更少的内存): 使用HDD打开和读取示例文件大约35秒(200MB),SDD需要少一点(30秒).
这里的代码是:https: //github.com/csaki/OpenSimpleExcelFast.git
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipFile;
public class Launcher {
public static final char CHAR_END = (char) -1;
public static void main(String[] args) throws IOException, ExecutionException, InterruptedException {
long init = System.currentTimeMillis();
String excelFile = "D:/Downloads/BigSpreadsheet.xlsx";
ZipFile zipFile = new ZipFile(excelFile);
ExecutorService executor = Executors.newFixedThreadPool(4);
Future<String[]> futureWords = executor.submit(() -> processSharedStrings(zipFile));
Future<Object[][]> futureSheet1 = executor.submit(() -> processSheet1(zipFile));
String[] words = futureWords.get();
Object[][] sheet1 = futureSheet1.get();
executor.shutdown();
long end = System.currentTimeMillis();
System.out.println("Main only open and read: " + (end - init) / 1000);
///Doing somethin with the file::Saving as csv
init = System.currentTimeMillis();
try (PrintWriter writer = new PrintWriter(excelFile + ".csv", "UTF-8");) {
for (Object[] rows : sheet1) {
for (Object cell : rows) {
if (cell != null) {
if (cell instanceof Integer) {
writer.append(words[(Integer) cell]);
} else if (cell instanceof String) {
writer.append(toDate(Double.parseDouble(cell.toString())));
} else {
writer.append(cell.toString()); //Probably a number
}
}
writer.append(";");
}
writer.append("\n");
}
}
end = System.currentTimeMillis();
System.out.println("Main saving to csv: " + (end - init) / 1000);
}
private static final DateTimeFormatter formatter = DateTimeFormatter.ISO_DATE_TIME;
private static final LocalDateTime INIT_DATE = LocalDateTime.parse("1900-01-01T00:00:00+00:00", formatter).plusDays(-2);
//The number in excel is from 1900-jan-1, so every number time that you get, you have to sum to that date
public static String toDate(double s) {
return formatter.format(INIT_DATE.plusSeconds((long) ((s*24*3600))));
}
public static Object[][] processSheet1(ZipFile zipFile) throws IOException {
String entry = "xl/worksheets/sheet1.xml";
Object[][] result = null;
char[] dimensionToken = "dimension ref=\"".toCharArray();
char[] tokenOpenC = "<c r=\"".toCharArray();
char[] tokenOpenV = "<v>".toCharArray();
char[] tokenAttributS = " s=\"".toCharArray();
char[] tokenAttributT = " t=\"".toCharArray();
try (BufferedReader br = new BufferedReader(new InputStreamReader(zipFile.getInputStream(zipFile.getEntry(entry)), Charset.forName("UTF-8")))) {
String dimension = extractNextValue(br, dimensionToken, '"');
int[] sizes = extractSizeFromDimention(dimension.split(":")[1]);
br.skip(30); //Between dimension and next tag c exists more or less 30 chars
result = new Object[sizes[0]][sizes[1]];
String v;
while ((v = extractNextValue(br, tokenOpenC, '"')) != null) {
int[] indexes = extractSizeFromDimention(v);
int s = foundNextTokens(br, '>', tokenAttributS, tokenAttributT);
char type = 's'; //3 types: number (n), string (s) and date (d)
if (s == 0) { // Token S = number or date
char read = (char) br.read();
if (read == '1') {
type = 'n';
} else {
type = 'd';
}
} else if (s == -1) {
type = 'n';
}
String c = extractNextValue(br, tokenOpenV, '<');
Object value = null;
switch (type) {
case 'n':
value = Double.parseDouble(c);
break;
case 's':
value = Integer.parseInt(c);
break;
case 'd':
value = c.toString();
break;
}
result[indexes[0] - 1][indexes[1] - 1] = value;
br.skip(7); ///v></c>
}
}
return result;
}
public static int[] extractSizeFromDimention(String dimention) {
StringBuilder sb = new StringBuilder();
int columns = 0;
int rows = 0;
for (char c : dimention.toCharArray()) {
if (columns == 0) {
if (Character.isDigit(c)) {
columns = convertExcelIndex(sb.toString());
sb = new StringBuilder();
}
}
sb.append(c);
}
rows = Integer.parseInt(sb.toString());
return new int[]{rows, columns};
}
public static String[] processSharedStrings(ZipFile zipFile) throws IOException {
String entry = "xl/sharedStrings.xml";
String[] words = null;
char[] wordCount = "Count=\"".toCharArray();
char[] token = "<t>".toCharArray();
try (BufferedReader br = new BufferedReader(new InputStreamReader(zipFile.getInputStream(zipFile.getEntry(entry)), Charset.forName("UTF-8")))) {
String uniqueCount = extractNextValue(br, wordCount, '"');
words = new String[Integer.parseInt(uniqueCount)];
String nextWord;
int currentIndex = 0;
while ((nextWord = extractNextValue(br, token, '<')) != null) {
words[currentIndex++] = nextWord;
br.skip(11); //you can skip at least 11 chars "/t></si><si>"
}
}
return words;
}
public static int foundNextTokens(BufferedReader br, char until, char[]... tokens) throws IOException {
char character;
int[] indexes = new int[tokens.length];
while ((character = (char) br.read()) != CHAR_END) {
if (character == until) {
break;
}
for (int i = 0; i < indexes.length; i++) {
if (tokens[i][indexes[i]] == character) {
indexes[i]++;
if (indexes[i] == tokens[i].length) {
return i;
}
} else {
indexes[i] = 0;
}
}
}
return -1;
}
public static String extractNextValue(BufferedReader br, char[] token, char until) throws IOException {
char character;
StringBuilder sb = new StringBuilder();
int index = 0;
while ((character = (char) br.read()) != CHAR_END) {
if (index == token.length) {
if (character == until) {
return sb.toString();
} else {
sb.append(character);
}
} else {
if (token[index] == character) {
index++;
} else {
index = 0;
}
}
}
return null;
}
public static int convertExcelIndex(String index) {
int result = 0;
for (char c : index.toCharArray()) {
result = result * 26 + ((int) c - (int) 'A' + 1);
}
return result;
}
}
Run Code Online (Sandbox Code Playgroud)
Python的Pandas库可用于保存和处理您的数据,但使用它来直接加载.xlsx文件将非常缓慢,例如使用read_excel().
一种方法是使用Python自动使用Excel本身将文件转换为CSV,然后使用Pandas加载生成的CSV文件read_csv().这将为您提供良好的加速,但不会低于30秒:
import win32com.client as win32
import pandas as pd
from datetime import datetime
print ("Starting")
start = datetime.now()
# Use Excel to load the xlsx file and save it in csv format
excel = win32.gencache.EnsureDispatch('Excel.Application')
wb = excel.Workbooks.Open(r'c:\full path\BigSpreadsheet.xlsx')
excel.DisplayAlerts = False
wb.DoNotPromptForConvert = True
wb.CheckCompatibility = False
print('Saving')
wb.SaveAs(r'c:\full path\temp.csv', FileFormat=6, ConflictResolution=2)
excel.Application.Quit()
# Use Pandas to load the resulting CSV file
print('Loading CSV')
df = pd.read_csv(r'c:\full path\temp.csv', dtype=str)
print(df.shape)
print("Done", datetime.now() - start)
Run Code Online (Sandbox Code Playgroud)
列类型
的类型为你列可以通过传递指定dtype并converters和parse_dates:
df = pd.read_csv(r'c:\full path\temp.csv', dtype=str, converters={10:int}, parse_dates=[8], infer_datetime_format=True)
Run Code Online (Sandbox Code Playgroud)
您还应该指定infer_datetime_format=True,因为这将大大加快日期转换.
nfer_datetime_format:boolean,默认为False如果启用了True和parse_dates,pandas将尝试推断列中日期时间字符串的格式,如果可以推断,请切换到更快的解析方法.在某些情况下,这可以将解析速度提高5-10倍.
dayfirst=True如果日期在表单中,也要添加DD/MM/YYYY.
选择列
如果您实际上只需要处理列1 9 11,那么您可以通过指定usecols=[0, 8, 10]如下进一步减少资源:
df = pd.read_csv(r'c:\full path\temp.csv', dtype=str, converters={10:int}, parse_dates=[1], dayfirst=True, infer_datetime_format=True, usecols=[0, 8, 10])
Run Code Online (Sandbox Code Playgroud)
结果数据帧将只包含这3列数据.
RAM驱动器
使用RAM驱动器存储临时CSV文件,可以进一步加快加载时间.
注意:这假设您使用的是带有Excel的Windows PC.
我创建了一个示例Java程序,它能够在大约 40 秒内加载我的笔记本电脑(英特尔 i7 4 核,16 GB RAM)文件。
https://github.com/skadyan/largefile
该程序使用Apache POI 库通过XSSF SAX API加载 .xlsx 文件。
回调接口com.stackoverlfow.largefile.RecordHandler实现可用于处理从excel加载的数据。这个接口只定义了一个接受三个参数的方法
data map: Map: excel 单元格引用和 excel 格式的单元格值该类com.stackoverlfow.largefile.Main演示了此接口的一个基本实现,它只是在控制台上打印行号。
更新
woodstox解析器似乎比标准解析器具有更好的性能SAXReader。(代码在 repo 中更新)。
此外,为了满足所需的性能要求,您可以考虑重新实现org.apache.poi...XSSFSheetXMLHandler. 在实现中,可以实现更加优化的字符串/文本值处理,可以跳过不必要的文本格式化操作。