Android - Get text from a PDF

Shy*_*dda 9 pdf android

I want to read the text from a PDF file on the SD card. How can I get the text out of a PDF file stored on the SD card?

What I tried:

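import android.os.Bundle;
import android.os.Environment;
import android.speech.tts.TextToSpeech;
import android.support.v7.app.ActionBarActivity;
import android.util.Log;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.TextView;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class MainActivity extends ActionBarActivity implements TextToSpeech.OnInitListener {

    private TextToSpeech tts;
    private String line = null;

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);

        tts = new TextToSpeech(getApplicationContext(), this);

        final TextView text1 = (TextView) findViewById(R.id.textView1);

        findViewById(R.id.button1).setOnClickListener(new OnClickListener() {

            private String[] arr;

            @Override
            public void onClick(View v) {
                File sdcard = Environment.getExternalStorageDirectory();

                // Get the file from the SD card
                File file = new File(sdcard, "test.pdf");

                // Read the file line by line. FileReader decodes the raw bytes as
                // characters: this works for test.txt but not for a binary PDF.
                StringBuilder text = new StringBuilder();
                try {
                    BufferedReader br = new BufferedReader(new FileReader(file));
                    List<String> lines = new ArrayList<String>();

                    while ((line = br.readLine()) != null) {
                        lines.add(line);
                        text.append(line);
                        text.append('\n');
                    }
                    br.close();

                    for (String string : lines) {
                        // QUEUE_ADD queues each line; QUEUE_FLUSH (same value as
                        // TextToSpeech.SUCCESS, 0) would cut off the previous line
                        tts.speak(string, TextToSpeech.QUEUE_ADD, null);
                    }
                    arr = lines.toArray(new String[lines.size()]);
                    System.out.println(arr.length);
                    text1.setText(text);

                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            int result = tts.setLanguage(Locale.US);
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                Log.e("TTS", "This Language is not supported");
            } else {
                // speakOut();
            }

        } else {
            Log.e("TTS", "Initialization Failed!");
        }
    }

    @Override
    protected void onDestroy() {
        // Release the TextToSpeech engine when the activity goes away
        if (tts != null) {
            tts.stop();
            tts.shutdown();
        }
        super.onDestroy();
    }
}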

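Note: this works fine if the file is a text file (test.txt), but not for a PDF (test.pdf).

With the PDF, the text does not come out as it appears in the document; instead I get unreadable, byte-code-like characters. How can I extract the actual text?

Thanks in advance.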
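The behaviour in the note above is expected: a PDF is not a plain-text file. It normally starts with a "%PDF-" header, and the page text usually lives inside (often compressed) content streams, so FileReader plus readLine() just decodes raw binary bytes as characters. The following is a minimal, stdlib-only sketch, assuming you want to detect a PDF before deciding how to read the file; PdfSniffer, hasPdfSignature and looksLikePdf are hypothetical names, not part of any library. Actually extracting the text still needs a PDF parser such as iText.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the question or any library): detects the "%PDF-"
// signature a PDF normally starts with, so a PDF is not read as plain text.
final class PdfSniffer {
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSniffer() {}

    /** Returns true if the given leading bytes start with the "%PDF-" signature. */
    static boolean hasPdfSignature(byte[] firstBytes) {
        if (firstBytes == null || firstBytes.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (firstBytes[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first bytes of the file and checks them for the PDF signature. */
    static boolean looksLikePdf(File file) throws IOException {
        byte[] header = new byte[PDF_MAGIC.length];
        int read = 0;
        try (InputStream in = new FileInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n == -1) {
                    break;
                }
                read += n;
            }
        }
        return read == header.length && hasPdfSignature(header);
    }
}
```

In the activity, something like PdfSniffer.looksLikePdf(file) could decide whether to use the line-by-line text reader or a PDF library.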

REM*_*ITH 19

I have a solution using iText.

Gradle:

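dependencies {
    // older Android Gradle plugin versions use "compile" instead of "implementation"
    implementation 'com.itextpdf:itextg:5.5.10'
}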

Java:

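import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

try {
    // yourPdfPath: path to the PDF file, e.g. on the SD card
    PdfReader reader = new PdfReader(yourPdfPath);
    StringBuilder parsedText = new StringBuilder();
    int n = reader.getNumberOfPages();
    // iText page numbers are 1-based
    for (int i = 1; i <= n; i++) {
        // Extract the text content of each page
        parsedText.append(PdfTextExtractor.getTextFromPage(reader, i).trim()).append("\n");
    }
    System.out.println(parsedText);
    reader.close();
} catch (Exception e) {
    System.out.println(e);
}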
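The page loop can also be sketched without the iText dependency, which makes the page numbering and the joining easy to check in isolation. This is a minimal sketch under stated assumptions: PageJoiner and PageTextSource are hypothetical names, and PageTextSource merely stands in for PdfTextExtractor.getTextFromPage(reader, page). It accumulates into a StringBuilder rather than repeatedly concatenating strings, and it counts pages from 1, as iText does.

```java
// Hypothetical, iText-free sketch of the page loop. PageTextSource is a stand-in for
// PdfTextExtractor.getTextFromPage(reader, page); it is not an iText type.
final class PageJoiner {

    /** Hypothetical per-page text source; page numbers are 1-based, as in iText. */
    interface PageTextSource {
        String textOf(int pageNumber);
    }

    private PageJoiner() {}

    /** Joins the trimmed text of pages 1..pageCount, ending each page with a newline. */
    static String joinPages(int pageCount, PageTextSource source) {
        // StringBuilder avoids copying the whole accumulated text on every page
        StringBuilder sb = new StringBuilder();
        for (int page = 1; page <= pageCount; page++) {
            sb.append(source.textOf(page).trim()).append('\n');
        }
        return sb.toString();
    }
}
```

With iText, the source would call PdfTextExtractor.getTextFromPage(reader, page) and rethrow its IOException (for example wrapped in a RuntimeException), since this hypothetical interface declares no checked exceptions.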