How does reading an InputStream from a local file differ from reading one from the network (via Amazon S3)?

Cla*_*ied 8 java inputstream file local amazon-s3

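I didn't think there was any difference between an InputStream read from a local file and one read from a network source (Amazon S3 in this case), so I'm hoping someone can enlighten me.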

These programs were run on a VM running CentOS 6.3. The test file is 10MB in both cases.

Local file code:

    InputStream is = new FileInputStream("/home/anyuser/test.jpg");

    int read = 0;
    int buf_size = 1024 * 1024 * 2;
    byte[] buf = new byte[buf_size];

    ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);

    long t3 = System.currentTimeMillis();
    int i = 0;
    while ((read = is.read(buf)) != -1) {
        baos.write(buf,0,read);
        System.out.println("reading for the " + i + "th time");
        i++;
    }
    long t4 = System.currentTimeMillis();
    System.out.println("Time to read = " + (t4-t3) + "ms");

The output of this code is shown below. It reads 5 times, which makes sense because the read buffer is 2MB and the file is 10MB.

reading for the 0th time
reading for the 1th time
reading for the 2th time
reading for the 3th time
reading for the 4th time
Time to read = 103ms

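Now we run the same code with the same 10MB test file, except that this time the source is Amazon S3. We don't start reading until we have finished getting the stream from S3. This time, however, the read loop runs thousands of times when it should only read 5 times.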

    InputStream is;
    long t1 = System.currentTimeMillis();
    is = getS3().getFileFromBucket(S3Path,input);
    long t2 = System.currentTimeMillis();

    System.out.print("Time to get file " + input + " from S3: ");
    System.out.println((t2-t1) + "ms");

    int read = 0;
    int buf_size = 1024*1024*2;
    byte[] buf = new byte[buf_size];

    ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
    long t3 = System.currentTimeMillis();
    int i = 0;

    while ((read = is.read(buf)) != -1) {
        baos.write(buf,0,read);
        if ((i % 100) == 0) {
            System.out.println("reading for the " + i + "th time");
        }
        i++;
    }
    long t4 = System.currentTimeMillis();

    System.out.println("Time to read = " + (t4-t3) + "ms");

The output is as follows:

Time to get file test.jpg from S3: 2456ms
reading for the 0th time
reading for the 100th time
reading for the 200th time
reading for the 300th time
reading for the 400th time
reading for the 500th time
reading for the 600th time
reading for the 700th time
reading for the 800th time
reading for the 900th time
reading for the 1000th time
reading for the 1100th time
reading for the 1200th time
reading for the 1300th time
reading for the 1400th time
Time to read = 14471ms

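The time it takes to read the stream varies from run to run. Sometimes it takes 60 seconds, sometimes 15 seconds. It never gets faster than 15 seconds. On every test run of the program, the read loop still iterates 1400+ times, even though I think it should only take 5 iterations, as in the local file example.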

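Is this how input streams work when the source is on the network, even though we have already finished getting the file from the network source? Thanks in advance for your help.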

ime*_*l96 6

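I don't think this is specific to Java. When you read from the network, the actual read call to the operating system returns one packet's worth of data at a time, no matter how large a buffer you allocate. If you check the size of the data read (your read variable), it should show the size of the network packets in use.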
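The effect described above can be reproduced without any network. The following is a minimal sketch, not the S3 client's actual behavior: `ChunkedInputStream` is a hypothetical wrapper that hands back at most a fixed number of bytes per `read` call, standing in for a socket that only has a little data ready at a time. The 8 KB chunk size is an assumption chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ShortReadDemo {

    /** Hypothetical stand-in for a network stream: delivers at most maxChunk bytes per read. */
    static final class ChunkedInputStream extends InputStream {
        private final InputStream source;
        private final int maxChunk;

        ChunkedInputStream(InputStream source, int maxChunk) {
            this.source = source;
            this.maxChunk = maxChunk;
        }

        @Override
        public int read() throws IOException {
            return source.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Never deliver more than maxChunk bytes, however large the caller's buffer is.
            return source.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Drains the stream with a bufSize-byte buffer and counts the read calls that returned data. */
    static int countReads(InputStream in, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int reads = 0;
        while (in.read(buf) != -1) {
            reads++;
        }
        return reads;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10 * 1024 * 1024]; // 10 MB of zeros
        int bufSize = 2 * 1024 * 1024;            // 2 MB buffer, as in the question

        int direct = countReads(new ByteArrayInputStream(data), bufSize);
        int chunked = countReads(
                new ChunkedInputStream(new ByteArrayInputStream(data), 8 * 1024), bufSize);

        System.out.println("in-memory source: " + direct + " reads");
        System.out.println("8 KB chunks:      " + chunked + " reads");
    }
}
```

With these numbers the in-memory source needs 5 reads and the chunked source needs 1280, which is the same order of magnitude as the 1400+ iterations reported in the question. Logging the value of `read` on each iteration of the original loop would show the real chunk sizes.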

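This is one of the reasons people use a separate thread to read from the network, and avoid blocking by using asynchronous I/O techniques.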
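As a rough illustration of the separate-thread approach only (not asynchronous I/O proper), the sketch below drains a stream on a worker thread and hands the bytes back through a `Future`. The class and method names are hypothetical, and the `ByteArrayInputStream` is a placeholder for whatever stream the S3 client returns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundReadDemo {

    /** Reads the stream to the end, accepting however many bytes each read call returns. */
    static byte[] drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    /** Starts draining the stream on a pool thread; the caller is not blocked. */
    static Future<byte[]> readInBackground(ExecutorService pool, final InputStream in) {
        Callable<byte[]> task = () -> drain(in);
        return pool.submit(task);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Placeholder source; in the question this would be the stream obtained from S3.
            InputStream in = new ByteArrayInputStream(new byte[1024 * 1024]);
            Future<byte[]> pending = readInBackground(pool, in);

            // The calling thread is free to do other work here.

            byte[] data = pending.get(); // blocks only when the result is actually needed
            System.out.println("Read " + data.length + " bytes in the background");
        } finally {
            pool.shutdown();
        }
    }
}
```

This does not make the transfer itself any faster; it only keeps the calling thread from waiting on each small read.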

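  • My answer is still consistent with the documentation. The doc doesn't say that it will block until the whole stream has been read; it only blocks until some input is available. The underlying recv(2) standard says that for stream-based sockets, "data shall be returned to the user as soon as it becomes available, and no data shall be discarded". (2 upvotes)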
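Because `read` may return as soon as some input is available, code that wants a full buffer per iteration has to loop. Below is a minimal sketch of such a loop. `fill` and `trickle` are hypothetical names, and `trickle` simulates a slow source that returns at most 1000 bytes per call. On Java 9 and later, `InputStream.readNBytes(byte[], int, int)` provides similar behavior out of the box.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FillBufferDemo {

    /**
     * Calls read repeatedly until buf is full or the stream ends.
     * Returns the number of bytes placed in buf (0 if the stream was already at its end).
     */
    static int fill(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Hypothetical slow source: returns at most 1000 bytes per read call. */
    static InputStream trickle(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, 1000));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream in = trickle(new byte[10_000]);
        byte[] buf = new byte[4096];

        int iterations = 0;
        int n;
        while ((n = fill(in, buf)) > 0) {
            iterations++;
            System.out.println("iteration " + iterations + ": " + n + " bytes");
        }
    }
}
```

With a 10,000-byte source and a 4096-byte buffer, the outer loop runs 3 times (4096, 4096, and 1808 bytes), even though each underlying `read` delivers no more than 1000 bytes.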