How do I download and save a file from the Internet using Scala?

Question

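Basically, I have a URL/link to a text file online and I am trying to download it locally. For some reason, the text file that gets created/downloaded is blank. Open to any suggestions. Thanks!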

    def downloadFile(token: String, fileToDownload: String) {

    val url = new URL("http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload)
    val connection = url.openConnection().asInstanceOf[HttpURLConnection]
    connection.setRequestMethod("GET")
    val in: InputStream = connection.getInputStream
    val fileToDownloadAs = new java.io.File("src/test/resources/testingUpload1.txt")
    val out: OutputStream = new BufferedOutputStream(new FileOutputStream(fileToDownloadAs))
    val byteArray = Stream.continually(in.read).takeWhile(-1 !=).map(_.toByte).toArray
    out.write(byteArray)
    }

Answer 1

Che*_*sin 29

I know this is an old question, but I just came across a really nice way to do this:

import sys.process._
import java.net.URL
import java.io.File

def fileDownloader(url: String, filename: String) = {
    // #> pipes the URL's content into the file; .!! runs it and blocks until it finishes
    (new URL(url) #> new File(filename)).!!
}

Hope this helps. Source.

You can now use the fileDownloader function to download files.

fileDownloader("http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words", "stop-words-en.txt")

Thanks for adding the imports! (6 upvotes)
Also, there seems to be no way to catch the exception when the file is unavailable. (2 upvotes)

Answer 2

Her*_*lme 9

Here is a naive implementation using scala.io.Source.fromURL and java.io.FileWriter:

def downloadFile(token: String, fileToDownload: String): Unit = {
  try {
    val src = scala.io.Source.fromURL("http://randomwebsite.com/docs?t=" + token + "&p=tsr%2F" + fileToDownload)
    val out = new java.io.FileWriter("src/test/resources/testingUpload1.txt")
    // Reads the whole response into memory as text before writing it out
    out.write(src.mkString)
    out.close()
    src.close()
  } catch {
    case e: java.io.IOException => println("error occurred: " + e.getMessage)
  }
}

Your code works for me... there may be some other cause for the empty file.

However, if the file is in a binary format, I guess `src.mkString` may fail... (4 upvotes)
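For binary content, one option is to copy raw bytes instead of decoding them into a String. The following is a minimal sketch, not part of the original answer; the name downloadBytes and its parameters are made up for illustration. It relies only on java.net.URL.openStream and java.nio.file.Files.copy.

```scala
import java.net.URL
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper: copies the URL's bytes to `target` without any
// character decoding, so binary files arrive unchanged.
// Returns the number of bytes written.
def downloadBytes(url: URL, target: Path): Long = {
  val in = url.openStream()
  try Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
  finally in.close() // always release the connection's stream
}
```

Because no charset is involved, the same function should work for text files as well.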

Answer 3

Xav*_*hot 9

Here is a safer alternative to new URL(url) #> new File(filename) !!:

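import sys.process._
import java.io.File
import java.net.{HttpURLConnection, URL}

val url = new URL(urlOfFileToDownload)

val connection = url.openConnection().asInstanceOf[HttpURLConnection]
// Fail fast instead of hanging on an unresponsive server
connection.setConnectTimeout(5000)
connection.setReadTimeout(5000)
connection.connect()

// Only start the download if the server did not report an error
if (connection.getResponseCode >= 400)
  println("error")
else
  (url #> new File(fileName)).!!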

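Two things:

When downloading from a URL object, if an error is returned (a 404, for example), the URL object throws a FileNotFoundException. Because this exception is raised on another thread (the URL download happens to run on a separate thread), a simple Try or try/catch will not catch it. Hence the preliminary check on the response code: if (connection.getResponseCode >= 400).
As a consequence of checking the response code, the connection can sometimes hang open indefinitely on badly behaved pages (as explained here). This can be avoided by setting a timeout on the connection: connection.setReadTimeout(5000).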
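The same two ideas (check the response code, set timeouts) can also be applied to a single connection by reading the body from the connection that was already checked, rather than handing the URL to #>. This is a sketch under assumptions, not the answer author's code: downloadChecked, its parameters, and the Either return type are chosen here for illustration.

```scala
import java.io.IOException
import java.net.{HttpURLConnection, URL}
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper: returns Right(bytesWritten) on success,
// or Left(message) if the server reports an error or I/O fails.
def downloadChecked(url: URL, target: Path, timeoutMs: Int = 5000): Either[String, Long] =
  try {
    val connection = url.openConnection()
    connection.setConnectTimeout(timeoutMs)
    connection.setReadTimeout(timeoutMs)
    connection match {
      // For HTTP(S) URLs, refuse to download when the status is 4xx/5xx
      case http: HttpURLConnection if http.getResponseCode >= 400 =>
        Left("HTTP error " + http.getResponseCode)
      case _ =>
        // Read the body from the same connection that was just checked
        val in = connection.getInputStream
        try Right(Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING))
        finally in.close()
    }
  } catch {
    // Everything runs on the calling thread, so failures can be caught here
    case e: IOException => Left(e.toString)
  }
```

Since nothing is delegated to a background thread, a missing file or a timeout surfaces as a Left value instead of an uncaught exception.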

Does the last line open a second connection? (3 upvotes)

Answer 4

Eri*_*ick 5

Flush the buffer, then close the output stream.
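The code in the question writes through a BufferedOutputStream but never flushes or closes it, so for a small file the bytes can stay in the in-memory buffer and never reach disk. A minimal sketch of the fix follows; the helper names copy and downloadTo are hypothetical and not from the question.

```scala
import java.io.{BufferedOutputStream, File, FileOutputStream, InputStream, OutputStream}
import java.net.URL

// Copies everything from `in` to `out` in chunks; closes neither stream.
// Returns the number of bytes copied.
def copy(in: InputStream, out: OutputStream): Long = {
  val buffer = new Array[Byte](8192)
  var total = 0L
  var n = in.read(buffer)
  while (n != -1) {
    out.write(buffer, 0, n)
    total += n
    n = in.read(buffer)
  }
  total
}

// Downloads `url` into `target`, making sure both streams are closed.
def downloadTo(url: URL, target: File): Long = {
  val in = url.openStream()
  try {
    val out = new BufferedOutputStream(new FileOutputStream(target))
    try copy(in, out)
    finally out.close() // close() flushes the buffer before releasing the file
  } finally in.close()
}
```

Reading in chunks also avoids building one large array from single-byte reads, as the Stream.continually version in the question does.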