Getting around the deprecation of java.net.URL

Question

I'm migrating my code to Java 20.

In this release, java.net.URL#URL(java.lang.String) has been deprecated. Unfortunately, I have a class where I can't find a replacement for the old URL constructor.

    package com.github.bottomlessarchive.loa.url.service.encoder;

    import io.mola.galimatias.GalimatiasParseException;
    import org.springframework.stereotype.Service;

    import java.net.MalformedURLException;
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.net.URL;
    import java.util.Optional;

    /**
     * This service is responsible for encoding existing {@link URL} instances to valid
     * <a href="https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier">resource identifiers</a>.
     */
    @Service
    public class UrlEncoder {

        /**
         * Encodes the provided URL to a valid
         * <a href="https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier">resource identifier</a> and return
         * the new identifier as a URL.
         *
         * @param link the url to encode
         * @return the encoded url
         */
        public Optional<URL> encode(final String link) {
            try {
                final URL url = new URL(link);

                // We need to further validate the URL because the java.net.URL's validation is inadequate.
                validateUrl(url);

                return Optional.of(encodeUrl(url));
            } catch (GalimatiasParseException | MalformedURLException | URISyntaxException e) {
                return Optional.empty();
            }
        }

        private void validateUrl(final URL url) throws URISyntaxException {
            // This will trigger an URISyntaxException. It is needed because the constructor of java.net.URL doesn't always validate the
            // passed url correctly.
            new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
        }

        private URL encodeUrl(final URL url) throws GalimatiasParseException, MalformedURLException {
            return io.mola.galimatias.URL.parse(url.toString()).toJavaURL();
        }
    }

Luckily, I also have a test for the class:

    package com.github.bottomlessarchive.loa.url.service.encoder;

    import org.junit.jupiter.params.ParameterizedTest;
    import org.junit.jupiter.params.provider.CsvSource;

    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.Optional;

    import static org.assertj.core.api.Assertions.assertThat;

    class UrlEncoderTest {

        private final UrlEncoder underTest = new UrlEncoder();

        @ParameterizedTest
        @CsvSource(
                value = {
                        "http://www.example.com/?test=Hello world,http://www.example.com/?test=Hello%20world",
                        "http://www.example.com/?test=ŐÚőúŰÜűü,http://www.example.com/?test=%C5%90%C3%9A%C5%91%C3%BA%C5%B0%C3%9C%C5%B1%C3%BC",
                        "http://www.example.com/?test=random word £500 bank $,"
                                + "http://www.example.com/?test=random%20word%20%C2%A3500%20bank%20$",
                        "http://www.aquincum.hu/wp-content/uploads/2015/06/Aquincumi-F%C3%BCzetek_14_2008.pdf,"
                                + "http://www.aquincum.hu/wp-content/uploads/2015/06/Aquincumi-F%C3%BCzetek_14_2008.pdf",
                        "http://www.aquincum.hu/wp-content/uploads/2015/06/Aquincumi-F%C3%BCzetek_14 _2008.pdf,"
                                + "http://www.aquincum.hu/wp-content/uploads/2015/06/Aquincumi-F%C3%BCzetek_14%20_2008.pdf"
                }
        )
        void testEncodeWhenUsingValidUrls(final String urlToEncode, final String expected) throws MalformedURLException {
            final Optional<URL> result = underTest.encode(urlToEncode);

            assertThat(result)
                    .contains(new URL(expected));
        }

        @ParameterizedTest
        @CsvSource(
                value = {
                        "http://промкаталог.рф/PublicDocuments/05-0211-00.pdf"
                }
        )
        void testEncodeWhenUsingInvalidUrls(final String urlToEncode) {
            final Optional<URL> result = underTest.encode(urlToEncode);

            assertThat(result)
                    .isEmpty();
        }
    }

The only dependency it uses is the galimatias URL library.

Does anyone have any idea how to remove the new URL(link) code fragment while keeping the functionality the same?

I tried various things, such as using java.net.URI#create, but it did not produce exactly the same result as the previous solution. For example, URLs containing unencoded characters, such as the space in http://www.example.com/?test=Hello world, result in an IllegalArgumentException. The URL class parsed these without raising an error (and my data contains a lot of links like this). Also, links that fail the URL conversion (such as http://промкаталог.рф/PublicDocuments/05-0211-00.pdf) are successfully converted to a URI by URI.create.
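The two behaviours described above can be reproduced with the JDK alone. The following is only a small sketch; the class name `UriCreateDemo` and the helper `parses` are made up for illustration and are not part of the project:

```java
import java.net.URI;

class UriCreateDemo {

    // Returns true if URI.create accepts the link, false if it rejects it.
    static boolean parses(String link) {
        try {
            URI.create(link);
            return true;
        } catch (IllegalArgumentException e) {
            // URI.create wraps the underlying URISyntaxException in an IllegalArgumentException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Unencoded space in the query: rejected by URI.create.
        System.out.println(parses("http://www.example.com/?test=Hello world"));

        // The Cyrillic host from the test, written with Unicode escapes
        // (http://промкаталог.рф/PublicDocuments/05-0211-00.pdf): accepted by URI.create.
        System.out.println(parses(
                "http://\u043f\u0440\u043e\u043c\u043a\u0430\u0442\u0430\u043b\u043e\u0433.\u0440\u0444"
                        + "/PublicDocuments/05-0211-00.pdf"));
    }
}
```

So URI.create is stricter than the old constructor for the first link and more lenient for the second, which is the mismatch I cannot get around.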

Answer 1

Ant*_*oly 8

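The problem

The main problem seems to be that the UrlEncoder service is dealing with a mix of encoded, unencoded, and partially encoded URLs. Worse, there is no good way to tell which is which.

This leads to ambiguity, because some characters mean different things when they are encoded and when they are not. For example, given a partially encoded URL, it is not trivial to tell whether a '&' character is part of a query parameter value (and so should be encoded) or acts as a separator (and so should not be encoded):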

https://www.example.com/test?firstQueryParam=hot%26cold&secondQueryParam=test
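To make the ambiguity concrete, here is a minimal JDK-only sketch using the query string of the URL above. The class name `AmpersandAmbiguityDemo` and the helper `pairs` are illustrative assumptions:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

class AmpersandAmbiguityDemo {

    // Splits a query string into its "name=value" pairs on the '&' separator.
    static List<String> pairs(String query) {
        return Arrays.asList(query.split("&"));
    }

    public static void main(String[] args) {
        String query = "firstQueryParam=hot%26cold&secondQueryParam=test";

        // Split while still encoded: two parameters, as the author of the URL intended.
        System.out.println(pairs(query).size());

        // Decode first: the escaped '&' becomes indistinguishable from a separator.
        String decoded = URLDecoder.decode(query, StandardCharsets.UTF_8);
        System.out.println(decoded);
        System.out.println(pairs(decoded).size());
    }
}
```

Once the string has been decoded, nothing in it records that the first '&' used to be escaped, so a later re-encoding step cannot restore it.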

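To add insult to injury, Java's URI implementation deviates from RFC 3986 and RFC 3987 for historical and backward-compatibility reasons. Here is an interesting read about some of URI's quirks: Updating URI support for RFC 3986 and RFC 3987 in the JDK.

"Fixing" wrongly encoded URLs by re-encoding them, without proper knowledge of the original URL, is not a trivial problem. Fixing them with encoders and decoders that are themselves full of quirks is even harder. My suggestion is a good-enough "best effort" heuristic.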

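A simple best-effort solution

The good news is that I have managed to implement a solution that passes all of the tests above. It uses Spring Web's UriUtils and UriComponentsBuilder. The cherry on top is that you probably no longer need galimatias.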

Here is the gist of the code:

reencode → a best-effort attempt to "fix" the URL encoding by decoding and re-encoding it
parseServerAuthority() → a replacement for the previous validateUrl(url) method.
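The two points above can be approximated with the JDK alone. The sketch below is not the Spring-based code: it swaps UriUtils and UriComponentsBuilder for URLDecoder, the component-splitting regex from RFC 3986 Appendix B, and the multi-argument java.net.URI constructor. The class name `BestEffortUrlEncoder` and the helper `decode` are assumptions made for illustration, and the sketch returns a URI rather than a URL (call toURL() on the result if a URL is needed).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BestEffortUrlEncoder {

    // Component-splitting regex from RFC 3986, Appendix B:
    // group 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static final Pattern URI_PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    static Optional<URI> encode(String link) {
        try {
            // parseServerAuthority() rejects hosts that java.net.URI only accepted
            // as an opaque "registry-based" authority.
            return Optional.of(reencode(link).parseServerAuthority());
        } catch (URISyntaxException | IllegalArgumentException e) {
            // IllegalArgumentException: URLDecoder met a malformed escape such as "%zz".
            return Optional.empty();
        }
    }

    static URI reencode(String link) throws URISyntaxException {
        Matcher m = URI_PARTS.matcher(link);
        if (!m.matches() || m.group(2) == null || m.group(4) == null || m.group(4).isEmpty()) {
            throw new URISyntaxException(link, "Expected an absolute URL with a host");
        }
        // The multi-argument constructor quotes illegal characters, including '%',
        // so path, query and fragment are decoded first to avoid double encoding.
        URI quoted = new URI(m.group(2), m.group(4),
                decode(m.group(5)), decode(m.group(7)), decode(m.group(9)));
        // toASCIIString() additionally escapes non-ASCII characters as UTF-8 octets.
        return new URI(quoted.toASCIIString());
    }

    private static String decode(String part) {
        if (part == null) {
            return null;
        }
        // URLDecoder implements form decoding and would turn '+' into a space,
        // so a literal '+' is protected before decoding.
        return URLDecoder.decode(part.replace("+", "%2B"), StandardCharsets.UTF_8);
    }
}
```

With this sketch, `BestEffortUrlEncoder.encode("http://www.example.com/?test=Hello world")` yields http://www.example.com/?test=Hello%20world, while the Cyrillic-host link from the test comes back empty because parseServerAuthority() throws a URISyntaxException for it.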

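Double-encoding ampersands and other special characters

As mentioned before, although the code above passes all the tests, it is easy to come up with a test case that breaks it. For example, running the hot%26cold URL from the first section through the encoder results in: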

https://www.example.com/test?firstQueryParam=hot&cold&secondQueryParam=test

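This is a perfectly valid URL, but probably not what one wants.

We are now entering dangerous territory, but there are ways to implement a more "opinionated" re-encoding algorithm. For example, the code below handles ampersands by making sure that %26 is never decoded: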

    import java.net.URI;
    import java.net.URISyntaxException;
    import java.nio.charset.StandardCharsets;

    import org.springframework.web.util.UriComponentsBuilder;
    import org.springframework.web.util.UriUtils;

    // The members below belong inside the UrlEncoder class.
    private static final char PERCENT_SIGN = '%';
    private static final String ENCODED_PERCENT_SIGN = "25";
    private static final String[] CODES_TO_DOUBLE_ENCODE = new String[]{
            "26" // code for '&'
    };

    private URI reencode(String url) throws URISyntaxException {
        // Protect the special escapes, decode, re-encode, then restore the protected escapes.
        final String urlWithDoubleEncodedSpecialCharacters = doubleEncodeSpecialCharacters(url);
        final String decodedUrl = UriUtils.decode(urlWithDoubleEncodedSpecialCharacters, StandardCharsets.UTF_8);
        final String encodedUrl = UriComponentsBuilder.fromHttpUrl(decodedUrl).toUriString();
        final String encodedUrlWithSpecialCharacters = decodeDoubleEncodedSpecialCharacters(encodedUrl);

        return URI.create(encodedUrlWithSpecialCharacters);
    }

    // Turns "%26" into "%2526" so that one round of decoding yields "%26" again.
    private String doubleEncodeSpecialCharacters(String url) {
        final StringBuilder sb = new StringBuilder(url);
        for (String code : CODES_TO_DOUBLE_ENCODE) {
            final String codeString = PERCENT_SIGN + code;
            int index = sb.indexOf(codeString);
            while (index != -1) {
                sb.insert(index + 1, ENCODED_PERCENT_SIGN);
                index = sb.indexOf(codeString, index + 3);
            }
        }
        return sb.toString();
    }

    // Turns "%2526" back into "%26" after re-encoding.
    private String decodeDoubleEncodedSpecialCharacters(String url) {
        final StringBuilder sb = new StringBuilder(url);
        for (String code : CODES_TO_DOUBLE_ENCODE) {
            final String codeString = PERCENT_SIGN + ENCODED_PERCENT_SIGN + code;
            int index = sb.indexOf(codeString);
            while (index != -1) {
                sb.delete(index + 2, index + 4);
                // After the deletion the restored "%26" ends at index + 3. Resume there,
                // otherwise an immediately following "%2526" would be skipped.
                index = sb.indexOf(codeString, index + 3);
            }
        }
        return sb.toString();
    }

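The solution above can be modified to handle other escape sequences (for example, all of RFC 3986's reserved characters), and to use more sophisticated heuristics (for example, treating query parameters differently from path parameters).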
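As one possible shape for the first of those modifications, the two string helpers could be replaced by a pair of regular expressions that cover every RFC 3986 reserved character. This is a JDK-only sketch; the class name `ReservedEscapeGuard` and the methods `protect` and `unprotect` are hypothetical names, and URLDecoder stands in for the decoding step:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class ReservedEscapeGuard {

    // Hex codes of the RFC 3986 reserved characters:
    // gen-delims  : / ? # [ ] @      and      sub-delims  ! $ & ' ( ) * + , ; =
    private static final String RESERVED_CODES =
            "3A|2F|3F|23|5B|5D|40|21|24|26|27|28|29|2A|2B|2C|3B|3D";

    private static final Pattern ESCAPED =
            Pattern.compile("%(" + RESERVED_CODES + ")", Pattern.CASE_INSENSITIVE);
    private static final Pattern DOUBLE_ESCAPED =
            Pattern.compile("%25(" + RESERVED_CODES + ")", Pattern.CASE_INSENSITIVE);

    // "%26" -> "%2526", so that one round of decoding yields "%26" again instead of '&'.
    static String protect(String url) {
        return ESCAPED.matcher(url).replaceAll("%25$1");
    }

    // "%2526" -> "%26", to be applied after the URL has been re-encoded.
    static String unprotect(String url) {
        return DOUBLE_ESCAPED.matcher(url).replaceAll("%$1");
    }

    public static void main(String[] args) {
        String guarded = protect("q=hot%26cold%20drinks");
        System.out.println(guarded);
        // The space escape is decoded, the ampersand escape survives.
        System.out.println(URLDecoder.decode(guarded, StandardCharsets.UTF_8));
    }
}
```

Whether every reserved character should be protected in every component is exactly the kind of opinionated choice mentioned above; a protected "%2F" in a path, for instance, may or may not be what the original author of the URL meant.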

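However, as someone who has been down this rabbit hole before, I can tell you that once you know you are dealing with wrongly encoded URLs that are outside your control, there is simply no perfect solution.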
