Parsing a URI in .NET

Ian*_*oyd 5 c# uri

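Short version

Is there a class in .NET that can parse these URIs?

Background

The Windows Search service registers the content it crawls by URI. Using ISearchCrawlScopeManager you can enumerate the various root URIs: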

  • csc://{S-1-5-21-397955417-62688126-188441444-1010}/
  • defaultroot://{S-1-5-21-397955417-62688126-188441444-1010}/
  • file:///C:\
  • file:///D:\
  • iehistory://{S-1-5-21-397955417-62688126-188441444-1010}/
  • mapi://{S-1-5-21-397955417-62688126-188441444-1010}/Outlook2003/Inbox/
  • winrt://{S-1-5-21-397955417-62688126-188441444-1010}/

Unfortunately, the .NET Uri class cannot parse these URIs (dotNetFiddle):

Run-time exception (line 8): Invalid URI: The hostname could not be parsed.

Stack Trace:

[System.UriFormatException: Invalid URI: The hostname could not be parsed.]
   at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)

Is there a class in .NET that can parse these URIs?

The native Win32 function InternetCrackUrl handles these URIs correctly:

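URL_COMPONENTS components = {};   // zero the string pointers; only the lengths are set below
components.dwStructSize      = sizeof(URL_COMPONENTS);
// A null pointer with a non-zero length asks InternetCrackUrl to return
// a pointer into the original string for that component.
components.dwSchemeLength    = DWORD(-1);
components.dwHostNameLength  = DWORD(-1);
components.dwUserNameLength  = DWORD(-1);
components.dwPasswordLength  = DWORD(-1);
components.dwUrlPathLength   = DWORD(-1);
components.dwExtraInfoLength = DWORD(-1);

InternetCrackUrl(url, 0, 0, &components);   // dwUrlLength = 0: url is null-terminated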

mapi://{S-1-5-21-397955417-62688126-188441444-1010}/Outlook2003/Inbox/
\__/   \__________________________________________/\_________________/
 |                           |                              |
Scheme                    HostName                       UrlPath

Scheme:   "mapi"
HostName: "{S-1-5-21-397955417-62688126-188441444-1010}"
UrlPath:  "/Outlook2003/Inbox/"
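
If no built-in class accepts these URIs, the same three-part split can be done by hand. The following is a minimal sketch, not a general URI parser: `SimpleUriSplitter` and `TrySplit` are hypothetical names, the code assumes the `scheme://host/path` shape shown above, and it performs no validation of the host, which is exactly why it tolerates the `{S-1-5-…}` form.

```csharp
using System;

// Hypothetical helper for illustration; not part of the .NET class library.
public static class SimpleUriSplitter
{
    // Splits "scheme://host/path" into three parts without validating the host.
    // Any query or fragment is left attached to the path.
    public static bool TrySplit(string uri, out string scheme, out string host, out string path)
    {
        scheme = host = path = string.Empty;
        if (string.IsNullOrEmpty(uri)) return false;

        const string separator = "://";
        int schemeEnd = uri.IndexOf(separator, StringComparison.Ordinal);
        if (schemeEnd <= 0) return false;   // no scheme, or no "//" authority marker

        int hostStart = schemeEnd + separator.Length;

        // The host runs up to the first '/', '?' or '#', or to the end of the string.
        int hostEnd = uri.IndexOfAny(new[] { '/', '?', '#' }, hostStart);
        if (hostEnd < 0) hostEnd = uri.Length;

        scheme = uri.Substring(0, schemeEnd);
        host = uri.Substring(hostStart, hostEnd - hostStart);
        path = uri.Substring(hostEnd);
        return true;
    }
}
```

Called with `mapi://{S-1-5-21-397955417-62688126-188441444-1010}/Outlook2003/Inbox/`, this yields the same Scheme, HostName and UrlPath values listed above. For `file:///C:\` the host comes back empty and the path is `/C:\`.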

Bonus Chatter

Applying URI escaping to the URI:

  • Before: mapi://{S-1-5-21-397955417-62688126-188441444-1010}/Outlook2003/Inbox/
  • After: mapi://%7BS-1-5-21-397955417-62688126-188441444-1010%7D/Outlook2003/Inbox/

doesn't help (dotNetFiddle).
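
For reference, one way to produce the escaped form is `Uri.EscapeDataString` applied to the host portion only. This is a sketch under that assumption; the post does not say which escaping call was actually used, and `HostEscapingDemo` and `EscapeHost` are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class HostEscapingDemo
{
    // Percent-encodes the host part (braces become %7B and %7D) and reassembles the URI.
    // The scheme and path are passed through unchanged.
    public static string EscapeHost(string scheme, string host, string path)
    {
        return scheme + "://" + Uri.EscapeDataString(host) + path;
    }
}
```

Passing `"mapi"`, `"{S-1-5-21-397955417-62688126-188441444-1010}"` and `"/Outlook2003/Inbox/"` gives the "After" string shown above.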

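Difference between a URI and a URL?

URLs are a subset of URIs

  • A URI identifies a thing
  • A URL tells you where to get the thing

For example:

  • URI: isbn:1631971727 (identifies the thing)
    • URL: isbn://amazon.com/1631971727 (where to get the thing)

URL

The breakdown of a URL is: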

  foo://iboyd:Trubador@example.com:8042/look/over/there?name=ferret#nose
  \_/   \___/ \______/ \_________/ \__/\______________/\__________/ \__/
   |      |      |         |        |         |            |         |
scheme username password  host     port     path         query    fragment
  • scheme: foo
  • username: iboyd
  • password: Trubador
  • host: example.com
  • port: 8042
  • path: /look/over/there
  • query: ?name=ferret
  • fragment: nose
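
For a well-formed URL like this one, System.Uri exposes the same parts as properties. The sketch below assumes the generic parser .NET applies to unregistered schemes such as `foo`; `UrlPartsDemo` and `Describe` are hypothetical names.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class UrlPartsDemo
{
    // Returns one "name: value" line per URL component, read from System.Uri.
    public static string Describe(string url)
    {
        var uri = new Uri(url);
        return string.Join("\n", new[]
        {
            "scheme:   " + uri.Scheme,
            "userinfo: " + uri.UserInfo,      // username and password together, "user:password"
            "host:     " + uri.Host,
            "port:     " + uri.Port,
            "path:     " + uri.AbsolutePath,
            "query:    " + uri.Query,         // includes the leading '?'
            "fragment: " + uri.Fragment       // includes the leading '#'
        });
    }
}
```

Note that System.Uri reports the username and password as a single UserInfo value, and that Fragment keeps its leading `#`, unlike the bare `nose` in the list above.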

jon*_*ana 1

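As you can see in the stack trace, the CreateThis() method, which calls ResolveHelper(), identifies the string as an absolute URI, and that is why it throws the exception.

Change your URI from: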

mapi://{S-1-5-21-397955417-62688126-188441444-1010}/Outlook2003/Inbox/

to:

mapi:////{S-1-5-21-397955417-62688126-188441444-1010}/Outlook2003/Inbox/

.NET source code: the ResolveHelper() method

From the Reference Source for .NET Framework 4.7.2:

internal static Uri ResolveHelper(Uri baseUri, Uri relativeUri, ref string newUriString, ref bool userEscaped, 
            out UriFormatException e)
        {
            Debug.Assert(!baseUri.IsNotAbsoluteUri && !baseUri.UserDrivenParsing, "Uri::ResolveHelper()|baseUri is not Absolute or is controlled by User Parser.");

            e = null;
            string relativeStr = string.Empty;

            if ((object)relativeUri != null)
            {
                if (relativeUri.IsAbsoluteUri)
                    return relativeUri;

                relativeStr = relativeUri.OriginalString;
                userEscaped = relativeUri.UserEscaped;
            }
            else
                relativeStr = string.Empty;

            // Here we can assert that passed "relativeUri" is indeed a relative one

            if (relativeStr.Length > 0 && (IsLWS(relativeStr[0]) || IsLWS(relativeStr[relativeStr.Length - 1])))
                relativeStr = relativeStr.Trim(_WSchars);

            if (relativeStr.Length == 0)
            {
                newUriString = baseUri.GetParts(UriComponents.AbsoluteUri, 
                    baseUri.UserEscaped ? UriFormat.UriEscaped : UriFormat.SafeUnescaped);
                return null;
            }

            // Check for a simple fragment in relative part
            if (relativeStr[0] == '#' && !baseUri.IsImplicitFile && baseUri.Syntax.InFact(UriSyntaxFlags.MayHaveFragment))
            {
                newUriString = baseUri.GetParts(UriComponents.AbsoluteUri & ~UriComponents.Fragment, 
                    UriFormat.UriEscaped) + relativeStr;
                return null;
            }

            // Check for a simple query in relative part
            if (relativeStr[0] == '?' && !baseUri.IsImplicitFile && baseUri.Syntax.InFact(UriSyntaxFlags.MayHaveQuery))
            {
                newUriString = baseUri.GetParts(UriComponents.AbsoluteUri & ~UriComponents.Query & ~UriComponents.Fragment, 
                    UriFormat.UriEscaped) + relativeStr;
                return null;
            }

            // Check on the DOS path in the relative Uri (a special case)
            if (relativeStr.Length >= 3
                && (relativeStr[1] == ':' || relativeStr[1] == '|')
                && IsAsciiLetter(relativeStr[0])
                && (relativeStr[2] == '\\' || relativeStr[2] == '/'))
            {

                if (baseUri.IsImplicitFile)
                {
                    // It could have file:/// prepended to the result but we want to keep it as *Implicit* File Uri
                    newUriString = relativeStr;
                    return null;
                }
                else if (baseUri.Syntax.InFact(UriSyntaxFlags.AllowDOSPath))
                {
                    // The scheme is not changed just the path gets replaced
                    string prefix;
                    if (baseUri.InFact(Flags.AuthorityFound))
                        prefix = baseUri.Syntax.InFact(UriSyntaxFlags.PathIsRooted) ? ":///" : "://";
                    else
                        prefix = baseUri.Syntax.InFact(UriSyntaxFlags.PathIsRooted) ? ":/" : ":";

                    newUriString = baseUri.Scheme + prefix + relativeStr;
                    return null;
                }
                // If we are here then input like "http://host/path/" + "C:\x" will produce the result  http://host/path/c:/x
            }


            ParsingError err = GetCombinedString(baseUri, relativeStr, userEscaped, ref newUriString);

            if (err != ParsingError.None)
            {
                e = GetException(err);
                return null;
            }

            if ((object)newUriString == (object)baseUri.m_String)
                return baseUri;

            return null;
        }