What is the best way to parse this in C++?

ere*_*eOn 8 c c++ parsing boost stl

In my program, I have a list of "server addresses" in the following format:

host[:port]

The brackets here indicate that the port part is optional.

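  • host can be a hostname, an IPv4 address, or an IPv6 address (possibly in "bracket-enclosed" notation).
  • port, if present, can be a numeric port number or a service string (like "http" or "ssh").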

If port is present and host is an IPv6 address, host must be in "bracket-enclosed" notation (example: [::1]).

Here are some valid examples:

localhost
localhost:11211
127.0.0.1:http
[::1]:11211
::1
[::1]

And some invalid examples:

::1:80 // Invalid: is this the IPv6 address ::1:80 and a default port, or the IPv6 address ::1 and the port 80?
::1:http // This is not ambiguous, but for simplicity's sake, let's consider it forbidden as well.

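My goal is to split such entries into two parts (obviously host and port). I don't care if either the host or the port is invalid, as long as it doesn't contain a non-bracket-enclosed ':' (290.234.34.34.5 is fine for host; it will be rejected in the next step). I just want to separate the two parts or, if there is no port part, to know that somehow.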

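I tried to do something with std::stringstream, but everything I came up with seemed hacky and not very elegant.

How would you do this in C++?

I don't mind answers in C, but C++ is preferred. Any boost solution is welcome too.

Thank you.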

sbi*_*sbi 9

Have you looked at boost::spirit? It might be overkill for your task, though.
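If Spirit does feel like overkill, the split can also be done with plain std::string_view searches. The following is only a minimal sketch and was not posted in this thread: split_host_port and HostPort are hypothetical names, and it assumes that an unbracketed string with two or more colons (such as ::1 or ::1:80) is a bare IPv6 host with no port, leaving any rejection to the later validation step.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical result type: the host, plus the port/service text if one was given.
using HostPort = std::pair<std::string, std::optional<std::string>>;

// Hypothetical helper: splits "host[:port]" into its two parts.
// Returns std::nullopt when the input is malformed under the rules above.
std::optional<HostPort> split_host_port(std::string_view spec)
{
    if (spec.empty()) return std::nullopt;

    if (spec.front() == '[') {
        // Bracket-enclosed host, e.g. "[::1]" or "[::1]:11211".
        const auto close = spec.find(']');
        if (close == std::string_view::npos) return std::nullopt;
        std::string host(spec.substr(1, close - 1));
        const std::string_view rest = spec.substr(close + 1);
        if (rest.empty()) return HostPort{std::move(host), std::nullopt};
        // Anything after ']' must be ":port" with a non-empty, colon-free port.
        if (rest.front() != ':' || rest.size() == 1 ||
            rest.find(':', 1) != std::string_view::npos)
            return std::nullopt;
        return HostPort{std::move(host), std::string(rest.substr(1))};
    }

    const auto first = spec.find(':');
    if (first == std::string_view::npos)  // no colon at all, e.g. "localhost"
        return HostPort{std::string(spec), std::nullopt};

    if (first == spec.rfind(':')) {  // exactly one colon: "host:port"
        if (first == 0 || first + 1 == spec.size())
            return std::nullopt;     // empty host or empty port
        return HostPort{std::string(spec.substr(0, first)),
                        std::string(spec.substr(first + 1))};
    }

    // ASSUMPTION: two or more colons without brackets is a bare IPv6 host.
    return HostPort{std::string(spec), std::nullopt};
}
```

With this sketch, "[::1]:11211" yields the host ::1 and the port 11211, while "::1:80" comes back as the host ::1:80 with no port. A caller that wants to forbid such entries outright would need an extra check on top of this.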


Vit*_*con 5

Here is a simple class that uses boost::xpressive to verify the type of the IP address; you can then parse the rest to get the results.

Usage:

const std::string ip_address_str = "127.0.0.1:3282";
IpAddress ip_address = IpAddress::Parse(ip_address_str);
std::cout<<"Input String: "<<ip_address_str<<std::endl;
std::cout<<"Address Type: "<<IpAddress::TypeToString(ip_address.getType())<<std::endl;
if (ip_address.getType() != IpAddress::Unknown)
{
    std::cout<<"Host Address: "<<ip_address.getHostAddress()<<std::endl;
    if (ip_address.getPortNumber() != 0)
    {
        std::cout<<"Port Number: "<<ip_address.getPortNumber()<<std::endl;
    }
}

Header file of the class, IpAddress.h

#pragma once
#ifndef __IpAddress_H__
#define __IpAddress_H__


#include <string>

class IpAddress
{
public:
    enum Type
    {
        Unknown,
        IpV4,
        IpV6
    };
    ~IpAddress(void);

    /**
     * \brief   Gets the host address part of the IP address.
     * \author  Abi
     * \date    02/06/2010
     * \return  The host address part of the IP address.
    **/
    const std::string& getHostAddress() const;

    /**
     * \brief   Gets the port number part of the address if any.
     * \author  Abi
     * \date    02/06/2010
     * \return  The port number.
    **/
    unsigned short getPortNumber() const;

    /**
     * \brief   Gets the type of the IP address.
     * \author  Abi
     * \date    02/06/2010
     * \return  The type.
    **/
    IpAddress::Type getType() const;

    /**
     * \fn  static IpAddress Parse(const std::string& ip_address_str)
     *
     * \brief   Parses a given string to an IP address.
     * \author  Abi
     * \date    02/06/2010
     * \param   ip_address_str  The ip address string to be parsed.
     * \return  Returns the parsed IP address. If the IP address is
     *          invalid then the IpAddress instance returned will have its
     *          type set to IpAddress::Unknown
    **/
    static IpAddress Parse(const std::string& ip_address_str);

    /**
     * \brief   Converts the given type to string.
     * \author  Abi
     * \date    02/06/2010
     * \param   address_type    Type of the address to be converted to string.
     * \return  String form of the given address type.
    **/
    static std::string TypeToString(IpAddress::Type address_type);
private:
    IpAddress(void);

    Type m_type;
    std::string m_hostAddress;
    unsigned short m_portNumber;
};

#endif // __IpAddress_H__

Source file of the class, IpAddress.cpp

#include "IpAddress.h"
#include <cstdlib> // atoi
#include <boost/xpressive/xpressive.hpp>

namespace bxp = boost::xpressive;

static const std::string RegExIpV4_IpFormatHost = "^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:[0-9]{1,5})?$";
static const std::string RegExIpV4_StringHost = "^[A-Za-z0-9]+(\\:[0-9]+)?$";

IpAddress::IpAddress(void)
:m_type(Unknown)
,m_portNumber(0)
{
}

IpAddress::~IpAddress(void)
{
}

IpAddress IpAddress::Parse( const std::string& ip_address_str )
{
    IpAddress ipaddress;
    bxp::sregex ip_regex = bxp::sregex::compile(RegExIpV4_IpFormatHost);
    bxp::sregex str_regex = bxp::sregex::compile(RegExIpV4_StringHost);
    bxp::smatch match;
    if (bxp::regex_match(ip_address_str, match, ip_regex) || bxp::regex_match(ip_address_str, match, str_regex))
    {
        ipaddress.m_type = IpV4;
        // Anything before the last ':' (if any) is the host address
        std::string::size_type colon_index = ip_address_str.find_last_of(':');
        if (std::string::npos == colon_index)
        {
            ipaddress.m_portNumber = 0;
            ipaddress.m_hostAddress = ip_address_str;
        }else{
            ipaddress.m_hostAddress = ip_address_str.substr(0, colon_index);
            ipaddress.m_portNumber = atoi(ip_address_str.substr(colon_index+1).c_str());
        }
    }
    return ipaddress;
}

std::string IpAddress::TypeToString( Type address_type )
{
    std::string result = "Unknown";
    switch(address_type)
    {
    case IpV4:
        result = "IP Address Version 4";
        break;
    case IpV6:
        result = "IP Address Version 6";
        break;
    }
    return result;
}

const std::string& IpAddress::getHostAddress() const
{
    return m_hostAddress;
}

unsigned short IpAddress::getPortNumber() const
{
    return m_portNumber;
}

IpAddress::Type IpAddress::getType() const
{
    return m_type;
}

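I have only set up rules for IPv4 because I don't know the proper format for IPv6, but I'm pretty sure it isn't hard to implement. Boost Xpressive is a purely template-based solution, so it does not require any .lib files to be compiled into your exe, which I consider a plus.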

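By the way, just to break down the format of the regex in brief...
^ = start of string
$ = end of string
[] = a group of letters or digits that can appear
[0-9] = any single digit between 0 and 9
[0-9]+ = one or more digits between 0 and 9
'.' has a special meaning in a regex, but since our format has dots in the IP address, we need to say that we want a literal '.' between the digits by using '\.'. And since C++ needs an escape sequence for '\', we have to write "\\." in the string literal.
? = the preceding component is optional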

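So, in short, "^[0-9]+$" is a regex that matches an integer.
"^[0-9]+\.$" means an integer that ends with a '.'.
"^[0-9]+\.[0-9]?$" is an integer that ends with a '.', optionally followed by a single decimal digit.
For an integer or a real number, the regex would be "^[0-9]+(\.[0-9]*)?$".
The regex for an integer of 2 to 3 digits is "^[0-9]{2,3}$".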
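These small patterns are easy to try out. Below is a minimal sketch that assumes std::regex (with its default ECMAScript syntax) as a stand-in for boost::xpressive; matches is a hypothetical helper name, not part of the class above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true when `text` matches `pattern` from start to end.
// std::regex is used here in place of boost::xpressive.
bool matches(const std::string& text, const std::string& pattern)
{
    return std::regex_match(text, std::regex(pattern));
}

// Results for the patterns discussed above (note the doubled backslash
// needed inside a C++ string literal):
//   matches("42",    "^[0-9]+$")              -> true
//   matches("42.",   "^[0-9]+\\.$")           -> true
//   matches("42.5",  "^[0-9]+\\.[0-9]?$")     -> true
//   matches("42.55", "^[0-9]+\\.[0-9]?$")     -> false (two decimal digits)
//   matches("3.14",  "^[0-9]+(\\.[0-9]*)?$")  -> true
//   matches("1234",  "^[0-9]{2,3}$")          -> false (four digits)
```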

Now to break down the format of the IP address:

"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:[0-9]{1,5})?$"

This is synonymous with "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(\:[0-9]{1,5})?$", which means:

[start of string][1-3 digits].[1-3 digits].[1-3 digits].[1 or more digits]<:[1-5 digits]>[end of string]
Where [] are mandatory and <> are optional

The second RegEx is simpler than this. It is just an alphanumeric value followed by an optional colon and port number.
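One thing to be aware of is that [A-Za-z0-9]+ accepts neither dots nor hyphens, so a name such as stackexchange.com would not match. The sketch below shows one possible looser variant; it assumes std::regex instead of boost::xpressive, looks_like_named_host is a hypothetical name, and the pattern is deliberately simple rather than a full hostname check.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical variant of the second pattern: the host must start and end
// with a letter or digit and may contain '.' and '-' in between. The port,
// if present, is still digits only. This is a loose sketch, not an
// RFC-level hostname validator.
bool looks_like_named_host(const std::string& text)
{
    static const std::regex pattern(
        "^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?(:[0-9]+)?$");
    return std::regex_match(text, pattern);
}
```

For example, looks_like_named_host("stackexchange.com") and looks_like_named_host("my-host:8080") both return true, while "-bad" and "host:" are rejected.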

By the way, if you would like to test a RegEx, you can use this site.

EDIT: I failed to notice that you can optionally have http instead of a port number. For that, you can change the expression to:

"^[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+(\\:([0-9]{1,5}|http|ftp|smtp))?$"

This accepts formats like:
127.0.0.1
127.0.0.1:3282
127.0.0.1:http
217.0.0.1:ftp
18.123.2.1:smtp
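Note that once service names are allowed, converting the part after the colon with atoi yields 0 for something like "http", so the service has to be kept as text. A minimal sketch of doing that with capture groups follows; it assumes std::regex rather than boost::xpressive, and Ipv4Endpoint and parse_ipv4_endpoint are hypothetical names that are not part of the IpAddress class.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type: the service is kept as text so that names such
// as "http" survive; it is empty when the input has no ":service" part.
struct Ipv4Endpoint {
    std::string host;
    std::string service;
};

// Hypothetical helper: returns std::nullopt when the text does not match.
std::optional<Ipv4Endpoint> parse_ipv4_endpoint(const std::string& text)
{
    // Same idea as the expression above, with the host wrapped in its own
    // capture group. The colon is written unescaped since it is not a
    // special character in a regex.
    static const std::regex pattern(
        "^([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]+)"
        "(:([0-9]{1,5}|http|ftp|smtp))?$");

    std::smatch match;
    if (!std::regex_match(text, match, pattern)) return std::nullopt;
    // match[1] is the dotted host; match[3] is the port number or service
    // name, and converts to an empty string when that group did not match.
    return Ipv4Endpoint{match[1].str(), match[3].str()};
}
```

Here "127.0.0.1:http" gives the host 127.0.0.1 and the service http, "127.0.0.1" gives an empty service, and "127.0.0.1:ssh" is rejected because ssh is not in the list of alternatives.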

  • Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. (7 upvotes)

seh*_*ehe 5

I'm late to the party, but I was googling for how to do just this. Spirit and C++ have grown up a lot, so let me add a 2021 take:

Live On Compiler Explorer

#include <string>
#include <string_view>
#include <tuple>
#include <fmt/ranges.h>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/std_tuple.hpp>

auto parse_server_address(std::string_view address_spec,
                          std::string_view default_service = "https")
{
    using namespace boost::spirit::x3;
    auto service = ':' >> +~char_(":") >> eoi;
    auto host    = '[' >> *~char_(']') >> ']' // e.g. for IPV6
        | raw[*("::" | (char_ - service))];

    std::tuple<std::string, std::string> result;
    parse(begin(address_spec), end(address_spec),
          expect[host >> (service | attr(default_service))], result);

    return result;
}

int main() {
    for (auto input : {
             "localhost",
             "localhost:11211",
             "127.0.0.1:http",
             "[::1]:11211",
             "::1", "[::1]",
             "::1:80", // Invalid: Is this the IPv6 address ::1:80 and a default
                       // port, or the IPv6 address ::1 and the port 80 ?
             "::1:http", // This is not ambigous, but for simplicity sake, let's
                         // consider this is forbidden as well.
         })
    {
        // auto [host, svc] = parse_server_address(input);
        fmt::print("'{}' -> {}\n", input, parse_server_address(input));
    }
}

Prints

'localhost' -> ("localhost", "https")
'localhost:11211' -> ("localhost", "11211")
'127.0.0.1:http' -> ("127.0.0.1", "http")
'[::1]:11211' -> ("::1", "11211")
'::1' -> ("::1", "https")
'[::1]' -> ("::1", "https")
'::1:80' -> ("::1", "80")
'::1:http' -> ("::1", "http")

BONUS

Validating/resolving the addresses. The parsing is 100% unchanged; this just uses Asio to resolve the results, which also validates them:

#include <boost/asio.hpp>
#include <iostream>
#include <iomanip>
using boost::asio::ip::tcp;
using boost::asio::system_executor;
using boost::system::error_code;

int main() {
    tcp::resolver r(system_executor{});
    error_code    ec;

    for (auto input : {
             "localhost",
             "localhost:11211",
             "127.0.0.1:http",
             "[::1]:11211",
             "::1", "[::1]",
             "::1:80", // Invalid: Is this the IPv6 address ::1:80 and a default
                       // port, or the IPv6 address ::1 and the port 80 ?
             "::1:http", // This is not ambigous, but for simplicity sake, let's
                         // consider this is forbidden as well.
             "stackexchange.com",
             "unknown-host.xyz",
         })
    {
        auto [host, svc] = parse_server_address(input);

        for (auto&& endpoint : r.resolve({host, svc}, ec)) {
            std::cout << input << " -> " << endpoint.endpoint() << "\n";
        }

        if (ec.failed()) {
            std::cout << input << " -> unresolved: " << ec.message() << "\n";
        }
    }
}

Prints (limited network, Live On Wandbox and Coliru: http://coliru.stacked-crooked.com/a/497d8091b40c9f2d)

localhost -> 127.0.0.1:443
localhost:11211 -> 127.0.0.1:11211
127.0.0.1:http -> 127.0.0.1:80
[::1]:11211 -> [::1]:11211
::1 -> [::1]:443
[::1] -> [::1]:443
::1:80 -> [::1]:80
::1:http -> [::1]:80
stackexchange.com -> 151.101.129.69:443
stackexchange.com -> 151.101.1.69:443
stackexchange.com -> 151.101.65.69:443
stackexchange.com -> 151.101.193.69:443
unknown-host.xyz -> unresolved: Host not found (authoritative)