C++ and its type system: how to deal with data of multiple types?

sub*_*sub 7 c++ interpreter typing

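"Introduction"

I'm relatively new to C++. I have worked through all the basics and managed to build 2-3 simple interpreters for my programming languages.

The first thing that gave me a headache, and still does: implementing my language's type system in C++.

Think about it: Ruby, Python, PHP and co. have a lot of built-in types, which are obviously implemented in C. So the first thing I tried was to give values in my language three possible types: Int, String and Nil.

I came up with this: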

enum ValueType
{
     Int, String, Nil
};

class Value
{
 public:
  ValueType type;
  int intVal;
  string stringVal;
};

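Yeah, wow, I know. Passing this class around was extremely slow, because the string allocator had to be called all the time.

The next time, I tried something like this: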

enum ValueType
{
     Int, String, Nil
};

extern string stringTable[255];
class Value
{
 public:
  ValueType type;
  int index;
};

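I would store all strings in stringTable and write their position to index. If the type of the Value was Int, I just stored the integer itself in index; it wouldn't make sense at all to use an int index to access another int, would it?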
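
A minimal sketch of what this second attempt amounts to, assuming a growable table in place of the fixed stringTable[255]. StringTable, intern, get, makeInt, makeString and makeNil are hypothetical names used only for illustration; they are not part of the original code. Identical strings share one table slot, so a Value stays two machine words and is cheap to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

enum class ValueType { Int, String, Nil };

// Hypothetical growable replacement for the fixed stringTable[255].
class StringTable {
public:
    // Returns the index of s, adding it to the table on first use.
    int intern(const std::string& s) {
        auto it = index_.find(s);
        if (it != index_.end()) return it->second;
        const int id = static_cast<int>(strings_.size());
        strings_.push_back(s);
        index_.emplace(s, id);
        return id;
    }

    // Looks up a previously interned string by its index.
    const std::string& get(int id) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < strings_.size());
        return strings_[static_cast<std::size_t>(id)];
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::vector<std::string> strings_;
    std::unordered_map<std::string, int> index_;
};

struct Value {
    ValueType type;
    int index;  // the integer itself for Int, a StringTable index for String
};

Value makeInt(int v) { return Value{ValueType::Int, v}; }
Value makeString(StringTable& table, const std::string& s) {
    return Value{ValueType::String, table.intern(s)};
}
Value makeNil() { return Value{ValueType::Nil, 0}; }
```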

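Anyway, the above gave me a headache too. After some time, accessing the string from the table here, referencing it there and copying it somewhere else grew over my head, and I lost control. I had to put the interpreter draft down.

Now: OK, so C and C++ are statically typed.

  • How do the main implementations of the languages mentioned above handle the different types in their programs (fixnums, bignums, nums, strings, arrays, resources, ...)?

  • What should I do to get maximum speed with many different available types?

  • How do the solutions compare to my simplified versions above?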

Vij*_*hew 5

One obvious solution is to define a type hierarchy:

class Type
{
};

class Int : public Type
{
};

class String : public Type
{
};

and so on. As a complete example, let us write an interpreter for a tiny language. The language allows declaring variables like this:

var a 10

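This creates an Int object, assigns it the value 10 and stores it in the variable table under the name a. Operations can be invoked on variables. For instance, the addition operation on two Int values looks like this: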

+ a b

Here is the complete code for the interpreter:

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <cstdlib>
#include <map>
#include <typeinfo>  // std::bad_cast

// The base Type object from which all data types are derived.
class Type
{
public:
  typedef std::vector<Type*> TypeVector;
  virtual ~Type () { }

  // Some functions that you may want all types of objects to support:

  // Returns the string representation of the object.
  virtual const std::string toString () const = 0;
  // Returns true if other_obj is the same as this.
  virtual bool equals (const Type &other_obj) = 0;
  // Invokes an operation on this object with the objects in args
  // as arguments.
  virtual Type* invoke (const std::string &opr, const TypeVector &args) = 0;
};

// An implementation of Type to represent an integer. The C++ int is
// used to actually store the value.  As a consequence this type is
// machine dependent, which might not be what you want for a real
// high-level language.
class Int : public Type
{
public:
  Int () : value_ (0), ret_ (NULL) { }
  Int (int v) : value_ (v), ret_ (NULL) { }
  Int (const std::string &v) : value_ (atoi (v.c_str ())), ret_ (NULL) { }
  virtual ~Int ()
  {
    delete ret_;
  }
  virtual const std::string toString () const
  {
    std::ostringstream out;
    out << value_;
    return out.str ();
  }
  virtual bool equals (const Type &other_obj)
  {    
    if (&other_obj == this) 
      return true;
    try
      {
        const Int &i = dynamic_cast<const Int&> (other_obj);
        return value_ == i.value_;
      }
    catch (const std::bad_cast &)
      {
        return false;
      }
  }
  // As of now, Int supports only addition, represented by '+'.
  virtual Type* invoke (const std::string &opr, const TypeVector &args)    
  {
    if (opr == "+")
      {
        return add (args);
      }
    return NULL;
  }
private:
  Type* add (const TypeVector &args)
  {
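    if (args.empty ()) return NULL;
    // The argument must itself be an Int; otherwise the invocation fails.
    Int *arg = dynamic_cast<Int*> (args[0]);
    if (arg == NULL) return NULL;
    // The result object is owned by this Int and reused across calls.
    if (ret_ == NULL) ret_ = new Int;
    Int *i = dynamic_cast<Int*> (ret_);
    i->value_ = value_ + arg->value_;
    return ret_;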
  }
  int value_;
  Type *ret_;
};

// We use std::map as a symbol (or variable) table.
typedef std::map<std::string, Type*> VarsTable;
typedef std::vector<std::string> Tokens;

// A simple tokenizer for our language. Takes a line and
// tokenizes it based on whitespaces.  
static void
tokenize (const std::string &line, Tokens &tokens)
{
  std::istringstream in (line);
  std::string token;
  // Extract until the stream fails, so that trailing whitespace or an
  // empty line does not produce an empty token.
  while (in >> token)
    tokens.push_back (token);
}

// Maps varName to an Int object in the symbol table.  To support
// other Types, we need a more complex interpreter that actually infers
// the type of object by looking at the format of value.
static void
setVar (const std::string &varName, const std::string &value,
        VarsTable &vars)
{
  Type *&slot = vars[varName];
  delete slot;  // free any value previously bound to this name
  slot = new Int (value);
}

// Returns a previously mapped value from the symbol table.
static Type *
getVar (const std::string &varName, const VarsTable &vars)
{
  VarsTable::const_iterator iter = vars.find (varName);
  if (iter == vars.end ())
    {
      std::cout << "Variable " << varName 
                << " not found." << std::endl;
      return NULL;
    }
  return const_cast<Type*> (iter->second);
}

// Invokes opr on the object mapped to the name var01.
// opr should represent a binary operation. var02 will
// be pushed to the args vector. The string representation of
// the result is printed to the console.
static void
invoke (const std::string &opr, const std::string &var01,
        const std::string &var02, const VarsTable &vars)
{
  Type::TypeVector args;
  Type *arg01 = getVar (var01, vars);
  if (arg01 == NULL) return;
  Type *arg02 = getVar (var02, vars);
  if (arg02 == NULL) return;
  args.push_back (arg02);
  Type *ret = NULL;
  if ((ret = arg01->invoke (opr, args)) != NULL)
    std::cout << "=> " << ret->toString () << std::endl;
  else
    std::cout << "Failed to invoke " << opr << " on " 
              << var01 << std::endl;
}

// A simple REPL for our language. Type 'quit' to exit
// the loop.
int 
main (int argc, char **argv)
{
  VarsTable vars;
  std::string line;
  while (std::getline (std::cin, line))
    {
      if (line == "quit")
        break;
      else
        {
          Tokens tokens;
          tokenize (line, tokens);
          if (tokens.size () != 3)
            {
              std::cout << "Invalid expression." << std::endl;
              continue;
            }
          if (tokens[0] == "var")
            setVar (tokens[1], tokens[2], vars);
          else
            invoke (tokens[0], tokens[1], tokens[2], vars);
        }
    }  
  return 0;
}

Sample interaction with the interpreter:

/home/me $ ./mylang

var a 10
var b 20
+ a b
=> 30
+ a c
Variable c not found.
quit
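
The comment on setVar notes that supporting other types means inferring the type from the format of the value. A minimal sketch of that step follows, using a pared-down stand-in for the Type base class (toString only) so that it compiles on its own. The String class and the makeValue factory are hypothetical additions, not part of the interpreter above, and the rule "up to nine digits with an optional leading minus sign means Int" is an assumption chosen to keep the conversion within the range of a 32-bit int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <string>

// Pared-down stand-in for the Type base class (toString only).
class Type {
public:
    virtual ~Type() {}
    virtual std::string toString() const = 0;
};

class Int : public Type {
public:
    explicit Int(int v) : value_(v) {}
    std::string toString() const override { return std::to_string(value_); }
private:
    int value_;
};

// Hypothetical second type in the hierarchy.
class String : public Type {
public:
    explicit String(const std::string& v) : value_(v) {}
    std::string toString() const override { return value_; }
private:
    std::string value_;
};

// Hypothetical factory: a token of one to nine digits, optionally preceded
// by '-', becomes an Int; anything else becomes a String.
std::unique_ptr<Type> makeValue(const std::string& token) {
    assert(!token.empty());  // the tokenizer is assumed not to yield empty tokens
    const std::size_t start = (token[0] == '-') ? 1 : 0;
    const std::size_t digits = token.size() - start;
    const bool numeric = digits >= 1 && digits <= 9 &&
        token.find_first_not_of("0123456789", start) == std::string::npos;
    if (numeric)
        return std::unique_ptr<Type>(new Int(std::atoi(token.c_str())));
    return std::unique_ptr<Type>(new String(token));
}
```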


Dav*_*eas 5

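There are a couple of different things that you can do here. Different solutions have come up over time, and most of them require dynamic allocation of the actual datum (boost::variant can avoid using dynamically allocated memory for small objects, thanks @MSalters).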

Pure C approach:

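Store the type information together with a void pointer to memory that has to be interpreted according to that type information (usually an enum):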

#include <stdlib.h>

typedef enum {
   integer,
   string,
   null
} type_t;
typedef struct variable {
   type_t type;
   void * datum;
} variable_t;
void init_int_variable( variable_t * var, int value )
{
   var->type = integer;
   var->datum = malloc( sizeof(int) );
   if ( var->datum != NULL )
      *((int *)var->datum) = value; // cast to int*, then store through it
}
void fini_variable( variable_t var ) // optionally by pointer
{
   free( var.datum );
}

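In C++ you can improve on this approach by using classes to simplify usage, but more importantly you can go for more complex solutions and use existing libraries such as boost::any or boost::variant, which offer different solutions to the same problem.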
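
As a rough illustration of the "use classes to simplify usage" point, here is a minimal sketch that wraps the tag and the datum in one class, so that allocation and cleanup happen in the constructors and the destructor instead of in separate init/fini functions. The class Variable and its members are hypothetical names; a union replaces the void pointer so that integers need no allocation at all, which is a design choice made for this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Variable {
public:
    enum Kind { Integer, String, Null };

    Variable() : kind_(Null) { data_.i = 0; }
    explicit Variable(int v) : kind_(Integer) { data_.i = v; }
    explicit Variable(const std::string& s) : kind_(String) {
        data_.s = new std::string(s);  // only strings live on the heap
    }

    // Deep copy: a String variable gets its own copy of the text.
    Variable(const Variable& other) : kind_(other.kind_) {
        if (kind_ == String) data_.s = new std::string(*other.data_.s);
        else data_.i = other.data_.i;
    }

    // Copy-and-swap: the by-value parameter does the copy, and its
    // destructor releases whatever this object held before.
    Variable& operator=(Variable other) {
        std::swap(kind_, other.kind_);
        std::swap(data_, other.data_);
        return *this;
    }

    ~Variable() {
        if (kind_ == String) delete data_.s;
    }

    Kind kind() const { return kind_; }
    int asInt() const { assert(kind_ == Integer); return data_.i; }
    const std::string& asString() const {
        assert(kind_ == String);
        return *data_.s;
    }

private:
    union Data {
        int i;
        std::string* s;
    };
    Kind kind_;  // says which member of data_ is currently meaningful
    Data data_;
};
```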

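boost::any stores the value in dynamically allocated memory, through a pointer to a virtual class in a hierarchy, and provides operators that reinterpret (downcast) it to the concrete type. boost::variant, as noted at the top, can usually keep the value inside the variant object itself, and the value is reached through a typed accessor or a visitor.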
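
To make the variant idea concrete, here is a minimal sketch that uses std::variant from C++17, the standard-library counterpart of boost::variant, swapped in here only so that the example needs no external library. Nil, Value, ToString, toString, asInt and add are hypothetical names chosen for illustration. Each Value holds exactly one of the listed alternatives, and a visitor dispatches on whichever one is active.

```cpp
#include <cassert>
#include <string>
#include <variant>

struct Nil {};  // hypothetical empty type standing in for the language's nil

// One value of the interpreted language; exactly one alternative is active.
using Value = std::variant<Nil, int, std::string>;

// Visitor that renders whichever alternative is active as text.
struct ToString {
    std::string operator()(Nil) const { return "nil"; }
    std::string operator()(int i) const { return std::to_string(i); }
    std::string operator()(const std::string& s) const { return s; }
};

std::string toString(const Value& v) { return std::visit(ToString{}, v); }

// Typed accessor; the caller must already know that v holds an int.
int asInt(const Value& v) {
    assert(std::holds_alternative<int>(v));
    return std::get<int>(v);
}

// Adds two values if both hold an int; otherwise the result is nil.
Value add(const Value& a, const Value& b) {
    const int* x = std::get_if<int>(&a);
    const int* y = std::get_if<int>(&b);
    if (x != nullptr && y != nullptr) return Value(*x + *y);
    return Value(Nil{});
}
```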