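I am converting one of my C++ projects (a simple DSL) to Rust in order to learn Rust, and I have run into problems with nested structures and ownership. I am having a hard time converting something like this: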
struct FileData {
bool is_utf8;
std::string file_name;
};
class Token {
public:
enum TokenType {
REGULAR,
INCLUDE_FILE,
};
Token() {
_type = REGULAR;
}
TokenType get_type() const { return _type; }
void beginIncludeFile() {
_type = INCLUDE_FILE;
_include_data = std::unique_ptr<FileData>(new FileData);
}
bool is_utf8() const {
assert(get_type() == INCLUDE_FILE);
return _include_data->is_utf8;
}
void set_utf8(bool value) {
assert(get_type() == INCLUDE_FILE);
_include_data->is_utf8 = value;
}
const std::string& get_file_name() const {
assert(get_type() == INCLUDE_FILE);
return _include_data->file_name;
}
void setFileNameToEmpty() {
assert(get_type() == INCLUDE_FILE);
_include_data->file_name = "";
}
void appendToFileName(char c) {
assert(get_type() == INCLUDE_FILE);
_include_data->file_name += c;
}
FileData* releaseFileData() { return _include_data.release(); }
private:
std::unique_ptr<FileData> _include_data;
TokenType _type;
};
The Rust I wrote for this is:
use std::str;
pub struct FileData {
is_utf8 : bool,
file_name : ~str
}
pub fn FileData() -> FileData {
FileData { is_utf8 : true, file_name : ~"" }
}
enum TokenType {
REGULAR,
INCLUDE_FILE
}
pub struct Token {
priv _include_data : Option<~FileData>,
priv _type : TokenType
}
pub fn Token() -> Token {
Token {
_include_data: None,
_type : REGULAR
}
}
impl Token {
pub fn get_type(&self) -> TokenType {
self._type
}
pub fn beginIncludeFile(&mut self) {
self._type = INCLUDE_FILE;
self._include_data = Some(~FileData());
}
pub fn is_utf8(&self) -> bool {
match self._include_data {
Some(ref data) => data.is_utf8,
_ => fail!("No FileData")
}
}
pub fn set_utf8(&mut self, value : bool) {
self._include_data.mutate(|mut data| {
data.is_utf8 = value;
data
});
}
// Return immutable/read-only copy
pub fn get_file_name(&self) -> &~str {
match self._include_data {
Some(ref data) => &data.file_name,
_ => fail!("No FileData")
}
}
pub fn setFileNameToEmpty(&mut self) {
match self._include_data {
Some(ref data) => data.file_name = ~"",
_ => fail!("No FileData")
}
return;
}
pub fn appendToFileName(&mut self, c : char) {
match self._include_data {
Some(ref data) => data.file_name.push_char(c),
_ => fail!("No FileData")
}
return;
}
pub fn getIncludeData(&mut self) -> ~FileData {
match self._include_data {
Some(ref data) => *data,
_ => fail!("No FileData")
}
}
}
enum LexState {
INITIAL,
EXPECT_COLON,
EXPECT_ENCODING,
EXPECT_QUOTE,
IN_FILENAME_STRING,
EXPECT_SEMI
}
impl Eq for LexState {
fn eq(&self, other: &LexState) -> bool {
return (*self as int) == (*other as int);
}
fn ne(&self, other: &LexState) -> bool {
!self.eq(other)
}
}
fn main() {
let mut t = ~Token();
let input = ~"include:utf8 \"file_path/file.foo\";";
let iter = input.iter();
let mut buf : ~str = ~"";
let mut state : LexState = INITIAL;
let buf_action = |action : &fn()| {
buf = ~"";
action();
};
while true {
let c = iter.next();
match c {
None => break,
Some(_c) => buf.push_char(_c)
}
match buf {
// Initial state
~"include" if state == INITIAL => buf_action(|| {
t.beginIncludeFile();
state = EXPECT_COLON;
}),
// Expecting either an encoding, or the start of the file name
~":" if state == EXPECT_COLON => buf_action(|| { state = EXPECT_ENCODING; }),
_ if state == EXPECT_COLON => state = EXPECT_QUOTE, // match WS
// utf8 is the only encoding accepted at the moment
~"utf8" if state == EXPECT_ENCODING => buf_action(|| {
t.set_utf8(true);
state = EXPECT_QUOTE;
}),
_ if state == EXPECT_ENCODING => t.set_utf8(false),
// Looking for string start
~"\"" if state == EXPECT_QUOTE => buf_action(||{ state = IN_FILENAME_STRING; }),
_ if state == EXPECT_QUOTE => (), // ignore other chars
// Reading filename
~"\"" if state == IN_FILENAME_STRING => buf_action(|| {
state = EXPECT_SEMI;
}),
_ if state == IN_FILENAME_STRING => t.appendToFileName(c.unwrap()),
// End of lex
~";" if state == EXPECT_SEMI => break,
_ if state == EXPECT_SEMI => fail!("Expected semi"),
_ => fail!("Unexpected character: " + str::from_char(c.unwrap()))
}
}
return;
}
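What is the Rust way of writing this kind of code?
Rust is quite different from C++, and a straight line-by-line translation will give non-idiomatic code. This is not a complete answer, just a collection of bits and pieces:
When returning information from inside a struct, writing the function as fn foo<'a>(&'a self) -> &'a SomeInformation is the normal way (with strs and []s treated specially), so: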
pub fn get_file_name<'a>(&'a self) -> &'a str {
match self._include_data {
Some(ref data) => &data.file_name,
_ => fail!("No FileData")
}
}
pub fn getIncludeData<'a>(&'a self) -> &'a FileData {
match self._include_data {
Some(ref data) => &*data,
_ => fail!("No FileData")
}
}
The 'a marker is a named lifetime. It ties how long the return value is valid to how long the self object is valid, which means that dangling pointers are impossible (ignoring compiler bugs).
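As a rough illustration of the lifetime point above, here is a minimal sketch in current Rust syntax (not the 2013-era syntax used elsewhere in this thread). It assumes the token stores an Option<FileData> directly; the `include` constructor and the snake_case method names are hypothetical and exist only for this example.

```rust
pub struct FileData {
    pub is_utf8: bool,
    pub file_name: String,
}

pub struct Token {
    include_data: Option<FileData>,
}

impl Token {
    /// Hypothetical constructor, added only so the example can be exercised.
    pub fn include(file_name: &str, is_utf8: bool) -> Token {
        Token {
            include_data: Some(FileData {
                is_utf8,
                file_name: file_name.to_string(),
            }),
        }
    }

    /// The returned slice borrows from `self`, so it cannot outlive the token.
    pub fn get_file_name<'a>(&'a self) -> &'a str {
        match self.include_data {
            Some(ref data) => data.file_name.as_str(),
            None => panic!("No FileData"),
        }
    }

    /// Same idea for the whole struct: a shared borrow tied to `self`.
    pub fn get_include_data<'a>(&'a self) -> &'a FileData {
        match self.include_data {
            Some(ref data) => data,
            None => panic!("No FileData"),
        }
    }
}
```

The explicit 'a is written out here to mirror the explanation; current compilers can also infer it when it is omitted.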
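A collection of things about match:
match is checked for exhaustiveness, so flipping it around (matching on state rather than buf) is more type-safe.
match has a return value, so you can "magically" set the state.
The buf_action function is peculiar (I presume it does more in general?). It could either be changed so that buf_action(foo) is written as clear_buf(); foo, or, at the least, it should return the value of the inner closure, so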
let buf_action = |f| { buf = ~""; f() } // note the lack of semicolon after f
There is special sugar for calling functions whose last argument is a function: do buf_action { some; actions(); here; }. (When the closure has arguments, do f |a,b,c| { x; y; z }.)
state = match state {
// Initial state
INITIAL if "include" == buf => do buf_action {
t.beginIncludeFile();
EXPECT_COLON
},
// Expecting either an encoding, or the start of the file name
EXPECT_COLON => if ":" == buf {
buf_action(|| EXPECT_ENCODING)
} else {
EXPECT_QUOTE
},
// utf8 is the only encoding accepted at the moment
EXPECT_ENCODING => match buf {
~"utf8" => do buf_action { t.set_utf8(true); EXPECT_QUOTE },
_ => { t.set_utf8(false); EXPECT_ENCODING } // this is probably incorrect?
},
// Looking for string start
EXPECT_QUOTE => if "\"" == buf {
buf_action(|| IN_FILENAME_STRING)
} else {
EXPECT_QUOTE // ignore other chars
},
IN_FILENAME_STRING => if "\"" == buf {
buf_action(|| EXPECT_SEMI)
} else {
t.appendToFileName(c.unwrap());
IN_FILENAME_STRING
},
// End of lex
EXPECT_SEMI => if ";" == buf {break} else {fail!("Expected semi")},
_ => fail!("Unexpected character: %c", c)
};
Also, while true should be loop; but really, the loop should be written as:
for input.iter().advance |c| {
buf.push_char(c);
state = match state { ... }
}
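Putting the match and loop suggestions together, the lexer could look roughly like the following in current Rust syntax. This is only a sketch under several assumptions: the lex_include function, the Done state and the Result-based error handling are hypothetical additions, the grammar is simplified to require the `:utf8` part, and single characters are compared via c rather than buf.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum LexState {
    Initial,
    ExpectColon,
    ExpectEncoding,
    ExpectQuote,
    InFilenameString,
    ExpectSemi,
    Done, // hypothetical extra state: the directive was read completely
}

#[derive(Debug, Default, PartialEq)]
struct FileData {
    is_utf8: bool,
    file_name: String,
}

/// Hypothetical helper: lexes one `include:utf8 "<file>";` directive.
fn lex_include(input: &str) -> Result<FileData, String> {
    let mut data = FileData::default();
    let mut buf = String::new();
    let mut state = LexState::Initial;

    for c in input.chars() {
        buf.push(c);
        // `match` is an expression: every arm yields the next state.
        state = match state {
            LexState::Initial if buf == "include" => {
                buf.clear();
                LexState::ExpectColon
            }
            // Still a prefix of the keyword: keep accumulating.
            LexState::Initial if "include".starts_with(buf.as_str()) => LexState::Initial,
            LexState::Initial => return Err(format!("Unexpected character: {}", c)),
            LexState::ExpectColon if c == ':' => {
                buf.clear();
                LexState::ExpectEncoding
            }
            LexState::ExpectColon => return Err(format!("Expected ':', found {}", c)),
            // utf8 is the only encoding accepted in this sketch.
            LexState::ExpectEncoding if buf == "utf8" => {
                data.is_utf8 = true;
                buf.clear();
                LexState::ExpectQuote
            }
            LexState::ExpectEncoding if "utf8".starts_with(buf.as_str()) => {
                LexState::ExpectEncoding
            }
            LexState::ExpectEncoding => return Err(format!("Unsupported encoding: {}", buf)),
            LexState::ExpectQuote => {
                buf.clear(); // other characters are ignored
                if c == '"' { LexState::InFilenameString } else { LexState::ExpectQuote }
            }
            LexState::InFilenameString => {
                buf.clear();
                if c == '"' {
                    LexState::ExpectSemi
                } else {
                    data.file_name.push(c);
                    LexState::InFilenameString
                }
            }
            LexState::ExpectSemi if c == ';' => LexState::Done,
            LexState::ExpectSemi => return Err("Expected semi".to_string()),
            LexState::Done => return Err(format!("Unexpected trailing character: {}", c)),
        };
    }

    if state == LexState::Done {
        Ok(data)
    } else {
        Err("Unexpected end of input".to_string())
    }
}
```

Because the match is on state, the compiler rejects the code if a state is left unhandled, which is the type-safety benefit mentioned above.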
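Minor points:
Option<~FileData> and let mut t = ~Token(); ⇒ Option<FileData> and let mut t = Token();. These allocations are unnecessary.
lowercase_with_underscores seems to be the Rust naming convention.
The Eq impl you have can be created automatically by the compiler with #[deriving(Eq)] enum LexState { ... }. (This is described in more detail in the tutorial and the manual.)
It is idiomatic to avoid allocations where possible, and this would include using slices (s.slice(byte_start, byte_end)) into input rather than pushing characters onto buf; i.e. record the start index of the current token and "clear" the buffer by setting this index to the current index. However, this might be a little tricky to implement.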
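The slicing idea in the last point can be sketched as follows in current Rust syntax, where &input[start..end] plays the role of s.slice(byte_start, byte_end). The quoted_slice function is a hypothetical helper that only extracts the quoted file name; it is not a full replacement for the lexer.

```rust
/// Returns the text between the first pair of double quotes as a slice of
/// `input`, without copying any characters into a separate buffer.
fn quoted_slice(input: &str) -> Option<&str> {
    // Byte index just past the opening quote, once one has been seen.
    let mut start: Option<usize> = None;
    for (i, c) in input.char_indices() {
        if c == '"' {
            match start {
                // Opening quote: the token starts right after it.
                None => start = Some(i + c.len_utf8()),
                // Closing quote: the token is everything in between.
                Some(s) => return Some(&input[s..i]),
            }
        }
    }
    None // no complete quoted string was found
}
```

For example, quoted_slice applied to the include directive from the question borrows the file name directly out of the input string, so the result stays valid only as long as the input does.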