Tree traversal in C++ and Haskell

Yan*_*Zhu 3 recursion haskell lazy-evaluation

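I am new to Haskell. I am trying to understand how Haskell handles recursive function calls together with lazy evaluation. As an experiment, I built a binary search tree in both C++ and Haskell and traversed each one in postorder. The C++ implementation is the standard one with an auxiliary stack. (I simply print each element as I visit it.)

Here is my Haskell code: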

module Main (main) where

import Prelude hiding (traverse)  -- the local traverse below clashes with Prelude's on GHC >= 7.10
import System.Environment (getArgs)
import System.IO
import System.Exit
import Control.Monad(when)
import qualified Data.ByteString as S

main = do
     args <- getArgs
     when (length args < 1) $ do
          putStrLn "Missing input files"
          exitFailure

     content <- readFile (args !! 0)
     --preorderV print $ buildTree content
     mapM_ print $ traverse POST $ buildTree content
     putStrLn "end"


data BSTree a = EmptyTree | Node a (BSTree a) (BSTree a) deriving (Show)
data Mode = IN | POST | PRE

singleNode :: a -> BSTree a
singleNode x = Node x EmptyTree EmptyTree

bstInsert :: (Ord a) => a -> BSTree a -> BSTree a
bstInsert x EmptyTree = singleNode x
bstInsert x (Node a left right)
          | x == a = Node a left right
          | x < a  = Node a (bstInsert x left) right
          | x > a  = Node a left (bstInsert x right)

buildTree :: String -> BSTree String
buildTree = foldr bstInsert EmptyTree . words

preorder :: BSTree a -> [a]
preorder EmptyTree = []
preorder (Node x left right) = [x] ++ preorder left ++ preorder right

inorder :: BSTree a -> [a]
inorder EmptyTree = []
inorder (Node x left right) = inorder left ++ [x] ++ inorder right

postorder :: BSTree a -> [a]
postorder EmptyTree = []
postorder (Node x left right) = postorder left ++  postorder right ++[x]

traverse :: Mode -> BSTree a -> [a]
traverse x tree = case x of IN   -> inorder tree
                            POST -> postorder tree
                            PRE  -> preorder tree


preorderV :: (a->IO ()) -> BSTree a -> IO ()
preorderV f EmptyTree = return ()
preorderV f (Node x left right) = do 
                                     f x
                                     preorderV f left
                                     preorderV f right

My test results show that C++ significantly outperforms Haskell:

C++ performance (note that first15000.txt is about five times the size of first3000.txt):

time ./speedTestForTraversal first3000.txt > /dev/null 

real    0m0.158s
user    0m0.156s
sys     0m0.000s
time ./speedTestForTraversal first15000.txt > /dev/null 

real    0m0.923s
user    0m0.916s
sys     0m0.004s

Haskell with the same input files:

time ./speedTestTreeTraversal first3000.txt > /dev/null 

real    0m0.500s
user    0m0.488s
sys     0m0.008s
time ./speedTestTreeTraversal first15000.txt > /dev/null 

real    0m3.511s
user    0m3.436s
sys     0m0.072s

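I expected Haskell not to be too far behind C++. Did I make a mistake somewhere? Is there a way to improve my Haskell code?

Thanks

Edit: 18 Oct 2014

After several tests, the Haskell traversal is still significantly slower than the C++ implementation. I want to give Cirdec's answer full credit, because he pointed out the inefficiency in my Haskell implementation. However, my original question was a comparison of the C++ and Haskell implementations, so I would like to keep this question open and post my C++ code to encourage further discussion.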

#include <iostream>
#include <string>
#include <boost/algorithm/string.hpp>
#include <fstream>
#include <stack>
using namespace std;
using boost::algorithm::trim;
using boost::algorithm::split;


template<typename T>
class Node
{
public:
    Node(): val(), l(NULL), r(NULL), p(NULL) {}  // val() value-initializes T; val(0) would build a std::string from a null pointer
    Node(const T &v): val(v), l(NULL), r(NULL), p(NULL) {}
    Node* getLeft() {return l;}
    Node* getRight(){return r;}
    Node* getParent() {return p;}
    void  setLeft(Node *n) {l = n;}
    void  setRight(Node *n) {r = n;}
    void  setParent(Node *n) {p = n;}
    T  &getVal() {return val;}
    Node* getSucc() {return NULL;}
    Node* getPred() {return NULL;}
private:
    T val;
    Node *l;
    Node *r;
    Node *p;
};

template<typename T>
void destoryOne(Node<T>* n)
{
    delete n;
    n = NULL;
}

template<typename T>
void printOne(Node<T>* n)
{
    if (n!=NULL)
    std::cout << n->getVal() << std::endl;
}




template<typename T>
class BinarySearchTree
{
public:
    typedef void (*Visit)(Node<T> *);

    BinarySearchTree(): root(NULL) {}
    void delNode(const T &val){};
    void insertNode(const T &val){
    if (root==NULL)
        root = new Node<T>(val);
    else {
        Node<T> *ptr = root;
        Node<T> *ancester = NULL;
        while(ptr && ptr->getVal()!=val) {
        ancester = ptr;
        ptr = (val < ptr->getVal()) ? ptr->getLeft() : ptr->getRight(); 
        }
        if (ptr==NULL) {
        Node<T> *n = new Node<T>(val);
        if (val < ancester->getVal())
            ancester->setLeft(n);
        else
            ancester->setRight(n);
        } // else the node exists already so ignore!
    }
    }
    ~BinarySearchTree() {
    destoryTree(root);
    }
    void destoryTree(Node<T>* rootN) {
    iterativePostorder(&destoryOne);
    }

    void iterativePostorder(Visit fn) {
    std::stack<Node<T>* > internalStack;
    Node<T> *p = root;
    Node<T> *q = root;
    while(p) {
        while (p->getLeft()) {
        internalStack.push(p);
        p = p->getLeft();
        }
        while (p && (p->getRight()==NULL || p->getRight()==q)) {
        fn(p);
        q = p;
        if (internalStack.empty())
            return;
        else {
            p = internalStack.top();
            internalStack.pop();
        }
        }
        internalStack.push(p);
        p = p->getRight();
    }
    }


    Node<T> * getRoot(){ return root;}
private:
    Node<T> *root;
};



int main(int argc, char *argv[])
{
    BinarySearchTree<string> bst;
    if (argc<2) {
    cout << "Missing input file" << endl;
    return 0;
    }
    ifstream inputFile(argv[1]);
    if (inputFile.fail()) {
    cout << "Fail to open file " << argv[1] << endl;
    return 0;
    }
    string word;
    while (inputFile >> word) {  // loop on the read itself, so a failed read ends the loop
    trim(word);
    if (!word.empty()) {
        bst.insertNode(word);
    }
    }

    bst.iterativePostorder(&printOne);

    return 0;
}

Edit: 20 Oct 2014. Chris's answer below is very thorough, and I can reproduce the results.

Cir*_*dec 11

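List concatenation with ++ is slow: every time ++ occurs, its first argument has to be traversed to the end to find where to attach the second argument. You can see how the first argument is traversed in the definition of ++ from the standard Prelude: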

(++) :: [a] -> [a] -> [a]
[]     ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)

When ++ is used recursively, this traversal has to be repeated at every level of the recursion, which is inefficient.
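To make the cost concrete, here is a rough sketch, assuming a degenerate tree in which every node has only a left child. In that case postorder reduces to a chain of left-nested appends, each of which walks the whole list built so far. The names leftNested and appendSteps are made up for this illustration, and appendSteps is only a cost model, not a measurement.

```haskell
-- Left-nested appends, the shape postorder produces on a left-leaning tree:
-- ((([] ++ [1]) ++ [2]) ++ [3]) ++ ...
leftNested :: Int -> [Int]
leftNested n = foldl (\acc x -> acc ++ [x]) [] [1 .. n]

-- Cost model (an assumption of this sketch): acc ++ [x] rebuilds every
-- cons cell of acc, so the k-th append costs k - 1 steps.
-- The total is 0 + 1 + ... + (n - 1), which grows quadratically in n.
appendSteps :: Int -> Int
appendSteps n = sum [0 .. n - 1]
```

Under this model the work grows roughly as n²/2, so a five-times larger input costs about 25 times as much in the worst case. Real inputs produce bushier trees, so the actual slowdown is smaller.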

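There is another way to build a list: if you know what will come at the end of the list before you start building it, you can build it with the end already in place. Let's look at the definition of postorder: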

postorder :: BSTree a -> [a]
postorder EmptyTree = []
postorder (Node x left right) = postorder left ++ postorder right ++ [x]

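When we compute postorder left, we already know what will come after it: postorder right ++ [x]. So it makes sense to build the list for the left side of the tree with the right side and the node's value already in place. Similarly, when we compute postorder right, we already know what should come after it, namely x. We can do exactly this with a helper function that takes the already accumulated rest of the list as an extra argument: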

postorder :: BSTree a -> [a]
postorder tree = go tree []
    where
        go EmptyTree rest = rest
        go (Node x left right) rest = go left (go right (x:rest))

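With a 15k-word file as input, this is about twice as fast on my machine. Let's explore this a bit more and see whether we can gain a deeper understanding. If we rewrite our postorder definition using function composition (.) and application ($) instead of nested parentheses, we get: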

postorder :: BSTree a -> [a]
postorder tree = go tree []
    where
        go EmptyTree rest = rest
        go (Node x left right) rest = go left . go right . (x:) $ rest

We can even drop the rest argument and the function application $, and write this in a slightly more point-free style:

postorder :: BSTree a -> [a]
postorder tree = go tree []
    where
        go EmptyTree = id
        go (Node x left right) = go left . go right . (x:)

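Now we can see what we have done. We have replaced a list [a] with a function [a] -> [a] that prepends that list to an existing list. The empty list is replaced with the function that adds nothing to the beginning of a list, which is the identity function id. The singleton list [x] is replaced with the function that adds x to the beginning of a list, (x:). List concatenation a ++ b is replaced with function composition f . g: first add what g adds to the beginning of the list, then add what f adds to the beginning of that list.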
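The same idea can be packaged as a small type and reused for all three traversal orders from the question. The following is only a minimal sketch: the names DList, emptyD, singletonD, appendD, toListD and the *D traversal functions are made up for this illustration, and BSTree is repeated so that the block stands on its own.

```haskell
-- A list represented as "the function that prepends it to another list".
newtype DList a = DList ([a] -> [a])

emptyD :: DList a
emptyD = DList id                 -- plays the role of []

singletonD :: a -> DList a
singletonD x = DList (x :)        -- plays the role of [x]

appendD :: DList a -> DList a -> DList a
appendD (DList f) (DList g) = DList (f . g)   -- plays the role of ++

toListD :: DList a -> [a]
toListD (DList f) = f []          -- prepend to the empty list to get a real list

data BSTree a = EmptyTree | Node a (BSTree a) (BSTree a)

postorderD :: BSTree a -> [a]
postorderD = toListD . go
  where
    go EmptyTree    = emptyD
    go (Node x l r) = go l `appendD` go r `appendD` singletonD x

inorderD :: BSTree a -> [a]
inorderD = toListD . go
  where
    go EmptyTree    = emptyD
    go (Node x l r) = go l `appendD` singletonD x `appendD` go r

preorderD :: BSTree a -> [a]
preorderD = toListD . go
  where
    go EmptyTree    = emptyD
    go (Node x l r) = singletonD x `appendD` go l `appendD` go r
```

Each traversal reads like the original ++ version, but appendD is just function composition, so no intermediate list is walked again when two pieces are joined.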

  • FYI, the approach proposed by @Cirdec is called a "difference list"; it is quite standard in functional and logic programming. (3 upvotes)