dac*_*ave 1 ubuntu parsing haskell list
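I have a Haskell program that reads the contents of an input file and parses it in order to sort the records and remove duplicates. The program has been dormant for a while and I need to revive it. I mention this only as historical background for the question.
When I re-enabled the program, I found that it no longer works correctly. My debugging has isolated the problem to the code that parses and "cleans" the input file. What happens after that is irrelevant to this question, because I end up with an empty list of candidate records from the input file.
I write and test the program on my Windows laptop, then deploy and build the source on the Ubuntu servers where it needs to run. As part of my debugging, I broke the text parsing into several discrete steps. Running catMaybes on the output of the last step is where I get the empty list, but only when I run it on the Ubuntu servers.
Here is the source code from main that demonstrates the problem: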
    main = do
        [ inFileName ] <- getArgs
        sFile <- readFile inFileName
        let lrec = lines sFile
        putStrLn $ "Number of lines read from the file: " ++ show (length lrec)
        let prec = map processLine lrec
        putStrLn $ "Number of processed lines is " ++ show (length prec)
        -- let persons = mapMaybe processLine lrec
        let persons = catMaybes prec
        putStrLn $ "Number of filtered person records: " ++ show (length persons)
        let records = sortBy (compare `on` personEmployeeID) persons
        putStrLn $ "Number of records read and sorted is " ++ show (length records)
        {-
            Compare and warn about employees with duplicate records.
        -}
        let srec = groupBy ((==) `on` personEmployeeID) records
        putStrLn $ "Number of unique record groups is " ++ show (length srec)
        let dups = map (personEmployeeID . head) $ filter ((> 1) . length) srec
        putStrLn $ "Number of dups: " ++ show (length dups)
        unless (null dups) $ putStrLn $ "WARNING: Duplicate employees: " ++ show dups
        -- Remove the duplicates
        let cleanedRecords = map head srec
        putStrLn $ "Number of records in cleanedRecords is " ++ show (length cleanedRecords)
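As you may notice from the commented-out line, I tried using mapMaybe in place of catMaybes, with no change in the results. Here is the code of the processLine function, with a comment showing the format of the input records: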
    {-
        Splits a line of the input file into fields. The format includes 11 columns,
        separated by semicolons. The 10th column is required to be 'A' or 'S',
        indicating the user is active or short-term; otherwise we ignore that line.
        Sample Line:
        ------------------------------------------------------------------------------------------------------------------------------------------------
        99XXXXX17;MXXX ;TXXXXX ;MIXXXXXX ;RAA CBP;RAA;19910929;19910929;19910929;A; ;
        ------------------------------------------------------------------------------------------------------------------------------------------------
        emp id ;first name ;middle name ;last name ;loc code;dpt;hiredate;servdate;statdate;s;note ;
        ------------------------------------------------------------------------------------------------------------------------------------------------
        * s = status
    -}
    processLine :: String -> Maybe Person
    processLine line =
        let (_ :: String, _ :: String, _ :: String, result) =
                line =~ "^(.+);(.+);(.+);(.+);(.+);(.+);(.+);(.+);(.+);(A|S);(.+);$"
        in case result of
            [empid, fname, mname, lname, lcode, dept, hdate, srvdate, stdate, status, note]
                -> Just $ Person empid (trim fname) (trim mname) (trim lname)
                       (trim lcode) dept hdate srvdate stdate (readStatus status) (trim note)
            _ -> Nothing
When I run this code on my Windows laptop, it produces the following output:
Number of lines read from the file: 47793
Number of processed lines is 47793
Number of filtered person records: 32993
Number of records read and sorted is 32993
Number of unique record groups is 32949
Number of dups: 44
WARNING: Duplicate employees: [ {List removed for privacy } ]
Number of records in cleanedRecords is 32949
C:>cabal --version
cabal-install version 1.22.4.0
using version 1.22.3.0 of the Cabal library
C:>ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.3
When I run the same code against the same input file on two different Ubuntu servers, each with a different version of Ubuntu and of Haskell, I get the following output:
Number of lines read from the file: 47793
Number of processed lines is 47793
Number of filtered person records: 0
Number of records read and sorted is 0
Number of unique record groups is 0
Number of dups: 0
Number of records in cleanedRecords is 0
xx:~/$ cabal --version
cabal-install version 0.14.0
using version 1.14.0 of the Cabal library
xx:~/$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.4.1
...and from the other Ubuntu server:
Number of lines read from the file: 47793
Number of processed lines is 47793
Number of filtered person records: 0
Number of records read and sorted is 0
Number of unique record groups is 0
Number of dups: 0
Number of records in cleanedRecords is 0
yy:~/$ cabal --version
cabal-install version 0.10.2
using version 1.10.2.0 of the Cabal library
yy:~/$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.1
As usual, I am baffled. I am ready to try anything.
Any ideas?
Dave
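And the answer is...
Windows versus Unix line endings.
I added code to print the first few lines of the input and saw \r\n at the end of each line. I ran the file through dos2unix. Now I get the same results on the Ubuntu systems as on Windows.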
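The kind of check described above can be sketched roughly as follows. This is not the code that was actually used; previewLines and hasTrailingCR are hypothetical names introduced here for illustration. The idea is that show escapes control characters, so a carriage return left behind by lines becomes visible as \r. A stray \r after the final semicolon would also stop the ";$" at the end of the regex in processLine from matching. As far as I know, GHC's text-mode handles on Windows translate \r\n to \n on input by default, which would explain why the same file parsed correctly there.

```haskell
-- Hypothetical debugging helpers; not part of the original program.

-- Render the first n lines with 'show', which escapes control
-- characters, so a trailing carriage return appears as \r.
previewLines :: Int -> String -> [String]
previewLines n = map show . take n . lines

-- True when a line still carries a Windows-style carriage return.
hasTrailingCR :: String -> Bool
hasTrailingCR s = not (null s) && last s == '\r'
```

In main, something like mapM_ putStrLn (previewLines 3 sFile) would print the escaped lines, and length (filter hasTrailingCR lrec) would count the affected lines.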
Thanks for pointing me to the input file as the source of the problem.
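As an alternative to converting the file with dos2unix, the program itself could be made tolerant of both line-ending styles. The following is only a minimal sketch under that assumption; stripCR and linesAnyEOL are hypothetical helpers, not part of the original code.

```haskell
-- Hypothetical helpers; not part of the original program.

-- Drop a single trailing carriage return, if there is one.
stripCR :: String -> String
stripCR s
  | not (null s) && last s == '\r' = init s
  | otherwise                      = s

-- Split text into lines, accepting both LF and CRLF line endings.
linesAnyEOL :: String -> [String]
linesAnyEOL = map stripCR . lines
```

With this in place, let lrec = linesAnyEOL sFile would replace let lrec = lines sFile in main. Another option, if the file is opened through an explicit handle rather than readFile, is hSetNewlineMode with universalNewlineMode from System.IO, which translates \r\n to \n on input.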