I'm new to F#. The following code reads a CSV file, keeps every line whose rating is greater than 9.0, and writes those lines to a new file.
But I'm having a hard time figuring out how many for loops it takes to get the job done. Are all the steps done in a single loop, or in the 5 loops I've marked below?
If it needs 5 loops, the whole file should have been loaded into memory by the time the 2nd loop completes, yet the process only uses 14 MB of memory the whole time, which is less than the size of the CSV file.
I know StreamReader doesn't load the whole file into memory at once, but how does the following code actually execute?
Thanks in advance.
open System.IO

let ratings = @"D:\Download\IMDB\csv\title.ratings.csv"
let rating9 = @"D:\Download\IMDB\csv\rating9.csv"

let readCsv reader =
    Seq.unfold (fun (r:StreamReader) -> // 1st for loop
        match r.EndOfStream with
        | true -> None
        | false -> Some (r.ReadLine(), r)) reader

let toTuple = fun (s:string) ->
    let ary = s.Split(',')
    (string ary.[0], float ary.[1], int ary.[2])

using (new StreamReader(ratings)) (fun sr ->
    use sw = new StreamWriter(rating9)
    readCsv sr
    |> Seq.map toTuple // 2nd for loop
    |> Seq.filter (fun (_, r, _) -> r > 9.0) // 3rd for loop
    |> Seq.sortBy (fun (_, r, _) -> r) // 4th for loop
    |> Seq.iter (fun (t, r, s) -> // 5th for loop
        sw.WriteLine(sprintf "%s,%.1f,%i" t r s)))
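The missing piece in your understanding is that F#'s Seq is lazy. It will not do more work than necessary, and in particular it will not consume a sequence until it absolutely has to. Seq.map and Seq.filter do not behave like for loops; instead, they act like a transformation pipeline, each one stacking a new transformation on top of the existing ones. The first part of your code that actually runs through the entire sequence is Seq.sortBy (sorting a sequence requires knowing all of its values, so Seq.sortBy must consume the whole sequence to do its job). By that point the Seq.filter step has already happened, so many of the lines of the CSV file have been thrown away, which is why the program uses less memory than the total size of the original file.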
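The stacking described above can be sketched with hand-written stand-ins. myMap, myFilter and trace are hypothetical names introduced only for this illustration; they are not how the library implements Seq.map and Seq.filter, just a minimal model of the same lazy behaviour:

```fsharp
// Hypothetical re-implementations, for illustration only.
// Each one wraps the source in a new lazy sequence instead of looping over it.
let myMap f (source: seq<'a>) : seq<'b> =
    seq { for x in source do yield f x }

let myFilter pred (source: seq<'a>) : seq<'a> =
    seq { for x in source do if pred x then yield x }

// Records the order in which the stages actually run.
let trace = System.Collections.Generic.List<string>()

let pipeline =
    seq { 1 .. 4 }
    |> myMap (fun i -> trace.Add(sprintf "map %d" i); i * 10)
    |> myFilter (fun i -> trace.Add(sprintf "filter %d" i); i > 20)

// Building the pipeline runs nothing: the transformations are only stacked.
let eventsBeforeConsuming = trace.Count

// Consuming the sequence pulls one element at a time through every stage.
let results = pipeline |> Seq.toList
let events = List.ofSeq trace
```

After this runs, events alternates between "map" and "filter" entries for each element rather than listing all the map calls first, which is the sense in which the stages are not separate loops.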
Here is a practical demonstration of Seq's laziness, typed into the F# Interactive prompt. Look at this:
> let s = seq {1..20} ;;
val s : seq<int>
> let t = s |> Seq.map (fun i -> printfn "Starting with %d" i; i) ;;
val t : seq<int>
> let u = t |> Seq.map (fun i -> i*2) ;;
val u : seq<int>
> let v = u |> Seq.map (fun i -> i - 1) ;;
val v : seq<int>
> let w = v |> Seq.filter (fun i -> i > 10) ;;
val w : seq<int>
> let x = w |> Seq.sortBy id ;;
val x : seq<int>
> let y = x |> Seq.iter (fun i -> printfn "Result: %d" i) ;;
Starting with 1
Starting with 2
Starting with 3
Starting with 4
Starting with 5
Starting with 6
Starting with 7
Starting with 8
Starting with 9
Starting with 10
Starting with 11
Starting with 12
Starting with 13
Starting with 14
Starting with 15
Starting with 16
Starting with 17
Starting with 18
Starting with 19
Starting with 20
Result: 11
Result: 13
Result: 15
Result: 17
Result: 19
Result: 21
Result: 23
Result: 25
Result: 27
Result: 29
Result: 31
Result: 33
Result: 35
Result: 37
Result: 39
val y : unit = ()
> let z = w |> Seq.iter (fun i -> printfn "Result: %d" i) ;;
Starting with 1
Starting with 2
Starting with 3
Starting with 4
Starting with 5
Starting with 6
Result: 11
Starting with 7
Result: 13
Starting with 8
Result: 15
Starting with 9
Result: 17
Starting with 10
Result: 19
Starting with 11
Result: 21
Starting with 12
Result: 23
Starting with 13
Result: 25
Starting with 14
Result: 27
Starting with 15
Result: 29
Starting with 16
Result: 31
Starting with 17
Result: 33
Starting with 18
Result: 35
Starting with 19
Result: 37
Starting with 20
Result: 39
val z : unit = ()
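Notice that even though Seq.sortBy needs to consume the entire sequence to do its work, nothing was requesting any part of the Seq at the time I created sequence x, so it didn't actually start running through the values. Only y and z, which use Seq.iter, actually triggered a run through all the values. (You can also see that with y, the sortBy step had to run in its entirety before the iter step could run, whereas with z, where there is no sortBy step, each value passed through the transformation pipeline one at a time, and the next value only started being processed once the previous one had been completely handled.)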
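Applied to the code in the question, this means the pipeline behaves roughly like two loops rather than five. The following is only a sketch of that equivalence, not what the Seq functions literally do internally. It assumes a small made-up sample held in a StringReader instead of the real file, and collectHighRated and writeSorted are hypothetical helper names:

```fsharp
open System.IO

// Made-up rows standing in for title.ratings.csv (assumed to have no header line).
let sample = "tt1,9.5,100\ntt2,7.0,50\ntt3,9.1,10\ntt4,9.3,7"

let toTuple (s: string) =
    let ary = s.Split(',')
    (ary.[0], float ary.[1], int ary.[2])

// Loop 1: read, map and filter happen together, one line at a time.
// Only rows that pass the filter are kept in memory for the sort.
let collectHighRated (reader: TextReader) =
    let kept = ResizeArray<string * float * int>()
    let mutable line = reader.ReadLine()
    while not (isNull line) do
        let row = toTuple line
        let (_, rating, _) = row
        if rating > 9.0 then kept.Add row
        line <- reader.ReadLine()
    kept

// Loop 2: sort the surviving rows, then write them out in order.
let writeSorted (writer: TextWriter) (rows: seq<string * float * int>) =
    for (t, r, v) in rows |> Seq.sortBy (fun (_, r, _) -> r) do
        writer.WriteLine(sprintf "%s,%.1f,%i" t r v)

let output = new StringWriter()
let kept = collectHighRated (new StringReader(sample))
writeSorted output kept

let lines =
    output.ToString().Split([| '\n'; '\r' |], System.StringSplitOptions.RemoveEmptyEntries)
```

Here kept holds only the three rows rated above 9.0, so the memory needed for the sort scales with the number of matching rows and not with the size of the input.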