Read until pipe is closed

Question

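I'm currently doing an assignment for Intro to Operating Systems, and I'm having a lot of fun with it, but I'm also quite confused. Right now I'm working on pipes; my code is below.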

Initially, my code looked like this:

// Child process - write
if (fork() == 0) {
    fprintf(stderr, "Child\r\n");
    close(1);
    dup(p[1]);
    close(p[0]);
    close(p[1]);
    runcmd(pcmd->left);
// Parent process - read
} else {
    wait(0);
    close(0);
    dup(p[0]);
    close(p[0]);
    close(p[1]);
    fprintf(stderr, "Parent\r\n");
    runcmd(pcmd->right);
}

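My thinking was that the parent would wait until the child had terminated, then read from the pipe, and that would be that. I posted this code for my instructor on our discussion page, and he told me the code had several problems, one of which was: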

If the child process runs on input long enough to block the pipe, the parent process may hang indefinitely.

He mentioned that the correct implementation (with regard to wc) would therefore be to use a blocking read command, which would wait on the pipe until data was available and then keep reading until the pipe was closed.

I tried looking around for some way to "read" from the pipe the moment it had data in it, but was unsure how to go about it. In the end, in an effort to solve the problem of possibly waiting forever on a blocked pipe, I had the parent and child run simultaneously, in parallel. But that may mean that the reading process terminates first and doesn't read all the data before the writer has finished. How would I go about addressing this?

int p[2];
pipe(p);
// Child process - read
if (fork() == 0) {
    fprintf(stderr, "Start child\r\n");
    close(0);
    dup(p[0]);
    close(p[0]);
    close(p[1]);
    fprintf(stderr, "Child\r\n");
    runcmd(pcmd->right);
// Parent process - write
} else {
    fprintf(stderr, "Start parent\r\n");
    close(1);
    dup(p[1]);
    close(p[0]);
    close(p[1]);
    fprintf(stderr, "Parent\r\n");
    runcmd(pcmd->left);
}

Edit: I also tried the read command, but was unsure how to actually use it, since it requires a buffer and also the expected size to read (?). I'm not sure how to determine either of those when you don't know the size of the incoming data.

Answer 1

G-M*_*ca' 12

Piping is simple. You’re making it hard on yourself by jumping into the pool at the deep end. (Or perhaps it’s your instructor’s fault for not guiding you better.)

To become more comfortable with pipes, I suggest that you write two trivially simple programs:

One (call it prog1) that just writes some text to the standard output and exits. It can be something simple: “The quick brown fox jumps over the lazy dog.”, “Lorem ipsum dolor sit amet, consectetur adipiscing elit, …”, a short string (maybe even a single character) repeated many times, whatever you want. Use printf, write, fprintf(stdout, …), or whatever other function(s) you like.

To test this program, just run it from a shell prompt. It should display the chosen text and exit (return you to your shell prompt).

And one (call it prog2) that just reads text from the standard input and writes it to standard output. Use getc, fgets, read, or whatever other function(s) you like. Exit when you get end-of-file. Check the man page for whatever function you use to see how it indicates end-of-file.

To test this program, create a text file (called something like jon_file.txt) and put some text into it. You can do this quickly by saying something like echo "Hello world" > jon_file.txt, or you can use an editor. Then type prog2 < jon_file.txt. It should display the contents of the file and exit (return you to your shell prompt).
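A minimal sketch of prog2 built on the read system call, assuming a POSIX system. It also bears on the question's edit about read needing a buffer and a size: you don't need to know how much data is coming. You pick any buffer size you like, and read returns how many bytes it actually got, 0 at end-of-file (once every writer has closed the pipe), or -1 on error. The function name copy_until_eof is made up for this illustration.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd until end-of-file.
 * Returns the total number of bytes copied, or -1 on error. */
long copy_until_eof(int in_fd, int out_fd)
{
    char buf[4096];   /* any size works; read() reports how much it actually got */
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)                 /* end-of-file: no writer holds the pipe open */
            return total;
        if (n < 0) {
            if (errno == EINTR)     /* interrupted by a signal: just retry */
                continue;
            return -1;
        }
        /* write() may write less than asked, so loop until the chunk is out */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

A main for prog2 could then be as small as `int main(void) { return copy_until_eof(STDIN_FILENO, STDOUT_FILENO) < 0; }`. If the pipe is momentarily empty, read simply blocks; it only returns 0 when no more data can ever arrive.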

Don’t call pipe, dup, or anything fancy — not even open or close. (Do include whatever debugging and/or auditing code you want to ensure that you understand what is happening when.) And then run prog1 | prog2. If you’ve done it correctly, you’ll get the output you expect.

Now try to “break” it by adding sleep calls to the programs. If you break it, let me know how you did it. It should be almost impossible — unless you make one program (or both) sleep for longer than you’re willing to sit and wait, you’ll always get prog2 to output all the data that prog1 writes.

And in case the above example doesn’t make it clear: having the parent and child (or, in general, the processes on both sides of a pipe) run “simultaneously” is the right thing to do.¹ The reading program won’t “terminate first” just because there is no data in the pipe currently. As you should have learned from the above exercise, if a program tries to read from a pipe that has no data in it currently, the read system call will force the program to wait until data arrive. The reading program won’t terminate until there are no data left in the pipe and no more coming, ever.² (At this point, read will return an end-of-file.) The “no more data coming ever” condition is indicated by the writing program closing the pipe (or exiting, which is equivalent, because exit calls close on all open file descriptors).
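Here is a rough sketch, assuming a POSIX system, of what “run simultaneously” can look like at the level of the runcmd code in the question: fork one child for each side of the pipe, have the parent close both pipe ends, and only then wait for the children. left_side and right_side are hypothetical stand-ins for runcmd(pcmd->left) and runcmd(pcmd->right). The reader counts bytes until end-of-file and exits with status 0 only if it saw everything the writer sent, so a call like run_pipeline(200000) pushes well past a typical pipe buffer without hanging.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for runcmd(pcmd->left): write nbytes to stdout. */
static void left_side(size_t nbytes)
{
    char buf[1024];
    memset(buf, 'x', sizeof buf);
    while (nbytes > 0) {
        size_t chunk = nbytes < sizeof buf ? nbytes : sizeof buf;
        ssize_t w = write(STDOUT_FILENO, buf, chunk);  /* blocks while the pipe is full */
        if (w < 0)
            _exit(1);
        nbytes -= (size_t)w;
    }
    _exit(0);  /* exiting closes the write end, so the reader will see EOF */
}

/* Hypothetical stand-in for runcmd(pcmd->right): count stdin bytes until EOF. */
static void right_side(size_t expected)
{
    char buf[4096];
    size_t total = 0;
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
        total += (size_t)n;
    _exit(n == 0 && total == expected ? 0 : 1);
}

/* Run left | right with both sides alive at once.
 * Returns 0 if both children succeeded, 1 if either failed, -1 on setup error. */
int run_pipeline(size_t nbytes)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t left = fork();
    if (left < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (left == 0) {              /* child 1: the writer */
        dup2(p[1], STDOUT_FILENO);
        close(p[0]);
        close(p[1]);
        left_side(nbytes);
    }

    pid_t right = fork();
    if (right < 0) {
        close(p[0]);
        close(p[1]);
        waitpid(left, NULL, 0);   /* writer dies of SIGPIPE once no reader remains */
        return -1;
    }
    if (right == 0) {             /* child 2: the reader */
        dup2(p[0], STDIN_FILENO);
        close(p[0]);
        close(p[1]);
        right_side(nbytes);
    }

    /* The parent must close BOTH ends: if it kept p[1] open, the reader
     * would never see EOF, because a writer would still exist. */
    close(p[0]);
    close(p[1]);

    /* Wait only now, after both sides are already running. */
    int st1, st2;
    waitpid(left, &st1, 0);
    waitpid(right, &st2, 0);
    return (WIFEXITED(st1) && WEXITSTATUS(st1) == 0 &&
            WIFEXITED(st2) && WEXITSTATUS(st2) == 0) ? 0 : 1;
}
```

The design point is the ordering: both children exist before anyone waits, so the reader is always there to drain the pipe. The closes in the parent matter just as much, since a reader never gets end-of-file while any process, the parent included, still holds the write end open.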

I don’t understand why you’re sweating the read system call at this point — although, if you don’t know how to use it yet, that confirms my suspicion that your instructor is presenting material out of logical order. (I assume that you mean the read system call and not the read command.) The only way your program makes sense is if runcmd(pcmd->right) is something that reads from standard input by some method (like our prog2 program, above). It looks like your program is just doing the function of the shell — setting up the pipes, and then letting the programs run. At that level, there’s no reason for your program (to the extent that you have shown it to us) to do any I/O (reading or writing).
__________
¹ Related reading: In what order do piped commands run?
² Of course this is an oversimplification. As you will learn soon, if you haven’t already, you can design the reading program to terminate when there is no data in the pipe currently — but that’s not the default behavior. Or you can design the reading program to terminate under any number of other conditions — e.g., if it reads a q from the pipe. Or it could be killed by a signal, etc…

I’m looking back at this answer six months later, and I see that I really didn’t address the entire question; I covered the second half, but not the first. So, continuing from the above,

Modify the first program to write a lot of data — at least 100,000 (10⁵) or 102400 (2¹⁰×10²) characters — to stdout. Also, if you haven’t already done this, modify it to write some on-going status information to stderr. This can be something very simple; e.g., one “.” to stderr for every 1000 (or 1024) characters to stdout, and “!\n” to stderr when it’s done.

To test this, run prog1 > /dev/null. If you followed my suggestion (above), you should see 100 dots (.), followed by ! and a newline. If you don’t have any calls to sleep() or other time-consuming functions in prog1, this output should come fairly quickly.

Then run prog1 | wc -c. It should display your stderr status information, as mentioned above, followed by 100000 or 102400 or however many bytes you wrote to stdout. (This will be the output from wc -c, reporting how many bytes it read from its stdin (the pipe).)

Modify the second program to sleep 10 or 20 seconds before it starts reading.

To test this, run prog2 < jon_file.txt again. Obviously it should pause for the amount of time you specified in your sleep(), and then display the contents of the file and exit (return you to your shell prompt).
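A sketch of the modified prog1 from the first step, assuming a POSIX system and written as a function so it can be pointed at any pair of file descriptors. The names write_with_progress and write_all and the 1024-byte chunk size are assumptions for illustration; with 102400 bytes it prints 100 dots and then “!” to stderr.

```c
#include <string.h>
#include <unistd.h>

/* Write all of buf to fd, retrying after short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0)
            return -1;
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Hypothetical prog1: write nbytes of 'x' to out_fd, one '.' to err_fd
 * per chunk of up to 1024 bytes, and "!\n" to err_fd at the end. */
int write_with_progress(int out_fd, int err_fd, size_t nbytes)
{
    char chunk[1024];
    memset(chunk, 'x', sizeof chunk);

    while (nbytes > 0) {
        size_t n = nbytes < sizeof chunk ? nbytes : sizeof chunk;
        if (write_all(out_fd, chunk, n) < 0)  /* stalls here while the pipe is full */
            return -1;
        nbytes -= n;
        if (write_all(err_fd, ".", 1) < 0)    /* status goes to stderr, not the pipe */
            return -1;
    }
    return write_all(err_fd, "!\n", 2);
}
```

A main such as `int main(void) { return write_with_progress(STDOUT_FILENO, STDERR_FILENO, 102400) != 0; }` gives you prog1; for the sleeping prog2, call sleep(20) (declared in <unistd.h>) before its read loop. Because the dots go to stderr, they stay visible on the terminal even when stdout is piped, which is what makes the stall observable.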

Now run prog1 | prog2 > /dev/null. But, before you do that, you might want to try to guess what will happen.

I expect that it will print some dots — maybe 8, maybe 64 or 65, maybe some other number — and then the pause, and then the rest of the dots, and the !. This is because prog1 can start writing immediately, even if prog2 isn’t reading yet. The pipe can hold the data until prog2 is ready to start reading — but only up to a point. The pipe has a buffering limit. This may be 8000 (or 8192), 64000 (or 65536), or some other number. When the pipe is full, the system will force prog1 to wait. When prog2 starts reading, it drains the pipe; this makes room for the pipe to hold more data, and so prog1 is allowed to start writing again.
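If you would rather measure the limit on your own system than guess, one way (a sketch, assuming POSIX pipe, fcntl, and O_NONBLOCK) is to fill a pipe that nobody reads with its write end in non-blocking mode. A full pipe then makes write fail with EAGAIN (or EWOULDBLOCK) instead of putting the program to sleep the way prog1 is put to sleep. pipe_capacity is a made-up name. On many Linux systems this reports 65536, but the figure varies by OS and configuration.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: count how many bytes fit in a fresh pipe
 * before a write would have to block. Returns -1 on setup error. */
long pipe_capacity(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    /* Non-blocking write end: a full pipe gives EAGAIN instead of sleeping. */
    int flags = fcntl(p[1], F_GETFL);
    if (flags < 0 || fcntl(p[1], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    char byte = 'x';
    long total = 0;
    for (;;) {
        ssize_t w = write(p[1], &byte, 1);
        if (w == 1) {
            total++;
            continue;
        }
        if (w < 0 && errno == EINTR)
            continue;
        break;  /* EAGAIN/EWOULDBLOCK: the pipe is full */
    }

    close(p[0]);
    close(p[1]);
    return total;
}
```

Without O_NONBLOCK, the write that finds the pipe full would simply block, which is exactly the pause you see between the first batch of dots and the rest.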

If you don’t see the above behavior at first, try increasing the numbers: 200,000 bytes, 30 seconds, etc.

So your teacher was partly right when he criticized the first draft of your program. (Or, perhaps, he was exactly right, and you misquoted him.) As you understand, that version of the program waited for the runcmd(pcmd->left) program (the pipe writer) to finish, and then it would start runcmd(pcmd->right) (the pipe reader). But what if the left program outputs 100,000 bytes? It will fill the pipe and then wait until it can write some more. But it won’t be able to write more until “somebody” reads from the pipe and drains the storage buffer. And the main program won’t start the pipe reader until the pipe writer has finished. Everybody is waiting for somebody else to do something, which they won’t do until the first guy has done something. (“I’ll give you the jewel as soon as you give me the money.” / “No, I’ll give you the money after you give me the jewel.”) So, yes; bottom line: if the data stop moving because the pipe is full, and no process is reading from it, then both processes will hang forever.
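The stuck state described above can be observed without actually hanging forever. This sketch (hypothetical function name, POSIX assumed) forks a writer that sends nbytes into a pipe nobody is reading, sleeps for a bounded time where the first draft called wait(0), checks with waitpid and WNOHANG whether the writer has finished, and then plays the reader to drain the pipe so the writer can complete. With nbytes well above the pipe's capacity, the check should find the writer still blocked.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo. Returns 1 if the writer was still stuck after `delay`
 * seconds with nobody reading, 0 if it had already finished, -1 on error.
 * Afterwards drains the pipe and stores the byte count in *drained. */
int writer_blocks_on_full_pipe(size_t nbytes, unsigned delay, size_t *drained)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: the pipe writer */
        close(p[0]);
        char wbuf[1024];
        memset(wbuf, 'x', sizeof wbuf);
        size_t left = nbytes;
        while (left > 0) {
            size_t chunk = left < sizeof wbuf ? left : sizeof wbuf;
            ssize_t w = write(p[1], wbuf, chunk);  /* blocks once the pipe is full */
            if (w < 0)
                _exit(1);
            left -= (size_t)w;
        }
        _exit(0);
    }

    close(p[1]);
    sleep(delay);  /* a bounded version of the first draft's wait(0) */

    /* WNOHANG: check without blocking; 0 means the child has not exited. */
    int stuck = waitpid(pid, NULL, WNOHANG) == 0;

    /* Now act as the reader: draining the pipe lets the writer finish. */
    char rbuf[4096];
    ssize_t n;
    *drained = 0;
    while ((n = read(p[0], rbuf, sizeof rbuf)) > 0)
        *drained += (size_t)n;
    close(p[0]);

    /* If WNOHANG already reaped the child, this fails with ECHILD; harmless. */
    waitpid(pid, NULL, 0);
    return stuck;
}
```

Replace the sleep with an unbounded wait(0) and the same program never returns: that is the deadlock.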

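In popular culture, this situation is loosely called a Catch-22. In computer science, it is formally called a deadlock, and informally a deadly embrace.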

Archived:	7 years, 11 months ago
Views:	11215
Last recorded:	5 years, 5 months ago