Sunday, April 21, 2013

CLOSE_WAIT

People often complain that connections linger in the CLOSE_WAIT state on their systems.
This happens when the peer has closed the connection (either by closing the socket explicitly or because the peer process terminated), but your side has not closed its own end of the connection in response.

So, what happens if the peer closes the connection?

- If your side does not take the correct action, the peer is NOT left in a bad state: its side of the connection is cleaned up after tcp_fin_wait_2_flush_interval.

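- If your side calls write() or send() on the connection, it will receive a SIGPIPE signal ("broken pipe", see the signal.h man page). On TCP this typically happens on a write issued after the peer has answered earlier data with a reset, so the first write after the close may still appear to succeed. The default action for SIGPIPE is to terminate the process, so your application probably needs to change this default behavior, for example by ignoring the signal and handling the EPIPE error returned by the call.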

- If your side calls read() or recv() on the connection, the call returns 0. Your side must handle this case.

- If your side is waiting for a POLLIN event via select(), poll() or port_get(), the event fires and the subsequent recv() returns 0, so you can handle the situation in your application.

- If your side takes no action on the connection, CLOSE_WAIT remains on your side until you exit or restart your application. (The tcp_keepalive_interval and tcp_keepalive_abort_interval TCP/IP settings do not help with this.)
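
The write-side case above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: ignore_sigpipe() and send_or_close() are hypothetical helper names that do not come from any library, the socket is assumed to be blocking, and ignoring SIGPIPE for the whole process is assumed to be acceptable for your application.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGPIPE process-wide, so that writing to a
 * connection the peer has closed fails with EPIPE instead of terminating
 * the process. Returns 0 on success, -1 on failure. */
static int ignore_sigpipe(void)
{
    struct sigaction sa;

    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Hypothetical helper: send one buffer on a blocking socket.
 * If the connection turns out to be broken (EPIPE or ECONNRESET), close
 * our end so the socket does not stay behind in CLOSE_WAIT.
 * Returns the number of bytes sent, or -1 with errno set. */
static ssize_t send_or_close(int sockfd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);

    /* Retry if a signal handler interrupted the call. */
    do {
        n = send(sockfd, buf, len, 0);
    } while (n < 0 && errno == EINTR);

    if (n < 0 && (errno == EPIPE || errno == ECONNRESET)) {
        int saved_errno = errno;   /* close() may overwrite errno */
        close(sockfd);
        errno = saved_errno;
    }
    return n;
}
```

A caller would run ignore_sigpipe() once at startup and then treat a return value of -1 from send_or_close() with errno set to EPIPE or ECONNRESET as "the connection is gone and the descriptor has already been closed".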

A typical mistake in applications looks like this:
....
recvbytes = recv(sockfd, buf, BUFSIZE, 0);
if (recvbytes < 0) {
    perror("recv error");
    close(sockfd);
    .....
}
....

This is buggy: the correct check is "if (recvbytes <= 0) {".
A return value of 0 means the peer has closed the connection, so the socket must be closed in that case too.

It's strange that the Linux man page clearly says that recv() returning 0 means the peer has performed an orderly shutdown, while the current Solaris man page says nothing about it.
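
To make the handling of a 0 return value concrete, here is a minimal sketch of a receive helper that closes the socket both on error and on an orderly shutdown by the peer. The name recv_or_close() is hypothetical, and the sketch assumes a blocking socket (on a non-blocking socket, EAGAIN would need separate handling rather than being treated as an error).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for a blocking socket.
 * Returns  >0  number of bytes received,
 *           0  the peer closed the connection (orderly shutdown),
 *          -1  a receive error occurred.
 * In the last two cases our end is closed as well, so the socket does
 * not remain in CLOSE_WAIT. */
static ssize_t recv_or_close(int sockfd, char *buf, size_t bufsize)
{
    ssize_t recvbytes;

    assert(buf != NULL);

    /* Retry if a signal handler interrupted the call. */
    do {
        recvbytes = recv(sockfd, buf, bufsize, 0);
    } while (recvbytes < 0 && errno == EINTR);

    if (recvbytes <= 0) {          /* note: <= 0, not < 0 */
        if (recvbytes < 0)
            perror("recv error");
        close(sockfd);
    }
    return recvbytes;
}
```

The caller must stop using the descriptor once the helper returns 0 or -1, because it has already been closed at that point.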

