Tuesday, August 27, 2013

TCP/IP - socket closing sequence, CLOSE_WAIT and TIME_WAIT



Let's say a socket connection is established between a client and a server, and the client is the side that closes first. Once the data transfer is done, the closing sequence is as follows:

1. Socket on client sends a TCP segment with the FIN bit set (in the TCP header) and goes into the FIN_WAIT_1 state.

2. Socket on server receives the FIN, responds with an ACK to acknowledge it, and goes into the CLOSE_WAIT state. It stays in CLOSE_WAIT until the server application calls close() on this socket.

3. Socket on client receives the ACK and changes to FIN_WAIT_2 state.

4. Socket on server closes the connection (once the application calls close()), sends its own FIN to the peer, and changes its state to LAST_ACK.

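5. Socket on client receives the FIN and sends back an ACK. At this point the socket implementation on the client starts a timer (the TIME_WAIT state) to handle the scenario where this last ACK is lost and the server resends its FIN.
The socket now waits for 2 * MSL (Maximum Segment Lifetime). This wait defaults to 4 minutes on Solaris and Windows. On Linux the TIME_WAIT period is fixed at 60 seconds in the kernel; tcp_fin_timeout, which also defaults to 60, controls how long an orphaned socket stays in FIN_WAIT_2 and does not change the TIME_WAIT duration.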

6. Socket on server receives the ACK and moves the connection to the CLOSED state.

7. After the TIME_WAIT period has elapsed, the socket/connection is closed on the client.

All of these acknowledgments and retransmissions are necessary because TCP, unlike UDP, is a reliable protocol.
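The steps above can be written down as two small transition tables, one per side. This is only an illustrative sketch: the event names (`close`, `recv_ack`, `recv_fin`, `timeout`) and the `walk` helper are made up for this example and are not kernel identifiers, and it covers only the sequential close described here, not a simultaneous close from both ends.

```python
# (current state, event) -> next state, following steps 1-7 above.
# Event names are hypothetical labels chosen for this sketch.
ACTIVE_CLOSER = {  # the side that calls close() first (the client above)
    ("ESTABLISHED", "close"): "FIN_WAIT_1",     # step 1: send FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",   # step 3
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",    # step 5: send ACK, start timer
    ("TIME_WAIT", "timeout"): "CLOSED",         # step 7: 2 * MSL elapsed
}

PASSIVE_CLOSER = {  # the side that receives the first FIN (the server above)
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",  # step 2: send ACK
    ("CLOSE_WAIT", "close"): "LAST_ACK",        # step 4: application closes, send FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",         # step 6
}


def walk(table, events, state="ESTABLISHED"):
    """Return the list of states visited while applying events in order."""
    path = [state]
    for event in events:
        state = table[(state, event)]
        path.append(state)
    return path


client_path = walk(ACTIVE_CLOSER, ["close", "recv_ack", "recv_fin", "timeout"])
server_path = walk(PASSIVE_CLOSER, ["recv_fin", "close", "recv_ack"])
```

Note that the only transition triggered by the application on the passive side is the one out of CLOSE_WAIT, which is why that state can last as long as the program likes.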

CLOSE_WAIT:
This is the state in which a socket is waiting for the application to execute close(). It means that the local end of the connection has received a FIN from the other end, but the OS is waiting for the program at the local end to actually close its side of the connection. Usually this happens when the remote peer (for example, the client) has closed the connection but the local application has not closed its socket yet. The problem is that your program running on the local machine is not closing the socket; it is not a TCP tuning issue. A connection can (quite correctly) stay in CLOSE_WAIT forever while the program holds the connection open.

CLOSE_WAIT is not something that can be configured, whereas the TIME_WAIT duration can be set through tcp_time_wait_interval on Solaris. (The attribute tcp_close_wait_interval has nothing to do with the CLOSE_WAIT state.)
A socket can stay in the CLOSE_WAIT state indefinitely, until the application closes it.
Typical faulty scenarios are a file descriptor leak, or a server that never executes close() on its sockets, leading to a pile-up of CLOSE_WAIT sockets.
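To make the application's role concrete, here is a minimal sketch using Python's standard socket module over loopback. The function name `demo_close_wait` is made up for this example. An empty read from recv() is how the application learns that the peer has sent its FIN; from that moment the accepted socket sits in CLOSE_WAIT until the program closes it.

```python
import socket


def demo_close_wait():
    """Read until the peer's FIN arrives, then close the socket properly."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    client.sendall(b"bye")
    client.close()  # the client sends its FIN here

    chunks = []
    with conn:  # guarantees conn.close() even if an exception is raised
        while True:
            chunk = conn.recv(1024)
            if not chunk:
                # Empty read: the peer's FIN has been received, so this
                # socket is now in CLOSE_WAIT until it is closed.
                break
            chunks.append(chunk)
    # Leaving the 'with' block closed conn, which sends our own FIN.
    listener.close()
    return b"".join(chunks)
```

If the `with conn:` block (or an explicit `conn.close()`) were left out and the program kept a reference to `conn`, the socket would remain in CLOSE_WAIT for as long as the process runs, which is the leak described above.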


To see all the connections in "CLOSE_WAIT" state, use:
netstat -tonp 2>&1 | grep CLOSE

You can also use the above command to determine which programs are holding the connections.

If there are no programs listed, then the service is being provided by the kernel. These are likely RPC services such as nfs or rpc.lockd. Listening kernel services can be listed with
netstat -lntp 2>&1 | grep -- -

To match only the CLOSE_WAIT state, run as root:
# netstat -atnp | grep "CLOSE_WAIT"

If you are seeing a large number of connections persisting in the CLOSE_WAIT state, it's probably a problem with the app itself. Restarting it will clear the connections temporarily, but further investigation will be required to find the cause of the problem.
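When there are many such connections, it can help to tally them per program. The following is a rough sketch that assumes the usual Linux net-tools column layout for `netstat -tnp` (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the function name `count_close_wait` is made up for this example, and other platforms may print different columns.

```python
from collections import Counter


def count_close_wait(netstat_output):
    """Count CLOSE_WAIT sockets per 'PID/Program name' value.

    netstat_output is the text printed by a command such as `netstat -tnp`.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns:
        # Proto Recv-Q Send-Q Local Foreign State PID/Program [Timer ...]
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue  # header lines, blank lines, non-TCP entries
        if fields[5] == "CLOSE_WAIT":
            counts[fields[6]] += 1  # "-" means no owning program was shown
    return counts
```

One way to use it is to capture the command's output with `subprocess.run(["netstat", "-tnp"], capture_output=True, text=True).stdout` and pass that string in; `count_close_wait(...).most_common(5)` then lists the programs holding the most CLOSE_WAIT sockets.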


TIME_WAIT:
This is just a time-based wait on the socket before the connection is closed down permanently. Under most circumstances, sockets in TIME_WAIT are nothing to worry about.
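One place where TIME_WAIT does tend to get noticed is when a server is restarted: bind() can fail with "Address already in use" while earlier connections on the same port are still in TIME_WAIT. A common way to handle this is to set the SO_REUSEADDR socket option before binding. The sketch below uses Python's standard socket module; the helper name `make_listener` and its default arguments are assumptions made for this example.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=5):
    """Create a listening TCP socket with SO_REUSEADDR enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(): allows binding to the port even while old
    # connections on it are still in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 lets the OS choose a free port
    sock.listen(backlog)
    return sock
```

This does not shorten or remove TIME_WAIT; the old connections still wait out their timer, and the new listener simply does not have to wait for them.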


