Transmission Control Protocol




The Transmission Control Protocol ('''TCP''') is one of the core protocols of the Internet Protocol Suite, which is often simply referred to as TCP/IP. Using TCP, applications on networked hosts can create ''connections'' to one another, over which they can exchange streams of data using stream sockets. Unlike UDP, TCP guarantees reliable and in-order delivery of data from sender to receiver. TCP also distinguishes data for multiple connections by concurrent applications (''e.g.'', a Web server and an e-mail server) running on the same host.

TCP supports many of the Internet's most popular application protocols and resulting applications, including the World Wide Web, e-mail, File Transfer Protocol and Secure Shell.

In the Internet Protocol Suite, TCP is the intermediate layer between the Internet Protocol (IP) below it and an application above it. Applications often need reliable pipe-like connections to each other, whereas the Internet Protocol does not provide such streams, but rather only best-effort delivery (''i.e.'', unreliable packets). TCP does the task of the transport layer in the simplified OSI model of computer networks. The other main transport-level Internet protocols are UDP and SCTP.

Applications send streams of octets (8-bit bytes) to TCP for delivery through the network, and TCP divides the byte stream into appropriately sized segments (usually delineated by the Maximum Transmission Unit (MTU) size of the data link layer of the network to which the computer is attached). TCP then passes the resulting packets to the Internet Protocol, for delivery through a network to the TCP module of the entity at the other end. TCP checks to make sure that no packets are lost by giving each packet a ''sequence number'', which is also used to make sure that the data is delivered to the entity at the other end in the correct order. The TCP module at the far end sends back an ''acknowledgement'' for packets which have been successfully received; a timer at the sending TCP will cause a ''timeout'' if an acknowledgement is not received within a reasonable round-trip time (or RTT), and the (presumably) lost data will then be ''re-transmitted''. TCP checks that no bytes are corrupted by using a checksum; one is computed at the sender for each block of data before it is sent, and checked at the receiver.
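
The segmentation and reordering described above can be illustrated with a minimal Python sketch. The function names and the `isn` and `mss` parameters are hypothetical, chosen only for this illustration; a real TCP implementation also handles timers, acknowledgements and retransmission, which are left out here.

```python
def segment_stream(data: bytes, isn: int, mss: int):
    """Split a byte stream into (sequence_number, payload) pairs.

    `isn` stands in for the sequence number of the first data byte and
    `mss` for the largest payload per segment (both are illustrative names).
    """
    segments = []
    for offset in range(0, len(data), mss):
        seq = (isn + offset) % 2**32  # sequence numbers wrap at 32 bits
        segments.append((seq, data[offset:offset + mss]))
    return segments


def reassemble(segments, isn: int) -> bytes:
    """Restore the original order from sequence numbers, dropping duplicates."""
    by_offset = {}
    for seq, payload in segments:
        # Keep the first copy of each segment; later duplicates are ignored.
        by_offset.setdefault((seq - isn) % 2**32, payload)
    return b"".join(by_offset[offset] for offset in sorted(by_offset))
```

Feeding `reassemble` the segments in any order, even with repeats, yields the original byte string, which is the property the sequence numbers are there to provide.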


PROTOCOL OPERATION



Unlike TCP's traditional counterpart, the User Datagram Protocol, which can immediately start sending packets, TCP provides connections that need to be established before sending data. TCP connections have three phases:

# connection establishment,
# data transfer,
# connection termination.

Before describing these three phases, a note about the various states of a connection end-point or ''Internet socket'':
# LISTEN
# SYN-SENT
# SYN-RECEIVED
# ESTABLISHED
# FIN-WAIT-1
# FIN-WAIT-2
# CLOSE-WAIT
# CLOSING
# LAST-ACK
# TIME-WAIT
# CLOSED

; LISTEN : represents waiting for a connection request from any remote TCP and port (usually set by TCP servers).
; SYN-SENT : represents waiting for the remote TCP to send back a TCP packet with the SYN and ACK flags set (usually set by TCP clients).
; SYN-RECEIVED : represents waiting for the remote TCP to send back an acknowledgment after having sent back a connection acknowledgment to the remote TCP (usually set by TCP servers).
; ESTABLISHED : represents that the port is ready to receive/send data from/to the remote TCP (set by TCP clients and servers).
; TIME-WAIT : represents waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request. According to RFC 793, a connection can stay in TIME-WAIT for a maximum of four minutes.


Connection establishment

To establish a connection, TCP uses a three-way handshake.

Before a client attempts to connect with a server, the server must first bind to a port to open it up for connections: this is called a passive open.
Once the passive open is established, a client may initiate an active open.
The three-way (or 3-step) handshake then proceeds as follows:
# The active open is performed by the client sending a SYN to the server.
# In response, the server replies with a SYN-ACK.
# Finally the client sends an ACK back to the server.

At this point, both the client and server have received an acknowledgement of the connection.
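
As a rough illustration, the handshake can be expressed as a small state-transition table over the states listed earlier. The event labels and the `advance` helper are made up for this sketch, and only the connection-establishment path is covered.

```python
# (current state, event) -> next state. Event labels are illustrative only;
# the table covers just the handshake path, not the full TCP state machine.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open/send_SYN"): "SYN-SENT",
    ("LISTEN", "recv_SYN/send_SYN-ACK"): "SYN-RECEIVED",
    ("SYN-SENT", "recv_SYN-ACK/send_ACK"): "ESTABLISHED",
    ("SYN-RECEIVED", "recv_ACK"): "ESTABLISHED",
}


def advance(state: str, events) -> str:
    """Apply a sequence of events and return the resulting state."""
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError for an unmodelled event
    return state


client = advance("CLOSED", ["active_open/send_SYN", "recv_SYN-ACK/send_ACK"])
server = advance("CLOSED", ["passive_open", "recv_SYN/send_SYN-ACK", "recv_ACK"])
```

Both `client` and `server` end in `ESTABLISHED`, matching the point at which each side has received an acknowledgement of the connection.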

Example:
# The initiating host (client) sends a synchronization (SYN flag set) packet to initiate a connection. Any SYN packet holds a Sequence Number. The Sequence Number is a 32-bit field in the TCP segment header. Let the Sequence Number value for this session be x.
# The other host receives the packet, records the Sequence Number x from the client, and replies with an acknowledgment and synchronization (SYN-ACK). The Acknowledgment Number is a 32-bit field in the TCP segment header. It contains the next sequence number that this host is expecting to receive (x + 1). The host also initiates a return session. This includes a TCP segment with its own initial Sequence Number of value y.
# The initiating host responds with the next Sequence Number (x + 1) and a simple Acknowledgment Number value of y + 1, which is the Sequence Number value of the other host + 1.
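
The numbers exchanged in the example above can be computed with a short sketch. The function name and the tuple layout `(flags, sequence number, acknowledgment number)` are assumptions made for illustration; `x` and `y` are the initial sequence numbers from the example.

```python
def three_way_handshake(x: int, y: int):
    """Return (flags, seq, ack) for the three handshake segments.

    x and y are the initial sequence numbers picked by the client and the
    server. Arithmetic is modulo 2**32 because both fields are 32 bits wide.
    """
    mod = 2**32
    syn = ("SYN", x, None)                       # client -> server, no ACK yet
    syn_ack = ("SYN-ACK", y, (x + 1) % mod)      # server expects x + 1 next
    ack = ("ACK", (x + 1) % mod, (y + 1) % mod)  # client expects y + 1 next
    return [syn, syn_ack, ack]
```

For instance, with x = 1000 and y = 5000 the server acknowledges 1001 and the client acknowledges 5001.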


Data transfer

There are a few key features that set TCP apart from the User Datagram Protocol:
  • Ordered data transfer

  • Retransmission of lost packets

  • Discarding duplicate packets

  • Error-free data transfer

  • Congestion/Flow control



Ordered data transfer, retransmission of lost packets and discarding duplicate packets

In the first two steps of the 3-way handshake, both computers exchange an initial sequence number (ISN).
This number can be arbitrary.
This sequence number identifies the order of the bytes sent from each computer so that the data transferred is in order regardless of any fragmentation or disordering that occurs during transmission.
For every byte transmitted the sequence number must be incremented.

Conceptually, each byte sent is assigned a sequence number, and the receiver sends an acknowledgement back to the sender that effectively states that it received that byte.
In practice, only the sequence number of the first data byte is carried in the segment's sequence number field, and the receiver replies with an acknowledgement value equal to the sequence number of the next byte it expects to receive.

For example, if computer A sends 4 bytes with a sequence number of 100 (conceptually, the four bytes would have a sequence number of 100, 101, 102, & 103 assigned) then the receiver would send back an acknowledgement of 104 since that is the next byte it expects to receive in the next packet.
By sending an acknowledgement of 104, the receiver is signaling that it received bytes 100, 101, 102, & 103 correctly.
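If, by some chance, only the first two bytes arrived intact (for instance because the last two travelled in a separate segment that was lost or discarded as corrupted), then an acknowledgement value of 102 would be sent, since 100 & 101 were received successfully.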
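
A minimal sketch of how a receiver might derive that acknowledgement value is shown below. The function and its `received` mapping are hypothetical; the sketch assumes segments start exactly where the previous one ended and ignores 32-bit wrap-around for brevity.

```python
def cumulative_ack(next_expected: int, received: dict) -> int:
    """Return the acknowledgement number for the data received so far.

    `received` maps the starting sequence number of each intact segment to
    its payload length. The acknowledgement advances only across data that
    is contiguous with `next_expected`.
    """
    ack = next_expected
    while received.get(ack, 0) > 0:
        ack += received[ack]
    return ack
```

With the figures from the example, one intact 4-byte segment starting at 100 gives 104, while an intact segment covering only bytes 100 and 101 gives 102.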

However, a problem can occasionally arise when packets are lost. For example, suppose 10,000 bytes are sent in 10 different TCP packets, and the first packet is lost during transmission. The sender would then have to resend all 10,000 bytes; the recipient cannot say that it received bytes 1,000 to 9,999 but only that it failed to receive the first packet, containing bytes 0 to 999. In order to solve this problem, an option of ''selective acknowledgment (SACK)'' has been added. This option allows the receiver to acknowledge isolated blocks of packets that were received correctly, rather than only the sequence number following the last byte received in order, as in the basic TCP acknowledgment. Each block is conveyed by its starting and ending sequence numbers. In the example above, the receiver would send SACK with sequence numbers 1,000 and 10,000. The sender will thus retransmit only the first packet.
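
The block computation in this example can be sketched as follows. The function names are illustrative, byte ranges are written as `(start, end)` with the end exclusive, and real SACK options carry only a limited number of blocks, which this sketch does not model.

```python
def merge_ranges(received_ranges):
    """Merge overlapping or adjacent (start, end) byte ranges, end exclusive."""
    merged = []
    for start, end in sorted(received_ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def ack_and_sack(next_expected: int, received_ranges):
    """Return (cumulative ack, SACK blocks) for the ranges received so far."""
    blocks = merge_ranges(received_ranges)
    ack = next_expected
    # Data contiguous with the expected byte moves the cumulative ack forward.
    while blocks and blocks[0][0] <= ack:
        ack = max(ack, blocks[0][1])
        blocks.pop(0)
    return ack, blocks


# Packets 2 to 10 of the example arrived; packet 1 (bytes 0 to 999) was lost.
arrived = [(1000 * i, 1000 * (i + 1)) for i in range(1, 10)]
ack, blocks = ack_and_sack(0, arrived)
```

Here `ack` stays at 0 and `blocks` holds the single block `(1000, 10000)`, so the sender only needs to retransmit bytes 0 to 999.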

The SACK option is not mandatory and it is used only if both parties support it. This is negotiated when the connection is established. SACK uses the optional part of the TCP header (see TCP Segment Structure). The use of SACK is widespread - all popular TCP stacks support it. Selective acknowledgment is also used in SCTP.


Error-free data transfer

Sequence numbers and acknowledgments cover discarding duplicate packets, retransmission of lost packets, and ordered-data transfer.
To assure correctness, a checksum field is included (''see TCP Segment Structure for details on checksumming'').

The TCP checksum is a quite weak check by modern standards. Data link layers with high bit error rates may require additional link error correction/detection capabilities. If TCP were to be redesigned today, it would most probably have a 32-bit check instead. Even so, the end-to-end 16-bit TCP checksum catches most simple errors that slip past the link-level checks. This is the end-to-end principle at work.
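
To make the weakness concrete, here is a minimal sketch of the 16-bit ones' complement sum that the TCP checksum is based on. It is simplified: the real TCP checksum is also computed over a pseudo-header with the IP addresses, which is omitted here, and the function name is chosen for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)  # 0x220D
# A receiver summing the data together with the checksum gets zero.
verified = internet_checksum(payload + checksum.to_bytes(2, "big")) == 0
```

Because the sum is order-independent, swapping two 16-bit words produces the same checksum, which is one reason the check is considered weak.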


Congestion control

The final part of TCP is congestion control. TCP uses a number of mechanisms to achieve high performance and avoid 'congestion collapse', where network performance can fall by several orders of magnitude. These mechanisms control the rate of data entering the network, keeping the data flow below a rate that would trigger collapse.

Acknowledgments for data sent, or lack of acknowledgments, are used by senders to implicitly interpret network conditions between the TCP sender and receiver. Coupled with timers, TCP senders and receivers can alter the behavior of the flow of data. This is more generally referred to as flow control, congestion control and/or network congestion avoidance.

Modern implementations of TCP contain four intertwined algorithms: slow-start, congestion avoidance, fast retransmit, and fast recovery (RFC 2581).
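
A heavily simplified sketch of the first two algorithms is given below. It tracks the congestion window in whole segments per round trip and treats every loss as a timeout; fast retransmit and fast recovery are not modelled, and the function name and parameters are illustrative rather than taken from any implementation.

```python
def simulate_cwnd(rounds: int, ssthresh: int, loss_rounds=()):
    """Return the congestion window (in segments) at the start of each round.

    Slow start doubles the window each round trip until it reaches
    `ssthresh`; congestion avoidance then adds one segment per round trip.
    A loss (modelled as a timeout) halves the threshold and restarts the
    window at one segment.
    """
    cwnd = 1
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start
        else:
            cwnd += 1                       # congestion avoidance
    return history
```

With `ssthresh=8` and no loss, eight rounds give `[1, 2, 4, 8, 9, 10, 11, 12]`: exponential growth up to the threshold, then linear growth.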

Enhancing TCP to reliably handle loss, minimize errors, manage congestion and go fast in very high-speed environments is an ongoing area of research and standards development.


TCP window size



The TCP receive window size is the amount of received data (in bytes) that can be buffered during a connection. The sending host can send only up to that amount of data before it must wait for an acknowledgment and window update from the receiving host. When a receiver advertises a window size of 0, the sender stops sending data and starts the persist timer. The persist timer protects TCP from a deadlock that could otherwise arise when the window size update from the receiver is lost and the receiver has no more data to send, while the sender is waiting for that window update. When the persist timer expires, the TCP sender sends a small packet so that the receiver ACKs the packet with the new window size, and TCP can recover from such situations.
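
The sender-side bookkeeping implied here can be sketched in a few lines. The function and parameter names are hypothetical; the sketch only shows how much new data may be sent given the advertised window and the data still unacknowledged.

```python
def usable_window(advertised: int, next_to_send: int, oldest_unacked: int) -> int:
    """Bytes the sender may still transmit without a new window update.

    `advertised` is the receiver's window in bytes; the difference between
    `next_to_send` and `oldest_unacked` is the data currently in flight.
    """
    in_flight = next_to_send - oldest_unacked
    return max(advertised - in_flight, 0)
```

A result of 0 with nothing in flight corresponds to the zero-window case above, where the sender would start the persist timer rather than wait indefinitely.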


Window scaling

For more efficient use of high bandwidth networks, a larger TCP window size may be used. The TCP window size field controls the flow of data and is limited to between 2 and 65,535 bytes.

Since the size field cannot be expanded, a scaling factor is used. The TCP window scale option, as defined in RFC 1323, is an option used to increase the maximum window size from 65,535 bytes to 1 gigabyte. Scaling up to larger window sizes is a part of what is necessary for TCP tuning.

The window scale option is used only during the TCP 3-way handshake. The window scale value represents the number of bits to left-shift the 16-bit window size field. The window scale value can be set from 0 (no shift) to 14.
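
The arithmetic is a plain left shift, as the following sketch shows. The function name is made up for illustration; the limits are the 16-bit field size and the maximum shift of 14 stated above.

```python
def effective_window(window_field: int, shift: int) -> int:
    """Scale the 16-bit window field by the negotiated shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field is 16 bits wide")
    if not 0 <= shift <= 14:
        raise ValueError("window scale shift must be between 0 and 14")
    return window_field << shift


largest = effective_window(65535, 14)  # 1,073,725,440 bytes, just under 1 GiB
```

A shift of 0 leaves the window unchanged, so hosts that do not negotiate the option behave exactly as before.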

Many routers and packet firewalls rewrite the window scaling factor during a transmission. This causes the sending and receiving sides to assume different TCP window sizes. The result is unstable traffic that can be very slow. The problem is visible at some sending and receiving sites that sit behind such broken routers.

For more information on problems that may be caused, especially with Linux and Vista systems, see the main topic TCP Window Scale Option.


Connection termination

The connection termination phase uses, at most, a four-way handshake, with each side of the connection terminating independently. When an endpoint wishes to stop its half of the connection, it transmits a FIN packet, which the other end acknowledges with an ACK. Therefore, a typical teardown requires a pair of FIN and ACK segments from each TCP endpoint.

A connection can be "half-closed", in which case one side has terminated its end, but the other has not. The side that has terminated can no longer send any data into the connection, but the other side can.

It is also possible to terminate the connection by a 3-way handshake, when host A sends a FIN, host B replies with a FIN & ACK (merely combining 2 steps into one), and host A replies with an ACK. This is perhaps the most common method.

It is possible for both hosts to send FINs simultaneously; then both just have to ACK. This could possibly be considered a 2-way handshake since the FIN/ACK sequence is done in parallel for both directions.

Some host TCP stacks may implement a "half-duplex" close sequence, as Linux or HP-UX do. If such a host actively closes a connection but still has not read all the incoming data the stack already received from the link, this host will send a RST instead of a FIN (Section 4.2.2.13 in RFC 1122). This allows a TCP application to be sure that the remote application has read all the data the former sent, by waiting for the FIN from the remote side when it actively closes the connection. Unfortunately, the remote TCP stack cannot distinguish between a ''Connection Aborting RST'' and this ''Data Loss RST'': both cause the remote stack to throw away all the data it has received but that the application has not yet read.

Some application protocols may violate the OSI model layers, using the TCP open/close handshaking for the application protocol open/close handshaking - these may find the RST problem on active close. As an example:
s = connect(remote);
send(s, data);
close(s);
For a usual program flow like the one above, a TCP/IP stack like that described above does not guarantee that all the data will arrive at the other application ''unless'' the programmer is sure that the remote side will not send anything.
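
One commonly used way to reduce this risk, offered here as an assumption rather than something the text above prescribes, is to close only the sending direction first and read until the peer closes before calling `close`. The helper name below is hypothetical; the socket calls are from Python's standard `socket` module.

```python
import socket


def send_then_drain(sock: socket.socket, data: bytes) -> bytes:
    """Send data, close our sending direction, then read until the peer closes.

    Draining unread input before close() means there is no pending data at
    close time, which is the condition that triggers the RST described above.
    """
    sock.sendall(data)
    sock.shutdown(socket.SHUT_WR)  # on a TCP socket this sends our FIN
    leftover = bytearray()
    while True:
        chunk = sock.recv(4096)
        if not chunk:              # empty read: the peer closed its side
            break
        leftover += chunk
    sock.close()
    return bytes(leftover)
```

The function blocks until the remote side closes its half of the connection, so a real program would normally combine it with a timeout.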


TCP PORTS

TCP uses the notion of port numbers to identify sending and receiving application end-points on a host, or ''Internet sockets''. Each side of a TCP connection has an associated 16-bit unsigned port number (1-65535) reserved by the sending or receiving application. Arriving TCP data packets are identified as belonging to a specific TCP connection by its sockets, that is, the combination of source host address, source port, destination host address, and destination port. This means that a server computer can provide several clients with several services simultaneously, as long as a client takes care of initiating any simultaneous connections to one destination port from different source ports.
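
The identification by address and port combination can be pictured as a lookup table keyed by that four-part tuple. The sketch below uses hypothetical names and example addresses; it only illustrates the lookup, not how an operating system actually stores connections.

```python
from typing import Dict, Tuple

# (source address, source port, destination address, destination port)
ConnKey = Tuple[str, int, str, int]


def deliver(table: Dict[ConnKey, bytearray], key: ConnKey, payload: bytes) -> bool:
    """Append payload to the matching connection's buffer.

    Returns False when no connection matches the four-part key.
    """
    buffer = table.get(key)
    if buffer is None:
        return False
    buffer += payload  # bytearray is extended in place
    return True


# Two connections from one client to the same server port stay distinct
# because their source ports differ.
table = {
    ("10.0.0.1", 40000, "10.0.0.9", 80): bytearray(),
    ("10.0.0.1", 40001, "10.0.0.9", 80): bytearray(),
}
deliver(table, ("10.0.0.1", 40001, "10.0.0.9", 80), b"GET /")
```

Only the buffer for source port 40001 receives the payload, which is why a client must use different source ports for simultaneous connections to one destination port.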

Port numbers are categorized into three basic categories: well-known, registered, and dynamic/private. The well-known ports are assigned by the Internet Assigned Numbers Authority (IANA); examples include FTP (21), SSH (22), TELNET (23), SMTP (25) and HTTP (80). Registered ports are typically used by end user applications as ephemeral source ports when contacting servers, but they can also identify named services that have been registered by a third party. Dynamic/private ports can also be used by end user applications, but are less commonly so. Dynamic/private ports do not contain any meaning outside of any particular TCP connection.
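
The three categories correspond to fixed numeric ranges in the IANA registry: 0 to 1023, 1024 to 49151, and 49152 to 65535. The text above does not give these boundaries, so treat them as added context; the helper name is illustrative.

```python
def port_category(port: int) -> str:
    """Classify a port number using the IANA range boundaries."""
    if not 0 <= port <= 65535:
        raise ValueError("port numbers are 16-bit unsigned values")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"
```

For example, `port_category(80)` returns `"well-known"` and `port_category(49152)` returns `"dynamic/private"`.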


DEVELOPMENT OF TCP

TCP is a complex and evolving protocol. However, while significant enhancements have been made and proposed over the years, its most basic operation has not changed significantly since its first specification, RFC 675, in 1974, and the v4 specification, RFC 793, published in 1981. RFC 1122, Host Requirements for Internet Hosts, clarified a number of TCP protocol implementation requirements. RFC 2581, TCP Congestion Control, one of the most important TCP related RFCs in recent years, describes updated algorithms to be used in order to avoid undue congestion. In 2001, RFC 3168 was written to describe Explicit Congestion Notification (ECN), a congestion avoidance signalling mechanism. Common applications that use TCP include HTTP (World Wide Web), SMTP (e-mail) and FTP (file transfer).

The original TCP congestion control was called TCP Tahoe; several alternative congestion control algorithms have since been proposed: