| News Server Operation |
|
ARTICLES AND POSTS

End users often use the term "posting" to refer to a single message or file posted to Usenet. For plain text, a posting is the same thing as an article. Binary content such as pictures and files often has to be split across multiple articles. The newsreader automatically reassembles such multi-article postings into a single unit, typically by means of numbered Subject: headers. Most servers do not distinguish between single-part and multi-part postings and deal only with the individual component articles.

HEADERS AND OVERVIEWS

Each news article contains a complete set of header lines, but in common use the term "headers" also refers to the News Overview database. The overview is a list of the most frequently used headers plus additional information such as article sizes, and client software typically retrieves it with the NNTP XOVER command. Overviews make reading a newsgroup faster for both client and server, because articles can be presented in list form without opening each one individually. If non-overview headers are required, for example when using a kill file, it may still be necessary to fall back on the slower method of reading the complete headers of every article. Many clients are unable to do this and limit filtering to what is available in the overview.

SPOOLS

When the server stores the body of an article, it places it in a disk storage area generically called a "spool". There are several common ways in which the spool may be organized.
SPEED

Speed, for the purpose of this article, is how quickly a server can deliver an article to the user. The server that the user connects to is typically part of a server farm in which many servers are dedicated to different tasks. How fast data can move within this farm is the first factor that affects the speed of delivery. Once the data leaves the farm for the network, the provider has only limited control over the speed at which it reaches the user. Since the network path to each user is different, some users will have good routes and the data will flow quickly. Other users will have overloaded routers between them and the provider, which will cause delays. About all a provider can do in that case is try moving the traffic over a different route. If the ISP has limited connectivity to the network, routing changes may have little effect.

Frequently a user can reduce the impact of network problems by using multiple connections. Some servers allow as many as 8 simultaneous connections, but this varies widely. Likewise, newsreaders are commonly limited to as few as two or four connections.

ARTICLE SIZES

Article sizes are limited to what the servers will accept. For text users this is generally not a problem. For binary users it can be, since the maximum article size varies from site to site. For a given volume of data, larger articles mean fewer articles on each server, which reduces the per-article processing overhead and makes the server more efficient. However, the larger an article is, the fewer servers it will arrive on.

SERVERS

Users frequently call their service a server. In many cases this is very far from the truth. While each service is different, here is a list of the various types of server roles that a provider will have in each server farm it runs.
Roles can be mixed at a given site; for example, numbering and transit may be provided by the same system.

Transit server: These are the servers that handle basic article exchange. They exchange traffic with remote servers, supply articles to the numbering servers, and transmit articles posted from the local front end servers.

Numbering server (stamper): This server inserts the RFC 1036 Xref: header into each article, so that the back end and front end servers all present article lists in a uniform manner.

Back end server: These are the data storage systems for the front end servers. They usually have multiple RAID disk arrays to hold the data. The provider can increase reliability by using multiple back end servers with redundant data, redundant arrays attached to the same server, or both.

Front end server: These are the servers that a user actually connects to. It is not unheard of for a large commercial news service provider to have more than 50 front end servers. These systems usually store only overviews locally and retrieve article bodies from the back end servers. They typically carry the heaviest CPU load in the farm.

Large server farms typically also place load balancers between the front end servers and the network.

RETENTION

Retention is simply how long the server keeps articles. Most users want retention to be long enough that they do not need to access the server every day. Conversely, overly long retention can overwhelm users with slow computers or network connections by making the article lists inordinately large. Retention is generally quoted separately for text and binary articles, though it may also vary between different groups within these categories. The times vary greatly with the amount of storage available on the servers and with continually increasing traffic, but as of 2005 it is common for specialist news providers to have text retention of over 100 days and binary retention of over a week.
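As a rough illustration of the retention figure described above, the following Python sketch estimates retention from a sample of Date: header values taken from a newsgroup. The function name `estimate_retention_days` and its `trim` parameter are hypothetical, introduced only for this example. Discarding the oldest fraction of the sample is an assumption, one possible way to keep a few stray or misdated articles from skewing the result.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def estimate_retention_days(date_headers, now, trim=0.05):
    """Estimate retention in days from a sample of Date: header values.

    Hypothetical helper. Unparseable dates and dates in the future are
    skipped, and the oldest `trim` fraction of the sample is ignored.
    """
    ages = []
    for value in date_headers:
        try:
            posted = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            continue  # malformed Date: header
        if posted.tzinfo is None:
            # Assumption: treat dates without a usable zone as UTC.
            posted = posted.replace(tzinfo=timezone.utc)
        age = (now - posted).total_seconds() / 86400
        if age >= 0:  # ignore articles dated in the future
            ages.append(age)
    if not ages:
        return None
    ages.sort()
    # Index of the oldest article that survives trimming.
    keep = max(0, len(ages) - 1 - int(len(ages) * trim))
    return ages[keep]
```

Sampling several newsgroups and comparing the per-group results gives a better picture than a single group, since retention may differ between groups.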
It can be difficult for end users to measure the retention of a server accurately. One common method is to examine the Date: headers of the oldest articles in a group, but this is not always accurate. Some articles in a group may be retained for longer than others, articles from remote servers do not always arrive promptly, and at times the date headers are simply incorrect. A sampling of many or all articles, preferably in more than one newsgroup, is required to detect such anomalies.

COMPLETION

Given the large number of articles transferred between servers and the large size of individual articles, their complete propagation to any one server farm is not guaranteed. The term "completion" describes how well a service is keeping up with the traffic. The primary obstacle to calculating a completion percentage is not knowing how many articles were posted. Looking at only one server, one cannot know how many articles were actually inserted throughout the network. Articles may never make their way outside the originating server, or may fail to find their way out to the transit cloud. Very large articles are frequently dropped and tend to propagate less well than smaller ones.

One way to measure completion is to access multiple servers and retrieve lists of articles. Because Message-ID: headers are nominally unique throughout the network, comparing the lists is mostly a straightforward task. Practical limitations of this type of measurement include the impossibility of obtaining lists from all servers worldwide, the fact that many servers filter out spam or employ Usenet Death Penalties, and the fact that some servers mask incompletion by hiding multipart binary sets with missing articles. It is also necessary to take propagation times and retention into account: an article may simply not yet have arrived at a given server, or it may have been present but already expired. |
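The Message-ID comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive measurement tool: the function name `completion_report` and the server names in the usage example are hypothetical, and the lists of Message-IDs are assumed to have been collected already. The union of all lists stands in for "articles posted", which is only a lower bound, since no set of servers is guaranteed to have seen every article.

```python
def completion_report(listings):
    """Compare Message-ID lists gathered from several servers.

    `listings` maps a server name to an iterable of Message-ID strings.
    Returns a dict mapping each server name to a tuple of
    (completion percentage relative to the union, sorted missing IDs).
    """
    ids = {name: set(msgids) for name, msgids in listings.items()}
    # Every Message-ID seen on at least one of the sampled servers.
    union = set().union(*ids.values())
    report = {}
    for name, seen in ids.items():
        pct = 100.0 * len(seen) / len(union) if union else 100.0
        report[name] = (pct, sorted(union - seen))
    return report


if __name__ == "__main__":
    # Hypothetical server names and Message-IDs, for illustration only.
    sample = {
        "server-a": ["<1@example>", "<2@example>", "<3@example>"],
        "server-b": ["<1@example>", "<3@example>", "<4@example>"],
    }
    for name, (pct, missing) in completion_report(sample).items():
        print(f"{name}: {pct:.1f}% complete, missing {missing}")
```

A comparison of this kind says nothing about why an article is missing. Spam filtering, expiry, and slow propagation all look the same in the output, so the lists should be limited to a time window that every sampled server still retains.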
|
|