Archive:Hardware and hosting report/Archives 2004

2004 Q4 Report

Technical Development

Most of the report below was written by James Day; the part on the Paris machines is largely by David Monniaux.

Information about our servers may be found at any time at Wikimedia servers. Developer activity falls into two main areas: server maintenance and development of the MediaWiki software, which is also used for many non-Wikimedia applications. Most developers (though not all, by their choice) are listed here. You can show appreciation for their dedication with thank-you notes or financial support. Thank you!
Until now, all developers have been working for free, but that may change in the future in order to support our amazing growth.

Installation of Squid caches in France

The cluster near Paris. Our servers are the three in the middle (from top to bottom: bleuenn, chloe, ennael).

On December 18, 2004, three donated servers were installed at a colocation facility in Aubervilliers, a suburb of Paris, France. They are named bleuenn, chloe, and ennael at the donor's request. For the technically minded, the machines are HP sa1100 1U servers with 640 MiB of RAM, 20 GB ATA hard disks, and 600 MHz Celeron processors.

The machines are to be equipped with Squid caching software. They will be a testbed for the technique of placing web caches nearer to users in order to reduce latency. Typically, users in France on a DSL connection can reach these machines with about 30 ms of latency, while reaching the main cluster of Wikimedia servers in Florida takes about 140 ms. The idea is that users in parts of Europe will use the Squid caches in France, cutting access delays by roughly a tenth of a second, both for multimedia content for all users and for page content for anonymous users. Logged-in users will not benefit as much, since pages are generated specifically for them and are therefore not cached across users. If a page is not in a Squid cache, or is for a logged-in user, the Apache web servers need from a fifth of a second to three or more seconds, plus database time, to build the page. Database time is about a twentieth of a second for simple requests, but can be many seconds for categories, or even 100 seconds for a very large watchlist.
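
As a rough illustration of the request path described above, the sketch below contrasts a nearby cache hit with a full trip to Florida. All names and timings are illustrative assumptions drawn from the figures in this report, not the actual Squid or MediaWiki configuration.

# Rough, hypothetical sketch of how a nearby Squid cache shortens the
# response path. Timings are illustrative, taken from the report above.

# Approximate delays (seconds) seen by a French DSL user.
RTT_PARIS_CACHE = 0.030   # reaching the Squid machines near Paris
RTT_FLORIDA     = 0.140   # reaching the main cluster in Florida
APACHE_RENDER   = 0.2     # ~1/5 s to build a page (can be 3 s or more)
DB_SIMPLE_QUERY = 0.05    # ~1/20 s of database time for simple requests


def estimated_response_time(logged_in: bool, cached_in_paris: bool) -> float:
    """Very rough response-time estimate for a user near Paris."""
    if not logged_in and cached_in_paris:
        # Anonymous page views and media can be answered by the nearby cache.
        return RTT_PARIS_CACHE
    # Logged-in views and cache misses must travel to Florida, where an
    # Apache server rebuilds the page with help from the database.
    return RTT_FLORIDA + APACHE_RENDER + DB_SIMPLE_QUERY


if __name__ == "__main__":
    print("anonymous, cache hit :", estimated_response_time(False, True), "s")
    print("anonymous, cache miss:", estimated_response_time(False, False), "s")
    print("logged in            :", estimated_response_time(True, True), "s")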

The Telecity data center

The Squid caches were activated in early January 2005, and an experimental period ensued. As of January 31, the machines cache English, French, and multimedia content for Belgium, France, Luxembourg, Switzerland, and the United Kingdom. The system is still somewhat experimental, and it is expected that caching performance can be increased with some tuning. The installation of similar caching clusters in other countries is being considered.

Installation of more servers in Florida

In mid-October, two more dual Opteron database slave servers, each with six drives in RAID 0 and 4 GB of RAM, plus five 3 GHz/1 GB RAM Apache servers, were ordered. Delays due to compatibility problems, which the vendor had to resolve before shipping the database servers, left the site short of database power; until early December, the search function had to be turned off at times.

In November 2004, five web servers, four of them with high RAM (working memory) capacity used for Memcached or Squid caching, experienced failures. This resulted in very slow wikis at times.

Five 3 GHz/3 GB RAM servers were ordered in early December. Four of these will provide Squid and Memcached service as improved replacements for the failing machines until those are repaired. One machine with SATA drives in RAID 0 will be used as a testbed to see how much load such less costly database servers can handle, as well as providing another option for a backup-only database slave that also runs Apache. These machines include a new option, a remote power and server health monitoring board, at US$60 extra. The option was taken for this order to allow a comparison of this monitoring board with a remote power strip and more limited monitoring tools. Remote power and health reporting reduces the need for colocation facility labor, which can involve costs and delays.

A further order of one master database server, plus five more Apaches, is planned for the end of the last quarter of 2004 or the first days of 2005. The new master will permit a split of the database servers into two sets, each consisting of a master and a pair of slaves and each holding about half of the project activity. This order will use the remainder of the US$50,000 from the last fundraising drive. The split will halve the amount of disk writing each set must do, leaving more capacity for the disk reads needed to serve user requests. It is intended to happen in about three months, after the new master has proved its reliability during several months of service as a database slave.
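
To make the effect of the planned split concrete, here is a minimal, hypothetical sketch of routing each wiki's traffic to one of two master/slave sets. The server names and the hash-based assignment are assumptions for illustration only and do not reflect MediaWiki's actual load-balancing code.

import hashlib
import random

# Hypothetical layout: two database sets, each with one master and two
# slaves, each set handling roughly half of the project activity.
DB_SETS = {
    0: {"master": "db-master-a", "slaves": ["db-slave-a1", "db-slave-a2"]},
    1: {"master": "db-master-b", "slaves": ["db-slave-b1", "db-slave-b2"]},
}


def db_set_for(wiki: str) -> dict:
    """Assign each wiki to one of the two sets (illustrative hash split)."""
    digest = hashlib.md5(wiki.encode()).digest()
    return DB_SETS[digest[0] % 2]


def server_for(wiki: str, write: bool) -> str:
    """Writes go to the set's master; reads are spread across its slaves."""
    db_set = db_set_for(wiki)
    return db_set["master"] if write else random.choice(db_set["slaves"])


if __name__ == "__main__":
    print(server_for("enwiki", write=True))   # master of enwiki's set
    print(server_for("frwiki", write=False))  # one of frwiki's slaves

Because each master now receives writes for only about half of the wikis, each set does roughly half the disk writing it would otherwise do, which is the capacity gain the report describes.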

Increased traffic and connectivity

Traffic grew during the third quarter of 2004 from 450 requests per second at the start of the period to 800 per second at the end. In the early fourth quarter of 2004, it rose further, with daily peak traffic hours exceeding 1,000 requests per second ([1]). Average bandwidth use grew from 32 megabits per second (Mbps) at the start of the fourth quarter of 2004 to 43 Mbps at the end. Typical daily highs were 70 Mbps, sometimes briefly hitting the 100 Mbps limit of a single outgoing Ethernet connection. To deal with this traffic, dual 100 megabit connections were used temporarily, a gigabit fiber connection was arranged at the Florida colocation facility, and the required parts were ordered.


2004 Q3 Report

Technical Development

It has been an exciting year so far on the technical side. We started with two servers in California and an Alexa traffic rank of 900 [2]. In February, the site moved to Tampa, Florida, and added nine new servers. Three more servers entered service in early June, and a fourth fast and sexy database server, Ariel, followed at the end of the month. After each upgrade, the number of people using the site rose to fill the available capacity of the new servers. As of the start of September, eight more web servers are in service, with special search and file servers awaiting installation.

As of September, Wikipedia.org consistently ranks in the top 500 English-language sites in Alexa's traffic rankings [3], and it is steadily increasing its reach. In June we saw nearly a million edits. So far, we have avoided the sluggish performance experienced at the end of 2003. Thanks to those whose donations have made it possible to keep up.

May saw the introduction of version 1.3 of the MediaWiki software, with improved templates, categories, a new site skin, and improved language support. Edit conflict handling was also improved significantly with automated merging when using section editing. Version 1.4, due in a few months, will include better database load balancing, speed improvements, preliminary support for PostgreSQL as a database engine, and tools to help with article reviewing.

Entering service soon will be the first Wikimedia hardware outside the United States: a set of three servers acting as a Squid cache in Paris. This will serve pages to users in parts of Europe, so they will not need to wait for pages to come from Florida. Once the cache is working well, we expect to do the same in other places, as offers of hosting allow.

The new developer committee illustrates the international nature of the technical team, with members from six countries, who will be working to keep up with the continued growth of our projects.

Purchases

Donations from July and August, plus money raised in late December (during and immediately following a major server crash and downtime), were used to purchase over US$60,000 worth of new hardware (see [4] and [5]).