Announcing a book for download on Slashdot

Summary

A Slashdot story generated approximately 10 times as many downloads of my book's pdf as all the stories and announcements posted before it combined. It is estimated that this Slashdot story generated between 19,000 and 37,000 downloads of the book's pdf file, consuming between 150 GB and 250 GB of bandwidth.

Introduction

Having decided to make my book available as a free download, I wanted to publicize its availability. However, I did not want any web site hosting the pdf to be temporarily disconnected from the internet because of excessive bandwidth usage.

I expected any announcement on Slashdot to generate a huge demand for bandwidth (the so-called Slashdot effect) and decided to post an announcement to Slashdot last (i.e., after posting announcements to all other applicable sites). While there have been some reports that the impact of the Slashdot effect has diminished recently, I did not want to take any chances. I hoped that announcing the availability of the pdf file on a variety of forums over a two week period would take the sting out of the Slashdot bandwidth demand.

Where to host the pdf file?

Part of my cable modem package from NTL includes a web site. While there is a 55 MB limit on disc usage, the terms and conditions do not list any bandwidth limits. I had not previously used this web site, so having a bandwidth cap suddenly introduced would not cause me any problems. I therefore decided to use the address homepage.ntlworld.com/dmjones/ in all announcements.

One annoying consequence of using the ntlworld site is that no log of file accesses or visitor information is provided, so I would have no way of knowing how many downloads of the pdf file had occurred from this site.

I posted a request to the ACCU general mailing list asking if anybody had any bandwidth to spare. Chris Wright came forward and offered the use of some of the bandwidth he had available to him.

The pdf file was also available from various other web sites. These sites had enough bandwidth to handle any downloads that were likely to occur (e.g., from people finding the site via a Google search).

Time-line

The following is a rough guide to the announcements that I posted to various Internet sites during 2005.

Download Statistics

A number of general points could be extracted from the access logs (excluding the ntlworld site, which did not provide logs). The following is a breakdown by the sites that hosted the book's pdf file.

Chris Wright

To date Chris has logged 12,009 unique visitors (approximately 11,500 referred by Slashdot) consuming approximately 88.7 GB of bandwidth.

Chris detected two people trying to hack into the server hosting the pdf file. Both of them downloaded the document, looked around various directories, and then tried a few port scans on the web server, as well as a few brute force attacks. They weren't too clever about hiding their IP addresses, and this behavior was reported to their respective ISPs.

mirrordot

Mirrordot automatically mirrors the pages that appear in Slashdot stories, including pdf files. This service provides a means of advertising their colocation, hosting, and security services.

Once I knew that my story had been accepted by the Slashdot editors I regularly checked to see whether it had been posted to the main web page (there can be a delay of several hours between a story being accepted and it appearing on the web site). I was lucky enough to catch the story before too many comments had been posted and was able to post a comment that included the link to the mirrordot mirror. This comment was soon moderated up to level 5 and appeared as the first comment after the story itself.

Jay Jacobson, one of the creators of mirrordot, reported that during the first 24 hours there had been 1,987 downloads of the pdf file and approximately 16 GB of bandwidth had been consumed.
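
The bandwidth figures reported for Chris Wright's server and for mirrordot can be sanity-checked against their download counts. The following is a minimal sketch that divides one by the other; it assumes decimal units (1 GB = 1,000 MB) and one complete pdf transfer per unique visitor or download, neither of which is stated in the logs.

```python
# Average transfer per download, using only figures reported in this article.
# Assumptions (not stated in the logs): decimal units (1 GB = 1,000 MB) and
# one complete pdf transfer per unique visitor or download.
def mb_per_download(gigabytes: float, downloads: int) -> float:
    """Average megabytes transferred per download."""
    return gigabytes * 1000 / downloads


chris = mb_per_download(88.7, 12_009)   # Chris Wright's server, to date
mirrordot = mb_per_download(16, 1_987)  # mirrordot, first 24 hours

print(f"Chris Wright: {chris:.1f} MB per visitor")
print(f"mirrordot:    {mirrordot:.1f} MB per download")
```

Both averages come out at roughly 7 to 8 MB, so the two sites' figures are at least consistent with each other; repeated or partial downloads would push these averages away from the pdf's actual size.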

The following shows how total demand for bandwidth, by mirrordot, varied over time (the book's pdf was the only file consuming significant amounts of bandwidth during this time).

mirrordot load

ntlworld

Without any access logs it is not possible to give an accurate number for the downloads from this site generated by the Slashdot story. I have not been able to find any information on the bandwidth capacity of the ntlworld web sites; I assume the available capacity is shared between a large number of customer web sites.

The ntlworld address was given as the primary point of download, with the address of an available mirror (supplied by Chris Wright) next to it. Experience suggests that Slashdot readers are very familiar with the Slashdot effect and may try to limit the load on the primary site by selecting one of the available mirrors.

Did the ntlworld address attract significantly more than the 11,500 visitors to Chris's site? If readers experienced problems or slow download speeds with the primary site, how likely are they to try one of the mirror sites? In the hours following the publication of the story on Slashdot I was able to download the pdf file at the full rate supported by my 2 Mbit link. But then I am an NTL customer connected directly to the NTL network, as are nearly 1.5 million broadband customers in the UK.

It is possible that the ntlworld site successfully processed twice as many visitors as Chris's site (i.e., 23,000), and it is also possible that, because of capacity problems, it successfully processed only half as many (i.e., 5,750).

BitTorrent

BitTorrent is a tool/protocol for cooperative distribution that has become very popular. Users of BitTorrent form a network where it is possible for everybody who has a copy of a file to share in its distribution to others who request a copy. The advantage of using BitTorrent is that bandwidth demands are distributed over all users who have a copy of the file.

The potential disadvantage of using BitTorrent is that downloads may take significantly longer (it depends on how many other people in the network have a copy of the requested file). When distributing very large files (e.g., ISO images), BitTorrent may be the only method of download. I expected Slashdot readers, given the choice, to try non-BitTorrent methods of download first. I also believed there was sufficient bandwidth available to meet these demands and so did not provide a BitTorrent link.

Thirty-five minutes after the story was posted, Michael Williamson set up and seeded a BitTorrent link and posted it in a comment. While this comment was quickly moderated to level 5, it was not listed close to the text of the original story. In the five days after the story appeared, the .torrent file was accessed 274 times and there were 165 complete downloads.

knosof.co.uk and coding-guidelines.com

The book's web page is hosted on my company's web site. Information is slowly being migrated to a site dedicated to coding guidelines. To ensure that the Slashdot effect did not result in access to either site being denied because of excessive bandwidth usage, I decided that neither site would be featured in the story submitted to Slashdot.

The address of the book's web page was posted in a comment to the story and quickly moderated to level 5; however, this comment was not near the top of the list of posted comments. Traffic to the knosof.co.uk site did increase significantly, and the following graph shows the dramatic increase in page hits that occurred. The increase in bandwidth was less dramatic, and the total amount of traffic did not come close to exceeding any limits.

knosof.co.uk load

For some reason one Slashdot reader posted a comment listing my recently started blog as the book's web page. Although this comment was posted earlier than the one giving the book's current web page address at knosof.co.uk, it was not subsequently moderated to a high enough level to be prominently listed. The bandwidth used by the coding-guidelines.com site approximately doubled throughout the week following the Slashdot story. Given that this web site only came into existence in June and was not yet widely known, this increase in bandwidth was not exceptional.

An average story?

New stories can appear on Slashdot at any time. During a busy day many stories may appear in quick succession, while on a quiet day there may be few new stories posted. It is possible that the number of people reading a new story will be affected by the presence of other newly posted stories.

The stories that appeared immediately before the book pdf story were posted 2 hours 32 mins, 1 hour 14 mins, and 35 mins before it. The stories that appeared immediately after it were posted 47 mins, 1 hour 20 mins, 2 hours 7 mins, 2 hours 51 mins, and 3 hours 16 mins after it. These figures are reasonably typical of the intervals that occur between stories appearing on Slashdot. The bandwidth and hit-count graphs appearing above show that traffic volume comes down from its peak within a few hours.

The number of comments posted, at just over 400, was towards the upper end of the level of commenting usually seen.

Downloads generated by other announcements

The following discussion considers those sites that are likely to have generated in the region of 1,000 downloads.

The Inquirer

The Inquirer is a widely read, Internet-based IT news and gossip publication. It published quite a long story (compared to the Slashdot one that appeared two weeks later) that gave the background to the release of the book's pdf and was humorous, rather than technical, in content. At the end of the story a link to the ntlworld copy of the pdf file was given (so no access log is available).

The Inquirer covers a wide range of IT related subjects, with only a small percentage of its stories directly related to software development, so it is to be expected that its readers have a wide range of IT related backgrounds, with only a small percentage being software developers. The Inquirer displays a list of the five most popular current stories down the left-hand side of every page. When it first appeared, the C book story did not make it into this top five list (or if it did, the period of time was short enough that your author did not see it). However, a few hours after the C book story (which included a link to the Inquirer story) appeared on Slashdot, the original Inquirer book story appeared at number five on the list. The obvious conclusion is that significantly more people read the Slashdot story and clicked on the link to the Inquirer story than read the original Inquirer story when it first appeared on that site.

E-mail provides a possible method of estimating the number of downloads made by readers of the Inquirer story. Some of the people who had started to read the book sent me an e-mail (some thanking me for making it available, others pointing out typos). The Inquirer story resulted in one e-mail from a person who had downloaded the pdf; I also received one e-mail from somebody who had seen one of the newsgroup announcements and downloaded the pdf; the Slashdot story generated 30 e-mails. Can we infer that the Inquirer story generated 1/30th of the downloads of the Slashdot story? A single e-mail does not have much statistical significance; two e-mails would imply a ratio of only 1/15th.

Based on this analysis, it is likely that the story in the Inquirer resulted in around 1,000 downloads of the pdf file.
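
The e-mail argument amounts to scaling the Slashdot download range by the ratio of e-mails received. The following is a minimal sketch of that arithmetic, using the 19,000 to 37,000 Slashdot range from the Summary and assuming (hypothetically) that the same fraction of downloaders send an e-mail whichever site referred them.

```python
# Scale the Slashdot download range by the ratio of reader e-mails received.
# Hypothetical assumption: the fraction of downloaders who send an e-mail is
# the same whichever site referred them.
SLASHDOT_RANGE = (19_000, 37_000)  # estimated Slashdot downloads (Summary)
SLASHDOT_EMAILS = 30


def scaled_estimate(emails: int) -> tuple[int, int]:
    """Download range implied by receiving `emails` e-mails from one source."""
    ratio = emails / SLASHDOT_EMAILS
    low, high = SLASHDOT_RANGE
    return round(low * ratio), round(high * ratio)


for emails in (1, 2):
    low, high = scaled_estimate(emails)
    print(f"{emails} e-mail(s): {low:,} to {high:,} downloads")
```

One e-mail gives roughly 600 to 1,200 downloads, in line with the estimate of around 1,000; a second e-mail would double the figure, which is why a single e-mail carries so little weight.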

News groups

I had expected the announcements posted to various newsgroups to generate a significant number of downloads. The announcements listed the ntlworld and Chris Wright sites as the primary download sites. The site c0x.coding-guidelines.com was listed as a source of C0X information (a link to the book's pdf file appears on this site's index page).

During the period June 28 to July 4 the logs provided by Chris Wright show 498 unique visitors and 3.6 GB of bandwidth used. The coding-guidelines.com logs show approximately 200 unique visitors during this period. Without a log of accesses to the ntlworld site it is difficult to estimate the total number of downloads occurring as a direct result of these posts. Numbers similar to those seen by Chris Wright would seem to be reasonable estimates.

Blogs

A number of people included information on the availability of the book's pdf in their blog. Six blogs referred more than 100 visitors to sites for which logs are available, with lambda-the-ultimate.org being the top referrer at over 250. References to the book also appeared in some unusual forums.

Non-English Slashdots

There are many people in the world who do not get their technical news from web sites written in English. It is likely that a story appearing on Slashdot will eventually be translated, probably edited, and posted to a non-English Slashdot-like site. For instance, a story appeared on the Russian Linux User Group site nine days after the Slashdot story appeared. The author of this story referenced the book's web page directly (until Google supports a Russian translation I don't know what else the story said). The following graph shows the impact of the story on page accesses.

load from www.linux.org.ru

While stories have appeared on other non-English web sites, accesses made via these stories have occurred as a steady stream over a period of time (a week or so).

Download totals

The following are the numbers of downloads believed to have occurred as a direct result of the story appearing on Slashdot.

Chris Wright         11,500
ntlworld              5,750 to 23,000 (estimated bounds)
mirrordot             2,000
coding-guidelines.com   200
BitTorrent              165

The following are estimates of the number of downloads attributed to the Inquirer story, newsgroup announcements, and entries in people's blogs in the two weeks before the story appeared on Slashdot.

The Inquirer      500-1,000
News groups     1,200-1,600
Blogs             700-1,100
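
The two tables can be summed to see how they relate to the figures quoted in the Summary and Conclusion. The following is a minimal sketch that treats each row as a (low, high) range, with single figures given equal bounds, and compares the midpoints of the two totals; the choice of midpoints is an assumption made for illustration.

```python
# Sum the (low, high) download bounds from the two tables above.
# Single-figure rows are treated as a range with equal bounds.
slashdot = {
    "Chris Wright": (11_500, 11_500),
    "ntlworld": (11_500 // 2, 11_500 * 2),  # half to twice Chris Wright's count
    "mirrordot": (2_000, 2_000),
    "coding-guidelines.com": (200, 200),
    "BitTorrent": (165, 165),
}
other = {
    "The Inquirer": (500, 1_000),
    "News groups": (1_200, 1_600),
    "Blogs": (700, 1_100),
}


def total(rows: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return the summed lower and upper bounds of every row."""
    return (sum(low for low, _ in rows.values()),
            sum(high for _, high in rows.values()))


s_low, s_high = total(slashdot)
o_low, o_high = total(other)

# Compare the midpoints of the two ranges (an illustrative choice).
s_mid = (s_low + s_high) / 2
o_mid = (o_low + o_high) / 2
ratio = s_mid / o_mid
share = s_mid / (s_mid + o_mid)

print(f"Slashdot: {s_low:,} to {s_high:,}")
print(f"Other:    {o_low:,} to {o_high:,}")
print(f"Midpoint ratio {ratio:.1f}x, Slashdot share {share:.0%}")
```

The Slashdot sum (19,615 to 36,865) matches the rounded 19,000 to 37,000 range in the Summary, and the midpoints give roughly a ninefold difference, with Slashdot accounting for about 90% of downloads. Comparing the extremes of the ranges instead spreads the ratio much wider, from about 5x to 15x.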

January 2009 Update

The book is now big in China! A blog entry resulted in around 3,000 downloads over a week. A Google translation reveals that the blogger translated some key sections into Chinese and also thinks I have the "heart of Buddha". The shape of the download curve was similar to that seen with Russia. While China has an internet population of similar size to that of the US, its users are obviously not native speakers of English. The number of downloads was around 10% of the US figure over a comparable time scale, suggesting that there are a lot of software developers who think their English is good enough to read a programming book written in English.

Conclusion

Over the two week period of announcements it became obvious that I had wildly overestimated the number of people reading the non-Slashdot forums that I was posting to. I had expected at least 10,000 downloads of the pdf file, but an estimated 1,500 actually occurred.

Leaving the posting of a story to Slashdot last is a dangerous plan that has little benefit. It is dangerous in that somebody else might read an announcement posted to another site and decide to post a story to Slashdot themselves. The evidence from this developer-oriented book's series of announcements is that Slashdot is likely to generate over 90% of the downloads that occur in the first month or so after it is made available. By posting a story to Slashdot first, an author is able to control when it appears (to within a few hours) and the primary links from which the book can be downloaded.

Feedback

Please send any feedback to cbook "at" knosof dot co dot uk

