A way to be public but not really public on the Internet…
I was in the process of writing a robots.txt file for my website when I came across an interesting robots file for the White House.
For those who don’t know what a robots file is, it’s a special file placed in a website’s root directory which tells ‘crawlers’ such as Google, MSN, Yahoo! and others which parts of the website they may crawl and index and which they may not. This file is usually set to disallow only directories that are private and shouldn’t be publicly indexed.
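To make that a bit more concrete, here is a minimal sketch of how a well-behaved crawler might read a robots file, using Python's standard `urllib.robotparser` module. The robots.txt content, the `example.com` URLs, the `ExampleBot` name and the `is_allowed` helper are all made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/index.html",
        "https://example.com/private/notes.html",
    ):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

Worth remembering that the file is only a request: polite crawlers honour it, but nothing in it actually stops anyone from fetching a page.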
To my surprise (though really it was fairly par for the course), the whole of the White House website seems to be deliberately set up so that it cannot be mirrored by projects such as the Internet Archive. Indeed, a look at how the White House website appeared over the years suggests that the robots file must have been altered sometime after 11th Feb 2005.
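As a rough illustration of how a site could shut out an archiving crawler while leaving everyone else alone, here is a small sketch using Python's standard `urllib.robotparser`. To be clear, this is not the White House's actual file: the rules below are invented, `ia_archiver` is assumed to be the user-agent name the archive's crawler answers to, and `example.com` and `can_crawl` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Invented rules: block one named crawler from the whole site and allow all
# others. "ia_archiver" is an assumed user-agent token for an archive crawler.
ARCHIVE_BLOCKING_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Check the invented rules above for a given crawler name and URL."""
    parser = RobotFileParser()
    parser.parse(ARCHIVE_BLOCKING_ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/2005/02/statement.html"
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, can_crawl(agent, page))
```

An empty `Disallow:` line means "nothing is off limits", so in this sketch only the named crawler is turned away.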
So why is this a big deal, you may ask… Well, quite simply, it means that anything published on the White House website can be altered without any automated way of tracking the changes. Basically, you can rewrite the content history of a website… As far as I’m aware, no reason has been given for why this change occurred, so I guess we can only speculate with various conspiracy theories!
Check out the parts of the website you’re not allowed to index in their robots.txt file.