Extrapolating in the other direction, our future generations may also learn about us not just from our writings and conscious creations, but by reading between the lines and looking at the overall media experience we go through. Undoubtedly, the medium that defines our time is the Web. So it is natural to assume that future generations would like to know not only the content of our websites, but how exactly we experienced and interacted with them. For example, they may be curious about the phenomenon called Facebook, and would like to know what it was and what we did with it.
Throughout our history, especially in recent times, we have, in most cases, been able to save and archive our intellectual creations. Books were stored in libraries, music was passed on or recorded, paintings were saved or reproduced, and in the recent era even plays were recorded on film and stored. Today, an unprecedented number of human beings are involved in the creation of web content, so we must ask the question, are we archiving our work?
My claim is that we are not, and as things stand today, we are incapable of doing so. Because of some peculiarities of the technologies we use today to build these websites, it will be practically impossible for anyone, even a decade from now, to see how today's websites looked and behaved. We will have a few screenshots, we will have anecdotal descriptions, and we may have the actual textual and media content, but we will not be able to experience the working site.
In this article I’ll try to explain why that is the case, why it matters, and whether there is anything we can do to change that.
Web technology in a nutshell
The problem is that all of these underlying technologies are changing constantly, and unless the browser’s technology matches what the website server is expecting, the site will not render correctly. The browsers try their best to remain backward compatible and to display older websites as well as possible, but they are fighting a losing battle. A browser today has a much better chance of displaying a website that was cutting edge ten years ago than a website that used the latest technology available just three years ago. This problem is likely to become more acute as the rate of technology change accelerates.
As we all know, all websites change over time, as does any other media outlet: magazines, television, radio. However, there is a subtle difference between the web and any other medium. If you want to see an old copy of a magazine, you just have to pull out the old copy, and there it is, advertisements and all. If you want to see an old TV program, you just have to record it, store it somewhere, and then play it back. For a website, however, the state of an older webpage is not saved anywhere. If you want to see your own Facebook page from last year, there is no easy way to do so. The website design and content are changing constantly, and like a flowing river, what happened before has flowed by. In fact, if you want to see the page you just saw a moment ago on the New York Times, you cannot really do that, because if nothing else, you are likely to see the same news story with different advertisements, and maybe a few new user comments added to the story.
Though not as easy as recording a TV program, it is possible to “record” a web page. That is, technologies are available that can store the coded HTML page that the web server sent out. This code can then be played back later, and a suitable browser can render that page exactly as it appeared the first time.
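To make the idea of “recording” concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular archiving tool works: the record layout and the function names make_snapshot and replay are my own assumptions. It packages the HTML a server sent together with its address, the capture time, and a checksum, so that a later playback can confirm the page is unchanged.

```python
import hashlib
import json
from datetime import datetime, timezone


def make_snapshot(url: str, html: str, captured_at: datetime) -> dict:
    """Package one server response as a self-describing record (hypothetical layout)."""
    return {
        "url": url,
        "captured_at": captured_at.isoformat(),
        "sha256": hashlib.sha256(html.encode("utf-8")).hexdigest(),
        "html": html,
    }


def replay(record_json: str) -> str:
    """Return the stored HTML, refusing records that were altered after capture."""
    record = json.loads(record_json)
    digest = hashlib.sha256(record["html"].encode("utf-8")).hexdigest()
    if digest != record["sha256"]:
        raise ValueError("snapshot does not match its checksum")
    return record["html"]


# Record a page once, then play it back from the stored text alone.
page = "<html><body><h1>Headline</h1></body></html>"
stored = json.dumps(
    make_snapshot("http://example.com/", page,
                  datetime(2012, 11, 16, tzinfo=timezone.utc))
)
assert replay(stored) == page
```

Note what this captures: a single response, frozen at one moment. Nothing in the record knows how the page would have reacted to a click.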
First problem: dynamic content
Now, if the page includes any dynamic component, as most modern websites do, then every time the user interacts with the page, the web server must respond with exactly the same responses as it did the first time. This may happen if the recorded page code is played back almost immediately or even within a few days, but it is highly unlikely that the website design will remain static for long. Eventually, the new website will no longer respond correctly when these dynamic requests are made. It is theoretically conceivable to also save some of these dynamic responses, but in practice that would be very hard to do, since many of these responses are reactions to how the user interacts with the page, and how can we predict and save all possible user actions?
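A small Python sketch may help show why saving the dynamic responses only goes so far. The class ReplayArchive and the request paths below are hypothetical, invented for this illustration. The archive can answer only the requests that happened to be made while recording; any interaction nobody performed at capture time has no stored answer.

```python
class ReplayArchive:
    """Stores the server responses seen during one recorded session."""

    def __init__(self):
        self._responses = {}

    def record(self, method: str, path: str, response: str, body: str = "") -> None:
        # The full request (method, path, body) is the lookup key.
        self._responses[(method, path, body)] = response

    def replay(self, method: str, path: str, body: str = "") -> str:
        key = (method, path, body)
        if key not in self._responses:
            raise LookupError(f"no recorded response for {method} {path}")
        return self._responses[key]


archive = ReplayArchive()
# During capture, the user looked at the first page of comments only.
archive.record("GET", "/comments?page=1", '["first comment"]')

# Playing back the same interaction works.
assert archive.replay("GET", "/comments?page=1") == '["first comment"]'

# An interaction nobody performed during capture cannot be answered.
try:
    archive.replay("GET", "/comments?page=2")
except LookupError as missing:
    print(missing)
```

The gap grows quickly. A page offering 10 possible actions at each step has 10 to the power 5, or 100,000, distinct sequences of just five actions, and every one of them may produce a different response.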
Second problem: changing web technologies
Third problem: changing hardware
Let’s say we saved copies of all the necessary software along with saved copies of the web page. Even after all that, it is unlikely that they will run on future computer hardware. So the only viable solution is to actually keep a computer from the same era, unchanged, and then try to reload the saved web page code on that machine. This is a sound solution in principle, but for all practical purposes it is almost impossible to carry out. We would have to keep a computer from every period in history and use the right machine to play back websites from that era.
Even if we are willing to do that, what is the likelihood that a physical computer that we archive today will actually work when plugged in 50 years from now? Do we then also keep a storehouse full of spare parts from each period?
The above discussion makes it clear that even if we can solve all the technical hurdles, doing so will have an enormous cost. Since the historical value of this can only be appreciated by our future generations, it is unlikely anyone today will be able to justify the enormous cost of creating such an extensive hardware and software archive. The value to us, today, of being able to see a Facebook page from two years ago is too small to justify any major effort. That probably explains why we have not done much towards solving this problem.
There are a number of web archives, some operating since the early days of the web (e.g. http://archive.org/web/web.php). They try to store copies of the HTML page and the associated media files. While this is of enormous value, they have not been able to solve the problems discussed above. As a result, they are more successful at rendering older web pages than more recent ones.
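For readers who want to try this themselves, here is a small Python sketch of how a capture is usually addressed in the Internet Archive's Wayback Machine. To the best of my knowledge, captures are reached through a URL that prefixes the original address with a 14-digit timestamp, and the archive serves the capture nearest to that time. Treat the exact pattern as an assumption and check the archive's own documentation; the function name wayback_url is mine.

```python
from datetime import datetime


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a timestamp-prefixed Wayback Machine address (assumed URL pattern)."""
    timestamp = when.strftime("%Y%m%d%H%M%S")  # YYYYMMDDhhmmss
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


print(wayback_url("http://example.com/", datetime(2012, 11, 16)))
```

The address only identifies a stored copy of the page code and media files. Whether that copy still renders and behaves as it originally did is exactly the problem described above.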
One possible solution is to store the functioning web page experiences as video streams. That is, to create an automated system that exercises a website by interacting with it the way a human would, and records the screens as a video stream. There are two major technical hurdles here: (a) it is a non-trivial problem to create an artificial intelligence program that can interact with the page in a sensible and human manner, and (b) with the new range of human-machine interaction through touch gestures, physical movement, and voice, it is not at all easy to simulate a human user’s actions.
Therefore, even if we can build such a system, it will only be a rough approximation of what the users actually experienced. Moreover, the viewer of this video stream will get only a passive view of what the website looked like, and will not be able to actually experience the interaction that the original users had.
What is truly remarkable is that this is the first time in modern history that we, as a race, are spending so much time on an activity (both the creation and the use of websites), and yet we have no practical means of, or desire for, archiving it for posterity. That is how we have always dealt with our real-life experiences: we could never archive them, and they existed only in the present and in our memories. Now our creations are also joining that rank. Maybe it is time to recognize that we are in the middle of a fundamental shift, that we are moving into a world where everything that matters is just in the present. Maybe we are transitioning from a society that valued continuity of history to a society that only connects to the present time: a Twitterized world, where nothing needs to be permanent, just a continuous flow of information. We are standing beside a river and watching it flow by, enjoying what is in front of our eyes, with no need to run along the banks to see what floated by before.