The Internet Learned to Forget

Try to find a blog post someone linked you to in 2015. Not the topic — the actual page. Try to dig up a tweet you saw in 2020. Try opening an old bookmark from a laptop you used five years ago.

Go ahead. I'll wait.

For most of the 2000s, the internet's defining feature was that nothing went away. You could stumble onto a GeoCities page from 1998 and it would just sit there, still blinking, still proud. Early Google's genius was surfacing old things. The web was a library where the librarian never threw anything out.

That's not true anymore. The internet is now a river. The stuff floating past you this week is loud and bright. Everything upstream is muddy, and a lot of it is gone.

What the Numbers Look Like

Pew Research Center did the math in 2024, a decade after a Harvard study did the same for legal citations, and it's uglier than most people realize.

What Vanished	By When	Source
25% of all webpages from 2013–2023	Oct 2023	Pew 2024
38% of webpages from 2013 specifically	Oct 2023	Pew 2024
8% of webpages from 2023	Oct 2023	Pew 2024
~20% of posts on X (Twitter)	2024	Pew 2024
11% of Wikipedia's external references	2024	Pew 2024
50% of URLs cited in U.S. Supreme Court opinions	2014	Harvard Law Review
70%+ of URLs cited in Harvard Law Review	2014	Harvard Law Review

Those last two rows should stop you. Legal scholarship, the most footnote-obsessed writing on Earth, loses most of its external links within a few years. If Supreme Court opinions can't hold their citations together, what chance does your Slack thread from 2021 have?

And note the pattern in the top three rows: the older the page, the likelier it's gone, yet 8% of pages from 2023 had already vanished by October of that same year. Some things aren't surviving their first year.

How the Internet Started Forgetting

Three shifts happened at roughly the same time, and together they rewired the web's relationship with the past.

1. Platform extinction events. Yahoo shut down GeoCities in October 2009. Google Reader went in July 2013, Vine in January 2017, and consumer Google+ in April 2019. Tumblr purged its adult content in December 2018. Each shutdown or purge took millions of pages with it. Some were rescued by Archive Team and friends. Most weren't. When a platform dies, its content dies with it, and the links pointing to that content across the rest of the web become tombstones.

2. The timeline replaced the timeline. Twitter switched to an algorithmic feed in 2016. Instagram in 2016. TikTok was built algorithmic from day one. The word "timeline" stopped meaning "chronological record" and started meaning "whatever the algorithm thinks you want right now." Content is no longer organized by when. It's organized by engagement. A brilliant post from 18 months ago is functionally invisible, not because it's gone, but because no feed will surface it.

3. Search started caring about freshness. Google's ranking now weights recency heavily for many kinds of queries. Older content gets demoted, even when it's more accurate. SEO accelerated this: every piece of "evergreen content" eventually gets out-SEO'd by an AI-generated 2026 listicle with better schema markup. The old good page is still up. You just can't find it.

The People Holding the Line

The internet is not forgetting alone. There's a counter-movement, and you've probably used it without thinking.

The Internet Archive was founded by Brewster Kahle in 1996 — before Google existed. The Wayback Machine launched in October 2001. As of October 2025, it has archived over a trillion web pages and 99+ petabytes of data. A small non-profit is holding most of the web's memory on a relatively tight budget, and the rest of us mostly notice this when a link 404s and we reach for web.archive.org/web/* as a reflex.
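
If you want that reflex in script form, here is a minimal sketch of how the lookup URLs are put together. It only builds strings and never touches the network. The function names and the example link are mine, and the URL patterns are the ones I understand the Wayback Machine to accept today, which could change.

```python
from datetime import date

WAYBACK = "https://web.archive.org/web"


def wayback_calendar_url(url: str) -> str:
    """Build the URL of the capture calendar listing snapshots of `url`."""
    return f"{WAYBACK}/*/{url}"


def wayback_snapshot_url(url: str, when: date) -> str:
    """Build a URL asking for the capture nearest to the date `when`."""
    stamp = when.strftime("%Y%m%d")  # Wayback timestamps start with YYYYMMDD
    return f"{WAYBACK}/{stamp}/{url}"


# Hypothetical dead link, used only for illustration.
dead_link = "http://example.com/blog/2015/some-post"
print(wayback_calendar_url(dead_link))
print(wayback_snapshot_url(dead_link, date(2015, 6, 1)))
```

Paste either printed URL into a browser. Whether a capture exists is, of course, the whole problem.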

Then there's Perma.cc, built by Harvard's Library Innovation Lab, specifically so that academic citations stop rotting. Pinboard survives as the bookmarking tool that refuses to die. There are Archive Team volunteers who literally race to rescue platforms before they get shut down. These projects are heroic. They are also structurally overmatched — Pew's data is what the web looks like with those archives running at full tilt.

Why This Matters Right Now

Here's the part that keeps me up at night.

Nearly every large language model on the market is trained on what the internet remembers. If the internet's memory is warping (more recent, less archival, algorithmically flattened), then the model's sense of history warps with it. An LLM's understanding of "the web in 2012" is assembled from the pages that survived to 2023. The surviving pages are not a representative sample. They skew toward institutional sites, Wikipedia, and whatever SEO managed to keep alive.

Each new model generation inherits a shorter, stranger, more homogeneous memory of what the web used to be. The blogs and forums where most of the actually interesting writing happened in 2005–2015 are disappearing faster than the corporate content that replaced them. The signal is fading while the noise stays loud.

This isn't a conspiracy. It's physics. But the cumulative effect is that humanity's primary reference material is now a highlight reel of the last 18 months plus whatever Wikipedia managed to footnote in time.

What You Can Actually Do

None of this is fixable at the individual level. But the individual level is where the small wins are:

Archive before you link. If you cite something in a post, hit web.archive.org/save first. Takes three seconds. Gives the link a second life.
Save stuff you care about locally. Not just bookmark it — download the page, the PDF, the image. Bookmarks are a promise the web doesn't keep.
Link to the canonical URL, not the tracking-parameter one. Survival odds go up.
Pay for the archive services that work. Pinboard, Internet Archive donations. They're running on fumes compared to what they're preserving.
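
The first and third habits are easy to script. What follows is a rough sketch, not a definitive tool: it strips a handful of well-known tracking parameters (my own short list, certainly incomplete) and then builds the web.archive.org/save link for the cleaned URL. Stripping parameters only approximates "canonical"; a page's declared canonical URL may differ. The function names are hypothetical, and nothing here makes a network request.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed, non-exhaustive list of tracking parameters worth dropping.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid", "mc_cid", "mc_eid"}


def strip_tracking(url: str) -> str:
    """Return `url` with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_KEYS
        and not key.lower().startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def save_url(url: str) -> str:
    """Build the Wayback Machine 'save' link for the cleaned URL."""
    return "https://web.archive.org/save/" + strip_tracking(url)


# Hypothetical link copied from a newsletter, for illustration only.
messy = "https://example.com/post?id=7&utm_source=newsletter&fbclid=abc123"
print(strip_tracking(messy))
print(save_url(messy))
```

Open the printed save link in a browser to request a capture. The Archive may rate-limit you or decline a given page, so treat a successful save as likely rather than guaranteed.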

The Thing I Can't Stop Thinking About

The internet used to be criticized for never forgetting. That was the whole "permanent digital record" panic of the 2010s. One embarrassing tweet could follow you forever.

We won. Almost nothing follows you anymore. Not your embarrassing tweet. Not your good blog post. Not the forum thread where you figured something out. Not the comment section where a stranger changed your mind.

The internet didn't stop remembering because we asked it to. It stopped remembering because remembering stopped being profitable, and forgetting scales better. And now the medium we built to be humanity's memory is quietly becoming its opposite.

That's not a glitch. That's the new default. And the weird thing is — almost nobody's talking about it, because the posts saying so wouldn't be in your feed anyway.

Sources

When Online Content Disappears — Pew Research Center, May 2024 — the link rot study with the 25% / 38% / Wikipedia / Twitter numbers
Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations — Harvard Law Review, 2014 — Zittrain, Albert, Lessig on 70%+ reference rot in legal citations
Internet Archive — Wikipedia — history of the Archive, Wayback Machine scale, petabyte counts
Wayback Machine — the primary source, still somehow running
Perma.cc — Harvard Library Innovation Lab — the citation-preservation tool built specifically to answer the 2014 Harvard study