He founded the Internet Archive with a utopian vision. That hasn’t changed, but the internet has

Inside his library, Brewster Kahle is dancing. He smiles as he sways in place, an ancient Victrola filling the lobby of the building, a former church, with the scratchy jazz melodies of the past.

He raises the needle and the music stops, but only for now. Soon his staff will convert the aging record to a string of ones and zeros that will live forever in cyberspace. This is the Internet Archive, and this is why Kahle, and it, are here: to make available for free, online, every piece of digital or physical information that exists.

To walk with Kahle through his colonnaded temple of knowledge in San Francisco’s Richmond District is to understand the extent of what he and his staff, now numbering more than 100, have been working on for nearly 25 years. In a loading area, stacks of donated books wait their turn on a special scanner where, hidden behind a black curtain, a technician painstakingly copies endless pages.

Below, microfiche reels are converted into computer images that will be added to the astonishing amount of data the archive has collected over the years.

Its servers contain more than 70 petabytes of unique data – 70 million gigabytes – including 65 million texts, movies, audio files, images, books and more.

Kahle’s quest to build what he calls the “Library of Alexandria for the Internet” began in the 1990s, when he started sending out programs called crawlers to take digital snapshots of every page on the web, hundreds of billions of which are available to anyone through the archive’s Wayback Machine.

This vision of free and open access to information is deeply intertwined with the early ideals of Silicon Valley and the origins of the internet itself.

“The reason for the internet and specifically the World Wide Web was to make it so that everyone is a publisher and everyone can go and have a voice,” Kahle said. He realized the need for a new kind of library for this new publishing system, the internet.

But while Kahle’s goals have not changed, the internet has. This early utopian vision of the positive forces of digital interconnection is increasingly in conflict with the growing amount of copyrighted and paywalled material on the internet.

Left: 1947 Albany (NY) Times newspaper at the Internet Archive offices. Right: Book scanner Eliza Zhang opens a box with Albany Times newspapers.

Photos by Constanza Hevia H./Special to The Chronicle

When the archive began its collection, most people accessed a few major homepages such as Yahoo.com, said Margaret O’Mara, a professor at the University of Washington and a Silicon Valley historian.

“Now, not only is there a lot more information, but also a lot of that information is proprietary,” O’Mara said. “There are questions about how the internet works and how the internet economy works that cannot be answered by capturing web pages or capturing documents or digitizing a magazine.”

Despite this, she said the archive is a valuable resource for researchers like herself and reflects the idealism at the root of Silicon Valley’s dream of a more open, connected and accessible world.

“They keep the past in a way that is a rare thing to see in an industry and a community that is always so focused on the future and focused on what the next thing is,” O’Mara said.

That changing web landscape is on Kahle’s mind as he enters the beating heart of the archive, its cavernous main room. The space is quiet. Suffused with a golden light that seeps through the windows, the former church nave still feels somehow sacred. Few people are in the building due to the pandemic, but this room is never really empty: its pews are inhabited by miniature statues of past and present employees and volunteers, including a bespectacled one of Kahle himself.

Here, the server banks are buzzing and flashing with every upload and download while Kahle discusses how libraries, even in cyberspace, can burn.

Across the auditorium, flanking the main stage where hymn numbers were once posted, three numbers are displayed in metal: 200, 404 and 451. The first two are common HTTP status codes, returned when a page is successfully retrieved or cannot be found. The third appears when content has been removed for legal reasons, such as copyright infringement.

It is also, not by chance, a reference to Ray Bradbury’s anti-censorship novel “Fahrenheit 451.”

Book scanner Eliza Zhang, one of more than 100 employees, works at the offices of the Internet Archive in the Richmond District.

Photos by Constanza Hevia H. / Special to The Chronicle

Kahle said that in the past, if one library and its books burned, copies would probably survive in another physical space. “That’s not the case on the net,” he said. For example, “If a newspaper is disconnected in Turkey, all their archives go. And so you can’t manage culture.”

The archive has for years purchased and digitized books, lending them through its website for free with a waiting list like other libraries. But when the coronavirus pandemic hit last year and libraries and schools closed, the archive created what it called the National Emergency Library, a collection of 1.4 million online books available to users without waiting.

This was followed by a lawsuit filed by four of the nation’s largest publishing houses, one of the many challenges the archive faces in its quest for free access to information in cyberspace.

Kahle claims that copyright laws do not prohibit libraries such as his own from owning, digitizing, and lending books with certain controls in place.

Perhaps an even bigger barrier in Kahle’s mind is smartphones, and the proprietary and protected programs that fill them.

“These things are full of unopened apps,” he said, holding up his phone during a recent Zoom call. That closed design also means that many of them are immune to his crawlers and cannot be kept for posterity. This is a deeply vexing problem for the archive’s mission, along with paywalls that can and do block Kahle’s crawlers.

Brewster Kahle, who founded the Internet Archive 25 years ago, discusses the San Francisco organization’s servers, which contain more than 70 million gigabytes of data – including 65 million texts, movies, audio files, pictures, books, and more.

Constanza Hevia H. / Special to The Chronicle

The original internet format of hyperlinks, still in use today, allows people to “weave knowledge together,” he said. But “the world is innately siloed into corporate products. That’s not how we build a culture that works together, builds on each other and can build new ideas.”

Kahle’s career in technology dates back to the early 1980s, when he graduated from the Massachusetts Institute of Technology, where he studied artificial intelligence. He helped found a supercomputer company called Thinking Machines before creating the first internet publishing system, called Wide Area Information Server, which was later sold to America Online.

In the past Kahle has also found ways to make money from software without sacrificing the ideal of the archive. When he sold Alexa Internet, an online research and information company he co-founded in the 1990s, to Amazon, he made a deal with then-CEO Jeff Bezos. He would sell the software only if Bezos allowed him to continue donating a copy of the internet to his archive on a daily basis. Bezos agreed.

The Internet Archive is funded today by many small donations, averaging about $20 apiece, according to Katie Barrett, the archive’s senior development manager. The archive also makes money by scanning books for libraries and receives funding from the Kahle/Austin Foundation, which Kahle founded with his wife, Mary Austin.

Tax forms for 2019 show that the archive’s revenue exceeded $36 million for the year, with nearly $30 million of that in contributions and grants.

In its quest for a more open and accessible world, the nonprofit works with Wikipedia, repairing links and updating pages that link back to sites that would have been lost if the Wayback Machine had not saved them in the first place. Working with the archive, Wikipedia has added more than 25 million archived web pages, mostly Wayback Machine links, to 150 Wikipedia language editions.

“We share a vision of the internet where non-profit services can increase humanity’s access to knowledge,” Gwadamirai Majange, a spokeswoman for the Wikimedia Foundation, which operates Wikipedia, said in an email.

The Internet Archive building in the Richmond District.

Constanza Hevia H./Special to The Chronicle

The archive has also partnered with groups such as the Digital Public Library of America, contributing mostly digitized print material to its website.

Groups such as the Long Now Foundation also seek to foster such long-term thinking, in its case through the 10,000 Year Clock and a project to create a digital library of human language for future generations, in part as a counterpoint to the short-term, profit-driven models of modern technology companies.

Kahle has also expanded his nonprofit efforts beyond the digital world.

Among these was an ill-fated attempt to establish a credit union with $1 million from the archive. A more successful bid saw him set up another nonprofit and buy a nearby apartment building in San Francisco, where some of his employees live at below-market rates.

For his part, Kahle said he recognizes the growing challenges to the mission, but that has not stopped him yet. “I wake up on different sides of the bed saying, you know, this is going to work, and we’re making it go away,” he said. “And other times it’s like there’s so much against us.”

Despite this, Kahle’s servers continue to blink blue with life in that big silent room. And as long as millions of people continue to access the seemingly endless collection, the Library of Alexandria for the internet will live on, long after its founder, as he puts it, goes “to the great archive in heaven.”
