Dozens of scientific journals have disappeared from the internet, study finds

The study does not analyze why these specific journals disappeared, nor their quality, but it did find that more than 50% of them had a university affiliation. Regarding topics, more than 50% of the journals that disappeared were in the social sciences and humanities, although health, physical sciences, mathematics and life sciences were also represented.

“There is usually an immense amount of time spent by many different people behind each article,” from authors to editors to peer reviewers, Laakso told CNN.

“For all this work to be undone and cut off from any impact on the world, for such a trivial reason as not having a backup system in place for PDF files, is not something that should be accepted,” Laakso added.

The study, published as a preprint, is available on arXiv, an open access archive of scholarly articles.

Tracking missing journals

With little documentation available on content that goes offline, the researchers said they had to do “detective work” to collect data, which they said speaks to the need for better tools to capture this phenomenon.

They tracked journals’ presence over the years in major bibliographic indexes such as the Directory of Open Access Journals. They said the process was like browsing old phone books to see whether, when a phone number is removed from the index, the users are still alive.
After narrowing down the list of titles that were no longer in the indexes over the years, they then sifted through thousands of web pages to figure out what had happened to each individual title, drawing on tools like the Internet Archive Wayback Machine.
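
The first step described above, spotting titles that drop out of an index over time, can be sketched in a few lines of Python. This is only an illustration and not the researchers' actual tooling: the function names `last_seen` and `dropped_titles`, the snapshot years and the journal identifiers are all made up for the example.

```python
def last_seen(snapshots):
    """Map each journal ID to the most recent year in which an index listed it."""
    seen = {}
    for year in sorted(snapshots):
        for journal_id in snapshots[year]:
            seen[journal_id] = year  # later years overwrite earlier ones
    return seen


def dropped_titles(snapshots):
    """Return {journal_id: last_year_listed} for titles missing from the latest snapshot."""
    latest = max(snapshots)
    return {
        journal_id: year
        for journal_id, year in last_seen(snapshots).items()
        if journal_id not in snapshots[latest]
    }


# Hypothetical yearly snapshots of an index (identifiers are invented).
snapshots = {
    2012: {"journal-a", "journal-b", "journal-c"},
    2015: {"journal-a", "journal-c"},
    2019: {"journal-a"},
}

candidates = dropped_titles(snapshots)
print(candidates)
```

A title on this list is only a candidate. Dropping out of an index does not prove a journal is offline, which is why each one still had to be checked by hand against sources such as the Wayback Machine.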

In terms of absolute numbers, the study finds that only a small proportion of open access journals have disappeared over the past two decades, but the authors caution against an optimistic reading.

“We believe that more journals are at risk of disappearing in the future,” Lisa Matthias, a doctoral candidate at the Free University of Berlin and co-author of the study, told CNN.

The study identified 900 “inactive” journals that were likely to disappear, as more than three-quarters of journals that ended up going offline did so within 5 years of the last publication.

In an email to CNN, the Directory of Open Access Journals said the study “strengthens our view that the DOAJ needs to help these journals, indexed with us, preserve their content, and we need to find a model where, depending on their economic profile, the cost of this operation is not always passed on to the journal.”

“A constantly evolving set of sands”

Why is digital content disappearing from the Internet? There are many reasons, ranging from technological advancements that make web pages obsolete, to unpaid web hosting bills.

“The average lifespan of a web page is 100 days before it is edited or deleted,” Brewster Kahle told CNN. Kahle is the founder of the Internet Archive, a nonprofit organization that aims to be “the library for the Internet,” as Kahle puts it.

“The web is an ever-changing set of sands,” Kahle said.

The problem affects all kinds of digital content, but when it comes to scholarly literature, there is still a knowledge gap about what can be saved.

In 2018, the Internet Archive set out to research and archive all journal articles available online, and it more recently received funding from the Mellon Foundation to pursue this goal, Kahle explained.

“Based on our analysis, 18%, or over 3 million open access articles since 1945, are not independently archived, either by us or by other preservation organizations,” Kahle said. The Internet Archive and the authors of the study on disappearing open access journals have joined forces to address the issue.

The cost of preserving knowledge

Historically, the role of preserving content for future generations belonged to libraries, but in the digital age, the role of libraries has become much more complex as the increasing cost of commercial scholarly literature impacts their budgets.
Judy Ruttenberg, senior director of scholarship and policy at the Association of Research Libraries, told CNN that libraries have been focusing on this work since the 1990s, but all the while, “subscription costs have started outpacing inflation by far and crowding out investment in other publications and, quite frankly, in programs like preservation.”

According to Ruttenberg, the study on the disappeared open access journals is “a red flag for us to pay more attention.”

What is needed, according to Ruttenberg, are coordinated approaches as the scientific community shifts from a commercially dominated publishing model to open access.

“This story is about the allocation and coordination of resources,” Ruttenberg said.

Subscription-based digital science content is not exempt from the problem of disappearing from the web, but content from smaller or more independent open access publishers lacks some of the protections and resources that commercial content is more likely to benefit from.

“The publishing technologies used to deal with preservation and archiving are mostly American or European initiatives where the solutions come at a price,” the Directory of Open Access Journals told CNN in an email.

“For traditional commercial or corporate publishers, the costs of implementing such a service and then depositing into it are negligible, compared to subscription revenues or open access publishing costs. For small publishers run by academics or for single journals, often without a constant revenue stream, the fees can be prohibitive,” the DOAJ explained.

There are also technical issues to consider.

“Embedding content into a service can require specialized knowledge and often involves some form of testing and sampling. The people who run these journals may not have the time, skills or funding to be able to do so,” the DOAJ explained.

The value of open access content

Internet Archive founder Brewster Kahle warned that examining blind spots in how open access journals were preserved in the past should not suggest that commercial publishers are better equipped to manage preservation than open access publishers.

He mentioned successful open access initiatives like PLOS, a multidisciplinary nonprofit publisher founded in 2001, and arXiv, an open archive and free distribution service run by Cornell University and launched in 1991.

“These types are designed to be archived, they are designed to be retrieved and used for new types of research,” Kahle told CNN.

The importance of archiving content goes beyond preservation, Kahle explained.

“When you can put these materials together, you can start doing studies of the body of knowledge. You can do what’s called meta-science, or the science of science,” he said.

Such studies make it possible to detect biases or new models.

“This data mining is just fantastically valuable,” Kahle said. Under non-open access publishing agreements, these types of analyses can raise issues of copyright infringement.

A work in progress

Even though the principles of open access to scientific information are supported by international organizations such as UNESCO and shared by many members of the scientific community, the transition to a fully open model is still a work in progress.

“The challenge with the transition is to make sure that we end up with the infrastructure so that libraries can coordinate their investments in open content, in the same way that we have all kinds of tools to coordinate our investments in subscriptions or purchased content,” Ruttenberg said.

At a time when so many people are turning to online resources for their learning, due to the pandemic, the conversation about open access knowledge is all the more relevant.

“Covid and the massive shift to virtual research and learning is a huge demonstration of the need for open access,” Ruttenberg said.