AI and scholarly publishers: the shrinking visibility of closed access abstracts

This blog post from Aaron Taylor highlights a concerning trend emerging in scholarly publishing: publishers withdrawing closed-access journal articles from public view to prevent them from being harvested as training data for generative AI. As the post puts it, the petrol tank for AI discovery might be running dry as publishers close access to scholarly content such as abstracts in response to AI incentives. On the other hand, some publishers are granting licences that allow their scholarly content to be used as training data for AI language models. Ithaka S+R have been documenting this on their website: Generative AI Licensing Agreement Tracker – Ithaka S+R.

 

AI and Open research: a brief overview

AI presents possibilities and uncertainty within Open Research, and there is a growing need to address its impact on the research process and dissemination. This is a snapshot of my thoughts about AI as an Open Research librarian. The majority of these thoughts are drawn from conferences, events, articles and blogs.

Opportunities

Accessibility – AI tools can support researchers with brainstorming ideas, drafting summaries, reviewing manuscripts, organising papers, and creating keywords (Lo, 2025; Spring, 2025). They can support authors with grammar checks and writing (Prevatt-Goldstein et al., 2025; Lo, 2025). Note-taking, subtitles, and transcriptions are easily processed with AI tools. Accessibility programs such as ClaroRead or MindMap existed before the 2022 generative AI explosion, but these are often costly, whereas many generative AI tools are currently free to use, albeit with limited capabilities.

Efficiency – AI tools have gained popularity because they save time and costs. As previously outlined, they can support writing, often a difficult stage for many researchers. They can also support some research processes. There are AI applications designed to conduct literature reviews, such as Jenni AI and Elicit (Spring, 2025). Software such as Pico Portal can support systematic reviews, which are often a time-intensive process (British Library, 2025). Library discovery platforms are incorporating AI search, enabling users to conduct searches in a conversational tone ("Find papers on the impact of AI in research"), simplifying the literature review process. AI tools can also perform near-impossible tasks such as deciphering ancient papyrus scrolls; see the Herculaneum papyri (Smithson, 2025). There are many cases of AI being used to automate costly processes, including the National Library of Norway's AI-Lab; see their current projects.

Dissemination – With English being the lingua franca in academia, non-English research is often overlooked, particularly research from the global majority. AI applications narrow that gap by aiding with translation (Prevatt-Goldstein et al., 2025). However, the translation requires oversight from a person familiar with both languages, especially when using generative AI. There are tools that convert texts to podcasts, making research accessible to those new to a subject area (Google's NotebookLM). Generative AI can also support creating marketing posts for social media and, in some cases, generating images.

Challenges

There are extensive issues related to AI covered in the Scholarly Kitchen's archive, Artificial Intelligence Archives – The Scholarly Kitchen, so I will cover just the basics.

Errors in outputs – This particularly applies to generative AI, which is trained on digital data from the internet that can be biased and inaccurate. It's a predictive model, which guesses references as opposed to citing them (Smithson, 2025). Moreover, there have been anecdotal reports of AI hallucinating, that is, generating non-existent papers. Over-reliance on AI tools without oversight can be dangerous. AI-powered literature search tools don't always find the best articles (Bjelobaba et al., 2025). Reproducibility in research is challenging with AI (Smithson, 2025). A solution would be to record the date, prompts, and AI applications used during research (ibid).
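The record-keeping suggestion above can be as lightweight as appending each prompt to a dated log file that you can later cite in a methods section. A minimal sketch in Python (the file name and fields here are illustrative assumptions, not a standard):

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Illustrative file name and fields -- adapt to your own workflow.
LOG_FILE = Path("ai_prompt_log.csv")
FIELDS = ["date", "tool", "prompt", "purpose"]

def log_prompt(tool: str, prompt: str, purpose: str) -> None:
    """Append one AI interaction to a CSV log for later reporting."""
    new_file = not LOG_FILE.exists()
    with LOG_FILE.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "date": datetime.now(timezone.utc).date().isoformat(),
            "tool": tool,
            "prompt": prompt,
            "purpose": purpose,
        })

# Example use
log_prompt("Copilot", "Suggest synonyms for 'sceptic'", "drafting a blog post")
```

A plain spreadsheet works just as well; the point is simply that the date, tool, and exact prompt are captured at the time of use, not reconstructed afterwards.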

Privacy and copyright – The major ethical issues concerning generative AI are privacy and copyright. Privacy violations and copyright cases involving AI companies are occurring daily. Professor Casey Fiesler has been tracking this since May 2023; see her Google Doc AI Ethics & Policy News. Data scrapers for AI companies indiscriminately pull datasets from the internet, yet the source code revealing how the data are processed remains closed (Decker, 2025). This is also known as 'the black box problem' (Prevatt-Goldstein et al., 2025). Commercial AI companies fail to fairly compensate the creators and owners of the data (Smithson, 2025). Discussion of authorship and AI was explored extensively in the Open Science Festival seminar 'AI and Authorship', summarised in the UCL blog post. It's advisable for researchers to review funder policies before using AI tools, even if those policies are broadly framed. Furthermore, ensure the data you feed AI doesn't violate privacy or copyright laws.

Digital divide – Open Research aims to make research accessible with as few restrictions as possible, and therefore more impactful, collaborative and transparent. Generative AI products often start with a free tier to draw in customers and attract investment; as their models advance, they are likely to impose costly subscriptions. HEPI reports a socio-economic gap between AI users and non-users among students (a survey of 1,041 full-time undergraduates). Big publishers have the financial capability to integrate AI systems, thereby "affecting the diversity of the publishing ecosystem" (Lo, 2025). This also applies to higher education institutions: financially well-resourced universities can afford full access to sophisticated AI tools that could aid research, whereas less-resourced institutions rely on the free, limited versions.

AI use is also unsustainable: data processing centres' cooling systems consume water and strain local electrical grids. There are ethical conundrums concerning AI tools in their production, processing and use. Even so, it seems the benefits outweigh the drawbacks. Since 2024, AI tool use among HE students has grown by 32% (Freeman, 2025). AI applications provide time- and cost-saving solutions, particularly for academic researchers. Even I, as a playful sceptic, can't deny the efficiency of using AI-powered tools. Copilot aided me with this blog by finding synonyms and alternative phrases, and it coined 'playful sceptic' (which has a better ring than 'enthusiastic cynic', my original idea).

I think the best approach is to view AI applications as tools, and not wholly reliable ones. Human expertise on the research topic is essential. Critically evaluate references, and don't solely source papers from AI-powered literature searching applications. There are many more tips, which I will explore in the next blog post about AI and research applications.

Copilot was used in developing key points, rephrasing some sentences, and finding synonyms in this blog post.

References

 

An open relationship with Open Science

This is a guest post by Mark Coulson, Professor of Psychology at the School of Human and Social Sciences.

I’ve had a longstanding love affair with open science, but it took time for us to properly consummate our relationship. My first pair of open science ‘badges’, from a journal published by the American Psychological Association, was exciting as it proclaimed both Open Data and Open Materials. Neither my data nor my materials seemed to generate much interest, and the paper’s citation count has made no discernible impact on my h-index. I still like the badges, though.

One nice thing about open science is that you can dip your foot in it. Open Methods are easy – set up an online project on a repository (I use osf.io) and attach your method (I exported an online Qualtrics survey). As anyone can access and download this, you have effectively gifted the community your methodology.

Open data are a little more tricky, principally due to the importance of full anonymization of participants, but also the fact that complex data files can end up impenetrable even to those who have created them (note: this might just be me). Still, going for open data builds confidence, tells the research community you’ve done a good job, that you are prepared to show your workings to the world, and who knows, someone might find something interesting in there that you didn’t, and want to collaborate, and publish more, and do more open science.

But then there’s the inescapable feeling that this is all a bit narcissistic – that open science is rather ‘look at me!’ A sort of Instagram-with-data. Or the even more grandiose version of narcissism which holds that your ideas are so brilliant that others will immediately steal them. (Don’t worry, they won’t. You’re not that good. And in any case, if you’re really worried you can always embargo your projects).

And more pessimistically, the nihilist in me knows that in fact the universe doesn’t care, nobody cares, no one looks at these things, and even blatantly flippant examples fail to leave a ripple on the surface of scientific discourse.

Still, the hardest part for me, and what really interfered with a full, committed relationship, was pre-registration – the specification, before the data are collected, of exactly how you will analyse them.

I freely admit that I was educated in a pre-open science environment. Statistics classes then (and I fear they haven’t changed much) involved learning straightforward rules for traversing the epistemological tundra between hypothesis and statistical test. It was often suggested that we decide which tests to use before running them, but statistical packages present so many options that the temptation is to try all of them. Just to see. And then decide which one is ‘best’.

So this love affair threatened to be over before it had properly begun. I’d had a taste, and it was delightful, but the fear of rejection, or even worse of not being noticed, weighed heavily, and the lifelong habits of freewheeling through the contents of Tabachnick and Fidell until I found just the right tool to get just the right results were hard to shake off.

And then I read a few things, many of which stuck in my mind. I read Jacob Cohen’s seminal work on power analysis, John Ioannidis’ explanation of Why Most Published Research Findings Are False, and the magnificent statement from the American Statistical Association (who definitely know what they are talking about) on why unplanned and non-transparent reporting of statistical tests (and in particular p values) makes many findings ‘uninterpretable.’

Which is when love reared its head again. Well, sort of.

Once upon a time I published a paper about people having emotional attachments to digital characters in a video game. It was a small study, fun to carry out, and was cited by more people than I generally get cited by, which is a Good Thing. That was in 2012. Skip forward a dozen years, and the sweet innocent relationships of video games past have developed into complex, branching, polyamorous, non-binary and quite magnificent side events embedded in the normal video game tropes of killing things, flying things, amassing loot, saving universes and fulfilling prophecies. If you’re interested in the kind of things digital characters get up to in the 2020s, there’s a compilation of encounters from one game on YouTube that is over 2 hours long and very much NSFW.

So, I developed my survey (with a big thank you to a generous internship paid for by the university) and lodged it on osf. I obtained ethical approval. With a finger hovering over the button which would launch my survey out into an expectant world, I felt the excitement of data collection, analysis, discovery, publication.

And then the whisper. Pre-registration. It reminded me I preach but don’t practice. I’ve got the other badges, but I’m missing the big one. Can I actually make these decisions before rather than after I collect the data? There are some truly intimidating and brilliant examples of pre-registered studies out there. But then there are plenty which are not [link deleted]. And, as the preaching typically goes, you do eventually have to make these decisions, so why not prior to the event? And finally, listening to the whisper, remembering the wise words of others, and perhaps deciding to rid myself of my own hypocrisy, I went all in. Pre-registration, consummation.

Okay then, go judge for yourself, because in the final analysis (sic) that’s what it’s all about. The data are in, I am about to start my pre-registered analysis, and am both excited and scared about where things will go. If you do take a look at my efforts, and spot an error, please let me know, but preferably before it gets published.

Professor Mark Coulson, School of Human and Social Sciences.


Reflections from the webinar ‘Libraries supporting Open Research: international perspectives’

Have you ever wondered how Open Research is transforming HE institutions’ research structures? I recently attended the webinar ‘Libraries Supporting Open Research: International Perspectives’, the first session of the Open Divide Lecture series 2025/2026.

Rebecca Bryant, Senior Program Officer at OCLC Research, presented her paper summarising key points raised in a recent roundtable discussion facilitated by the OCLC Research Library Partnership, attended by 30 institutions from Australia, the UK and the USA. To me, the countries aren’t varied enough to warrant ‘international perspectives’, but there was a variety of approaches.

Key themes that captured my interest were:

  • The UKRI mandate has been key to driving UK institutions to support Open Research.
  • There has been slow uptake from researchers in publishing Open Access.
  • Some UK institutions don’t have Open Access embedded within their policies.
  • The University of Manchester has launched an Office for Open Research (OOR), highlighting the significance of OR in research culture.
  • AI could be used for metadata enrichment within repositories.
  • There are concerns about aggressive AI crawlers mining data in repositories; this can be addressed by restricting machine access.
  • There is a strong need for academic libraries to collaborate with other departments within their respective institutions.
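Restricting machine access, as raised in the crawler point above, often starts with a repository’s robots.txt file. GPTBot (OpenAI) and CCBot (Common Crawl) are published crawler user-agent tokens; a minimal, illustrative sketch might look like the following, bearing in mind that compliance with robots.txt is voluntary and determined scrapers may ignore it:

```text
# Disallow known AI-training crawlers from the whole repository
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Leave everything else (e.g. search-engine indexing) unaffected
User-agent: *
Disallow:
```

Repositories wanting stronger guarantees typically pair this with server-side rate limiting or user-agent blocking, since robots.txt alone is only a request.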

Overall, I thought the webinar was informative and well-presented. The case studies featured illustrated a range of strategies taken to embed Open Research practices within institutions. The REF Open Access policy and UKRI have propelled HEIs to prioritise Open Research, and academic libraries face challenges around how they are perceived within universities. This can be remedied by fostering collaboration between libraries’ Open Research services and research centres.

There is more information about this in the report co-authored by the presenter, Social Interoperability in Research Support.

Early Decisions on REF 2029 Open Access

Research England have modified their proposals in two key areas, according to their latest communication.

Longform outputs are now no longer to be included. After suggesting in their consultation document that Open Access rules would apply to books and book chapters for REF 2029, they have bowed to pressure and agreed that ‘the broad set of challenges currently facing the sector’ make this impractical for now.

The 2021 regulations for articles and conference papers (those that have found a home in ‘proceedings’ with an ISSN) will now continue to apply until January 2026, not 2025. The promised tightening of allowances in the draft 2029 regulations regarding embargoes applied by journal publishers won’t have to be dealt with until then.


Blog Shoutout: Data Colada

 

This week, we want to shout out another great blog that looks at research integrity. Data Colada has been around for over a decade and has highlighted major issues around reproducibility in the social sciences. The blog’s creators are behavioral scientists so their takes are particularly interesting! They’re credited with coining the term ‘p-hacking’ to describe misuse of data analysis, to give you an idea of their impact.

They’ve made a number of notable findings over the past 11 years and continue to call out bad practice through their regular blog posts and advocacy. Check them out and add them to your subscriptions!

Reproducibility Crisis and Open Research

In scholarship, there’s been increasing conversation over the past decade about a ‘reproducibility crisis’: empirical findings in notable research papers could not be replicated in follow-up studies. In one telling study from the Reproducibility Project, published in 2015, only 39 out of 100 studies published in prominent psychology journals were successfully replicated.

Way back in 2005, John Ioannidis published a prescient article titled ‘Why Most Published Research Findings Are False.’ He summarised his findings as follows:

Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.

Sadly, the situation seems even worse nearly two decades later. Just today, Retraction Watch have reported on a record number of over 200 retractions from one researcher, relaying that: “Joachim Boldt, has now been credited with 210 retractions – making him the first author (to our knowledge) with more than 200 retractions to his name.” His papers have been retracted on the basis of “suspicious data.”

What can be done? Naturally on this blog, we advocate adopting Open Research Practices. As we concluded in our previous post about Ethics in Research, ‘Essentially, good practice in Open Research amounts to communicating findings accurately and honestly and properly acknowledging the works of others.’ Open Research is a necessary counterbalance to the reproducibility crisis. On our UWL webpages, we’ve listed 7 Open Research Steps that should help keep your works open, transparent and reproducible, thereby enhancing academic credibility. They’re condensed slightly below but check out the webpage linked above for further links and resources:

  1. ‘Pre-register’ your finalised research design, either by uploading it to the UWL repository or elsewhere. Consider publishing your finalised design as a ‘registered report’ in a relevant journal. Journals that accept Registered Reports agree to publish the final paper, whether or not the outcomes are ‘significant’, provided you follow the plan.
  2. Complete a Data Management Plan which includes how you are going to make your datasets available through a data repository, confidentiality permitting.
  3. Make sure your datasets are structured and labelled so that they are ‘Findable, Accessible, Interoperable and Reusable’.
  4. Structure your papers so that titles and abstracts are illuminating and keywords are prominent, thereby making your paper easily discoverable through databases.
  5. Consider uploading a pre-print (a paper yet to be peer-reviewed) to a pre-print server and asking for feedback.
  6. Make sure your published papers are open access, either through the UWL repository or through the expanding opportunities to make your paper open ‘in situ’ on the journal’s own website.
  7. Engage with Open Peer Review, and teach Open Research practices if you are a teacher or research supervisor.
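For step 3, a lightweight way to make a dataset ‘structured and labelled’ is to deposit a small machine-readable metadata file alongside it. The sketch below is purely illustrative (the field names are invented for this example and are not a formal metadata standard such as DataCite, which your chosen repository may require instead):

```json
{
  "title": "Example survey dataset",
  "creator": "A. Researcher",
  "date_collected": "2025-01",
  "licence": "CC-BY-4.0",
  "keywords": ["open data", "FAIR", "survey"],
  "files": [
    {"name": "responses.csv", "description": "One row per anonymised participant"},
    {"name": "codebook.csv", "description": "Variable names, labels, and units"}
  ]
}
```

Even this much makes a dataset far more findable and reusable than a bare folder of CSV files, because the licence, provenance, and variable documentation travel with the data.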

 

UKSG annual Conference 8th-10th April 2024

This year, UKSG’s annual conference in Glasgow was heavily focused on issues to do with open research. Three key open research themes produced some interesting discussion points during the various talks and workshop sessions over the three-day conference: Research Integrity, Transformative Agreements, and Predatory Publishing. For this post, I’ve collated three essential insights under each heading to keep it short and to-the-point!

 

Research Integrity

  • Now more than ever, there’s a need to develop and implement relevant procedures and policies, and to start developing and delivering training around issues related to potential breaches of integrity.
  • This is related to the incredible growth in retractions over the past two decades: from 40 retractions recorded in the year 2000 to over 13,000 in 2023. The annual total has set a new record every year since retractions began to be recorded.
  • It’s crucial for academics to have good-quality data sources as a counterbalance to the pressures that threaten research integrity: the pressures of the evaluation system, challenges in the peer-review landscape, and the aims of nefarious external actors.

Transformative agreements

  • Overall, good news on the transformative agreement front from Jisc! The 2022 Jisc transformative agreements (excluding Springer Nature) delivered actual cost savings of £16.7m to subscribing institutions in the first year of the agreements, compared to expenditure in the preceding year.
  • As an early adopter of transitional agreements, the UK appears to be transitioning to open access more effectively than the global average. In 2022, the number of UK open access articles was 4% higher than the global average (UK: 50%; global 46%).
  • There has been a steady decline in the number of UK Green-only articles – around 4% over each of the last four years. This is a more exaggerated version of the global trend.

Predatory Publishers

  • There was an interesting talk from the head of the quality team at the Directory of Open Access Journals, Dr Cenyu Shen, who works to prevent questionable journals from being indexed in DOAJ. It’s no mean feat; they have over 2,000 journals on there. Last year, they ran 409 investigations, which took around 800 man-hours, surpassing every previous year’s record.
  • Presentations and discussions advocated for more nuance in thinking about issues to do with predatory publishers. It’s better to think of publishers as sitting on a kind of ‘predation spectrum’ as opposed to a binary ‘predatory-or-not’ understanding. To put this into context, issues can range from a small slip in editorial processing standards to purposely falsifying impact factor scores.
  • One question persists: should citations be such a coin of the realm? There was no doubt that the mounting pressures of ‘publish or perish’ culture are largely to blame for the rise of (increasingly sophisticated) predatory publishers.

Fraudulent Publisher and Journal Sites

Recently, we posted about the well-known issue of predatory publishers in academia. Increasingly, journals and publishers have to also contend with fraudulent/hijacked/copied websites. This is a breed of scam website that’s fast becoming endemic in the scholarly communications world. Bad actors set up a fake website using the branding and visuals of a legitimate publication and charge an article processing fee in exchange for speedy publication under false pretenses.

Last week, Liverpool University Press posted an informative post that serves as a cautionary tale.

Red flags first appeared late last year, with authors raising concerns about incorrect Scopus listings and asking about APC charges for this Open Access publication, having come across a fraudulent version of one of their journals’ homepages.

This was a clone of the journal’s own homepage – containing the journal branding, requesting (and accepting) submissions, and displaying content. The content seemed to be nonsense, and had not been taken from our site, but the site was very convincing. Our concern was not that genuine IDPR content was being scraped, but that someone posing as the editor of the journal was accepting papers and liaising with authors.

The blog post from LUP goes on to detail how they have tried to tackle the fraudulent site and have it taken down (spoiler: with great difficulty). The TLDR takeaways are:

  • Think.Check.Submit. continues to be an invaluable resource in the world of scholarly communications, if in doubt run a title search on the site.
  • Our faves at Retraction Watch also run a Hijacked Journal Checker.
  • As ever, contact the Open Research team if you’re unsure about anything– we’re always happy to help! Open.research@UWL.ac.uk

Be wary out there!