Hello, Cherwell reader! Think this is a good article? A TikToker probably thinks so too. ‘Korean Consultant’ posted a TikTok on 5th January 2025 titled “What your university says about you – Russell Group Part 3”. It featured nine universities – each briefly described on a slide – and it stereotyped both the universities and their students. Have a look:

Image taken 26th January 2025.
I received this TikTok on the 6th January from a friend. She was amused, for she had read some of the descriptions before. The ‘Oxford’ slide included quips like, “Either a Moocher that cares more about having first class friends than first class thoughts, or a Pampered Swot wearing a scholar’s gown every night dreaming to be a spy.”
This was suspicious. I had written just a few months earlier about a moocher who cares “more about having first-class friends than first-class thoughts” and a “pampered swot” who wears a scholar’s gown and “probably will become a spy.”
Maybe great minds think alike. But the next point was about someone who claimed to be state-educated, “ignoring their private sixth form and secondary schooling at the best grammar school in the country.” What a coincidence that I had written those exact words too!
My friend immediately recognised that “whoever made it read your Cherwell article”. Curiously, ‘Korean Consultant’ only cited ‘GPT and online’, not my absolute banger of an article, ‘A comprehensive guide to Oxford student stereotypes’. Yes, I’m bitter.
Why I’m bitter
Firstly, someone had used my writing, potentially to make money from it. Meanwhile, I make no money from my own work.
Secondly, I’m bitter because I didn’t receive credit for my own work. If people are going to enjoy my writing, I’d like them to know its stupendous mastermind. This TikToker clearly knows that creating something is difficult and time-consuming, seeing as they stole my work instead of making their own. Stealing my work brings me neither fame, nor success, nor notoriety – and I didn’t exactly write satires of my friends as Oxford stereotypes because I wanted to fly under the radar. I did it because I am pretentious and somewhat irritating in my desire to be the Next Big Thing (i.e. Giles Coren/Caitlin Moran/Evelyn Waugh/similar). It is unlikely. But it is made even more unlikely when ‘Korean Consultant’ copies my writing, bringing me nothing but anonymity and unpaid work. No thanks.
And I’m not alone in this. Millions of writers are not receiving credit for their works. ‘Korean Consultant’ lists “GPT & online” as its sources, when its real sources are more likely writers just like me.
Using my work without crediting me is a violation of copyright. (OSPL, Cherwell’s parent company, has had its legal counsel issue a takedown request for the video, to which we have received no response.) Violating copyright is a violation of the owner’s rights – in this case, OSPL’s. And what OSPL owns is the particular sequence the words are in, not the idea behind them.
For example, it is not a violation of copyright to write about poncy students interrogating their peers in Hall, but it is to write “If you want to hide silently in Hall, think again – Mr. Art Historian will slide up next to you and ask how you really feel about the representations of Zelda and F. Scott Fitzgerald”. In this case, although the TikToker altered the order of the words taken from my piece, the content remains recognisable as my original work and some phrases survive intact – enough to make it a violation of copyright.
Copyright law
Copyright is not absolute – there are caveats, known as ‘fair dealing’ exceptions. Use of protected material in newspaper reporting, criticism, and education is permitted within reason, provided the original creator is credited and the material is not used extensively or for profit. The TikTok fails on both counts: it credits no one and can generate profit, putting it outside fair dealing and in violation of OSPL’s copyright.
However, AI models also use creators’ works without giving them credit in less obvious ways.
When you prompt an AI model, it generates results from the material it was trained on – and, in search-enabled modes, from pages retrieved at query time. This might save time when the alternative is a manual search for ‘Oxford student stereotypes’. But AI does not produce its sources or credit individual authors without being prompted to do so, and even then it responds inconsistently. For example, when my editor asked ChatGPT, “What are some Oxford student stereotypes? Please cite your sources,” it directly cited my Cherwell article. For me, however, it included no content from my article and suggested “a 2017 article in The Guardian”, “The Oxford Student (2018)”, “The Oxford Mail (2019)” and “The Independent (2019)”.
Unlike the video, ChatGPT can cite specific sources, but only when asked – again leaving the onus on the individual to hunt down who actually wrote what. In effect it is a search engine that cuts out the middleman: it works by scraping publicly available material and using it to generate synthesised results.
Large language models’ data
But AI models must be trained on something. Large Language Models (LLMs) use creators’ materials in their training process, improving the quality and specificity of their results. “GPT” could be responsible for the post not only as a search engine, but as a writer – almost a ghostwriter. A good writer must, after all, be a good reader.
Baroness Stowell of Beeston, chairwoman of the House of Lords’ Communications and Digital Committee, argued in The Times that tech companies are evading responsibility by training their models, which need “huge amounts of data to work properly”, on copyrighted materials. Tech companies can afford to pay for licences but are instead “simply exploiting rights holders” – such as The New York Times, which is currently suing OpenAI for infringing its copyright by using its material to train AI models. The NYT contends that OpenAI (the owner of ChatGPT) not only breaches its copyright, but that verbatim NYT content surfacing in ChatGPT allows users to access NYT material without a subscription. The lawsuit claims that “the tool is now competing with the newspaper as a trustworthy information source” and will damage subscription revenue.
Some companies are now licensing their material to AI companies for training, giving them short-term profit on material which might otherwise earn them nothing. The AI crawlers that collect this material explore the internet through a variety of sources – websites, databases, and so on – both to generate better results for users and to train the model itself.
For example, Lionsgate has sold its whole catalogue of film and TV material to the AI company Runway, to be used in training a new AI model; in turn, Lionsgate can use the resulting AI technology in its upcoming projects. Similarly, HarperCollins has made a deal with Microsoft, allowing Microsoft to train its AI models on HarperCollins’ non-fiction books. Authors have the opportunity to decline, meaning they retain control over their material. While this may look like traditional publishers selling out to AI, these licences are formal agreements – proof that it is possible to train AI models without breaching copyright.
Once material has been crawled and used, there is no going back. HarperCollins’ material will go, claims Richard Osman, into the “large language pool” of “high quality prose” used to train AI models. But what is being done to protect creators?
Fighting back?
Although AI crawlers can be blocked, some publishers are hesitant, fearing that doing so could reduce their traffic. Google’s web crawler – which informs its ‘Bard’ chatbot – puts publishers in a particularly difficult position. Businesses may have barred other crawlers from accessing their material, but they fear, writes Katie Prescott, that “barring Google’s equivalent […] would disadvantage them in the long term when it comes to making their information findable and accessible on traditional Google.” This pressures businesses to accept AI crawling in order to retain traffic.
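For the technically curious, ‘blocking a crawler’ usually means adding a few lines to a site’s robots.txt file, which reputable crawlers check before scraping. Here is a minimal sketch using crawler names the companies themselves have published (the tokens may change over time, and compliance is voluntary – a crawler that wants to ignore robots.txt simply can):

```
# robots.txt – a minimal sketch of blocking AI training crawlers
# while leaving ordinary search indexing untouched.

# OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Common Crawl, whose datasets are widely used for LLM training
User-agent: CCBot
Disallow: /

# Google's token for AI training use; Googlebot (search) is unaffected
User-agent: Google-Extended
Disallow: /

# Everything else, including ordinary search crawlers, stays allowed
User-agent: *
Allow: /
```

The Google-Extended token exists precisely because of the dilemma Prescott describes: it is meant to let publishers opt out of AI training without disappearing from traditional Google search – though publishers must take Google’s word that the two are truly separate.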
In December 2024, the government opened a consultation on copyright and AI. It aims to establish “how the government can ensure the UK’s legal framework for AI and copyright supports the UK creative industries and AI sector together.” Both industries are vital to the UK economy, and the statement makes clear that there must be a balance between protecting creators and supporting AI development.
To address the current uncertainties, the consultation proposes, in short, that AI models can be trained on any material unless the copyright owner reserves their rights. Lisa Nandy, Secretary of State for Culture, Media and Sport, said that further licensing will in turn allow creators to “secure appropriate payment for their work”. In theory, this gives creators more control over how their material is used, and a route to being paid when it is used for training.
Where the burden lies
This puts the responsibility on the copyright holder to declare that they do not want their work used. Yet a government spokesperson from the Intellectual Property Office stated that the consultation “does not propose exempting AI training from copyright law”. They said:
“No move will be made until we have a practical plan that delivers each of our objectives: increased control for right holders to help them license their content, access to high-quality material to train leading AI models in the UK, and more transparency for right holders from AI developers.”
An “exception” allowing AI training on copyrighted content “unless the rights holder has expressly reserved their rights” is “deeply unfair”, writes Owen Meredith, chief executive of the News Media Association. An opt-in system would surely be fairer. Peter Chen, legal counsel to OSPL, suggested instead that “the government should work with industry groups like Creative Commons to establish a new licensing format where artists can decide when and how AI companies can use their work for profit”.
It is already extremely hard for individuals to protect their copyright against generative AI. Judge McMahon dismissed Raw Story Media, Inc. and AlterNet Media, Inc.’s copyright case against OpenAI for “lack of standing”. Because AI models synthesise information rather than copying it verbatim, there is less likely to be evidence of direct plagiarism. The government consultation must address the use of copyrighted work in generative AI and its training, and prioritise the individual creators whose work needs protecting.
I don’t want a random TikToker to be able to steal my writing and get away with it. I want them to take it down – or at least pay me for it. At the outrageously bare minimum, I want to know for certain that the TikToker knows they have stolen it, rather than taken it from an AI generator which will only reveal its deviously acquired sources if begged. I considered asking ChatGPT (or maybe it should be DeepSeek now?) to write this article, but if I had, I know for a fact that it would completely undermine my strongest feeling: that I want everyone to know that my writing was written by me.