Featured Article: New Certification For Copyright-Compliant AI

Following many legal challenges to AI companies about copyrighted content being scraped and used to train their AI models (without consent or payment), a new certification for copyright-compliant AI has been launched.

The Issue

As highlighted by the recent case in which the New York Times sued OpenAI over the alleged training of its AI on New York Times articles without permission or payment (with a ‘fair use’ defence likely), how AI companies train their models is now a major issue.

The organisation ‘Fairly Trained’ says that its new Licensed Model certification is intended to highlight the difference between AI companies that scrape data (and claim fair use) and AI companies that license it, i.e. those that get permission and pay for training data for ethical and legal reasons. As Fairly Trained’s CEO, Ed Newton-Rex, says: “You’ve got a bunch of people who want to use licenced models and you’ve got a bunch of people who are providing those. I didn’t see any way of being able to tell them apart.”

Fairly Trained says it hopes its certification will “reinforce the principle that rights-holder consent is needed for generative AI training.”

Fairly Trained – The Certification Initiative

The non-profit ‘Fairly Trained’ initiative has introduced a Licensed Model (L) certification for AI providers. It can be awarded to any generative AI model that doesn’t use any copyrighted work without a licence.

Who?

Fairly Trained says the certification can go to “any company, organisation, or product that makes generative AI models or services available” and meets certain criteria.

The Criteria

The main criteria for the certification include:

– The data used for the model(s) must be explicitly provided to the model developer for the purposes of being used as training data, or available under an open license appropriate to the use-case, or in the public domain globally, or fully owned by the model developer.

– There must be a “robust process for conducting due diligence into the training data,” including checks into the rights position of the training data provider.

– There must also be a robust process for keeping records of the training data that was used for each model training.
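
As a rough illustration of what the record-keeping and due-diligence criteria above might look like in practice, here is a minimal Python sketch. It is not Fairly Trained's process: the `RightsBasis` labels, the `TrainingRecord` fields, and the `find_problem_records` helper are all hypothetical names invented for this example, and the sample records are made up.

```python
from dataclasses import dataclass
from enum import Enum


class RightsBasis(Enum):
    """Hypothetical labels for the four acceptable sources named in the criteria."""
    PROVIDED_FOR_TRAINING = "provided_for_training"
    OPEN_LICENCE = "open_licence"
    PUBLIC_DOMAIN = "public_domain"
    OWNED_BY_DEVELOPER = "owned_by_developer"
    UNKNOWN = "unknown"  # anything whose rights position has not been established


@dataclass(frozen=True)
class TrainingRecord:
    """One entry in a (hypothetical) log of data used for a training run."""
    item_id: str
    provider: str
    rights_basis: RightsBasis
    provider_rights_checked: bool  # has the provider's rights position been checked?


def find_problem_records(records):
    """Return records with no acceptable rights basis or no provider check."""
    return [
        r for r in records
        if r.rights_basis is RightsBasis.UNKNOWN or not r.provider_rights_checked
    ]


# Made-up sample data for illustration only.
records = [
    TrainingRecord("doc-001", "Example Archive", RightsBasis.PUBLIC_DOMAIN, True),
    TrainingRecord("doc-002", "Example Publisher", RightsBasis.PROVIDED_FOR_TRAINING, True),
    TrainingRecord("doc-003", "Scraped forum", RightsBasis.UNKNOWN, False),
]

problems = find_problem_records(records)
print([r.item_id for r in problems])
```

The idea is simply that every item used in training carries a recorded reason why it may be used, so anything without one can be flagged before training begins.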

The Price

In addition to meeting the criteria, AI companies will also have to pay for their certification. The price is based on an organisation’s annual revenue and ranges from a $150 submission fee and a $500 annual certification fee for an organisation with $100k annual revenue to a $500 submission fee and a $6,000 annual certification fee for an organisation with $10M annual revenue.

What If The Company Changes Its Training Data Practices?

If an organisation acquires the certification and then changes its data practices afterwards (i.e. it no longer meets the criteria), Fairly Trained says it is up to that organisation to inform Fairly Trained of the change, which suggests that there’s no proactive checking in place. Fairly Trained does, however, say it reserves the right to withdraw certification without reimbursement if “new information comes to light” that shows an organisation no longer meets the criteria.

None Would Meet The Criteria For Text

Although Fairly Trained accepts that its certification scheme is not an end to the debate over what creator consent should look like, the scheme does appear to have one significant flaw at the moment.

As Fairly Trained’s CEO, Ed Newton-Rex, has acknowledged, it’s unlikely that any of the major text generation models could currently get certified because they have been trained on a large amount of copyrighted work, i.e. even ChatGPT is unlikely to meet the criteria.

The AI companies argue, however, that they have had little choice but to do so because copyright protection seems to cover so many different things including blog and forum posts, photos, code, government documents, and more.

Alternative?

Mr Newton-Rex has been reported as saying he’s hopeful that there will be models (in future) that are trained on a small amount of data and end up being licensed, and that there may also be other alternatives. Examples of ways AI models could be trained without using unlicensed copyrighted material (though probably still with some form of consent) include:

– Using open datasets that are explicitly marked for free use, modification, and distribution. These can include government datasets, datasets released by academic institutions, or datasets available through platforms like Kaggle (provided their licenses permit such use).

– Using works that have entered the public domain, meaning copyright no longer applies. This includes many classic literary works, historical documents, and artworks.

– Generating synthetic data using algorithms. This could include text, images, and other media. Generative models can create new, original images based on certain parameters or styles (but could arguably still allow copyrighted styles to creep in).

– Using crowdsourcing and user contribution, i.e. contributions from users under an open license.

– Using data from sources that have been released under Creative Commons or other licenses that allow for reuse, sometimes with certain conditions (like attribution or non-commercial use).

– Partnering / collaborating with artists, musicians, and other creators to generate original content specifically for training the AI. This can also involve contractual agreements where the rights for AI training are clearly defined.

– Using web scraping but with strict filters to only collect data from pages that explicitly indicate the content is freely available or licensed for reuse.
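
To make the last idea in the list above a little more concrete, here is a minimal Python sketch of a "strict filter" for scraped pages. It assumes a page signals its licence with the standard HTML `rel="license"` link type, and it only accepts pages whose licence link matches a small allow-list. The allow-list, the `LicenceLinkParser` class, and the `is_explicitly_reusable` function are illustrative assumptions, not an established tool, and a real project would need legal review of which licences actually permit AI training.

```python
from html.parser import HTMLParser

# Assumption: an illustrative allow-list (CC0 and CC BY only).
ALLOWED_LICENCE_PREFIXES = (
    "https://creativecommons.org/publicdomain/zero/",
    "https://creativecommons.org/licenses/by/",
)


class LicenceLinkParser(HTMLParser):
    """Collects href values from <a> or <link> tags marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licence_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, or be missing entirely.
        rel_values = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "license" in rel_values and href:
            self.licence_urls.append(href)


def is_explicitly_reusable(html):
    """True only if the page declares a licence that is on the allow-list."""
    parser = LicenceLinkParser()
    parser.feed(html)
    return any(url.startswith(ALLOWED_LICENCE_PREFIXES) for url in parser.licence_urls)


# Made-up page snippets for illustration.
page_cc_by = '<a rel="license" href="https://creativecommons.org/licenses/by/4.0/">CC BY</a>'
page_non_commercial = '<a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">CC BY-NC</a>'
page_no_licence = "<p>All rights reserved.</p>"

for page in (page_cc_by, page_non_commercial, page_no_licence):
    print(is_explicitly_reusable(page))
```

The filter is deliberately conservative: a page with no licence marker, or with a licence that is not on the list (such as a non-commercial one), is skipped rather than collected.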

Collaboration and Agreements

Alternatively, AI companies could choose to partner with artists, musicians, and other creators to generate original content (using contractual agreements) specifically for training the AI. They could also enter into agreements with organisations or individuals to use private or proprietary data, ensuring that the terms of use permit AI training.

What Does This Mean For Your Business?

It’s possible to see both sides of the argument to a degree. For example, so many things are copyrighted that AI companies such as OpenAI (with ChatGPT) wouldn’t have been able to create and release a reasonable generative AI chatbot if they had to get consent from everyone for everything and pay for all the licences needed.

On the other hand, it’s understandable that creatives such as artists, or journalistic sources such as the New York Times, are angry that their output may have been used for free (with no permission) to train an LLM, thereby creating the source of value that its maker may then charge users for. Although providing a way to differentiate AI companies that have paid for and acquired permission for their training content (i.e. acted ethically) sounds like a fair idea, the fact that the LLMs from the main AI companies (including ChatGPT) may not even meet the criteria makes the scheme sound a little self-defeating and potentially not that useful for the time being.

Also, some would say that a scheme that relies on companies admitting when they have changed their AI training practices, and thereby potentially losing the certification they’ve paid for (when Fairly Trained isn’t proactively checking), may not work. All that said, there are other possible alternatives (as mentioned above), based on consent and on organisations working together, that could result in useful, trained LLMs without copyright headaches.

Although the Fairly Trained scheme sounds reasonable, Fairly Trained admits that it’s not a definitive answer to the problem. It’s probably more likely that the outcomes of the many lawsuits will help shape how AI companies act as regards training their LLMs in the near future.
