Dear Creative Commons (@creativecommons.org @creativecommons@mastodon.social @creativecommons@x.com),
Can we have CC-NT licenses for no-training (ML/LLM, GenAI in general), just like we have CC-NC for non-commercial?
My previous post¹ reminded me that I’ve been creating, writing, inventing, and then sharing things with #CreativeCommons (CC) #licenses for a long time (I have to see if I can dig up my first use of CC licenses).
I’ve used and recommended a variety of CC licenses for decades, e.g.
* CC0 — for standards work, e.g. I drove and wrote up https://wiki.mozilla.org/Standards/licensing (with help from lawyers)
* CC-BY — aforementioned blog post (and other snippets of #openSource)
* CC-BY-NC — photos on Flickr (dozens of which have been used in publications²)
* CC-SA — for CASSIS³, which I still consider experimental enough that I chose "share-alike" to deliberately slow its spread, and hopefully reduce mutations (while allowing ports of its functions to other languages)
So I have some idea of what I’m talking about.
There have been LOTS of discussions of the challenges, downsides, and disagreements with sweeping use of copyrighted content to train generative artificial intelligence AKA #genAI software and services, sometimes also called #machineLearning. The most common examples are Large Language Models AKA #LLM, but there are also models for generating images and video. Smart and well-intentioned people disagree on who has rights to do what, or even who should do what in this regard.
There have been many proposals for new standards, or updates to existing standards like robots.txt etc. but I have not really seen them make noticeable progress. There are also lots of techniques published that attempt to block the spiders and bots being used to crawl and collect content for GenAI, an arms race that ends up damaging well-established popular uses such as web search engines (or making it harder to build a new one).
The brilliant innovation of Creative Commons was to look at the use-cases and intentions of creators publishing on the web in the 2000s and capture them in a small handful of clear licenses with human readable summaries.
Creatives are clamoring for a simple way to opt their publicly published content out of being used to train GenAI. New Creative Commons licenses would solve this.
This seems like an obvious thing to me. If you can write a license that forbids “commercial use”, then you should be able to write a license that forbids use in “training models”, which respectful / well-written crawlers should (hopefully) respect, inasmuch as they respect existing CC licenses.
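As a rough illustration of what "respecting" such a license might look like inside a crawler, here is a minimal Python sketch. It assumes pages declare their license with the existing rel="license" link convention, and it assumes a hypothetical license URL shape such as https://creativecommons.org/licenses/by-nt/4.0/ (no such Creative Commons license exists today; the "nt" element, the URL, and the function names are all made up for illustration).

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LicenseLinkParser(HTMLParser):
    """Collects href values of <a> and <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "license" in rels and href:
            self.license_urls.append(href)


def forbids_training(license_url):
    """Hypothetical rule: a CC license whose code has an "nt" element.

    Assumed (not real) example: https://creativecommons.org/licenses/by-nt/4.0/
    """
    parsed = urlparse(license_url)
    if parsed.hostname != "creativecommons.org":
        return False
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "licenses":
        return False
    # e.g. "by-nc-nt" -> ["by", "nc", "nt"]
    return "nt" in parts[1].lower().split("-")


def may_use_for_training(html):
    """Returns False only if the page explicitly links a no-training license."""
    parser = LicenseLinkParser()
    parser.feed(html)
    return not any(forbids_training(url) for url in parser.license_urls)
```

This sketch only detects the explicit opt-out. What a crawler should do when a page declares no license at all is a separate question that it does not try to answer.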
I saw that Creative Commons published a position paper⁴ for an IETF workshop on this topic. Unfortunately, in my opinion, it has an overly cautious and pessimistic (outright conservative, one could say) outlook, one that I believe the founders of Creative Commons (who dared to boldly create something new) would probably be disappointed in.
First, there is no Creative Commons license on the Creative Commons position paper. Why?
Second, there are no names of authors on the Creative Commons position paper. Why?
Lots of people similarly (to the position paper) said the original Creative Commons licenses were a bad idea, or would not be used, or would be ignored, or would otherwise not work as intended. They were wrong.
If I were a lawyer I would fork those existing licenses and produce such “CC-NT” (for “no-training”) variants (though likely prefix them with something else since "CC" means Creative Commons) just to show it could be done, a proof of concept as it were that creators could use.
Or perhaps a few of us could collect funds to pay an intellectual property lawyer to do so, and of course donate all the work produced to the commons, so that Creative Commons (or someone else) could take it, re-use it, build upon it.
Someone needs to take such a bold step, just as Creative Commons itself took a bold step when they dared to create portable re-usable content licenses that any creator could use (a huge innovation at the time, for content, inspired no doubt by portable re-usable open source licenses⁵).
References:
¹ https://tantek.com/2024/263/t1/20-years-undohtml-css-resets
² https://flickr.com/search/?user_id=tantek&tags=press&view_all=1
³ https://tantek.com/github/cassis
⁴ Creative Commons Position Paper on Preference Signals, https://www.ietf.org/slides/slides-aicontrolws-creative-commons-position-paper-on-preference-signals-00.pdf
⁵ https://en.wikipedia.org/wiki/Comparison_of_free_and_open-source_software_licenses