gherard5555 19 hours ago

If big companies are allowed to pirate content to train their models, so do I. I have to train my brain after all

  • marcuskane2 10 hours ago

    I think you wrote this to be snarky, but it's just literally true.

    You are allowed to read books from the library and remember what they say, and use that information to inform your own future writing or speaking or actions.

    You are allowed to listen to copyrighted music and learn from it. You can even play songs from The Beatles or Metallica in your garage as training.

    You absolutely do have the right to "train your brain" on copyrighted material. Copyright restricts who is allowed to publish the work, not who is allowed to consume it.

    • pull_my_finger 10 hours ago

      Ok but you're not "remembering what they say", you're creating a "derivative work" by literally just tokenizing/vectorizing (I'm not a data scientist or AI expert) the words exactly as they appear. AI doesn't innovate based on works it consumes, and it doesn't understand "concepts" picked up from it. It simply adds the possibility of regurgitating (read: plagiarizing) verbatim, in part or in whole, to a list of other possibilities. This is on top of the fact that these parasites didn't even ask to use or purchase the works to begin with, they stole (pirated) them.

      • archontes 9 hours ago

        It's a stretch to call training an AI creating a 'derivative work' by the legal definition.

        You could count the words in a book and publish the word count, and while the information is based on the contents of the book, that would fall incredibly short of being a derivative work.

        I suspect they committed whatever copyright violation is committed when they downloaded the copyrighted works. Training an AI on them is simply not related to the protections that copyright offers.

    • lern_too_spel 9 hours ago

      > You are allowed to read books from the library

      You aren't allowed to download a torrent of pirated books as these companies have done and freely distribute it to multiple brains to train on.

      If the brains can then write down the original works from memory, you aren't allowed to make copies of these brains and freely distribute them either.

  • dctoedt 15 hours ago

    > If big companies are allowed to pirate content to train their models, so do I.

    The syntax brought to mind Stephen Colbert's 2007 book, I Am America (And So Can You!)

seydor 18 hours ago

Is watching pirated movies for "educational purposes" also Fair Use? If not, what is the rational argument?

  • marcuskane2 10 hours ago

    It is! Or at least, it can be.

    The exact lines are a bit blurry and subjective, but this is a decent overview: https://www.lib.uchicago.edu/copyrightinfo/fairuse.html

    If you're charging people to watch in a movie theater, you can't pretend that's educational. But if you want to show a video in a classroom that's relevant to the coursework, it's absolutely covered under fair use.

captainbland 19 hours ago

Using copyrighted material for massive profit can be fair use so long as a billionaire's (or billionaire funded) company is doing it.

  • cyanydeez 17 hours ago

    yeah, as long as you can properly compensate the politicians and judges, anything is fair use.

SR2Z 12 hours ago

At the risk of sounding smug, obviously. Whether or not something is fair use has never been related to how it was obtained, only how it's used.

Fair use is an affirmative, after-the-fact defense.

8-prime 18 hours ago

Whilst I'm generally not against piracy - I think there are valid reasons as to why it's a thing - I can't help but feel like disguising pirating content under the veil of fair use is a bit far-fetched and disingenuous.

Using copyrighted material in a fair use way seems fine to me and is important. But these companies not wanting to pay for creating their model and then just claiming fair use is silly.

  • Ukv 18 hours ago

    > these companies not wanting to pay for creating their model and then just claiming fair use

    Certainly model developers would prefer not to pay if given the option, but I also feel it's not untruthful to say that it hasn't actually been feasible to license content on the scale required.

    Even just for training an object detection network as a side project, I struggled to find sufficient pre-training material outside of web-scraped datasets like ImageNet. I even contacted Getty and was told directly that they don't license images for machine learning.

    Something like a compulsory licensing scheme where you pay into a pot to train a model could potentially work. Mostly, I hope whatever we eventually get is feasible for open source groups, individual developers, universities, smaller companies, etc. rather than only being made with the few biggest companies in mind.

    • dontlaugh 17 hours ago

      Or, you simply don’t train anything. Not all technology needs to exist.

      • Ukv 17 hours ago

        Large-scale pre-training is not specific to chatbots. There is undeniably a huge range of beneficial uses for machine learning: language translation, video transcription, material/product defect detection, weather forecasting/early warning systems, OCR, spam filtering, protein folding, tumor segmentation, drug discovery and interaction prediction, etc.

        "simply don’t train anything" does not seem ethically (many models have the potential to improve lives or are already doing so), politically (staying in the lead is currently seen as an important issue), or legally (as noted in this brief, "the ultimate test of fair use is whether the copyright law’s goal of ‘promoting the Progress of Science and useful Arts’ ‘would be better served by allowing the use than by preventing it.’") viable to me.

nonrandomstring 18 hours ago

> prominent intellectual property law professors

These guys are part of the problem IMHO. Having studied some law at university and then reading all of Lawrence Lessig's works when he dissected IP law 20 years ago for the Creative Commons project, I was left with the distinct feeling that "IP law" is ugly, unfair, arbitrary, ineffective, crippling to the mind, and devastating to progress and the economy.

I now strongly agree with the likes of Richard Stallman that "intellectual property" is a grotesque and bankrupt mess that should be avoided if you want to have any semblance of a mature conversation.

Actually, people training AI have a point. New technologies expose the silliness of bad ideas we've clung to for three centuries. And the only way forward is to hugely reform or repeal most of IP law for EVERYONE.

The truth is that, long before the Statute of Anne, "copyright" had its roots in censorship and political control. We used to burn the printers of "seditious works" on pyres of their own books in St James's Square in London.

Much of "IP law" still functions the same today but hidden behind a cover story about "protecting creators".

I now mentally replace the phrase "intellectual property" with "coercive control of information" (CCI).

CCI gets to the nub of power relations instead of pretending we have nice IP "laws" that are applied uniformly. In reality copyright, patents and trademarks have become tools of censorship and denial for those with money, and they do almost nothing to protect individual creators. Things like the DMCA are simply monstrous. If we can't reform and enforce it to actually protect creators it's time to scrap the whole rotten show in my opinion.

  • jkaplowitz 17 hours ago

    Trademarks are not like the rest of what you listed - they’re effectively a consumer protection law to reduce confusion about the origins or endorsers of a commercial product or service, though their protection is not restricted to consumers. A lot of advocates of software freedom and the non-software equivalents are more okay with trademarks than with the rest of what is often called IP.

    Nobody is prevented by trademark law from offering any product or service, including commercially; the trademark holder’s consent or an applicable exception is only needed in order to use the trademarked name/logo in their own offering’s name/logo.

    • nonrandomstring 16 hours ago

      I think that's why it's all the more important that trademarks do not get so casually lumped-in with the ridiculous catch-all term "intellectual property".

      (agreeing with you here and suggesting that people who work with consumer protection marks should distance themselves from the term "IP" entirely)

  • captainbland 13 hours ago

    In my view IP laws in the digital age are basically a look into what would happen under western capitalism if we discovered any other effectively post-scarcity technology. We wouldn't be transported into Star Trek's universe, we would instead be jailed and beaten for using it.

    Only the very wealthiest would be allowed to produce with it, they would form state backed cartels and we would be no better off for its invention.

    • nonrandomstring 11 hours ago

      That's an interesting take. When the environmentally unsustainable and so far unprofitable "AI" gold-rush collapses, the world will be left with thousands of "models" kicking about, with widespread ability to run locally. If LLMs do ever provide any genuine utility these will be replicated at near zero cost across the globe. Then we'll see the boot switch to the other foot, more of what you predict: a vicious and disgracefully hypocritical attempt to contain and erase that data to stop anybody obtaining any value from it. At that point the corporations will be magically all for safety and copyright again!

      • captainbland 7 hours ago

        I think to an extent they're already laying the rhetorical groundwork for all of this, for the day they need it. AI safety, distillation is "unfair". I expect they will try to avoid it being purely about copyright so as to not be seen as hypocrites, but frame it in a more convoluted manner.

hulitu 8 hours ago

> Training AI Using 'Pirated' Content Can Be Fair Use, Law Professors Argue

No conflict of interest here? Asking for a friend. /s