Reddit said in a filing to the Securities and Exchange Commission that its users’ posts are “a valuable source of conversation data and knowledge” that has been and will continue to be an important mechanism for training AI and large language models. The filing also states that the company believes “we are in the early stages of monetizing our user base,” and proceeds to say that it will continue to sell users’ content to companies that want to train LLMs and that it will also begin “increased use of artificial intelligence in our advertising solutions.”

On Wednesday, Reuters reported that Reddit has entered a contract with Google, which will license its content for $60 million a year in order to train Google’s AI models.

  • TheMurphy@lemmy.world
    10 upvotes · 10 months ago

    Question: Wouldn’t Lemmy instances easily be able to do this without many users knowing?

    And would they also be able to sell data from other instances, because they can load data from federated instances?

    • rsuri@lemmy.world
      16 upvotes, 1 downvote · 10 months ago

      Basically yes, but unlike Reddit, which has control over its proprietary network, Lemmy instances would have a hard time locking down access to create artificial scarcity for their data without causing other problems.

        • NeatNit@discuss.tchncs.de
          2 upvotes · 10 months ago

          It’s the whole copyright question. Users own the copyright on their own posts, and it’s the terms of service that are supposed to say what the server and other federated servers are allowed or not allowed to do with them. I don’t even remember if there were terms of service when I joined Lemmy… But assuming there were, and they didn’t explicitly say whether it or federated servers can use user content to train AI, then it becomes a legal question that can only be determined by courts.

          Note that this determination will only apply in the country/state where that court is.

          IANAL

    • markon@lemmy.world
      3 upvotes, 1 downvote · 10 months ago

      I don’t have a problem with anyone scraping what’s already public; I just don’t want anyone to profit simply by selling the data I created for them. OpenAI is at least creating useful stuff. All Reddit ever did was be the middleman.

      • TheMurphy@lemmy.world
        1 upvote · 10 months ago

        Agreed.

        But it’s just a little hypocritical to not use reddit because of this, if it turns out it’s much worse to use Lemmy in this regard.

    • learningduck@programming.dev
      1 upvote · 10 months ago

      Yeah. Guess some AI companies may have set up an instance already. They won’t even have a rate limit or anything on their own instances.