In addition to the possible business threat, forcing OpenAI to identify its use of copyrighted data would expose the company to potential lawsuits. Generative AI systems like ChatGPT and DALL-E are trained using large amounts of data scraped from the web, much of it copyright protected. When companies disclose these data sources it leaves them open to legal challenges. OpenAI rival Stability AI, for example, is currently being sued by stock image maker Getty Images for using its copyrighted data to train its AI image generator.

Aaaaaand there it is. They don’t want to admit how much copyrighted material they’ve been using.

  • cendawanita@kbin.social
    1 year ago

    @chemical_cutthroat
    Again, all of your analogical effort presumes that an LLM is synthesizing. When I say, specifically, that they generate outputs based on statistical probability, I mean that this is not at all the same as a sentient process of iterative learning based on available knowledge.

    If you can’t get that distinction, then responding to you will demand too much effort from me (personally; I wish the best to others who’d like to try). If you’re really sincere, though, honestly it’s been best elaborated by Timnit Gebru and Emily Bender in their writing about the “stochastic parrot”. Please do have a read. https://dl.acm.org/doi/10.1145/3442188.3445922
    @stopthatgirl7