It’s frustrating when you’re not understood — especially when you’re trying to speak to Siri, Alexa, or another internet-connected device.

Voice datasets that power voice recognition services are owned by a handful of major companies, and they can wildly underrepresent people with non-dominant accents; Black, Indigenous, and other people of color; disabled people; and gender-marginalised people. In fact, for people speaking other global languages, there may be no datasets at all.

That’s why Mozilla launched Common Voice — the world’s largest public voice database, powered by the voices of volunteer contributors. Our goal is to teach machines how real people speak.

Today, we’re asking you to contribute to Common Voice, but we want you to choose how you’ll do it. Will you donate your voice to one of our Common Voice language datasets? Or will you make a $34 donation to Mozilla to support projects like this to reclaim the internet? (Or both!)

I’d be curious about the privacy concerns, but this might help a lot with underrepresented voice data. It might come down to whether someone wants more datasets for their particular voice/language more than they worry about the other concerns.

If your language/accent is already well documented, it might not help as much?

  • auf@lemmy.ml
    11 months ago

    As long as the actual software would be free and open-sourced, I’m willing to help

    • yo_scottie_oh@lemmy.ml
      11 months ago

      The data set is available under the Mozilla Public License v2 through the Common Voice GitHub page. I’m not sure if I’m reading the terms of the license correctly, but I believe it allows commercial use.

      • Otter@lemmy.ca (OP)
        11 months ago

        I think that might be part of the focus: to push companies into including these underrepresented languages/accents so that the products work for everyone instead of a smaller subset.

        Worth considering before contributing

      • auf@lemmy.ml
        11 months ago

        bruh the license sucks. Not gonna do that until they change it

        • sir_reginald@lemmy.world
          11 months ago

          What’s wrong with the MPL? It’s a pretty standard free software license. It’s more permissive than the GPL, but less permissive than the MIT license. The most relevant project using it is Firefox, but you can go to GitHub and find a ton of projects using the MPL as their license.