• Eager Eagle@lemmy.world
    English
    arrow-up
    25
    ·
    10 months ago

    Good move, but anyone using public data already applies a simple spam filter to reject “dumb” data poisoning. Also, hateful and other negative replies get penalized during language model training, so effective data poisoning takes effort. I’ll just throw out some ideas here on how poisoning could hypothetically have a tangible negative impact on their results.

    The best one can do in terms of data poisoning is to make comments that are not easily distinguishable from ordinary comments, for both humans and machines, but that are either unhelpful or misleading. This is an “in-distribution” data poisoning attack. To have any real impact on training, these comments need to be posted at scale from different user accounts that also upvote each other’s comments in a way that mimics real user interaction. If this is done simplistically, a simple graph analysis of those interactions will light up the fake accounts like a Christmas tree.
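
    To make the graph-analysis point concrete, here is a minimal sketch of what such a check might look like. It assumes the interactions are available as (voter, author) pairs; the function name `reciprocal_clusters` and the `min_size` threshold are made up for illustration, and real detection would use far more signals than mutual upvotes.

```python
from collections import defaultdict

def reciprocal_clusters(upvotes, min_size=3):
    """Group accounts that are linked by mutual upvotes.

    upvotes: iterable of (voter, author) pairs (assumed input format).
    Returns a list of sets, one per cluster with at least min_size accounts.
    """
    # Who upvoted whom, ignoring self-votes.
    gave = defaultdict(set)
    for voter, author in upvotes:
        if voter != author:
            gave[voter].add(author)

    # Undirected graph: an edge exists only when both accounts upvoted each other.
    mutual = defaultdict(set)
    for a, targets in gave.items():
        for b in targets:
            if a in gave.get(b, ()):
                mutual[a].add(b)
                mutual[b].add(a)

    # Connected components via depth-first search.
    seen = set()
    clusters = []
    for start in mutual:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(mutual[node] - component)
        seen |= component
        if len(component) >= min_size:
            clusters.append(component)
    return clusters
```

    A ring of accounts that all upvote each other comes out as one cluster, while one-way upvotes from ordinary users do not create edges at all.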

    • greenskye@lemm.ee
      English
      arrow-up
      24
      arrow-down
      1
      ·
      edit-2
      10 months ago

      but are either unhelpful or misleading

      Honestly that just sounds like a lot of Reddit users in general

    • Adalast@lemmy.world
      arrow-up
      3
      ·
      10 months ago

      I was contemplating the merits of botting with the current model, using slight vectorization offsets, so that the data becomes prone to overfitting.

      I would think it would also work to post using valid but non-standard syntax, so that it muddies the n-gram searches.
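
      As a rough illustration of the n-gram point, the sketch below compares word bigrams of two phrasings using Jaccard overlap. The helper names `ngrams` and `jaccard_overlap` and the example sentences are invented here; this is not a claim about how any particular filter or training pipeline works.

```python
def ngrams(text, n=2):
    """Return the set of word n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard_overlap(a, b, n=2):
    """Share of n-grams common to both texts, out of all n-grams in either."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0

standard = "the cat sat on the mat"
reordered = "on the mat sat the cat"
print(jaccard_overlap(standard, standard))   # identical text
print(jaccard_overlap(standard, reordered))  # same words, different order
```

      The reordered sentence uses exactly the same words, yet shares fewer than half of its bigrams with the standard one, which is the kind of mismatch an n-gram lookup would run into.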