Setting aside the silliness of anonymizing data that’s already wide open to the public, anonymizing the usernames would produce a worse AI, because the literal username of the person in question is often significant to the context of what’s being written. Think of all the “relevant username” comments, for example. People make puns about usernames, berate people for having offensive usernames, and so forth. If those usernames were all replaced with anonymized substitutes, the AI would be training on nonsense.