To add some nuance to this conversation: what they are using this for is channel recommendations, search results, autocomplete, and emoji suggestions, and the model(s) they train are specific to your workspace (not shared between workspaces). All of these seem like they could be handled fairly privately using some sort of vector (embeddings) search.
I am not defending Slack, and I can think of a number of cases where training on Slack messages could go very badly (e.g., exposing private conversations, data leakage between workspaces), but I think it helps to understand the context before reacting. Personally, I do think we need better controls over how our data is used, and Slack should be able to do better than "Email us to opt out".
> the model(s) they train are specific to your workspace (not shared between workspaces)
That's incorrect -- they're stating that they use your "messages, content, and files" to train "global models" that are used across workspaces.
They're also stating that they ensure no private information can leak from workspace to workspace in this way. It's up to you if you're comfortable with that.
From the wording, it sounds like they are conscious of the potential for data leakage and have taken steps to avoid it. It really depends on how they are applying AI/ML. It can be done in a private way if you are thoughtful about how you do it. For example:
Their channel recommendations:
"We use external models (not trained on Slack messages) to evaluate topic similarity, outputting numerical scores. Our global model only makes recommendations based on these numerical scores and non-Customer Data"
Meaning they use a non-Slack-trained model to generate the embeddings, then apply a recommender system on top of the resulting similarity scores (which is mostly classic ML, not an LLM). This sounds like it can be kept private.
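Something like this, roughly (a toy sketch of the general pattern, not Slack's actual pipeline; embed() here stands in for whatever external model they use):

    import math
    from collections import Counter

    def embed(text):
        # Toy stand-in for an external, non-Slack-trained embedding model:
        # just a bag-of-words vector. The real thing would be a dense vector.
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Channel text never leaves the workspace; only these numeric
    # similarity scores would feed a global recommender.
    channels = {
        "#frontend": "react typescript css components",
        "#infra": "kubernetes terraform deploy pipelines",
        "#design": "figma mockups css components ui",
    }
    user_interest = embed("react components ui")
    scores = {c: cosine(user_interest, embed(desc)) for c, desc in channels.items()}
    print(sorted(scores.items(), key=lambda kv: -kv[1]))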
Search results:
"We do this based on historical search results and previous engagements without learning from the underlying text of the search query, result, or proxy"
Again, this is probably a combination of non-Slack-trained embeddings and machine-learning algos driven by engagement signals. This sounds like it can be kept private and team-specific.
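A toy version of that idea (the stats format and the smoothing are my assumptions, not anything from Slack's docs) would be re-ranking purely on click-through behavior, never touching the text:

    def rerank(result_ids, stats):
        # stats: result_id -> (clicks, impressions). Score is a smoothed
        # click-through rate; the text of query/result is never consulted.
        def score(rid):
            clicks, impressions = stats.get(rid, (0, 0))
            return (clicks + 0.1) / (impressions + 1.0)  # weak prior for unseen results
        return sorted(result_ids, key=score, reverse=True)

    stats = {"doc_a": (40, 100), "doc_b": (5, 100)}
    print(rerank(["doc_b", "doc_a", "doc_c"], stats))
    # -> ['doc_a', 'doc_c', 'doc_b']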
Autocomplete:
"These suggestions are local and sourced from common public message phrases in the user’s workspace."
I would be concerned about private messages leaking via autocomplete, but if it's sourced from common phrases in public messages specific to your team, that should be OK?
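A toy version of what that could look like (phrase length, counting scheme, and all the data here are made up for illustration):

    from collections import Counter

    public_messages = [  # public channels only; DMs/private channels excluded
        "let me know if you have questions",
        "let me know when the build is green",
        "can you take a look at this",
        "let me know if you have questions",
    ]

    # Count trigrams across public messages.
    phrases = Counter()
    for msg in public_messages:
        words = msg.split()
        for i in range(len(words) - 2):
            phrases[" ".join(words[i:i + 3])] += 1

    def suggest(prefix, k=3):
        return [p for p, _ in phrases.most_common() if p.startswith(prefix)][:k]

    print(suggest("let me"))  # ['let me know']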
Emoji suggestions:
"using the content and sentiment of the message, the historic usage of the emoji [in your team]"
Again, it sounds like they are using models for sentiment analysis (which they probably didn't train themselves, and which wouldn't really leak any training data even if they did) and some ML or other algos to pick common emojis specific to your team.
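Roughly this shape: an off-the-shelf sentiment signal combined with the team's own usage counts (the lexicon and usage table here are invented, not from Slack):

    from collections import Counter

    POSITIVE = {"great", "shipped", "thanks", "awesome"}
    NEGATIVE = {"broken", "failing", "outage", "bug"}

    # Made-up per-team historical emoji usage, bucketed by sentiment.
    team_usage = {
        "positive": Counter({":tada:": 50, ":rocket:": 30, ":pray:": 10}),
        "negative": Counter({":sob:": 20, ":fire:": 15}),
    }

    def sentiment(message):
        words = set(message.lower().split())
        return "positive" if len(words & POSITIVE) >= len(words & NEGATIVE) else "negative"

    def suggest_emoji(message, k=2):
        return [e for e, _ in team_usage[sentiment(message)].most_common(k)]

    print(suggest_emoji("shipped the fix, thanks everyone"))  # [':tada:', ':rocket:']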
To me these are all standard applications of NLP / ML that have been around for a long time.
The way it's written, this just isn't guaranteed. They _MAY_ use it for what you have mentioned above. They explicitly say "...here are a few examples of improvements..." and "How Slack may use Customer Data" (emph mine). They also... may not? And may use it for completely different things that can expose who knows what via prompt hacking.
Agreed, and that is my concern as well: if people get too comfortable with this, companies will keep pushing the bounds of what is acceptable. We will need companies to be transparent about ALL the things they are using our data for.