If you know what you're looking for -- and no, that does not mean em-dashes; they're a low-signal indicator and people get precious about them -- your false positive rate can get pretty low. See Russell et al., "People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text."
Anecdotally speaking, I'm heavily involved in AI cleanup work on Wikipedia. At this point, I can read article text typical of AI, guess when that edit was made to within 6-12 months based on differences in LLM output over the years, and usually be right. (e.g., "delve" was common through early 2024 until GPT-4o killed it; I've seen theories that it's not a GPT-4o thing but rather the result of people avoiding the word when it became a meme, but its frequency dropped off hard even in edits that clearly were not reviewed or edited at all.) Given that Wikipedia has accumulated billions of edits over 25+ years, that says a lot.