Well, there is this notion of "calibrating" the score. It is well known that most humans are bad at estimating calibrated probabilities unassisted. The system could have been designed to accommodate this user-interface difficulty. For example, I am sure there is enough data floating around that you could map a simple 5-choice Likert scale to some calibrated probabilities, without making any assumptions. But instead it is just a raw slider with nothing marked besides the default 50-50, which is not great for input. Even a simple "yes/no" choice (translating to fixed calibrated probabilities around 25%/75%) would probably result in better log-loss scores overall.
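To make the log-loss point concrete, here is a rough sketch of the arithmetic. It assumes a forecaster whose picked side turns out right 70% of the time; that hit rate is invented for illustration, as are the function name and the three stated confidence levels being compared.

```python
import math

def expected_log_loss(stated_confidence: float, hit_rate: float) -> float:
    """Expected binary log-loss when the forecaster puts `stated_confidence`
    on the side they picked and that side is right with probability `hit_rate`."""
    return -(hit_rate * math.log(stated_confidence)
             + (1.0 - hit_rate) * math.log(1.0 - stated_confidence))

# Assumption for illustration only: the picked side is right 70% of the time.
HIT_RATE = 0.70

scenarios = [
    ("slider left at the 50-50 default", 0.50),
    ("yes/no mapped to a fixed 75%", 0.75),
    ("overconfident slider at 95%", 0.95),
]

for description, confidence in scenarios:
    loss = expected_log_loss(confidence, HIT_RATE)
    print(f"{description}: expected log-loss {loss:.3f}")
```

Under that assumed hit rate, the fixed 75% answer comes out at roughly 0.62, against about 0.69 for the untouched default and about 0.93 for the overconfident 95%. With a different hit rate the ranking can change, so this only shows why a fixed moderate probability can beat an overconfident slider, not that it always does.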
If you're suggesting a Likert scale of "very unlikely" "somewhat likely" etc.: no, the data floating around suggests that does not work. People saying "somewhat likely" mean anything between 10 % and 80 %. There's no way to map fuzzy descriptions to calibrated probabilities.
If you're suggesting a Likert scale with alternatives like "10 %", "25 %", "50 %", etc. that are then auto-calibrated against the average human overconfidence (so that an answer of 25 % really means 40 %), then that might work, but what would be the point?
10% for "somewhat likely" wouldn't make any sense; "likely" by itself means >50%. I was proposing to simply label 5 or 7 points on the slider, like 10% as "very unlikely", 50% as "neutral", and 66% as "somewhat likely". I am sure there is a decent-sized study that asked people to predict events on a Likert scale as likely/unlikely, and one could then use that study to calibrate the mapping from Likert labels to probability points. There is a study showing that people intuitively map Likert scales to a slider https://link.springer.com/article/10.3758/s13423-017-1344-2/... so by properly spacing and positioning the Likert labels, people will at least be somewhat more calibrated than in the absence of any cues.
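A minimal sketch of what "calibrate the mapping" could look like, assuming such a study gives you pairs of (label chosen, whether the event happened). The function name and the toy responses below are made up for illustration and do not come from any real study.

```python
from collections import defaultdict

def calibrate_labels(responses):
    """Map each Likert label to the observed frequency of the event
    among the predictions that used that label."""
    counts = defaultdict(lambda: [0, 0])  # label -> [times event happened, total uses]
    for label, happened in responses:
        counts[label][0] += int(happened)
        counts[label][1] += 1
    return {label: hits / total for label, (hits, total) in counts.items()}

# Made-up toy responses, NOT real study data.
toy_responses = (
    [("very unlikely", True)] * 1 + [("very unlikely", False)] * 9
    + [("neutral", True)] * 5 + [("neutral", False)] * 5
    + [("somewhat likely", True)] * 4 + [("somewhat likely", False)] * 2
)

label_to_probability = calibrate_labels(toy_responses)
for label, probability in label_to_probability.items():
    print(f"{label}: {probability:.2f}")
```

Note that this only yields one average probability per label across everyone in the data. It says nothing about how far individual respondents deviate from that average.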
It does make sense for some people. Some people might say it's somewhat likely Sweden is still not a NATO member at the end of the year – and mean there's a 10 % probability it happens.
That's the problem with these fuzzy labels – the variance between individuals (and even within individuals across time) is huge.