I use violin plots but a complication is that the shape depends upon the bandwidth hyperparameter of the kernel density estimator that is used inside. The plot can differ a lot for different bandwidth values.
Selection of the 'proper' bandwidth is a classic bias-variance tradeoff problem.
While true, that's not an additional problem compared to box plots which effectively just set the bandwidth to maximum. So IMO they are strictly better.
I agree but so do box plots. I think probably the best thing is violin plots when there's lots of data and bee swarm plots when there isn't. But either are better than box plots.
Selection of the 'proper' bandwidth is a classic bias-variance tradeoff problem.