Hacker News

1. Removing outliers was done to make the data easier to analyze, since some of them were skewing the average. Of course, when investing you cannot ignore outliers, but you could possibly curb them with stop losses/stop limits.

2. A more in-depth analysis could be done on the effect of analyst releases on prices, but assuming this effect does occur, the performances are understated, which provides further evidence for the conclusion that outperform ratings can beat the market.

3. Not sure how narrowing down the top analysts is a flaw here.

This blog post is probably not as mathematically rigorous as it could be, as I wrote it as an exploratory analysis, for fun and out of curiosity.



The problem is that "making the data easier to analyze" may make the analysis invalid. Your response is not increasing my faith that you carefully considered what removing the outliers would do to the validity of the analysis.

> Not sure how narrowing down the top analysts is a flaw here.

Potentially, because how did you decide who the "top analysts" were? If you used the same methods you used to determine they were successful, all you've shown is that the analysts who come out of your math come out of your math.

If 50K people each flip 10 coins, one of them might flip 10 heads. It doesn't mean that person is better at flipping heads. We could in fact calculate the chance that one of 50K people flips ten heads. If I decided it meant that some people really were better at flipping heads, I'd probably be wrong. (Although if I calculated the chances and discovered it was like a one-in-a-bazillion chance that even one of 50K people would flip ten heads... I'd probably at least consider that they might be better at flipping heads! But I'd probably run the experiment again. :) )

If I pick the top 100 heads-flippers from my 50K coin flippers, and "show" that they really are better at flipping heads because they flipped more heads in the same dataset I used to pick them as the top 100 heads-flippers in the first place, then I haven't really shown that at all. By "narrowing down top analysts", depending on how you did it, it's possible you simply found the analysts who got lucky, while ignoring the ones who didn't.
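To make that concrete, here's a minimal simulation of the selection effect (hypothetical setup, not anyone's real data: 50,000 fair-coin flippers, 10 flips each, so by construction nobody has any skill). Rank everyone on one round, take the top 100, then have those same 100 flip a fresh round:

```python
import random

random.seed(42)

N_FLIPPERS = 50_000
N_FLIPS = 10

# Every flipper is identical: a fair coin, no skill involved.
first_round = [sum(random.random() < 0.5 for _ in range(N_FLIPS))
               for _ in range(N_FLIPPERS)]

# "Narrow down the top analysts" using the same data we'll judge them on.
top_100 = sorted(range(N_FLIPPERS),
                 key=lambda i: first_round[i], reverse=True)[:100]

avg_top_first = sum(first_round[i] for i in top_100) / 100
print(f"Top 100, selection round: {avg_top_first:.2f} heads of {N_FLIPS}")

# Out-of-sample check: the same "top" flippers flip a fresh round.
second_round = [sum(random.random() < 0.5 for _ in range(N_FLIPS))
                for _ in top_100]
avg_top_second = sum(second_round) / 100
print(f"Top 100, fresh round:     {avg_top_second:.2f} heads of {N_FLIPS}")
```

In the round used to select them, the top 100 average roughly 9 or more heads out of 10; in the fresh round they fall back to about 5, i.e. pure chance.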

Statistical analysis is _tricky_.


If 50,000 people each flip 10 coins, it's actually overwhelmingly likely that someone will get 10 heads. The chance that it doesn't happen is about one in a sextillion (10^21).
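For anyone who wants to check that figure: the chance any one person flips 10 heads is 2^-10 = 1/1024, so the chance that nobody out of 50,000 does is (1023/1024)^50000.

```python
# Chance that a single person flips 10 heads in a row with a fair coin.
p_ten_heads = 0.5 ** 10                  # 1/1024

# Chance that *nobody* out of 50,000 people manages it.
p_nobody = (1 - p_ten_heads) ** 50_000

print(f"P(no one flips 10 heads) = {p_nobody:.3e}")  # about 6e-22
```

The chance that at least one person gets 10 heads is 1 minus that, i.e. a near-certainty, which is the point: in a big enough pool, someone will always look like a skilled flipper.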


It's a completely different matter. As far as I know, analysts are doing their analysis when pricing stocks. They may not be very good, or may not account for all the factors, but analyzing companies' products, revenues, returns, and other parameters seems extremely different from flipping coins to me. So your analogy has no basis whatsoever, given that you are comparing the completely random outcome of a well-understood physical action to a chaotic system (the market) in which at least the basic factors influencing its constituents are well understood. Or are you suggesting, for example, that Warren Buffett has just been lucky for endless decades, and that you and everyone else know at least how to match his performance? If that is what you think, then please, I would be rather amused to see your performance as an investor compared to his over the course of several decades.


Well, see, that's the whole deal: investigating _how_ different it is from flipping coins. That's the whole question, really. Starting with the assumption that analysts _must_ be doing better than chance is not the right place to start if you want to analyze whether they are or not.

Most statistical analysis is about trying to distinguish meaningful results (implying a repeatable correlation that means something) from random chance with no meaning. The whole point is that you _don't_ start out knowing whether the thing you are investigating is random chance or not; if you did, you wouldn't need to analyze it. That's what statistical analysis is for, in part because we humans are really, really good at finding patterns and assuming a meaningful correlation when in fact it's just random chance.

The coin example is useful because we all know (or define, for the sake of the discussion) that it must be random chance, so any analysis that appeared to say otherwise is probably in error. And using the same sort of analysis on something where you don't know how much of the effect is due to random chance is not going to answer the question.


Funny you mention Buffett: he's about to win his bet that a set of hedge funds would fail to beat the market over a decade.


I mention him because apparently, to the parent commenter, he is only a coin flipper, and all he can give us is insight into the percentage of people who can get 10 heads in a row.


If someone tried to use Buffett as an example of beating the market but did zero statistical analysis to determine how likely it is to be actual skill, they would also deserve to be dismissed.


I probably should have included the outliers when analyzing overall performance, but if I recall correctly they did not have a significant effect.

The top analysts were determined by their average performance over the year after their ratings were made. This isn't the top analysts out of 50,000; it's the top out of 50 or so analyst-rating pairs. There were only 16 or so analysts in total that I looked at. This isn't an instance of survivorship bias as your example suggests. If I were to be more rigorous, I could give a statistical test for this.
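One cheap version of such a test, sketched here with placeholder numbers (the excess returns below are randomly generated for illustration, not the post's data): a one-sided sign test asking whether significantly more than half of the ~50 analyst-rating pairs beat the market.

```python
import math
import random

random.seed(0)

# Hypothetical one-year excess returns (rating return minus market return)
# for ~50 analyst-rating pairs; placeholder values, not real data.
excess = [random.gauss(0.02, 0.10) for _ in range(50)]

# Sign test: under the null "ratings do no better than the market",
# each excess return is positive with probability 1/2.
wins = sum(r > 0 for r in excess)
n = len(excess)

# One-sided p-value: P(X >= wins) for X ~ Binomial(n, 1/2).
p_value = sum(math.comb(n, k) for k in range(wins, n + 1)) / 2 ** n

print(f"{wins}/{n} ratings beat the market, one-sided p = {p_value:.4f}")
```

A sign test throws away the magnitudes, so it's conservative; with real data a t-test or a permutation test on the mean excess return would be stronger.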


Looking for top performers always invokes survivorship bias. It's a classic data-snooping issue where the common-sense approach is exactly wrong, but it'll sell a lot of books, and it'll convince people that you know what you're doing as a stock analyst when it's just random luck.


Taking the top 10 performers out of 16 or so analysts in the analysis is not survivorship bias.


Sure it is; it was survivorship bias when you selected the 16.


If you're ultimately making a claim that strategy X beats the market, you need to make sure strategy X is something you can implement without time travel.

For 2, it's not just the effect of the release on prices; it could also be the effect of other things on both releases and prices. Imagine great news came out that bumped up the price and also caused analysts to upgrade their ratings.

You should use price data from the day after release.

For top analysts, if you only know who's top after looking at their performance, then it's again not a repeatable strategy. Compare: "you can beat the market just by buying the top 10 stocks!"

It's nice to play with data; I'm just laying out some of the reasons these strategies won't work in the "real world", and pointing toward where a future analysis could be improved.


You're correct that I should have accounted for outliers when measuring the performance of this strategy. If I recall correctly, even with the outliers included, the average performance of analyst ratings was not significantly affected. I would have to rerun the numbers, though.

The analysis was more about measuring the performance of analysts, which is why I used price data from before and after the recommendation. For the practical purpose of using this as a strategy, you are right that price data from the days after release would be better.

If the top 10 stocks you picked beat the market and have consistent earnings and dividends over a period of time, would this not be a repeatable strategy?


Point is, you don't know which were the top ten until after the fact.


How do you measure top ten if you don't have a ranking system?





