> Two things here. First, 90% confidence isn't great, I look for 99% confidence in running tests.

Why? Why are you so worried about controlling false positives that you're willing to eat a whole bunch of false negatives?*

You're not administering expensive drugs to cancer patients, you're designing a website! If you mistakenly think that green buttons perform better than blue buttons when the actual truth is the null hypothesis that they perform the same, that's not the end of the world.

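* and I do mean a whole bunch; in a scenario like 20 users per group and true conversion rates of 50% vs 25%, moving from alpha=10% to alpha=1% cuts your power by something like 3x, so the share of real effects you miss goes from about 50% to about 84%. The power calculations: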

    R> power.prop.test(n=20, p1=0.5, p2=0.25, sig.level=0.10)
    ...
              power = 0.4951
    ...
    R>
    R> power.prop.test(n=20, p1=0.5, p2=0.25, sig.level=0.01)
    ...
              power = 0.1646
    ...
    R>
    R> 0.4951/0.1646
    [1] 3.008
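
If it helps to see the false-negative side directly, here is a minimal sketch that reruns the same two `power.prop.test` calls and reports 1 - power next to each alpha. The variable names (`alphas`, `false_negative`, `result`) are mine, and the n=20, 50% vs 25% numbers are just the illustrative scenario from above, not anything measured.

```r
# Illustrative scenario only: n = 20 per group, true rates 50% vs 25%.
alphas <- c(0.10, 0.01)

# Power of the two-sided two-proportion test at each significance level.
power <- sapply(alphas, function(a) {
  power.prop.test(n = 20, p1 = 0.5, p2 = 0.25, sig.level = a)$power
})

# The false-negative (type II error) rate is whatever power leaves over.
false_negative <- 1 - power

result <- data.frame(
  alpha          = alphas,
  power          = round(power, 4),
  false_negative = round(false_negative, 4)
)
print(result)
```

The `false_negative` column is the fraction of real differences of that size the test would fail to flag, which is the cost you pay for the stricter alpha.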