From: Murphy O'Brien
If I'm using a test like, say, lillietest or kstest, I can get a
p-value. When I google it I get quotes like this.

"If the p-value is less than some significance level, alpha,
(typically practitioners use an alpha of 0.05) then we say that the
result is statistically significant (at the 5% level) - i.e. the
probability of incorrectly rejecting the null hypothesis is less than
5%."

I thought I used to understand this but I don't anymore. Maybe today
is one of my stupid days!

This is the line that confuses me (is it all the double negatives, perhaps?)

"if p<0.05 the probability of incorrectly rejecting the null
hypothesis is less than 5%."

To translate this one part at a time:

If p<0.05, the probability of incorrectly rejecting the statement "P is
Normal" is < 5%.

If p<0.05, the probability of incorrectly embracing the statement "P is
not Normal" is < 5%.

If p<0.05, the probability of being wrong when you say "P is not
Normal" is < 5%.

If p<0.05, the probability of being right when you say "P is not
Normal" is > 95%.

So therefore:
I'm 95% certain I'm not wrong by saying P is not normal.

But I always thought that if p<0.05 then it's a very good bet that
your data is normal.

Help!
Where have I gone wrong?

Murphy
From: Peter Perkins
Murphy O'Brien wrote:
> If I'm using a test like, say, lillietest or kstest, I can get a
> p-value. When I google it I get quotes like this.
>
> "If the p-value is less than some significance level, alpha,
> (typically practitioners use an alpha of 0.05) then we say that the
> result is statistically significant (at the 5% level) - i.e. the
> probability of incorrectly rejecting the null hypothesis is less than
> 5%."

This quote is either imprecisely worded, or wrong, depending on how harsh you
want to be. The p-value is computed for particular data. The probability of
incorrectly rejecting is a long-run probability, over many repetitions of the
test procedure on different data. A better statement for the Neyman-Pearson
interpretation of hypothesis testing would be something like (leaving out the
preliminaries and ignoring things like discreteness),

"Given a fixed significance level alpha, chosen in advance (typically
practitioners use an alpha of 0.05), you can define a test procedure which has a
probability of incorrectly rejecting a true null hypothesis of only alpha, by
simply computing the p-value (tail area under the null) and rejecting the null
if p<alpha. In that case, we say that a result is statistically significant (at
the 5% level). You have no idea (and are not even allowed to care) whether
you're correct in any particular case, but you know that the probability of
rejecting a true null hypothesis is only 5%, so you're doing well in that sense
in the long run."
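
You can see that long-run property for yourself by simulation. Here's a rough
sketch of the kind of thing I mean (it assumes you have kstest from the
Statistics Toolbox; the sample size and number of trials are arbitrary):

alpha   = 0.05;
nTrials = 10000;
nObs    = 50;
nReject = 0;
for k = 1:nTrials
    x = randn(nObs,1);         % the data really are standard normal, so H0 is true
    [h,p] = kstest(x);         % H0: x comes from a standard normal distribution
    nReject = nReject + (p < alpha);
end
disp(nReject/nTrials)          % false rejection rate, should come out near 0.05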

A Fisherian might say,

"If the p-value is small, then the observed test statistic falls among a group
of potential outcomes that are extreme and pretty unlikely (you'd only see such
outcomes 5% of the time, say), and so a more plausible explanation of the data
is that the null hypothesis is not true. We then say that the result is
statistically significant (at the 5% level), because you have some strong
evidence against the null (if it were true, then a very unlikely event would
have had to occur). If you use this procedure with a fixed cutoff of
5%, then the probability of incorrectly rejecting a null hypothesis over the
long run is 5%, but that's just gravy -- really you care about the evidence (or
lack thereof) that each particular set of data provides against the null."
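
To make the "tail area under the null" idea concrete, here's a sketch that
computes a p-value by brute force, as the fraction of test statistics simulated
under the null that are at least as extreme as the observed one (again just a
sketch, assuming the Statistics Toolbox; kstest returns the KS statistic as its
third output):

x = randn(30,1);                       % the observed data (a stand-in for yours)
[h,p,ksObs] = kstest(x);               % observed KS statistic vs. standard normal

nSim  = 5000;
ksSim = zeros(nSim,1);
for k = 1:nSim
    xk = randn(30,1);                  % data simulated with the null true
    [hk,pk,ksSim(k)] = kstest(xk);
end
pMC = mean(ksSim >= ksObs);            % Monte Carlo tail area
disp([p pMC])                          % the two should be roughly in agreement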

These are two different ways of looking at it.

So this

> But I always thought that if p<0.05 then it's a very good bet that
> your data is normal.

is backwards. You haven't said what the hypothesis(es) are, but you probably
mean, "the p-value for a test of the null hypothesis of normality is less than
.05", and the appropriate conclusion in that case should be to reject the null.
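
In other words, for a normality test a small p-value is evidence against
normality, not for it. For instance (a sketch, assuming the Statistics Toolbox;
lillietest may warn if a p-value falls outside its lookup table, which you can
ignore here):

xNormal = randn(100,1);          % data that really are normal
xSkewed = exprnd(1,100,1);       % data that really are not (exponential)

[h1,p1] = lillietest(xNormal)    % typically h1 = 0 and p1 is not small
[h2,p2] = lillietest(xSkewed)    % typically h2 = 1 and p2 is tiny
% h = 1 means "reject normality at the 5% level", which goes with p < 0.05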

Hope this helps. MATLAB, by the way, is a perfect platform for understanding
all this stuff experimentally, by simulating data and computing rejection rates,
and so forth. The book Computational Statistics Handbook with MATLAB by
Martinez and Martinez is an excellent place to start.
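
For instance, to follow on from the sketches above: the same kind of loop, run
on data that are not normal, gives you the rejection rate when the null is
false, i.e. how often the test correctly catches the non-normality (again a
rough sketch):

nTrials = 2000;
nObs    = 50;
nReject = 0;
for k = 1:nTrials
    x = exprnd(1,nObs,1);          % decidedly non-normal data, so the null is false
    [h,p] = lillietest(x);         % h = 1 exactly when p < 0.05
    nReject = nReject + h;
end
disp(nReject/nTrials)              % rejection rate when the null is false (the power)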

- Peter Perkins
The MathWorks, Inc.
From: Murphy O'Brien
Peter,

Thanks for helping me understand it. So then my translation was
correct, assuming for example that the test is for normality. The
statement (which I find confusing with its double negatives)

"If P<0.05, the probability of incorrectly rejecting a true null
hypothesis is <5%"

means

If p<0.05, the probability of being right when you say "P is not
Normal" is > 95%.

And you only say "It is normal" if p>0.05.

So if you want it to be normal, you want p to be big. For example,
0.50 is really good and 0.02 is bad.

And I was completely wrong in my "I always thought" bit at the end.

OK. I think I have it more or less straight now.

Thanks again.

Murphy
From: Peter Perkins
Murphy O'Brien wrote:

> If p<0.05, the probability of being right when you say "P is not
> Normal" is > 95%.

Well, not exactly. What I was trying to say was that the probability of
incorrectly rejecting is a long-run property of the test procedure. It is not
affected at all by a particular p-value. If, in a particular trial, your
p-value is less than .05, then yes, your observed data fall in the rejection
region, and that region has been defined to give the test procedure a 5%
probability of false rejection. But that is not the same thing as what you said.

Some of this is angels on the head of a pin. If you reject the null when your
p-value is less than .05, then you have a 5% probability of being wrong _if the
null is indeed the truth_. That's a conditional probability, though, and you
don't know in any particular case if the condition is true. I guess the 5%
probability can also be thought of as a worst-case unconditional probability,
so in that sense, the "> 95%" is correct.
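
If you want to see the difference in numbers, try a simulation in which the
null is true in only some of the trials, and then look at what fraction of the
rejections were mistakes. That fraction is not pinned at 5%; it depends on how
often the null was true to begin with. A rough sketch (Statistics Toolbox
again; the 90/10 mix and the exponential alternative are completely arbitrary):

nTrials = 5000;
nObs    = 50;
wasNull = false(nTrials,1);        % was the data really normal in this trial?
reject  = false(nTrials,1);        % did the test reject normality?
for k = 1:nTrials
    wasNull(k) = (rand < 0.9);     % arbitrary: the null is true in 90% of trials
    if wasNull(k)
        x = randn(nObs,1);
    else
        x = exprnd(1,nObs,1);
    end
    [h,p] = lillietest(x);
    reject(k) = (p < 0.05);
end
disp(mean(reject(wasNull)))        % P(reject | null true): this is the 5%
disp(mean(wasNull(reject)))        % P(null true | reject): not 5% in general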

- Peter