Six Sigma Quality Resources for Achieving Six Sigma Results

Why P-Value >0.05 to Say Data Is Normal?

    Message: 50035
    Posted by: giriraj
    Posted on: Wednesday, 14th July 2004


    Why should the P-value be greater than 0.05 to conclude that the available data set is normal? What does the 0.05 signify? Please, can anyone help me? No one has given a justifiable answer.


    Message: 50037
    Posted by: rube
    Posted on: Thursday, 15th July 2004

    In your basic statistics class, you have the curve of the normal distribution for a normal population. The confidence level is 95%, and whatever falls inside it is considered normal; outside of that is alpha, which is 1 minus the confidence level, or 5%. This 5% (0.05) means that results beyond that range would be too rare to be by chance alone and must be the effect of something: an intervention, a medication, or something like that. A p value >0.05 would be, say, 0.08 or 0.12 or 0.5. A p value <0.05 means that there is a "statistically significant" difference from one population to the other (say experiment vs control).

    The Public Health standard is a level of 0.05; some researchers, say in pharmaceuticals, use a level of 0.01 to be even more sure that anything beyond that would indicate a statistical difference. But the default is 0.05; when in doubt, use 0.05.

    If you need more, I can go to my textbooks and get more info. I don't understand it because I'm smart, but because I am not smart and have had to struggle with this a lot until one Public Health professor explained it in layman's terms.


    Message: 50042
    Posted by: giriraj
    Posted on: Thursday, 15th July 2004

    Dear rube,

    Thanks for your response. But I still could not understand why, in the case of normality, P>0.05. I did understand about statistical significance, where p<0.05. For better understanding, could you help me by explaining it in layman's language?

    giriraj


    Message: 50044
    Posted by: bhaven
    Posted on: Thursday, 15th July 2004

    Giriraj,

    When you enter the data in, say, Minitab to check normality, the data is checked against a normal distribution, and if your data is significantly different from that normal distribution, then the p value will be <0.05 and so your data are not normal. This is how you should see things.

    Bhaven


    Message: 50082
    Posted by: Kirk
    Posted on: Thursday, 15th July 2004

    To put it in layman's terms, you can pick whatever value you want to use as a critical value. It all depends on how much risk you want to accept in saying a sample is from a normal distribution when it may come from some other type. The higher the critical value, the less risk of making this error. Many times I use a p-value of 0.1 as a critical value, but this is just my preference. Also don't forget about the "fat pencil test" when looking at a normal probability plot. Even if the data fails the p-value test, it may be "close enough" to a normal distribution to allow you to apply statistical tools that require a normal distribution.

    Kirk


    Message: 50132
    Posted by: Gabriel
    Posted on: Thursday, 15th July 2004

    Maybe this is picky, but the right question would have been:

    "Why P-Value < 0.05 to Say Data Is NOT Normal?"

    Let's start with a simple example not related to normality tests, but related to hypothesis testing.

    Let's say that you have a population of balls, and that you are especially interested in their diameters. Somehow (never mind how, but you can say it's by divine revelation) you know that the diameters follow a normal distribution with standard deviation = 1mm. About the average: you requested 100mm, but you suspect that the man who sold you the balls may have given you another diameter by mistake.

    Case a) You take one ball and measure its diameter: it is 102.58mm. Assuming that the population average was actually 100mm, that ball would be 2.58mm from the average. How likely is it that you take one ball and find it that far from the average or farther? Given that I know that the population is normally distributed and the standard deviation is 1mm, 99% of the population is within 2.58mm of the average (i.e. ±2.58 sigmas). That means that there is a 1% probability of taking only one ball and finding it 2.58mm from the average or farther. To say it in other words, it is very strange that the data I have could have been obtained IF the average was actually 100mm. So I take it that I have proven, "beyond any reasonable doubt", that the average is not 100mm. Of course, this conclusion can be wrong. There is that 1% still.

    Case b) I take one ball and its diameter is 100.67mm. It would be 0.67mm (0.67 sigmas) from the average IF the average was actually 100mm. 50% of the population is within ±0.67 sigmas from the average and the other 50% is beyond ±0.67 sigmas from the average. That means that finding a ball 0.67 sigmas from the average or farther is the expected result in 1 out of 2 trials, so the ball I found would be reasonably expectable if the population average was 100. There is not enough evidence to say that the average is not 100mm. Is there evidence to say that the average IS 100mm? ABSOLUTELY NOT! Note that exactly the same conclusions I've just made would have resulted from asking "Is the average 101.34mm?", since 100.67mm is also 0.67mm away from 101.34mm. Nothing in this analysis supports that the average is 100mm over 101.34mm. Furthermore, even if the sampled ball had been exactly 100mm, that would not be enough to say that the average is 100mm, because, for example, if the average was 100.67mm, a situation such as taking a ball and finding it 0.67mm from 100.67mm or farther (for example, 100mm) would be expectable in 1 out of 2 trials.

    Given the hypothesis "average = 100mm", in case a) we say that the p-value is 0.01 (i.e. 1%) and we feel it's fair to say that it is proven, beyond any reasonable doubt, that the population average is not 100mm. In case b) we say that the p-value is 0.5 (i.e. 50%) and we feel we don't have any reason to suspect that the average is not 100mm. Where is the limit? There is no clear cut-off. p=0.05 is generally accepted. Sometimes p=0.1 or p=0.01 are used too.
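    The p-values in the ball example can be checked with a few lines of Python. This is only a minimal sketch, assuming a single measurement, a known standard deviation of 1mm, and a two-sided question ("that far from the average or farther"). The function name `two_sided_p` is made up for this illustration. A ball 0.67mm from 100mm comes out at about 0.50, and a ball about 2.58mm away comes out at about 0.01.

```python
import math

def two_sided_p(observed, mean0, sigma):
    """Probability that one draw from Normal(mean0, sigma) lands at least
    as far from mean0 as the observed value did (both directions)."""
    z = abs(observed - mean0) / sigma
    # For a standard normal Z, P(|Z| >= z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

p_close = two_sided_p(100.67, 100.0, 1.0)  # ball 0.67 sigmas from 100mm
p_far = two_sided_p(102.58, 100.0, 1.0)    # ball 2.58 sigmas from 100mm

print(round(p_close, 2), round(p_far, 2))
```

    Asking the same question against an average of 101.34mm for the 100.67mm ball gives the same p-value, which is the point made above: a large p-value does not single out 100mm as the true average.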

    We have been testing some characteristic of the population by analyzing data from a sample. Note that we tested whether the population average was 100, but we didn't expect our sample ball to be 100 even if the population average was 100. It is "how far the ball is from 100" that gives us a clue about whether it is likely or not that the average is 100.

    In a normality test, one does not expect the sampled data to be perfectly normally distributed even if it was taken from a perfectly normal distribution. It is the magnitude of that lack of fit that gives us a clue about whether it is likely that the population (not the sample dataset) is normal or not. The p-value is the probability of getting a dataset as far from the normal shape as the one you got (or farther) IF it actually belonged to a normal distribution. A small p-value means that if the distribution was normal such a dataset would be unlikely, so one can say that the distribution is not normal (with a risk of being wrong, that risk being the p-value). A large p-value means that if the distribution was normal such a dataset would be likely, which does not mean that the distribution is normal, because this dataset could be likely to come from other, non-normal distributions too (that's why I changed your original question). How small is small and how large is large? As said, the value 0.05 (5%) is a conventional limit, but other values can be used too.
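    To make "magnitude of the lack of fit" concrete, here is a rough sketch using the Jarque-Bera statistic, which measures how far the sample skewness and kurtosis are from the normal values. This is a swapped-in technique chosen because it needs only the standard library; it is not necessarily the test Minitab or any other package runs, and its p-value is only an asymptotic approximation. The function name `jarque_bera_p` and the simulated samples are assumptions for illustration.

```python
import math
import random

def jarque_bera_p(data):
    """Approximate p-value for H0 'the population is normal', based on
    sample skewness and kurtosis (Jarque-Bera, large-sample approximation)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under H0, jb is roughly chi-square with 2 degrees of freedom,
    # whose upper-tail probability is exp(-jb / 2).
    return math.exp(-jb / 2.0)

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(100.0, 1.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

p_normal = jarque_bera_p(normal_sample)
p_skewed = jarque_bera_p(skewed_sample)
print(p_normal, p_skewed)
```

    The strongly skewed sample gives a p-value far below 0.05, so normality is rejected. Whatever p-value the normal sample gives, a large value would only mean "no detectable lack of fit", not proof of normality.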


    Message: 50139
    Posted by: Orlando
    Posted on: Thursday, 15th July 2004

    Hi Giriraj

    When checking for normality, just use the fat pencil test. That is what Douglas Montgomery recommends. Plot the data on normal probability paper, then put a pencil alongside the data and see if the pencil covers the data. That will work all the time.

    There is no perfectly normal data. And there are tons of tests for normality. Your conclusions from these various tests will depend on many factors, including sample size, etc.

    Now if your question is directed more towards why ".05" and not ".04" or ".03", then everyone who has an opinion about it is right.

    Orlando


    Message: 50176
    Posted by: Hemant Gham
    Posted on: Friday, 16th July 2004

    Giriraj,

    Some problem with posting, I guess. Let me know your e-mail if you wish to have my views. I hope that this time my post really gets posted.

    Cheers!

    =============================================

    As for the Normality check, p-value can be thought of as "The probability of claiming that the data is NOT normal when in reality it IS normal".

    You may need to understand this through hypothesis testing. I assume we know alpha and beta risks and hypothesis (mechanisms) too.

    In this case, to check the normality of the data, let's state the hypotheses.

    Ho = Null Hypothesis ----- The data IS normal (Status Quo)

    Ha = Alternative Hypothesis ---- The data IS NOT normal (What we are hoping to prove)

    When we study the data for normality we can make two kinds of errors. The error we normally worry about most is Type I, sometimes called alpha error. Thus when we say that the data is not normal with 95% confidence, we are saying that we have accepted Ha and our probability of being wrong is α = 5%.

    Where did this 95% come from? There is always the risk of the hypothesis being true but the data making it look false.

    To minimize this risk we set the confidence level:

    to 95% giving us a 5% chance of being wrong (α = 0.05) OR

    to 99% giving us a 1% chance of being wrong (α = 0.01)

    What will show me whether the data is normal or not normal?

    It is the "p-value".

    If the p-value > 0.05 … then we fail to reject Ho: there is no evidence that the data is not normal (Accept Ho)

    If the p-value < 0.05 … then we reject Ho: there is strong evidence that the data is not normal (Reject Ho)

    1.      The p-value is the observed risk of making a Type I error if we reject Ho with this data. If it is more than 0.05, then the probability of being wrong in claiming that the data is not normal is more than the allowance specified through alpha. That is, every time we reject with a p-value above 0.05, we increase our chances of being wrong beyond what is required. In other words, we would be increasing our chances of being wrong unnecessarily when we know we are allowed to make mistakes only 5% of the time.

    2.      Unless there is an exception based on engineering judgment, we usually standardize on a Type I error probability of α = 0.05.

    3.      α should be specified before the hypothesis test is conducted.

    4.      Thus, any p-value less than 0.05 means we reject the null hypothesis and, tentatively, accept the alternate hypothesis.

    In short, p-value is the smallest level of significance that would lead to rejection of H0 (the data is normal) with the given data.
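    The meaning of α as "the chance of being wrong when Ho is really true" can be seen in a small simulation. The sketch below is an illustration under stated assumptions, not a normality test: it uses a simple z-test on the mean with known sigma because that has an exact p-value computable with the standard library, and the same logic applies to any valid test. The names `z_test_p`, `alpha`, and `trials` are made up for the example.

```python
import math
import random

def z_test_p(sample, mean0, sigma):
    """Two-sided p-value for H0 'population mean = mean0', sigma known."""
    n = len(sample)
    z = (sum(sample) / n - mean0) / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))

rng = random.Random(1)  # fixed seed so the run is repeatable
alpha = 0.05
trials = 20000
rejections = 0

for _ in range(trials):
    # Every sample really does come from the H0 population (mean 0, sigma 1).
    sample = [rng.gauss(0.0, 1.0) for _ in range(10)]
    if z_test_p(sample, 0.0, 1.0) < alpha:
        rejections += 1  # a Type I error: rejecting a true Ho

rate = rejections / trials
print(rate)
```

    The printed rate should land close to 0.05: when Ho is true, rejecting whenever p < α makes you wrong in about a fraction α of the cases, which is the risk being fixed in advance.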


    Message: 50446
    Posted by: hajar attaran
    Posted on: Monday, 19th July 2004

    Suppose that X1, X2, …, Xn are iid and T is a test statistic. Suppose that you would like to test H0 versus H1 at significance level 0.05 (α=0.05). This is arbitrary; if you like, you can take α=0.01 or α=0.1, etc. This means the Type I error equals 0.05. You know that in hypothesis testing we fix the Type I error and try to minimize the Type II error.

    Suppose the test function is

    \[
    \Phi(x) =
    \begin{cases}
    1      & \text{if } T > t \\
    \gamma & \text{if } T = t \\
    0      & \text{if } T < t
    \end{cases}
    \]

    You know that \( \text{p-value} = P_{H_0}(T > t) = P(R \mid H_0) \). So when you have α=0.05 and your p-value is greater than 0.05, you can't reject H0.

    I suggest you look at mathematical statistics books to understand the concept of the p-value better.


    Message: 50447
    Posted by: philip
    Posted on: Monday, 19th July 2004

    1) Technically, p is the probability that, IF A STATE OF NATURE EXISTS, you will get a result at least as extreme as the one you got. It is NOT the probability that a state of nature exists. So p = 0.05 means that, if the data are distributed in a certain way, you will get a sample departing from that distribution at least as much as (NOT identical to) the one you got 5% of the time.

    2) There is no such thing as a test for normality. All normality tests test for a detectable lack of fit to normality.

    3) As many correspondents in this thread have stated, there is no such thing as normally distributed data - the normal distribution is a mathematical abstraction, in which the data are totally continuous (not discrete due to measurement granularity) and extend to +/- infinity.

    4) All tests for normality are based on a certain sample size. If you are using a formal chi-square goodness-of-fit test, the algorithm must aggregate the data into cells (just like a histogram) to perform the test. This usually means a minimum of 5 units per cell. The test then uses chi-square to detect a lack of fit for each class, and checks that there are 5 items remaining in each tail. This means that with 20 data, the goodness of fit test is only applied to the middle 50% of the data (+/- 0.675 sigma) and tells you nothing about the area outside this. For different levels of sigma you will require different amounts of data, as follows:

    Range          Data required
    +/- 1 sigma    32
    +/- 2 sigma    220
    +/- 3 sigma    3704
    +/- 4 sigma    157799
    +/- 5 sigma    17415232
    +/- 6 sigma    5049883388

    Graphical methods are a little less precise than this but the concept is the same.

    This means that, with a small sample (20, say), we will not find a detectable lack of fit for almost any collection of data. This is not too much of a problem for most uses of "normality" tests, as we are interested in the effect on t-tests, ANOVA, etc. However, it becomes a serious problem when we are trying to use the tails of the distribution, for example to estimate the areas under the tails of the distribution. This is why a six sigma process can only be described as producing less than 0.1% defective and NOT 2ppb or 3.4ppm etc., which is pure fantasy. It also explains why a 6 sigma process is actually no better than, or different from, a 4 sigma process.

    Take a look at Don Wheeler's book "Normality and the Process Behavior Chart" for more info.
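    The sample-size figures in this post appear to follow from the "5 items remaining in each tail" rule: the number of data needed is 5 divided by the one-sided normal tail area beyond k sigma. That reading is an assumption inferred from the numbers, not something the post states, and the helper names below are made up. Under this reading the sketch reproduces the 1, 2 and 3 sigma entries exactly and lands close to, but not exactly on, the larger ones, which presumably came from a tail table with different rounding.

```python
import math

def upper_tail(k):
    """Area of the standard normal distribution above +k sigma."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def data_needed(k, per_tail=5):
    """Smallest sample size that is expected to leave `per_tail`
    observations beyond +k sigma (and as many below -k sigma)."""
    return math.ceil(per_tail / upper_tail(k))

for k in range(1, 7):
    print("+/-", k, "sigma:", data_needed(k))

# With 20 data, 5 per tail is 25% per tail, i.e. the cut sits near 0.675 sigma.
expected_in_tail = 20 * upper_tail(0.675)
print(round(expected_in_tail, 2))
```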


    Message: 50481
    Posted by: Enrique
    Posted on: Monday, 19th July 2004

    Giriraj

    I think you have two questions in one.

    The answer to P>0.05 is related to hypothesis testing, as explained by Gabriel.

    "To Say Data Is Normal" is another question, but you have the answer: it depends on the theoretical distribution you are testing your collected data against.


    Message: 67034
    Posted by: Joseph Banerjee
    Posted on: Thursday, 31st March 2005

    Hi ,

    I read this question probably a little bit late; anyhow, I just wanted to share something about the big question, "WHY P=0.05?"

    Why P=0.05?


     

    The standard level of significance used to justify a claim of a statistically significant effect is 0.05. For better or worse, the term statistically significant has become synonymous with P ≤ 0.05.

    There are many theories and stories to account for the use of P=0.05 to denote statistical significance. All of them trace the practice back to the influence of R.A. Fisher. In 1914, Karl Pearson published his Tables for Statisticians & Biometricians. For each distribution, Pearson gave the value of P for a series of values of the random variable. When Fisher published Statistical Methods for Research Workers (SMRW) in 1925, he included tables that gave the value of the random variable for specially selected values of P. SMRW was a major influence through the 1950s. The same approach was taken for Fisher's Statistical Tables for Biological, Agricultural, and Medical Research, published in 1938 with Frank Yates. Even today, Fisher's tables are widely reproduced in standard statistical texts.

    Fisher's tables were compact. Where Pearson described a distribution in detail, Fisher summarized it in a single line in one of his tables making them more suitable for inclusion in standard reference works*. However, Fisher's tables would change the way the information could be used. While Pearson's tables provide probabilities for a wide range of values of a statistic, Fisher's tables only bracket the probabilities between coarse bounds.

    The impact of Fisher's tables was profound. Through the 1960s, it was standard practice in many fields to report summaries with one star attached to indicate P ≤ 0.05 and two stars to indicate P ≤ 0.01. Occasionally, three stars were used to indicate P ≤ 0.001.

    Still, why should the value 0.05 be adopted as the universally accepted value for statistical significance? Why has this approach to hypothesis testing not been supplanted in the intervening three-quarters of a century?

    It was Fisher who suggested giving 0.05 its special status. Page 44 of the 13th edition of SMRW, describing the standard normal distribution, states

    The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials, even if the statistics were the only guide available. Small effects will still escape notice if the data are insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty.
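    As a quick arithmetic check of the figures in that passage, the two-sided normal tail area can be computed directly. This is a small sketch with a made-up helper name, `two_sided_tail`: a deviation of 1.96 standard deviations corresponds to P of about 0.05, while exactly 2 standard deviations gives a slightly smaller P, whose reciprocal is the "once in 22 trials".

```python
import math

def two_sided_tail(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(z / math.sqrt(2))

p_196 = two_sided_tail(1.96)  # the conventional 5 per cent point
p_2 = two_sided_tail(2.0)     # "twice the standard deviation"

print(round(p_196, 3), round(p_2, 4), round(1 / p_2))
```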

    Similar remarks can be found in Fisher (1926, 504).

    ... it is convenient to draw the line at about the level at which we can say: "Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials."...

    If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent point), or one in a hundred (the 1 per cent point). Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.

    However, Fisher's writings might be described as inconsistent. On page 80 of SMRW, he offers a more flexible approach

    In preparing this table we have borne in mind that in practice we do not want to know the exact value of P for any observed χ², but, in the first place, whether or not the observed value is open to suspicion. If P is between .1 and .9 there is certainly no reason to suspect the hypothesis tested. If it is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. Belief in the hypothesis as an accurate representation of the population sampled is confronted by the logical disjunction: Either the hypothesis is untrue, or the value of χ² has attained by chance an exceptionally high value. The actual value of P obtainable from the table by interpolation indicates the strength of the evidence against the hypothesis. A value of χ² exceeding the 5 per cent. point is seldom to be disregarded.

    These apparent inconsistencies persist when Fisher dealt with specific examples. On page 137 of SMRW, Fisher suggests that values of P slightly less than 0.05 are not conclusive.

    [T]he results of t shows that P is between .02 and .05.

    The result must be judged significant, though barely so; in view of the data we cannot ignore the possibility that on this field, and in conjunction with the other manures used, nitrate of soda has conserved the fertility better than sulphate of ammonia; the data do not, however, demonstrate this point beyond the possibility of doubt.

    On pages 139-140 of SMRW, Fisher dismisses a value greater than 0.05 but less than 0.10.

    [W]e find...t=1.844 [with 13 df, P = 0.088]. The difference between the regression coefficients, though relatively large, cannot be regarded as significant. There is not sufficient evidence to assert that culture B was growing more rapidly than culture A.

    while in Fisher [19xx, p 516] he is willing to pay attention to a value not much different.

    ...P=.089. Thus a larger value of χ² would be obtained by chance only 8.9 times in a hundred, from a series of values in random order. There is thus some reason to suspect that the distribution of rainfall in successive years is not wholly fortuitous, but that some slowly changing cause is liable to affect in the same direction the rainfall of a number of consecutive years.

    Yet in the same paper another such value is dismissed!

    [paper 37, p 535] ...P=.093 from Elderton's Table, showing that although there are signs of association among the rainfall distribution values, such association, if it exists, is not strong enough to show up significantly in a series of about 60 values.

    Part of the reason for the apparent inconsistency is the way Fisher viewed P values. When Neyman and Pearson proposed using P values as absolute cutoffs in their style of fixed-level testing, Fisher disagreed strenuously. Fisher viewed P values more as measures of the evidence against a hypothesis, as reflected in the quotation from page 80 of SMRW above and this one from Fisher (1956, p 41-42)

    The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who "rejects" a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection. This inequality statement can therefore be made. However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance.

    Still, we continue to use P values nearly as absolute cutoffs but with an eye on rethinking our position for values close to 0.05**. Why have we continued doing things this way? A procedure such as this has an important function as a gatekeeper and filter--it lets signals pass while keeping the noise down. The 0.05 level guarantees the literature will be spared 95% of potential reports of effects where there are none.

    For such procedures to be effective, it is essential there be a tacit agreement among researchers to use them in the same way. Otherwise, individuals would modify the procedure to suit their own purposes until the procedure became valueless. As Bross (1971) remarks,

    Anyone familiar with certain areas of the scientific literature will be well aware of the need for curtailing language-games. Thus if there were no 5% level firmly established, then some persons would stretch the level to 6% or 7% to prove their point. Soon others would be stretching to 10% and 15% and the jargon would become meaningless. Whereas nowadays a phrase such as statistically significant difference provides some assurance that the results are not merely a manifestation of sampling variation, the phrase would mean very little if everyone played language-games. To be sure, there are always a few folks who fiddle with significance levels--who will switch from two-tailed to one-tailed tests or from one significance test to another in an effort to get positive results. However such gamesmanship is severely frowned upon and is rarely practiced by persons who are native speakers of fact-limited scientific languages--it is the mark of an amateur.

    Bross points out that the continued use of P=0.05 as a convention tells us a good deal about its practical value.

    The continuing usage of the 5% level is indicative of another important practical point: it is a feasible level at which to do research work. In other words, if the 5% level is used, then in most experimental situations it is feasible (though not necessarily easy) to set up a study which will have a fair chance of picking up those effects which are large enough to be of scientific interest. If past experience in actual applications had not shown this feasibility, the convention would not have been useful to scientists and it would not have stayed in their languages. For suppose that the 0.1% level had been proposed. This level is rarely attainable in biomedical experimentation. If it were made a prerequisite for reporting positive results, there would be very little to report. Hence from the standpoint of communication the level would have been of little value and the evolutionary process would have eliminated it.

    The fact that many aspects of statistical practice in this regard have changed gives Bross's argument additional weight. Once (mainframe) computers became available and it was possible to calculate precise P values on demand, standard practice quickly shifted to reporting the P values themselves rather than merely whether or not they were less than 0.05. The value of 0.02 suggested by Fisher as a strong indication that the hypothesis fails to account for the whole of the facts has been replaced by 0.01. However, science has seen fit to continue letting 0.05 retain its special status denoting statistical significance.

    *Fisher may have had additional reasons for developing a new way to table commonly used distribution functions. Jack Good, on page 513 of the discussion section of Bross (1971), says, "Kendall mentioned that Fisher produced the tables of significance levels to save space and to avoid copyright problems with Karl Pearson, whom he disliked."

    **It is worth noting that when researchers worry about P values close to 0.05, they worry about values slightly greater than 0.05 and why they deserve attention nonetheless. I cannot recall published research downplaying P values less than 0.05. Fisher's comment cited above from page 137 of SMRW is a rare exception.

    References

    • Bross IDJ (1971), "Critical Levels, Statistical Language and Scientific Inference," in Godambe VP and Sprott (eds) Foundations of Statistical Inference. Toronto: Holt, Rinehart & Winston of Canada, Ltd.
    • Fisher RA (1956), Statistical Methods and Scientific Inference New York: Hafner
    • Fisher RA (1926), "The Arrangement of Field Experiments," Journal of the Ministry of Agriculture of Great Britain, 33, 503-513.
    • Fisher RA (19xx), "On the Influence of Rainfall on the Yield of Wheat at Rothamstead,"


    Message: 67035
    Posted by: Joseph Banerjee
    Posted on: Thursday, 31st March 2005

    Why P=0.05?

    The standard level of significance used to justify a claim of a statistically significant effect is 0.05. For better or worse, the term statistically significant has become synonymous with P ≤ 0.05.

    There are many theories and stories to account for the use of P=0.05 to denote statistical significance. All of them trace the practice back to the influence of R.A. Fisher. In 1914, Karl Pearson published his Tables for Statisticians & Biometricians. For each distribution, Pearson gave the value of P for a series of values of the random variable. When Fisher published Statistical Methods for Research Workers (SMRW) in 1925, he included tables that gave the value of the random variable for specially selected values of P. SMRW was a major influence through the 1950s. The same approach was taken for Fisher's Statistical Tables for Biological, Agricultural, and Medical Research, published in 1938 with Frank Yates. Even today, Fisher's tables are widely reproduced in standard statistical texts.

    Fisher's tables were compact. Where Pearson described a distribution in detail, Fisher summarized it in a single line in one of his tables, making them more suitable for inclusion in standard reference works*. However, Fisher's tables would change the way the information could be used. While Pearson's tables provide probabilities for a wide range of values of a statistic, Fisher's tables only bracket the probabilities between coarse bounds.

    The impact of Fisher's tables was profound. Through the 1960s, it was standard practice in many fields to report summaries with one star attached to indicate P ≤ 0.05 and two stars to indicate P ≤ 0.01. Occasionally, three stars were used to indicate P ≤ 0.001.

    Still, why should the value 0.05 be adopted as the universally accepted value for statistical significance? Why has this approach to hypothesis testing not been supplanted in the intervening three-quarters of a century?

    It was Fisher who suggested giving 0.05 its special status. Page 44 of the 13th edition of SMRW, describing the standard normal distribution, states

    The value for which P=0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation ought to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant. Using this criterion we should be led to follow up a false indication only once in 22 trials, even if the statistics were the only guide available. Small effects will still escape notice if the data are insufficiently numerous to bring them out, but no lowering of the standard of significance would meet this difficulty.

    Similar remarks can be found in Fisher (1926, 504).

    ... it is convenient to draw the line at about the level at which we can say: "Either there is something in the treatment, or a coincidence has occurred such as does not occur more than once in twenty trials."...

    If one in twenty does not seem high enough odds, we may, if we prefer it, draw the line at one in fifty (the 2 per cent point), or one in a hundred (the 1 per cent point). Personally, the writer prefers to set a low standard of significance at the 5 per cent point, and ignore entirely all results which fail to reach this level. A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance.
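The figures in the SMRW passage quoted above (1.96 for P = 0.05, and "once in 22 trials" for deviations beyond twice the standard deviation) can be checked numerically. The following is only a minimal sketch using Python's standard library; the function name `two_sided_normal_p` is made up for illustration and is not from any of the sources cited here.

```python
import math

def two_sided_normal_p(z):
    """Probability that a standard normal deviate exceeds |z| in either direction."""
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

p_at_1_96 = two_sided_normal_p(1.96)  # the "P = 0.05" point
p_at_2 = two_sided_normal_p(2.0)      # "twice the standard deviation"

print(f"P beyond 1.96 SD: {p_at_1_96:.4f}")
print(f"P beyond 2 SD:    {p_at_2:.4f}, about 1 in {1 / p_at_2:.0f}")
```

Rounding 1.96 up to 2 moves the tail area from 0.05 to roughly 0.0455, which is where the "once in 22" figure comes from.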

    However, Fisher's writings might be described as inconsistent. On page 80 of SMRW, he offers a more flexible approach

    In preparing this table we have borne in mind that in practice we do not want to know the exact value of P for any observed χ², but, in the first place, whether or not the observed value is open to suspicion. If P is between .1 and .9 there is certainly no reason to suspect the hypothesis tested. If it is below .02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. Belief in the hypothesis as an accurate representation of the population sampled is confronted by the logical disjunction: Either the hypothesis is untrue, or the value of χ² has attained by chance an exceptionally high value. The actual value of P obtainable from the table by interpolation indicates the strength of the evidence against the hypothesis. A value of χ² exceeding the 5 per cent. point is seldom to be disregarded.

    These apparent inconsistencies persist when Fisher dealt with specific examples. On page 137 of SMRW, Fisher suggests that values of P slightly less than 0.05 are not conclusive.

    [T]he results of t shows that P is between .02 and .05.

    The result must be judged significant, though barely so; in view of the data we cannot ignore the possibility that on this field, and in conjunction with the other manures used, nitrate of soda has conserved the fertility better than sulphate of ammonia; the data do not, however, demonstrate this point beyond the possibility of doubt.

    On pages 139-140 of SMRW, Fisher dismisses a value greater than 0.05 but less than 0.10.

    [W]e find...t=1.844 [with 13 df, P = 0.088]. The difference between the regression coefficients, though relatively large, cannot be regarded as significant. There is not sufficient evidence to assert that culture B was growing more rapidly than culture A.
    while in Fisher [19xx, p 516] he is willing to pay attention to a value not much different.

    ...P=.089. Thus a larger value of χ² would be obtained by chance only 8.9 times in a hundred, from a series of values in random order. There is thus some reason to suspect that the distribution of rainfall in successive years is not wholly fortuitous, but that some slowly changing cause is liable to affect in the same direction the rainfall of a number of consecutive years.

    Yet in the same paper another such value is dismissed!
    [paper 37, p 535] ...P=.093 from Elderton's Table, showing that although there are signs of association among the rainfall distribution values, such association, if it exists, is not strong enough to show up significantly in a series of about 60 values.

    Part of the reason for the apparent inconsistency is the way Fisher viewed P values. When Neyman and Pearson proposed using P values as absolute cutoffs in their style of fixed-level testing, Fisher disagreed strenuously. Fisher viewed P values more as measures of the evidence against a hypothesis, as reflected in the quotation from page 80 of SMRW above and this one from Fisher (1956, p 41-42)

    The attempts that have been made to explain the cogency of tests of significance in scientific research, by reference to hypothetical frequencies of possible statements, based on them, being right or wrong, thus seem to miss the essential nature of such tests. A man who "rejects" a hypothesis provisionally, as a matter of habitual practice, when the significance is at the 1% level or higher, will certainly be mistaken in not more than 1% of such decisions. For when the hypothesis is correct he will be mistaken in just 1% of these cases, and when it is incorrect he will never be mistaken in rejection. This inequality statement can therefore be made. However, the calculation is absurdly academic, for in fact no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas. Further, the calculation is based solely on a hypothesis, which, in the light of the evidence, is often not believed to be true at all, so that the actual probability of erroneous decision, supposing such a phrase to have any meaning, may be much less than the frequency specifying the level of significance.

    Still, we continue to use P values nearly as absolute cutoffs but with an eye on rethinking our position for values close to 0.05**. Why have we continued doing things this way? A procedure such as this has an important function as a gatekeeper and filter--it lets signals pass while keeping the noise down. The 0.05 level guarantees the literature will be spared 95% of potential reports of effects where there are none.

    For such procedures to be effective, it is essential there be a tacit agreement among researchers to use them in the same way. Otherwise, individuals would modify the procedure to suit their own purposes until the procedure became valueless. As Bross (1971) remarks,

    Anyone familiar with certain areas of the scientific literature will be well aware of the need for curtailing language-games. Thus if there were no 5% level firmly established, then some persons would stretch the level to 6% or 7% to prove their point. Soon others would be stretching to 10% and 15% and the jargon would become meaningless. Whereas nowadays a phrase such as statistically significant difference provides some assurance that the results are not merely a manifestation of sampling variation, the phrase would mean very little if everyone played language-games. To be sure, there are always a few folks who fiddle with significance levels--who will switch from two-tailed to one-tailed tests or from one significance test to another in an effort to get positive results. However such gamesmanship is severely frowned upon and is rarely practiced by persons who are native speakers of fact-limited scientific languages--it is the mark of an amateur.

    Bross points out that the continued use of P=0.05 as a convention tells us a good deal about its practical value.

    The continuing usage of the 5% level is indicative of another important practical point: it is a feasible level at which to do research work. In other words, if the 5% level is used, then in most experimental situations it is feasible (though not necessarily easy) to set up a study which will have a fair chance of picking up those effects which are large enough to be of scientific interest. If past experience in actual applications had not shown this feasibility, the convention would not have been useful to scientists and it would not have stayed in their languages. For suppose that the 0.1% level had been proposed. This level is rarely attainable in biomedical experimentation. If it were made a prerequisite for reporting positive results, there would be very little to report. Hence from the standpoint of communication the level would have been of little value and the evolutionary process would have eliminated it.

    The fact that many aspects of statistical practice in this regard have changed gives Bross's argument additional weight. Once (mainframe) computers became available and it was possible to calculate precise P values on demand, standard practice quickly shifted to reporting the P values themselves rather than merely whether or not they were less than 0.05. The value of 0.02 suggested by Fisher as a strong indication that the hypothesis fails to account for the whole of the facts has been replaced by 0.01. However, science has seen fit to continue letting 0.05 retain its special status denoting statistical significance.

    *Fisher may have had additional reasons for developing a new way to table commonly used distribution functions. Jack Good, on page 513 of the discussion section of Bross (1971), says, "Kendall mentioned that Fisher produced the tables of significance levels to save space and to avoid copyright problems with Karl Pearson, whom he disliked."

    **It is worth noting that when researchers worry about P values close to 0.05, they worry about values slightly greater than 0.05 and why they deserve attention nonetheless. I cannot recall published research downplaying P values less than 0.05. Fisher's comment cited above from page 137 of SMRW is a rare exception.

    References

    • Bross IDJ (1971), "Critical Levels, Statistical Language and Scientific Inference," in Godambe VP and Sprott (eds) Foundations of Statistical Inference. Toronto: Holt, Rinehart & Winston of Canada, Ltd.
    • Fisher RA (1956), Statistical Methods and Scientific Inference. New York: Hafner.
    • Fisher RA (1926), "The Arrangement of Field Experiments," Journal of the Ministry of Agriculture of Great Britain, 33, 503-513.
    • Fisher RA (19xx), "On the Influence of Rainfall on the Yield of Wheat at Rothamsted."


    Message: 67597
    Posted by: srikanth_boyina2003
    Posted on: Friday, 8th April 2005

    hi

    i am very interested


    Message: 67601
    Posted by: Darth
    Posted on: Friday, 8th April 2005

    Good, then call 555-1212 for a good time.  Ask for Shirley.


    Message: 81411
    Posted by: tiger
    Posted on: Tuesday, 18th October 2005

    Can anybody discuss the detailed method to compute a p-value?


    Message: 81414
    Posted by: Bill Craig
    Posted on: Tuesday, 18th October 2005

    Instead of a detailed method, you could use the statistical table for the test you are doing. You basically use the calculated test statistic (t, F, etc.), and the degrees of freedom. When you find the corresponding alpha for this combination (ie t-calc, df), this is the p-value.  For a normality test, you want to "fail to reject" the null hypothesis. As the saying goes...if P is low, reject Ho. I would use a histogram, a normal plot, and a statistical test to be sure!
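To make the "fail to reject" logic concrete, here is a rough sketch of a normality check in plain Python. It uses the Jarque-Bera statistic, chosen only because its p-value has a simple closed form (a chi-square with 2 degrees of freedom); it is not necessarily the test your software runs (Minitab, for example, commonly reports Anderson-Darling), and the chi-square approximation is crude for small samples. The sample values and the name `jarque_bera` are purely illustrative.

```python
import math

def jarque_bera(sample):
    """Return (JB statistic, approximate p-value) for H0: the data are normal."""
    n = len(sample)
    m = sum(sample) / n
    # Central moments (dividing by n, as the JB statistic is usually defined).
    m2 = sum((x - m) ** 2 for x in sample) / n
    m3 = sum((x - m) ** 3 for x in sample) / n
    m4 = sum((x - m) ** 4 for x in sample) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Upper-tail area of a chi-square with 2 df is exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value

sample = [7, 8, 10, 8, 6, 9, 6, 7, 7, 8, 9, 8]  # illustrative data
alpha = 0.05
jb, p_value = jarque_bera(sample)

if p_value < alpha:
    print(f"JB = {jb:.3f}, p = {p_value:.3f}: P is low, reject Ho (normality doubtful)")
else:
    print(f"JB = {jb:.3f}, p = {p_value:.3f}: fail to reject Ho (no evidence against normality)")
```

A large p-value here does not prove the data are normal; it only says the test found no evidence against normality, which is why the histogram and normal plot are still worth a look.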


    Message: 82238
    Posted by: Patrick
    Posted on: Friday, 28th October 2005

    Think of the p-value as a measure of "unusualness". 

    0.01 is an event that occurs 1 in 100 times, not a common occurrence; unusual. 0.05 is where the line was drawn in the sand, 0.3 is an event that occurs 30 in 100 times; a common or usual occurrence.

    Anytime the p-value is very small, concern should follow.

    Pat Lindner


    Message: 94221
    Posted by: Amit Sharma
    Posted on: Thursday, 25th May 2006

    Interesting stuff though

    1) Can someone explain the significance of the p-value diagrammatically on a normal distribution chart?

    2) I have following data : 7 8 10 8 6 9 6 7 7 8 9 8

    Hypothesized population mean = 7, significance level = 0.05 (Ha: Population mean is not= 7)

    If I run a T-test on Minitab I get P-value = 0.056

    What does this P-value mean? How can I calculate the same P-value without software?


    Message: 94239
    Posted by: Darth
    Posted on: Thursday, 25th May 2006

    To answer the question in your Subject line, the null hypothesis is that the distribution is normal.  If the p-value is greater than .05 (that cutoff is just common practice), then you are not able to reject that hypothesis.

    As for your other questions.  Your p value can be obtained by using the appropriate distribution tables whether it be for Z, t, F or Chi Square.  Once you have calculated the appropriate test statistic for your hypothesis test you take that value, considering the necessary requirements for d.f. and look it up in the tables.  You can then figure out the "p-value" which is just the area to the left, right or both of the test statistic value.  In the old days we would determine the theoretical test statistic value for our desired alpha and then compare to the calculated test statistic and decide whether the value was in the "zone of acceptance or rejection."  It's early morning so please forgive any rambling.  Tough week.
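For the twelve values posted above (hypothesized mean 7, two-sided alternative), the "area in both tails" calculation can be sketched without a statistics package. This is only an illustration using Python's standard library: the t density is integrated numerically with Simpson's rule in place of a table lookup, and the helper names `t_pdf` and `t_upper_tail` are made up for this example.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Area to the right of |t|, via Simpson's rule on [0, |t|] (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # The density is symmetric, so half the total area lies to the right of 0.
    return 0.5 - area_0_to_t

data = [7, 8, 10, 8, 6, 9, 6, 7, 7, 8, 9, 8]
mu0 = 7
n = len(data)

# One-sample t statistic: (sample mean - hypothesized mean) / standard error.
t_stat = (mean(data) - mu0) / (stdev(data) / math.sqrt(n))
df = n - 1

# Two-sided alternative (mean != 7): add the areas in both tails.
p_two_sided = 2 * t_upper_tail(t_stat, df)

print(f"t = {t_stat:.3f} with {df} df, two-sided p = {p_two_sided:.3f}")
```

The test statistic comes out near 2.14 on 11 degrees of freedom, and the two-tailed area should land close to the 0.056 that Minitab reported. With a printed t table you would only be able to say that p lies between 0.05 and 0.10, since 2.14 falls between the tabled critical values 1.796 and 2.201.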


    Message: 102437
    Posted by: Tom
    Posted on: Tuesday, 10th October 2006

    "Take a look at Don Wheeler's book "Normality and the Process Behaviour Chart" for more info"

    I've just read it ... couldn't put it down ... better than a detective novel ... brilliant.


    Message: 139919
    Posted by: Dr. Mehrdad Jalalian
    Posted on: Tuesday, 15th April 2008

    the best answer to why choose 0.05 is the article at this address:

    http://www.tufts.edu/~gdallal/p05.htm

    Regards, Dr. Mehrdad Jalalian. email: drjalalianyahoo.com website: http://www.doctormehrdad.com


    Message: 148857
    Posted by: niamh
    Posted on: Wednesday, 12th November 2008

    what??????????????????????????????????????




