The Concept of Effect Size Against the Background of the Neyman-Pearson Theory of Statistical Hypothesis Testing

Wiesław Szymczak

doi:10.18778/1427-969X.19.01

Author

Wiesław Szymczak, University of Łódź, Faculty of Educational Sciences, Institute of Psychology, Department of Psychological Research Methodology and Statistics

Keywords:

theories of statistical hypothesis testing, probability, test power, empirical test power, effect size

Abstract

The aim of this paper is to draw the attention of researchers who use statistical methods to analyze their results to the conflation of two different theories of statistical hypothesis testing: Fisher's theory and the Neyman-Pearson theory. Because the statistical toolkit in current use combines ideas from both theories, the great majority of researchers accept without a moment's thought the claim that the smaller the probability, the stronger the relationship. The paper presents the weaknesses of the Neyman-Pearson theory and the problems they cause when decisions are made on the basis of test results. These problems justifiably prompted a search for less fallible solutions. However, the proposed effect size measures rely on the dogma linking the magnitude of the probability in a test to the strength of the relationship, and they lack any theoretical foundation, so they appear to be yet another pseudo-solution to problems that genuinely exist. In addition, the use of effect size measures looks like an attempt to relieve researchers of thinking deeply, in substantive terms, about the results of their statistical analyses. A trivial recipe has emerged: a suitable value of the measure immediately implies the strength of the relationship. Such an approach seems unworthy of a researcher.
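The claim that a smaller probability means a stronger relationship can be checked with a small numerical illustration. The sketch below is not taken from the paper. It assumes a two-sample z-test with known unit variance and equal group sizes, and the function name `two_sided_p_value`, the standardized difference `d = 0.2`, and the group sizes are hypothetical choices made only for this example. It holds the standardized difference fixed and varies the sample size.

```python
import math


def two_sided_p_value(d, n_per_group):
    """Two-sided p-value of a two-sample z-test (assumed known unit variance)
    for a standardized mean difference d with n_per_group observations per group."""
    # Test statistic: z = d / sqrt(1/n + 1/n) = d * sqrt(n / 2)
    z = d * math.sqrt(n_per_group / 2.0)
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical values: the same standardized difference at three sample sizes.
d = 0.2
for n in (20, 200, 2000):
    print(f"n per group = {n:5d}, p = {two_sided_p_value(d, n):.3g}")
```

Under these assumptions the p-value falls as the sample grows even though the standardized difference does not change, so the p-value by itself cannot be read as a measure of the strength of a relationship.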
