Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data. (1)
Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty; and it thereby provides the navigation essential for controlling the course of scientific and societal advances. This field will become ever more critical as academia, businesses, and governments rely increasingly on data-driven decisions, expanding the demand for statistics expertise. (2)
Statistics is a highly interdisciplinary field: research in statistics finds applicability in virtually all scientific fields, and research questions in the various scientific fields motivate the development of new statistical methods and theory. (3)
Two fundamental ideas in the field of statistics are uncertainty and variation. There are many situations that we encounter in science (or more generally in life) in which the outcome is uncertain. In some cases the uncertainty arises because the outcome in question has not yet been determined (e.g., we may not know whether it will rain tomorrow), while in other cases the outcome has already been determined but we are not aware of it (e.g., we may not know whether we passed a particular exam). (4)
Probability is a mathematical language for describing uncertain events, and it plays a key role in statistics. Any measurement or data collection effort is subject to a number of sources of variation: if the same measurement were repeated, the answer would likely change. Statisticians attempt to understand and, where possible, control the sources of variation in any situation. (5)
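The idea of variation can be made concrete with a small simulation. The sketch below is illustrative only: it assumes each reading equals a hypothetical true value plus Gaussian noise, and the names `repeat_measurement`, `true_value`, and `noise_sd` are invented for the example rather than taken from the cited sources.

```python
import random
import statistics

def repeat_measurement(true_value, noise_sd, n_repeats, seed=0):
    """Simulate measuring the same quantity n_repeats times.

    Each reading is a hypothetical true value plus Gaussian noise, a stand-in
    for the many small sources of variation that affect a real measurement.
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [rng.gauss(true_value, noise_sd) for _ in range(n_repeats)]

readings = repeat_measurement(true_value=10.0, noise_sd=0.5, n_repeats=20)
print("first five readings:", [round(r, 2) for r in readings[:5]])
print("mean of readings:", round(statistics.mean(readings), 3))
print("standard deviation of readings:", round(statistics.stdev(readings), 3))
```

Changing `seed` produces a different set of readings, which is the point of the example: the same measurement procedure does not return the same answer twice.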
Nelder argues that statistics must be relevant to making inferences in science and technology, that the subject should be renamed statistical science, and that it should be focused on the experimental cycle of design–execute–analyse–predict; he discusses the part statistics plays in each component of the cycle. He claims that the P-value culture is the main prop of non-scientific statistics, leading to the cult of the single study and the proliferation of multiple-comparison tests, and he discusses the malign influence of P-values on protocols for analyzing groups of experiments, as well as the consequences of forming inferentially uninteresting linear models. His suggested actions for statisticians include sorting out the modes of inference, removing non-scientific procedures, offering help to editors, and promoting good software and teaching methods built around the experimental cycle. (6)
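One reason a proliferation of multiple-comparison tests is troubling can be shown with a short calculation. Assuming m independent tests, each run at significance level alpha on data where every null hypothesis is true, the chance of at least one spurious "significant" result is 1 - (1 - alpha)^m. The Python sketch below computes this and checks it by simulation; the function names are hypothetical, the independence assumption is ours, and this is not a reproduction of Nelder's own analysis.

```python
import random

def familywise_error_rate(m, alpha=0.05):
    """Probability of at least one p < alpha among m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m

def simulate_familywise_error_rate(m, alpha=0.05, n_studies=10_000, seed=1):
    """Monte Carlo check of the formula in familywise_error_rate.

    Under a true null hypothesis a continuous test's p-value is Uniform(0, 1),
    so each simulated "study" draws m uniform p-values and records whether
    any of them falls below alpha.
    """
    rng = random.Random(seed)
    studies_with_false_positive = sum(
        any(rng.random() < alpha for _ in range(m)) for _ in range(n_studies)
    )
    return studies_with_false_positive / n_studies

for m in (1, 5, 20):
    exact = familywise_error_rate(m)
    simulated = simulate_familywise_error_rate(m)
    print(f"{m:2d} tests: exact {exact:.3f}, simulated {simulated:.3f}")
```

With alpha = 0.05 and 20 tests the exact rate is about 0.64, so a single "significant" result found among many comparisons is weak evidence on its own.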
Nelder sees statistics as primarily concerned with the theory and practice of matching theory to data, as carried out by research workers. He discusses swings between data-heavy and model-heavy views of statistics, as well as aspects that inhibit communication between statisticians and scientists or technologists, especially the insufficient attention statisticians give to the problem of combining information from many data sets. He also considers several obstacles to better communication by statisticians, and how the current gap between statisticians and scientists or technologists might be bridged. (7)
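Combining information from several data sets is a broad problem. One standard and simple approach, shown here only as an illustration and not as the method Nelder advocates, is fixed-effect inverse-variance pooling. The sketch assumes that each data set estimates the same underlying quantity with a known standard error; `inverse_variance_pool` and the numbers in the example are hypothetical.

```python
import math

def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect pooling: weight each data set's estimate by 1 / SE^2.

    Assumes every data set estimates the same underlying quantity and that
    the standard errors are known; returns (pooled estimate, pooled SE).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and the same length")
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

# Hypothetical results from three separate data sets
est, se = inverse_variance_pool([2.1, 1.8, 2.4], [0.5, 0.3, 0.6])
print(round(est, 3), round(se, 3))  # → 1.96 0.236
```

More precise data sets receive more weight, and the pooled standard error is smaller than any single one. When the data sets plausibly estimate different quantities, a random-effects model is usually preferred.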
Nelder also considers the inter-relations of science and technology, pointing out that neither can be said to underlie or dominate the other, although practitioners of each are often intolerant of the other's approach. He suggests that statistics may have a more important role to play in technology than in science, and that it may itself be best considered a technology rather than a science. These ideas are discussed in the context of the teaching and practice of statistics in general and of medical statistics in particular. (7)
Descriptive statistics allow a scientist to quickly summarize the major attributes of a dataset using measures such as the mean, median, and standard deviation. These measures give a general sense of the group being studied, allowing scientists to place the study within a larger context. Inferential statistics are used to model patterns in data, make judgments about data, identify relationships between variables in datasets, and make inferences about larger populations based on smaller samples of data. Generalizing results from small samples to large populations is especially important in scientific studies. (8)
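The distinction between descriptive and inferential statistics can be sketched in a few lines of Python using the standard-library `statistics` module. The sample values are hypothetical, and the interval assumes the data are roughly normal. The critical value 2.262 is the two-sided 95% point of Student's t distribution with 9 degrees of freedom, hard-coded because the standard library has no t distribution.

```python
import statistics

# Hypothetical sample of ten measurements (e.g., plant heights in cm)
sample = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.5, 13.0, 11.9, 14.0]

# Descriptive statistics: summarize the sample itself
mean = statistics.mean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Inferential statistics: estimate the population mean from the sample.
# 2.262 is the two-sided 95% critical value of Student's t with n - 1 = 9
# degrees of freedom, so this constant is only valid for a sample of size 10.
T_CRIT_9_DF = 2.262
assert len(sample) == 10, "critical value above assumes n = 10"
se = sd / len(sample) ** 0.5  # standard error of the mean
ci = (mean - T_CRIT_9_DF * se, mean + T_CRIT_9_DF * se)

print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f}")
print(f"95% CI for the population mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The first printed line describes only these ten values; the interval is a statement about the wider population they were drawn from, and it is only as good as the assumption that the sample is representative of that population.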
The phrase "statistically significant" is a key concept in data analysis, and it is commonly misunderstood. Many people assume that, as in the everyday use of the word significant, calling a result statistically significant means that the result is important or momentous, but this is not the case. Nor is statistical significance the probability that the observed association or difference is due to chance rather than to a real association. Instead, a test of statistical significance describes how likely it would be to observe an association or difference at least as large as the one seen if there were no real association or difference present. This probability is the P-value, and a result is conventionally called statistically significant when the P-value falls below a threshold chosen in advance (the significance level, often 0.05). Significance is sometimes expressed in terms of confidence (a 0.05 significance level corresponds to 95% confidence), but confidence has a narrower meaning in statistics than in common language: a 95% confidence interval is produced by a procedure that captures the true value in about 95% of repeated samples, which is not the same as being 95% certain that a particular interval contains it. (9)
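A permutation test is one way, among several, to compute the probability described above, and it needs no distributional formulas. The sketch below is illustrative: it assumes two independent groups compared by their means, and the function name `permutation_p_value` and the example data are hypothetical. It estimates how often randomly relabelled data produce a difference in means at least as large as the observed one, which is the P-value under the null hypothesis of no real difference.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Estimates how often a difference at least as large as the observed one
    would arise if the group labels carried no information (the null hypothesis).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    # Add-one correction counts the observed labelling and avoids reporting p = 0
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements from a treated and an untreated group
treatment = [5.1, 5.8, 6.2, 5.9, 6.4, 5.7]
control = [4.9, 5.0, 5.3, 4.8, 5.2, 5.1]
p = permutation_p_value(treatment, control)
print(f"observed difference in means: {statistics.fmean(treatment) - statistics.fmean(control):.2f}")
print(f"permutation P-value: {p:.4f}")
```

The resulting P-value is small, but it says only that a gap this large would be unusual if the group labels were arbitrary; whether a difference of 0.8 units matters in practice is a separate, substantive question.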
References
1. Chatfield C. Statistics for technology: a course in applied statistics. Routledge; 2018.
2. Davidian M, Louis TA. Why statistics? American Association for the Advancement of Science; 2012.
3. Ostle B. Statistics in research. 2nd ed. 1963.
4. Mandel J. The statistical analysis of experimental data. Courier Corporation; 2012.
5. Hogg RV, Tanis EA, Zimmerman DL. Probability and statistical inference. New York: Macmillan; 1977.
6. Nelder JA. From statistics to statistical science. J R Stat Soc Ser D (The Statistician). 1999;48(2):257–69.
7. Nelder JA. Statistics, science and technology. J R Stat Soc Ser A. 1986;149(2):109–21.
8. LeBlanc DC. Statistics: concepts and applications for science. Jones & Bartlett Learning; 2004.
9. Wasserman L. All of statistics: a concise course in statistical inference. Vol. 26. Springer; 2004.