A simple definition of statistics is “a science that is used to collect, organize and interpret data”. More formally, according to the Shorter Oxford English Dictionary, the word statistics means “The department of study that has for its object the collection and arrangement of numerical facts or data, whether relating to human affairs or to natural phenomena.”
What are the common terms used in statistics?
Statistics can be classified into descriptive statistics and inferential statistics. Descriptive statistics is used to organize and summarize data using tabular, diagrammatic, graphical and numerical methods. Inferential statistics, in contrast, is used to draw conclusions about a population from a set of data.
A variable is defined as “A quantity or attribute, which varies from one member of the population being studied to another.” There are two types of variables: qualitative and quantitative. Qualitative variables describe attributes such as eye color and skin complexion.
Quantitative variables are numerical data such as age and body mass index. They can be categorized as discrete or continuous. Discrete variables can take only distinct, countable values, such as the number of people attending a concert; they cannot take values such as 2.3 or 1.4. Continuous variables, on the other hand, can take any value within a range, such as height or weight.
A population is defined as “The total collection of objects, people or data, about which statistical inferences are drawn,” e.g., all the patients who suffer from tuberculosis in a country. Populations can be finite or infinite. An example of an infinite population would be “All the people who will suffer from tuberculosis in the future.”
In a large or infinite population, it is usually not practical to measure a given variable for every member. A sample in statistics is a part or subset of the population, together with the values of the variables for its members. However, the sample must be representative of the population with respect to the variables being studied.
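The relationship between a population and a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of patient ages is made up, and simple random sampling is one common way (not the only way) to try to obtain a representative sample.

```python
import random

# Hypothetical population: ages of 1,000 patients (made-up values for illustration).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(18, 90) for _ in range(1000)]

# Draw a simple random sample of 50 members, without replacement.
sample = rng.sample(population, 50)

# Compare the same variable (age) in the population and in the sample.
population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population mean age: {population_mean:.1f}")
print(f"sample mean age:     {sample_mean:.1f}")
```

If the sample is representative, the two printed values should be reasonably close, although they will rarely be identical.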
What are the measures of central tendency?
A measure of central tendency is a single value that describes the middle or typical value of a data set. There are three common measures of central tendency in statistics: the mean, the median and the mode.
The mean is the sum of all values in a data set divided by the number of such values. It is also called the average.
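As a minimal sketch, the mean can be computed directly from this definition. The data values below are made up for illustration; `statistics.mean` from the Python standard library gives the same result.

```python
from statistics import mean

values = [2, 4, 4, 5, 10]  # made-up data for illustration

# Sum of all values divided by the number of values: 25 / 5
manual_mean = sum(values) / len(values)
print(manual_mean)  # 5.0

# The standard library function agrees with the manual calculation.
library_mean = mean(values)
print(library_mean == manual_mean)  # True
```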
The median is the middle value in a data set that is sorted in order (ascending or descending). If the data set contains an even number of values, there are two middle values, and the median is the average of those two values.
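The two cases (odd and even number of values) can be sketched as follows. The function name `median_of` and the sample data are hypothetical, chosen only for this illustration.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([7, 1, 3]))     # sorted: 1, 3, 7     -> 3
print(median_of([7, 1, 3, 5]))  # sorted: 1, 3, 5, 7  -> 4.0
```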
The mode is the value that occurs most frequently in a data set. If there are no repeated values in a data set, then there is no mode for that data set. Unlike the mean, the mode is not susceptible to the effects of extreme values in a data set.
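The sketch below finds the mode and shows how an extreme value shifts the mean but leaves the mode unchanged. The function name `mode_of` and the data are hypothetical. As a simplifying assumption, if several values tie for the highest frequency, only the one that appears first in the data is returned.

```python
from collections import Counter


def mode_of(values):
    """Return the most frequent value, or None if no value repeats."""
    if not values:
        return None
    value, frequency = Counter(values).most_common(1)[0]
    if frequency == 1:
        return None  # no repeated values, so no mode
    return value


data = [2, 3, 3, 4, 5]            # made-up data
with_outlier = data + [100]       # same data plus one extreme value

print(mode_of(data), sum(data) / len(data))                          # 3 3.4
print(mode_of(with_outlier), sum(with_outlier) / len(with_outlier))  # 3 19.5
print(mode_of([1, 2, 3]))                                            # None
```

The mean jumps from 3.4 to 19.5 when the extreme value is added, while the mode stays at 3.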
Measures of dispersion
Measures of dispersion describe the spread of a data set. The common ones are the range, the variance and the standard deviation. The range is the difference between the smallest and largest values in a data set. It is usually the easiest to calculate.
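The standard deviation is based on the deviations of the values from the arithmetic mean. Its value indicates how widely the data are spread around the mean: a small standard deviation means the values lie close to the mean, and a large one means they are widely scattered. The variance is the square of the standard deviation.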
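The three measures can be sketched as follows, using made-up data. One assumption to note: the variance here is divided by the number of values n (the population form); when the data are a sample, it is common to divide by n - 1 instead. The standard library offers `statistics.pvariance` and `statistics.pstdev` for the population form.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration

# Range: difference between the largest and smallest values.
data_range = max(values) - min(values)

# Variance: mean of the squared deviations from the arithmetic mean
# (population form, dividing by n; an assumption of this sketch).
mean = sum(values) / len(values)
variance = sum((x - mean) ** 2 for x in values) / len(values)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(data_range)  # 7
print(variance)    # 4.0
print(std_dev)     # 2.0
```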
Sources and further reading
Statistics at Square One by T.D.V. Swinscow