To learn the concepts of probability and statistics, we first need some familiarity with combinations and permutations. The videos on this page (accessed 09/16/15) digest the concept of permutations, and the videos on this page (accessed 09/16/15) do the same for combinations, which are not order conscious. Perfect resource from Khanacademy!
In probability theory, an event is a possible outcome of a trial, and the probability of an event E is written P(E). For example, a trial is throwing a die, and an event is getting a 4. The probability of any event is between 0 (certainly will not occur, such as P(7) when throwing a die) and 1 (certainly will occur, such as P(head or tail) when flipping a coin). The probabilities of all possible events sum to 1. That is, the probability of getting any number between 1 and 6 when throwing a die is 1, since getting one of those numbers is a certainty. The complementary probability is the probability of a specific event not happening: P(not E) = 1 - P(E). Sometimes the probability of an event depends on other events, which results in conditional probabilities. For example, drawing cards without replacement changes the probabilities of drawing specific cards in future draws. If there is replacement, then the probability of drawing any specific card is independent of any draws that happened before. If:
P(A) is probability of event A
P(A,B) is probability of events A and B
P(A+B) is probability of event A or B
P(A|B) is the conditional probability of event A given that event B has occurred
Then we have:
P(A+B) = P(A) + P(B) - P(A,B)
If events A and B are independent then P(A|B) = P(A) ==> P(A,B) = P(A)P(B) = P(B)P(A)
If events A and B are dependent then P(A|B) = P(A,B)/P(B) ==> P(A,B) = P(A)P(B|A) = P(B)P(A|B)
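The rules above can be checked on a concrete example. The sketch below uses drawing a single card from a standard 52-card deck, with hypothetical events A (the card is a heart) and B (the card is a king) chosen for illustration:

```python
from fractions import Fraction

# Drawing one card from a standard 52-card deck.
# Event A: the card is a heart; event B: the card is a king.
p_a = Fraction(13, 52)        # P(A): 13 hearts
p_b = Fraction(4, 52)         # P(B): 4 kings
p_a_and_b = Fraction(1, 52)   # P(A,B): only the king of hearts

# Addition rule: P(A+B) = P(A) + P(B) - P(A,B)
p_a_or_b = p_a + p_b - p_a_and_b
print(p_a_or_b)               # 4/13

# Independence check: P(A,B) == P(A)P(B), so suit and rank are independent
print(p_a_and_b == p_a * p_b)   # True

# Conditional probability: P(A|B) = P(A,B)/P(B)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)            # 1/4
```

Using `Fraction` keeps the arithmetic exact, so the identities hold with no rounding error.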
A variable (accessed 09/16/15) can be discrete or continuous. Accordingly, events can be discrete or continuous: discrete events are called discrete numerical events, and if events can be categorized in a continuous way, they form a continuous distribution. When we have a distribution, we deal with measures of central tendency (mean, median, and mode) that show where the middle value or range of the distribution is. The mode is the most frequently observed value in a distribution. The median divides the distribution into two parts with an equal number of observations, which makes it immune to extreme observations. The mean is the arithmetic average of the distribution values, which is influenced by extreme observations. Since finding a population mean is usually not feasible, the sample mean is used as an unbiased estimate of it; it is unbiased because, on average, it equals the population mean. Sometimes, if some observations are more important than others, a weighted mean is used instead. Also, the geometric mean is used when the average of ratios is required. Another term that is used is the root mean square, which is the square root of the sum of the squared observations divided by the number of observations. Videos on this page (accessed 09/16/15) from Khanacademy explain everything about measures of central tendency in detail.
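These measures are easy to try out with Python's standard library. A minimal sketch, with a made-up data set containing one extreme observation to show how the mean and median react differently:

```python
import statistics

data = [2, 3, 3, 5, 7, 10, 100]  # 100 is an extreme observation

print(statistics.mean(data))     # pulled upward by the outlier
print(statistics.median(data))   # 5 -- unaffected by the outlier
print(statistics.mode(data))     # 3 -- most frequent value

# Weighted mean: weight each observation by its importance
values  = [80, 90, 70]
weights = [0.5, 0.3, 0.2]
wmean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
print(wmean)                     # 81.0

# Geometric mean, for averaging ratios/growth factors (Python 3.8+)
print(statistics.geometric_mean([1.10, 1.20, 0.95]))

# Root mean square: sqrt of mean of squared observations
rms = (sum(x * x for x in data) / len(data)) ** 0.5
print(rms)
```

Note how the mean (about 18.6) is far from the median (5) because of the single extreme value.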
Time to start reviewing calculus now.
If a distribution is known, confidence intervals (C.I.) or confidence limits (C.L.) can be calculated for the true value of a metric (e.g., the C.I. of the mean). According to the central limit theorem, if we draw samples of size n from a normal distribution with mean μ and standard deviation σ, the means of these samples will be normally distributed with mean μ and variance σ²/n. This concept helps in defining the confidence limits (accessed 09/16/15) (one-tail and two-tail). The knowledge of confidence limits also helps in hypothesis testing (accessed 09/16/15). Finally, if the means of two variables x and y are computed from samples drawn from two different normal distributions, their sum will come from a distribution with mean μx + μy and variance σx²/nx + σy²/ny, while their difference will come from a distribution with the same variance σx²/nx + σy²/ny but mean μx - μy.
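The central limit theorem claim above can be checked by simulation. This is a rough sketch with made-up parameters (μ = 50, σ = 10, n = 25), drawing many samples and looking at the distribution of their means:

```python
import random
import statistics

# Simulate the CLT: means of samples of size n drawn from a
# normal(mu, sigma) population have mean mu and variance sigma^2/n.
random.seed(42)  # fixed seed so the run is reproducible
mu, sigma, n = 50.0, 10.0, 25

sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(10_000)
]

print(statistics.mean(sample_means))      # close to mu = 50
print(statistics.variance(sample_means))  # close to sigma^2/n = 4
```

With 10,000 simulated samples, the empirical mean and variance of the sample means land very close to the theoretical values μ and σ²/n.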
Just like we have measures of central tendency, there are measures of dispersion (variance, standard deviation, etc.) that show how much variation there is in a distribution. The sample variance (computed with n - 1 in the denominator) is an unbiased estimator of the population variance, and variance is the square of the standard deviation. Videos on this page (accessed 09/16/15) from Khanacademy explain everything about measures of dispersion in detail. Super resource! The ratio of a measure of dispersion to a measure of central tendency is called a relative dispersion. For example, the coefficient of variation, CV (the ratio of the sample standard deviation to the sample mean), is a relative dispersion measure.
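A quick sketch of these dispersion measures with an arbitrary data set, using the standard library (`statistics.stdev` and `statistics.variance` both use the n - 1 sample formulas):

```python
import statistics

data = [12.0, 15.0, 14.0, 10.0, 18.0, 11.0, 16.0]

s = statistics.stdev(data)       # sample standard deviation (n - 1 denominator)
var = statistics.variance(data)  # sample variance = stdev squared
mean = statistics.mean(data)

# Coefficient of variation: a unitless relative dispersion measure
cv = s / mean
print(f"stdev={s:.3f}, variance={var:.3f}, CV={cv:.3f}")
```

Because CV divides out the units, it lets you compare the spread of variables measured on very different scales.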
The next subject is the probability density function, f(x), of a random variable, for which the videos on this page (accessed 09/16/15) are an excellent resource for comprehending the concepts. The cumulative distribution function, F(x), gives the cumulative probability. There are different well-known distribution functions that are frequently used in different applications; examples include the binomial, normal (accessed 09/16/15), and t distributions. The binomial is a discrete distribution used for binary outcomes, such as flipping a coin with only two possible outcome events. The normal is a continuous symmetric distribution, also known as the bell-shaped curve. The t-distribution, also known as Student's t-distribution, is used for comparing two variables to see if the difference between their distributions is statistically significant (t-test). The t-distribution is symmetric around 0, and as the sample size increases, it approaches the normal distribution.
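To make the t-test concrete, here is a sketch that computes the two-sample pooled t statistic by hand for two small made-up samples (the data and the equal-variance assumption are illustrative, not from the text):

```python
import statistics

# Two independent samples (hypothetical measurements)
a = [5.1, 4.9, 5.3, 5.0, 5.2]
b = [4.6, 4.8, 4.5, 4.7, 4.9]

na, nb = len(a), len(b)
ma, mb = statistics.mean(a), statistics.mean(b)
va, vb = statistics.variance(a), statistics.variance(b)

# Pooled variance (assumes equal population variances)
sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)

# Two-sample t statistic and its degrees of freedom
t = (ma - mb) / (sp2 * (1 / na + 1 / nb)) ** 0.5
df = na + nb - 2

print(f"t = {t:.3f} with {df} degrees of freedom")
# Compare |t| to the critical value from a t-table at the chosen
# significance level; as n grows, the t distribution approaches normal.
```

For these samples t = 4.0 with 8 degrees of freedom, which exceeds the usual two-tailed critical values from a t-table, so the difference in means would be judged statistically significant.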