Tom Apostol gives a nice brief introduction to the history of probabability in the second volume of his calculus textbook.
A gambler's dispute in 1654 led to the creation of a mathematical theory of probability by two famous French mathematicians, Blaise Pascal and Pierre de Fermat. Antoine Gombaud, Chevalier de Méré, a French nobleman with an interest in gaming and gambling questions, called Pascal's attention to an apparent contradiction concerning a popular dice game. The game consisted in throwing a pair of dice 24 times; the problem was to decide whether or not to bet even money on the occurrence of at least one "double six" during the 24 throws. A seemingly wellestablished gambling rule led de Méré to believe that betting on a double six in 24 throws would be profitable, but his own calculations indicated just the opposite.
One way we can address Mr. Gombaud's question today is to experiment. We run a trial, and then we know whether we'll win or lose. Suppose we try it  we throw 2 dice 24 times and come up without double6's, so we say to ourselves, "I'm going to lose my money if I pay to play this game", so we don't. But somebody else has done that same experiment, and gets a pair of double sixes, and so they think, "This is a game I can win, atleast most of the time. I'd be happy to pay fifty cents to play it." And when they actually play, they might win or they might lose. That one experiment didn't tell them (or us) what was going to happen very well  this is a game of chance, and the outcome can be different each time it is played.
And so, we've reached a great philosophical bifurcation point  how do we model and make sense of things in the world that do not follow simple constant laws  things that may change for no easily observable reason. In simple societies, unusual events are often cause for invoking superstition and supernatural spirits and ancestors. In ancient greece Aesop recognized the goddess Tyke (Fortuna) in his fables as reason for outcomes that had no recognizable cause, suggesting that people could use religious offerings to improve their luck. This idea also resonates with islamic idea of occasionalism, promoted by Abu alGhazali in the 11th century. Occasionalism says that rather than interpretting predictable cause and effect as laws of got, all events like should really be seen like chance events as direct acts of God's. This suggests that laws of nature are actually acts of God and that the study of theology should replace the study of nature. In popular culture as in Thomas Hardy's 1878 novel "Return of the native", fate and irony are invoked as agents of luck. (see Roll the Bones: The History of Gambling for more history)
The birth of probability theory arose from an alternative perspective that was emerging in France and Italy in the 1500's and 1600's. While no individual trial of a gambling game can be deterministically understood, patterns could be found when large numbers of games where considered, either played by many different people, or played by the same people repeatedly! So, instead of considering just one trial, we should instead consider many trials. And instead of talking about whether a trial will be a win or a lose, we can then talk about how often a trial will be a win or a loss. This new language has allowed us to greatly expand and explore our description of the world.
Once people began describing the world with probabilities, whole new areas of description and experimentation openned up. You might find some of the following interesting...
Today, we can use our computers to quickly perform randomizing experiments involving many trials. To do this, though, we have to make an assumption about our die, and how it will roll. One idea (perhaps the simplest idea, but by no means the only idea) is that each time you roll the die, it is equally likely to come up with each of the six numbers on its sides, and that you can not use past rolls to predict anything about future rolls.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 

Of 100,000 trials, you might a fraction 0.491040 of them. That would suggest that you shouldn't pay more than about 60 cents to play the game if you want to come out a winner. But this number comes up different every time we calculate it. If we run things 5 more times, we get
Of 100000 trials, you won a fraction 0.493010 of them
Of 100000 trials, you won a fraction 0.492310 of them
Of 100000 trials, you won a fraction 0.489920 of them
Of 100000 trials, you won a fraction 0.490850 of them
Of 100000 trials, you won a fraction 0.491080 of them
The numbers are only a little different, but they are still different. There is a fundamental limitation on all Montecarlo methods like the one above  they converge to an answer at a rate proportional to the square root of the number of trials. Thus, to get an answer that is twice as accurate, we have to quadruple the number of trials. The gambler of the 1600's did not realize this precisely, but they new it was hard to tell exactly how fair a game was by trial and error, and hoped that some mathematical calculations would be a better way. Pascal, Fermat, and Laplace ultimately obliged them with the development of probability theory, and the concept of independence.
If we assume that all outcomes are equally likely (equipartion postulate), then there are \(36\) possible outcomes every time we roll a pair of dice. Of these, there are \(35\) that are losers. The chance of losing each role is then \(35/36\). The chance of losing all the roles is \((35/36)^{24}\), so the chance of winning atleast one is \[1  \left(\frac{35}{36}\right)^{24} = 0.4914...\] If you pay less than 49 cents each time you play, then on average over a long time, you are very likely to be making more money than you'll be losing. This calculated value is centered among our simulation estimates, and thus, passes the smelltest.
This is not a course in probability theory  I will assume the basic idea of a probability is familiar to you, and that commonsense and reasoning will be able to lead us through any brambles we may stumble across. What we will address now is how we can use combinatorics and probability to model various possible outcomes of processes where chance is prominent.
Like in geometry, we like to use probability theory by find formula's for the shapes over various events. But in this case, shape is a probablity measure (a.k.a. density, distribution) for how often the different events under consideration can be expected to occur. For the GombaudPascal problem, doublesixes may not appear, may appear once, tiwice, maybe as many as 24 times. But we will probably never observe 24 doublesixes in a row, although we will often observe no doublesixes. Here is a simulation that shows how often the various outcomes have occurred in experiment and compares them with the predicted frequencies from the binomial distribution.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 

Here is a list of a few common and important discrete probability distributions. For reference, given a random variable \(x\) that takes the value of a natural number, then the probability generating function for \(x\) is \[\hat{x}(s) = E[s^n] = \sum _ {n=0}^{\infty} p _ n s^n\] while the cumulant generating function \[\tilde{x}(t) = \ln E[e^{t}] = \ln \hat{x}(e^{t}).\]
A random variable with takes the value 1 (success) with probablity p and 0 (failure) with probability 1p.
\[\hat{x}(t) = 1p + p s\] \[\tilde{x}(t) = \ln( 1 p + p e^t)\]
What is the probability of having k successes among \(n\) trials when each trial is independent and identical? The answer is the binomial probability distribution, with \(p\) being the probability of an individual success. The binomial probability generating function \[\hat{x}(s) = \sum_{k=0}^{n} \Pr(X = k) s^k = \sum_{k=0}^{n} {}_nC_k \, p^k(1p)^{nk} s^k.\]
What is the probability that you can have k successes before your first failure? The answer is a geometric measure with success probability p. \[\hat{x}(s) = \sum _ {k=0}^{\infty} (1p) p^k s^k\] \[\tilde{x}(t) = n \ln( p + (1p) e^t)\]
What is the probability of having k successes before r failures, when each trial is independent and identical? The answer is a Pascal probability measure, (a special case of the negative binomial probability measure) \[\sum _ {k=0}^{\infty} {} _ {k+r1}C _ {k} p^k(1p)^r\] \[\tilde{x}(t) = n\ln((1p)/(1pe^{t}))\]
The Pascal measure also appears if events are Poisson distributioned, but the mean of the Poisson in a heterogeneous population is Gammadistributed.
Another family of commonly used measures are powerlaws. These are important particularly because the way a measures's tail thins out can have important consequences in many theories.
\[\frac{1}{\zeta(n)} \sum _ {x=1}^{\infty} \frac{1}{x^n} s^n\] where \(\zeta(n)\) is the Riemann zetafunction.