I have found that native markov chains are really not great at reflecting the real world business processes and would like to propose three variations and an implementation trick.
Here is a really good write up on standard markov chains and a really good graphical representation.
In the business world a markov chain could be used to define a sales process with say nine steps:

Request

Quote

Approval

Contract

Assign job

Implement

Close

Bill

Lost
For every client request there is a good probability that there will be a quote and a small probability it will be lost before quote (you give them a verbal price and they hang up). Once you submit a quote it may be approved or it may be lost due to budget constraints. When you get to contract stage you have now done enough work to know that something may be missing from the original request which means you would also have the probability of going from contract to assign job or back to quote or (if you can’t agree on terms, losing the job). Hopefully 6, 7 and 8 work with close to 100% probability with almost nothing going to lost or back to approvals but that’s not reality.
Herewith the matrix for the process we will use in the post:
Request 
Quote 
Approval 
Contract 
Assign job 
Implement 
Close 
Bill 
Lost 

Request 
0.42 
0.42 
0 
0 
0 
0 
0 
0 
0.17 
Quote 
0 
0.4 
0.4 
0.08 
0 
0 
0 
0 
0.12 
Approval 
0 
0.08 
0.42 
0.42 
0 
0 
0 
0 
0.08 
Contract 
0 
0.06 
0.12 
0.18 
0.59 
0 
0 
0 
0.06 
Assign job 
0 
0.02 
0.06 
0.11 
0.17 
0.57 
0 
0 
0.06 
Implement 
0 
0.01 
0.02 
0.06 
0.11 
0.17 
0.57 
0 
0.06 
Close 
0 
0 
0 
0 
0 
0.02 
0.49 
0.49 
0 
Bill 
0 
0 
0 
0 
0 
0 
0 
1 
0 
Lost 
0 
0 
0 
0 
0 
0 
0 
0 
1 
Reading the above matrix, if you have a request you have a 42% chance of a request being delayed, 42% chance of moving to quote phase (when those product people actually give you the prices) and 17% chance of losing the deal during the request phase (numbers are rounded which gives the slight discrepancy). If a product is billed it will remain billed. If a request is lost it will remain lost, etc.
So the trick is really simple but world drive the stats guys up the wall. If you take the path matrix and use integer counts rather than probabilities it makes it a lot simpler. This means, as per my example, that all I have to do to model the business process is count the number of requests in step 1, then count how many requests move to quote stage, how many stay at request stage and how many move to lost. This makes it a pure counting exercise that can be easily done within any team. Go the the sales team and ask how many requests come in and how many quotes are generated, then count from that stage through to completion and fill in your matrix.
Now, the trick. Take each row of numbers and divide by total row value to give you the probabilities and this can be done in the code. So if you design the code to accept normal integers and then normalize it without user intervention then you can start implementing the chains very quickly.
So the the matrix would actually be simple counts as follows:
Request 
Quote 
Approval 
Contract 
Assign job 
Implement 
Close 
Bill 
Lost 

Request 
50 
50 
0 
0 
0 
0 
0 
0 
20 
Quote 
0 
50 
50 
10 
0 
0 
0 
0 
15 
Approval 
0 
10 
50 
50 
0 
0 
0 
0 
10 
Contract 
0 
5 
10 
15 
50 
0 
0 
0 
5 
Assign job 
0 
2 
5 
10 
15 
50 
0 
0 
5 
Implement 
0 
1 
2 
5 
10 
15 
50 
0 
5 
Close 
0 
0 
0 
0 
0 
2 
50 
50 
0 
Bill 
0 
0 
0 
0 
0 
0 
0 
50 
0 
Lost 
0 
0 
0 
0 
0 
0 
0 
0 
100 
I also would like to point out that for very large markov chains like the one I build for “Starship Titanic” it is more efficient to use Object notation (or sparse data representation) and matrix representation. Let me dig into that a bit deeper.
A matrix has to represent the probability of going from each state to every other state even if the probability is zero. This is awesome if you have a small markov chain and lost of nonzero probabilities all over the place. It also works really well in R, for large sample sets and small markov chains.
If you have thousands, or tens of thousands of states with the probability of going from one state to only two, three or ten other states then it is a lot more efficient to use object notation like JSON to store the information as a key value pair. This implementation works really well in Python, small sample sets and huge markov chains.
On a thousand samples through the above chain you get:
If you've ever looked at signal processing then you would recognize this as an impulse response. We are shocking the system with 1000 samples at the first iteration and watching it goes through the system. If you didn't see it, let's have a look at the volume per stage (line graph) and then you’ll see the starting stage and each intermediate stage (17) resembles a first order impulse response. At each step the sum of all the items in each state must equal the total number of items. As this is a closed this must be consistent throughout the whole process.
Using this model we can then predict the response of the system to any other input. The other option is to model the input as a flow rather than a pulse, so rather than push 1000 samples in initially, select more realistic parameters, like:

Have a number of loops in multiples of 21 to match a month or months (let's assume monday to friday operation and read the second bullet)

Split the samples either evenly. You can’t accommodate weekends if this because not only does your input change (your client put in less or no requests on weekends) but your probabilities change based on staff and supplier availability of the weekends.
Just by adjusting these two items your chain starts giving you much more meaningful information. You are also changing this to an open system, so you need to set criteria for dropping information off the plots (if not in the model). The criteria I will use are if the deal is lost or the deal is billed as we want to see the performance of the intermediate stages.
Using a loop count of 70 and an input per loop (number of requests per day) of 100 gives this output for our chain. So Here you see that any state above 100 is causing a bottleneck. We are necessarily dropping any items that is in an end node (it transitions to no other state) because the analysis thereof is of no use in the process any more.
What you are now looking at are the processes that start having bottlenecks. We are going to look at two different ways to more accurately model this in future parts.
In the next post we’ll be looking at modifying the markov chain to match reality a little more closely.