hmm: implementation of viterbi algorithm (Durbin, 1998) Part 1

Example in the mentioned book goes as following – dishonest casino uses two types of dice. Fair one that has equal probability of landing on either side (1/6), and the loaded one with 50% chance for getting 6. Your task is to figure out which die has been used (states) just based on the sequence of the outcomes (symbols).

Notice that we convert all probabilities in log scale. Viterbi algorithm selects the most probable path that can have very low value by the end of the run. Thus, conversion to appropriate scale helps avoid calculation issues.

data – http://pastebin.com/NzBE4Fm1


# we'll represent loaded die as "L", and the fair one as "F"
states <- c("F", "L")

# following matrix defines the probability of switching the die
transition.matrix <- t(matrix(data = log2(c(0.95, 0.05, 0.1, 0.9)), nrow = 2, ncol = 2, dimnames = list(c("F", "L"), c("F", "L"))))

# emission probabilities tell you what is the change of landing on each side given that the particular die is selected
emission.matrix <- matrix(data = log2(c(rep(1/6, 6), c(rep(1/10, 5), 1/2))), nrow = 6, ncol = 2, dimnames = list(seq(1,6), c("F", "L")))

# initial probabilities define the chance of starting outcome (in our case we are equally likely to start with either states)
initial.matrix <- matrix(data = log2(c(0.5, 0.5)), nrow = 2, ncol = 1, dimnames = list(c("F", "L"), ""))

After we defined the model, we need to initialize two object that will keep the track of probability history and state path (Pi) during the recursion process.


prob.history  <- data.frame()
state.history <- data.frame()

# we start by calculating the probability of being in particular state given the first symbol and initial matrix
# notice a change in log space - every multiplication is converted to summation
prob.history  <- rbind(prob.history, unlist(lapply(states, function(k) {
    initial.matrix[k, 1] + emission.matrix[symbol.sequence[1], k]
})))

state.history <- rbind(state.history, states)

colnames(prob.history)  <- c("F", "L")
colnames(state.history) <- c("F", "L")

Second part coming soon! mintgene

4 thoughts on “hmm: implementation of viterbi algorithm (Durbin, 1998) Part 1

  1. Hello:

    When I run lines 6-8 in the second block of code above, I get the following error: Error in FUN(c(“F”, “L”)[[1L]], …) : object ‘symbol.sequence’ not found

    Any idea why this might happen?

    Thanks.

    Chris

    • Thanks a bunch! I completely overlooked the data. And the code is great–thanks for sharing.

Leave a comment