We will be using the ghmm package for this tutorial. It lets you predict state transitions and train a model on real data. The documentation is sparse, but hopefully this tutorial will help you be successful. You can find the source distribution on psoda.cs.byu.edu in /usr/local/src/ghmm-0.7.0 and /usr/local/src/ghmmwrapper-0.7.
So, let's work through an example or two, and then you can try some things on your own.
import ghmm
from ghmm import *
help('ghmm')
dna = ['a', 'c', 't', 'g']
sigma = Alphabet(dna)
print sigma
# Transition probabilities between the two hidden states
A = [[0.9, 0.1], [0.3, 0.7]]

# Emission probabilities over ['a', 'c', 't', 'g'] for each state
normal = [0.25, 0.15, 0.35, 0.25]
island = [0.25, 0.25, 0.25, 0.25]
B = [normal, island]

# Initial state distribution: start in either state with probability 0.5
pi = [0.5] * 2
m = HMMFromMatrices(sigma, DiscreteDistribution(sigma), A, B, pi)
print m
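If you want to see what these matrices actually define, the forward algorithm computes the probability the model assigns to an observation sequence. The sketch below is plain Python rather than ghmm; `forward_logprob` is a hypothetical helper name of my own, and the per-step rescaling is one common way to avoid numeric underflow on long sequences.

```python
import math

def forward_logprob(obs, A, B, pi, alphabet):
    # Hypothetical helper (not part of ghmm): log P(obs | model),
    # computed with the forward algorithm, rescaling at each step
    # so long sequences do not underflow.
    idx = {sym: i for i, sym in enumerate(alphabet)}
    n = len(pi)
    # alpha[s] = P(obs[0..t], state at time t == s), rescaled
    alpha = [pi[s] * B[s][idx[obs[0]]] for s in range(n)]
    log_p = 0.0
    for t in range(1, len(obs)):
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
        alpha = [sum(alpha[p] * A[p][s] for p in range(n)) * B[s][idx[obs[t]]]
                 for s in range(n)]
    return log_p + math.log(sum(alpha))
```

With the matrices above, `forward_logprob(['a'], A, B, pi, dna)` is log(0.25), since both states emit 'a' with probability 0.25.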
obs_seq = m.sampleSingle(200)
print obs_seq
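Conceptually, sampleSingle walks the Markov chain: pick a start state from pi, emit a symbol from that state's row of B, move to the next state using the matching row of A, and repeat. Here is a plain-Python sketch of that generative process (written for Python 3's random.choices; `sample_single` is my own name, not ghmm's API):

```python
import random

def sample_single(A, B, pi, alphabet, length):
    # Hypothetical illustration of the HMM generative process,
    # not a call into ghmm.
    states = list(range(len(pi)))
    state = random.choices(states, weights=pi)[0]  # initial state from pi
    seq = []
    for _ in range(length):
        # emit a symbol from the current state's emission distribution
        seq.append(random.choices(alphabet, weights=B[state])[0])
        # then transition according to row `state` of A
        state = random.choices(states, weights=A[state])[0]
    return seq
```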
train_seq = EmissionSequence(sigma, [
    'g', 'c', 'c', 'g', 'g', 'c', 'g', 'c', 'g', 'c', 'g', 'c', 'c', 'g',
    'c', 'g', 'c', 'g', 'c', 'g', 'c', 'c', 'g', 'c', 'g', 'c', 'c', 'c',
    't', 't', 't', 't', 't', 't', 'a', 't', 'a', 'a', 'a', 'a', 't', 't',
    't', 'a', 't', 'a', 't', 'a', 'a', 'a', 't', 'a', 't', 't', 't', 't',
    'g', 'c', 'c', 'g', 'g', 'c', 'g', 'c', 'g', 'c', 'g', 'c', 'c', 'g',
    'c', 'g', 'c', 'g', 'c', 'g', 'c', 'c', 'g', 'c', 'g', 'c', 'c', 'c',
    't', 't', 't', 't', 't', 't', 'a', 't', 'a', 'a', 'a', 'a', 't', 't',
    't', 'a', 't', 'a', 't', 'a', 'a', 'a', 't', 'a', 't', 't', 't', 't' ])

You can also use the "char for char" construct.
seqList = [ char for char in 'atgtgtgactctagctacgta' ]

I know this is different from normal FASTA, but you are all Perl experts, so you should be able to convert FASTA to this format and import it from a file.
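One way to do that conversion: skip FASTA header lines and flatten the residue lines into a character list. `fasta_to_chars` below is a sketch under that assumption, not a ghmm utility; it takes any iterable of lines, so you can pass it an open file handle directly.

```python
def fasta_to_chars(lines):
    # Collect sequence characters from FASTA-formatted lines,
    # skipping '>' headers and blank lines; lowercase to match sigma.
    chars = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('>'):
            continue
        chars.extend(line.lower())
    return chars
```

For example, `fasta_to_chars(open('myseq.fasta'))` (hypothetical filename) gives a list you can hand to EmissionSequence(sigma, ...).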
m.baumWelch(train_seq)
obs_seq = m.sampleSingle(200)
print obs_seq
m.viterbi(obs_seq)
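m.viterbi returns the most likely hidden-state path for the sequence, along with its log-probability. For reference, here is a plain-Python sketch of the algorithm; `viterbi_path` mirrors that return shape but is my own illustration, not ghmm's implementation, and it assumes all probabilities are nonzero so the logs are defined.

```python
import math

def viterbi_path(obs, A, B, pi, alphabet):
    # Illustrative Viterbi decoder (not ghmm's implementation).
    # Assumes every entry of A, B, and pi is nonzero so log() is defined.
    idx = {sym: i for i, sym in enumerate(alphabet)}
    n = len(pi)
    # delta[s]: best log-probability of any state path ending in s so far
    delta = [math.log(pi[s]) + math.log(B[s][idx[obs[0]]]) for s in range(n)]
    back = []  # back[t][s]: best predecessor of state s at step t+1
    for t in range(1, len(obs)):
        new_delta, ptr = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: delta[p] + math.log(A[p][s]))
            new_delta.append(delta[prev] + math.log(A[prev][s])
                             + math.log(B[s][idx[obs[t]]]))
            ptr.append(prev)
        delta = new_delta
        back.append(ptr)
    # trace the best final state back to the start
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, max(delta)
```

On a run of 'c' symbols this decoder prefers state 1, whose emission probability for 'c' (0.25) beats state 0's (0.15) by more than the transition matrix's preference for staying in state 0.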
seq = EmissionSequence(sigma, ['c'] * 20 + ['t'] * 10 + ['c'] * 40)
m.baumWelch(seq)
m.viterbi(seq)
obs_seq = m.sampleSingle(800)
print obs_seq
m.viterbi(obs_seq)