Tuesday 30 June 2015

Inference in Dynamic Bayesian Networks (continued)

For the past two weeks I have been studying the algorithmic side of inference and working on its implementation. Today I will be talking about the junction tree algorithm for inference in Dynamic Bayesian Networks.

The algorithm proceeds in the following steps:
1) Initialization :- This requires constructing the two initial junction trees J1 and Jt.
  1. J1 is the junction tree created from timeslice 1 of the 2-TBN (2-timeslice Bayesian network), and Jt is the junction tree created from timeslice 2 of the 2-TBN. The time counter is initialized to 0. Also, let the interface nodes (denoted by I1 and I2 for timeslices 1 and 2 respectively) be the nodes of a timeslice that have children in the next timeslice.
  2. If a query is on the initial timeslice alone, the results can be obtained by the standard VariableElimination procedure, using timeslice 1 of the Bayesian network as the model for inference.
  3. For evidence, if its time index is 0, it is applied to the initial static Bayesian network; otherwise, it is applied to the second timeslice of the 2-TBN.
  4. For creating the junction tree J1, the procedure is as follows (a small sketch of this construction appears at the end of this post):-
    1. Moralize the initial static Bayesian network.
    2. Add edges among the interface nodes so that I1 forms a clique.
    3. The rest of the procedure (triangulation and clique-tree construction) is the same as for a static network; the step above is the only difference.
  5. For the junction tree Jt, a similar procedure is followed, where a clique is formed for I2 as well.
2) Inference procedure:- In this procedure, the clique potential from the interface nodes is passed on to the interface clique of the next junction tree (similar to the message passing algorithm).
The time counter is incremented accordingly.
So basically the junction tree Jt acts as a sort of engine: the in-clique is where the incoming values are supplied and the out-clique is where the resulting values are obtained, given the evidence.
The query variables are read off as usual at each step, and the evidence for that timeslice is applied as well.
The best part about this procedure is that it eliminates entanglement: only the out-clique potential needs to be carried forward for inference. (A schematic of this loop is included in the sketch at the end of this post.)
The implementation is still in progress.
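To make the construction step concrete, here is a minimal sketch (not the actual implementation) of step 1.4 on a made-up three-node timeslice, using networkx. The node names, edges and interface below are purely illustrative.

# Rough sketch of step 1.4 on a hypothetical timeslice of a 2-TBN.
import itertools
import networkx as nx

# Hypothetical timeslice 1: (node, 0) denotes a node in the first slice.
slice_1 = nx.DiGraph([(('A', 0), ('B', 0)),
                      (('A', 0), ('C', 0)),
                      (('B', 0), ('C', 0))])

# Edges crossing into the next slice; every slice-1 node with a child in
# slice 2 belongs to the interface I1.
inter_slice_edges = [(('A', 0), ('A', 1)), (('B', 0), ('B', 1))]
I1 = {u for u, v in inter_slice_edges}

# i. Moralize: marry the parents of every node and drop the directions.
moral = nx.Graph(slice_1.edges())
for node in slice_1.nodes():
    moral.add_edges_from(itertools.combinations(slice_1.predecessors(node), 2))

# ii. Add edges among the interface nodes so that I1 forms a clique.
moral.add_edges_from(itertools.combinations(I1, 2))

print("Interface I1:", I1)
print("Moralized edges:", sorted(moral.edges()))
# iii. Triangulation and clique-tree construction then proceed exactly as for
# a static Bayesian network.

And the engine loop of step 2, in the degenerate case of a single binary interface variable (so that the clique potentials are just length-2 vectors), would look schematically like this; the transition and emission numbers are invented for illustration only.

import numpy as np

transition = np.array([[0.7, 0.3],      # P(I_t | I_{t-1})
                       [0.2, 0.8]])
emission = np.array([[0.9, 0.1],        # P(evidence_t | I_t)
                     [0.3, 0.7]])

out_clique_potential = np.array([0.5, 0.5])   # initial belief over I_1
for t, observed in enumerate([0, 1, 1], start=1):
    # The in-clique of Jt receives the previous out-clique potential,
    in_potential = out_clique_potential @ transition
    # the evidence for this timeslice is absorbed,
    in_potential = in_potential * emission[:, observed]
    # and marginalizing onto I_t gives the next out-clique potential.
    out_clique_potential = in_potential / in_potential.sum()
    print(t, out_clique_potential)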

Wednesday 17 June 2015

Inference in Dynamic Bayesian Networks

Today, I will be talking about how inference works in Dynamic Bayesian Networks.
There are a few methods we could apply:
1) Naive Method:- We could unroll the Bayesian network for as many timeslices as we need and then apply the inference methods used for a standard Bayesian network. However, the unrolled graph keeps growing with the number of timeslices, and inference on it quickly becomes very expensive.
(It was a surprise to me that we can even do this, but it leads to a lot of problems.)
2) Forward and Backward Algorithm:- We could apply this algorithm, but it applies only to hidden Markov models (HMMs). Using it for a general DBN means collapsing all the variables of a timeslice into a single state variable, which blows up the state space and again leads to huge complexity.
So as to reduce the complexity of inference, the approach is as follows:-
We compute a belief state over the hidden variables by recursive estimation. Let's assume that we already have the belief state for the previous timeslice, conditioned on all the evidence seen so far.
Then we can propagate this state forward one timeslice at a time: multiply in the transition and observation terms for the new slice and sum out the old state.
These factors are multiplied at every step of the recursive procedure, which is exactly the filtering algorithm.
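For reference, writing $\alpha_t(X_t) = P(X_t | y_{1:t})$ for the belief state over the hidden variables $X_t$ given the evidence $y_{1:t}$, the textbook form of this recursive update is:
                  $\alpha_1(X_1) \propto P(y_1|X_1)\, P(X_1)$
                  $\alpha_t(X_t) \propto P(y_t|X_t) \sum_{x_{t-1}} P(X_t|x_{t-1})\, \alpha_{t-1}(x_{t-1})$
Here $X_t$ stands for the whole collection of hidden variables of timeslice $t$; the algorithms mentioned next differ mainly in which set of variables this belief state is maintained over.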
Other algorithms are the Frontier Algorithm and the Interface Algorithm, which are the more popular choices for inference at a larger scale.
Apart from that, algorithms such as Viterbi decoding can compute the most probable path of hidden states, and expectation maximization can be used for learning the parameters.
This is what I am planning to implement in a couple of weeks.


Wednesday 3 June 2015

Hidden Markov Models

In the previous week, I described what a dynamic Bayesian network is. I have also written the
basic framework that the dynamic Bayesian network encompasses. Currently, I am planning the hidden Markov model framework; the functionality of the hidden Markov model will be similar to what MATLAB provides.
1st week report
I have finished coding the dynamic Bayesian network, which has methods for unrolling and for removing time slices.

What are hidden Markov models (HMMs)?
They are basically dynamic Bayesian networks on a smaller scale, which makes them less complex and more efficient. The example above consists of the nodes Q1 and Q2, which are known as hidden nodes, while the nodes Y1 and Y2 are known as observed nodes.
The hidden states are represented by discrete random variables. Again, this is a 2-TBN, similar to the dynamic Bayesian networks discussed earlier.

The conditional probabilities of the nodes in a timeslice are given by the formula
                   $P(Z_t|Z_{t-1}) = \prod_{i = 1}^{N} P(Z_t^{(i)}| Pa(Z_t^{(i)}))$
where $Z_t^{(i)}$ is the $i$-th node in time slice $t$, $N$ is the total number of nodes per slice ($N = N_h + N_o$, hidden plus observed), and $Pa(Z_t^{(i)})$ are the parents of the node $Z_t^{(i)}$, which may be in the same time slice or the previous one.
The above figure shows a 2-TBN. For the two slices shown, the joint probability is:-
                  $P(Q_1, Y_1, Q_2, Y_2) = P(Q_1) \times P(Y_1|Q_1) \times P(Q_2|Q_1) \times P(Y_2|Q_2)$
The general equation becomes:
                  $P(Q_{1:T}, Y_{1:T}) =  P(Q_1)P(Y_1|Q_1) \times  \prod_{t= 2}^{T}P(Q_t|Q_{t-1})P(Y_t|Q_t)$
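As a quick sanity check of this factorization, here is a tiny Python snippet with made-up transition and emission matrices (the numbers are purely illustrative and not part of the planned framework) that evaluates the joint probability of a given state and observation sequence:

import numpy as np

pi = np.array([0.6, 0.4])                # P(Q_1)
A = np.array([[0.7, 0.3],                # A[i, j] = P(Q_t = j | Q_{t-1} = i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                # B[i, k] = P(Y_t = k | Q_t = i)
              [0.3, 0.7]])

def joint_probability(states, observations):
    """P(Q_1) P(Y_1|Q_1) * prod_{t >= 2} P(Q_t|Q_{t-1}) P(Y_t|Q_t)."""
    prob = pi[states[0]] * B[states[0], observations[0]]
    for t in range(1, len(states)):
        prob *= A[states[t - 1], states[t]] * B[states[t], observations[t]]
    return prob

print(joint_probability(states=[0, 0, 1], observations=[0, 1, 1]))
# = 0.6 * 0.9 * (0.7 * 0.1) * (0.3 * 0.7) ~= 0.0079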
There are more sophisticated models, such as coupled HMMs,
which look like this.
In the chains above there are several hidden variables interacting with each other, together with the observed variables $Y_1, Y_2$ and $Y_3$.
Each chain directly affects the adjacent chains, so $X_2'$ could, for example, end up interacting with the first chain if the model is unrolled further.
Coupled HMMs are used, for instance, to model coupled temperature sensors for fire alarms, since they explicitly encode the relations between the chains.
Another application of HMMs is finding the most probable sequence of hidden states over a series of time slices. This most probable sequence is found by the popular Viterbi Algorithm.
The states are computed as follows:-
\begin{array}{rcl}
V_{1,k} &=& \mathrm{P}\big( y_1 \ | \ k \big) \cdot \pi_k \\
V_{t,k} &=& \max_{x \in S} \left(  \mathrm{P}\big( y_t \ | \ k \big) \cdot a_{x,k} \cdot V_{t-1,x}\right)
\end{array}
Basically, at every time step the most probable predecessor state is chosen, and its probability is extended with the transition and emission terms using the chain rule; backtracking over these choices then recovers the most probable path.
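A minimal Python sketch of this recurrence, using the same kind of toy parameters as in the snippet above (pi is the initial distribution, A the transition matrix $a_{x,k}$ and B the emission matrix), could look like this:

import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def viterbi(observations):
    T, S = len(observations), len(pi)
    V = np.zeros((T, S))                  # V[t, k] as in the recurrence above
    back = np.zeros((T, S), dtype=int)    # argmax pointers for backtracking
    V[0] = pi * B[:, observations[0]]     # V_{1,k} = P(y_1 | k) * pi_k
    for t in range(1, T):
        for k in range(S):
            scores = V[t - 1] * A[:, k] * B[k, observations[t]]
            back[t, k] = np.argmax(scores)
            V[t, k] = np.max(scores)
    # Backtrack from the best final state to recover the most probable path.
    path = [int(np.argmax(V[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return list(reversed(path)), float(np.max(V[-1]))

print(viterbi([0, 1, 1]))   # most probable hidden path and its probability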
The algorithm is remarkably similar to the max-product form of belief propagation used in Bayesian models.
These algorithms make hidden Markov models remarkably useful, with serious applications in speech recognition and in decoding convolutional codes.
This is what I am planning to implement within the next 2 weeks. Then I will start with inference in dynamic bayesian networks.