I am really excited about my selection in GSoC '15. I hope that I will live up to the expectations of my mentors and be able to complete my project in the allotted duration.
My project is about dynamic Bayesian networks, which are the time-variant extension of static Bayesian networks and have a wide range of applications, such as protein sequencing and voice recognition systems.
So what is a Bayesian network?
A Bayesian network is a directed acyclic graph (DAG) that is an efficient and compact representation of a set of conditional independence assumptions about a distribution. Bayesian networks are an elegant framework for learning models from data that can be combined with prior expert knowledge.
Here is what a static Bayesian network looks like. (This is the conventional student example presented in the book Probabilistic Graphical Models: Principles and Techniques by Daphne Koller and Nir Friedman.)
In the directed graph above, the nodes represent the random variables and the edges represent the direct influence of one variable on another.
Here we are trying to model the grade that the student is going to get, which is conditionally dependent on the difficulty of the subject and the intelligence of the student. The recommendation letter assigned to the student is in turn stochastically dependent on the grade assigned by the professor.
Here I have assumed that the grade assigned by the professor is ternary valued and the rest of the variables are binary valued.
$Difficulty :- Domain = Val(D) = \{d^0 (easy), d^1 (hard)\}$
$Grade :- Domain = Val(G) = \{g^1, g^2, g^3\}$
$Letter :- Domain = Val(L) = \{l^0 (weak), l^1 (strong)\}$
$Intelligence :- Domain = Val(I) = \{i^0, i^1\}$
$SAT :- Domain = Val(S) = \{s^0, s^1\}$
In general, each random variable is associated with a Conditional Probability Distribution, also called a CPD, that specifies the distribution over the values of the random variable given the values of its parents. Together with the graph structure, the CPDs fully encode the joint distribution of the variables. Here is what a CPD-encoded Bayesian network would look like:
One such model, $P(I)$, represents the distribution of intelligent versus less intelligent students. Another, $P(D)$, represents the distribution that distinguishes the difficult classes from the less difficult ones. (Let's call the above Bayesian network $B_{student}$ for future reference.)
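As a concrete illustration, here is a minimal sketch of how $B_{student}$ and its CPDs could be encoded in Python using the pgmpy library. (The choice of pgmpy and the exact import paths are my assumptions for illustration and may differ across versions; the CPD values are the ones from the Koller and Friedman student example.)

```python
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

# Structure of the student network: edges point from a
# variable to the variables it directly influences.
model = BayesianModel([('D', 'G'), ('I', 'G'), ('I', 'S'), ('G', 'L')])

# Prior distributions for the root nodes.
cpd_d = TabularCPD('D', 2, [[0.6], [0.4]])   # P(D)
cpd_i = TabularCPD('I', 2, [[0.7], [0.3]])   # P(I)

# P(G | I, D): columns are ordered (i0,d0), (i0,d1), (i1,d0), (i1,d1).
cpd_g = TabularCPD('G', 3,
                   [[0.3, 0.05, 0.9, 0.5],    # g1
                    [0.4, 0.25, 0.08, 0.3],   # g2
                    [0.3, 0.7, 0.02, 0.2]],   # g3
                   evidence=['I', 'D'], evidence_card=[2, 2])

# P(S | I) and P(L | G).
cpd_s = TabularCPD('S', 2, [[0.95, 0.2], [0.05, 0.8]],
                   evidence=['I'], evidence_card=[2])
cpd_l = TabularCPD('L', 2, [[0.1, 0.4, 0.99], [0.9, 0.6, 0.01]],
                   evidence=['G'], evidence_card=[3])

model.add_cpds(cpd_d, cpd_i, cpd_g, cpd_s, cpd_l)
assert model.check_model()  # checks that the CPDs sum to 1 and match the graph
```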
Let's consider a particular case for this example.
$P(i^0,d^1,g^2,s^1,l^0)$
The probability of this event can be computed from the factors comprising it, using the chain rule for Bayesian networks:
$P(I,D,G,S,L) = P(D)P(I)P(G|I,D)P(S|I)P(L|G)$
Substituting the values from the CPDs, the probability of this event is given by
$P(i^0,d^1,g^2,s^1,l^0) = P(d^1)P(i^0)P(g^2|i^0, d^1)P(s^1|i^0)P(l^0|g^2)$
$ = 0.4 \times 0.7 \times 0.25 \times 0.05 \times 0.4 = 0.0014 $
Thus, using the chain rule, we can compute the probability of any complete assignment of the variables.
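This chain-rule computation is easy to sanity-check with plain Python arithmetic:

```python
# P(i0, d1, g2, s1, l0) = P(d1) * P(i0) * P(g2|i0,d1) * P(s1|i0) * P(l0|g2)
p_d1 = 0.4    # P(d1)
p_i0 = 0.7    # P(i0)
p_g2 = 0.25   # P(g2 | i0, d1)
p_s1 = 0.05   # P(s1 | i0)
p_l0 = 0.4    # P(l0 | g2)

print(p_d1 * p_i0 * p_g2 * p_s1 * p_l0)  # 0.0014
```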
Now, let's move on to Dynamic Bayesian Networks.
Dynamic Bayesian Networks (DBNs) are static Bayesian networks that are unrolled over a series of time slices. In a Dynamic Bayesian Network, each time slice is conditionally dependent on the previous one. Suppose that $B_{student}$ is duplicated over a time series to form a two-time-slice Bayesian network (2-TBN). (The variables are shown in single-letter notation for compactness; the subscript denotes the time slice that the variable belongs to.) It will look as follows:
Assuming that the output $l$ is the observed output (or the evidence) over the course of time, there are two sets of edges in the above network. The first set are the intra-slice edges, which represent the influence between the random variables within a single slice; the topology inside each slice is the same as that of $B_{student}$. The second set are the inter-slice edges, which represent the conditional influence between random variables in two adjacent slices. (The underlying assumption here is that this is a 2-TBN, which keeps the complexity low; if this were an n-TBN, the inter-slice edges could branch out from the first slice to any of the n time slices.)
The probabilities of the original distribution now determine the probabilities in the successive time slices. The conditional influence between $D_0$ and $G_0$ remains the same as that between $D_1$ and $G_1$; thus the CPDs along the intra-slice edges stay the same. However, additional CPDs are required along the inter-slice edges; these describe how the random variables carry information from one slice to the next.
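To make the two kinds of edges concrete, here is a minimal sketch of how such a 2-TBN could be specified, again assuming pgmpy (its DynamicBayesianNetwork class names nodes as (variable, time_slice) pairs). The particular inter-slice edges below are an illustrative assumption, not necessarily the ones in the figure:

```python
from pgmpy.models import DynamicBayesianNetwork as DBN

dbn = DBN()

# Intra-slice edges: the topology of B_student, given for slice 0.
# pgmpy mirrors these edges into slice 1 when unrolling the 2-TBN.
dbn.add_edges_from([(('D', 0), ('G', 0)),
                    (('I', 0), ('G', 0)),
                    (('I', 0), ('S', 0)),
                    (('G', 0), ('L', 0))])

# Inter-slice edges (illustrative assumption): difficulty and intelligence
# persist over time, so each influences its own copy in the next slice.
dbn.add_edges_from([(('D', 0), ('D', 1)),
                    (('I', 0), ('I', 1))])
```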
Now, if we were to determine the probability $P(L_1)$, it would depend not only on the random variables present in its own time slice but also on the previous slice's variables.
In the next blog post, I will describe how to compute the conditional probabilities in a DBN.
Here is another, more complicated example that demonstrates how involved DBNs can get. This example is the network from "BATmobile: Towards a Bayesian Automated Taxi" by Forbes et al.
Source: http://bnt.googlecode.com/svn/trunk/docs/usage_dbn.html