# Is this how you use Naive Bayes?

Hello all, I have been going over the Naive Bayes algorithm as presented by Professor Andrew Ng in his CS229 lectures, and I want to make sure I understand how it is implemented (here I am talking about the simplest case, without Laplace smoothing, and not dealing with the multinomial event model). As I understand it, the goal is ultimately to calculate $$P(y=1 \mid x)$$ by building models of $$P(x \mid y)$$ and $$P(y)$$ and applying Bayes' rule (the denominator $$P(x)$$ follows from those two models).

In the case where the feature vector $$x = [x_1, x_2, \ldots, x_n]^T$$ is binary-valued, i.e. $$x_j \in \{0, 1\}$$ for $$j = 1, \ldots, n$$, the model is parameterized by

$$\phi_{j \mid y=0} = p(x_j = 1 \mid y = 0), \quad \phi_{j \mid y=1} = p(x_j = 1 \mid y = 1), \quad \phi_y = p(y = 1).$$

Under the naive conditional-independence assumption, the class-conditional likelihoods and the prior are:

$$P(x \mid y=0) = \prod_{j=1}^n p(x_j \mid y=0) = \prod_{j=1}^n (\phi_{j \mid y=0})^{x_j} (1 - \phi_{j \mid y=0})^{1 - x_j}$$

$$P(x \mid y=1) = \prod_{j=1}^n p(x_j \mid y=1) = \prod_{j=1}^n (\phi_{j \mid y=1})^{x_j} (1 - \phi_{j \mid y=1})^{1 - x_j}$$

$$P(y) = (\phi_y)^y (1 - \phi_y)^{1 - y}$$

Plugging these back into Bayes' rule (so $$P(y=1) = \phi_y$$ and $$P(y=0) = 1 - \phi_y$$):

$$P(y=1 \mid x) = \frac{P(x \mid y=1)\, P(y=1)}{P(x \mid y=1)\, P(y=1) + P(x \mid y=0)\, P(y=0)}$$

$$= \frac{\prod_{j=1}^n (\phi_{j \mid y=1})^{x_j} (1 - \phi_{j \mid y=1})^{1 - x_j} \; \phi_y}{\prod_{j=1}^n (\phi_{j \mid y=1})^{x_j} (1 - \phi_{j \mid y=1})^{1 - x_j} \; \phi_y + \prod_{j=1}^n (\phi_{j \mid y=0})^{x_j} (1 - \phi_{j \mid y=0})^{1 - x_j} \; (1 - \phi_y)}$$

We then choose the parameters to maximize the likelihood of the training set, which gives:

$$\phi_{j \mid y=1} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 1\}}{\sum_{i=1}^m 1\{y^{(i)} = 1\}}$$

$$\phi_{j \mid y=0} = \frac{\sum_{i=1}^m 1\{x_j^{(i)} = 1 \wedge y^{(i)} = 0\}}{\sum_{i=1}^m 1\{y^{(i)} = 0\}}$$

$$\phi_y = \frac{\sum_{i=1}^m 1\{y^{(i)} = 1\}}{m}$$

Finally, we go back to the model, plug in the estimated parameter values, and compute $$P(y=1 \mid x)$$ for a new example. In the spam-filter example, a value of $$P(y=1 \mid x) \geq 0.5$$ would indicate that the email is spam.

Is this correct reasoning? Thanks! (Please forgive me if the LaTeX formatting is poor; this is my first time using it here.) Cheers

asked 02 Mar '12, 12:11 by jinho (71●4)

**Comment** (comptrol, 02 Mar '12, 15:31): I didn't know Naive Bayes was this complex! Which lecture are you addressing? Or maybe his book?

**Comment** (jinho, 02 Mar '12, 18:37): Haha, it's very possible I'm making it overly complicated. This is from Professor Andrew Ng's lecture (I believe #4 or 5); while he goes over the theory pretty much in depth, he sort of glossed over the implementation. I just wanted to see whether my idea of how it might be implemented is right, or at least along the right path. Thanks! Just a thought, but perhaps I should have asked whether this applies to generative algorithms in general. Eager to hear your thoughts.
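To make the question concrete, the fit-then-predict pipeline described above can be sketched in NumPy. This is only my illustration of the steps in the question; the function names and the toy data are mine, not from the lecture:

```python
import numpy as np

def fit_naive_bayes(X, y):
    """Maximum-likelihood estimates for binary-feature Naive Bayes (no smoothing).

    X: (m, n) binary feature matrix; y: (m,) binary label vector.
    """
    phi_y = np.mean(y == 1)            # phi_y        = p(y = 1)
    phi_j_y1 = X[y == 1].mean(axis=0)  # phi_{j|y=1}  = p(x_j = 1 | y = 1)
    phi_j_y0 = X[y == 0].mean(axis=0)  # phi_{j|y=0}  = p(x_j = 1 | y = 0)
    return phi_j_y0, phi_j_y1, phi_y

def predict_proba(x, phi_j_y0, phi_j_y1, phi_y):
    """P(y = 1 | x) by Bayes' rule for a single binary feature vector x."""
    lik1 = np.prod(phi_j_y1 ** x * (1 - phi_j_y1) ** (1 - x))  # P(x | y = 1)
    lik0 = np.prod(phi_j_y0 ** x * (1 - phi_j_y0) ** (1 - x))  # P(x | y = 0)
    num = lik1 * phi_y
    return num / (num + lik0 * (1 - phi_y))

# Toy "spam" data: 6 emails, 3 vocabulary words (made up for illustration).
X = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 1],   # spam (y = 1)
              [0, 1, 0], [0, 0, 1], [1, 0, 0]])  # ham  (y = 0)
y = np.array([1, 1, 1, 0, 0, 0])

params = fit_naive_bayes(X, y)
p = predict_proba(np.array([1, 1, 0]), *params)
print(p)  # ≈ 0.75, so classify as spam since p >= 0.5
```

Two practical caveats: without Laplace smoothing, a feature count of zero makes an entire product collapse to 0 (or the posterior become 0/0), and for realistic vocabulary sizes the products underflow, so real implementations work with sums of logs instead.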