Members don't see the ad below. Register now!

Hello All,

I am currently self-studying ML and one of the places where I keep on getting tripped up is in understanding the mathematical notation used. For instance I was following Prof. Andrew Ng's lecture on GLM, which ended with him explaining how the K-classifier problem can be solved using the multinomial distribution which leads to the softmax function being used in the hypothesis. The function is as follows:

alt text

I was able to follow Prof. Ng's derivation of that formula and understood how he got to the cost function. However I'm having difficulty trying to see how to vectorize it so that I can actually write code that will implement it.

Now I thought I understood how to vectorize equations like these, in that I was able to do so for the much simpler ordinary means squared and logistic regression cost functions, but when I look at the cost function for softmax regression...I am stumped.

Can someone please explain to me how, when they see equations like these, they convert them to vectorized solutions - more specifically, is there a methodical way in which to do so? Many thanks in advance!

Cheers

asked 21 Feb, 10:24

jinho's gravatar image

jinho
713

edited 21 Feb, 16:10

rhasarub's gravatar image

rhasarub
3.8k214


@jinho I found vectorizing the hardest part of most of the ML programming assignments (I took the initial class). I found I had the most luck with finding/deriving the equations in vector/matrix form from the start. One of the best reasons (IMHO) to use the vector form is the conciseness and expressiveness of the notation (computational efficiency is another good reason). I'm interested in a direct answer to your question as well, but until then I would recommend either finding a derivation of the softmax function in vector form or using a built in function which will be fully optimized (I'm not sure if the functions I reference below do what you need).

Matlab softmax: http://www.mathworks.com/help/toolbox/nnet/ref/softmax.html

Probabilistic toolkit softmax: http://code.google.com/p/pmtk3/source/browse/trunk/util/softmax.m?r=14

Link to stackoverflow question: http://stackoverflow.com/questions/8998321/vectorized-implementation-of-softmax-regression

Have you already seen this link? http://ufldl.stanford.edu/wiki/index.php/Neural_Network_Vectorization

link

answered 21 Feb, 11:03

rseiter's gravatar image

rseiter
2.2k18

edited 21 Feb, 11:05

I agree with @rseiter that it is best to formulate the equations in matrix form from the start. However, as the formulas are rarely given in that form, I find the next best thing is to work "inside out".

What I mean by that is that the core data is usually expressed in the innermost layer of the summations and you can note the form that data is in. I tend to add the dimensions (m,n and 1 for example) of my variables so I can visualize what I have on paper, making it easier to know when I need to transform or not and what the resultant matrix structure will be.

Then I look at the operation being performed and the dimensions being iterated over. If you can align the variables to allow for a standard matrix multiplication to handle the innermost operation and then sum across the result (either once to get a vector or twice in the case of scalars like cost functions) the code then just falls out of that analysis.

That is what most of the algorithms primary steps ended up being: replicating a vector into a matrix so I can apply that feature vector to all the data rows in one operation, then summing once or twice. Once you write it out, it is so short that it feels like it shouldn't have taken so much work. Then again, the Standard Model of particle physics fits on a line, so compact doesn't mean it isn't complex.

link

answered 21 Feb, 16:03

Godeke's gravatar image

Godeke ♦
7.3k11328

1

Thanks @rseiter and @Godeke. @Godeke, I've been doing that for all of my questions but for this one, I've been getting tripped up a bit more. Specifically, is the double summation of the product of the two inner matrices a Frobenius Inner Product?

(21 Feb, 18:30) jinho jinho's gravatar image
2

I think you will find the notes here: http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression useful in this case. The insight is found in the definition of p() for the ugly inner term.

This is a general truth: when you need to have a recursive level of vectorization, creating a vectorized function is usually the best course of action. This may also involve repmat() to make duplicate rows or bsxfun() ( http://octave.sourceforge.net/octave/function/bsxfun.html ) to apply your function to the matrix.

I also find "truth" matrixes to be valuable when using repmat() to control which rows actually impact the final calculation: by filtering a replicated matrix you can knock out unneeded rows (although this seems unnecessary in this case).

(22 Feb, 14:22) Godeke ♦ Godeke's gravatar image
1

Nice find @Godeke. I think the related link http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial would be worth adding to the ML wiki.

(22 Feb, 14:35) rseiter rseiter's gravatar image
Members don't see the ad below. Register now!
Your answer
toggle preview

Follow this Question via Email

Once you sign in you will be able to subscribe for any updates here

Q&A Editor Basics

  • to upload an image into your question or answer hit
  • to create bulleted or numbered lists hit or
  • to add a title or header hit
  • to section your text hit
  • to make a link clickable, surround it with <a> and </a> (for example, <a>www.google.com</a>)
  • basic HTML tags are also supported (for those who know a bit of HTML)
  • To insert an EQUATION you can use LaTeX. (backslash \ has to be escaped, so in your LaTeX code you have to replace \ with \\). You can see more examples and info here

powered by OSQA