|
Hello All, I am currently self-studying ML and one of the places where I keep on getting tripped up is in understanding the mathematical notation used. For instance I was following Prof. Andrew Ng's lecture on GLM, which ended with him explaining how the K-classifier problem can be solved using the multinomial distribution which leads to the softmax function being used in the hypothesis. The function is as follows:
I was able to follow Prof. Ng's derivation of that formula and understood how he got to the cost function. However I'm having difficulty trying to see how to vectorize it so that I can actually write code that will implement it. Now I thought I understood how to vectorize equations like these, in that I was able to do so for the much simpler ordinary means squared and logistic regression cost functions, but when I look at the cost function for softmax regression...I am stumped. Can someone please explain to me how, when they see equations like these, they convert them to vectorized solutions - more specifically, is there a methodical way in which to do so? Many thanks in advance! Cheers |
|
@jinho I found vectorizing the hardest part of most of the ML programming assignments (I took the initial class). I found I had the most luck with finding/deriving the equations in vector/matrix form from the start. One of the best reasons (IMHO) to use the vector form is the conciseness and expressiveness of the notation (computational efficiency is another good reason). I'm interested in a direct answer to your question as well, but until then I would recommend either finding a derivation of the softmax function in vector form or using a built in function which will be fully optimized (I'm not sure if the functions I reference below do what you need). Matlab softmax: http://www.mathworks.com/help/toolbox/nnet/ref/softmax.html Probabilistic toolkit softmax: http://code.google.com/p/pmtk3/source/browse/trunk/util/softmax.m?r=14 Link to stackoverflow question: http://stackoverflow.com/questions/8998321/vectorized-implementation-of-softmax-regression Have you already seen this link? http://ufldl.stanford.edu/wiki/index.php/Neural_Network_Vectorization |
|
I agree with @rseiter that it is best to formulate the equations in matrix form from the start. However, as the formulas are rarely given in that form, I find the next best thing is to work "inside out". What I mean by that is that the core data is usually expressed in the innermost layer of the summations and you can note the form that data is in. I tend to add the dimensions (m,n and 1 for example) of my variables so I can visualize what I have on paper, making it easier to know when I need to transform or not and what the resultant matrix structure will be. Then I look at the operation being performed and the dimensions being iterated over. If you can align the variables to allow for a standard matrix multiplication to handle the innermost operation and then sum across the result (either once to get a vector or twice in the case of scalars like cost functions) the code then just falls out of that analysis. That is what most of the algorithms primary steps ended up being: replicating a vector into a matrix so I can apply that feature vector to all the data rows in one operation, then summing once or twice. Once you write it out, it is so short that it feels like it shouldn't have taken so much work. Then again, the Standard Model of particle physics fits on a line, so compact doesn't mean it isn't complex. 1
2
I think you will find the notes here: http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression useful in this case. The insight is found in the definition of p() for the ugly inner term. This is a general truth: when you need to have a recursive level of vectorization, creating a vectorized function is usually the best course of action. This may also involve repmat() to make duplicate rows or bsxfun() ( http://octave.sourceforge.net/octave/function/bsxfun.html ) to apply your function to the matrix. I also find "truth" matrixes to be valuable when using repmat() to control which rows actually impact the final calculation: by filtering a replicated matrix you can knock out unneeded rows (although this seems unnecessary in this case). 1
Nice find @Godeke. I think the related link http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial would be worth adding to the ML wiki. |