
# Softmax regression - has anyone successfully implemented it?

I am working on the UFLDL tutorial for softmax regression. Due to some bug in my implementation, my gradients are not matching. It may be due to an error in using one of those heavily vectorized equations, but I am having a hard time tracking it down. If someone has implemented it, any help would be greatly appreciated. I am getting a cost around 0.0323, which seems reasonable to me.

    const=(lambda*sum((sum(theta.^2)-theta(1).^2)))/2;
    for i=1:m
        cost=cost +(1-y(:,i))'*log(1-hx(:,i)) +y(:,i)'*log(hx(:,i));
    end
    cost=-cost/m + const;
    thetagrad=((hx-y)*data')/m+lambda*theta;
    thetagrad(1)=thetagrad(1)-lambda*theta(1);

hx is the hypothesis.

asked 30 Mar '12, 01:53 trailblazer1019

AFAIK the gradient is OK; the problem seems to be in the cost computation. My version is as follows:

    cost = - mean(sum(y .* log(hx))) + (lambda/2.0) * sum(sumsq(theta));

answered 30 Mar '12, 11:58 Ale

That fixes my code :) ... but can you elaborate on the mean part? I am not really getting the gist of how it matches up to the equation given in the text. (30 Mar '12, 12:15) trailblazer1019

I've just used mean as a shorthand for (1/m)*sum. Feel free to change it if you prefer, for clarity reasons! (30 Mar '12, 12:53) Ale

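
To make the "mean as a shorthand for (1/m)*sum" remark concrete, here is a minimal sketch on made-up numbers. The matrices `hx` and `y` below are toy values chosen for illustration, not anything from the exercise; the only point is that the two ways of writing the data term give the same number.

```matlab
% Toy values (assumed for illustration): 3 classes x 3 examples.
hx = [0.7 0.2 0.1; 0.2 0.5 0.3; 0.1 0.3 0.6];  % each column sums to 1
y  = eye(3);                                    % one-hot labels 1, 2, 3
m  = size(hx, 2);

% sum(...) adds over classes (rows), leaving one value per example.
% mean(...) then averages those m values, i.e. (1/m) * sum over examples.
cost_mean = -mean(sum(y .* log(hx)));
cost_sum  = -(1/m) * sum(sum(y .* log(hx)));

disp(abs(cost_mean - cost_sum));   % difference between the two forms
```

Note that `sumsq` in the answer above is an Octave function; `sum(theta(:).^2)` should give the same regularization sum without it.
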
My code is below, but it doesn't work very well. The gradient-checking difference is low enough (I got 3.7769e-10), but when I use it to train with minFunc, it stops with "Function Value changing by less than TolX" in 20 iterations. Could you help me?

    numCases = size(data, 2);
    groundTruth = full(sparse(labels, 1:numCases, 1));
    M = theta*data;
    M = bsxfun(@minus, M, max(M, [], 1));
    h = exp(M);
    h = bsxfun(@rdivide, h, sum(h));
    cost = -1/numCases*sum(sum(groundTruth.*log(h)))+lambda/2*sum(sum(theta.^2));
    thetagrad = -1/numCases*((groundTruth-h)*data')+lambda*theta;%log(h)

answered 11 Mar, 09:46 darkscope

I think some people might be getting confused when he starts talking about softmax reducing to logistic regression (1-y terms). The pure loop form should look like this:

    m = size(data, 2);
    denom = zeros(1,m);
    for i = 1:m
      for k = 1:numClasses
        denom(i) = denom(i) + exp(theta(k,:) * data(:,i));
      endfor
    endfor
    for i = 1:m
      for k = 1:numClasses
        if (labels(i) == k)
          cost = cost + log( exp(theta(k,:) * data(:,i)) / denom(i) );
        endif
      endfor
    endfor
    cost /= -m;

Or using vectorized form:

    y = full(sparse(labels, 1:m, 1));
    z = theta * data;
    z = bsxfun(@minus, z, max(z, [], 1));
    h = exp(z);
    h = bsxfun(@rdivide, h, sum(h));
    cost = -1/m* sum( (y .* log(h))(:) ) + lambda/2*sum(theta(:).^2);

answered 04 Sep '12, 21:55 Charles Beyer

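
One way to convince yourself that the loop form and the vectorized form describe the same cost is to run both on a tiny problem. The sketch below uses made-up data, labels and theta (all assumptions for illustration), leaves out the regularization term in both versions, and is written with `end` and ordinary indexing so it should run in MATLAB as well as Octave.

```matlab
% Toy problem; sizes and values are made up for illustration.
numClasses = 3;
data   = [0.5 -1.0 2.0 0.0; 1.0 0.3 -0.7 1.5];  % 2 features x 4 examples
labels = [1 3 2 3];
theta  = [0.1 -0.2; 0.0 0.4; -0.3 0.2];         % numClasses x 2
m = size(data, 2);

% Loop form (unregularized data term only).
cost_loop = 0;
for i = 1:m
  denom = 0;
  for k = 1:numClasses
    denom = denom + exp(theta(k,:) * data(:,i));
  end
  cost_loop = cost_loop + log(exp(theta(labels(i),:) * data(:,i)) / denom);
end
cost_loop = -cost_loop / m;

% Vectorized form (unregularized data term only).
y = full(sparse(labels, 1:m, 1, numClasses, m));
z = theta * data;
z = bsxfun(@minus, z, max(z, [], 1));   % shift for numerical stability
h = exp(z);
h = bsxfun(@rdivide, h, sum(h, 1));     % columns now sum to 1
yl = y .* log(h);
cost_vec = -sum(yl(:)) / m;

disp(abs(cost_loop - cost_vec));        % difference between the two forms
```

Subtracting the column maximum changes neither form's result, because the same constant cancels between numerator and denominator of each probability.
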
...actually I wonder if a later version of Octave supports [param1,~] and/or sumsqr (for compatibility). I'll have to download the latest version and try (I was using Octave 3.2.4).

answered 22 Jul '12, 06:13 itooam

Well, if you are not big on piracy issues, you can always get a copy of Matlab from piratebay. (22 Jul '12, 18:40) trailblazer1019

Thank you so much trailblazer1019... this means so much to me. I haven't yet looked at the code to try and understand what is happening but hope to today. I spent some time trying to get this working in Octave, as I don't have Matlab. My fixes to make it compatible, in case they are of any use to anybody else:

1) "sumsqr" needed replacing with "sumsq" (the Octave equivalent, in the context it was used).

2) The minFunc library (written by Mark Schmidt) caused an error in the polyinterp.m file at the line:

    % Find interpolating polynomial
    [params,~] = linsolve(A,b);

I downloaded the latest version, thinking it was "linsolve" causing the issue. I couldn't find where this function was, as it doesn't appear in Octave help, so I assumed it was part of his library (I thought I had a file missing)... http://www.di.ens.fr/~mschmidt/Software/minFunc_2012.zip

Anyway, that still didn't work. I finally found that Octave doesn't seem to like the comma tilde in [params,~]. I assume the comma tilde is a way to use dummy variables if output variables have to be declared for a correct function call (maybe someone can confirm)? Anyway, replacing it with:

    % Find interpolating polynomial
    params = linsolve(A,b);

seemed to do the trick. Then your program worked in Octave :D

answered 22 Jul '12, 06:08 itooam

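
On the tilde question: in MATLAB, `~` in an output list is a placeholder that discards that output. A minimal sketch of two spellings that avoid it is below; the matrix `A`, the vector `b`, the name `unused` and the toy `theta` are made up for illustration, and the last line shows a regularization sum that needs neither `sumsq` nor `sumsqr`. Whether a given Octave release accepts `~` is not something this sketch checks.

```matlab
% Toy linear system (assumed values): 2x + y = 3, x + 3y = 5.
A = [2 1; 1 3];
b = [3; 5];

% Option 1: request only the first output, as in the fix above.
params1 = linsolve(A, b);

% Option 2: keep the two-output call but bind the second output
% to a throwaway variable instead of "~".
[params2, unused] = linsolve(A, b);

% Portable replacement for sumsq / sumsqr on a parameter matrix.
theta = [0.1 -0.2; 0.0 0.4];
reg = sum(theta(:).^2);

disp(params1');
disp(reg);
```
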
https://www.dropbox.com/sh/p347fohdoby0zuv/hXTyhyipKc

This should get you all the files. It's been a while since I looked at it, so it might contain some of the code I have written. Anyway, it should have all the default code that you get from UFLDL. Let me know if you need any help!

answered 21 Jul '12, 21:53 trailblazer1019

...and if so... please could you post here?

answered 21 Jul '12, 17:32 itooam

Thanks Ale... don't suppose you still have the calling code/files, do you?

answered 21 Jul '12, 17:30 itooam

Ale or trailblazer1019, if you are still around, please could you post your vectorised/full version of softmax with an example of usage...? Last week the UFLDL website "disappeared"(?) before I got the chance to implement it. Would really, really appreciate it... I would use the code above, but I don't know what the "groundTruth matrix" is. So I assume I need the calling code also?

answered 20 Jul '12, 09:32 itooam

groundTruth is a re-arrangement of labels for data instances:

    groundTruth = full(sparse(labels, 1:numCases, 1));

And yes, you would need the rest of the code, since this part only computes the cost and gradient. (21 Jul '12, 15:58) Ale

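
For anyone else wondering what that line produces, here is a small sketch with made-up labels (the values and `numClasses = 3` are assumptions for illustration). Each column is one example, with a 1 in the row of its class and 0 elsewhere.

```matlab
labels = [1 3 2 3];          % made-up class labels for 4 examples
numCases = numel(labels);
numClasses = 3;

% Put a 1 at (labels(i), i) for every example i, then densify.
groundTruth = full(sparse(labels, 1:numCases, 1));
disp(groundTruth);

% sparse() infers the row count from the largest label present, so if
% the highest class never occurs the matrix comes out too short.
% Passing the size explicitly avoids that.
groundTruthFixed = full(sparse(labels, 1:numCases, 1, numClasses, numCases));
disp(size(groundTruthFixed));
```

With these labels the result is a 3 x 4 matrix whose columns are the one-hot encodings of classes 1, 3, 2 and 3.
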
Well, I think it should be OK, since it's based on the UFLDL tutorials, which are basically for self-learning. By the way, here is my whole code sample.

    %% ---------- YOUR CODE HERE --------------------------------------
    % Instructions: Compute the cost and gradient for softmax regression.
    % You need to compute thetagrad and cost.
    % The groundTruth matrix might come in handy.
    % sizes (debug mode)---- data 8 100, theta 10 8,z and hx 10 100
    m=size(data,2);
    z=theta*data;
    z = bsxfun(@minus, z, max(z, [], 1));
    hx=exp(z);
    hx = bsxfun(@rdivide, hx, sum(hx));
    y=groundTruth;
    const=(lambda*sum((sum(theta.^2)-theta(1).^2)))/2;
    %cost= -sum(sum(((1-y)*log(1-hx)' + y*log(hx)')/m)) +const
    for i=1:m
        cost=cost +(1-y(:,i))'*log(1-hx(:,i)) +y(:,i)'*log(hx(:,i));
    end
    cost=-cost/m + const;
    thetagrad=((hx-y)*data')/m+lambda*theta;
    % ------------------------------------------------------------------
    % Unroll the gradient matrices into a vector for minFunc
    grad = [thetagrad(:)];
    end

The code above shows my implementation. I tried doing it the vectorized way (you can see that line is commented out), but the cost doesn't come out the same. Below are my gradient results. They are off, so there is some tiny thing that I am not catching or haven't understood well. Any help would be really appreciated.

    -0.0287   -0.0259
     0.0334    0.0301
     0.0675    0.0607
    -0.0141   -0.0127
     0.0499    0.0449
    -0.0018   -0.0017
     0.0117    0.0105
    -0.0443   -0.0399
     0.0186    0.0167
    -0.0920   -0.0827
    -0.0171   -0.0154
     0.0191    0.0171
    -0.0062   -0.0055
    -0.0076   -0.0068
    -0.0228   -0.0206
     0.0106    0.0095
     0.0417    0.0376
     0.0113    0.0101
    -0.0151   -0.0136
    -0.0139   -0.0126
     0.0297    0.0267
    -0.0241   -0.0216
     0.0077    0.0068
    -0.0474   -0.0426
    -0.0797   -0.0717
    -0.0006   -0.0006
     0.0321    0.0289
     0.0083    0.0074
     0.0071    0.0064
     0.0670    0.0603
    -0.0478   -0.0430
    -0.0481   -0.0432
    -0.0196   -0.0176
     0.0351    0.0317

answered 30 Mar '12, 11:19 trailblazer1019

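
A mismatch like this usually means the cost and the gradient describe two different functions. The sketch below is a rough, self-contained gradient check on made-up data (all values, and the names `costFn`, `costBad`, `gradFn` and `probs`, are assumptions for illustration, not the exercise's own checker). It compares the gradient `(hx-y)*data'/m + lambda*theta` against central differences of two costs: the softmax cost from the other answers, and the same cost with the extra logistic-style `(1-y).*log(1-hx)` term added. The max-subtraction step is skipped because the toy values are small.

```matlab
% Toy problem (assumed values): 3 classes, 2 features, 4 examples.
data   = [0.5 -1.0 2.0 0.0; 1.0 0.3 -0.7 1.5];
labels = [1 3 2 3];
theta  = [0.1 -0.2; 0.0 0.4; -0.3 0.2];
lambda = 1e-4;
m = size(data, 2);
y = full(sparse(labels, 1:m, 1, 3, m));

% Softmax probabilities, softmax cost, and its analytic gradient.
probs  = @(z) bsxfun(@rdivide, exp(z), sum(exp(z), 1));
costFn = @(t) -sum(sum(y .* log(probs(t * data)))) / m ...
              + (lambda/2) * sum(t(:).^2);
gradFn = @(t) -((y - probs(t * data)) * data') / m + lambda * t;

% Same cost plus the logistic-style (1-y) term from the question.
costBad = @(t) costFn(t) ...
               - sum(sum((1 - y) .* log(1 - probs(t * data)))) / m;

% Central-difference gradients of both costs.
epsilon    = 1e-4;
numGrad    = zeros(size(theta));
numGradBad = zeros(size(theta));
for j = 1:numel(theta)
  e = zeros(size(theta));
  e(j) = epsilon;
  numGrad(j)    = (costFn(theta + e)  - costFn(theta - e))  / (2 * epsilon);
  numGradBad(j) = (costBad(theta + e) - costBad(theta - e)) / (2 * epsilon);
end

grad = gradFn(theta);
relDiff    = norm(numGrad(:) - grad(:))    / norm(numGrad(:) + grad(:));
relDiffBad = norm(numGradBad(:) - grad(:)) / norm(numGradBad(:) + grad(:));
fprintf('softmax cost vs gradient:        %g\n', relDiff);
fprintf('cost with (1-y) term vs gradient: %g\n', relDiffBad);
```

The first number should be tiny and the second should not be, which is consistent with the gradient being fine and the cost being the part that needed fixing.
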
I've implemented the fully vectorized version (without any loops) following the guidelines on UFLDL but, as I'm new to this forum, I don't know if it's OK to publish the code. Let me know. BTW, the last line of your code shouldn't be there: you need to apply regularization to all parameters, since the form used is overparameterized. Hope it helps.

answered 30 Mar '12, 10:14 Ale

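
To illustrate what "overparameterized" means here, the sketch below subtracts the same arbitrary vector from every row of theta and checks that the predicted probabilities do not change. The data, theta and the shift `psi` are made-up values for illustration only.

```matlab
% Toy values (assumed): 3 classes, 2 features, 4 examples.
data  = [0.5 -1.0 2.0 0.0; 1.0 0.3 -0.7 1.5];
theta = [0.1 -0.2; 0.0 0.4; -0.3 0.2];
psi   = [0.7 -1.2];                          % arbitrary shift

% Subtract psi from every row of theta.
thetaShifted = bsxfun(@minus, theta, psi);

% Softmax probabilities, one column per example.
probs = @(z) bsxfun(@rdivide, exp(z), sum(exp(z), 1));
h1 = probs(theta * data);
h2 = probs(thetaShifted * data);

% Every class score in a column moves by the same amount, so the
% normalized probabilities are unchanged.
disp(max(abs(h1(:) - h2(:))));
```

Since many different theta give identical predictions, there is no single row (such as a bias term in logistic regression) that plays a special role and should be left out of the weight decay.
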