
I am working through the UFLDL tutorial on softmax regression. Because of a bug in my implementation, my gradients do not match in the gradient check. The error is probably in how I used one of the heavily vectorized equations, but I am having a hard time tracking it down. If anyone has implemented this, any help would be greatly appreciated.

I am getting a cost of around 0.0323, which seems reasonable to me.

const=(lambda*sum((sum(theta.^2)-theta(1).^2)))/2;
for i=1:m
  cost=cost +(1-y(:,i))'*log(1-hx(:,i)) +y(:,i)'*log(hx(:,i)) ;
end

cost=-cost/m + const;

thetagrad=((hx-y)*data')/m+lambda*theta;
thetagrad(1)=thetagrad(1)-lambda*theta(1);

hx is the hypothesis

asked 30 Mar '12, 01:53

trailblazer1019

edited 30 Mar '12, 01:53

AFAIK the gradient is OK; the problem seems to be in the cost computation. My version is as follows:

cost = - mean(sum(y .* log(hx))) + (lambda/2.0) * sum(sumsq(theta));

answered 30 Mar '12, 11:58

Ale

That fixes my code :) ... but can you elaborate on the mean part? I am not really getting the gist of how it matches up with the equation given in the text.
(30 Mar '12, 12:15) trailblazer1019
I've just used mean as a shorthand for (1/m)*sum; feel free to change it if you prefer, for clarity.
(30 Mar '12, 12:53) Ale
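
To make the mean shorthand concrete, here is a small sketch on made-up numbers (the y and hx values below are assumptions for illustration, not data from the exercise). sum(y .* log(hx)) collapses each column to the log-probability of the true class for that example, so taking the mean of that row is the same as (1/m)*sum over the examples:

```octave
% Toy values (hypothetical): 3 classes, m = 4 examples, one column per example.
y  = [1 0 0 1; 0 1 0 0; 0 0 1 0];                          % one-hot labels
hx = [0.7 0.2 0.1 0.6; 0.2 0.5 0.3 0.3; 0.1 0.3 0.6 0.1];  % each column sums to 1
m  = size(y, 2);

perExample = sum(y .* log(hx));        % 1-by-m: log-probability of the true class
costMean   = -mean(perExample);        % the mean form
costSum    = -(1/m) * sum(perExample); % the explicit (1/m)*sum form

disp(abs(costMean - costSum));         % difference is at round-off level
```

The regularization term is left out here because it is the same in both forms.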

Here is my code, but it doesn't work very well. The gradient-checking result is low enough (3.7769e-10), but when I use it to train with minFunc, it stops with "Function Value changing by less than TolX" within 20 iterations. Could you help me?

numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));  
M = theta*data;     
M = bsxfun(@minus, M, max(M, [], 1));
h = exp(M);
h =  bsxfun(@rdivide, h, sum(h));
cost = -1/numCases*sum(sum(groundTruth.*log(h)))+lambda/2*sum(sum(theta.^2));
thetagrad = -1/numCases*((groundTruth-h)*data')+lambda*theta;%log(h)
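
For reference, a gradient check along the lines mentioned above can be sketched as follows. This is a minimal, self-contained sketch on a tiny made-up problem: the toy data, theta, labels, lambda, and epsilon values are assumptions for illustration, not taken from the exercise. It compares the analytic gradient from the formulas above with a central-difference estimate:

```octave
% Hypothetical toy problem: 3 classes, 4 features, 5 examples.
data   = reshape(sin(1:20), 4, 5);
theta  = 0.1 * reshape(cos(1:12), 3, 4);
labels = [1; 2; 3; 1; 2];
lambda = 1e-4;
numCases = size(data, 2);
groundTruth = full(sparse(labels, 1:numCases, 1));

% Softmax probabilities and cost as functions of theta (same formulas as above).
probs  = @(t) bsxfun(@rdivide, exp(t * data), sum(exp(t * data)));
costFn = @(t) -1/numCases * sum(sum(groundTruth .* log(probs(t)))) ...
              + lambda/2 * sum(sum(t.^2));

% Analytic gradient.
h = probs(theta);
thetagrad = -1/numCases * ((groundTruth - h) * data') + lambda * theta;

% Central-difference estimate, perturbing one parameter at a time.
epsilon = 1e-4;
numgrad = zeros(size(theta));
for idx = 1:numel(theta)
  step = zeros(size(theta));
  step(idx) = epsilon;
  numgrad(idx) = (costFn(theta + step) - costFn(theta - step)) / (2 * epsilon);
end

% Relative difference between the two gradients; should be very small.
relDiff = norm(numgrad(:) - thetagrad(:)) / norm(numgrad(:) + thetagrad(:));
disp(relDiff);
```

A small relative difference only says that the cost and the gradient agree with each other; it does not by itself explain an early stop in the optimizer.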

answered 11 Mar, 09:46

darkscope

I think some people might be getting confused where the tutorial starts talking about softmax reducing to logistic regression (the 1-y terms). The pure loop form should look like this:

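  m = size(data, 2);
  numClasses = size(theta, 1);
  % denom(i) is the normalizer: sum over classes of exp(theta_k' * x_i)
  denom = zeros(1,m);
  for i = 1:m
    for k = 1:numClasses
      denom(i) = denom(i) + exp(theta(k,:) * data(:,i));
    end
  end

  % accumulate the log-probability of the true class of each example
  cost = 0;
  for i = 1:m
    for k = 1:numClasses
      if (labels(i) == k)
          cost = cost + log( exp(theta(k,:) * data(:,i)) / denom(i) );
      end
    end
  end
  cost = -cost / m;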

Or using vectorized form:

y = full(sparse(labels, 1:m, 1));
z = theta * data;
z  = bsxfun(@minus, z, max(z, [], 1));
h = exp(z);
h = bsxfun(@rdivide, h, sum(h));
cost = -1/m * sum(sum(y .* log(h))) + lambda/2 * sum(theta(:).^2);
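
The max subtraction in the vectorized form does not change the probabilities; it only keeps exp() from overflowing. A small sketch with made-up scores (the z values are an assumption chosen to trigger the overflow):

```octave
% Hypothetical scores: 3 classes, 2 examples. Column 1 overflows exp().
z = [1000 1; 1001 2; 1002 3];

% Naive softmax: exp(1000) is Inf, so column 1 becomes Inf/Inf = NaN.
hNaive = bsxfun(@rdivide, exp(z), sum(exp(z)));

% Shifted softmax: the largest entry in each column becomes 0 before exp().
zShift  = bsxfun(@minus, z, max(z, [], 1));
hStable = bsxfun(@rdivide, exp(zShift), sum(exp(zShift)));

disp(any(isnan(hNaive(:))));   % the naive version contains NaN
disp(hStable);                 % both columns are valid probabilities
```

Both columns of z differ only by a constant offset, so after the shift they give the same probabilities.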

answered 04 Sep '12, 21:55

Charles Beyer

...actually, I wonder if a later version of Octave supports [param1,~] and/or sumsqr (for compatibility). I'll have to download the latest version and try (I was using Octave 3.2.4).

answered 22 Jul '12, 06:13

itooam

well if you are not big on piracy issues you can always get a copy of matlab from piratebay.
(22 Jul '12, 18:40) trailblazer1019

Thank you so much trailblazer1019... this means so much to me.

I haven't yet looked at the code to try to understand what is happening, but I hope to today. I spent some time getting it working in Octave, as I don't have Matlab.

My fixes to make it compatible, in case they are of any use to anybody else:

1) "sumsqr" needed replacing with "sumsq" (Octave equivalent - in the context it was used).

2) The minFunc library (written by Mark Schmidt) caused an error in the polyinterp.m file at the line:

% Find interpolating polynomial
[params,~] = linsolve(A,b);

I downloaded the latest version, thinking "linsolve" was causing the issue. I couldn't find where this function was defined, as it doesn't appear in the Octave help, so I assumed it was part of his library (I thought I had a file missing): http://www.di.ens.fr/~mschmidt/Software/minFunc_2012.zip

That still didn't work. I finally found that Octave doesn't seem to like the comma-tilde in [params,~]. I assume the tilde is a way to supply a dummy variable when output variables have to be declared for a correct function call (maybe someone can confirm)? Anyway, replacing it with:

% Find interpolating polynomial
params = linsolve(A,b);

seemed to do the trick.
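
On the tilde question: in an output list, ~ discards that output without binding it to a name. Here is a small sketch using max on a made-up vector (the variable names are hypothetical). The tilde line assumes an interpreter that accepts ~ in output lists; the first form works everywhere:

```octave
v = [3 9 4];

% Portable form: bind the unwanted first output to a throwaway variable.
[unused, idxPortable] = max(v);

% Tilde form: discard the first output without naming it.
% Assumption: requires an interpreter that supports "~" in output lists.
[~, idxTilde] = max(v);

disp([idxPortable, idxTilde]);
```

When the unwanted output is the last one, as in [params,~] = linsolve(A,b), it can simply be left off, which is why params = linsolve(A,b) works as a replacement.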

Then your program worked in Octave :D

answered 22 Jul '12, 06:08

itooam

https://www.dropbox.com/sh/p347fohdoby0zuv/hXTyhyipKc

This should get you all the files. It's been a while since I looked at it, so it might contain some of the code I wrote. Anyway, it should have all the default code that you get from UFLDL.

Let me know if you need any help!

answered 21 Jul '12, 21:53

trailblazer1019

...and if so... please could you post here?

answered 21 Jul '12, 17:32

itooam

Thanks Ale... don't suppose you still have the calling code/files do you?

answered 21 Jul '12, 17:30

itooam

Ale or trailblazer1019, if you are still around, could you please post your vectorised/full version of softmax with an example of usage? Last week the UFLDL website "disappeared"(?) before I got the chance to implement it. I would really appreciate it...

I would use code above but I don't know what the "groundTruth matrix" is? So I assume I need the calling code also?

answered 20 Jul '12, 09:32

itooam

groundTruth is a rearrangement of the labels for the data instances: groundTruth = full(sparse(labels, 1:numCases, 1)); And yes, you would need the rest of the code, since this part only computes the cost and gradient.
(21 Jul '12, 15:58) Ale
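
To show what that rearrangement produces, here is a tiny sketch with made-up labels (the label values are an assumption for illustration). Each column of groundTruth corresponds to one example and has a single 1 in the row of its class:

```octave
labels   = [2; 1; 3; 2];     % hypothetical labels: 4 examples, 3 classes
numCases = numel(labels);

% Entry (labels(i), i) is set to 1; everything else stays 0.
groundTruth = full(sparse(labels, 1:numCases, 1));

disp(groundTruth);
```

The number of rows comes from the largest label present, so if the highest class never appears in labels the matrix has fewer rows than there are classes. Passing the size explicitly, as in sparse(labels, 1:numCases, 1, numClasses, numCases), avoids that.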

Well, I think it should be OK, since it's based on the UFLDL tutorials, which are basically for self-learning.

By the way, here is my whole code sample.

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

% sizes (debug mode)---- data 8 100, theta 10 8,z and hx 10 100
m=size(data,2);
z=theta*data;
z = bsxfun(@minus, z, max(z, [], 1));
hx=exp(z);
hx = bsxfun(@rdivide, hx, sum(hx));
y=groundTruth;
const=(lambda*sum((sum(theta.^2)-theta(1).^2)))/2;

%cost=  -sum(sum(((1-y)*log(1-hx)' + y*log(hx)')/m)) +const

for i=1:m
 cost=cost +(1-y(:,i))'*log(1-hx(:,i)) +y(:,i)'*log(hx(:,i)) ;
end
cost=-cost/m + const;

thetagrad=((hx-y)*data')/m+lambda*theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end

The code above shows my implementation. I tried the vectorized way (you can see that line is commented out), but the cost doesn't come out the same. Below are my gradient results; they are off, so there is some small thing that I am not catching or haven't understood well. Any help would be really appreciated.

   -0.0287   -0.0259
    0.0334    0.0301
    0.0675    0.0607
   -0.0141   -0.0127
    0.0499    0.0449
   -0.0018   -0.0017
    0.0117    0.0105
   -0.0443   -0.0399
    0.0186    0.0167
   -0.0920   -0.0827
   -0.0171   -0.0154
    0.0191    0.0171
   -0.0062   -0.0055
   -0.0076   -0.0068
   -0.0228   -0.0206
    0.0106    0.0095
    0.0417    0.0376
    0.0113    0.0101
   -0.0151   -0.0136
   -0.0139   -0.0126
    0.0297    0.0267
   -0.0241   -0.0216
    0.0077    0.0068
   -0.0474   -0.0426
   -0.0797   -0.0717
   -0.0006   -0.0006
    0.0321    0.0289
    0.0083    0.0074
    0.0071    0.0064
    0.0670    0.0603
   -0.0478   -0.0430
   -0.0481   -0.0432
   -0.0196   -0.0176
    0.0351    0.0317

answered 30 Mar '12, 11:19

trailblazer1019

edited 30 Mar '12, 11:20

I've implemented the fully vectorized version (without any loops) following the guidelines in UFLDL but, as I am new to this forum, I don't know if it's OK to publish the code; let me know.

BTW, the last line of your code shouldn't be there: you need to apply regularization to all parameters, since the form used is overparameterized.
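
The overparameterization can be seen numerically: subtracting the same vector from every row of theta leaves the softmax probabilities unchanged, so the data term alone does not pin down a unique theta. A minimal sketch, with theta, data, and psi made up for illustration:

```octave
theta = [0.5 -1.0; 0.2 0.3; -0.7 0.8];   % hypothetical: 3 classes, 2 features
data  = [1.0 2.0 -1.0; 0.5 -0.5 1.5];    % 2 features, 3 examples
psi   = [0.4 -0.9];                      % arbitrary shift applied to every row

% Softmax probabilities for a given parameter matrix t.
softmaxProbs = @(t) bsxfun(@rdivide, exp(t * data), sum(exp(t * data)));

h1 = softmaxProbs(theta);
h2 = softmaxProbs(bsxfun(@minus, theta, psi));

disp(max(abs(h1(:) - h2(:))));           % round-off level
```

Since no single parameter plays a special role here, the weight decay term is applied to every entry of theta.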

Hope it helps

answered 30 Mar '12, 10:14

Ale
