Who else thinks the "Problems with kNN" lecture (36) was voodoo? I didn't understand a single damn thing. It gets frustrating at times: these people are great at what they do, but they either can't or don't bother to explain things clearly. Hands up if you're getting frustrated as hell :)

asked 31 Oct '11, 17:10 by calculemus1988 · retagged 31 Oct '11, 18:53 by rhasarub

I admit they went through this rather quickly, so I had to rewatch it a couple of times. The main point is the problem with high-dimensional data. To get a better feel for what's going on, google "curse of dimensionality" and check out the top few hits, e.g. this Quora post.

The specific problem with kNN is basically this: suppose you place points randomly on the interval [0,1]. The average distance between them will be 1/3. On a 1 × 1 square (n = 2), it will be more like 0.52. There's no simple closed form for an arbitrary number of dimensions n (see here for details), but you can intuitively see that this number grows with dimensionality. Thus, if points are distributed randomly, in high dimensions everything is far from everything else.
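
If you want to check those numbers yourself, here's a quick Monte Carlo sketch in plain Python (the function name and sample sizes are just my choices, not anything from the lecture):

```python
import math
import random

def avg_pairwise_distance(dim, n_points=200, seed=0):
    """Estimate the mean Euclidean distance between points drawn
    uniformly at random from the unit hypercube [0,1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    pairs = 0
    for i in range(n_points):
        for j in range(i + 1, n_points):
            total += math.dist(pts[i], pts[j])
            pairs += 1
    return total / pairs

for dim in (1, 2, 10, 100):
    print(dim, round(avg_pairwise_distance(dim), 3))
# Roughly: 1 -> 0.33, 2 -> 0.52, 10 -> 1.27, 100 -> 4.08.
# For large dim the mean grows like sqrt(dim / 6), since each
# coordinate contributes E[(x - y)^2] = 1/6 to the squared distance.
```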

Now, suppose you have 100 features, of which 80 are meaningful and 20 are random. The fact that distances along the random dimensions will be, on average, quite large makes it hard to form robust clusters that discriminate the data sensibly. That's my take on it.
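
A toy version of that effect (my own sketch, not from the lecture, with the proportions exaggerated to one informative feature plus a variable number of random ones): as pure-noise dimensions are added, the single nearest neighbor agrees with the query point's class less and less often.

```python
import math
import random

def nn_label_agreement(n_noise, n_points=100, trials=200, seed=1):
    """Fraction of trials in which a query point's single nearest
    neighbour has the same label, given 1 informative dimension
    plus n_noise purely random dimensions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = []
        for _ in range(n_points):
            label = rng.randint(0, 1)
            # informative coordinate: class 0 near 0.0, class 1 near 1.0
            point = [label + rng.gauss(0.0, 0.1)]
            point += [rng.random() for _ in range(n_noise)]  # pure noise
            data.append((point, label))
        (q_pt, q_lab), rest = data[0], data[1:]
        nn_label = min(rest, key=lambda pl: math.dist(q_pt, pl[0]))[1]
        hits += (nn_label == q_lab)
    return hits / trials

for n_noise in (0, 5, 20, 80):
    print(n_noise, nn_label_agreement(n_noise))
# Agreement starts near 1.0 with no noise and drifts toward 0.5
# (chance) as the random dimensions come to dominate the distance.
```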

answered 31 Oct '11, 19:27 by dnquark · edited 31 Oct '11, 23:36

kNN was pretty straightforward.

If you only have two classes: connect to the nearest neighbor of the point you're trying to classify, and find that neighbor's class.

Keep connecting to the next nearest neighbor and finding its class, until (k + 1)/2 of the neighbors you've found share the same class.

When you've found (k+1)/2 neighbors of the same class, assign that class to the point you're trying to classify. You're done.

So for two classes, k=7 is just like the World Series (American baseball championship): while it's nominally a seven game series, once one team (class) wins four games, the remaining games don't matter and aren't played. For k=7, (k+1)/2 = (7+1)/2 = 4.

k = 5 is like a Division Series (best of five); if one team wins three, any remaining games are not played. For k = 5, (k + 1)/2 = 3.

For k = 3, (k + 1)/2 = 2. If your first two nearest neighbors are in the same class, you're done. Don't even bother looking for the third neighbor.

Conversely, if you can tell that three points are your nearest neighbors, but you can't tell which of the three is closest or farthest, no problem: just find out which class is in the majority among those three. The same goes for higher k, too.
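
Here's that early-exit vote as a short plain-Python sketch (my own illustration, not code from the course; the function and variable names are made up):

```python
import math

def knn_two_class(query, data, k=7):
    """Two-class kNN with the 'World Series' early exit: walk the
    neighbours from nearest to farthest and stop as soon as one
    class reaches a majority of (k + 1) // 2 votes.
    `data` is a list of (point, label) pairs with labels 0 or 1."""
    majority = (k + 1) // 2
    neighbours = sorted(data, key=lambda pl: math.dist(query, pl[0]))
    votes = [0, 0]
    for _, label in neighbours[:k]:
        votes[label] += 1
        if votes[label] >= majority:  # remaining "games" don't matter
            return label
    # Only reachable if there are fewer than k data points.
    return 0 if votes[0] >= votes[1] else 1

# k = 3: the two nearest points are class 1, so we stop after two votes.
data = [((0.0, 0.0), 1), ((0.1, 0.1), 1), ((5.0, 5.0), 0), ((6.0, 6.0), 0)]
print(knn_two_class((0.05, 0.05), data, k=3))  # -> 1
```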

answered 31 Oct '11, 17:38 by tpdi · edited 31 Oct '11, 20:33

I don't understand the number of upvotes here... this answer has nothing to do with the question (it's about the "Problems with kNN" lecture, not kNN in general).
(31 Oct '11, 21:11) KillianDS

I don't know what you're talking about. kNN classifiers are about the simplest AI subject of all, and I find no fault with the way it was handled in the course.

answered 31 Oct '11, 18:30 by xperroni ♦

He's talking about the problems of kNN (most importantly, dimensionality), which were covered rather quickly for people not at home in advanced linear algebra or abstract geometry.
(31 Oct '11, 21:06) KillianDS

There are videos that teach us things, and there are other, supplementary videos that show practical examples or provide additional info. Don't worry if you can't picture a 100-dimensional space well. ;-)

answered 31 Oct '11, 17:38 by dlask ♦

When he started talking about many, many dimensions, I took it as a hypothetical problem (or network) with many, many parameters. I'm not sure that's accurate, but it made sense in my head at the time.

answered 31 Oct '11, 21:48 by LostInMoneta

calculemus1988, did you see this thread? http://www.aiqus.com/questions/7405/knn-homework-questions-difficult-without-coordinates-any-way-to-get-some-kind-of-distance-indication

It might help you understand the material better and how to answer the homework question. The tip about printing the graph and drawing areas around the data points, together with the image posted by dougfinn, helped me solve it.

answered 01 Nov '11, 00:15 by Veronique
