Members don't see the ad below. Register now!

# Who else thinks Problems with kNN lecture 36 was voodoo?

 0 Who else thinks Problems with kNN lecture 36 was voodoo? I did not understand a single damn thing, it gets frustrating at times, these people are great at what they do, but they either can't or don't bother to be clear and explain properly. Hands up if you are getting frustrated as hell :) asked 31 Oct '11, 17:10 calculemus1988 2.1k●1●8●41 rhasarub 4.0k●5●19

 3 I admit, they went through this rather quickly, so I had to rewatch a couple of times. The main point here is the problem with high-dimensional data. To get a better feel for what's going on, google the curse of dimensionality and check out the top few hits, e.g. this quora post. The specific problem with kNN is basically this: suppose you put points randomly on an interval [0,1]. The average distance between them will be 1/3. On a 1 x 1 square (n=2), it will be more like 0.52. There's no simple solution for arbitrary number of dimensions n (see here for details), but you can sort of intuitively see that this number will grow with dimensionality. Thus, if points are distributed randomly, in a high dimension, everything is far apart. Now, suppose you have 100 features and of these, 80 are important, and 20 are random. The fact that the distances along random dimensions will be, on average, quite large, makes it hard to create robust clusters that represent sensible discrimination of the data. That's my take on this. answered 31 Oct '11, 19:27 dnquark 495●8
Members don't see the ad. Register now!
 1 kNN was pretty straightforward. If you only have two classes: Connect to the nearest neighbor of the point you're trying to classify, and find the neighbor's class. Keep connecting to the next nearest neighbor, find its class, until you find (k+1)/2 neighbors of the same class. When you've found (k+1)/2 neighbors of the same class, assign that class to the point you're trying to classify. You're done. So for two classes, k=7 is just like the World Series (American baseball championship): while it's nominally a seven game series, once one team (class) wins four games, the remaining games don't matter and aren't played. For k=7, (k+1)/2 = (7+1)/2 = 4. k = 5 is like a Division Series (best of 5); if one team wins three, any remaining games are not played: For k = 5. (k + 1)/2 = 3. For k = 3, (k + 1)/2 = 2. If your first two nearest neighbors are in the same class, you're done. Don't even bother looking for the third neighbor. Conversely, if you can tell that three points are your nearest neighbors, but you can't tell which of the three is closest or farthest, no problem. Just find out which class in in the majority among those three. Similarly for higher ks, too. answered 31 Oct '11, 17:38 tpdi 1.2k●7 I don't understand the number of upvotes here ... This answer has even nothing to do with the question (it's about the "problems with kNN" lecture, not kNN in general. (31 Oct '11, 21:11) KillianDS
 1 I don't know what you're talking about. kNN classifiers are about the simplest AI subject of all, and I find no fault with the way it was handled in the course. answered 31 Oct '11, 18:30 xperroni ♦ 1.4k●2●11●23 He is talking about problems of kNN (most importantly dimensionality) which was mentioned rather quick for people not at home in advanced linear algebra/abstract geometrics. (31 Oct '11, 21:06) KillianDS
 0 There are videos that teach us things. There are other - supplementary - videos that show practical examples or provide additional info. Don't worry if you cannot imagine a 100-dimensional space well. ;-) answered 31 Oct '11, 17:38 dlask ♦ 9.7k●2●18
 0 When he started talking about many many dimensions I took it as a hypothetical problem (or network) with many many parameters. Not sure if that is accurate but it made sense in my head at the time. answered 31 Oct '11, 21:48 LostInMoneta 924●12
 0 calculemus1988, did you see this thread? http://www.aiqus.com/questions/7405/knn-homework-questions-difficult-without-coordinates-any-way-to-get-some-kind-of-distance-indication It might help you understand the material better and how to answer the homework question. The tip of printing the graph and drawing areas around the data points and the image posted by dougfinn helped me solve it. answered 01 Nov '11, 00:15 Veronique 118●7
 toggle preview community wiki