|
Who else thinks lecture 36, "Problems with kNN", was voodoo? I did not understand a single damn thing. It gets frustrating at times: these people are great at what they do, but they either can't or don't bother to explain things clearly. Hands up if you are getting frustrated as hell :) |
|
I admit, they went through this rather quickly, so I had to rewatch it a couple of times. The main point here is the problem with high-dimensional data. To get a better feel for what's going on, google the curse of dimensionality and check out the top few hits, e.g. this quora post. The specific problem with kNN is basically this: suppose you pick two points uniformly at random on the interval [0,1]. The average distance between them will be 1/3. On a 1 x 1 square (n=2), it will be about 0.52. There's no simple closed-form solution for an arbitrary number of dimensions n (see here for details), but you can see intuitively that this number grows with dimensionality, since every extra coordinate adds a non-negative term to the squared distance. Thus, if points are distributed randomly in a high-dimensional space, everything is far apart. Now, suppose you have 100 features, of which 80 are important and 20 are random. The distances along the random dimensions will be, on average, quite large, so they add noise to every distance kNN computes, which makes it hard to get robust neighborhoods that discriminate sensibly between classes. That's my take on this. |
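If you want to check those numbers yourself, here is a quick Monte Carlo sketch. It is not from the lecture; it assumes NumPy, points drawn uniformly from the unit hypercube, and plain Euclidean distance. The function name `mean_pairwise_distance` and its parameters `n_pairs` and `seed` are made up for this example.

```python
import numpy as np


def mean_pairwise_distance(dim, n_pairs=20_000, seed=0):
    """Estimate the average Euclidean distance between two points
    drawn uniformly at random from the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    # Draw n_pairs independent pairs of points, each with `dim` coordinates.
    a = rng.random((n_pairs, dim))
    b = rng.random((n_pairs, dim))
    # Distance of each pair, then the average over all pairs.
    return float(np.linalg.norm(a - b, axis=1).mean())


# Print the estimate for a few dimensionalities.
for dim in (1, 2, 10, 100):
    print(dim, round(mean_pairwise_distance(dim), 3))
```

The estimates for dim=1 and dim=2 should land close to 1/3 and 0.52, and the value keeps growing as you add dimensions, which is the "everything is far apart" effect.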
|
When he started talking about many, many dimensions, I took it to mean a hypothetical problem (or network) with many, many parameters. Not sure if that is accurate, but it made sense in my head at the time. |
|
calculemus1988, did you see this thread? http://www.aiqus.com/questions/7405/knn-homework-questions-difficult-without-coordinates-any-way-to-get-some-kind-of-distance-indication It might help you understand the material better and work out how to answer the homework question. The tip about printing the graph and drawing regions around the data points, together with the image posted by dougfinn, helped me solve it. |