
Your naive understanding is supported by at least one deep learning authority:

> I haven’t found a way to properly articulate this yet but somehow everything we do in deep learning is memorization (interpolation, pattern recognition, etc) instead of thinking (extrapolation, induction, etc). I haven’t seen a single compelling example of a neural network that I would say “thinks”, in a very abstract and hard-to-define feeling of what properties that would have and what that would look like.

> All the while I'm thinking: this thinking process this person goes through as he analyzes this data: THAT is what Machine Learning SHOULD do

-- Andrej Karpathy

Deep learning for image recognition works because our visual world is made up of structured hierarchical features: Dark/Light, Texture, Edge, Part of Object, Object, Scene. Deep learning layers create increasingly higher-level features in a computationally feasible way.



So a better name for "deep learning" would be "shallow understanding"?


I personally prefer 'generic hashing/parsing'; deep learning excels at the automatic creation of a mapping of unstructured information to structured information, after a sufficient period of training.


Hmm... but isn't that what our brains do as well? Unstructured intensities of light hitting our retinas become a structured, recognized object.


It definitely seems to be part of what our brain does. The visual cortex is an apt comparison since that's where a lot of the structural inspiration for modern ANNs comes from. But, there does seem to be a little more than that too; it's not clear whether all the brain does is reducible to a hash function (reducible in any useful sense, at least; a very very very big, very very very sparse hash function, perhaps).


Our brain can understand that a cartoon picture of a cat is a cat. Also, our brain can understand that a picture of a cat taken from a hugely different angle than seen before is a cat. Deep learning cannot do those kinds of tricks.

A related problem is "one-shot learning" [1].

[1] https://en.wikipedia.org/wiki/One-shot_learning


Maybe something like subtle memorization?


It's obfuscated memorization though. Otherwise, wouldn't YouTube search on my "smart" TV yield videos that contain the search term?

I searched for "face detection" and got "face recognition" videos. I felt like a linear model would have been more useful.


What if humans learn through memorisation and pattern recognition instead of thinking?


There's quite probably some of that. A quote from J.S. Mill on the distinction between science and technology strikes me as useful:

"One of the strongest reasons for drawing the line of separation clearly and broadly between science and art is the following:—That the principle of classification in science most conveniently follows the classification of causes, while arts must necessarily be classified according to the classification of the effects, the production of which is their appropriate end."

Essays on some unsettled Questions of Political Economy

http://www.gutenberg.org/ebooks/12004?msg=welcome_stranger#E...

Deep Learning is finding associated effects. It does not find the underlying causes. It is a mode of technical rather than scientific advance.


What are your thoughts on newer recurrent architectures like the DNC (or its predecessor, the neural Turing machine)? While the demonstrated results with DNCs so far are pretty limited, it seems that they embody a push towards allowing a neural network to actually "think" over multiple steps: storing complex information, formulating a plan, and acting on that plan.


Yes. I think these architectures are very exciting and a step in the "right" direction. Eventually we will want to move from rote memorization and pattern matching to more challenging aspects of intelligence.

https://arxiv.org/abs/1601.01705v4 (Learning to Compose Neural Networks for Question Answering) comes close to breaking this barrier.


As much as I dislike calling on the neural net / biological net metaphor, I do think that computer science has made some headway in how "useful codes", in the sense of semantically-meaningful interpolation, can be derived from natural scene stimuli, and therefore the onus that "we do something different" is to some extent now on the neuroscientists to think about and try to prove that "reasoning" in the human sense is anything other than an algebra of latent codes, i.e., linear or non-linear combinations of codified summaries of sensory input.


What do you mean by an "algebra of latent codes"?


I mean being able to combine latent codes through some form of algebra (e.g. linear combinations) and have it retain coherent semantics:

https://github.com/Newmu/dcgan_code/raw/master/images/faces_...
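To make the idea concrete, here is a minimal sketch of that kind of algebra on toy data. The 2-D codes and the meaning of each axis are invented for illustration; a real model such as the DCGAN linked above has a much larger latent space and a trained generator to decode the result, neither of which is included here. Averaging a few exemplar codes per concept before combining them is an assumption borrowed from how such face arithmetic is usually demonstrated.

```python
# Toy sketch of an "algebra of latent codes". All numbers are invented.
# Axis 0 loosely stands for "smile", axis 1 for "gender" (hypothetical axes).

def mean_code(codes):
    """Component-wise average of a list of equal-length latent codes."""
    n = len(codes)
    return [sum(c[i] for c in codes) / n for i in range(len(codes[0]))]

def combine(terms):
    """Linear combination of codes; terms is a list of (weight, code) pairs."""
    dim = len(terms[0][1])
    return [sum(w * code[i] for w, code in terms) for i in range(dim)]

# Three hypothetical exemplar codes per concept, averaged to reduce noise.
smiling_woman = mean_code([[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]])
neutral_woman = mean_code([[0.0, 1.0], [0.1, 0.9], [-0.1, 1.1]])
neutral_man = mean_code([[0.0, -1.0], [0.1, -1.1], [-0.1, -0.9]])

# "smiling woman" - "neutral woman" + "neutral man" should land near
# a code with a high smile value and the "man" end of the gender axis.
smiling_man = combine([(1, smiling_woman), (-1, neutral_woman), (1, neutral_man)])
print(smiling_man)
```

The claim about "coherent semantics" is that feeding a combined code like `smiling_man` to the generator yields a plausible image of the combined concept, rather than noise.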


Geoff Hinton refers to thought vectors performing reasoning by analogy using algebra [1] in his Royal Society Lecture.

The other widely reported vector algebra in a semantic space was found by Mikolov et al. when producing ~300-dimensional vectors for a billion-word Wikipedia corpus.

Using Mikolov's vectors [3], with ~= meaning "nearest by cosine distance", vector algebra gives:

  King - Man + Woman ~= Queen

  Paris - France + Germany ~= Berlin

Surprisingly, this works for other modalities too: Chintala, Radford & Metz found a latent semantic space in images, where adding a vector for glasses or a smile changes people's faces accordingly [4]. With a generative model, new images can be created, as outlined in this blog post by Soumith [5].
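For anyone who wants to see the mechanics of the word analogies, here is a rough sketch in plain Python. The 5-D vectors are hand-made so that the arithmetic works out; they are not real word2vec output, and the `analogy` helper is a hypothetical name, not part of any library. Excluding the three input words from the candidates is a common convention for this kind of lookup and is an assumption here.

```python
import math

# Hand-made 5-D toy embeddings (invented for illustration, not trained).
# Axes: royalty, gender, is-capital, French, German.
VECTORS = {
    "king":    [0.9,  0.8, 0.0, 0.0, 0.0],
    "queen":   [0.9, -0.8, 0.0, 0.0, 0.0],
    "man":     [0.1,  0.8, 0.0, 0.0, 0.0],
    "woman":   [0.1, -0.8, 0.0, 0.0, 0.0],
    "france":  [0.0,  0.0, 0.0, 1.0, 0.0],
    "paris":   [0.0,  0.0, 1.0, 1.0, 0.0],
    "germany": [0.0,  0.0, 0.0, 0.0, 1.0],
    "berlin":  [0.0,  0.0, 1.0, 0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def analogy(a, b, c, vectors):
    """Return the word nearest to b - a + c, excluding the three inputs."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("man", "king", "woman", VECTORS))       # queen
print(analogy("france", "paris", "germany", VECTORS)) # berlin
```

With real embeddings the nearest neighbour is only approximately the expected word, which is why the relation is written ~= rather than =.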

Karpathy shows that trained nets can be assembled like Lego across modalities: slice the classifier off an ImageNet-trained AlexNet to reveal the rich semantic 'thought vector' layer, plug in an RNN sentence generator using word2vec and (with some oversimplification...) you get a convincing image captioner [6].

The thought vectors are akin to high-level representations of the world and can cross modalities, e.g. text to images using thought vectors (from an HN discussion [7]).

So the thought vectors are in some way an AI mentalese, an encoding of a symbolic representation of the world derived from the data, and can (again, a drastic oversimplification) transfer across modalities and even between previously unlinked languages [8].

Also see Anything2Vec https://gab41.lab41.org/anything2vec-e99ec0dc186

[1] https://youtu.be/izrG86jycck?t=25m58s

[2] The paper Geoff Hinton is referring to: Sequence to Sequence Learning with Neural Networks by Ilya Sutskever, Oriol Vinyals, Quoc V. Le https://arxiv.org/abs/1409.3215

[3] Efficient Estimation of Word Representations in Vector Space by Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean https://arxiv.org/abs/1301.3781

[4] Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks Alec Radford, Luke Metz, Soumith Chintala https://arxiv.org/abs/1511.06434

[5] https://code.facebook.com/posts/1587249151575490/a-path-to-u...

[6] SF Machine Learning: Automated Image Captioning with ConvNets and Recurrent Nets by Karpathy https://youtu.be/ZkY7fAoaNcg?t=38m31s

[7] https://news.ycombinator.com/item?id=12366684

[8] https://github.com/Babylonpartners/fastText_multilingual



