Edward – A Turing-Complete Language for Deep Probabilistic Programming (arxiv.org)
206 points by xtacy on Oct 19, 2017 | 30 comments


Speaking of which, check out Michael I. Jordan's work on Probabilistic Graphical Models https://www.google.com/search?q=michael+i+jordan+probalistic...

He was a mentor to Andrew Ng (who went on to lead Google Brain and Baidu's AI group), among other things. https://en.wikipedia.org/wiki/Michael_I._Jordan

Saira Mian and David Blei worked with him on some interesting ML/AI work related to life span in nematodes a while back: Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span - Blei DM, Franks K, Jordan MI, Mian IS. - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1533868



The software library is located here: http://edwardlib.org/ . Notably, Edward is layered on TensorFlow.

Regarding the significance of the authors, David Blei first described latent Dirichlet allocation (LDA), an important algorithm for generative topic modeling, in ~2003. Interestingly, last I checked, LDA couldn't be done in Edward (yet).


I also briefly tried it out, drawn by the claim of Turing completeness, but I wasn't able to get inference working over any model with interesting control flow (e.g. loops). It seemed to have about the same expressive power as PyMC3, albeit running on TensorFlow, which seemed neat. It would be very cool to see something with the expressive power of, say, Church running on TF.


In complete sincerity, I think that speeding up Turing-complete probabilistic programming to the kinds of inference speed we can get in the gradient-descent training of deep neural networks would be a "change the world"-level advance for ML/AI.


We already have that: variational-inference-based algorithms like BBVI use gradient descent for training.
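For what it's worth, here's a minimal sketch of what "gradient descent for training" looks like in BBVI: a score-function (REINFORCE-style) gradient estimate of the ELBO for a toy conjugate-Gaussian model, in plain NumPy. The model, data, step size, and sample count are all illustrative choices of mine, not anything from Edward:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: mu ~ N(0, 1), x_i ~ N(mu, 1).
# The exact posterior is N(sum(x) / (n + 1), 1 / (n + 1)).
x = np.array([1.2, 0.8, 1.5, 1.1])
n = len(x)
post_mean = x.sum() / (n + 1)            # 0.92
post_sd = (1.0 / (n + 1)) ** 0.5         # ~0.447

def log_joint(mu):
    # log p(mu) + sum_i log p(x_i | mu), dropping constants
    return -0.5 * mu**2 - 0.5 * np.sum((x - mu) ** 2)

# Variational family q(mu) = N(m, exp(log_s)^2); BBVI score-function gradient:
#   grad_phi ELBO = E_q[ grad_phi log q(mu) * (log p(x, mu) - log q(mu)) ]
m, log_s = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    s = np.exp(log_s)
    mu = rng.normal(m, s, size=64)               # Monte Carlo samples from q
    log_q = -0.5 * ((mu - m) / s) ** 2 - log_s   # log q(mu), up to a constant
    f = np.array([log_joint(u) for u in mu]) - log_q
    g_m = (mu - m) / s**2                        # d log q / d m
    g_ls = ((mu - m) / s) ** 2 - 1.0             # d log q / d log_s
    m += lr * np.mean(g_m * f)
    log_s += lr * np.mean(g_ls * f)

print(m, np.exp(log_s))   # should wander close to (post_mean, post_sd)
```

The point of the sketch is that the only thing inference needs from the model is pointwise evaluation of `log_joint`, so the whole loop is ordinary stochastic gradient descent.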


Variational inference also only works for continuous probability models, so it can't be used for most interesting use-cases of probabilistic programming.


Why do you want Turing completeness in your probabilistic modelling language? This seems like a domain where you can specify a lot of useful work with bounded loops and other sub-TC tools.


The probability of ⊥ is 0, because sampling ⊥ would require returning from a diverging computation. That said, there's all kinds of interesting control flow we can describe in a program, knowing it will return a sample, without having any convenient way to prove to a termination checker that it will.
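A concrete toy example of that distinction, as a plain Python sketch (the function and parameters are my own): the sampler below terminates with probability 1, but its loop has no static bound, so a syntactic termination checker can't accept it.

```python
import random

rng = random.Random(0)

def geometric(p=0.5):
    """Sample a geometric variate with an unbounded while-loop.

    Each iteration exits with probability p, so the loop terminates
    with probability 1, yet no finite iteration bound exists; this is
    exactly the kind of program a termination checker rejects. The
    non-terminating outcome (the ⊥ sample) has probability 0.
    """
    k = 1
    while rng.random() >= p:
        k += 1
    return k

samples = [geometric() for _ in range(10_000)]
mean = sum(samples) / len(samples)   # E[k] = 1/p = 2 for p = 0.5
```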


Fun fact, LDA was actually first described by three geneticists in 2000:

http://www.genetics.org/content/155/2/945


Yes! Pritchard remains extremely well known for this and subsequent work. If only they had given it a catchy title ;-)


Kevin Murphy wrote the Bayes Net Toolbox which got me started in the area.

Anyway, this paper is really neat. As far as I can tell, it's a big step towards linking theories in Bayesian networks and neural networks.


I'll have to read the paper to see what makes it "deep"...

A cursory skim suggests that it is much faster than Stan, but I suppose the more significant question is whether it provides correct results. Stan might take longer, but I'm usually pretty confident that with some simple diagnostics I can see whether the results are what I really need.


One thing that looks cool is the tutorial for probabilistic PCA. That is a bear of a thing to do in Stan; it really only works under some very limited conditions. Edward has this ability to fold a KL-divergence minimization into it. I'm not exactly sure how that works. I should look into it more; I don't really have a good sense of it just from reading the paper and a tutorial or two.


As someone who just implemented hierarchical probabilistic PCA in Stan, I agree that it takes finesse, but it is by no means impossible. Doing this sort of work efficiently in Stan seems to require some degree of understanding of how the sampler works. It may also require really thinking through your model. Stan saves you from deriving your own conditional distributions and writing a Gibbs sampler, but you're going to have to do some analysis if you want to fit models of a certain complexity.

KL-divergence minimization (variational inference) is typically a weak approximation to the model you specified. I have seen it produce inferences on simulated data which are just plain wrong. These "wrong" models are still often good predictors, so whether variational inference will work well for you depends on whether you care about making valid inferences or just doing prediction.
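One way that weakness shows up in closed form: for a correlated Gaussian "posterior", the optimal fully factorised (mean-field) Gaussian approximation has per-coordinate variance 1/(Σ⁻¹)ᵢᵢ, which is below the true marginal variance Σᵢᵢ whenever the coordinates are correlated. A NumPy sketch, with a bivariate example and a ρ value of my choosing:

```python
import numpy as np

# True posterior: zero-mean bivariate Gaussian with unit marginal
# variances and correlation rho. The optimal mean-field Gaussian
# approximation matches the precision-matrix diagonal, giving
# per-coordinate variance 1 / (Sigma^-1)_ii <= Sigma_ii.
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Lambda = np.linalg.inv(Sigma)

true_marginal_var = np.diag(Sigma)      # [1.0, 1.0]
meanfield_var = 1.0 / np.diag(Lambda)   # [1 - rho^2, 1 - rho^2] = [0.19, 0.19]

print(true_marginal_var, meanfield_var)
```

With ρ = 0.9 the mean-field approximation claims roughly a fifth of the true marginal variance, which is the kind of overconfident "wrong" inference described above.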


I would be very interested in seeing how you implemented the hierarchical PPCA.

My problem was that I couldn't identify the coefficients. So for instance, the first principal component could be [x, x, x, ...] or [-x, -x, -x, ...] and the result would be some bimodal distribution. So if you placed restrictions on the first PC it would work (like only positive), but those restrictions may not make sense for the next PCs.


Yes, multimodality is often a problem for MCMC-based clustering or dimensionality reduction. However, if you use the SVD method to estimate PCA you only have a bimodal distribution, since SVD is identified up to sign. Asymmetric initialization is usually enough to solve the problem.

This thread has some good examples of PCA implementations in Stan. https://groups.google.com/forum/#!topic/stan-users/5R2-QUDiy...
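To illustrate the sign issue: a component v and its negation −v explain the data equally well, so a deterministic sign convention picks one representative from each pair. A NumPy sketch with synthetic data and a convention of my own choosing (make each component's largest-magnitude entry positive):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic centred data; PCA via SVD of the data matrix.
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

def fix_signs(Vt):
    # Flip each component (row) so its largest-magnitude entry is
    # positive; this removes the per-component sign ambiguity.
    idx = np.abs(Vt).argmax(axis=1)
    signs = np.sign(Vt[np.arange(Vt.shape[0]), idx])
    return Vt * signs[:, None]

# Vt and -Vt describe the same principal axes; after applying the
# convention they map to the same representative.
same = np.allclose(fix_signs(Vt), fix_signs(-Vt))
```

The MCMC analogue of this is post-processing each draw with the same sign convention (or constraining one loading's sign in the model), which collapses the bimodal posterior onto a single mode.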


A nice, beginner-friendly book about probabilistic programming is Avi Pfeffer's "Practical Probabilistic Programming" (published by Manning). The only downside of the book is that it uses Pfeffer's own Scala library, Figaro, which does not seem to get as much attention as projects such as Stan and Edward.


Anyone recommend any good resources for learning to use Edward?

Are the tutorials on the main site good? http://edwardlib.org/tutorials/


Yes. I would start there.


There is another TensorFlow Bayesian programming library called Aboleth - https://github.com/data61/aboleth


Not related, but:

Maybe nobody else cares, but the name does matter. Edward, Stan, Cassandra. Have we run out of computer (or programming) sounding names?

This is Computatrum Antropomorphicus.


I don't even know what a "computer-sounding name" is. C64 and "International Business Machines"? In that case, Amiga and Apple came next, and you must have been suffering since. (Gooogol? Yahoo!)

FWIW Edward is named for https://en.wikipedia.org/wiki/George_E._P._Box, so they're not actually thinking as far outside the box as one might think.

In general, people are too paranoid about naming. It's one of those topics where nobody actually has a problem with a suggested name, but everyone fears others might. That's how you end up with Alexion, Allegion, Alliant, Altria, Ameren, and other names that probably cause every new employee to suffer a midlife crisis.

The best names have always been evocative, i.e. telling a story. And it's actually helpful if that story isn't just easy and happy. That's how "Plan B" works, or "Virgin", or HN's perennial favourite: "CockroachDB".


This is bikeshedding at its finest


Well, I think we should call the nuclear reactor complex "George gorge" and that it should be painted hot pink. Oh, and by the way, backup safety seal 1a9-562 needs to have its annular tolerance reduced by 0.5mm at 230C or there may be a 5:1 exponential increase in failure probability over 10 year replacement lifetimes in class two failure scenarios.


I know people who got stuck at picking a name and gave up writing the program. What can you do, when there is no name that makes you happy?


Ruby, Perl, Python, Java? The days of Lisp, Cobol and Fortran are long gone for naming. Even Smalltalk wasn't computer sounding. Basic, maybe?


I agree that a name matters, though I disagree about needing to sound a certain way. A name is a first impression and a small form of marketing. At its best, a name should try to say something about what it's representing. However, at the end of the day, it is just a label. Using a common given name isn't terrible or bad, it just seems like a wasted opportunity.


Edward literally means treasure-guardian, and so would seem more suitable for security tools. Anthropomorphic names don't bother me, but they're not as fun as bombastic ones like Ultron or Galactor (hint hint).


I like the syntax



