Edward – A Turing-Complete Language for Deep Probabilistic Programming (arxiv.org)
206 points by xtacy on Oct 19, 2017 | 30 comments


Speaking of which, check out Michael I. Jordan's work on Probabilistic Graphical Models https://www.google.com/search?q=michael+i+jordan+probalistic...

He was a mentor to Andrew Ng (who went on to lead Google Brain and Baidu's AI group), among other things. https://en.wikipedia.org/wiki/Michael_I._Jordan

Saira Mian and David Blei worked with him on some interesting ML/AI work related to life span in nematodes a while back: Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span - Blei DM, Franks K, Jordan MI, Mian IS. - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1533868



The software library is located here: http://edwardlib.org/ . Notably, Edward is layered on TensorFlow.

Regarding the significance of the authors, David Blei first described latent Dirichlet allocation (LDA), an important algorithm for generative topic modeling, in ~2003. Interestingly, last I checked, LDA couldn't be done in Edward (yet).


I also briefly tried it out, drawn by the claim of Turing completeness, but I wasn't able to get inference working over any model with interesting control flow (e.g. loops). It seemed to have about the same expressive power as PyMC3, albeit running on TensorFlow, which seemed neat. It would be very cool to see something with the expressive power of, say, Church running on TF.


In complete sincerity, I think that speeding up Turing-complete probabilistic programming to the kinds of inference speed we can get in the gradient-descent training of deep neural networks would be a "change the world"-level advance for ML/AI.


We already have that: variational-inference-based algorithms like BBVI use gradient descent for training.
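For what it's worth, here's a minimal sketch of what "gradient descent for training" looks like in BBVI: a score-function (REINFORCE-style) gradient estimate of the ELBO for a toy conjugate-Gaussian model, in plain NumPy. The model, data, step size, and sample count are all illustrative choices of mine, not anything from Edward:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: mu ~ N(0, 1), x_i ~ N(mu, 1).
# The exact posterior is N(sum(x) / (n + 1), 1 / (n + 1)).
x = np.array([1.2, 0.8, 1.5, 1.1])
n = len(x)
post_mean = x.sum() / (n + 1)            # 0.92
post_sd = (1.0 / (n + 1)) ** 0.5         # ~0.447

def log_joint(mu):
    # log p(mu) + sum_i log p(x_i | mu), dropping constants
    return -0.5 * mu**2 - 0.5 * np.sum((x - mu) ** 2)

# Variational family q(mu) = N(m, exp(log_s)^2); BBVI score-function gradient:
#   grad_phi ELBO = E_q[ grad_phi log q(mu) * (log p(x, mu) - log q(mu)) ]
m, log_s = 0.0, 0.0
lr = 0.05
for _ in range(2000):
    s = np.exp(log_s)
    mu = rng.normal(m, s, size=64)               # Monte Carlo samples from q
    log_q = -0.5 * ((mu - m) / s) ** 2 - log_s   # log q(mu), up to a constant
    f = np.array([log_joint(u) for u in mu]) - log_q
    g_m = (mu - m) / s**2                        # d log q / d m
    g_ls = ((mu - m) / s) ** 2 - 1.0             # d log q / d log_s
    m += lr * np.mean(g_m * f)
    log_s += lr * np.mean(g_ls * f)

print(m, np.exp(log_s))   # should wander close to (post_mean, post_sd)
```

The point of the sketch is that the only thing inference needs from the model is pointwise evaluation of `log_joint`, so the whole loop is ordinary stochastic gradient descent.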


Variational inference also only works for continuous probability models, so it can't be used for most interesting use-cases of probabilistic programming.


Why do you want Turing completeness in your probabilistic modelling language? This seems like a domain where you can specify a lot of useful work with bounded loops and other sub-TC tools.


The probability of ⊥ is 0, because sampling ⊥ would require returning from a diverging computation. That said, there's all kinds of interesting control flow we can describe in a program, knowing it will return a sample, without having any convenient way to prove to a termination checker that it will.
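A concrete toy example of that distinction, as a plain Python sketch (the function and parameters are my own): the sampler below terminates with probability 1, but its loop has no static bound, so a syntactic termination checker can't accept it.

```python
import random

rng = random.Random(0)

def geometric(p=0.5):
    """Sample a geometric variate with an unbounded while-loop.

    Each iteration exits with probability p, so the loop terminates
    with probability 1, yet no finite iteration bound exists; this is
    exactly the kind of program a termination checker rejects. The
    non-terminating outcome (the ⊥ sample) has probability 0.
    """
    k = 1
    while rng.random() >= p:
        k += 1
    return k

samples = [geometric() for _ in range(10_000)]
mean = sum(samples) / len(samples)   # E[k] = 1/p = 2 for p = 0.5
```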


Fun fact, LDA was actually first described by three geneticists in 2000:

http://www.genetics.org/content/155/2/945


Yes! Pritchard remains extremely well known for this and subsequent work. If only they had given it a catchy title ;-)


Kevin Murphy wrote the Bayes Net Toolbox which got me started in the area.

Anyway, this paper is really neat. As far as I can tell, it's a big step towards linking theories in Bayesian networks and neural networks.


I'll have to read the paper to see what makes it "deep"...

A cursory skim suggests that it is much faster than Stan, but I suppose the more significant question is whether it provides correct results. Stan might take longer, but I'm usually pretty confident that with some simple diagnostics I can see whether the results are what I really need.


One thing that looks cool is the tutorial for probabilistic PCA. That is a bear of a thing to do in Stan; it really only works under some very limited conditions. Edward has this ability to fold a KL-divergence minimization into it. I'm not exactly sure how that works. I should look into it more; I don't really have a good sense of it just from reading the paper and a tutorial or two.


As someone who just implemented hierarchical probabilistic PCA in Stan, I agree that it takes finesse, but it is by no means impossible. Doing this sort of work efficiently in Stan seems to require some degree of understanding of how the sampler works. It may also require really thinking through your model. Stan saves you from deriving your own conditional distributions and writing a Gibbs sampler, but you're going to have to do some analysis if you want to fit models of a certain complexity.

KL-divergence minimization (variational inference) is typically a weak approximation to the model you specified. I have seen it produce inferences on simulated data which are just plain wrong. These "wrong" models are still often good predictors, so whether variational inference will work well for you depends on whether you care about making valid inferences or just doing prediction.
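One way that weakness shows up in closed form: for a correlated Gaussian "posterior", the optimal fully factorised (mean-field) Gaussian approximation has per-coordinate variance 1/(Σ⁻¹)ᵢᵢ, which is below the true marginal variance Σᵢᵢ whenever the coordinates are correlated. A NumPy sketch, with a bivariate example and a ρ value of my choosing:

```python
import numpy as np

# True posterior: zero-mean bivariate Gaussian with unit marginal
# variances and correlation rho. The optimal mean-field Gaussian
# approximation matches the precision-matrix diagonal, giving
# per-coordinate variance 1 / (Sigma^-1)_ii <= Sigma_ii.
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Lambda = np.linalg.inv(Sigma)

true_marginal_var = np.diag(Sigma)      # [1.0, 1.0]
meanfield_var = 1.0 / np.diag(Lambda)   # [1 - rho^2, 1 - rho^2] = [0.19, 0.19]

print(true_marginal_var, meanfield_var)
```

With ρ = 0.9 the mean-field approximation claims roughly a fifth of the true marginal variance, which is the kind of overconfident "wrong" inference described above.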


I would be very interested in seeing how you implemented the hierarchical PPCA.

My problem was that I couldn't identify the coefficients. So for instance, the first principal component could be [x, x, x, ...] or [-x, -x, -x, ...] and the result would be some bimodal distribution. So if you placed restrictions on the first PC it would work (like only positive), but those restrictions may not make sense for the next PCs.


Yes, multimodality is often a problem for MCMC-based clustering or dimensionality reduction. However, if you use the SVD method to estimate PCA you only have a bimodal distribution, since SVD is identified up to sign. Asymmetric initialization is usually enough to solve the problem.

This thread has some good examples of PCA implementations in Stan. https://groups.google.com/forum/#!topic/stan-users/5R2-QUDiy...
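To illustrate the sign issue: a component v and its negation −v explain the data equally well, so a deterministic sign convention picks one representative from each pair. A NumPy sketch with synthetic data and a convention of my own choosing (make each component's largest-magnitude entry positive):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic centred data; PCA via SVD of the data matrix.
X = rng.normal(size=(200, 5))
X -= X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

def fix_signs(Vt):
    # Flip each component (row) so its largest-magnitude entry is
    # positive; this removes the per-component sign ambiguity.
    idx = np.abs(Vt).argmax(axis=1)
    signs = np.sign(Vt[np.arange(Vt.shape[0]), idx])
    return Vt * signs[:, None]

# Vt and -Vt describe the same principal axes; after applying the
# convention they map to the same representative.
same = np.allclose(fix_signs(Vt), fix_signs(-Vt))
```

The MCMC analogue of this is post-processing each draw with the same sign convention (or constraining one loading's sign in the model), which collapses the bimodal posterior onto a single mode.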


A nice, beginner-friendly book about probabilistic programming is Avi Pfeffer's "Practical Probabilistic Programming" (published by Manning). The only downside of the book is that it uses Pfeffer's own Scala library, Figaro, which does not seem to get as much attention as projects such as Stan and Edward.


Anyone recommend any good resources for learning to use Edward?

Are the tutorials on the main site good? http://edwardlib.org/tutorials/


Yes. I would start there.


There is another TensorFlow Bayesian programming library called Aboleth - https://github.com/data61/aboleth


Not related, but:

Maybe nobody else cares, but the name does matter. Edward, Stan, Cassandra. Have we run out of computer (or programming) sounding names?

This is Computatrum Antropomorphicus.


I don't even know what a "computer-sounding name" is. C64 and "International Business Machines"? In that case, Amiga and Apple came next, and you must have been suffering since. (Gooogol? Yahoo!)

FWIW Edward is named for https://en.wikipedia.org/wiki/George_E._P._Box, so they're not actually thinking as far outside the box as one might think.

In general, people are too paranoid about naming. It's one of those topics where nobody actually has a problem with a suggested name, but everyone fears others might. That's how you end up with Alexion, Allegion, Alliant, Altria, Ameren, and other names that probably cause every new employee to suffer a midlife crisis.

The best names have always been evocative, i.e. telling a story. And it's actually helpful if that story isn't just easy and happy. That's how "Plan B" works, or "Virgin", or HN's perennial favourite: "CockroachDB".


This is bikeshedding at its finest


Well, I think we should call the nuclear reactor complex "George gorge" and that it should be painted hot pink. Oh, and by the way, backup safety seal 1a9-562 needs to have its annular tolerance reduced by 0.5mm at 230C or there may be a 5:1 exponential increase in failure probability over 10 year replacement lifetimes in class two failure scenarios.


I know people who got stuck at picking a name and gave up writing the program. What can you do, when there is no name that makes you happy?


Ruby, Perl, Python, Java? The days of Lisp, Cobol and Fortran are long gone for naming. Even Smalltalk wasn't computer sounding. Basic, maybe?


I agree that a name matters, though I disagree about needing to sound a certain way. A name is a first impression and a small form of marketing. At its best, a name should try to say something about what it's representing. However, at the end of the day, it is just a label. Using a common given name isn't terrible or bad, it just seems like a wasted opportunity.


Edward literally means treasure-guardian, and so would seem more suitable for security tools. Anthropomorphic names don't bother me, but they're not as fun as bombastic ones like Ultron or Galactor (hint hint).


I like the syntax



