Saira Mian and David Blei worked on some interesting stuff a while back related to using ML/AI to study life-span extension in nematodes: Statistical modeling of biomedical corpora: mining the Caenorhabditis Genetic Center Bibliography for genes related to life span - Blei DM, Franks K, Jordan MI, Mian IS. - http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1533868
The software library is located here: http://edwardlib.org/ . Notably, Edward is layered on TensorFlow.
Regarding the significance of the authors: David Blei first described latent Dirichlet allocation (LDA), an important algorithm for generative topic modeling, around 2003. Interestingly, last I checked, LDA couldn't be done in Edward (yet).
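For what it's worth, LDA itself is simple enough to run elsewhere in the meantime. Here's a minimal sketch of collapsed Gibbs sampling for LDA on a toy corpus, in plain NumPy (the corpus, hyperparameters, and variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus: documents as lists of word ids (vocabulary size V=6).
docs = [[0, 1, 2, 0], [1, 2, 1], [3, 4, 5, 3], [4, 5, 4]]
V, K = 6, 2             # vocabulary size, number of topics
alpha, beta = 0.1, 0.1  # symmetric Dirichlet hyperparameters

# Random initial topic assignment for every token, plus count tables.
z = [[rng.integers(K) for _ in d] for d in docs]
ndk = np.zeros((len(docs), K))  # doc-topic counts
nkw = np.zeros((K, V))          # topic-word counts
nk = np.zeros(K)                # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        t = z[d][i]
        ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1

# Collapsed Gibbs sweeps: resample each token's topic from its full
# conditional p(z=k | rest) ∝ (ndk+alpha) * (nkw+beta) / (nk+V*beta).
for _ in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            ndk[d, t] -= 1; nkw[t, w] -= 1; nk[t] -= 1
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            t = rng.choice(K, p=p / p.sum())
            z[d][i] = t
            ndk[d, t] += 1; nkw[t, w] += 1; nk[t] += 1

# Posterior-mean estimate of per-document topic proportions.
theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)
print(np.round(theta, 2))
```

With a corpus this tiny the two "topics" just separate the two vocabulary clusters, but the sampler is the same algorithm that scales to real corpora.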
I also briefly tried it out, drawn in by the claim of Turing completeness, but I wasn't able to get inference working over any model with interesting control flow (e.g. loops). It seemed to have about the same expressive power as PyMC3, albeit running over TensorFlow, which seemed neat. It would be very cool to see something with the expressive power of, say, Church running on TensorFlow.
In complete sincerity, I think that speeding up Turing-complete probabilistic programming to the kinds of inference speed we can get in the gradient-descent training of deep neural networks would be a "change the world"-level advance for ML/AI.
Variational inference also only works for continuous probability models, so it can't be used for most interesting use-cases of probabilistic programming.
Why do you want Turing completeness in your probabilistic modelling language? This seems like a domain where you can specify a lot of useful work with bounded loops and other sub-TC tools.
The probability of ⊥ is 0, because sampling ⊥ would require returning from a divergent computation. That said, there's all kinds of interesting control flow we can describe in a program, knowing it will return a sample, without having any convenient way to prove to a termination checker that it will.
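A toy illustration of the kind of control flow at issue: a sampler whose loop has no static bound, yet halts with probability 1 (a geometric distribution written as a while loop; the function name and parameters here are mine, not from any library):

```python
import random

def geometric_sample(p=0.5, rng=random.Random(0)):
    """Count failures before the first success.

    The loop is unbounded -- no termination checker can put a static
    bound on it -- yet it halts with probability 1 for any p > 0.
    """
    n = 0
    while rng.random() >= p:  # each iteration continues w.p. 1-p
        n += 1
    return n

# Forward sampling is trivial; the empirical mean should approach
# the geometric mean (1-p)/p = 1.0 for p = 0.5.
samples = [geometric_sample() for _ in range(10000)]
mean = sum(samples) / len(samples)
print(mean)
```

Forward simulation of such a program is easy; what gradient-based inference engines struggle with is conditioning on data when the model's trace length is itself random.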
I'll have to read the paper to see what makes it "deep"...
A cursory skim suggests that it is much faster than Stan, but I suppose the more significant question is whether it provides correct results. Stan might take longer, but I'm usually pretty confident that with some simple diagnostics I can see whether the results are what I really need.
One thing that looks cool is the tutorial for probabilistic PCA. That is a pain of a thing to do in Stan; it really only works under some very limited conditions. Edward has this ability to fold a KL divergence minimization in there. Not exactly sure how it works. I should look into it more. I don't really have a good sense of it just from reading the paper and a tutorial or two.
As someone who just implemented hierarchical probabilistic PCA in Stan, I agree that it takes finesse, but it is by no means impossible. Doing this sort of work efficiently in Stan seems to require some degree of understanding of how the sampler works. It also may require really thinking through your model. It saves you from deriving your own conditional distributions and writing a Gibbs sampler, but you're going to have to do some analysis if you want to fit models of a certain complexity.
KL divergence minimization (variational inference) is typically a weak approximation to the model you specified. I have seen it produce inferences on simulated data which are just plain wrong. These "wrong" models are still often good predictors, so whether variational inference will work well for you depends on whether you care about making valid inferences or just doing prediction.
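A small illustration of why the approximation can be "wrong" in exactly this way, using nothing beyond NumPy: fit a single Gaussian q to a bimodal target p by grid-searching the KL(q||p) objective that VI minimizes. Because KL(q||p) heavily penalizes putting q's mass where p is small, the fit locks onto one mode and ignores the other half of the posterior mass entirely:

```python
import numpy as np

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Bimodal target: an equal mixture of two well-separated Gaussians.
p = 0.5 * normal_pdf(x, -3, 1) + 0.5 * normal_pdf(x, 3, 1)

# Grid search over a single-Gaussian variational family q(mu, sigma),
# minimizing KL(q||p) = E_q[log q - log p] via numerical quadrature.
best = None
for mu in np.linspace(-5, 5, 101):
    for sigma in np.linspace(0.5, 5, 46):
        q = normal_pdf(x, mu, sigma)
        kl = np.sum(q * (np.log(q + 1e-300) - np.log(p + 1e-300))) * dx
        if best is None or kl < best[0]:
            best = (kl, mu, sigma)

kl, mu, sigma = best
# The optimum sits on one mode (mu ≈ ±3, sigma ≈ 1): mode-seeking,
# not mass-covering -- a fine predictor locally, a bad summary globally.
print(mu, sigma)
```

This is the mechanism behind both observations above: the mode-seeking fit can still predict well near the mode it chose, while being badly miscalibrated as an inference about the whole posterior.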
I would be very interested in seeing how you implemented the hierarchical PPCA.
My problem was that I couldn't identify the coefficients. So for instance, the first principal component could be [x, x, x, ...] or [-x, -x, -x, ...] and the result would be some bimodal distribution. So if you placed restrictions on the first PC it would work (like only positive), but those restrictions may not make sense for the next PCs.
Yes, multimodality is often a problem for mcmc clustering or dimensionality reduction. However, if you use the SVD method to estimate PCA you only have a bimodal distribution since SVD is identified up to the sign. Asymmetric initialization is usually enough to solve the problem.
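A quick sketch of both the sign ambiguity and a common fix, using plain NumPy SVD (the sign convention shown, flipping so the largest-magnitude loading is positive, is one common choice, not the only one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Low-rank toy data: 200 samples in 5 dimensions with one dominant direction.
true_dir = np.ones(5) / np.sqrt(5)
X = rng.normal(size=(200, 1)) * 3 @ true_dir[None, :] \
    + 0.1 * rng.normal(size=(200, 5))
X -= X.mean(axis=0)

# SVD recovers the principal direction only up to sign:
# if U S Vt = X, then (-U) S (-Vt) = X as well.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]

# Break the sign ambiguity deterministically:
# flip so the largest-magnitude entry of the loading vector is positive.
pc1_fixed = pc1 * np.sign(pc1[np.argmax(np.abs(pc1))])

print(np.round(pc1_fixed, 2))  # ≈ [0.45, 0.45, 0.45, 0.45, 0.45]
```

In a sampler the analogue of this post-hoc convention is exactly the asymmetric initialization (or a positivity constraint on one loading) mentioned above: anything that makes one of the two sign-flipped modes the only one the chain visits.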
A nice beginner-friendly book about probabilistic programming is Avi Pfeffer's "Practical Probabilistic Programming" (published by Manning). The only downside of the book is that it uses Pfeffer's own Scala library, Figaro, which does not seem to get as much attention as projects such as Stan and Edward.
I don't even know what a "computer-sounding name" is. C64 and "International Business Machines"? In that case, Amiga and Apple came next, and you must have been suffering since. (Gooogol? Yahoo!)
In general, people are too paranoid about naming. It's one of those topics where nobody actually has a problem with a suggested name, but everyone fears others might. That's how you end up with Alexion, Allegion, Alliant, Altria, Ameren, and other names that probably cause every new employee to suffer a midlife crisis.
The best names have always been evocative, i.e. telling a story. And it's actually helpful if that story isn't just easy and happy. That's how "Plan B" works, or "Virgin", or HN's perennial favourite: "CockroachDB".
Well, I think we should call the nuclear reactor complex "George gorge" and that it should be painted hot pink. Oh, and by the way, backup safety seal 1a9-562 needs to have its annular tolerance reduced by 0.5mm at 230C, or there may be a 5:1 exponential increase in failure probability over 10-year replacement lifetimes in class two failure scenarios.
I agree that a name matters, though I disagree about needing to sound a certain way. A name is a first impression and a small form of marketing. At its best, a name should try to say something about what it's representing. However, at the end of the day, it is just a label. Using a common given name isn't terrible or bad, it just seems like a wasted opportunity.
Edward literally means treasure-guardian, and so would seem more suitable for security tools. Anthropomorphic names don't bother me, but they're not as fun as bombastic ones like Ultron or Galactor (hint hint).
Mentor to Andrew Ng, who went on to lead Google Brain, Baidu's AI group, and a few other things. https://en.wikipedia.org/wiki/Michael_I._Jordan