Time will tell, and it's an inevitable trend that more devs will close-source their code, no matter what curious angle you take to justify large firms using AI to exploit it for money, and I doubt you work for one of them.
People wear different shoes; there is simply no single truth. What's the difference between your gratitude for problems / inspiring thankfulness and this guy's "bitching", when you think your reality is 'the' right one? I wish you good luck for the rest of your life, stranger.
I'd be interested to know whether that occurred pre- or post- Zygote.jl starting to use ChainRules.jl internally. Zygote's own rules used to be tested with a half-baked finite differences scheme, and it was easy to miss bugs. All rules in ChainRules are tested using the testing helpers in ChainRulesTestUtils.jl, which are very thorough.
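For a flavour of what that testing looks like, here's a minimal sketch (the `myscale` function and its rule are made up for illustration; `test_rrule` from ChainRulesTestUtils.jl checks a hand-written rule against finite differences):

```julia
using LinearAlgebra: dot
using ChainRulesCore, ChainRulesTestUtils

# Toy function with a hand-written reverse-mode rule (illustrative only).
myscale(a, x) = a .* x

function ChainRulesCore.rrule(::typeof(myscale), a::Real, x::AbstractVector)
    y = myscale(a, x)
    function myscale_pullback(ȳ)
        # cotangents for (the function itself, a, x)
        return NoTangent(), dot(ȳ, x), a .* ȳ
    end
    return y, myscale_pullback
end

# Compares the hand-written pullback against finite differences,
# including the types and structure of the returned tangents.
test_rrule(myscale, 2.0, [1.0, 2.0, 3.0])
```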
Also, some apparent bugs are not actually bugs. For example, the gradient of some function wrt the elements of a symmetric matrix is, and should be, different depending on whether the matrix is structurally symmetric or not. Or, when a function is not differentiable in the calculus sense, different ADs often have different conventions for what they will return. These cases come up all the time.
> For example, the gradient of some function wrt the elements of a symmetric matrix is, and should be, different depending on whether the matrix is structurally symmetric
Ugh. Yes and no, but mostly no.
A gradient is a collection of partial derivatives. The usual notion of a partial derivative with respect to the elements of a symmetric matrix does not actually exist, because you can't vary them independently. This is a generic problem for any overparameterized constrained system. People still want to be able to use the standard tools of calculus, though. You can reasonably do things like use Lagrange multipliers to do constrained optimizations. Or you can express a symmetric matrix in terms of only the n(n+1)/2 independent entries, treat them as a vector, and take the gradient over that. (This is often what is done without the underlying method being described; it's the so-called "symmetric gradient".) But it's generally not what you actually want, even though it can lead to the same stationary points.
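To make that "symmetric gradient" concrete (the notation here is mine): let G be the unconstrained gradient obtained by treating all n^2 entries of A as independent. Parameterizing by the n(n+1)/2 entries a_ij with i <= j and applying the chain rule gives ∂f/∂a_ij = G_ij + G_ji for i ≠ j, and ∂f/∂a_ii = G_ii, i.e. the off-diagonal contributions get symmetrized (doubled, when G is already symmetric) while the diagonal is left alone.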
The best you can do is the unconstrained gradient restricted to the manifold of symmetric matrices (or whatever constrained object you are interested in). But even this is a (subtly) different thing from the gradient in the ambient space, even though it works in as large a set of cases as possible. Happily, for symmetric differentials applied to symmetric matrices, the naive formulas obtained by symmetrizing the gradient just work.
My point is I think simpler than what you're talking about, so let's replace "symmetric" with "diagonal." Suppose I have a structurally diagonal matrix, represented by a type `Diagonal` that can only represent diagonal matrices, and then a non-structurally diagonal matrix, represented by a normal dense matrix filled with zeros except on the diagonal.
I pass this as input to some function that uses more than just the diagonal and compute the gradient (here a matrix). In general, for the dense matrix, the resulting gradient will not be diagonal. Why should it be? If your function doesn't care if the matrix is diagonal, neither does the AD.
But what about for `Diagonal`? Well, the AD could handle this in a few different ways. First, it could interpret all matrices as embedded in the dense, unconstrained matrices, in which case it would return a dense, non-diagonal matrix. Or it could ignore that `Diagonal` represents a matrix entirely and instead return some nested structure containing the gradients of its stored fields, here the gradient of the stored diagonal vector. Or it could represent that nested structure as just another `Diagonal`.
There are various reasons why an AD might do any one of these; none of them is right or wrong. But note that the only option that gives you the same answer as the dense case is the one that promotes to a dense input, which basically discards all the structure to begin with.
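A minimal sketch of what that looks like in practice, using Zygote's `gradient` for concreteness (the exact structure you get back for the `Diagonal` input depends on the Zygote/ChainRules versions involved, which is rather the point):

```julia
using LinearAlgebra, Zygote

# A function that looks at every entry, not just the diagonal.
f(A) = sum(A)

# Dense matrix that merely happens to be diagonal:
Ad = [1.0 0.0; 0.0 2.0]
Zygote.gradient(f, Ad)[1]
# -> something equivalent to a full 2x2 matrix of ones;
#    nothing forces it to be diagonal.

# Structurally diagonal matrix:
As = Diagonal([1.0, 2.0])
Zygote.gradient(f, As)[1]
# -> depending on the AD version, this may come back as a Diagonal,
#    as a field-wise named tuple like (diag = [1.0, 1.0],), or
#    (if everything is embedded in dense matrices) as a dense matrix
#    of ones -- exactly the three options described above.
```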
Most ADs never have to deal with matrix polymorphism, so their user bases don't need to think about this. But Julia has _many_ polymorphic matrix types, so Julia ADs have to take this seriously, and if new users naively pass a polymorphic matrix type as input to some `gradient` function, they might be perplexed by what they get back and think the AD engine has a bug. See the ChainRules talk at EuroAD 2021 for more details: https://youtu.be/B3bC49OmTdk?t=761
Ah, yes. That is a different point than the one I thought you were making.
Both the symmetric and diagonal cases are lucky ones where the constrained derivative has the same shape and size as the original (as the constrained manifolds are flat). Keeping it as a `Diagonal` in code then just works.
This is in marked contrast to "unit vectors", "orthogonal matrices", or similar non-flat manifolds. The derivatives in these cases, although locally of the same dimension as the manifold, usually must be represented as something that spans the entire containing space once you deal with derivatives at multiple points.
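To put a formula on that (notation mine): the tangent space to the orthogonal matrices at a point Q is { QΩ : Ωᵀ = -Ω }, which has dimension n(n-1)/2, but as a subspace of all n×n matrices it moves with Q. So there is no fixed structured type, the way `Diagonal` is for diagonal matrices, whose elements can hold the tangents at every point; in code they typically end up stored as ordinary dense matrices.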
Yes, precisely. It gets complicated very quickly. Cotangents for structural matrices in reverse-mode AD tend to wander away from the cotangent space at the corresponding point into the tangent space of the embedding, for various reasons. Any time they wander away, there's the potential for huge performance losses. We routinely use projections to project them back to the correct cotangent space, but this is at best a band-aid, and we could certainly use a better solution.
For the interested, this arises from implicitly embedding in the forward pass (e.g. multiplying an orthogonal matrix by a vector is implicitly the same as making the matrix dense first and then multiplying), which then requires some sort of projection in the reverse pass. If your AD operates at the scalar level, you get this for free, but such ADs don't make full use of optimized BLAS routines. You can get that by writing custom AD rules, but then in the reverse pass you usually end up in some embedded cotangent space and need a projection to set things right. There are definitely trade-offs.
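For anyone curious what the projection band-aid looks like concretely, ChainRulesCore exposes this as `ProjectTo` (a minimal sketch with made-up values; I'm assuming this is the machinery being referred to):

```julia
using LinearAlgebra
using ChainRulesCore: ProjectTo

# Primal value with structure:
A = Diagonal([2.0, 3.0])

# A cotangent that has wandered into the dense embedding:
dA_dense = [1.0 5.0; 5.0 4.0]

# Build a projector from the primal and pull the cotangent back
# onto the structured (co)tangent space:
project = ProjectTo(A)       # remembers that A was a Diagonal
project(dA_dense)            # -> Diagonal([1.0, 4.0])
```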
Thanks for the references. We're trying to write up a paper about all this work, and (to us) it seems obvious that differential geometry is the right framework.
What reverse mode AD talks informally about as "gradients" are cotangents, and if you are constrained to a submanifold of another manifold (such as the space of all symmetric matrices, embedded in the space of all NxN matrices) then your cotangent is a projection of that in the larger space. This reduces to the obvious thing for symmetric matrices, as you say, but there are less obvious AD-relevant cases (like the space of orthogonal matrices, or the space of "ranges", vectors with uniformly spaced elements) where getting it right with your bare hands is tricky.
I don't see any of the relevant terminology in these links, but they do seem to be thinking about related problems. Perhaps they have re-invented the wheel? Amused to read that standard cookbooks seem to have faithfully reproduced the recipe for a square wheel, for decades.
Have you ever thought about how and why the “it’s not my job” attitude gets fostered? Most people, myself included, on their first day of work would never have thought they would end up, at some point, as the people they despised so much.
I have had a similar career experience. In time, I’ve decided to accept that I’m much more like those people than I wanted to admit back when I felt I was a young and hungry “disruptor”.
I am much more a product of the systems than I wanted to accept. That’s the most mature perspective I’ve had to date - and it has led to more empathy than I thought I was capable of.
The reason you didn’t like those people is that somewhere deep inside you knew it could be you in ten years. I had the same thoughts, and came to the same insight.
It’s a lot like how you dislike the people who remind you of your bad qualities.
Like most problems, I think there’s the actual issue at hand and also the feelings it fosters. Feeling underappreciated is the #1 complaint among employees, I believe, and even outranks salary. There’s probably a balance: sometimes the employer isn’t appreciative, and sometimes the employee overinflates their contribution.
> Maybe you're a person who works to live? If so, embrace it.
> Those who enjoy their work are often the opposite: they live to work.
Huh. Squinting at this from exactly the right angle, that can kinda read as though the very idea of living to work is paradoxically catastrophic: at once fulfilling, shallow, and existentially meaningless. As though work-life balance were interpreted backwards, or something.
I've obviously been looking at "do what you love" positively for a while now, but now I really wonder... maybe working to live, and enjoying what you do just a little bit, but not completely, is the ideal balance.
Hmm. Now I realize that not being entirely consumed by work, and not finding my identity in that loss of self, also has the benefit of making emotional and mental job flexibility easier - and in tech, that's a necessity.
WSL is only one option. As noted in another comment here:
The guide is split into separate sections for clang (cross-compiling), MSVC, clang-cl, and WSL. Compiled Swift binaries can be native Windows executables when using the non-WSL compilers.
I'm a pretty big fan of anime, but I still don't see the point in randomly sprinkling internet memes or other random television screenshots in blog posts. At first, I thought it might be a refreshing change of pace, but now I just feel like it detracts from the article.