
Sourcegraph CEO here. Thanks for this post. We packed a lot of stuff into 2.4: faster, more powerful code search, Google Alerts-style search monitoring, diff searches, and more.

It’s now free on a single server for any number of users and repositories.

Happy to answer questions here.



What benefits does it have over git grep, esp. if I'm using a monorepo? What new patterns/possibilities does it enable? Is it maybe speed somehow? (I assume it could then be more "live search/exploration"/rapid exploration than git grep - but OTOH wouldn't it require some slow reindexing after each change?)


For code search on a monorepo, Sourcegraph is often faster in UX and execution time for a lot of tasks. It's easier/faster to filter the results than `git grep`, you can see more on your screen, it's easier to jump to the full file, it's easier to see blame info for particular lines, etc.

Sometimes while coding you just need to find where something is so you can edit it or jump to it. In that case, your editor's search or `git grep` is definitely better. But when you're looking for example code, reviewing/reading code, or debugging code, it's often better to do it in a UI that's more optimized for those tasks than `git grep` and your editor.

And then Sourcegraph also has code intelligence, code host browser extension integrations, saved queries, etc., beyond the basic code search.

Google has a massive monorepo, and they have a similarly advanced code search system that they describe publicly. It's very well loved and frequently used by their developers. If you know any ex-Googlers (or ex-Facebookers, who have a similar system), ask them, and check out https://static.googleusercontent.com/media/research.google.c... and https://docs.google.com/document/d/1LQxLk4E3lrb3fIsVKlANu_pU....

BTW, Sourcegraph doesn't use an index for search. We heavily optimized the performance of searching an arbitrary revision that has never been indexed. So no slow reindexing after each change.
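To make "search without an index" concrete, here is a minimal sketch of a brute-force scan: every query reads every line of every file in the revision, so nothing has to be rebuilt after a change. This is only an illustration, not Sourcegraph's implementation; the `search_unindexed` function and the in-memory `files` mapping are hypothetical names chosen for the example.

```python
import re
from typing import Dict, List, Tuple


def search_unindexed(files: Dict[str, str], pattern: str) -> List[Tuple[str, int, str]]:
    """Scan every file line by line and return (path, line number, line) hits.

    Hypothetical helper: `files` maps a path to the file's text at some revision.
    No index is built or consulted, so there is nothing to refresh after a commit.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(files):
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            if regex.search(line):
                hits.append((path, lineno, line))
    return hits


# Example revision contents (made up for illustration).
files = {
    "a.go": "package main\nfunc main() {}\n",
    "b.py": "def main():\n    pass\n",
}
results = search_unindexed(files, r"\bmain\(")
```

The trade-off is that each query costs time proportional to the size of the revision, which is why the performance work mentioned above matters.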


Just tried this out on some of our code bases. It appears to fail to generate snippets/highlights for all files that contain non-Unicode text, e.g. "Müller" in ISO-8859-1. Known issue?

Many queries time out for me, although I'm running it on a pretty beefy AWS c5 instance with SSDs. Queries such as "type:diff" don't seem to work at all on my code bases. It also does not appear to cache any data from previous runs of "git log", so attempting the suggested reload doesn't really work around the issue. Are you working on improving the performance?
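The encoding problem above can be reproduced in isolation. The sketch below assumes the failure comes from decoding file bytes strictly as UTF-8, which is a guess about the cause rather than a confirmed diagnosis; `decode_for_display` is a hypothetical helper, not part of Sourcegraph.

```python
def decode_for_display(raw: bytes) -> str:
    """Decode file bytes for display: try UTF-8 first, fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value to a code point, so this cannot fail.
        return raw.decode("iso-8859-1")


# "Müller" as stored in an ISO-8859-1 file: the "ü" is the single byte 0xFC,
# which is not valid UTF-8.
latin1_bytes = "Müller".encode("iso-8859-1")

try:
    latin1_bytes.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False

recovered = decode_for_display(latin1_bytes)
```

A fallback like this never raises, but it can mislabel text that is actually in some other 8-bit encoding, so it is a display aid rather than real encoding detection.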


Sorry to hear that it didn't work out of the box for your repositories. I've filed the non-Unicode text issue here: https://github.com/sourcegraph/issues/issues/32

We're actively working on improving the performance of diff search, but I would expect other types of queries to complete quickly. Would you mind sharing more about the size / characteristics of your repositories? Feel free to email me at beyang@sourcegraph.com if a private channel is better.


Thanks, I'll keep an eye on this.

Three examples: 300MB size, 600 branches, 25k commits. 250MB, 250 branches, 15k commits. 80MB, 100 branches, 4k commits. Textwise it is a mix of Golang, JS and Python. Most of the repo size comes from binary resources (images etc).


> Google Alerts-style search monitoring

I expected email notifications but all I managed to do was to add the saved queries on the home page. Am I missing something?


Email and other kinds of notifications for saved queries are coming in the next release in early Feb. Email me (sqs@sourcegraph.com) if you'd like to preview them sooner. I agree they are crucial for this feature to feel truly complete and awesome.

The homepage does show a nice sparkline and results summary, though. Easy to see at a glance if new secret keys, deps, etc., are added to your repositories if you set up the queries.


For those of us associated with very many repos is there a way to limit cloning to only the ones we work with?


What's the analytics/telemetry feature? I don't see it discussed on the site at all.


Analytics lets you see statistics about how your own server's users are using it (each user's total count of pageviews and searches).

Telemetry lets you see the data it sends to Sourcegraph (which you can disable in the site config, and which never contains code/paths/repo names or anything derived from them).


Can this be run without docker, somehow? It's nigh-unusable on OS X with it.


It requires Docker for now. A lot of people use it on macOS successfully. What are you seeing?


Installed it, configured it, pointed it at the public cpython repo, it said 'cloning' and then pegged cpu for about 15 minutes (with about a third to a half of the usage in 'system') and eventually kernel-panicked.


Sorry about that. What version of macOS and Docker for Mac? We’ll look into that now.


OS 10.13.3 (17D34a) and Docker 17.12.0-ce-mac46 (21698). I don't think it's you, Docker is just still a bit flaky there, which is why I'm asking. I'll try it with something smaller.


Thanks. Theoretically it'd be possible for us to ship a static Go binary for everything except for our syntax highlighter and the (very convenient) bundled PostgreSQL and Redis. We'll monitor the feedback we get and see how to prioritize this. It definitely helps to hear that Docker did not work well for you.


I've heard that Docker does have an (inconsistent) CPU usage issue on OS X. I would definitely pin it on Docker.


Docker for Mac has known file system issues. You may want to try docker-sync for the volumes.


Do you have a non-Docker installation? The equivalent of GitLab's omnibus would be awesome.

Particularly for things like postgres configuration etc.


We'll definitely consider it. What benefits would you get from having non-Docker? (Not saying there aren't any, just curious what your biggest needs are.)

Re: PostgreSQL configuration, is it that you want to be able to manage and back up the data yourself (not using the Docker container's internal PostgreSQL), or is it a tuning/performance concern, or something else?

BTW, someone else down-thread asked for this, too (https://news.ycombinator.com/item?id=16121182).


A question unrelated to the server but related to Sourcegraph: why did you guys switch away from the VS Code-style editor on the web to an uneditable one? I loved using it.

On an unrelated note: did you work on AWS SQS?


Do you have any publicly available security review documents?


Our security page is at https://about.sourcegraph.com/security. We have a security assessment that we can share with customers, but not one that we post publicly yet.

We have customers who run Sourcegraph on machines that are completely blocked off from the Internet and only have access to the specific IP ranges of their code hosts on the same network. You can set it up like that if you'd like, which would significantly reduce the risks without needing to trust any third parties (us or the security reviewer).


Do you support Mercurial? If not, is it planned?


Sourcegraph only supports Git natively, not Mercurial. You could use Sourcegraph with Git mirrors of your Mercurial repository, if that is appealing, and we'd definitely consider adding in some extra translation work so that the Mercurial metadata embedded in the Git mirror repository would be respected. Does your code host (or do you internally) already have a Git mirror of your repository?


I wonder if the work done for git-cinnabar, which maps a Mercurial repository to something git understands, could help at all here?

https://github.com/glandium/git-cinnabar


Do you accept REMOTE jobs?


Yes, we have some fantastic international and non-SF-based teammates at Sourcegraph, and we'd love to have more.


Non-US?


Yes. I was wrong in my previous answer - after sqs's response, I looked them up on LinkedIn, and they have at least one German developer in Berlin.

(FWIW - and I'm only saying this in case you were in a similar situation: I applied to them for a job not because I was looking for one, but because I accidentally saw it on HN and the match between my skills and their apparent need was simply "too good to be true" territory. I wasn't necessarily expecting an offer, but I expected to talk to someone - was curious to learn more about what they're doing. However, I got rejected straight away - so I just assumed that they said "REMOTE" for the heck of it... I know it sounds arrogant, but I have a hell of a hard time believing their other applications outclass me so obviously that it was not even worth talking to me, so I assumed it must be something else)


What's the underlying search engine?



