Hacker News
Speed Up Git Pull (interrobeng.com)
146 points by nahname on Sept 5, 2013 | 48 comments


A 50x speedup is pretty cool in its own right. Kudos.

However, I wonder if this isn't treating a symptom versus a root cause.

Is saving that 5s round-trip so common in your workflow that you needed to optimize it, and would it be more productive to refactor the app so you and collaborators are working on different files?

Also, this offers the genuinely valuable guidance that pushing to a nearby server is much faster than pushing to a remote one. GitHub Enterprise exists as a product you can run close to you. It'd be an interesting calculation to weigh the performance hit of waiting 5s to push to a remote server against the cost of keeping your own nearby server up and patched.

But nice to read, kudos all the same!


http://xkcd.com/1205/

If you pull 10 times a day (conservative for me) then you’ll save ~5 hours over a year.


But couldn't it very well take more than 5 hours to set this up?


Doesn't matter too much. A short wait can throw you out of focus, and it's the time you spend getting back into focus that's important.


More like 5 minutes to set it up.


It is when you need to dig into the documentation to figure all of this out yourself. But if you now google 'speed up git pull', you'll find this article, repeat the commands, ?????, profit! Five minutes.


It took me less than five seconds to make the SSH changes and less than five minutes to set up the intermediate server. Now I also have a cache for when GitHub goes down (all the time.)


> Now I also have a cache for when GitHub goes down (all the time.)

Isn't the `.git` a cache for when github goes down? Git keeps all of your history inside your repository; you don't need network access to do anything.


Sort of. I've automated synchronizing it with the GitHub repo, so my local repo may be behind the cached repo. I'm also protected if a repository is deleted/moved/DMCAed.


What benefits does this provide that a cron job running `git fetch` doesn't?
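(For concreteness, I mean something like this hypothetical crontab entry, with /srv/mirrors/project.git standing in for a local clone:)

    */15 * * * * git --git-dir=/srv/mirrors/project.git fetch --all --quiet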


Redundancy, public access and not having hundreds of unused repositories on my computer. I'm not trying to sell anyone on the idea, it just works for me and I like it so whatever.


How would refactoring help? You still need to fetch before you can push to git.


If you're a heavy SSH user, using multiplexing in this manner can have negative consequences [1]. Downsides include having all your multiplexed connections exiting if the master exits!

[1] http://www.anchor.com.au/blog/2010/02/ssh-controlmaster-the-...


More recent SSH clients can use "ControlPersist" to establish the master connection in the background, so the first session doesn't control the lifetime of the connection. This makes using ControlMaster workable.

I usually set ControlPersist to 30 seconds, which may not be long enough for people hoping to get performance improvements out of GitHub. Setting it to too large a value increases the risk that you'll have stale server sockets after a network outage.
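For reference, the relevant ~/.ssh/config stanza would look something like this (a sketch; tune the ControlPersist value to your own tolerance for stale sockets):

    Host github.com
      ControlMaster auto
      ControlPath ~/.ssh/cm-%r@%h:%p
      ControlPersist 30s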


> This makes using ControlMaster workable.

Best part of reading this article. I had turned off connection sharing because of this.

So what, in more details, are the downsides to ControlPersist?


One that I frequently run into is that if you use SSH tunneling (like -L), you have to specify it the first time you ssh to that machine (i.e. when the ControlMaster is connected) and can't change it later. Using -L on later ssh invocations to the same machine silently fails, which can be infuriating if you don't realise it's happening. The best you can do at that point is to kill the ControlMaster ssh (disconnecting you across all your sessions), and then reconnect with the right -L.


You can skip the master and spawn a fresh connection for your tunnel using `-o ControlPath=none`.


In fact, even better: you can add forwarding to your existing connection. Typing ~C at the start of a line opens a command line, which accepts the following commands:

    ssh> help
    Commands:
      -L[bind_address:]port:host:hostport    Request local forward
      -R[bind_address:]port:host:hostport    Request remote forward
      -D[bind_address:]port                  Request dynamic forward
      -KR[bind_address:]port                 Cancel remote forward
(If you're not familiar with them, some of the other escape sequences are useful too. ~? lists them all.)

[EDIT] Apparently, if you have a recent enough version, you can add a forward to the master with `ssh -O forward ...` [1]

[1] http://serverfault.com/questions/237688/adding-port-forwardi...
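For example (a sketch, assuming a recent OpenSSH and a master already running for github.com; the port numbers are made up):

    ssh -O check github.com                           # verify the master is alive
    ssh -O forward -L 8080:localhost:8080 github.com  # add a forward to it
    ssh -O exit github.com                            # tear the master down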


You can limit the sharing to just GitHub with a Host block:

  Host github.com
  ControlMaster auto
  ControlPath /tmp/%r@%h:%p
  ControlPersist yes


> Downsides include having all your multiplexed connections exiting if the master exits!

I believe this is what ControlPersist is meant to solve - it may not have existed when that blog post was written.


Meta question: assuming that lots of GH users do this (nice trick), would GH have loads of dormant SSH connections? At scale, this could be a huge number. Would this be an issue?


No. I have this enabled and Github closes my connections after a very short period. I use it primarily for SSHing into my cluster of EC2 instances (which does massively speed things up).


Same here. I do a

    (ssh -fqN -o "StrictHostKeyChecking no" git@bitbucket.org >&/dev/null &)
for both bitbucket and github in my zshrc; bitbucket stays open, but github gets closed at some point. It used to stay open, however.


They could offer it as a premium feature.


Hrm, so this isn't about making git 50x faster, but about faster network communication.

> Establishing an SSH connection every time you perform a Git operation costs many round-trips

I don't really understand why the author is saying this. The whole point of git is to be distributed and not to push/pull at each commit.

That being said, he found something that speeds up his workflow tremendously, so congratulations.


Even with a short timeout (say, 10 seconds), it could be useful for a maintainer who pulls from several other users' GitHub repositories. Instead of establishing a new SSH connection for each remote, a complete "git remote update" of multiple repositories could be done over a single connection.
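(Sketch of that setup, with alice and bob as hypothetical contributors:)

    git remote add alice git@github.com:alice/project.git
    git remote add bob git@github.com:bob/project.git
    git remote update   # with multiplexing, both fetches reuse one SSH connection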


Oh, you might still want to share your commits with your coworkers, and for that you need to push and/or pull. (You could do that asynchronously, though.)


Note that if you are on centos 6, the openssh version isn't new enough to support this feature.


this is not true


Yes it is. ControlPersist was introduced in openssh 5.6.

Centos 6 ships with a patched version of 5.3.


Both ssh and sshd I guess? Do both ends need to support the feature?


ah ok. ControlMaster and ControlPath are both supported though.


...and keep your development copy of the project on a ramdisk. Have a script that you launch as you start work: it creates the ramdisk, copies files from the persistent location there using rsync, then periodically runs rsync to copy the changes you make on the ramdisk back to the persistent location.

I used this setup for almost a year. It saved me a lot of time and sanity.
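A rough sketch of such a script (assumes Linux tmpfs, a persistent copy at ~/work/project, and a mount point I've invented; the real thing would want error handling and a cleaner shutdown):

    #!/bin/sh
    # create the ramdisk and seed it from the persistent copy
    sudo mkdir -p /mnt/ramdev
    sudo mount -t tmpfs -o size=2g tmpfs /mnt/ramdev
    rsync -a ~/work/project/ /mnt/ramdev/project/
    # periodically sync changes back to persistent storage
    while true; do
        rsync -a --delete /mnt/ramdev/project/ ~/work/project/
        sleep 60
    done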



Cool. I still prefer my 5 line bash script manually started in separate console.


Does this buy you that much over a SSD?


Not sure. I was using it on a laptop with spinning rust for a fairly large Rails project.


Swap on an SSD acts as pseudo (slower) extra RAM for when you need more.

Example: a $5 Digital Ocean droplet comes without swap, but its storage is SSD, so you can create a swap file, which is much less painful than swapping to a mechanical disk.


Apologies, I think you know the answer, but I couldn't understand what it was from this post. Could you reword that?


Too bad it doesn't work on Cygwin. (I share my ssh config between Linux and Windows.) Too bad ssh doesn't have conditional configuration. (Yes, I know I could script this, but it's a little more pain than I want for this gain.)


How about VirtualBox in seamless mode instead of Cygwin?


Are there any "shorter" options for ControlPath that are still unique? I've had a few instances of silly hostnames that have caused an error about the name being too long for the socket.


Doesn't this only help if github lets you leave ssh connections open and not doing anything for long periods of time?

Surely if they do, they won't for too long if lots of people start doing this.


What can be done to prevent the "Connection to github.com closed by remote host" error that comes a few minutes after a push/pull with the Control settings enabled? Since I'm normally in vim by then, it ruins the layout.


Not sure how you can stop that error, but you can run "Ctrl-L" or ":redraw" in vim to fix your layout.


I just tried the first part of this (the ssh multiplexing) and instead of getting faster, 'git fetch' got slower (1.9 to 2.4 seconds). Any ideas why, and how I can debug/improve it?


Dear GitHub,

Please set up local SSH termination to your network.

Thanks,

Alex


    git fetch
    git merge origin/master

A much preferred workflow.



