
That's kinda the point, though. I don't want to run a single command to both define and complete the commit. I want to be able to do it in steps. Review my modified files, pick and choose what's staged, maybe make some more edits, stage those, double-check the staged diffs, verify the commit has what I want, and _then_ hit the button.


But isn't this what changelists do? I've certainly done pretty much what you describe in hg, svn and Perforce, which don't have staging areas.

Also, if you end up committing only parts of files, then how do you know that what you commit works?


Well, you don't, so... don't do that. Same thing goes for Perforce; you can already create multiple changelists that you commit independently, so it's easy enough to create a bad commit, because it contains only some of the changes you've got locally.

What you're supposed to do in Perforce of course is to create your changelist, shelve the others, make sure your proposed changelist works independently, and commit it only when it does. Which is analogous to what you're supposed to do with git:

    - stage parts of files
    - `git stash -k`
    - check it works
    - `git stash pop`
    - commit
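
A minimal sketch of those five steps, run in a throwaway repository. The file names, commit messages, and the `test` line standing in for "check it works" are all made up for illustration, and whole files are staged instead of partial hunks to keep it short:

```shell
set -e
repo=$(mktemp -d)   # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'one\n' > feature.txt
printf 'alpha\n' > unrelated.txt
git add . && git commit -q -m "initial"

# Two independent edits in flight
printf 'one\ntwo\n' > feature.txt
printf 'alpha\nbeta\n' > unrelated.txt

git add feature.txt                     # stage only what belongs in the commit
git stash push -q --keep-index          # long form of `git stash -k`: hides the unstaged edits
test "$(cat unrelated.txt)" = "alpha"   # stand-in for "check it works"
git stash pop -q                        # bring the unstaged edits back
git commit -q -m "Add line two to feature.txt"
```

After the last step the commit holds only the `feature.txt` change, and the `unrelated.txt` edit is still sitting uncommitted in the working tree.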
But let's be serious here - the true advantage of git comes when you do the pro move.

    - stage parts of files
    - commit
    - (90% of the time) it's good
    - (10% of the time) bzzt... it's bad
Now, if you were using Perforce, in the 10% case, you'd be stuffed. Somebody would come over to your desk.

    - (them) "Your commit is broken"
Now what can you do? There's no denying it... you gone done fucked up, and there's no denying it.

But with git, you have options. So let's try that again.

    - (them) "Your commit is broken"
    - (you) (confident tone of voice) "Ah no, you are using git incorrectly, you forgot to git fetchpack
      --inflate-upstream-probability-leaves --deny-downstream-packs 
      --integrate-sideway-arguments --discombobulate-flat-file-compressed-alignment-jobs
      -P -Q -J 5 -N 311 -X 0771 -R 0x134774"
    - (them) (uncertain)
    - (you) (confident gaze. Straight into their eyes. YOU = TIGER)
    - (them) (increasingly uncertain)
    - (you) (unblinking gaze. You have the upper hand. YOU = 2 TIGERS)
    - (them) (wanders off to try your technobabble)
    - (you) (fixes your commit)
There's an xkcd about this.

The advantages, I think, are clear.


That's a personal problem. Don't blame the tooling for a bad coworker.

As long as a change or refactoring stands independently of the intended feature change, there is often value in keeping that fine-grained version history. It replicates your thought process at the time in the history messages, gives more manageable git-bisects, etc.

If you can't reliably look at the modified files and pick out what changes stand independently then this approach is obviously not for you, or at least you're gonna need some help from your tooling.

TeamCity has the concept of a 'delayed commit' that is submitted as a patch and tests are run. If the tests fail the patch is not accepted. You can also do a ghetto version of this in any system just by running a local copy of your CI server. Push to its repository target and have it run the tests. If it works push it out to the real build server, if not then there's no problem just force-pushing an amended commit since everything is still local.

Alternately you can commit the independent changeset and then stash everything else and run a build/test.

Obviously you can't always pull out an independent changeset and in those cases you should commit the whole thing at once. But a lot of times it's not actually that difficult to identify independent steps that could be built/tested independently.


Aah, sweet, someone else who uses Perforce! I've been wondering, do you use version control as:

1. a way to record logical changes to files (e.g. implement two features without making any commits, then when you're done, pick out the files/chunks that encode each feature and create commits out of them),

2. a record of history (e.g. just start writing code and make a commit every time you compile/run tests without unexpected failures),

3. something else?

I've found it very painful to apply my git-adapted workflow to Perforce: I _want_ to just start coding, testing out various possible design choices and implementations instead of only theorizing about them, but can't (e.g. "I wonder if I could factor out these methods + fields into a different class?" Perforce: oh well, I guess I should write it down somewhere to remember for later. Git: branch my current work, spend five minutes sketching an extraction, then realize it's insane and continue working). Am I crazy and just don't realize how much better the Perforce model is?


Well, I used to use it, at least!

I actually came to quite like it, but the workflow was sometimes a bit tiresome :( When I found myself in the situation you describe, potentially wanting to make an additional change while already working on another, I did exactly what you suggest: add a todo item, and carry on until I'm done with the task at hand. Then start my second change with nothing checked out, so I can undo checkout on everything should I make a mess.

This works, and you get used to it, and of course many would say that it's a better approach - but it would be nice if the Perforce client tools could be a bit more imaginative.

As for how I think of version control, if it's git, #1. If it's Perforce, #2, plus #3 - backup and distribution.

I don't know how much value I get from being careful about my commits with git, but it does make me feel better (which I suppose could be reason enough). On the large-scale, goal-oriented projects that I've used Perforce for, worrying about logical changes has never felt very important. Does it make a big difference if you have one commit that implements 3 features, or 3 commits? When you're trying to fix a bug, you don't really mind either way, because (a) bugs don't respect feature boundaries, (b) all the features are non-negotiable, so it's not like you can back one out and carry on anyway, and (c) the project is large and fast-moving enough that even if you could, there'd be a good chance that actually, you couldn't.

What history tends to come in useful for on this sort of project is finding the code that corresponds to a particular build that you have a bug report for, and finding who made a particular change so that you can ask them about it (using the timeline view, say). These both work fine whether you make granular changes or not.

You might want to be a bit more careful about things if you're working on changes that might want to be merged into another branch on an individual basis. (Suppose you're moving spot fixes from release branch to main branch - some you'll want, some you won't.) But you usually know when you're making this sort of change, and so you work that way anyway.


> Also, if you end up committing only parts of files, then how do you know that what you commit works?

Because one part was, say, whitespace-errors corrected by my editor, or added/updated comments, or removed dead-and-commented code paths, while the other part was an actual logic change. There are many reasons to modify a file of source-code beyond making the code behave differently, but frequently those aspects come together as "oh, I'll just fix this while it's staring me in the face" and must be picked back apart later.


> Also, if you end up committing only parts of files, then how do you know that what you commit works?

You use your knowledge as a programmer to determine what that part of the file does and whether it can stand independently as a change. This is not always obvious, so in that case, you don't do it. But when it is obvious, it makes it quick and easy to commit it separately.

For example, say you rewrote a function in a script, and while you were at it you improved the usage info (--help). Then you go to commit your changes. You look at each changed hunk, and you see that there are two changed hunks: the function you changed and the help message. With Magit, it's dead simple to commit them separately: 1) select the first hunk, 2) s to stage, 3) c to commit, 4) write the commit message and execute the commit, 5) select the second hunk...

And let's say that, while you were rewriting the function, you commented out some code in the old function. You obviously don't want to keep that old code in the new function, so when you are looking through the changed hunks, you select those lines and press k to revert (or kill) those lines containing the commented code. If you have already staged them, then you can simply select those lines and press u to unstage just those lines, removing them from the staging area but preserving them in the working tree.

The workflow of 1) develop, 2) review and stage related hunks, 3) commit staged hunks, 4) goto 2, also helps a lot by forcing the developer to look at the code he's about to commit. Of course you can just stage everything that's changed and make a big commit, but I find that using the staging area to build up commits before making them helps me catch things that I don't want to commit in the first place. It's really useful for managing config files, because you can commit only the parts you want to save or sync to other systems, while leaving some parts active in the working tree but out of the committed branch.

For example, the Picard music tagger stores window and dialog positions in the same config file along with behavioral settings. I want to store and sync changes to the options, but I don't want to commit and sync every time the window position changes.

The staging area is a powerful (yet essentially optional) feature that, when used correctly, gives the developer great power to make commits more logically independent, while allowing him to develop freely, with little concern for how he will eventually commit the changes he's making.


Thank you. Exactly the kind of workflow I was trying to describe, but said better.


I do this all the time and the answer is:

I test it.

If there's a certain set of changes i wish to commit in one unit, but i have other changes in flight, then i use staging to prepare and reason about them while being able to diff both staging<->previous and staging<->hdd, make the commit, then stash the hdd state and run tests. Afterwards i can restore the hdd from stash and either amend the commit if necessary, or keep on working.
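
Roughly, as a sketch in a scratch repository (file names, messages, and the `test` line standing in for the real test run are placeholders, not anything from an actual project):

```shell
set -e
repo=$(mktemp -d)   # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'v1\n' > app.txt
printf 'draft 0\n' > scratch.txt
git add . && git commit -q -m "initial"

printf 'v2\n' > app.txt            # the change that belongs in the commit
printf 'draft 1\n' > scratch.txt   # other work in flight

git add app.txt
git commit -q -m "Update app to v2"

git stash push -q                        # working tree now matches the new commit
test "$(cat scratch.txt)" = "draft 0"    # stand-in for running the tests
git stash pop -q                         # restore the in-flight work

# If the test run turned something up: fix it and amend the commit
printf 'v2 fixed\n' > app.txt
git add app.txt
git commit -q --amend --no-edit
```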


This. I can't test the staging area so I have zero interest in committing it. Actually I have less than zero interest; I would like my team to be prevented from ever creating untested commits.


This isn't an argument against using staging areas, but against creating untested commits.

If your workflow is that every commit must be tested, the correct approach is to enforce that rule, rather than banning an only tangentially related feature.


Does the staging area have any use other than creating a commit that differs from the workspace, which guarantees that it cannot have been tested?


If you're trying to enforce tested commits without running tests on what's actually committed, you're doing it wrong.

There's an inherent incompatibility between the supposed ways that the staging area can be replaced with other things. If you replace it with a commit you keep amending as you go, but also all commits must pass tests, then you can never commit partially complete code, which defeats the purpose of having the amending commit.

If you interactively choose what to commit at commit time, then you run the risk of failing the test and then, after fixing whatever was wrong, having to remember which parts of which files it was you chose to commit last time. Now, you could choose to have your VCS remember that for you, but then you're basically reinventing a less-capable version of the staging area.

If you simply commit everything, then you can never have any code in your working tree which breaks tests, nor can you have code which isn't logically part of the commit you're currently working on. In practice that means that when you find things that need fixing that aren't directly related to what you're doing, you either fix it and commit it with unrelated changes - which interacts poorly with merging or reverting those commits - or you don't fix it and, most probably, forget about it.


I worked for many years with version systems that didn't include staging. People still frequently forgot to commit necessary files. Opinions may differ, but I feel the best place to enforce that constraint is with a CI server that merges only working changes.


This is a symptom of offering an option to perform an incomplete commit. Word processors don't have this problem because they don't make you specify which paragraphs are important enough to save.

I commit the workspace. If I'm not happy with some of the workspace, I fix it (or stash it for later when chaos happens).


Its use is to help in reasoning about the changes by offering an easy way to diff staging to the previous commit and to hdd, while being easy to change quickly.
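
A small sketch of those comparisons, assuming a scratch repository with a single made-up `file.txt` where one line is staged and another exists only on disk:

```shell
set -e
repo=$(mktemp -d)   # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'a\n' > file.txt
git add file.txt && git commit -q -m "initial"

printf 'a\nb\n' > file.txt
git add file.txt                  # "b" is now staged
printf 'a\nb\nc\n' > file.txt     # "c" exists only in the working tree

staged=$(git diff --cached)   # staging <-> previous commit: shows only "+b"
unstaged=$(git diff)          # staging <-> hdd: shows only "+c"
both=$(git diff HEAD)         # previous commit <-> hdd: shows "+b" and "+c"
```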

As for testing, it is trivial to stash or commit your current changes, go to the commit just made, test it, then carry on. A commit having been made partial only guarantees it wasn't tested if the developer is bad.


To me that sounds harder than making a known-good commit of my entire workspace, and refactoring that commit into several by selectively applying it to a clean workspace (this doesn't need to be done at the same time or by the original author).


The staging area is extremely helpful, beyond just making a single selective/partial commit.

Other uses:

1. Breaking a large commit into multiple smaller commits.

2. As a simple "review" system. You stage files that have been reviewed so you know you don't have to look at them again, while continuing on a change.

3. Cleaning the working copy of superfluous changes/files. If you stage the files you want to keep, then clean and/or checkout force, it wipes out the files you didn't stage. Then just reset to unstage all files and continue making changes.

4. Splitting a branch into two different branches, when you realize that a branch could be broken down into smaller separate changes that are both incomplete, but COULD be committed separately. You merge squash to master, staging and commit creating a new branch for feature1, then stage, stash, and stash pop the rest back on to master (since sometimes you cannot just switch back to master), and create the feature2, and feature3 branches, etc.
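
As an illustration of item 3, here is one way the stage, wipe, unstage sequence might look. The file names are invented, and `git checkout -- .` plus `git clean -f` are just one reading of "clean and/or checkout force":

```shell
set -e
repo=$(mktemp -d)   # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'keep 0\n' > keep.txt
printf 'noise 0\n' > noise.txt
git add . && git commit -q -m "initial"

printf 'keep 1\n' > keep.txt     # change worth keeping
printf 'noise 1\n' > noise.txt   # superfluous edit to a tracked file
printf 'tmp\n' > debug.log       # superfluous untracked file

git add keep.txt      # stage what should survive
git checkout -- .     # restore tracked files from the index, discarding unstaged edits
git clean -q -f       # delete untracked files
git reset -q          # unstage; the keep.txt change stays in the working tree
```

Both the checkout and the clean destroy data that was never committed, so this is worth trying on a scratch copy first.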

I also use Magit in Emacs, which makes the staging area like the swiss army knife of Git. I cannot imagine the amount of extra pointless tedious work that would be required using commits, branches, and merges to simulate the equivalent behavior of the staging area. The staging area is used heavily by folks who like to get a lot of things done with minimal effort required.

I understand the difficulty for new users switching to Git, and I agree that the commands could be more clear and consistent, but beginner (commit all) and novice (selective commit) users should not be left to determine the utility of Git features they very likely do not fully understand or appreciate.


> 2. As a simple "review" system. You stage files that have been reviewed so you know you don't have to look at them again, while continuing on a change.

(How) does this work if you want/need a virtual paper trail of reviews and/or in a distributed team? Do you email your changes for review, without making a commit and push (to a branch and/or fork)? Apologies if I misunderstand what you mean, but I read "review" here as a colleague walks by, has a look, and says lgtm?


This is personal review, not peer review. I try to review my code after completion and a period of rest before submitting for peer review. This discipline finds a surprising number of issues with my code that I just didn't see when I was tired, sick of looking at the code, and just ready to be done with it. I stage the changes as I go (particularly the small changes) and leave the meaty changes for last to be analyzed carefully against the original source.


> I would like my team to be prevented from ever creating untested commits

Frankly, this demonstrates that you have exactly zero understanding of how git works. If I were the sort to force my team to do something, I would force them to never test uncommitted code. Committed /= pushed. A commit means you can keep working on the code while the tests run, and you won't forget what you actually tested. Then, when the tests come back clean, you won't accidentally push some additional, untested change along with the tested changes.


If you can concurrently build/test and edit, you necessarily have more than one workspace, one of which can take the place of the staging area.

I do like the idea of having a snapshot of a test run, though we'd have to amend the commit or something to indicate that it passed its tests and is a sane parent for new commits on its branch, and I would switch to working on an independent feature rather than trying to do work on code which may or may not be broken.


> If you can concurrently build/test and edit, you necessarily have more than one workspace, one of which can take the place of the staging area.

This takes a really narrow view of what software projects can be hosted in git. If you're working on a huge project which must be built (at least for full builds) and tested "in the cloud", which you trigger by pushing to a temporary remote branch, then no, you need not have more than one workspace. If you're a webdev, then yeah, gitless or any other git workflow without the staging area might be useful for you.


I'm sure that workflow can be made to work, but if I had to answer the question "assume you can't test on your editing box nor edit on your testing box, so how do you get source from your editing box to your testing box?" git would not be anywhere near the top of my list of solutions because it leaves known-bad commits pushed that need to be contained like waste.


I think you still don't get this whole "git" thing. Commits are just read-only collections of diff chunks plus metadata, nothing more. It's up to you to define where the boundary is for "bad commits shouldn't be here". Git commits are nothing like SVN or CVS commits (or whatever they're called).

Trying to use some other arcane or error-prone mechanism for getting changes onto test machines is exactly what will leak bad code, because inevitably people will end up sending around zips full of code from who-knows-where for who-knows-which-change.


If you're going to condescend to a stranger, be right. A commit conceptually has a tree of complete blobs that git will diff as needed. git has options that make it more or less sensitive to differences even between unrelated files, so it ignores any delta encoding that may or may not be used in packfiles (and never in loose objects).
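
This is easy to poke at with `git cat-file`. A quick sketch, assuming a scratch repository with one hypothetical file committed twice:

```shell
set -e
repo=$(mktemp -d)   # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'line 1\n' > notes.txt
git add notes.txt && git commit -q -m "first"
printf 'line 1\nline 2\n' > notes.txt
git commit -q -a -m "second"

git cat-file -p 'HEAD^{tree}'      # the commit's tree: one blob entry per file
git cat-file -p HEAD:notes.txt     # the complete file as of HEAD, not a diff
git cat-file -p HEAD~:notes.txt    # the complete file as of the parent commit
```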

If you're making an efficiency argument, git doesn't really try to beat the rsync algorithm even if you ignore the metadata. If you're making a "some people know git and nothing else" argument, fix the team, because that's a serious deficiency.


> A commit conceptually has a tree of complete blobs that git will diff as needed

You're a bit confused about the difference between "concept" and "implementation". There's multiple ways to conceptualize git, but the one I offered is the one presented by the default porcelain while the one you offered is closer to the implementation. The encoding used for storage is entirely irrelevant to my point.

Anyways, my point had nothing to do with efficiency but was related to the topic at hand; that is, whether it is safe to use git to put code on test machines, and indeed whether that's safer than the alternatives. Nothing you said really addresses that point.


You make it sound like a commit is some final immutable set-in-stone thing and so you need staging area. Commits are only "set in stone" after you share (push) them with others.


You're still missing my point. My preferred workflow is to iteratively put together the pieces I want, and only hit the "commit" button when I'm pretty sure I've got it crafted the way I want. Sure, I _can_ rebase, amend, and rewrite commits after creating them, assuming I haven't pushed them yet, but I'd much rather do it _before_ the commit is created. At first glance, this "gitless" tool doesn't seem to really support that workflow.


Why, though? What's the difference between a staging area that you convert into a commit, and an initially-empty commit that acts as a staging area, that you "commit" by just not modifying it any more, and instead creating a new empty commit (new staging area) to work in?

They're bijective mappings of the same data, with the same actions triggered at the same points in a workflow. The only difference is what you label the actions.

To go further: how do you feel if you think of all your un-pushed commits as simply "a chain of staging areas"? What if "committing" was something that happened at git-push time, and until then, you just created staging-area after staging-area and populated them with patch-chunks? Because this would also be a bijective mapping, involving the same actions in the same places.


My commit workflow goes like this: I run git diff, look at the first file that has changed, and then if it looks good, I add that file. At this point I may notice a stray print() statement or something, and delete it, then hit add. I rinse and repeat until I've read everything that I'm about to commit and verified it's what I want.

I've been using mercurial for a couple months for work, and I miss the git staging area workflow. I dislike just assuming all the modifications I've made are good to go; I think the focus git's staging area steers you toward is really valuable.


Mercurial patch queues are very similar to the index:

http://stevelosh.com/blog/2010/08/a-git-users-guide-to-mercu...


Did you reply to the wrong comment? Your comment seems off topic relative to the parent.

Regardless, your workflow of 'git diff' on staging with 'add's functions the same as 'git diff HEAD~' on a commit with 'commit --amend's.


I think you (and the parent) are totally misunderstanding what I and probably most other people use git's staging area for if you think that git diff HEAD~ does the same thing.

To illustrate a little more explicitly, a typical workflow goes like this:

    git diff
    git add file1
    git diff
    git add file2
    git diff
    ...
    git commit
Now yes, you could replace `git add fileX` with `git commit --amend fileX` but I don't see that as a better interface. There are lots of circumstances where I rewrite history, but if there's a workflow that doesn't involve rewriting history, I'll take that one please.


To make this point a little more directly, and in a similar way to what the article talks about, using "git commit" for this is using the same command for two different things, and makes it easier to make mistakes. Treating the staging area differently from commits means that you'll never accidentally alter a commit that has already been pushed, or accidentally add changes to a commit that you didn't intend them to be in. Yes, you can implement this workflow with commit --amend, but it's nice to have support from the tools to distinguish between "possibly-incomplete changes that I've signed off on as part of preparing a larger change set" and "completed, approved changes that may have been pushed to other people". Yes, you can implement this by committing each change and then later rebasing them, but that's even worse from a perspective of the tools supporting the workflow directly.


I absolutely agree with you. There is good reason to differentiate staging from commits; I was just curious why he seemed to think it couldn't be done.


I am imagining treating the last commit as your staging area. Am I missing something that makes that impossible? Specifically, what is the operation that you can't map from staging to a final commit?


Unstaging, maybe? I use SourceTree, so I don't have to remember the specifics of staging-related commands vs commit-related commands. It's just right-click > "Stage / Unstage File", click "Stage / Unstage Hunk" or shift-click "Stage/Unstage Lines".

Undoing part of a commit to be the equivalent of an "Unstage" action is feasible, I'm sure, but I couldn't tell you off the top of my head exactly what that command would be.
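
For what it's worth, one possible spelling (a sketch, not necessarily the nicest one) is to reset the path in the index to the parent's version and then amend. The file names and the commit message here are made up:

```shell
set -e
repo=$(mktemp -d)   # throwaway repository; every name below is hypothetical
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'a0\n' > a.txt
printf 'b0\n' > b.txt
git add . && git commit -q -m "initial"

printf 'a1\n' > a.txt
printf 'b1\n' > b.txt
git commit -q -a -m "Change a and b"   # this commit plays the role of the staging area

# "Unstage" b.txt from that commit while keeping the edit on disk
git reset -q HEAD~ -- b.txt        # index entry for b.txt goes back to the parent's version
git commit -q --amend --no-edit    # rewrite the commit from the index
```

Since the amend rewrites the commit, this is only safe while the commit has not been pushed anywhere.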

So overall, part of it is that it's a mental model, part of it is that the concept is built right into Git.


Which means that if you amend commits, you need to keep track of whether or not you've shared them with others yet.


Which mercurial does for you (see: phases), but you are right, I forget that git does not help you there.



