That's often the case, but far from a steadfast rule. Consider:
1. Sally bought Anne a heifer, but she didn't buy her a dog.
2. Sally bought Anne a heifer, but she hadn't asked for one.
3. Sally bought Anne a heifer, but she didn't produce any milk.
4. Because she bought Anne a heifer, Sally went home happy.
In 1, 2, and 3, she = Sally, Anne, and the heifer, respectively. In 4, she = Sally, which is to the right.
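The linguistic phenomenon of a pronoun referring back to another expression is "anaphora" (strictly speaking, the forward-pointing case in 4 is "cataphora"), and anaphora resolution is a tricky topic in NLP.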
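To make the point concrete, here is a toy sketch of the kind of rule people often assume: pick the closest candidate mentioned before the pronoun. Everything in it is hypothetical and for illustration only (the `nearest_preceding` helper, the hard-coded `CANDIDATES` list, the crude whitespace tokenizing); real resolvers are far more involved.

```python
# Toy illustration only: a hypothetical "nearest preceding candidate" rule.
# The candidate list is hard-coded; a real system would have to find mentions itself.
CANDIDATES = ("Sally", "Anne", "heifer")

# Each entry pairs a sentence with the referent a human reader would pick for "she".
SENTENCES = [
    ("Sally bought Anne a heifer, but she didn't buy her a dog.", "Sally"),
    ("Sally bought Anne a heifer, but she hadn't asked for one.", "Anne"),
    ("Sally bought Anne a heifer, but she didn't produce any milk.", "heifer"),
    ("Because she bought Anne a heifer, Sally went home happy.", "Sally"),
]


def nearest_preceding(sentence, pronoun="she", candidates=CANDIDATES):
    """Return the last candidate appearing before the pronoun, or None."""
    # Crude tokenizing: split on whitespace, drop trailing commas and periods.
    words = [w.strip(".,").lower() for w in sentence.split()]
    if pronoun not in words:
        return None
    position = words.index(pronoun)
    lookup = {c.lower(): c for c in candidates}
    guess = None
    for word in words[:position]:
        if word in lookup:
            guess = lookup[word]  # later matches overwrite earlier ones
    return guess


guesses = [nearest_preceding(sentence) for sentence, _ in SENTENCES]
hits = sum(guess == intended for guess, (_, intended) in zip(guesses, SENTENCES))

for (sentence, intended), guess in zip(SENTENCES, guesses):
    print(f"guess={guess!r:10} intended={intended!r:10} {sentence}")
```

The rule guesses "heifer" for the first three sentences and finds nothing at all for the fourth, so it only gets sentence 3 right. Telling the others apart takes knowledge about who buys things, who asks for things, and what produces milk, which is exactly what makes the problem hard.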
I suppose, on reflection, this could be a journalistic convention rather than a grammatical rule -- while each of those is seemingly a correct sentence, they certainly don't match the conventions I'm used to.
Journalistic style tends to favor unambiguous sentence construction over (arguably) more elegant construction. Among other things, this generally leads to, for example, repeating nouns rather than relying on people to decode which noun a given pronoun is referring to. Not that this is bad style in general. My advice is that whenever a sentence requires complex parsing of punctuation or whatever to work out its meaning, it should be rewritten.
Man it's annoying that HN doesn't let you see context when replying.
1. Sally bought Anne a heifer, but not a dog.
3. Sally bought Anne a heifer, but it didn't produce any milk.
The other two adhere mostly to convention (I'm okay with #4).
I always favor this style because it's easier for the person you're trying to communicate with to understand you, and that's more important than really anything else.
Do you know of any experiments where machine learning was used to build an algorithm that learned anaphora resolution from a human-generated training set? I'd love to read any articles/papers you have on this. Thanks!