Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> Then you need to start passing options and configuration into your helper method... and before long your helper method is extremely difficult to reason about

In which case, you should split the helper function ( extract sub-part common to all cases, and report the differences where the helper function is called).

I think I would most of the time go with de-duplicating as early as possible, as long as the helper function only has few parameters and can be described in plain english rather easily.

To me the cost of having to refactor the helper function later in the process is less than dealing with duplicated code.

Duplicated code causes many issues. You mentioned introducing bugs, but it also makes the code harder to read. Every person who reads the code has to make the de-duplication effort mentally (check that the duplicated parts are indeed duplicated, figure out what they do, and on what parameters they differ...).



Premature optimization. Duplicated code is only evil when there's a bug you only fix in one place and forget about the duplicates; in almost every other case, it's easier to reason about and is more resilient in the face of local changes.

Abstraction is often like compression, and compressed data is easier to corrupt. Change the implementation of an abstraction, and you put all consumers of it at risk. It's not an absolute good.


> it's easier to reason about

Consider you have 4 times a block of 10 lines of code, they are identical except for a couple of parameters. The person who reads the code has to 1. figure out what the code does 2. see if the duplicated parts differ in some subtle way.

The alternative is to replace the duplicated parts with a function that has a meaningful name. This makes the code easier to read. It's not a premature optimization. It would make sense to keep it duplicated if you're in an explorative phase and not sure yet what the final design will be, but I wouldn't submit a patch where parts are obviously similar. I'm not sure it would pass review.


It really depends. For test setup, I'm inclined to leave alone; duplication is often easier to reason about when a test breaks. For code with control flow changes in the middle, abstraction is honestly dubious. For mere value differences, maybe, but if the values are complex and not merely scalar, maybe not. More duplication is needed to justify when the abstraction needs more edge cases to cover them all, especially control flow more than different values.


I generally find it pretty easy to reason about code structured like:

switch(object)

type1:

(bunch of code)

type2:

(bunch of code)

type3:

(bunch of code)

etc...

Even if the function is long it's pretty easy to skip over the irrelevant parts.

When you get in trouble is when you discover a bug (or have changed requirements) in something that gets duplicated several times and have to remember to hit all of them. The last one especially, it's the one that seems to be missed the most often.

Overall the tradeoff is generally worth it though, because you only need to care about one case at a time.


> Even if the function is long it's pretty easy to skip over the irrelevant parts.

> Overall the tradeoff is generally worth it though, because you only need to care about one case at a time.

Which part is irrelevant? As a programmer, I don't generally know which value `object` has, so if I need to understand the whole statement, I need to look at every case, so I often need to check whether they are identical or slightly different.

Duplicate code like this is a well-known source of bugs, one of the cases most often highlighted by static analysis tools.


It kind of depends what you're doing, but a lot of the time you are in a situation like "there's a bug when you do X", and when you find code like this you scroll through the options until you get to the one labeled as function X.

Inside of there you may run into stuff that was set up earlier in the function, but it's pretty easy to backtrack to the top of the switch statement to see what was done before then.

There's nothing quite like the joy of being asked to fix something and discovering that it's a big old top down project without excessive abstraction/compartmentalization. Just skim through the code in order till you get to the part that's misbehaving and start investigating. No need to jump around a dozen factory template engines or custom generic template wrappers to find the 3 lines of business logic that have an off-by-one error or something.


I am such a huge fan of Big 'Ol Switch Statements over using polymorphism/types/generics/whatever. So easy to understand, and it's all there in a huge scrolling list of cases. If something needs to change, it's easy to change it, and you know what else is affected.


That works until you have 20 different Big Ol' Switch Statements, each switching on (mostly) the same cases, essentially implementing a set of related behaviors in 20 different places instead of grouping the 20 behaviors under the same umbrella.

Overall, I think there is an equilibrium between the number of cases in the switch and the number of different switches with the mostly the same cases. The fewer cases ypu have and the more times you handle the same cases, the more it will help to group these different behaviors in separate places: each case would correspond to a different class, each switch statement to a different method in each class. The fewer classes and the larger they are, the more I think it helps to apply this transformation.

By the way, this has nothing to do with the discussion above. The alternative to the switch that GP commenter presented isn't polymorphism, it is simply extracting the common lines into a separate function.


you're entirely right, of course. I just wanted to express my enthusiasm for the approach :)


> Duplicated code causes many issues. You mentioned introducing bugs, but it also makes the code harder to read

That is I believe contestable. Yes, it can end up easier to read but there is a big tradeoff - when you remove code from its context it is much harder for a reader to reason about it. Once something is extracted into a function you can't see, you have to mentally ask the questions like "could this return null / None?", "how will it behave if the input is negative?", "does it throw exceptions", "is it thread safe?" etc etc. All this is directly observable in context when the code is inline, not so when it is removed to a central place.


Ideally, these questions should be answered by the function name, type, and its documentation. And if not, one can always jump to the function definition and the above elements (missing if the piece of code is inlined) will give additional hints to what the code does.


Upvoted. We'd probably come up with different implementations, I strongly prefer composition and "interpreter" style code, but whenever I've seen casual mutations in different dataflows, it was because of doing one off changes while ignoring the whole.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: