Since the results of each operation can be manually annotated (helpful / not helpful), maybe the intention of this project is more to use the data for a human feedback loop? In that case, such fine-grained, per-operation data is of course better.
Other than that, it might also be useful for the user when either of the two operations is not good enough yet: letting errors from the first operation propagate into the second might be worse than having a human correct the output between the two.
Seems like it would make sense to run both at the same time.