
I'm curious about 2 things.

1) Why did you not test the standard Collatz sequence? I would think that including that, as well as testing on Z+, Z+\2Z, and 2Z+ (all positive integers, odd only, and even only), would be a bit more informative, in addition to what you've already done. Even though the even case is just the trivial halving step, it could show how much memorization the network is doing. You do notice the model learns some shortcuts, so I think these tests could help confirm that and diagnose some of the issues.

2) Is there a specific reason for the cross attention?
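
To make the first question concrete, here is a minimal sketch of the kind of split I have in mind. It uses the textbook Collatz map (n/2 for even n, 3n+1 for odd n), not whatever step function the paper trains on, and `predict` is a hypothetical stand-in for the trained model; the function names and the `limit` parameter are my own.

```python
from typing import Callable, Dict, List


def collatz_step(n: int) -> int:
    """One step of the standard Collatz map on a positive integer."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return n // 2 if n % 2 == 0 else 3 * n + 1


def make_test_sets(limit: int) -> Dict[str, List[int]]:
    """Split 1..limit into the three sets: Z+, odd (Z+ \\ 2Z), even (2Z+)."""
    all_pos = list(range(1, limit + 1))
    return {
        "Z+": all_pos,
        "odd": [n for n in all_pos if n % 2 == 1],
        "even": [n for n in all_pos if n % 2 == 0],
    }


def accuracy_by_subset(predict: Callable[[int], int], limit: int) -> Dict[str, float]:
    """Fraction of inputs in each subset where predict(n) equals collatz_step(n).

    `predict` is a hypothetical placeholder for the model under test.
    """
    return {
        name: sum(predict(n) == collatz_step(n) for n in ns) / len(ns)
        for name, ns in make_test_sets(limit).items()
    }
```

For example, a "model" that has only picked up the trivial halving step, `accuracy_by_subset(lambda n: n // 2, 10)`, is perfect on the even set and wrong everywhere on the odd set. Only the per-subset breakdown makes that kind of shortcut visible.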

Regardless, I think it is an interesting paper. (These wouldn't be criteria for rejection were I reviewing your paper, btw, lol. I'm just curious about your thoughts here and trying to understand better.)

FWIW I think the side quest is actually pretty informative here, though I agree it isn't the main point.




