I've been participating in AISHub for a few years and really enjoy it. One of my projects is @sfships, a Twitter bot that announces major ships entering and leaving SF Bay: https://twitter.com/sfships
Ships like the ones you describe are also often weirder in what they transmit. Cargo and cruise ships go from known port to known port in reasonably predictable ways. (Reasonably predictable meaning that 50 or so regexes can usually extract a little sense from what some sailor types into a bridge console.) But a lot of the data from smaller, less predictable ships is much less regular.
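To give a flavor of the regex approach, here is a minimal sketch. It is not the bot's actual code: the three patterns, the port names, and the guess_destination function are made up for illustration, and a real table would be much longer and uglier.

```python
import re

# Hypothetical pattern table: each regex maps free-text destination
# entries to one canonical port name. Illustrative only.
PATTERNS = [
    (re.compile(r"^\s*US\s*OAK\b|\bOAKLAND\b", re.I), "Oakland"),
    (re.compile(r"^\s*US\s*SFO\b|\bS(AN)?\.?\s*FRAN(CISCO)?\b", re.I), "San Francisco"),
    (re.compile(r"^\s*US\s*LGB\b|\bLONG\s*BEACH\b", re.I), "Long Beach"),
]


def guess_destination(raw):
    """Return a canonical port name for a raw destination string, or None."""
    # AIS text fields are commonly padded with '@', so treat it as whitespace.
    text = raw.replace("@", " ").strip()
    for pattern, name in PATTERNS:
        if pattern.search(text):
            return name
    return None  # nothing recognizable, which is common for smaller vessels
```

The first matching pattern wins, so more specific patterns should go earlier in the list. Anything that falls through to None is exactly the irregular data I mean.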
Now that it has been running a while, I should definitely go back and see what else I can extract from the data. But one of my problems is that this stuff is poorly documented. What I really need is connections to maritime experts who can look at the data and say, "Oh, that ship is..."
I just recorded everything. There are some gaps, but I have something like 3 years at 50 million AIS lines per day. I keep 90 days locally, and then throw the rest in S3.
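The retention rule is simple enough to sketch. This assumes one log file per day, identified by its date, which is my simplification for the example; split_by_retention is a made-up helper, and the actual upload to S3 is left out.

```python
from datetime import date, timedelta


def split_by_retention(file_dates, today, keep_days=90):
    """Split daily log dates into (keep_locally, move_to_archive)."""
    # Anything older than the cutoff is a candidate for archiving.
    cutoff = today - timedelta(days=keep_days)
    keep = sorted(d for d in file_dates if d >= cutoff)
    archive = sorted(d for d in file_dates if d < cutoff)
    return keep, archive


if __name__ == "__main__":
    # Example: 120 days of hypothetical daily files ending today.
    today = date.today()
    days = [today - timedelta(days=n) for n in range(120)]
    keep, archive = split_by_retention(days, today)
    print(len(keep), "kept locally;", len(archive), "to archive")
```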
I should say this isn't all AIS data, just everything that comes through this particular network of receivers. For truly global coverage, there are satellite AIS receivers, but I don't have access to those.
I only have one receiver myself, on the north edge of SF, but I'm happy to set up ADS-B if you aren't covered here yet. Contact me via email or Twitter if you'd like.
I also built a Python AIS parsing library with a bunch of command-line tools (e.g., aisgrep, ais2json): https://github.com/wpietri/simpleais
If anybody ends up using this, please let me know. On Twitter, I'm @williampietri. I have also been keeping a copy of all the AIS data for the last few years, and am happy to share it.
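For anyone who hasn't looked at raw AIS: each line is an NMEA sentence that starts with "!" and ends with "*" plus a two-digit hex checksum, which is the XOR of every character between the two. Here is a standalone sketch of that check. It deliberately does not use simpleais, and the function names are invented for this example, so see the repo for what the library really offers.

```python
def nmea_checksum(body):
    """XOR together the character codes of the sentence body."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return value


def has_valid_checksum(sentence):
    """Check the trailing *HH checksum of one NMEA/AIS line."""
    sentence = sentence.strip()
    if not sentence or sentence[0] not in "!$" or "*" not in sentence:
        return False
    # The body is everything after the leading '!' or '$' and before the last '*'.
    body, _, given = sentence[1:].rpartition("*")
    try:
        return nmea_checksum(body) == int(given[:2], 16)
    except ValueError:
        return False  # checksum field missing or not hex


if __name__ == "__main__":
    # Build a sentence with a made-up payload and confirm it round-trips.
    body = "AIVDM,1,1,,A,EXAMPLE,0"
    line = "!%s*%02X" % (body, nmea_checksum(body))
    print(line, has_valid_checksum(line))
```

With receivers of varying quality feeding a shared network, a cheap check like this is a reasonable first filter before trying to decode the payload.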