A killer feature would be doing semantic analysis to attach steps to AST nodes instead of lines.
E.g., a class or function.
Without this it seems tours could be broken/outdated very quickly in active code bases.
Do you have any plans for that?
tree-sitter [1] supports parsing a lot of languages and could be a good way to make that happen without too much effort. (as long as items stay in the same file)
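To make the idea concrete, here is a minimal sketch of resolving a step by symbol name instead of by line number. It uses Python's standard `ast` module purely for illustration; it is not tree-sitter and not anything CodeTour is known to do, and `find_symbol_line` is a hypothetical helper.

```python
import ast
from typing import Optional


def find_symbol_line(source: str, symbol: str) -> Optional[int]:
    """Return the 1-based line where a class or function named `symbol` is defined.

    Hypothetical helper: walks the whole tree and returns the first match,
    or None if no such definition exists.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_definition = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        )
        if is_definition and node.name == symbol:
            return node.lineno
    return None


# The same function before and after unrelated lines are added above it.
before = "def greet():\n    return 'hi'\n"
after = "import os\n\n\ndef greet():\n    return 'hi'\n"

print(find_symbol_line(before, "greet"))  # 1
print(find_symbol_line(after, "greet"))   # 4
```

A step stored as "file + symbol name" would keep pointing at `greet` after the edit, whereas a step stored as "file + line 1" would now point at the import.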
Or IMHO this is much better solved with a literate programming approach where the tour content lives inside the code as comments. Then as code is refactored and changed it's very obvious that the related docs and tour data have to change too.
With CodeTour, you can have "content only steps" (introductions, interstitials), steps that speak to directories, and also add steps to files that don't necessarily support comments (e.g. JSON). So there seemed to be some value in having the tour be more flexible than what might be achievable with code comments alone.
Additionally, after speaking with a bunch of folks, there are definitely teams that weren't interested in "polluting" their code with comments that might be tailored to onboarding new team members, and therefore didn't need to be always visible.
That said, I totally agree with the value of a literate programming-based solution. But there may also be some nice properties to a "side car" file as well, and so I'm primarily trying to explore how well we could make that work, in a resilient and easy-to-maintain way. We'll see how it goes!
Not an equivalent of literate programming. Here you can jump to arbitrary files at arbitrary points; in literate programming your code must flow from top to bottom linearly, which is most of the time impossible to achieve.
No, not with Knuth's original ideas for literate programming and cweb. There's an explicit abstraction between the location of prose and the final source code output, and they can be woven in any way the author desires. You can even insert intermediate steps, like pseudocode that shows building up the architecture in small incomplete steps, and all of that is removed entirely from the final source code result. It is actually quite complex, which is why more modern literate programming systems ditch it and basically turn into fancier comments.
I recommend reading up on cweb and its history. The tangling and weaving metadata and prose lives in the code with special fenced blocks. Noweb is a modern version of it: https://github.com/nrnrnr/noweb (but it's not really used or maintained much anymore)
But many developers might be very much against littering code with such tour comments, and be very much opposed to any kind of responsibility of updating tours during refactoring.
That doesn't really answer the question posed though--how does a tour stay up to date as code is changed? Does someone have to go in constantly and keep it up to date, pointing at the right spots, etc? It sounds like an incredible burden without dedicated staff or time to maintain--i.e. this is fine for projects established enough to have technical writers, evangelists, etc. but for 99% of projects it's just more burden and burnout. It's not really something you can farm out to your community or first time contributors either as deep analysis and understanding of a codebase takes real time and effort from the core devs.
I originally added the Git ref solution, as a simple way to enable "resilient playback" for some scenarios. In practice, that seems to work pretty well for many folks. But I agree that this isn't a full solution to the problem of code churn. I'm working on an enhancement right now, that will attach the steps to code in a more robust way.
In general, I've seen a pretty great reaction from folks about the concept of CodeTour, and so I'm very focused on making them maintainable, since I believe that's the "big rock" needed to make them a worthy investment for more teams.
Pointing to git refs should be sufficient for code that's not super volatile.
I personally envision using it for onboarding new developers to a code base, in which case I think being on an older ref should be fine, since I'm just trying to show the general structure of a project.
I could also see using it in a code review context, in which case pointing it at the branch would also be fine.
Also, if you look at the schema it generates for a tour, it would be pretty easy to go through and update the line numbers directly in the JSON.
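As a rough sketch of what that could look like, assume a tour file has a top-level `steps` list whose entries carry `file`, `line`, and `description` keys (check the actual schema before relying on this). `shift_steps` and the sample tour below are hypothetical.

```python
import json


def shift_steps(tour: dict, file: str, start_line: int, offset: int) -> dict:
    """Return a copy of `tour` with steps in `file` at or after `start_line`
    moved by `offset` lines. Steps without a file or line are left alone."""
    steps = []
    for step in tour.get("steps", []):
        step = dict(step)  # copy so the input tour is not mutated
        if step.get("file") == file and step.get("line", 0) >= start_line:
            step["line"] += offset
        steps.append(step)
    return {**tour, "steps": steps}


# Assumed tour layout; the field names are an assumption, not a confirmed schema.
tour = json.loads("""
{
  "title": "Example tour",
  "steps": [
    {"file": "src/app.py", "line": 3, "description": "Entry point"},
    {"file": "src/app.py", "line": 40, "description": "Request handler"},
    {"description": "A content-only step"}
  ]
}
""")

# Five lines were inserted at line 10 of src/app.py, so later steps move down.
updated = shift_steps(tour, "src/app.py", start_line=10, offset=5)
print(json.dumps(updated, indent=2))
```

This only handles the simple case of a known insertion or deletion; it does nothing for code that was rewritten or moved to another file.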
Sorry for the naive question but how is user privacy handled? Is data about the repo sent externally? I assume this would primarily be useful for OSS projects and not private or internal/confidential projects?
Hey! When you record a tour, it simply creates a JSON file that can be committed/maintained as part of the associated codebase. Then, when someone takes the tour later, they're simply "reading" that file locally in their editor, and so no data about the codebase is ever sent externally.
I tried to record one, and after it was done it seems the order was messed up: some steps from the beginning were pushed to the end. Not sure if I can reproduce, but thought you might find it useful to know.
oh wow, what an amazing concept! i hope to see it evolve beyond vs code, like the language server concept.
the first thing that came to mind are walkthroughs by original authors. in fact, i recently downloaded the source code for the first IRC server/client by the creator of the protocol. i could use a walkthrough. the c code is quite old and nothing online helps you understand it.
this could help explain old code bases like the first unixes or the first c compilers. or maybe we can get id software people to do walkthroughs for doom, quake, etc.
I really like the idea of CodeTour, but unfortunately I'm the only person at the office that uses VS Code.
I'm hoping that some day it'll be possible for my coworkers to follow a CodeTour without installing VS Code.
Either when GitHub Codespaces reaches general availability or if I ever find the time and motivation to learn enough about github1s [1] and CodeTour to integrate the two.
This is a fantastic idea. The concept of code tours is something I've been trying to push for via thorough readmes combined with example PRs to show users how to actually do something but this takes it to another level that I'm very happy to see.
If developers don't read documentation these days, even if you point to exactly where there is a solution to their problem, are they going to watch animations that are too long for their attention span to contain?
In most projects, outside of publicly published interfaces (and even that is a stretch), documentation is incomplete, badly written and not updated thus misleading. Having a tool that links directly to the code is a great idea.
Great idea! I've been thinking about creating a repo that could store tours for other repos. That way, the community could contribute them, without needing to check them into the target repo itself. Similar-ish to the definitelytyped ecosystem that TypeScript built up.
But annotations are linked to lines. As soon as one line changes (e.g., its content is no longer on line N but on line N+M), the annotation is meaningless. So tours would need constant updates (otherwise tours would work for fixed versions of the repo only).
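For the N to N+M case specifically, one possible mitigation is to diff the old and new file contents and carry each annotated line across. Below is a minimal sketch using Python's standard `difflib`; `remap_line` is a hypothetical helper, and nothing here is claimed to be how CodeTour works.

```python
import difflib
from typing import List, Optional


def remap_line(old_lines: List[str], new_lines: List[str], old_line: int) -> Optional[int]:
    """Map a 1-based line number in the old file to the new file.

    Returns None when the line falls outside every unchanged block,
    i.e. it was edited or deleted and the annotation needs a human.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    index = old_line - 1
    for block in matcher.get_matching_blocks():
        if block.a <= index < block.a + block.size:
            return block.b + (index - block.a) + 1
    return None


old = ["import os", "", "def main():", "    run()"]
new = ["import os", "import sys", "", "def main():", "    run()"]

# The annotation on "def main():" moves from line 3 to line 4.
print(remap_line(old, new, 3))  # 4
```

Lines that were actually rewritten still come back as `None`, so this reduces the maintenance burden rather than removing it.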
For vim I use the CodeReviewer plugin [1] to capture comments for files. I modified it to use differently named comment files and to jump to the file locations from the comment line. I am planning to include support for hierarchical comments or referring to other comment files from within one. Also, the problem of tracking code changes or referring to a certain git commit is still there. In clearcase it can be easily done with a reference to the branch, but I'm not sure how to do it in git.