Hacker News | sgillen's comments

I've only been playing with it recently ... I have mine scraping for SF city meetings that I can attend and give public comment at, to advocate for more housing etc. (https://github.com/sgillen/sf-civic-digest).

I also have mine automatically grab a spot at my gym when spots are released, because I always forget.

I'm just playing with it, it's been fun! It's all on a VM in the cloud and I assume it could get pwned at any time but the blast radius would be small.
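A gym-spot grabber like that can be a small polling loop. Below is a minimal sketch, assuming hypothetical `list_spots`/`book_spot` callables standing in for whatever scraping or API calls a given gym actually needs (all names invented here):

```python
import time

def grab_spot(list_spots, book_spot, class_name, poll_seconds=30, max_polls=1):
    """Poll for open spots and book the first match.

    list_spots() -> list of dicts like {"class": ..., "open": bool, "id": ...}
    book_spot(spot_id) -> bool (True if the booking succeeded)
    Both are injected so the site-specific scraping/API details stay swappable.
    """
    for _ in range(max_polls):
        for spot in list_spots():
            if spot["class"] == class_name and spot["open"]:
                if book_spot(spot["id"]):
                    return spot["id"]
        time.sleep(poll_seconds)
    return None
```

The injected callables also make the loop trivially testable with fakes before pointing it at a real booking page.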


>I also have mine automatically grab a spot at my gym when spots are released, because I always forget.

seems far more efficient/reliable to get codex/claude code to write and set up a bot that does this.


>set up a bot that does this

But he already did this, with the bonus that it will continue to work in the future if something breaks or changes. Human time is more precious than computing resources nowadays.


> seems far more efficient/reliable to get codex/claude code to write and set up a bot that does this.

I think Simon Willison said it best some weeks ago: He's capable of writing a bot like this - both before and after LLMs came on the scene. However, the reality is he never wrote one, despite wanting to many times.

Yet in just 2-3 weeks of using OpenClaw[1], I did this a few times.

Recall a year or so ago in the early days of vibe coding when people kept saying "I don't need AI to write code. It does a crap job and I can do it myself. Who needs LLMs to do it?" - You'd get lots of people countering with "Oh, in a few weeks I've written lots of automations that I'd been thinking about for months/years - that I likely would never have written without AI coding tools".

The key is the lower barrier to producing something. OpenClaw is to using CC to write that bot as using CC was to writing code by hand. I can be doing work, shopping, etc and when an idea pops into my head, I casually send a note to my Claw instance (voice or text) asking it to look into it or try making it. It doesn't do a great job, but the expectations of success are similarly low. But when it does do precisely what you need it to: Oh boy, you're happy that it saved you time, etc.

[1] I no longer run it, for very boring reasons.


[flagged]


No? The comment was admittedly ambiguous, but if you go to the repo it's far clearer:

>I use it to give me a weekly digest of what happened in my neighborhood and if there are any public hearings or trash pickups I might want to attend.


that does not seem like something you need an 'autonomous' agent for.

What would you propose as an alternative?

Anything not relying on an LLM likely means having to write bespoke scripts. That's not really worth the time, especially when you want summaries rather than having to skim things yourself.

Going from doing it manually on a regular basis to an autonomous agent turns a frequent 5-15 minute task into a 30 second one.


> Anything not relying on an LLM likely means having to write bespoke scripts.

The very first line of your README is "CivicClaw is a set of scripts and prompts", though? And almost the entire repo is a bunch of Python scripts under a /scripts folder.

I looked at one randomly chosen script (scripts/sf_rec_park.py) and it's 549 lines of Python to fetch and summarise data that is available on an RSS feed ( https://sanfrancisco.granicus.com/ViewPublisher.php?view_id=... )
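For scale: pulling titles and links out of an RSS feed like that one takes only a few lines of stdlib Python. A rough sketch (parsing from a string here; fetching the live feed would add a `urllib.request` call):

```python
import xml.etree.ElementTree as ET

def parse_rss_items(rss_xml):
    """Return (title, link) pairs from a standard RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.iter("item")
    ]
```

Summarisation on top of that is where an LLM call would earn its keep; the fetching and parsing itself doesn't need 549 lines.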


Parent isn't saying that bespoke scripts are bad, just that it's not worth their time to write them. The value of the bot is that it can do that for you.

They've created a public bulletin board for themselves, like a café's blackboard, or a city telephone pole.

I use it all the time now, switching between claude code, codex, and cursor. I prefer CC and codex for now but everyone is copying everyone else's homework.

I do a lot of greenfield, research-adjacent work, or work directly with messy code from our researchers. It's been excellent at building small tools from scratch, and for essentially brute-forcing undocumented code. I can give it a prompt like "Here is this code we got from research, the docs are 3 months out of date and don't work, keep trying things until you manage to get $THING running".

Even for more production and engineering related tasks I'm finding it speeds up velocity. But my engineering is still closer to greenfield than a lot of people here.

I do, however, feel less connected to the code. Even when reviewing thoroughly, I feel like I internalize things at a high level rather than knowing every implementation detail off the dome.

The other downside is I get bigger and more frequent code review requests from colleagues. No one is just handing me straight-up slop (yet...)


To be fair to the agent...

I think there is some behind-the-scenes prompting from claude code (or opencode, whichever is being used here) for plan vs build mode; you can even see the agent reference it in its thought trace. Basically I think the system is saying "if in plan mode, continue planning and asking questions; when in build mode, start implementing the plan", and it looks to me(?) like the user switched from plan to build mode and then sent "no".

From our perspective it's very funny; from the agent's perspective, maybe it's confusing. To me this seems more like a harness problem than a model problem.


Asking a yes/no question implies the ability to handle either choice.


This is a perfect example of why I'm not in any rush to do things agentically. Double-checking LLM-generated code is fraught enough one step at a time, but it's usually close enough that it can be course-corrected with light supervision. That calculus changes entirely when the automated version of the supervision fails catastrophically a non-trivial percent of the time.


To an LLM, answering “no” and changing the mode of the chat window are discrete events that are not necessarily related.

Many coding agents interpret mode changes as expressions of intent; Cline, for example, does not even ask, the only approval workflow is changing from plan mode to execute mode.

So while this is definitely both humorous and annoying, and potentially hazardous based on your workflow, I don’t completely blame the agent because from its point of view, the user gave it mixed signals.


Yeah but why should I care? That’s not how consent works. A million yesses and a single no still evaluates to a hard no.


The point is that if the harness’ workflow gives contradictory and confusing instructions to the model, it’s a harness issue, not necessarily a model issue.


First it was a model issue, then it was a prompting issue, then it was a context issue, then it was an agent issue, now it's a harness issue. AI advocates keep accusing AI skeptics of moving goalposts. But it seems like every 3-6 months another goalpost is added.


Your comment doesn’t make as strong of a point as you think it does; it might make the opposite point.

Because, yes, first, it was a model issue, and then more advanced models started appearing and prompting them correctly became more important. Then models learned through RLHF to deal with vague prompting better, and context management became more important. Then models became better (though not great) at inherent context recollection and attention distribution, so now, you need to be careful what instructions a model receives and at what points because it’s literally better at following them. It’s not so much that the goalposts are being moved, it’s that they’re literally being, like, *cleared*.

This isn’t a tech that’s already fully explored and we just need to make it good now, it’s effectively an entirely new field of computing. When ChatGPT came out years ago no one would have DREAMT of an LLM ever autonomously using CLI tools to write entire projects worth of code off of a single text prompt. We’d only just figured out how to turn them into proper chatbots. The point is that we have no idea where the ceiling is right now, so demanding well-defined goalposts is like saying we need to have a full geological map of Mars before we can set foot on it, when part of the point of going to Mars is to find out about that.

As a side point, the agent is the harness; or, rather, an agent is a model called on a loop, and the harness is where that loop lives (and where it can be influenced/stopped). So what I can say about most - not all, but most, including you, seemingly - AI skeptics is that they tend to not actually be particularly up-to-date and/or engaged with how these systems actually work and how capable they actually are at this point. Which is not supposed to be a dig or shade, because I’m pretty sure we’ve never had any tech move this fast before. But the general public is so woefully underinformed about this. I’ve recently had someone tell me in awe about how ChatGPT was able to read their handwritten note and solve a few math equations.


Not when you're talking with humans, not really. Which is one of the reasons I got into computing in the first place, dangit!


But I think if you sit down and really consider the implications of it and what yes or no actually means in reality, or even an overabundance of caution causing extraneous information to confuse the issue enough that you don't realise that this sentence is completely irrelevant to the problem at hand and could be inserted by a third party, yet the AI is the only one to see it. I agree.


It's meant as a "yes"/"instead, do ..." question. When it presents you with the multiple-choice UI at that point, it should be the version where you either confirm (with/without auto edit, with/without context clear) or give feedback on the plan. Just telling it "no" doesn't give the model anything actionable to do.


It can terminate the current plan where it's at until given a new prompt, or move to the next item on its todo list /shrug


It definitely _could be_ an agent harness issue. For example, this is the logic opencode uses:

1. Agent is "plan" -> inject PROMPT_PLAN

2. Agent is "build" AND a previous assistant message was from "plan" -> inject BUILD_SWITCH

3. Otherwise -> nothing injected

And these are the prompts used for the above.

PROMPT_PLAN: https://github.com/anomalyco/opencode/blob/dev/packages/open...

BUILD_SWITCH: https://github.com/anomalyco/opencode/blob/dev/packages/open...

Specifically, it has the following lines:

> You are permitted to make file changes, run shell commands, and utilize your arsenal of tools as needed.

I feel like that's probably enough to cause an LLM to change its behavior.
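The three rules above amount to a small dispatch function; here is a sketch of that logic (names and prompt text paraphrased, not the real opencode implementation):

```python
PROMPT_PLAN = "You are in plan mode: read and analyze, but do not modify files."
BUILD_SWITCH = ("The user has switched you to build mode. You are permitted to "
                "make file changes, run shell commands, and utilize your tools.")

def system_injection(mode, prior_assistant_modes):
    """Decide which extra system prompt (if any) the harness injects.

    mode: the agent's current mode, "plan" or "build".
    prior_assistant_modes: modes of earlier assistant messages in the session.
    """
    if mode == "plan":
        return PROMPT_PLAN                      # rule 1
    if mode == "build" and "plan" in prior_assistant_modes:
        return BUILD_SWITCH                     # rule 2
    return None                                 # rule 3: nothing injected
```

Note that under rule 2 the BUILD_SWITCH text lands in context at the same moment as the user's "no", which is exactly the mixed signal described above.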


There is the link to the full session below.

https://news.ycombinator.com/item?id=47357042#47357656


Do we know if thinking was on high effort? I've found it sometimes overthinks on high, so I tend to run on medium.


it was on "max"


If we’re in a shoot first and ask questions later kind of mood and we’re just mowing down zombies (the slow kind) and for whatever reason you point to one and ask if you should shoot it… and I say no… you don’t shoot it!


This is probably just OpenCode nonsense. After prompting in "plan mode", the models will frequently ask you if you want to implement that, then if you don't switch into "build mode", it will waste five minutes trying but failing to "build" with equally nonsense behavior.

Honestly OpenCode is such a disappointment. Like their bewildering choice to enable random formatters by default; you couldn't come up with a better plan to sabotage models and send them into "I need to figure out what my change is to commit" brainrot loops.


This. The models struggle with differentiating tool responses from user messages.

The trouble is these are language models with only a veneer of RL that gives them awareness of the user turn. They have very little pretraining on the idea of being in the head of a computer with different people and systems talking to you at once; there's more that needs to go on than eliciting a pre-learned persona.


The whole idea of just sending "no" to an LLM without additional context is kind of silly. It's smart enough to know that if you just didn't want it to proceed, you would just not respond to it.

The fact that you responded to it tells it that it should do something, and so it looks for additional context (for the build mode change) to decide what to do.


I agree the idea of just sending "no" to an LLM without any task for it to do is silly. It doesn't need to know that I don't want it to implement it, it's not waiting for an answer.

It's not smart enough to know you would just not respond to it, not even close. It's been trained to do tasks in response to prompts, not to just be like "k, cool", which is probably the cause of this (egregious) error.


> It's smart enough to know that if you just didn't want it to proceed, you would just not respond to it.

No it absolutely is not. It doesn't "know" anything when it's not responding to a prompt. It's not consciously sitting there waiting for you to reply.


I didn't mean to imply that it was. But when you reply to it, if you just say "no" then it's aware that you could've just not responded, and that normally you would never respond to it unless you were asking for something more.

It just doesn't make any sense to respond "no" in this situation, so it confuses the LLM, which then looks for more context.


> it's aware that you could've just not responded

It's not aware of anything and doesn't know that a world outside the context window exists.


No, it has knowledge of what it is and how it is used.

I'm guessing you and the other guy are taking issue with the words "aware of" when I'm just saying it has knowledge of these things. Awareness doesn't have to imply a continual conscious state.


I think to many people awareness does imply consciousness, i.e. the thing that is aware of the knowledge.


Meh I looked up the definition:

"having knowledge or perception of a situation or fact."

They do have knowledge of the info, but they don't have perception of it.



Can you say more about the nav stack? I thought nav2 was considered one of the better, more mature packages in ROS2, but it's not my area of expertise.

> As robotics moves toward end-to-end AI systems, stuff needs to stay on GPU memory, not shuttled back and forth across processes through a networking stack.

NVIDIA actually is addressing this with NITROS: https://nvidia-isaac-ros.github.io/concepts/nitros/index.htm...

And ROS native buffers: https://discourse.openrobotics.org/t/update-on-ros-native-bu...



Very interesting. There is nothing that would prevent PeppyOS nodes from running on the GPU. The messaging tech behind PeppyOS is Zenoh (it's swappable), and it can run on embedded systems (PeppyOS nodes will also be compatible with embedded targets in the future). That being said, at the moment the messaging system runs exclusively on the CPU.


In my experience a lot of tech companies, at least in the Bay Area, have all copied this system.


Not sure I agree with the Christian references being incidental ... the first book is literally a retelling of The Canterbury Tales: all the characters are on a pilgrimage, there are a bunch of religious groups with at least one being central to the story, and there are cross-shaped parasites that grant eternal life.

I still think you can enjoy it without caring much about religion.


>there are cross shaped parasites that grant eternal life

Without giving away any spoilers to the books, the parasites are only that on the surface. If anything, the books present a wary picture of religion, especially the last two Endymion books, but also a wary picture of technology.


> the first book is literally a retelling of the The Canterbury Tales, all the characters are on a pilgrimage.

As we have both read the books, it's notable that you associate pilgrimage with Christianity. This illustrates the point.


This is very interesting because I see a lot of AI detractors point to the original study as proof that AI is overhyped and nothing to worry about. In this new study the findings are essentially reversed (20% slowdown to 20% speedup).


I think their old findings were hard to treat as gospel just due to the kind of comparison + the sample, but this new result is probably much noisier.

It’s hard to make reliable, directional assumptions about the kind of self-selection and refusal they saw, even without worrying about the reward dropping 66%.


FWIW I think the interesting part about the original study wasn't so much the slowdown part, but the discrepancy between perceived and measured speedup/slowdown (which is the part I used to bring up frequently when talking to other devs).


AI detractors loved that previous study so much. It seems to have been brought up in the majority of conversations about AI productivity over the past six months.

(Notable to me was how few other studies they cited, which I think is because studies showing AI productivity loss are quite uncommon.)


Or maybe there’s just not that many good studies, period?

A lot of them barely rise above the level of collected anecdote, nor explore long term or more elusive factors (such as cross-system entropy). They’re also targeting an area that is fairly difficult to measure and control for.


not enough people look at the slope, just the coords


The study was designed to have devs who are comfortable with AI perform 50% of tasks with AI and 50% without. So the problem is the population of "Developers who use AI regularly but are willing to do tasks without AI" is shrinking.

>> Are they worried that by splitting devs into groups of AI experience they might be measuring some confounder that causes people to choose AI / not AI in their careers?

The developer sample size was small (16 people in the original study) while the task sample size was larger (~250 tasks). I think the worry is that variance in developer productivity would totally wash out any signal.
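The wash-out worry is essentially statistical: with only 16 developers, between-developer variation in baseline speed can dwarf a 20% treatment effect unless tasks are paired within each developer. A toy illustration with invented numbers:

```python
import random

def simulate_tasks(n_devs=16, effect=0.8, seed=0):
    """Toy model: each developer has a personal baseline task time;
    AI multiplies it by `effect`, and each task adds +/-20% noise."""
    rng = random.Random(seed)
    baselines = [rng.uniform(10, 60) for _ in range(n_devs)]  # minutes; huge spread
    no_ai = [b * rng.uniform(0.8, 1.2) for b in baselines]
    with_ai = [b * effect * rng.uniform(0.8, 1.2) for b in baselines]
    return no_ai, with_ai

no_ai, with_ai = simulate_tasks()
# Within-developer ratios stay near the true effect (0.8) despite the
# 10-60 minute spread in baselines; comparing unpaired group means
# across developers would mostly measure who got the fast developers.
ratios = [a / n for a, n in zip(with_ai, no_ai)]
```

With baselines spanning 10-60 minutes, the between-developer spread alone is several times the size of the effect being measured, which is why the within-developer pairing matters.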


An alternative hypothesis might be "Developers who consistently use AI become unable to work without AI". It used to be well known that after a year or two away from writing code, a new manager would be a much worse dev than previously. Is a similar sort of skill shift happening? If we raise a cohort of new devs who never work without AI, do they never gain the ability?


Very cool! Shameless self-promotion, but check out greenwave-monitor[1] for the 'Diagnostics TUI'. I'll get it into the buildfarm soon.

[1] https://github.com/NVIDIA-ISAAC-ROS/greenwave_monitor


Nice, thanks! Looks like a good one.

