Hacker News

The number of developers I've met who will just download, compile, and run stuff from GitHub the same way they would closed-source software, i.e. paying no attention to the fact that the source is available for inspection, is surprisingly large.


I think it is worse than that.

I think being on GitHub (and seemingly open source) gives developers a false sense of security: they assume that because the code is open, it must be community-vetted and the developer has nothing to hide.

I suspect people who would know not to download and run a random binary off the internet would still download, compile, and run projects from GitHub.


But, truly, what is the solution?

I mean, you can use static analysis or similar, but you generally can't check every line of code for every open source lib you pull in, let alone its dependencies.

Seems that, once you decide to use open source, you are actually choosing to extend some degree of trust.


Commercial Linux distributions like Red Hat, SUSE and Canonical stake their reputation on compiling a trustworthy collection of open source software, in exchange for money. Unfortunately they disclaim any legal responsibility, but at least they make reasonable efforts to analyze the security of the software they are distributing, in order to avoid PR disasters.

For some reason the same business model has not made many inroads for higher-level language ecosystems, although many companies are trying - for example the Python Conda distribution.


Winget seems to finally do something similar for Windows: https://github.com/microsoft/winget-cli

Although the "repo" is a list of manifest files that include third-party download sources. So even if there is an approval process, it seems quite vulnerable to including malware.

Edit: Example https://github.com/microsoft/winget-pkgs/blob/master/manifes...


Of the 351 malicious repositories in the spreadsheet somebody linked, only 4 have more than 10 stars. None of them have more than 30 stars, and none of them have more than 3 forks. None have more than 5 issues, and only 4 have more than one issue.

You don't have to assume that the code is community-vetted. If a repository has at least a couple hundred stars, lots of forks, and an active pull request cadence, then you know that at least some people have gone digging through it.

If not, then that's when you should break out the sandboxing tools and prepare to check the code yourself. At least it should be easy-ish to automatically check/block everything that has the potential to open a network connection, which defeats most profitable malware models.
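The heuristic above can be sketched as a small function. This is an illustrative sketch only: the field names match the real GitHub REST API (`GET /repos/{owner}/{name}` returns `stargazers_count`, `forks_count`, `open_issues_count`), but the thresholds are just the rough numbers from this comment, not an authoritative rule.

```python
# Sketch of the community-vetting heuristic: given repo metadata as returned
# by the GitHub REST API, decide whether anyone has plausibly dug through it.
# Thresholds are illustrative, taken loosely from the comment above.

def plausibly_vetted(repo: dict) -> bool:
    return (
        repo.get("stargazers_count", 0) >= 200    # "at least a couple hundred stars"
        and repo.get("forks_count", 0) >= 20      # "lots of forks" (arbitrary cutoff)
        and repo.get("open_issues_count", 0) >= 5 # some sign of an active community
    )

# The malicious repos in the spreadsheet would all fail this check:
print(plausibly_vetted({"stargazers_count": 27, "forks_count": 3,
                        "open_issues_count": 1}))  # prints False
```

Repos that fail the check are exactly the ones to sandbox and review by hand.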


> But, truly, what is the solution?

Let's use GitHub as an example. We have forks, and stars. Maybe we could also have some kind of build endorsement?

I'm not entirely sure how one would verify that an endorser is worth your trust, though.

Maybe endorsers could eventually be rated by CVEs found in their endorsements, and that would build trust?


They could build an optional "risk score" that open-source community-oriented projects could turn on. It could include requirements like having something dependabot-esque along with CodeQL enabled. Rules could be created for CodeQL (if they haven't been already) that check for obfuscated code, suspicious access (keychain, password storage, etc.) and other items.

On top of that it could have forced release binary scanning via VirusTotal/insert-malware-scanning-vendor-here.
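As a rough sketch of how such a score might aggregate, here is a toy version. Every check name and weight here is hypothetical; in practice the signals would come from real tools like Dependabot, CodeQL, and a release-binary malware scan.

```python
# Hypothetical "risk score": sum the weights of every failed check.
# Check names and weights are made up for illustration.

CHECK_WEIGHTS = {
    "dependency_updates_enabled": 2,  # dependabot-style alerts
    "static_analysis_enabled": 3,     # CodeQL or similar
    "no_obfuscated_code": 3,          # rule flagging minified/encoded blobs
    "no_suspicious_api_access": 3,    # keychain, password stores, etc.
    "release_binaries_scanned": 4,    # VirusTotal-style scan of releases
}

def risk_score(passed: set) -> int:
    """Higher = riskier: sum the weights of every check not in `passed`."""
    return sum(w for name, w in CHECK_WEIGHTS.items() if name not in passed)

# A project with only dependency updates and static analysis enabled:
print(risk_score({"dependency_updates_enabled", "static_analysis_enabled"}))  # prints 10
```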


How about directly linking to the CVEs and how quickly they were mitigated and in which commit?

Pay researchers to analyse repos without any. Post results. Link to the repo with mitigation PRs.

It’s insane this isn’t the standard already


> Pay researchers to analyse repos without any.

This is the problem: the best we can do is pay via exposure. But that actually ain't nothing. Not just individuals but also orgs could then make money from private contracts based on their reputations. This could become the benchmark of trust. Could there be anything better?


Are you certain that CVEs are a good indicator?


Excellent question. No, I am not. I am just attempting to use my very limited knowledge on this subject to hopefully further discussion on a topic that feels really important.

I would love other people to jump in and elevate this conversation.

Sure, CVEs might not be the ideal metric. Could you, or anyone else, suggest a better metric?

If GitHub is too lethargic to even contemplate this type of change, maybe this could be a differentiator for GitLab?


Copilot for sure should be able to describe the code and spot basic malware


No it won't. I could write you very basic, obvious malware that is obfuscated just enough for copilot to miss it 100% of the time. Let alone things like what JiaTan wrote.


LLM or human, what if they both competed in some sort of "I have the least CVEs in my endorsements list" battle?

This would actually be an excellent LLM coding benchmark,[0] in addition to a human endorser benchmark.

[0] If nobody is already doing this, especially retrospectively, and you do, then please at least give me a shout out. :)


You can get rid of legacy OSes like Windows or Linux that cannot run applications in a sandbox and switch to those that can. That way the malware only gets a sandbox, not the whole system.

If you work for a commercial company then you should not download code from random users on GitHub for free, but from commercial, safe repositories where the code is inspected, tested and verified. Or from reputable large commercial companies that are unlikely to plant backdoors. Microsoft or Apple won't risk their reputation by backdooring an open-source library.


I don't get it, are there privilege escalation attacks for Windows? I haven't logged in as an administrator since 2005 or so.

You know you can hit the Windows key and type "sandbox", right? (You may need to "install" it from Windows Features.)

There are software packages that let you snapshot the files and checksums, then compare again after you've run your test program / installer / whatever.

You can make this software "portable" so you don't have to install it every time. You can copy and paste into the sandbox from your windows desktop and drives.

Obviously this isn't Sandboxie or Nix or an immutable file system or anything, but let's not pretend it's 1996 and "GoBack.exe" hasn't been invented yet.
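The snapshot-and-compare approach described above is simple enough to sketch: hash every file under a directory before running the untrusted installer, then diff the snapshots afterwards. This is a minimal illustration, not any particular tool's implementation.

```python
# Snapshot a directory tree's file checksums, then diff two snapshots to see
# what an installer added, removed, or changed.
import hashlib
from pathlib import Path

def snapshot(root: str) -> dict:
    """Map each file path under root to its SHA-256 digest."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(root).rglob("*") if p.is_file()
    }

def diff(before: dict, after: dict) -> dict:
    return {
        "added":   sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(p for p in before.keys() & after.keys()
                          if before[p] != after[p]),
    }
```

Usage: take `before = snapshot(path)`, run the test program, then inspect `diff(before, snapshot(path))`.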


Where did you get the idea that Linux cannot run applications in a sandbox?


They can - if you write the sandbox and adapt applications to it. What I meant is that the sandbox should be built-in into a distribution.

Also, I did some research and the sandbox is difficult to implement because you need to stub literally every facility (because Linux was not designed for sandboxing). For example, I had to write an emulation of /proc in Python using FUSE because many apps rely on reading files there but granting them full access leaks too much information about your system and is not secure. Now think how much time you need to stub every API, including undocumented APIs like /sys, ioctls and so on.
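The core of such a /proc stub is a whitelist-and-sanitize mapping. The sketch below shows only that mapping; a real implementation would serve it through FUSE (e.g. via the fusepy library), and the canned file contents here are hypothetical placeholders.

```python
# Illustrative core of a sanitized /proc emulation: whitelisted paths get
# canned, information-free contents; everything else is denied.

# Hypothetical stub contents: enough for apps to start, leaking nothing real.
SAFE_PROC = {
    "/proc/cpuinfo": "processor\t: 0\nmodel name\t: Generic CPU\n",
    "/proc/meminfo": "MemTotal:        1048576 kB\nMemFree:          524288 kB\n",
    "/proc/self/status": "Name:\tsandboxed\nPid:\t1\n",
}

def read_proc(path: str) -> str:
    """Return a sanitized stub for whitelisted /proc paths; deny the rest."""
    if path not in SAFE_PROC:
        raise PermissionError(f"{path} is not exposed to the sandbox")
    return SAFE_PROC[path]
```

The tedious part, as noted above, is that this table has to cover every facility sandboxed apps actually touch, documented or not.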


This is a solvable problem thanks to LLMs


Unless there is a comment "this code is actually safe, it's done this way for optimizations", or a variable called "thisCodeIsSafeItLooksWeirdForPerformance" and the LLM just ignores the backdoors.


This statement is no more correct than claiming the halting problem is solvable thanks to LLMs.


Over the past few years, I've seen several github projects that won't build because they rely on private libraries that are downloaded at run-time. I've opened a few of the downloaded libraries, and they're always innocuous. Often, they are just compiled versions of source in a different repo under the same author. But, that mechanism could easily swap the library for a trojan.

It's really absurd how many of these are out there in the wild. Scary really.
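One standard mitigation for the swap-a-trojan risk described above is to pin a checksum of any artifact fetched at run time and refuse to load it on mismatch. A minimal sketch (the pinned value here is a stand-in, not from any real project):

```python
# Verify a run-time-downloaded artifact against a pinned SHA-256 before use.
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> bytes:
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"checksum mismatch: got {digest}")
    return data

# In practice the pin lives in the repo, next to the download URL, so swapping
# the hosted binary for a trojan breaks the build instead of infecting users.
PINNED = hashlib.sha256(b"library bytes").hexdigest()  # stand-in for a real pin
verify_artifact(b"library bytes", PINNED)  # passes
```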


Well, it all comes down to trust eventually; you cannot inspect every single line of code of every program you want to run on your computer. Nowadays even GitHub stars are not worth that much trust, because malicious actors can just make fake accounts or buy them.


The number of new GNU/Linux distros that have appeared since 1994 that just compile stuff into binary packages, not even paying attention to the fact that the source can be inspected, is just staggering.


Is this sarcasm? Signed packaging is now a best practice; Linux has promoted it since 1994?


Something built from malicious sources would fail to sign, so all is good.



