Hacker News | carolc's comments

Agreed with lgierth, and I believe this is what sets IPFS apart from many similar technologies: an integration path for existing technologies. As far as I can tell, this has been an important design decision for IPFS from early on.


See the comment below https://news.ycombinator.com/item?id=15376665. It is not true that distributed systems are only good for static content or "append-only" data. "Mutable systems" can be built on top of immutable systems.


I agree with you, but your argument is deeply flawed. There's quite a stretch between "Y can be built on top of X" and "X is good for Y".

To provide an argument that might fill this gap:

Most systems don't actually have a huge amount of data. Look at the data size and data growth of CRMs, special-purpose wikis, and so on: These are mostly smaller than 500 MB (excluding static content like images), and grow by less than 1MB even on a busy day. And that's the uncompressed size.

Also, most systems, despite being mutable, actually want (or need) an audit trail. So these are really append-only systems which merely have a "mutable look and feel" to the user.
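That "mutable look and feel" over an append-only store can be sketched in a few lines of Python. This is a toy illustration of the idea, not any particular system's design; all names are my own:

```python
# A key-value store with a "mutable look and feel" whose only storage is
# an append-only list of operations. Nothing is ever overwritten, so the
# full history (the audit trail) is always retained.

log = []  # the append-only log: each entry is (op, key, value)

def set_key(key, value):
    log.append(("set", key, value))

def delete_key(key):
    log.append(("del", key, None))

def current_state():
    """Derive the mutable view by replaying the log from the start."""
    state = {}
    for op, key, value in log:
        if op == "set":
            state[key] = value
        else:
            state.pop(key, None)
    return state

set_key("name", "alice")
set_key("name", "bob")   # looks like an update, but nothing is overwritten
delete_key("name")
set_key("city", "oslo")

print(current_state())   # {'city': 'oslo'}
print(len(log))          # 4 -- every operation is still in the log
```

Users see ordinary reads and writes; auditors see the complete operation history underneath.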


Agreed that many CRMs etc. don't have a lot of data. And that's actually good: it makes the database size very manageable in the context of trustless, distributed networks.

I'm not following the logic of the argument here though, jumping from "X is good for Y" to "...don't actually have a huge amount of data". Perhaps you can elaborate?

With a merkleized append-only log (immutable DAG), there's always an audit trail. I agree with your point about the "mutable look and feel"; in a lot of use cases there's only a limited set of "writers" and updates happen infrequently.
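The audit-trail property of a merkleized log comes from hash chaining: each entry commits to the hash of the one before it, so any retroactive edit changes every later hash. A minimal Python sketch (my own illustration, not OrbitDB's or IPFS's actual format):

```python
import hashlib
import json

def entry_hash(entry):
    # Deterministic hash of an entry (sorted keys for stable serialization)
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append(log, payload):
    # Each new entry commits to the hash of the previous one
    prev = entry_hash(log[-1]) if log else None
    log.append({"prev": prev, "payload": payload})

def verify(log):
    """Recompute the chain; returns False if any entry was tampered with."""
    for i in range(1, len(log)):
        if log[i]["prev"] != entry_hash(log[i - 1]):
            return False
    return True

log = []
append(log, {"op": "set", "key": "title", "value": "v1"})
append(log, {"op": "set", "key": "title", "value": "v2"})
assert verify(log)

log[0]["payload"]["value"] = "forged"   # retroactive edit...
assert not verify(log)                  # ...is immediately detectable
```

Readers don't need to trust the writer to know the history is intact; they only need the head hash.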

Perhaps I should rephrase my previous comment, then, as "immutable systems are good for building mutable systems on top". Does that help to provide a better counterargument?


Here's my complete line of reasoning:

You can build mutable systems on top of immutable (append-only) systems. But is that a good idea? Yes, it is, for systems which don't have huge amounts of (non-static) data, and/or systems which need an audit trail anyway. And these are more systems than one may initially think.


"Here's my complete line of reasoning: You can build mutable systems on top of immutable (append-only) systems. But is that a good idea? Yes, it is, for systems which don't have huge amounts of (non-static) data, and/or systems which need an audit trail anyway. And these are more systems than one may initially think."

I disagree that immutability is a limiting factor here, either for data size or for the capabilities of the database.

If you look at how many Big Data systems process data, you'll find that at the core of many is an append-only log. For example, Kafka is a log (https://engineering.linkedin.com/distributed-systems/log-wha...), and looking at Apache Samza's architecture, we can see how a log is at the core of it (https://www.confluent.io/blog/turning-the-database-inside-ou...). In less Big Data-oriented databases, there's always a log of operations (sometimes also called a transaction log or replication log) to keep track of changes.


I think git is a great example of bridging the mutable/immutable gap. The "mutable" stuff happens locally in RAM, or on a local filesystem, as someone edits their files, debugs, whatever. A commit represents a save checkpoint: somebody has decided that this state is worth snapshotting, that it would be a useful reference down the line. At this point an immutable version is made, ready to be shared.

As with git, even if a version (commit) is immutable, it doesn't mean it's worth saving. Lots of times, you might make a temporary branch locally to do some work. Then you'll merge it and push the merged version upstream. Later you might check out a new copy from upstream, not caring that your temporary working branch isn't there.

User friendly versioning is a major challenge for dynamic, distributed applications. How do we gracefully bridge the gap between long term (distributed) memory and short term (local) memory? Each specific application has its own needs and tradeoffs.

And how do applications communicate about which versions are compatible with the applications' needs? About which versions are worth holding onto?


I don't really get it. Sure, it's fine if one P2P app uses 3 GB (1 GB for the append-only log, 2 GB for a database with indices that can actually be queried) of data. What if you have several apps? Let's say 10. Then you need 30 GB, and because people only have 32 GB to 64 GB of storage on their phones, the discussion ends right here.


I didn't downvote you. But your data sizes are arbitrary.

Why would something like a chat or email app need to hang onto that much history?

Imagine a distributed "email" app that uses networks of mutually trusted peers to deliver encrypted messages ("emails") asynchronously. My device doesn't need to hang onto your emails indefinitely. It only needs to hang onto them until they've been received. This could be done via explicitly sent receipts, or probably in most cases by giving messages simple expiration dates. The sender would have the most incentive to hang onto the original message until it's been delivered.
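The receipt/expiration idea could look something like this. This is a hypothetical sketch; `RelayStore` and its methods are invented names, not from any real protocol:

```python
import time

# A relay peer holds a (presumably encrypted) message only until an
# explicit delivery receipt arrives, or its expiry passes -- whichever
# comes first. Storage is bounded by message lifetime, not history.

class RelayStore:
    def __init__(self):
        self.held = {}  # message_id -> (ciphertext, expires_at)

    def hold(self, message_id, ciphertext, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self.held[message_id] = (ciphertext, now + ttl_seconds)

    def receipt(self, message_id):
        """The recipient confirmed delivery; drop the message early."""
        self.held.pop(message_id, None)

    def sweep(self, now=None):
        """Drop anything past its expiry."""
        now = time.time() if now is None else now
        self.held = {k: v for k, v in self.held.items() if v[1] > now}

store = RelayStore()
store.hold("msg-1", b"...", ttl_seconds=60, now=1000.0)
store.hold("msg-2", b"...", ttl_seconds=60, now=1000.0)
store.receipt("msg-1")       # delivered: gone immediately
store.sweep(now=1100.0)      # past expiry: msg-2 gone too
print(len(store.held))       # 0
```

Either mechanism keeps a relay's storage proportional to in-flight traffic rather than to total history.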

How this scales in terms of MB and GB depends hugely on how your application is configured, how frequently new data is emerging, the limits set by peers for how much they're willing to share, etc. But text is pretty cheap. I can't imagine storing 3 GB of your or someone else's text emails on your phone, short term or long term. The Raspberry Pi plugged into the wall at your house has much more storage anyway ;)


I don't really see why a CRM needs to be decentralised. You need to host it yourself to avoid a cloud vendor going out of business but other than that what problem do you solve by decentralising it?


Delivery. A cloud provides worldwide availability at the cost of trust. A distributed site can survive any one entity failing, which includes you, and it can serve from anywhere your users want it to.


Databases, and dynamic content in general, can be done with/on IPFS.

Take a look at OrbitDB (https://github.com/orbitdb/orbit-db) - "Distributed peer-to-peer database for the decentralized web" - or their blog post "Decentralized Real-Time Collaborative Documents - Conflict-free editing in the browser using js-ipfs and CRDTs" (https://blog.ipfs.io/30-js-ipfs-crdts.md).

And all that works in the browser without running a local IPFS in the background. That's pretty amazing imo.


In general? No. Just because you can doesn't mean you should use a distributed DB. Please remember that distributed, open databases have very narrow use cases.

Leaving aside use cases like credit card information, there is a lot of user information that is illegal to share unless the user explicitly consents. In the EU you can't even share your access logs by default.

And how do you handle authentication? Passwords? How do you avoid user enumeration, or the collection of user emails and info?

Distributed filesystems and CDN in general are great, but let's use them for things that do not actually need a single bit of security, please.


> "Distributed filesystems and CDN in general are great, but let's use them for things that do not actually need a single bit of security, please."

The notion that distributed filesystems are inherently insecure, or can't be made secure, is way off. I would argue that technologies such as IPFS can be more secure.

The use cases are not only "open databases" (by which I assume you mean open to the public); private databases and data sets can be achieved just as well. Just because it's "distributed" doesn't mean it can't be private or access controlled.

Agreed on the comment re. "...illegal to share unless the user explicitly consents" and I believe this will turn out better in the trustless, distributed web, eventually. Our whole current approach is based on the client-server paradigm forcing us to put every user and their data into one massive centralized database. But we can change the model here. Instead, how about you owning your data(base) and controlling who gets to access it? "Allow Facebook to read your social graph?" "Oh, no? How about another social network app?". As a user, I would want to have that choice.

That bridges to your next point on authentication, which can be done at the protocol level with authenticated data structures. You can define who can read/write to a database by using public key signing/verification. It could be just you, or it could be a set of keys. One good example of this is Secure Scuttlebutt (http://scuttlebot.io/). I highly recommend taking a look and understanding the data structures underneath.
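The shape of the check is simple: every feed entry carries an authenticator, and replicas accept only entries that verify. Secure Scuttlebutt uses Ed25519 public-key signatures; the Python standard library has no asymmetric signing, so the sketch below uses HMAC as a stand-in for signing, purely to show the verification logic. All names are illustrative:

```python
import hashlib
import hmac

# HMAC here stands in for a real digital signature (e.g. Ed25519 in
# Secure Scuttlebutt). The point is the same: only the holder of the
# author's key can produce entries that verify.

def sign(author_key, payload):
    return hmac.new(author_key, payload, hashlib.sha256).hexdigest()

def append_entry(feed, author_key, payload):
    feed.append({"payload": payload, "sig": sign(author_key, payload)})

def verify_feed(feed, author_key):
    return all(hmac.compare_digest(e["sig"], sign(author_key, e["payload"]))
               for e in feed)

alice_key = b"alice-secret"
feed = []
append_entry(feed, alice_key, b"hello")
assert verify_feed(feed, alice_key)

feed.append({"payload": b"spam", "sig": "bogus"})  # unauthorized write
assert not verify_feed(feed, alice_key)            # rejected on verification
```

With real signatures, verification needs only the author's public key, so any replica can enforce write access without trusting any server.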


The problem is not that only authorized clients can write, that is the easy part (signatures are enough for that).

The problem is limiting read access. Having a globally distributed db means that anyone can get a copy.

You can use ipfs for public data, and to store private encrypted data (with caveats: make sure you change the encryption key/nonce for every data change).
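The key/nonce caveat is worth spelling out: with a stream-style cipher, encrypting two versions of the data under the same key and nonce leaks the XOR of the plaintexts to anyone holding both ciphertexts. A toy demonstration (the SHA-256 "keystream" below is contrived for illustration, not a real cipher):

```python
import hashlib

def keystream(key, nonce, length):
    # Contrived counter-mode keystream built from SHA-256 -- demo only
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key, nonce, plaintext):
    ks = keystream(key, nonce, len(plaintext))
    return bytes(a ^ b for a, b in zip(plaintext, ks))

key, nonce = b"k" * 16, b"n" * 8
c1 = encrypt(key, nonce, b"version one!")
c2 = encrypt(key, nonce, b"version two?")  # nonce reused -- the mistake

# XORing the two ciphertexts cancels the keystream, revealing the XOR
# of the two plaintexts to any observer of the distributed store:
leak = bytes(a ^ b for a, b in zip(c1, c2))
assert leak == bytes(a ^ b for a, b in zip(b"version one!", b"version two?"))
```

On an immutable, replicated store, every old ciphertext stays visible forever, which is exactly why a fresh key/nonce per change matters so much here.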

There is no way to modify private data based on anonymous access without things like homomorphic encryption, and the whole system is completely devoid of any form of forward secrecy.

As someone who works with encryption and security, I cannot recommend storing anything private on distributed systems. It leaks way too much data, and there are too many caveats. You can have securely designed applications, but I see no way to safely port common websites completely onto distributed infrastructure without leaking a lot of data that today is supposed to be kept secret.


http://scuttlebot.io/more/protocols/secure-scuttlebutt.html

> "Unforgeable" means that only the owner of a feed can update that feed, as enforced by digital signing (see Security properties).

https://github.com/ssbc/patchwork

> You have to follow somebody to get messages from them, so you won't get spammed.

Doesn't that make it completely pointless, because updates are still centralised? It merely shifts trusting a single provider to trusting each user, which is not a scalable solution. The value add is so low you might as well just use IPNS and make people subscribe to IPNS addresses.


But it is scalable. On Scuttlebot you follow people just like you have friends in real life. I also don't need to ask the government for permission to talk to that person. That is DEcentralized for you right there.


Think of OrbitDB as a git repository where everyone has write access. Malicious nodes will spam hundreds of gigabytes of data into it until it's large enough that nobody can clone it. Even if you solve the spam problem, you still have the problem that you need to download the constantly growing dataset. The blockchain is already 160 GB even though its transaction throughput is anemic.

Full P2P applications can't offload computation; computations have to happen on your computer, and for computations you need the entire dataset. This is fine for messaging and static files.

Federation is a far better idea. You get the benefits of centralisation and decentralisation.


Noms is also a great example of a peer-to-peer database: https://noms.io


Orbit used to use a Redis server for pubsub messaging, but it doesn't anymore, as IPFS has implemented peer-to-peer pubsub. Totally serverless.


Are you familiar with IPFS pubsub? Would you be able to link some information about the implementation/usage?

I'm quite surprised to hear what you said. I've been following multiple GitHub issues on IPFS pubsub, and none of them (that I followed) announced success. I thought it was still in the planning phase.


There is some prototype work done in master already (if you build go-ipfs from source).

   $ ipfs pubsub --help
   
   USAGE
     ipfs pubsub - An experimental publish-subscribe system on ipfs.
   
   SYNOPSIS
     ipfs pubsub
   
   DESCRIPTION
   
     ipfs pubsub allows you to publish messages to a given topic, and also to
     subscribe to new messages on a given topic.
     
     This is an experimental feature. It is not intended in its current state
     to be used in a production environment.
     
     To use, the daemon must be run with '--enable-pubsub-experiment'.
   
   SUBCOMMANDS
     ipfs pubsub ls                    - List subscribed topics by name.
     ipfs pubsub peers                 - List all peers we are currently pubsubbing with.
     ipfs pubsub pub <topic> <data>... - Publish a message to a given pubsub topic.
     ipfs pubsub sub <topic>           - Subscribe to messages on a given topic.
   
     Use 'ipfs pubsub <subcmd> --help' for more information about each command.


Wow, thank you! Very interesting! Quite odd that none of my follows mentioned this going live; must be an issue of too many GitHub issues.

Time to dig into this a bit!

edit: For anyone else perplexed and surprised by me, seems `floodsub` is their moniker for the new tech, and it was (in part) merged in here: https://github.com/ipfs/go-ipfs/pull/3202

This is really, really cool! Also, if this implementation is robust and performant, this is huge for IPFS.


Yes, we haven't been very vocal about it yet because it's not yet part of a go-ipfs release -- it'll be in the next release, go-ipfs v0.4.5.

Note on the name: floodsub is one specific implementation of pubsub. There are many ways to do pubsub, and we chose to go with a naive and simple flooding implementation for now. In the future there will be more, and you'll be able to use different implementations simultaneously.
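The flooding strategy is easy to picture: every peer forwards each message to all of its neighbors, deduplicating by message id so the flood terminates. A toy in-memory model of that idea (my own simplification, not the go-ipfs floodsub implementation):

```python
# Each Peer forwards incoming messages to every neighbor except the
# sender; a seen-set keeps the flood from looping forever in cyclic
# topologies.

class Peer:
    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.seen = set()       # message ids already processed
        self.delivered = []     # messages handed to local subscribers

    def publish(self, msg_id, topic, data):
        self.receive(msg_id, topic, data, sender=None)

    def receive(self, msg_id, topic, data, sender):
        if msg_id in self.seen:
            return              # duplicate: the flood stops here
        self.seen.add(msg_id)
        self.delivered.append((topic, data))
        for peer in self.neighbors:
            if peer is not sender:
                peer.receive(msg_id, topic, data, sender=self)

a, b, c = Peer("a"), Peer("b"), Peer("c")
a.neighbors = [b, c]
b.neighbors = [a, c]
c.neighbors = [a, b]

a.publish("m1", "chat", "hello")
# Every peer delivers the message exactly once, despite the a-b-c cycle
print([p.delivered for p in (a, b, c)])
```

Flooding wastes bandwidth (each message crosses many links redundantly) but needs no routing state, which is why it makes a good first implementation.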


(IPFS dev here)

We've been working hard to get js-ipfs to a working level and just last week announced it's ready. While it's still early days and we have plenty to do to make it even more robust, we have it fully working with WebRTC support as of today.

We've developed some apps and demos that essentially enable you to create dynamic content and apps on IPFS, purely in the browser. We currently have to rely on a centralized pubsub (in a similar fashion to torrent trackers), but we're working on providing a decentralized pubsub mechanism as part of IPFS.

Regarding persistence: the js-ipfs and main IPFS (native go-ipfs) networks are not yet connected, but we're very close to having that. Once they all communicate on the same network, we can provide much better persistence, as users won't have to keep tabs open in the browser.

Take a look at some of the examples of what can be done today with js-ipfs, that is fully in the browser:

https://github.com/haadcode/orbit (specifically https://github.com/haadcode/orbit/tree/js-ipfs)

https://github.com/ipfs/paperhub

https://github.com/haadcode/orbit-db

https://github.com/haadcode/proto2

