Migrating from SQLite to PostgreSQL

config_yml · on April 15, 2022

> it's a database for localhost only

This is sticking so hard to Sqlite, but it's really no longer true I think. Given WAL mode and tools like litestream, it's even feasible to run Sqlite in serverless offerings like Cloud Run. The schema change constraints are annoying though, even when there are known workarounds.

And, this is off topic, but Tailwind is really becoming the new Bootstrap. I open a page and it's instantly visible that this has been implemented with Tailwind (and often Tailwind UI).

benbjohnson · on April 15, 2022

Litestream author here. Pure ephemeral serverless offerings aren't currently supported with Litestream so you'll run into issues with something like Google Cloud Run. It's on the roadmap to support in the future though.

mtlynch · on April 15, 2022

FWIW, I've used Litestream on Google Cloud Run and never run into issues, but I haven't pushed it much:

https://github.com/mtlynch/logpaste/blob/master/docs/deploym...

benbjohnson · on April 15, 2022

You're a brave soul, Michael. :)

jpcapdevila · on April 15, 2022

@ben Same as the other commenter, I'm currently using it without problems. I configured max instances to 1 and that seems to be working fine. What am I missing?

benbjohnson · on April 15, 2022

The main issue is if GCR tries to start up another instance before tears down your old instance during a deploy. If GCR only runs a single instance at a time then you should be fine.

tianzhou · on April 15, 2022

One of the Bytebase authors here and I am the one who chose SQLite in the first place.

The project started in 2021 Jan and I did notice litestream. And actually, one of the main reasons I chose SQLite is because of litestream. Because I could see it could be our HA solution when needing it.

SQLite is great, but the schema change constraint is really painful as you mentioned.

As to the Tailwind, as a backend engineer working on database for most of my 10+ career and single-handedly built the site, I can only say it's godsend;)

ochoseis · on April 15, 2022

The bottom of the article states:

> Bytebase offers a web-based collaboration workspace to help DBAs and Developers manage the lifecycle of application database schemas.

So, I guess it’s not surprising you’d be annoyed by not being able to change schema constraints :)

no_wizard · on April 15, 2022

Completely unrelated to the thread, but the graphic on your home page doesn't quite re-size correctly, at least in the latest version of Firefox. It blocks the star button at i'd say tablet -> desktop range (somewhere between full width and not quite tablet)

if you go smaller, it does at certain sizes block a bit of the text.

tl;dr it doesn't look like the main graphic on your home page handles fluid layouts

EDIT: this goes with a few other graphics on the front page as well.

I also noticed some of your copy doesn't read well grammatically. "For Developer" I think reads better as "For Developers", as an example.

Only mentioning this cause this is a really cool concept that I wish I had thought of years ago when I worked more heavily with databases and want you to succeed! One of the reasons ORMs are "nice" is the built in schema migration of a lot of the most mature ones (coming from .NET and previously Django). This is a similar idea, but for teams that need high performance SQL optimizations to be able to change schema that may be written against "raw" SQL / Window functions etc.

tianzhou · on April 16, 2022

Thanks for the feedback and the encouraging words.

I did try, but front-end development is harder for a database engineer...

Definitely could be improved here.

born-jre · on April 15, 2022

any plan to maintain this[0] as separate library so anyone can embed pg in go binary?

[0] https://github.com/bytebase/bytebase/tree/6a72eb630c604e2fe7...

edit: there is already sth like that https://github.com/fergusstrange/embedded-postgres

christophilus · on April 15, 2022

I wouldn't have noticed the Tailwind thing, and I'm building a product with Tailwind at the moment. But yeah. I agree that Tailwind does seem to be gaining that level of traction.

It's so, so nice, though, so I think it deserves its success.

trinovantes · on April 15, 2022

One of my favorite features of SQLite is its memory mode. It makes writing tests really quick and easy compared to having to spin up a test Postgres database and then deleting it.

bob1029 · on April 15, 2022

We use sqlite for everything. One novel application is the evaluation of business rules. What we do is create an in-memory sqlite DB, fill in with relevant domain info, then execute the queries against it to evaluate logical outcomes. Sqlite's unique application-defined function capability is what sold us on this path. It's very very fast too. Thousands of queries per second per instance are no problem.

munk-a · on April 15, 2022

We've found quite a few ways to tune DB testing to make it as light as possible but yea - it's still not free by any measure - you can ensure a sane base data set (any static data, all the table schemas) and have minimally necessary writes to set up test cases but you'll never get as close to pure in memory exercise.

I wonder if you could write a postgres shim layer to run your tests on a SQLite in-memory instance - though I guess you're at the whims of whatever that shim layer is doing matching actual postgres functionality bug for bug.

giraffe_lady · on April 15, 2022

It's no longer technically true that sqlite is limited in that way. But that is relatively new territory (compared to plain sqlite or postgres for sure) so you can expect some unknown troubles there.

I'd be willing to launch a product on it in some circumstances, but depending highly on the resources, composition, goals, and risk tolerance of the team. I'm excited about where it's going but DB is so important and the alternatives are so well understood it still feels like a big move.

_wldu · on April 15, 2022

You can also run a simple rest API on the machine hosting the sqlite DB and access it from other machines. I've found that works fairly well.

Clients -> Internet -> API -> sqlite

theobr · on April 15, 2022

Re: the "tailwind is the new Bootstrap" bit - half agree.

tailwind-typography is highly opinionated and "automatically applied", unlike Tailwind itself. I don't think that Tailwind will be quite as immediately visible outside of blogs....unless TailwindUI really takes off.

Unless you're really sensitive to colors, then you might notice the Tailwind gray

RedShift1 · on April 15, 2022

That's really pushing SQLite where it was clearly not engineered to go. If you need something that is accessible over the network, use a database server that is designed for it. The right tool for the right job.

jpalomaki · on April 15, 2022

"SQLite does not compete with client/server databases. SQLite competes with fopen()."

https://www.sqlite.org/whentouse.html

jhgb · on April 15, 2022

Firebird can do both, though.

mekster · on April 16, 2022

I often see this quoted but I don't see how they compete at all? Or am I just not getting a joke?

antifa · on April 16, 2022

I have seen a lot of enthusiasm for using sqlite instead of postgres for django/laravel/rails type projects, or just using it for your tests. I think both of these opinions are over valued since it's easier to learn postgres once then to learn sqlite then migrate to postgres.

The real developer cost of "install postgres, createuser, createdb" is basically nothing.

I see the cost of ever using sqlite (coupling my tests/and/or app, migrating as soon as you have 2 users) as much higher.

Sqlite does seem like a safe, solid, and sane default for the embedded/fopen people though.

er4hn · on April 15, 2022

Nonsense. Just be like the Tailscale team and use SQLite [1] to serve your entire customer network.

[1] https://tailscale.com/blog/database-for-2022/

simonw · on April 15, 2022

My sqlite-utils CLI tool and Python library includes a workaround for SQLite's ALTER TABLE limitations.

The transform CLI command (and Python method) automates the process of starting a transaction, creating a new table with the required changes, copying across the data from the old table, then dropping the old and renaming the new before committing the transaction.

Documentation here:

- Python library: https://sqlite-utils.datasette.io/en/stable/python-api.html#...

- CLI command: https://sqlite-utils.datasette.io/en/stable/cli.html#transfo...

- Blog entry: https://simonwillison.net/2020/Sep/23/sqlite-advanced-alter-...

CodesInChaos · on April 15, 2022

I'd consider using generated as identity instead of serial in postgres.

https://www.2ndquadrant.com/en/blog/postgresql-10-identity-c...

And if any of your tables might grow beyond 2B rows, I'd use 64-bit IDs (serial8) from the start to avoid an expensive migration later.

ttfkam · on April 16, 2022

Sequential UUIDs: safer for avoiding errant results from mistyped joins and better clustered database support down the line while keeping good performance levels.

https://www.2ndquadrant.com/en/blog/sequential-uuid-generato...

"I say we take off and nuke the whole sequence-generated id installation from orbit. It's the only way to be sure." – Ellen 'SQL' Ripley

jmmv · on April 15, 2022

What I've been doing in some of my recent projects is to support both SQlite and PostgreSQL. I do this via an abstraction layer that hides the two databases behind it, and I run the exact same set of unit tests against the two implementations.

The reason I do this is for testing. In fact, the SQLite backend is only compiled for test builds. With that, I can test my application logic against the SQLite implementation and be reasonably confident that things work with the PostgreSQL backend as well. And this is infinitely easier and faster than relying on a "real database server". No installation involved, and the same tests run in 300ms as opposed to 8+ seconds.

Yes, it's significantly more work to develop around these two implementations, but the benefits in build/test iteration times are immense. I haven't ever had to touch docker yet. And the abstraction layer also keeps me honest by forbidding me from poking into database internals.

ttfkam · on April 16, 2022

Lowest common denominator database feature set makes a DBA sad. I love SQLite, but Postgres is just so much more feature full, it's no contest.

If all you need is what SQLite can provide, go for it! It's lightning fast and astonishingly low overhead.

But other times you just need those event triggers and advanced indexing and range types and rollup and grouping sets and writable CTEs and lateral joins and materialized views and row-level security and…

mekster · on April 16, 2022

Yes, you can't possibly write triggers and checks on columns across different databases to the point you should be doing all data checks in the app if you're going to support multiple databases.

These days, there's docker, so I see less point supporting multiple databases.

throwthere · on April 15, 2022

I tried that but moved completely away from it. Too many issues around sql dialect and type differences for starters. I’d much rather run postgres on my dev/testing boxes, use a single set of sql-based migrations and not have to maintain an abstraction layer.

antifa · on April 16, 2022

I could see OP's idea working only if you never strayed away from an ORM to use advanced database features or performance reasons.

colinchartier · on April 15, 2022

When you say "8+ seconds", is that the startup time for the postgres instance?

tailspin2019 · on April 16, 2022

I’m doing a very similar thing with one of my projects. Iteration speed is much faster for my use case too.

pizza234 · on April 15, 2022

For those who don't know, there's no need to download the full MySQL package(s) that the article references ("200+ MB"); there is the "minimal" package, which includes all the required binaries - as of v8.0.28, it's ~52 MiB.

This is a relatively new introduction; v5.7 doesn't have any minimal packages.

teddyh · on April 15, 2022

The MariaDB packages for Debian and Ubuntu are similarly tiny.

habibur · on April 15, 2022

I was more interested in knowing performance characteristics.

For the same service how did RAM+CPU load change pre and post migration?

dannyibunny · on April 15, 2022

Postgres is much heavier than SQLite in terms of binary size/CPU/RAM, but I think scalability is the win.

masklinn · on April 15, 2022

OTOH if you don't tune it the default postgres configuration remains extremely low-resource (which also hampers its performances): 128MB shared buffers, 8MB temp buffers, 4MB work_mem, 64MB maintenance_work_mem, 100 connections, 1000 files open.

mayli · on April 15, 2022

Compared to less than 1MB for sqlite in this use case.

sebazzz · on April 15, 2022

> Then it is a lot of engineering work to update existing SQL queries. SQLite (similar to MySQL) dialect uses the question mark parameters while PostgreSQL uses positioned parameters for prepared statements.

Many database engines, including SQL Server for instance, also support question marks. Surely PostgreSQL also supports this?

uhoh-itsmaciek · on April 15, 2022

As some other replies noted, Postgres does not support that. Some drivers or ORMs do their own pre-processing on query text, and add their own support for that. E.g., in ActiveRecord, you can do:

    User.where("status = ?", "active").all

ActiveRecord itself sanitizes the input here and the query that's sent to Postgres does not use parameter markers at all.

nikeee · on April 15, 2022

SQLite also supports referencing by parameter number (and name):

https://www.sqlite.org/lang_expr.html#varparam

So, changing "?" to "?1" could be done in SQLite before the actual migration, leaving just the change from ? to $ when migrating to Postgres.

mjw1007 · on April 15, 2022

? in PostgreSQL is available for use as a user-defined operator.

I suppose they made that decision a long time ago.

packetlost · on April 15, 2022

No, it doesn't. Using a '?' will cause a syntax error.

masklinn · on April 15, 2022

Interestingly, there is one "platform" which supports ? as placeholder: ECPG (where SQL is directly embedded in C code, and post-processed to generate straight C you can feed to a compiler).

In ecpg, prepared statements use ?: https://www.postgresql.org/docs/current/ecpg-commands.html#E...

nurbl · on April 17, 2022

One thing to keep in mind when moving from sqlite to an "out-of-process" database is that individual query overhead will increase a lot. With sqlite it's usually no problem to do lots of small queries and combine the result, but when you need to account for network latency etc it's a different game. This could be unintended; e.g. your ORM use causes lots of queries that you didn't know or care about with sqlite. You may have to rethink the way you do things in order to keep performance.

impoppy · on April 15, 2022

There’s only one reason I keep using PostgreSQL instead of SQLite for my pet projects and it is the “RETURNING” statement

UPD: I know it’s there, but the library version shipped with interpreters and compilers are often outdated

flakiness · on April 15, 2022

Note that they aren't SaaS but are providing a locally-installable software it seems [1]. Then SQLite seems to be a reasonable option and PG can be overkill (but switched to PG anyway because of the listed motivations.)

[1] https://github.com/bytebase/bytebase#installation

tianzhou · on April 16, 2022

Bytebase author here. We have a web service at hub.bytebase.com to deal with license purchasing and is also backed by PostgreSQL. And we also plan to offer a SaaS service in the future.

One thing the article doesn't mention is database unification. PostgreSQL is such a ubiquitous database, which enables us to keep a single database system for a long time.

flakiness · on April 16, 2022

That makes sense. Thanks for chiming in!

sebastianconcpt · on April 15, 2022

Sounds like it was a smooth migration. Good to hear that because in this precise moment I'm adding SQLite support to Mapless [1] (PostgreSQL had support already [2]) so people can do these kind of smooth transitions in their Smalltalk apps.

[1] https://github.com/sebastianconcept/Mapless

[2] https://blog.sebastiansastre.co/article/mapless-is-online-ag...

fmajid · on April 15, 2022

I’d love to see a good open-source sync solution between SQLite and PostgreSQL to get the best of both worlds, like what the Watcom/Sybase/SAP SQLanywhere offered.

simonw · on April 15, 2022

I have a tool that can do that in one direction - db-to-sqlite (built on top of SQLAlchemy) can create a SQLite database with a direct copy of the schema and data from a MySQL or PostgreSQL database.

I've contemplated building it to work on the other direction too but I haven't needed it myself yet so I've not done the work.

https://datasette.io/tools/db-to-sqlite

EnigmaCurry · on April 16, 2022

awesome! https://pgloader.io/ can do it in the other direction.

thunderbong · on April 15, 2022

There's a Ruby ORM called Sequel [0] (Github [1]) which provides a cli [2] to do just that across different databases.

Example from the docs -

  sequel -C mysql://host1/database postgres://host2/database2

or for Postgres > SQLite

  sequel -C postgres://host2/database2 sqlite://database.sqlite

and vice-versa

  sequel -C sqlite://database.sqlite postgres://host2/database2

[0]: https://sequel.jeremyevans.net

[1]: https://github.com/jeremyevans/sequel

[2]: https://sequel.jeremyevans.net/rdoc/files/doc/bin_sequel_rdo...

dannyibunny · on April 15, 2022

Totally agree, compatibility is really the key for any database product or any product. Minimizing the migration gap helps adoption.

candiddevmike · on April 15, 2022

Anyone know how far you could get with just SQLite and some kind of in-memory cache?

avdempsey · on April 15, 2022

You can get pretty far if you don't have much in the way of concurrent writes. Concurrent writes can be safe mind you, but that safety comes from locks that are polled against at increasingly long intervals (at least in my experience backing Django with SQLite). If you have one thread making many short writes over a long period, another thread will check to see if it can acquire a lock at 1 second, 2 seconds, 5 seconds, etc and if it doesn't time that check exactly perfectly you'll keep polling until your timeout. That is, out of the box you don't have a fair queue for the write lock. That kind of access pattern doesn't describe all apps of course! And in WAL mode your reads won't be blocked by these long sequences of writes.

listenallyall · on April 15, 2022

SQLite is nice, but if you have a JDK, H2 is so much better.

lysecret · on April 15, 2022

Main thing I am missing in SQLite is array or Json support.

JamesSwift · on April 15, 2022

SQLite JSON support is quite robust. I leverage it heavily for https://github.com/J-Swift/cod-stats

e.g. https://github.com/J-Swift/cod-stats/blob/master/frontend/ge...

benbjohnson · on April 15, 2022

SQLite has JSON support via the json1 extension[1]. IIRC it's bundled into SQLite by default in recent releases. As for arrays, you're SOL unless you use JSON arrays. :)

[1]: https://www.sqlite.org/json1.html