
I think a good under-appreciated use case for SQLite is as a build artifact of ETL processes/build processes/data pipelines. Seems like a lot of people's default, understandably, is to use JSON for the output and intermediate results, but if you use SQLite you'd have all the benefits of SQL (indexes, joins, grouping, ordering, querying logic, and random access) and many of the benefits of JSON files (SQLite DBs are just files that are easy to copy, store, version, etc. and don't require a centralized service).

Another similar use case would be AI/ML models that require a bunch of data to operate; if you just serialize your model after training it, that data can be too large to fit in memory. If you store it in Postgres, Mongo, or Redis, it becomes hard to ship your model along with updated data sets. SQLite (or another embedded database, like BerkeleyDB) can give the best of both worlds: fast random access, low memory usage, and a file that's easy to ship. I'm not saying ALWAYS use SQLite for these cases, but in the right scenario it can simplify things significantly.
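
As a rough sketch of the model-data case (the file, table, and column names below are invented for illustration), the lookup data can be exported to a SQLite file at training time and then queried on demand at inference time, so it never has to be loaded wholesale into memory or served from a central database:

    import sqlite3

    # Hypothetical export step: write the lookup data the model needs into a
    # SQLite file that ships next to the model weights.
    con = sqlite3.connect("features.sqlite")
    con.execute("CREATE TABLE IF NOT EXISTS features "
                "(entity_id INTEGER PRIMARY KEY, f1 REAL, f2 REAL)")
    con.executemany("INSERT OR REPLACE INTO features VALUES (?, ?, ?)",
                    [(1, 0.25, 0.9), (2, 0.7, 0.1)])
    con.commit()
    con.close()

    # Inference side: open the shipped file and look rows up on demand instead
    # of loading everything into memory or calling a central service.
    con = sqlite3.connect("features.sqlite")
    print(con.execute("SELECT f1, f2 FROM features WHERE entity_id = ?",
                      (1,)).fetchone())
    con.close()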

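The build-artifact case looks much the same (again with invented names): a pipeline step writes its output into a SQLite file rather than a JSON dump, adds whatever indexes later steps need, and leaves behind a single queryable file. The workflow described below takes that idea a few steps further.

    import sqlite3

    # Hypothetical pipeline step: emit results into a SQLite artifact instead
    # of a JSON file, so downstream steps can index, join, and query it.
    rows = [(1, "books", 12.50), (2, "books", 7.00), (3, "games", 40.00)]

    con = sqlite3.connect("artifact.sqlite")
    con.execute("CREATE TABLE IF NOT EXISTS products "
                "(id INTEGER PRIMARY KEY, category TEXT, price REAL)")
    con.executemany("INSERT OR REPLACE INTO products VALUES (?, ?, ?)", rows)
    # Index the columns later steps will filter or join on.
    con.execute("CREATE INDEX IF NOT EXISTS idx_products_category "
                "ON products (category)")
    con.commit()

    # The artifact is just a file: copy it, version it, or query it directly.
    print(con.execute("SELECT category, COUNT(*), AVG(price) "
                      "FROM products GROUP BY category").fetchall())
    con.close()
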
I build the .sqlite database from scratch each time in Python, building out table after table as I like it. Some configuration data is loaded in from files first. This could be some default values or even test records for later injection. The input data is loaded into the appropriate tables and then indexed as appropriate (or if appropriate). It is as "raw" as I can get it.

Each successive transformation occurs on a new table. This is so I can always go back one step for any post-mortem if I need to. Also, I can reference something that might be DELETEd in a later table.

Often (and this is task-dependent), I will have to pull in data from other server-based databases, typically the target. Then I can mark certain records as not being present in the target database, so they must be INSERTed. If a record is not present in my input but is there in the target, that would suggest a DELETE. Finally, I can compare records where some ID is present in both my input and my .sqlite copy of the target; those might be good for an UPDATE.

All of this is so I can make only the changes that need to be made.
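
A minimal sketch of that staged layout, with made-up table names: each transformation writes a new table and leaves the previous one untouched, so any intermediate stage is still there for a post-mortem.

    import sqlite3

    con = sqlite3.connect("pipeline.sqlite")
    con.executescript("""
        DROP TABLE IF EXISTS raw_orders;
        DROP TABLE IF EXISTS clean_orders;

        -- Stage 0: the input, loaded as "raw" as possible.
        CREATE TABLE raw_orders (id INTEGER, amount TEXT, placed_at TEXT);
        INSERT INTO raw_orders VALUES
            (1, '19.99', '2024-01-05'),
            (2, 'oops',  '2024-01-06');

        -- Stage 1: each transformation writes a new table, so the previous
        -- stage is always there to step back to during a post-mortem.
        CREATE TABLE clean_orders AS
        SELECT id, CAST(amount AS REAL) AS amount, placed_at
        FROM raw_orders
        WHERE CAST(amount AS REAL) > 0;
    """)
    print(con.execute("SELECT * FROM clean_orders").fetchall())
    con.close()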

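And a sketch of the final comparison, assuming the target's current rows have already been pulled into tables in the same file: IDs only in the input are INSERT candidates, IDs only in the target are DELETE candidates, and IDs present in both with differing values are UPDATE candidates.

    import sqlite3

    con = sqlite3.connect("pipeline.sqlite")
    con.executescript("""
        DROP TABLE IF EXISTS input_orders;
        DROP TABLE IF EXISTS target_orders;
        CREATE TABLE input_orders  (id INTEGER PRIMARY KEY, amount REAL);
        CREATE TABLE target_orders (id INTEGER PRIMARY KEY, amount REAL);
        INSERT INTO input_orders  VALUES (1, 19.99), (2, 5.00);  -- id 2 is new
        INSERT INTO target_orders VALUES (1, 18.00), (3, 7.50);  -- id 3 is gone
    """)

    # Present in the input but not the target: candidates for INSERT.
    inserts = con.execute("SELECT id FROM input_orders WHERE id NOT IN "
                          "(SELECT id FROM target_orders)").fetchall()
    # Present in the target but not the input: candidates for DELETE.
    deletes = con.execute("SELECT id FROM target_orders WHERE id NOT IN "
                          "(SELECT id FROM input_orders)").fetchall()
    # Present in both but with differing values: candidates for UPDATE.
    updates = con.execute("""
        SELECT i.id FROM input_orders i
        JOIN target_orders t ON t.id = i.id
        WHERE t.amount <> i.amount
    """).fetchall()
    print("INSERT:", inserts, "DELETE:", deletes, "UPDATE:", updates)
    con.close()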