Hi! We want to reduce infrastructure costs and are considering using Elasticsearch as our primary database, without PostgreSQL. In this case, would it be possible to completely eliminate the need for a PostgreSQL database to store the original FtM entities? Are there any potential risks or best practices we should consider when making this decision? Is this supported by Aleph out of the box?
Thanks in advance!
First off: I think having an Aleph that runs only off of ES (or at least uses PostgreSQL only for housekeeping like user accounts) would be super fun to have. We’re doing something a bit similar with yente over at OpenSanctions.
The main problem that needs to be sorted out is how to do entity aggregation prior to loading the data into Elasticsearch. ES is sort of the worst possible place for aggregating entity fragments, because each change to an entity would be a full re-index of that entity, and doing a whole fetch-check-update loop for each entity you want to write would be horrifically slow.
So here’s some of what I think are realistic ways of doing this:
- Don’t write mappings that require aggregating entity fragments. I don’t know how to do this on arbitrary data, but if you had a data architecture that enforced a set of rules around tabular arrangement, you could pull that off.
- Do file-based aggregation before loading data to the index. We have tooling in nomenklatura which essentially makes entity data sortable, so aggregation just becomes a streaming process on a pre-sorted JSON lines file.
- Do aggregation in a key-value store prior to loading, then throw that store away (that’s what OpenSanctions does).
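To make the file-based option a bit more concrete, here’s a rough sketch of what streaming aggregation over a pre-sorted JSON lines file could look like. This is plain Python, not the actual nomenklatura tooling: the function names `merge_fragments` and `aggregate_sorted` are made up for illustration, it assumes the input is already sorted by entity ID, and it cheats on schema handling by simply keeping the first fragment’s schema.

```python
import json
from itertools import groupby
from typing import Iterable, Iterator


def merge_fragments(fragments: Iterable[dict]) -> dict:
    """Combine all fragments of one entity into a single entity dict."""
    merged = None
    for frag in fragments:
        if merged is None:
            # Simplification: keep the schema of the first fragment seen.
            merged = {"id": frag["id"], "schema": frag["schema"], "properties": {}}
        for prop, values in frag.get("properties", {}).items():
            seen = merged["properties"].setdefault(prop, [])
            for value in values:
                # De-duplicate values while preserving first-seen order.
                if value not in seen:
                    seen.append(value)
    return merged


def aggregate_sorted(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one merged entity per ID from lines sorted by entity ID."""
    fragments = (json.loads(line) for line in lines if line.strip())
    # groupby only groups adjacent items, which is why the sort matters.
    for _, group in groupby(fragments, key=lambda f: f["id"]):
        yield merge_fragments(group)
```

Since only one entity’s fragments are held in memory at a time, you could pass an open file handle to `aggregate_sorted` and feed the results straight into a bulk indexing job, so each entity gets written to ES exactly once.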
I’ve been racking my brain about this for years now and feel like I’m just stuck. If you have any ideas for alternative approaches, I’d love to hear … the postgres solution, in any case, is not nice, and killing it would be wonderful.
PS: I’ve gone so mad on this problem, I’ve started building a custom database server for FtM data. But that’s doing the opposite of what you asked: making the architecture more complex, not less.