Introducing yente - an FtM API server

Hey all, since it’s become obvious that a little more communication can’t hurt the Aleph world, I thought I might make an introduction for another citizen of that universe here, yente.

yente (the match-maker :wink:) is a high-performance, simple-to-deploy data matching API for FollowTheMoney data. Like Aleph, it uses Elasticsearch (or OpenSearch) as a backend - but it serves a much smaller functional scope: no documents, no authorization, no entity CRUD.

Instead, yente will read a manifest file to identify FtM datasets - either from a local file or a remote URL - to index. The application then exposes /search, /entities/[id] and graph adjacency APIs. The core, however, is the /match API, which allows for dataset/watchlist screening.
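To give a feel for what a screening call might look like, here is a minimal sketch that builds (but does not send) a request for the /match API. Treat the details as assumptions for illustration: the host `http://localhost:8000`, the dataset name `default`, the dataset appearing as a path segment, and the `queries` body layout with `schema` and `properties` are not spelled out in this post, and `build_match_request` is a made-up helper, so check the API docs of your yente version before relying on any of it.

```python
import json
from urllib.request import Request


def build_match_request(base_url, dataset, entities):
    """Build (without sending) a POST request for a yente-style /match call.

    `entities` maps a caller-chosen key to a (schema, properties) pair.
    The body layout used here is an assumption, not a confirmed contract.
    """
    queries = {
        key: {"schema": schema, "properties": props}
        for key, (schema, props) in entities.items()
    }
    body = json.dumps({"queries": queries}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/match/{dataset}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical example: screen one person against a dataset called "default".
req = build_match_request(
    "http://localhost:8000",
    "default",
    {
        "q1": (
            "Person",
            {"name": ["Jane Doe"], "birthDate": ["1970-01-01"], "country": ["de"]},
        )
    },
)
```

Passing `req` to `urllib.request.urlopen` would send it to a running instance; each key in `queries` (here `q1`) is just a label for matching responses back to the input entity.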

At OpenSanctions, we use yente both as the backend for our SaaS service, and as an on-premise, privacy-preserving appliance that’s deployed in the data centers of around 100 customers and partners.

Here are some fun things that yente can do:

  • Support multiple, pluggable matching algorithms for scoring entity matches. We maintain a set of these matchers in nomenklatura and expose them in the yente API with a lot of knobs for fine-tuning the weighting (of e.g. country, DOB, name feature matches).
  • Be quick. It’s all written in async Python on top of FastAPI, which allows fun things like doing parallel calls to Elasticsearch for bulk matching. Our production deployment runs at ca. 50 requests per second, and some of our on-prem deployments pipe 4 million queries through it every night.
  • Support atomic dataset updates (never accessing a half-updated set of entities), and entity deltas that make it easy to ship small bundles of changes to larger datasets quickly.
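To illustrate the "parallel calls for bulk matching" idea from the list above, here is a rough sketch of the general pattern in plain asyncio. This is not yente's actual code: `fake_search` is a stand-in for a real search backend call, and `match_bulk` and `max_concurrency` are names made up for the example.

```python
import asyncio


async def fake_search(query: str) -> dict:
    """Stand-in for an async search backend call (hypothetical)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"query": query, "results": []}


async def match_bulk(queries, search=fake_search, max_concurrency=10):
    """Run one search per query concurrently, capped at max_concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(query):
        # The semaphore keeps a large batch from flooding the backend.
        async with semaphore:
            return await search(query)

    # gather() returns results in the same order as the input queries.
    return await asyncio.gather(*(run_one(q) for q in queries))


results = asyncio.run(match_bulk(["Jane Doe", "ACME Ltd", "John Smith"]))
```

Because the searches overlap while waiting on I/O, a batch takes roughly as long as its slowest call rather than the sum of all calls, as long as the batch fits within the concurrency cap.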

One of the things I’d love to be able to do in the future is to bring down the number of containers that people have to deploy (i.e. kick Elastic?), and to support statement-based entity data directly.

In any case, I thought I’d share this because having a lightweight entity data API might be interesting for people who want to process or publish FtM data in specific contexts where a “full Aleph” is over the top…
