These features are a sneak preview of what we are currently working on in the open. Consider them “work in progress” and follow our discourse for more updates. Of course we will update you here, too.
As announced previously, OpenAleph is a project by DARC and a range of other contributors, and it will always stay open source.
Feel free to comment with any questions or suggestions!
The distance tables and the audio transcript are absolutely nuts! Are you running ftm-geocode before sending the entities to the API, or did you end up doing it as an analyzer in the ingest framework?
Currently, we preprocess them before sending them to the API, but we are actively working on making this part of the ingestion process. We are weighing two options: build it into ingest-file, which would bloat the already huge Docker image because of the libpostal dependencies, or take this (together with whisper) as the occasion to split ingest-file into smaller parts, which would mean reworking the task queuing, again…
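The "preprocess client-side, then upload" flow described above can be sketched roughly like this. Note that `geocode()` here is a hypothetical stand-in for what ftm-geocode actually does; only the entity dicts follow the real FollowTheMoney JSON layout.

```python
def geocode(entity):
    # Hypothetical placeholder: a real pipeline would call ftm-geocode
    # (backed by libpostal) to parse the address and fill in coordinates.
    props = entity["properties"]
    if props.get("full"):
        props.setdefault("latitude", [])
        props.setdefault("longitude", [])
    return entity


def preprocess(entities):
    # Geocode Address entities locally before sending anything to the API.
    return [
        geocode(e) if e["schema"] == "Address" else e
        for e in entities
    ]


entities = [
    {"id": "addr-1", "schema": "Address",
     "properties": {"full": ["Unter den Linden 1, Berlin"]}},
    {"id": "person-1", "schema": "Person",
     "properties": {"name": ["Jane Doe"]}},
]

processed = preprocess(entities)
print("latitude" in processed[0]["properties"])  # → True
```

The point of the sketch is the ordering: enrichment happens entirely on the client, so the server-side ingest stack does not need the heavy libpostal dependencies at all.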
Yes, something like this. As more advanced analyzing features come in (looking at AI), you might want to run different ingestion services on different infrastructure, ideally without Docker or any other virtualization, for specific tasks.
We published a blog post about the feature that displays an image next to every FTM entity that has the wikidataId property set. We also published a tutorial that walks folx, step by step, through scraping data, creating FTM entities, uploading them and enjoying the little image thumbnails in their own instance.
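For readers who haven't seen the tutorial yet: in FollowTheMoney's JSON layout, the thumbnail feature keys off a populated `wikidataId` property. A minimal sketch, with an illustrative id and values (`has_thumbnail_source` is just a helper made up for this example, not part of any library):

```python
# An FtM entity in the standard JSON layout: id, schema, and a
# properties mapping of property name -> list of string values.
entity = {
    "id": "person-angela-merkel",
    "schema": "Person",
    "properties": {
        "name": ["Angela Merkel"],
        "wikidataId": ["Q567"],  # the property the image feature reads
    },
}


def has_thumbnail_source(e):
    # Illustrative helper: an entity can get a thumbnail
    # only if its wikidataId property is non-empty.
    return bool(e["properties"].get("wikidataId"))


print(has_thumbnail_source(entity))  # → True
```

Once entities like this are uploaded, the instance can resolve the Wikidata item and display its image next to the entity.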
We’re planning to release more of these little tutorials, as Jupyter notebooks, because they feel like excellent ways to explore the entire ecosystem of open-source tools that has emerged around Aleph. And, who knows, maybe one day these can serve as the basis of a curriculum!