These features are a sneak preview of what we are currently working on in the open. Consider them “work in progress” and follow our discourse for more updates. Of course we will update you here, too.
As announced previously, OpenAleph is a project by DARC and a range of other contributors, and it will always stay open source.
Feel free to comment with any questions or suggestions!
The distance tables and the audio transcript are absolutely nuts! Are you running ftm-geocode before sending the entities to the API, or did you end up doing it as an analyzer in the ingest framework?
Currently, we preprocess them before sending them to the API, but we are actively working on making this part of the ingestion process. We are weighing two options: build it into ingest-file, which would bloat the already huge Docker image because of the libpostal dependencies, or take this (together with whisper) as the occasion to split ingest-file into smaller parts, which would mean reworking the task queuing, again…
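The "preprocess client-side, then upload" flow described above can be sketched roughly like this. Note that `geocode()` here is a hypothetical stand-in for what ftm-geocode actually does; only the entity dicts follow the real FollowTheMoney JSON layout.

```python
def geocode(entity):
    # Hypothetical placeholder: a real pipeline would call ftm-geocode
    # (backed by libpostal) to parse the address and fill in coordinates.
    props = entity["properties"]
    if props.get("full"):
        props.setdefault("latitude", [])
        props.setdefault("longitude", [])
    return entity


def preprocess(entities):
    # Geocode Address entities locally before sending anything to the API.
    return [
        geocode(e) if e["schema"] == "Address" else e
        for e in entities
    ]


entities = [
    {"id": "addr-1", "schema": "Address",
     "properties": {"full": ["Unter den Linden 1, Berlin"]}},
    {"id": "person-1", "schema": "Person",
     "properties": {"name": ["Jane Doe"]}},
]

processed = preprocess(entities)
print("latitude" in processed[0]["properties"])  # → True
```

The point of the sketch is the ordering: enrichment happens entirely on the client, so the server-side ingest stack does not need the heavy libpostal dependencies at all.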
Yes, something like this. As more advanced analyzing features come in (looking at AI), you might want to run different ingestion services on different infrastructure, ideally without Docker or any other virtualization, for specific tasks.
We published a blog post about the feature that displays an image next to every FTM entity that has the wikidataId property set. We also published a tutorial that walks folx, step by step, through scraping data, creating FTM entities, uploading them and enjoying the little image thumbnails in their own instance.
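For readers who haven't seen the tutorial yet: in FollowTheMoney's JSON layout, the thumbnail feature keys off a populated `wikidataId` property. A minimal sketch, with an illustrative id and values (`has_thumbnail_source` is just a helper made up for this example, not part of any library):

```python
# An FtM entity in the standard JSON layout: id, schema, and a
# properties mapping of property name -> list of string values.
entity = {
    "id": "person-angela-merkel",
    "schema": "Person",
    "properties": {
        "name": ["Angela Merkel"],
        "wikidataId": ["Q567"],  # the property the image feature reads
    },
}


def has_thumbnail_source(e):
    # Illustrative helper: an entity can get a thumbnail
    # only if its wikidataId property is non-empty.
    return bool(e["properties"].get("wikidataId"))


print(has_thumbnail_source(entity))  # → True
```

Once entities like this are uploaded, the instance can resolve the Wikidata item and display its image next to the entity.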
We’re planning to release more of these little tutorials, as Jupyter notebooks, because they feel like excellent ways to explore the entire ecosystem of open-source tools that has emerged around Aleph. And, who knows, maybe one day these can serve as the basis of a curriculum!