OpenAleph 5 release

OpenAleph, a fork of Aleph 3.x maintained by Data and Research Center, has released a major version: OpenAleph 5.

Here’s what’s new:
:magnifying_glass_tilted_right: Discovery Dashboard: see the people, companies, organizations, and locations that appear most frequently in your dataset, along with names that are closely correlated with them
:light_bulb: Search suggestions: related names surface alongside your query to refine results and reveal hidden connections
:sparkles: Sharper highlights: more precise and visible highlighting for terms and mentions in documents
:bar_chart: Status transparency: track running jobs and (for admins) spot failures, with visibility coming to all users in v5.1
:globe_showing_europe_africa: Names across cultures: improved recognition across alphabets and name variations
:open_file_folder: Fallback text extraction: Apache Tika support for more file types
:gear: Infrastructure improvements: a modularized codebase for easier development and future features
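
To give a feel for what the Discovery Dashboard surfaces, here is a minimal sketch of counting frequent names and the names that co-occur with them. It is not OpenAleph's implementation: the `mentions` input, the function names, and the simple per-document counting are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical input: the set of names already extracted from each document.
mentions = {
    "doc-1": {"Acme Corp", "Jane Doe", "Berlin"},
    "doc-2": {"Acme Corp", "Jane Doe"},
    "doc-3": {"Acme Corp", "Globex"},
}

def most_frequent(mentions, n=3):
    """Return the n names that appear in the most documents."""
    counts = Counter(name for names in mentions.values() for name in names)
    return counts.most_common(n)

def co_occurring(mentions, target, n=3):
    """Return the n names that most often share a document with `target`."""
    counts = Counter()
    for names in mentions.values():
        if target in names:
            counts.update(names - {target})
    return counts.most_common(n)

print(most_frequent(mentions, 1))
print(co_occurring(mentions, "Acme Corp", 1))
```

A real dashboard would likely rely on index aggregations rather than in-memory counting, but the idea of "frequent names plus correlated names" is the same.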

Anyone can give the new discovery features a go on the OpenAleph instance that DARC maintains:
:magnifying_glass_tilted_right: Dashboard
:light_bulb: Search

We’re grateful for the excellent work & care put into Aleph, which has allowed us to build these features on the shoulders of giants. We’re going to publish some more in-depth articles about the technical details behind these new features this week.


Hey community,

we released OpenAleph 5.1, a big upgrade with many new features (including tagging and precise synonym search) and improvements. OpenAleph is the open source fork of the sunsetted Aleph project.

We also updated our documentation with a section on deployment and on migrating from earlier Aleph/OpenAleph versions. Don’t hesitate to ask any questions!

release notes OpenAleph 5.1

Read our release blog post

New features

  • A new tagging feature to organize team research workflows

  • Enable or disable synonym search for each search query with a toggle

  • Helpful management commands for reingesting/reindexing specific documents

Breaking changes

After running OpenAleph 5.0.x for some time on our big instance, we noticed that some performance improvements and a slight restructuring of the index would further benefit future development.

What’s changed:

  • Reduced complexity of the Elasticsearch ingest/analyze pipeline

  • Introduced a dedicated Page index that stores only the child pages of documents. Separating it from the existing Pages index (introduced in 5.0) decreases storage costs, as we don’t need to store full text here for highlighting, as opposed to the Pages entity (which is the parent of one or more Page entities).

  • Reworked admin/deployment documentation

Because of the changed analyzers and the new Page index, this release requires reindexing. Read our notes about reindexing. It’s not a blocker anymore.

Refer to the updated migration guide in our docs for how to upgrade from Aleph 3 and Aleph 4 to this most recent version.

Other improvements

  • Updated migration guides and docker setup (example docker-compose.yml)

  • Fixed bugs and regressions introduced by our 5.0.x releases

  • Introduced an optional ingest URI to separate Elasticsearch ingest nodes (if running any custom processing pipelines on the cluster)

:cherry_blossom: Spring is here :cherry_blossom: and so is :rocket: OpenAleph 5.2!

This release tackles two things investigators deal with constantly: email and language barriers.

:e_mail: Stronger email ingestion

Email is one of the richest sources of evidence in investigations, and one of the messiest. Different formats, fragile metadata, nested threads, attachments hiding everywhere… it’s a lot.

In OpenAleph 5.2, we focused on making the email pipeline more reliable so investigators don’t miss crucial details. That includes better attachment extraction, smarter file-type detection, improved handling of “sent on behalf of” messages, and repair attempts for malformed headers.
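
As a rough illustration of the kinds of details involved (attachments, "sent on behalf of" headers, parser-reported defects), here is a small sketch using only Python's standard `email` package. It does not show OpenAleph's actual ingest code: the sample message, the `summarize` helper, and the rule "a `Sender` that differs from `From` means sent on behalf of" are assumptions for the example.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from email.utils import parseaddr

def build_sample() -> bytes:
    """Build a hypothetical message sent by an assistant on behalf of a director."""
    msg = EmailMessage()
    msg["From"] = "Director <director@example.org>"
    msg["Sender"] = "Assistant <assistant@example.org>"
    msg["To"] = "reporter@example.org"
    msg["Subject"] = "Quarterly figures"
    msg.set_content("Please find the figures attached.")
    msg.add_attachment(
        b"a,b\n1,2\n",
        maintype="application",
        subtype="octet-stream",
        filename="figures.csv",
    )
    return msg.as_bytes()

def summarize(raw: bytes) -> dict:
    """Parse raw bytes and collect sender details, attachments, and defects."""
    msg = message_from_bytes(raw, policy=policy.default)
    from_addr = parseaddr(str(msg["From"]))[1]
    sender_addr = parseaddr(str(msg["Sender"]))[1] if msg["Sender"] else None
    return {
        "from": from_addr,
        "sender": sender_addr,
        # Assumption: a Sender that differs from From marks "on behalf of".
        "on_behalf_of": sender_addr is not None and sender_addr != from_addr,
        "attachments": [
            (part.get_filename(), part.get_content())
            for part in msg.iter_attachments()
        ],
        # The parser records problems it worked around instead of raising.
        "defects": [type(d).__name__ for d in msg.defects],
    }

summary = summarize(build_sample())
print(summary["from"], summary["sender"], summary["on_behalf_of"])
print([name for name, _ in summary["attachments"]])
```

The standard parser is tolerant by design: it keeps going on malformed input and lists what it found in `defects`, which is a natural starting point for any repair step.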

:globe_showing_europe_africa: Secure, local document translation

We’re also introducing built-in translation for FollowTheMoney Documents. Using open-source models like Argos and Apertium, OpenAleph can now translate documents locally, without sending sensitive material to external services.

Even better: translation can run during ingest, enabling cross-language search. Search for “contract” and you’ll also find documents containing “Vertrag”, and more.
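
The idea behind cross-language search can be sketched in a few lines: if the translated text is indexed next to the original at ingest time, a query in either language finds the document. This is a toy model, not how OpenAleph or Elasticsearch implement it; the lookup table stands in for a local translation model, and `translate_to_english`, `build_index`, and `search` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical stand-in for a local translation model such as Argos or
# Apertium; a tiny lookup table keeps the example self-contained.
TOY_DE_EN = {"vertrag": "contract", "rechnung": "invoice"}

def translate_to_english(text: str) -> str:
    """Word-by-word toy translation; unknown words pass through unchanged."""
    return " ".join(TOY_DE_EN.get(word, word) for word in text.lower().split())

def build_index(documents: dict) -> dict:
    """Index each document under its original and its translated tokens."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        tokens = text.lower().split() + translate_to_english(text).split()
        for token in tokens:
            index[token].add(doc_id)
    return index

def search(index: dict, term: str) -> list:
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))

docs = {
    "doc-1": "Der Vertrag wurde unterschrieben",
    "doc-2": "Signed contract attached",
}
index = build_index(docs)
print(search(index, "contract"))  # ['doc-1', 'doc-2']
print(search(index, "Vertrag"))   # ['doc-1']
```

Translating once at ingest keeps queries fast, at the cost of extra indexed text per document.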

:sparkles: And yes, the UI got a little glow-up too.

Huge thanks to the community (and :penguin: Pierre Romera Zhang at International Consortium of Investigative Journalists (ICIJ) for the inspiration and knowledge exchange around ES Translator).

Here’s where you can read more about the changes:
• the full release announcement
• a technical deep dive on the translation feature
• a technical deep dive on email ingestion improvements

As always, feedback from our community helps shape where OpenAleph goes next. So please let us know what you think!
