Aleph Q3 2024 Roadmap Update

Welcome
Hey everyone. Welcome to the Q3 2024 Aleph Roadmap update. We want to take a moment to dive into the team’s work over the past few months, update you on the features we’ve been working on, and let you know what is coming up.

2024 Aleph Questionnaire
A huge thank you to everyone who took the time to fill out the 2024 Community questionnaire. This feedback is massively influential in helping us identify where to focus our efforts over the coming year. Let’s take a brief look at some of the broad themes that were raised:

Aleph has a great range of data
This is something we already suspected, but it’s great to get confirmation. A lot of you commented on the range of information available on our Aleph instance, aleph.occrp.org. In particular, you highlighted how it enables you to connect the dots between the various entities in your investigations, and how the breadth of information available makes Aleph a great place to start your investigations.

The data in Aleph is managed and maintained by our data desk team who do an absolutely amazing job of making sense of the various datasets that we have access to and ensuring that they are imported into Aleph in a way that makes the most sense.

Aleph is useful for holding and organizing your information as well as searching
A few of you mentioned that you use Aleph not just for searching for insights, but also for uploading, organizing and visualizing your data and investigations. Tools such as bookmarks, timelines, and network diagrams are often overlooked in Aleph, as users tend to focus on finding key insights, but it’s great to see that a number of you are getting value from these investigative tools.

If you want to play around with these tools we’d highly recommend logging into Aleph and creating an investigation to see what you can do. If you’d like to learn more, check out our user guide which is published here:

Performance is an issue
The most consistent feedback that we’ve received in this year’s survey relates to performance. A number of you mentioned the time it takes to upload information into Aleph or the time it takes to perform a search.

To some extent Aleph is suffering from its own success. The size of our search index, the number of active users, and the amount of new data being added have all doubled in the last two years. Although we’re working hard to increase our capacity, this work is technical and takes time. With that said, we are confident that the infrastructure changes we are making during the latter half of this year will make a noticeable impact on Aleph, allowing it to run faster and more effectively than ever.

Search needs to be better
There were calls for us to address the search experience. You described the search experience as slow, and said that it can be difficult to find key insights amongst all the other data that is available. There were also some calls to deal with the duplication of information and to improve the experience around dataset and document search.

We are planning to start addressing the search experience in the near future. We want to make it easier to see the information that you’re searching for in the results that you are interested in, and we will be looking at ways to reduce the noise that can often get in the way of finding those key insights.

We need to look at user experience
Following on from the feedback on search, we also know that we need to look more generally at the user experience, modernizing it and reviewing the way in which we organize and display information. This will likely be a longer-term project for the team, but one that we are keen to start investigating.

Just Landed - Updated Technical Documentation
In other news, we’ve overhauled our technical documentation to make it easier for you to find information and set up your Aleph instances, and we’ve added new sections on dealing with common technical issues. The updated technical documentation is already live over at docs.aleph.occrp.org. If you’ve not done so already, please go check it out and send your feedback to us on either Slack or Discourse if you have any questions or suggestions.

Aleph 4.0.0
A lot of the features below will be bundled into Aleph 4.0.0. We’ll be putting out a separate communication around this release, as it contains some significant changes to the way in which Aleph functions and we want to ensure that you have all the information you require before making the switch from Aleph 3.x to 4.x.

4.0.0 - RabbitMQ
The Aleph team have been working hard to get this project finished and into your hands. In fact, we’ve been dogfooding our implementation on our production environment for some time. One of the advantages of this is that it has allowed us to discover a number of edge cases that we’ve now been able to fix. We hope that this ensures that when it reaches you, the new task broker is both stable and performant.

We’re now reaching the point where the solution that we have implemented is ready to be released to a wider audience. We are hoping that this will happen before our next quarterly update.

Coming soon - Improved Notifications
Our notifications work, which we believe will improve the information that you receive when ingesting data into Aleph, will also start to land with Aleph 4.0.0. In the first release you’ll see some updates to the Aleph status page that will provide you with more detailed information on the start and last updated times associated with your datasets. Then, later in the year, we’ll be introducing an improved experience that will allow you to derive a lot more information about the data you’re pulling into Aleph.

Up next - Aleph Infrastructure Updates
We’re also working hard to evolve our own infrastructure, to see what changes we can make to improve the speed and reliability of aleph.occrp.org. We’re aware of a few updates that we can make to our search index which we hope will significantly improve our ability to quickly and efficiently provide results to our journalists. Once we’ve been able to make these changes, we’ll be sure to update the community so that you can address any of your own concerns around search performance. Our focus for this work will likely be:

  • Upgrading our virtualized hardware
  • Increasing the storage capacity of our nodes
  • Reducing the number of nodes
  • Removing previously deleted data
  • Resizing our shards
  • Merging our indexes

Up next - Search improvements
Finally, as we mentioned in our last update, we’ll be starting to address some of the issues around the search experience in Aleph later in the year. The precise list of work that we’re going to be doing is still being discussed, but the existing list remains as follows:

  • Greater access to the full range of document search results
  • Improved user experience around previewing search results in a document
  • Better filtering of search results with languages and countries
  • Improved search results highlighting
  • Vector-based semantic search
  • Audio and video search via transcription

Speak soon
If you’ve made it this far, thanks! We’ll be in touch in Q4 with another update. In the meantime, you can keep up to date with releases to Aleph and our other libraries by signing up to our Discourse server, which can be found here: aleph.discourse.group
