December 2023 Roadmap Update

Welcome

Hey everyone. We wanted to provide an update on what the Aleph team is working on, and what we are planning to implement in the coming months.

Discourse
First of all, we wanted to introduce you to our Discourse platform. If you’re reading this, it looks like you’ve found it. We started setting up Discourse during the summer and have been slowly iterating on it, trying to get it right. We’re now at a point where we’d love for the community to get involved, so take a look around, create a post, and share your thoughts.

We are hoping that Discourse will be a great tool for slower, long-form, and persistent conversations that help shape Aleph’s feature set. Slack, being essentially a chat tool, does not lend itself to discussions that may happen on and off for weeks or even months. Discourse will help fill this gap, as well as provide a better way for us to support those of you who run and maintain your own Aleph instances. We are hoping that, in time, Discourse will become the place to be for all community discussions.

User metrics
We’re currently validating a project with the goal of improving metrics for Aleph. For a while now we’ve been lacking a good overview of Aleph’s health. This work will allow us (and you) to keep track of key insights related to the platform, for example:

  • Which versions of Aleph, FtM, and ingest-file are running
  • Statistics on response times from Aleph
  • Performance metrics
  • Concurrent users and authentication requests
  • API request calls, counts, and timings

The metrics use the Prometheus format, so you’ll be able to pull them into any tool that supports it. We’re currently using Grafana, but you could also use other tools such as Datadog or a managed service from any of the big cloud providers.
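To give a sense of what consuming these metrics could look like, here is a minimal sketch that fetches a Prometheus text-format endpoint and parses it into name/value pairs. The `/metrics` path is the Prometheus convention but is an assumption here, not a confirmed Aleph endpoint; in practice you would simply point Prometheus itself (or Grafana Agent) at the endpoint rather than parse it by hand.

```python
# Hedged sketch: reading a Prometheus text-format metrics endpoint.
# The /metrics path is an assumption for illustration.
import urllib.request


def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse simple samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        # Skip "# HELP" / "# TYPE" comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        # Each sample line is "<name>{labels} <value>".
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue
    return samples


def fetch_metrics(base_url: str) -> dict[str, float]:
    """Fetch and parse a metrics endpoint, e.g. http://aleph.local:9100."""
    with urllib.request.urlopen(f"{base_url}/metrics") as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```

Any Prometheus-compatible scraper applies the same parsing internally, which is what makes the format portable across Grafana, Datadog, and the cloud providers’ managed offerings.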

The stats work should be released and available in Aleph in the first part of 2024.

Notifications
Keeping on the subject of being better informed, we’re currently working on ways to improve the information that we provide when datasets are ingested into Aleph. As you may be aware, right now our status page only provides a limited amount of data when datasets are uploaded or cross-referenced (XREF), and falls short of keeping users fully informed. If any of the following sounds familiar, you’ll know what we mean:

  • Never ending uploads
  • Missing documents with no explanation
  • No relation between status, jobs, processing, and finished

To address this, we’re putting in place a number of short-term (and longer-term) changes that we hope will make the process of managing data uploads a little easier.

In the short term, we’ll be adding some additional information to the status page. We believe this information will be key to allowing users to better manage the data they are ingesting. The information we’re adding is:

  • The date and time an ingest was started
  • The date and time the ingest was last active
  • A description of the action taking place (ingest, XREF, etc.)

At the same time, we’re taking steps to address some more complex underlying issues around dropped messages and tasks getting stuck. This is something that we’ve been working towards for some time. The first step, which is now almost complete, is the introduction of RabbitMQ as a task queue, replacing the existing Redis-based system. This message broker will allow us to keep track of messages that fail to be processed and, after a suitable amount of time, either re-queue or discard them. Doing this should remove the never-ending upload problems that people experience.

Along with all this work, we are also reviewing the broader user experience around the information and insights that we provide when you bring new (or updated) data into Aleph. We want to ensure that no matter who you are (journalist, editor, activist, or administrator), we can provide the right level of detail to ensure that you know the state of your datasets and investigations.

Profiles
XREF (cross referencing) is one of the features that helps to set Aleph apart. Having the ability to check the information in your dataset against everything else that you have access to is a great way of identifying connections and leads that you might otherwise have missed. On the downside, there is a need to create these connections for each new investigation, which can be time consuming and frustrating, something that we hope to solve with profiles.

XREF also allows us to create profiles by manually matching results from different datasets. A profile collects the matched entities from different datasets and displays them as a single object.

Unfortunately, the current implementation of profiles in Aleph is hard both to find and to work with. It relies on manually matching data in order to build a profile, and it can be difficult to locate and make use of profiles once they have been created.

To address this, in the new year the Aleph team is going to take a look at the existing profiles feature and see what we can do to improve the experience. In particular, there are a number of questions we’d like to answer:

  • Can we integrate profiles into the existing search function?
  • Can we better surface profiles in investigations and make them easier to access?
  • Can we improve on the current UI for profiles?
  • Are we able to automatically generate profiles when there is a very high probability that two entities are the same?

We are hoping to start shipping some of these updates in a release of Aleph at some point during 2024.

Please add any questions, comments, or suggestions that you might have on these items in the comments section. We’d love your feedback.

See you in 2024!
