Upgrade from 3.19 to 4.x failed.

In my previous post I outlined my difficulties making a copy of Aleph to try the upgrade on. Having got that working, I thought it might be as simple as replacing my existing docker-compose.yml with the new one from GitHub, which uses 4.0 and adds RabbitMQ.

I had a couple of extra lines in mine from customization, but nothing crazy: a couple of external directories mounted in the containers, containing a couple of schema modifications and an edited home page. So I went through the two configs and merged them.

After running docker compose up -d, I didn’t see any data on the front page, and the error log was full of errors:

           | 2024-11-21 06:42:33.889924 [error    ] InternalServerError: 500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application. [aleph.views.base_api]
api-1            | Traceback (most recent call last):
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2190, in wsgi_app
api-1            |     response = self.full_dispatch_request()
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1486, in full_dispatch_request
api-1            |     rv = self.handle_user_exception(e)
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask_cors/extension.py", line 176, in wrapped_function
api-1            |     return cors_after_request(app.make_response(f(*args, **kwargs)))
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1484, in full_dispatch_request
api-1            |     rv = self.dispatch_request()
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1469, in dispatch_request
api-1            |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
api-1            |   File "/aleph/aleph/views/entities_api.py", line 138, in index
api-1            |     result = EntitiesQuery.handle(request, parser=parser)
api-1            |   File "/aleph/aleph/search/query.py", line 310, in handle
api-1            |     return SearchQueryResult(request, query)
api-1            |   File "/aleph/aleph/search/result.py", line 101, in __init__
api-1            |     result = query.search()
api-1            |   File "/aleph/aleph/search/query.py", line 295, in search
api-1            |     result = es.search(index=self.get_index(), body=self.get_body())
api-1            |   File "/usr/local/lib/python3.8/dist-packages/werkzeug/local.py", line 318, in __get__
api-1            |     obj = instance._get_current_object()
api-1            |   File "/usr/local/lib/python3.8/dist-packages/werkzeug/local.py", line 526, in _get_current_object
api-1            |     return get_name(local())
api-1            |   File "/aleph/aleph/core.py", line 181, in get_es
api-1            |     raise RuntimeError("Could not connect to ElasticSearch")
api-1            | RuntimeError: Could not connect to ElasticSearch
api-1            | 2024-11-21 06:42:33.899018 [info     ] Request handled                [aleph.views.context] request_logging=True
api-1            | [2024-11-21 06:42:33 +0000] [13] [DEBUG] Ignoring EPIPE
api-1            | 2024-11-21 06:42:55.959106 [error    ] Exception on /api/2/entities [GET] [aleph]
api-1            | Traceback (most recent call last):
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2190, in wsgi_app
api-1            |     response = self.full_dispatch_request()
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1486, in full_dispatch_request
api-1            |     rv = self.handle_user_exception(e)
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask_cors/extension.py", line 176, in wrapped_function
api-1            |     return cors_after_request(app.make_response(f(*args, **kwargs)))
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1484, in full_dispatch_request
api-1            |     rv = self.dispatch_request()
api-1            |   File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1469, in dispatch_request
api-1            |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
api-1            |   File "/aleph/aleph/views/entities_api.py", line 138, in index
api-1            |     result = EntitiesQuery.handle(request, parser=parser)
api-1            |   File "/aleph/aleph/search/query.py", line 310, in handle
api-1            |     return SearchQueryResult(request, query)
api-1            |   File "/aleph/aleph/search/result.py", line 101, in __init__
api-1            |     result = query.search()
api-1            |   File "/aleph/aleph/search/query.py", line 295, in search
api-1            |     result = es.search(index=self.get_index(), body=self.get_body())
api-1            |   File "/usr/local/lib/python3.8/dist-packages/werkzeug/local.py", line 318, in __get__
api-1            |     obj = instance._get_current_object()
api-1            |   File "/usr/local/lib/python3.8/dist-packages/werkzeug/local.py", line 526, in _get_current_object
api-1            |     return get_name(local())
api-1            |   File "/aleph/aleph/core.py", line 181, in get_es
api-1            |     raise RuntimeError("Could not connect to ElasticSearch")
api-1            | RuntimeError: Could not connect to ElasticSearch

Any ideas what’s going on?

And also this, from Elasticsearch:

elasticsearch-1  | create keystore
elasticsearch-1  | Exception in thread "main" java.lang.IllegalStateException: unable to read from standard input; is standard input open and a tty attached?
elasticsearch-1  | 	at org.elasticsearch.cli.Terminal$SystemTerminal.readText(Terminal.java:293)
elasticsearch-1  | 	at org.elasticsearch.cli.Terminal.promptYesNo(Terminal.java:151)
elasticsearch-1  | 	at org.elasticsearch.common.settings.CreateKeyStoreCommand.execute(CreateKeyStoreCommand.java:41)
elasticsearch-1  | 	at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77)
elasticsearch-1  | 	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112)
elasticsearch-1  | 	at org.elasticsearch.cli.MultiCommand.execute(MultiCommand.java:95)
elasticsearch-1  | 	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112)
elasticsearch-1  | 	at org.elasticsearch.cli.Command.main(Command.java:77)
elasticsearch-1  | 	at org.elasticsearch.common.settings.KeyStoreCli.main(KeyStoreCli.java:33)

Restoring the 3.x docker-compose.yml gets everything working again.

I think the error means that your Elasticsearch instance is not reachable from Aleph.
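One way to confirm that diagnosis is to query Elasticsearch's cluster health endpoint (`/_cluster/health`) from inside the api container. Below is a minimal sketch using only the Python standard library. The function name is hypothetical, and the URL `http://elasticsearch:9200` is an assumption based on the `elasticsearch-1` service name in the logs, so substitute whatever URL your Aleph configuration actually points at.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def es_health_status(base_url: str, timeout: float = 5.0) -> Optional[str]:
    """Return the cluster health status ("green", "yellow" or "red"),
    or None if Elasticsearch cannot be reached or queried at base_url."""
    url = base_url.rstrip("/") + "/_cluster/health"
    # Ignore any HTTP proxy settings: the target is assumed to be on the
    # local Docker network.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            health = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error status,
        # or a reply that is not JSON.
        return None
    return health.get("status")
```

Calling `es_health_status("http://elasticsearch:9200")` from the api container and getting `None` back would be consistent with the `Could not connect to ElasticSearch` errors above. A `"red"` status would instead mean the node is reachable but unhealthy.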

…which is strange, right? There were no changes to the Elasticsearch component.

Well, OK, I kept at it after posting that. Instead of making all the changes at once, I thought I’d migrate the settings into docker-compose.yml gradually.
First I made the RabbitMQ changes: docker compose up, check it’s OK, docker compose down. Then I tried the other changes one by one, until at the end everything was working on the fresh instance, including Elasticsearch. So it seems the answer was to sneak up on it.
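Comparing the two compose files service by service makes that one-change-at-a-time approach easier to plan. Below is a rough sketch. It assumes both docker-compose.yml files have already been loaded into dicts, for example with a YAML library such as PyYAML, which is not in the standard library. The function name and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def diff_services(
    old: Dict[str, Any], new: Dict[str, Any]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare the "services" sections of two parsed compose files.

    Returns sorted (added, removed, changed) service names, so each
    change can be applied and tested on its own.
    """
    old_svcs = old.get("services") or {}
    new_svcs = new.get("services") or {}
    added = sorted(set(new_svcs) - set(old_svcs))
    removed = sorted(set(old_svcs) - set(new_svcs))
    changed = sorted(
        name
        for name in set(old_svcs) & set(new_svcs)
        if old_svcs[name] != new_svcs[name]
    )
    return added, removed, changed


# Hypothetical, heavily trimmed compose contents for illustration.
old = {"services": {"api": {"image": "aleph:3.x"}, "elasticsearch": {"image": "es"}}}
new = {
    "services": {
        "api": {"image": "aleph:4.x"},
        "elasticsearch": {"image": "es"},
        "rabbitmq": {"image": "rabbitmq"},
    }
}
print(diff_services(old, new))  # (['rabbitmq'], [], ['api'])
```

Each name in the result corresponds to one "docker compose up, check, docker compose down" round.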

I did fall victim to the ALEPH_TAG bug here, though.

In my original config I had this format:
${ALEPH_TAG:-4.0.1}
whereas in the docker-compose.yml on GitHub the format is:
${ALEPH_TAG:-ALEPH_TAG:-4.0.1}
In the end I decided to keep it simple and just use the version number directly, which seems to work:
4.0.2
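The reason the second form breaks can be seen with a simplified model of Compose's `${VAR:-default}` rule: when `VAR` is unset or empty, everything between `:-` and the closing brace becomes the value. Below is a sketch of that rule. It is not Compose's real implementation, and the helper names are hypothetical. The tag check uses the Docker image-tag grammar `[\w][\w.-]{0,127}`.

```python
import re

# ${NAME:-default}; the default runs up to the first closing brace.
_VAR_WITH_DEFAULT = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*):-([^}]*)\}")
# Docker image tag grammar, with \w restricted to ASCII.
_VALID_TAG = re.compile(r"[\w][\w.-]{0,127}", re.ASCII)


def interpolate(text: str, env: dict) -> str:
    """Simplified ${VAR:-default} expansion: use default if VAR is unset or empty."""
    def repl(match):
        name, default = match.group(1), match.group(2)
        return env.get(name) or default
    return _VAR_WITH_DEFAULT.sub(repl, text)


def is_valid_tag(tag: str) -> bool:
    return _VALID_TAG.fullmatch(tag) is not None


for expr in ("${ALEPH_TAG:-4.0.1}", "${ALEPH_TAG:-ALEPH_TAG:-4.0.1}"):
    tag = interpolate(expr, {})  # ALEPH_TAG not set
    print(f"{expr!r} -> {tag!r} valid={is_valid_tag(tag)}")
# '${ALEPH_TAG:-4.0.1}' -> '4.0.1' valid=True
# '${ALEPH_TAG:-ALEPH_TAG:-4.0.1}' -> 'ALEPH_TAG:-4.0.1' valid=False
```

With the doubled form, the tag literally becomes `ALEPH_TAG:-4.0.1`. The colon is not allowed in a tag, which matches the "invalid reference format" error. Hard-coding `4.0.2` sidesteps the substitution entirely.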

So anyway, I now have an upgraded instance to test. If all is well, we’ll do the same thing on the production copy.

Ah yes, we are tracking that as "BUG: invalid reference format in docker-compose.yml" (Issue #4001 in alephdata/aleph on GitHub), and I will have it fixed ASAP.

At the point where I confidently declared Aleph to be working, I hadn’t yet tried logging in!

It seems that logging in with a known username/password combination doesn’t work.

I tried looking at the users in the aleph container:

root@705849efac37:/aleph# aleph users
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedColumn: column role.last_login_at does not exist
LINE 1: ... AS role_reset_token, role.locale AS role_locale, role.last_...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/bin/aleph", line 33, in <module>
    sys.exit(load_entry_point('aleph', 'console_scripts', 'aleph')())
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/flask/cli.py", line 358, in decorator
    return __ctx.invoke(f, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/aleph/aleph/manage.py", line 451, in users
    all_users = [
  File "/aleph/aleph/manage.py", line 451, in <listcomp>
    all_users = [
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 2828, in __iter__
    result = self._iter()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 2842, in _iter
    result: Union[ScalarResult[_T], Result[_T]] = self.session.execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2262, in execute
    return self._execute_internal(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 2144, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/context.py", line 293, in orm_execute_statement
    result = conn.execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1412, in execute
    return meth(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/sql/elements.py", line 516, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1635, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1844, in _execute_context
    return self._exec_single_context(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1984, in _exec_single_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 2339, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column role.last_login_at does not exist
LINE 1: ... AS role_reset_token, role.locale AS role_locale, role.last_...
                                                             ^

[SQL: SELECT role.foreign_id AS role_foreign_id, role.name AS role_name, role.email AS role_email, role.type AS role_type, role.api_key AS role_api_key, role.is_admin AS role_is_admin, role.is_muted AS role_is_muted, role.is_tester AS role_is_tester, role.is_blocked AS role_is_blocked, role.password_digest AS role_password_digest, role.reset_token AS role_reset_token, role.locale AS role_locale, role.last_login_at AS role_last_login_at, role.id AS role_id, role.deleted_at AS role_deleted_at, role.created_at AS role_created_at, role.updated_at AS role_updated_at 
FROM role 
WHERE role.deleted_at IS NULL AND role.type = %(type_1)s AND role.is_blocked = false]
[parameters: {'type_1': 'user'}]
(Background on this error at: https://sqlalche.me/e/20/f405)

That didn’t go well. I wondered whether Aleph needed to upgrade the DB schema, so I tried that:

root@705849efac37:/aleph# aleph upgrade
2024-11-27 03:59:15.198893 [info     ] Context impl PostgresqlImpl.   [alembic.runtime.migration]
2024-11-27 03:59:15.199450 [info     ] Will assume transactional DDL. [alembic.runtime.migration]
2024-11-27 03:59:15.230616 [error    ] Error: Requested revision c52a1f469ac7 overlaps with other requested revisions 274270e01613 [flask_migrate]

Any ideas? Has anyone successfully migrated yet?

That last message looks strange; it seems something went wrong with the DB migration. Would you be able to run this SQL query against your Aleph database and check the output: SELECT * FROM alembic_version; It should return only one row.

Indeed, there are two version numbers:

"SELECT * FROM alembic_version;"
 version_num  
--------------
 c52a1f469ac7
 274270e01613
(2 rows)

OK, of those two, 274270e01613 is the older one, so to be safe I would advise leaving just that one in the table and re-running the DB upgrade. If that fails, try again with c52a1f469ac7.
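In PostgreSQL, leaving a single row comes down to one statement: `DELETE FROM alembic_version WHERE version_num <> '274270e01613';`. Taking a database backup first is sensible. Below is a small Python sketch of the same step with a guard against deleting every row. It uses the standard-library `sqlite3` module as a stand-in so it runs anywhere. Against the real Aleph database you would use a PostgreSQL driver such as psycopg2, which uses `%s` placeholders instead of `?`. The helper name is hypothetical.

```python
import sqlite3
from typing import List


def keep_only_revision(conn: sqlite3.Connection, keep: str) -> List[str]:
    """Delete every alembic_version row except `keep` and return what is left."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM alembic_version WHERE version_num = ?", (keep,))
    if cur.fetchone()[0] == 0:
        # Refuse to run: deleting would leave the table with no revision at all.
        raise ValueError(f"{keep} is not in alembic_version")
    cur.execute("DELETE FROM alembic_version WHERE version_num <> ?", (keep,))
    conn.commit()
    cur.execute("SELECT version_num FROM alembic_version")
    return [row[0] for row in cur.fetchall()]


# Demo against an in-memory table holding the two rows reported above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
conn.executemany(
    "INSERT INTO alembic_version VALUES (?)",
    [("c52a1f469ac7",), ("274270e01613",)],
)
print(keep_only_revision(conn, "274270e01613"))  # ['274270e01613']
```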

OK, so with:

"SELECT * FROM alembic_version;"
 version_num  
--------------
 274270e01613
(1 row)

I got:

docker-compose run --rm shell aleph upgrade
Creating aleph_shell_run ... done
2024-12-05 11:57:35.088848 [info     ] Context impl PostgresqlImpl.   [alembic.runtime.migration]
2024-12-05 11:57:35.090172 [info     ] Will assume transactional DDL. [alembic.runtime.migration]
2024-12-05 11:57:35.114448 [info     ] Running upgrade 274270e01613 -> c52a1f469ac7, create bookmark table [alembic.runtime.migration]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1965, in _exec_single_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 921, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.DuplicateTable: relation "bookmark" already exists
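This error is informative. The migration 274270e01613 -> c52a1f469ac7 creates the bookmark table, and that table already exists, which suggests the database had already been migrated to c52a1f469ac7. The two rows in alembic_version are therefore not independent branches: one is an ancestor of the other, which appears to be what Alembic's "overlaps" error complains about. Below is a minimal sketch of that reasoning. The revision links are only those visible in the upgrade logs in this thread; in a real project Alembic reads them from each migration script's `down_revision`. The helper names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set

# Parent links taken from the upgrade logs in this thread:
# 274270e01613 -> c52a1f469ac7 -> 8adf50aadcb0 (earlier history omitted).
DOWN_REVISION: Dict[str, Optional[str]] = {
    "c52a1f469ac7": "274270e01613",
    "8adf50aadcb0": "c52a1f469ac7",
}


def ancestors(rev: str, down: Dict[str, Optional[str]]) -> Set[str]:
    """All revisions reachable by following down_revision links from rev."""
    seen: Set[str] = set()
    parent = down.get(rev)
    while parent is not None and parent not in seen:
        seen.add(parent)
        parent = down.get(parent)
    return seen


def true_heads(recorded: Iterable[str], down: Dict[str, Optional[str]]) -> Set[str]:
    """Drop any recorded revision that is an ancestor of another recorded one."""
    heads = set(recorded)
    covered: Set[str] = set()
    for rev in heads:
        covered |= ancestors(rev, down)
    return heads - covered


print(true_heads({"c52a1f469ac7", "274270e01613"}, DOWN_REVISION))  # {'c52a1f469ac7'}
```

Under these assumptions, the single row that should remain is the descendant, c52a1f469ac7.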

And then, after running:

"UPDATE alembic_version SET version_num='c52a1f469ac7' WHERE version_num='274270e01613';"
UPDATE 1

It seemed to go a little better, but there were still errors:

docker-compose run --rm shell aleph upgrade
Creating aleph_shell_run ... done
2024-12-05 12:18:02.248146 [info     ] Context impl PostgresqlImpl.   [alembic.runtime.migration]
2024-12-05 12:18:02.249201 [info     ] Will assume transactional DDL. [alembic.runtime.migration]
2024-12-05 12:18:02.278634 [info     ] Running upgrade c52a1f469ac7 -> 8adf50aadcb0, Add last_login_at column [alembic.runtime.migration]
2024-12-05 12:18:02.281270 [debug    ] update c52a1f469ac7 to 8adf50aadcb0 [alembic.runtime.migration]
2024-12-05 12:18:02.284369 [info     ] Archive: /data                 [servicelayer.archive.file]
2024-12-05 12:18:02.284519 [info     ] Creating system roles...       [aleph.logic.roles]
2024-12-05 12:18:02.441278 [info     ] [aleph-collection-v1] No changes detected in settings. [aleph.index.util] index=aleph-collection-v1
2024-12-05 12:18:02.441456 [info     ] [aleph-collection-v1] Current mappings. [aleph.index.util] index=aleph-collection-v1 mappings={'dynamic': 'false', '_source': {'excludes':
2024-12-05 12:18:02.441728 [info     ] [aleph-collection-v1] New mappings. [aleph.index.util] 
... OK until:

2024-12-05 12:18:03.214716 [info     ] [aleph-entity-legalentity-v1] New mappings. [aleph.index.util] index=aleph-entity-legalentity-v1 mappings={'date_detection': False, 'dynamic': False, '_source': {'excludes': ['text', 'fingerprints']}, 'properties': {'caption': {'type': 'keyword'}, 'schema': {'type': 'keyword'}, 'schemata': {'type': 'keyword'}, 'entities': ---snip

{'type': 'keyword'}, 'profile_id': {'type': 'keyword'}, 'collection_id': {'type': 'keyword'}, 'origin': {'type': 'keyword'}, 'created_at': {'type': 'date'}, 'updated_at': {'type': 'date'}}}
2024-12-05 12:18:03.370029 [error    ] Index [aleph-entity-legalentity-v1] error: Mapper for [properties.phone] conflicts with existing mapper:
    Cannot update parameter [index] from [true] to [false] [aleph.index.util]
2024-12-05 12:18:03.370354 [error    ] Failed to upgrade.             [aleph]
Traceback (most recent call last):
  File "/aleph/aleph/manage.py", line 504, in upgrade
    upgrade_system()
  File "/aleph/aleph/migration.py", line 15, in upgrade_system
    upgrade_search()
  File "/aleph/aleph/index/admin.py", line 17, in upgrade_search
    configure_entities()
  File "/aleph/aleph/index/indexes.py", line 68, in configure_entities
    configure_schema(schema, version)
  File "/aleph/aleph/index/indexes.py", line 130, in configure_schema
    return configure_index(index, mapping, settings)
  File "/aleph/aleph/index/util.py", line 320, in configure_index
    _check_response(
  File "/aleph/aleph/index/util.py", line 244, in _check_response
    raise AlephOperationalException(f"Index {index} error: {error}")
aleph.index.util.AlephOperationalException: Index aleph-entity-legalentity-v1 error: Mapper for [properties.phone] conflicts with existing mapper:
    Cannot update parameter [index] from [true] to [false]
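This final error says that the existing aleph-entity-legalentity-v1 index has properties.phone mapped with "index": true, while the 4.x mapping asks for false. Elasticsearch does not allow the `index` parameter of an existing field to be changed in place, so a change like this generally needs a new index plus a reindex. To list every field affected this way before running the upgrade, you could compare the current mapping (the "mappings" object returned by `GET /aleph-entity-legalentity-v1/_mapping`) with the new one. Below is a sketch. It assumes both mappings are plain dicts in the nested {"properties": {...}} shape seen in the log, and it compares only the `index` flag. The function name and sample mappings are hypothetical.

```python
from typing import Any, Dict, List


def conflicting_index_flags(
    existing: Dict[str, Any], new: Dict[str, Any], prefix: str = ""
) -> List[str]:
    """Return dotted paths of fields whose "index" flag differs.

    Elasticsearch treats a missing "index" parameter as true, so a missing
    key defaults to True on both sides. Object fields are searched recursively.
    """
    conflicts: List[str] = []
    old_props = existing.get("properties", {})
    new_props = new.get("properties", {})
    for name in sorted(set(old_props) & set(new_props)):
        path = prefix + name
        old_field, new_field = old_props[name], new_props[name]
        if old_field.get("index", True) != new_field.get("index", True):
            conflicts.append(path)
        # Object fields carry their own nested "properties".
        conflicts.extend(conflicting_index_flags(old_field, new_field, path + "."))
    return conflicts


# Hypothetical, trimmed mappings mirroring the error message.
existing = {"properties": {"properties": {"properties": {
    "name": {"type": "keyword"}, "phone": {"type": "keyword"}}}}}
new = {"properties": {"properties": {"properties": {
    "name": {"type": "keyword"}, "phone": {"type": "keyword", "index": False}}}}}
print(conflicting_index_flags(existing, new))  # ['properties.phone']
```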

I have just run this on the original 3.19 instance and see that there are two version entries in the table there too! So this didn’t happen during the upgrade. The 3.19 instance is running OK, though, and lets me log in.

OK, well, I’m still stuck on this and can’t seem to proceed with the upgrade.

There’s another option, I guess, which is to start with a fresh install of 4.x and then re-import the data into it.
I gather I’d then need to export the list of users from the old Postgres database and import it into the new one.
I imagine the dataset numbers would be different, so I’m guessing all the permissions would need to be fixed, probably along with lots of other hacking. Is this a viable option?

If you are re-importing the Postgres data, the dataset numbers (and user permissions) shouldn’t change.

I see you have opened a separate thread to discuss backing up and restoring your instance. Perhaps following Simon’s final suggestion over there will help with importing your data into a fresh 4.x installation?