I’m getting the following error saying that a PDF file could not be extracted:
No preview is available for this document
Could not extract PDF file: OperationalError('(sqlite3.OperationalError) no such table: ingest_cache')
This happens to many, but not all, of the PDFs I try to upload. The error doesn’t seem to occur when the type in the system is “Document”, but none of the files labeled “File” show a preview or any extracted mentions. I managed to get one file extracted by deleting it and re-uploading it a few times, but that’s obviously not a sustainable solution.
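For what it’s worth, the SQLite part of the error is easy to reproduce on its own with Python’s standard library, which makes me suspect the lookup is simply running against a database where the `ingest_cache` table was never created. I don’t know whether that is what actually happens in my setup. This is only a minimal sketch: it uses an in-memory database, the query is copied from the ingest-file log, `has_table` and `lookup` are hypothetical helpers of mine, and the column types in the `CREATE TABLE` are guesses.

```python
import sqlite3


def has_table(conn, name):
    """Return True if a table with this name exists in the SQLite database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def lookup(conn, key):
    """Run the same SELECT that appears in the ingest-file traceback."""
    row = conn.execute(
        'SELECT ingest_cache.value FROM ingest_cache WHERE ingest_cache."key" = ?',
        (key,),
    ).fetchone()
    return row[0] if row else None


# A fresh database has no tables at all, like a newly created SQLite file.
conn = sqlite3.connect(":memory:")
key = "ocr:ec6f56cc558034938fdeb376067be626f0f78cc2"

try:
    lookup(conn, key)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)

print(has_table(conn, "ingest_cache"), error)

# Assumed schema (column types are a guess): once the table exists,
# the same lookup returns None instead of raising.
conn.execute('CREATE TABLE ingest_cache ("key" TEXT PRIMARY KEY, value TEXT)')
print(has_table(conn, "ingest_cache"), lookup(conn, key))
```

Pointing `sqlite3.connect` at the actual database file the worker uses, instead of `:memory:`, and calling `has_table` would show whether the table is really missing there.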
Here’s what I think is the relevant message in ingest-file:
{“logger”: “ingestors.manager”, “timestamp”: “2025-04-05 14:42:21.616664”, “exception”: “Traceback (most recent call last):\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context\n self.dialect.do_execute(\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 941, in do_execute\n cursor.execute(statement, parameters)\nsqlite3.OperationalError: no such table: ingest_cache\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/ingestors/ingestors/documents/pdf.py", line 52, in ingest\n self.parse_and_ingest(file_path, entity, self.manager)\n File "/ingestors/ingestors/support/pdf.py", line 89, in parse_and_ingest\n pdf_model: PdfModel = self.parse(file_path)\n File "/ingestors/ingestors/support/pdf.py", line 84, in parse\n self.pdf_extract_page(pdf_doc, page, page.number + 1)\n File "/ingestors/ingestors/support/pdf.py", line 143, in pdf_extract_page\n text = self.extract_ocr_text(data, languages=languages)\n File "/ingestors/ingestors/support/ocr.py", line 30, in extract_ocr_text\n text = self.tags.get(key)\n File "/usr/local/lib/python3.8/dist-packages/servicelayer/tags.py", line 52, in get\n rp = conn.execute(stmt)\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1418, in execute\n return meth(\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/sql/elements.py", line 515, in _execute_on_connection\n return connection._execute_clauseelement(\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1640, in _execute_clauseelement\n ret = self._execute_context(\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1846, in _execute_context\n return self._exec_single_context(\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1986, in _exec_single_context\n self._handle_dbapi_exception(\n File 
"/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 2355, in _handle_dbapi_exception\n raise sqlalchemy_exception.with_traceback(exc_info[2]) from e\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/base.py", line 1967, in _exec_single_context\n self.dialect.do_execute(\n File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/engine/default.py", line 941, in do_execute\n cursor.execute(statement, parameters)\nsqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: ingest_cache\n[SQL: SELECT ingest_cache.value \nFROM ingest_cache \nWHERE ingest_cache."key" = ?]\n[parameters: (‘ocr:ec6f56cc558034938fdeb376067be626f0f78cc2’,)]\n(Background on this error at: https://sqlalche.me/e/20/e3q8)\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/ingestors/ingestors/manager.py", line 220, in ingest\n self.delegate(ingestor_class, file_path, entity)\n File "/ingestors/ingestors/manager.py", line 245, in delegate\n ingestor_class(self).ingest(file_path, entity)\n File "/ingestors/ingestors/documents/pdf.py", line 56, in ingest\n raise ProcessingException("Could not extract PDF file: %r" % ex) from ex\ningestors.exc.ProcessingException: Could not extract PDF file: OperationalError(‘(sqlite3.OperationalError) no such table: ingest_cache’)”, “start_time”: 1743864141.5945566, “stage”: “ingest”, “dataset”: “1”, “job_id”: “4:b2680744-6740-4031-bdae-7c1b8b695375”, “v”: “4.0.1”, “trace_id”: “2b6f2837910f412f96dfbbf281c5649c”, “message”: “[<E(‘279.c158a2a3920f19344ee9e80d82f64f290b9bfb6c’,‘200219111757-001.pdf’)>] Failed to process: Could not extract PDF file: OperationalError(‘(sqlite3.OperationalError) no such table: ingest_cache’)”, “severity”: “ERROR”}