BUG postgres terminated by signal 6: Aborted #1205

Open
ns1000 opened this issue Nov 29, 2023 · 8 comments

ns1000 commented Nov 29, 2023

I found a specific query that crashes postgres.

I reproduced this bug on both postgres 15 and 16 on my Debian installation, and it also crashes postgres in the latest Docker image (ghcr.io/postgresml/postgresml:2.7.3).

To reproduce, run the following sequence:

SELECT pgml.load_dataset('ag_news');
DROP TABLE IF EXISTS phrases2;
CREATE TABLE phrases2 (
  id serial PRIMARY KEY,
  phrase text,
  embedding vector(384)
);
insert into phrases2(phrase,embedding) select text,pgml.embed('all-MiniLM-L6-v2', text)::vector from pgml.ag_news limit 10000;

After the import into phrases2 completes, exit the psql client and start a new psql session. Then run this query:

WITH Embeddings AS (
  SELECT
    pgml.embed('all-MiniLM-L6-v2', p.phrase) AS embedding,
    id
  FROM
    phrases2 p
  limit 45
)
SELECT
  p.id,
  p.phrase,
  1 - (p.embedding <=> e.embedding::vector) AS similarity
FROM
  Embeddings e
JOIN
  phrases2 p ON true
ORDER BY
  (p.embedding <=> e.embedding::vector)
limit 10;

This produces the following log output and aborts the server process:

: CommandLine Error: Option 'nvptx-no-f16-math' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
2023-11-29 20:30:51.296 UTC [29] LOG:  server process (PID 162) was terminated by signal 6: Aborted

I found that initializing the embedding model first mitigates the bug. If you start the psql client and run the following, the crash does not occur:

SELECT pgml.embed('all-MiniLM-L6-v2', 'initialize the framework')::vector;

WITH Embeddings AS (
  SELECT
    pgml.embed('all-MiniLM-L6-v2', p.phrase) AS embedding,
    id
  FROM
    phrases2 p
  limit 45
)
SELECT
  p.id,
  p.phrase,
  1 - (p.embedding <=> e.embedding::vector) AS similarity
FROM
  Embeddings e
JOIN
  phrases2 p ON true
ORDER BY
  (p.embedding <=> e.embedding::vector)
limit 10;

tca3 commented Jan 24, 2024

you saved me with this, thank you!


levkk commented Jan 25, 2024

My immediate thought goes to parallel workers. If you temporarily disable them by setting max_parallel_workers_per_gather to 0, are you able to run the query without "prewarming" the connection with the embeddings model?
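
A minimal sketch of that check, assuming a fresh psql session (so the embeddings model has not been loaded yet) and the phrases2 table from the original report. SET only changes the value for the current session:

```sql
-- Disable parallel query workers for this session only.
SET max_parallel_workers_per_gather = 0;

-- Confirm the setting took effect before testing.
SHOW max_parallel_workers_per_gather;

-- Now rerun the failing query from the report in this same session,
-- without calling pgml.embed beforehand.
```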


ns1000 commented Jan 25, 2024

@levkk I tried setting max_parallel_workers_per_gather to 0, but that doesn't solve the issue. I believe it has something to do with the venv, and perhaps it hasn't been noticed because they use pgcat (a pooler), which might mitigate the bug.

I have an easier way to reproduce a similar (I think the same) issue, though it still takes a few steps.

This assumes you have Docker and the postgresml Docker image installed, and that you run the container with a bash shell so you can run interactive commands:

apt update
# to install systemctl needed for the upgrade first:
apt install libsystemd0 
apt -y upgrade
apt install postgresql-plpython3-15

Then run this command sequence:

sudo -u postgres psql
CREATE DATABASE example;
\c example
CREATE EXTENSION plpython3u;
CREATE OR REPLACE FUNCTION public.set_my_text(string text)
  RETURNS TEXT AS
$BODY$
    GD["my_text"] = string
    return string
$BODY$
LANGUAGE plpython3u VOLATILE;
CREATE EXTENSION pgml;
\q

Now run this sequence:

sudo -u postgres psql
\c example
SELECT pgml.embed('all-MiniLM-L6-v2', 'MY TEST PHRASE');
SELECT set_my_text('example');

That crashes the server.

Now run it like this:

sudo -u postgres psql
\c example
SELECT set_my_text('example');
SELECT pgml.embed('all-MiniLM-L6-v2', 'MY TEST PHRASE');
SELECT set_my_text('example');

-------------
 example
(1 row)

You can see that initializing Python before pgml ensures you don't get an error 11, which leads to a fatal crash.
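
The workaround above could be sketched as a per-session warm-up, assuming plpython3u is installed in the database. The anonymous DO block is only an illustration and is not from the steps above. Any PL/Python call, such as set_my_text, should have the same effect of initializing Python before pgml runs:

```sql
-- Run first in every new session: initializes PL/Python before pgml touches Python.
DO $$
pass
$$ LANGUAGE plpython3u;

-- pgml calls made afterwards in the same session.
SELECT pgml.embed('all-MiniLM-L6-v2', 'MY TEST PHRASE');
```

This only sidesteps the crash for sessions that run the warm-up. It does not address the underlying initialization conflict.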

montanalow commented

My guess is this was fixed by #1276, and this bug report is stale. Can you reproduce with latest master build, or is this on an older version?


ns1000 commented Jan 25, 2024

The version I compiled, which has the Python bug, includes the code from #1276.



levkk commented Jan 25, 2024

The presence of plpython3u reminds me of another bug report we had about PL/Python and the extension not mixing well together. We fixed it by using a different linker and remapping all our stuff, including libpython, which we statically link, into a private namespace in the executable. That being said, I think something is doing global initialization (my guess is one of the Python C libs), and this is what's blowing up.

More context here: #931


ns1000 commented Jan 25, 2024

@montanalow I checked, and I no longer see the problem described in the first post on my latest postgresml installation. I do still see the second problem with python3, and it happens on both of my postgres instances (one that still has the older bug and one that no longer does). I still need to check whether the upgrade helps there.


ns1000 commented Jan 25, 2024

@levkk ah yes, that looks like the same thing. When I call a Python function before the first call to pgml, there is no problem.
