After using pgml.dump_all and pgml.load_all for data backup and migration, an error occurs when trying to train on the new database. #1307

Open
HJH0924 opened this issue Jan 30, 2024 · 0 comments

In my original database, after running pgml.dump_all, I clear the table data with the following commands:
TRUNCATE TABLE pgml.models cascade;
TRUNCATE TABLE pgml.deployments cascade;
TRUNCATE TABLE pgml.projects cascade;
TRUNCATE TABLE pgml.snapshots cascade;
TRUNCATE TABLE pgml.files cascade;

After that, running pgml.load_all restores the data, and training proceeds as normal.

However, the same restore fails in a new database. I run createdb -O postgresml pgml_backup, then create extension pgml; in the new pgml_backup database, and then pgml.load_all to restore the data into it. The data is restored, but pgml.train no longer works and fails with: ERROR: duplicate key value violates unique constraint "projects_pkey".

Steps to reproduce:
In the postgresml database:
SELECT * FROM pgml.load_dataset('digits'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', 'classification', 'pgml.digits', 'target'); -- OK
SELECT pgml.dump_all('/root/pgml-bak/'); -- OK
\q # Exit the database
createdb -O postgresml pgml_backup # Create a new database

Connect to the pgml_backup database and create the pgml extension:
create extension pgml; -- OK
SELECT pgml.load_all('/root/pgml-bak/'); -- OK
SELECT * FROM pgml.load_dataset('diabetes'); -- OK
SELECT * FROM pgml.train('Diabetes Progression', 'regression', 'pgml.diabetes', 'target'); -- ERROR: duplicate key value violates unique constraint "projects_pkey"
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'svm', materialize_snapshot => true); -- ERROR: duplicate key value violates unique constraint "models_pkey"
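
One unconfirmed explanation for these errors is that pgml.load_all restores rows with their original id values, while the sequences behind the primary keys in the freshly created database still sit at their initial values, so the next insert reuses an id that already exists. A minimal sketch for checking this, assuming the pgml tables have a sequence-backed id column (an assumption about the schema, not something shown in this report):

```sql
-- Highest ids currently stored in the restored tables.
-- Assumption: both tables have a primary key column named "id".
SELECT
    (SELECT max(id) FROM pgml.projects) AS max_project_id,
    (SELECT max(id) FROM pgml.models)   AS max_model_id;

-- Current state of every sequence in the pgml schema (PostgreSQL 10+).
-- last_value is NULL for a sequence that has never been used.
SELECT schemaname, sequencename, last_value
FROM pg_sequences
WHERE schemaname = 'pgml';
```

If last_value is NULL or lower than the corresponding max(id), the sequence is behind the restored data, which would be consistent with the duplicate key errors.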

By contrast, when the same database (for example postgresml) is used throughout, every step succeeds:
SELECT * FROM pgml.load_dataset('digits'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', 'classification', 'pgml.digits', 'target'); -- OK
SELECT pgml.dump_all('/root/pgml-bak/'); -- OK
TRUNCATE TABLE pgml.models cascade; -- OK
TRUNCATE TABLE pgml.deployments cascade; -- OK
TRUNCATE TABLE pgml.projects cascade; -- OK
TRUNCATE TABLE pgml.snapshots cascade; -- OK
TRUNCATE TABLE pgml.files cascade; -- OK
SELECT pgml.load_all('/root/pgml-bak/'); -- OK
SELECT * FROM pgml.load_dataset('diabetes'); -- OK
SELECT * FROM pgml.train('Diabetes Progression', 'regression', 'pgml.diabetes', 'target'); -- OK
SELECT * FROM pgml.train('Handwritten Digits', algorithm => 'svm'); -- OK
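
Note that TRUNCATE without RESTART IDENTITY leaves sequences untouched, so in the same-database case the id sequences stay ahead of the reloaded rows. If stale sequences are indeed what breaks the new database (a guess, not a confirmed diagnosis), a possible workaround is to advance each sequence past the restored ids after pgml.load_all. The sketch below assumes each table has a sequence-backed column named id; this is not verified against the pgml schema:

```sql
-- Run in the new database after pgml.load_all.
-- Assumption: pgml.projects.id and pgml.models.id are serial/identity columns.
-- For an empty table the sequence is set so that the next value is 1.
SELECT setval(
    pg_get_serial_sequence('pgml.projects', 'id'),
    COALESCE(max(id), 1),
    max(id) IS NOT NULL
) FROM pgml.projects;

SELECT setval(
    pg_get_serial_sequence('pgml.models', 'id'),
    COALESCE(max(id), 1),
    max(id) IS NOT NULL
) FROM pgml.models;
```

The same statement would need to be repeated for pgml.deployments, pgml.snapshots, and pgml.files if they also use sequence-backed ids.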
