Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package number inconsistencies #250

Open
diversys opened this issue Nov 22, 2023 · 8 comments
Open

Package number inconsistencies #250

diversys opened this issue Nov 22, 2023 · 8 comments

Comments

@diversys
Copy link
Contributor

diversys commented Nov 22, 2023

Currently HaikuPorts x86_64 repo in HDS shows 8069 packages (including _devel, _source and _debug packages).
However, HD shows 7996 packages. pkgman also shows the same number.

pkgman search -aD | grep HaikuPorts | wc -l
7996
@nielx
Copy link
Member

nielx commented Nov 24, 2023

The web version also indexes the BeSly and FatElk repositories. I think you can see the details here: https://depot.haiku-os.org/#!/repositories?bcguid=bc198-QBID

Does that explain the difference?

@diversys
Copy link
Contributor Author

Those repos are disabled by default. If I enable them then HDS shows 8136 packages.

@andponlin
Copy link
Collaborator

andponlin commented Nov 24, 2023

You can get a list of the packages exported from HDS with the following command;

curl -L "https://depot.haiku-os.org/__pkg/all-haikuports_x86_64-en.json.gz" | gzip -d | jq -r '(.items|sort_by(.name)|.[]|[.name] | join(","))'

You can get a list from Haiku with;

pkgman search -aD | grep HaikuPorts | sed -E 's/^HaikuPorts[ ]+([^ ]+)[ ]+.*$/\1/'

In HDS we see 7998 packages and for Haiku 7996. This gives two extra packages from HDS;

  • monsterz
  • pycharm_community_bin

It looks like the cause of this is that two package versions in the database are set as is_latest. I can see...

WITH x AS (
  SELECT COUNT(pv.id) AS version_count, p.name
  FROM haikudepot.pkg_version pv JOIN haikudepot.pkg p ON p.id = pv.pkg_id
  WHERE pv.is_latest = true AND pv.architecture_id=2864 GROUP BY p.name
) SELECT * FROM x WHERE version_count > 1;

...there are quite a few cases of this and many date to 2022-04-29 so maybe there was an issue then.

This may be the case of the anomaly -- can't be totally sure. I will try to look into it further soon.

@andponlin
Copy link
Collaborator

Looking into this a bit further this evening, I see that the data in HDS is most likely correct.

It seems as if the problem might be coming from the fact that some packages have been issued under both a regular computer architecture such as x86_64 as well as the pseudo-architecture any. This means that there is a latest existing for both architectures and so both versions appear legitimately in queries which looks like a double-up.

Here is a raw database query to check on this;

SELECT
  pv.id AS version_id, pv.create_timestamp AS version_timestamp, p.name AS pkg_name, a.code as arch, pv.active, pv.is_latest, rs.code AS rs_code
FROM haikudepot.pkg_version pv
JOIN haikudepot.architecture a ON a.id = architecture_id
JOIN haikudepot.pkg p ON p.id = pkg_id
JOIN haikudepot.repository_source rs ON rs.id = pv.repository_source_id
WHERE p.name IN ('monsterz', 'pycharm_community_bin') ORDER BY p.name, pv.create_timestamp DESC;

My older obfuscated snapshot shows;

 version_id |    version_timestamp    |       pkg_name        |   arch   | active | is_latest |       rs_code       
------------+-------------------------+-----------------------+----------+--------+-----------+---------------------
     382810 | 2023-08-22 21:02:28.473 | monsterz              | any      | t      | t         | haikuports_x86_gcc2
     378124 | 2023-08-22 21:01:26.211 | monsterz              | any      | t      | t         | haikuports_x86_64
     321530 | 2019-02-02 19:01:20.346 | monsterz              | x86_gcc2 | t      | t         | haikuports_x86_gcc2
     389313 | 2023-11-05 20:18:26.991 | pycharm_community_bin | x86_64   | t      | t         | haikuports_x86_64
     368493 | 2022-04-28 20:20:13.191 | pycharm_community_bin | any      | f      | t         | haikuports_x86_gcc2
     358653 | 2021-05-14 20:17:28.204 | pycharm_community_bin | any      | t      | t         | haikuports_x86_64
     335472 | 2020-05-24 20:17:45.359 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
     320892 | 2018-12-31 03:01:22.567 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
     320891 | 2018-12-30 20:01:20.769 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2

(@nielx ; it would be great if you would be able to run this same query and paste the results here to confirm the current state in the production database.)

It is somewhat unclear what is "right" and "wrong" here because this is showing a historical trace of the packages' versions. What I think needs to happen here with monsterz and pycharm_community_bin is...

  • any necessary corrections are made to the package recipes so that they have the correct architecture
  • identify the correct architecture for these two packages in this ticket
  • wait for an import into the HDS database
  • run the sql above to see what older versions on incorrect architectures there are
  • write an UPDATE statement to mark the undesirable versions as inactive

@nielx
Copy link
Member

nielx commented Dec 17, 2023

Output from the production database:

 version_id |    version_timestamp    |       pkg_name        |   arch   | active | is_latest |       rs_code
------------+-------------------------+-----------------------+----------+--------+-----------+---------------------
     389820 | 2023-08-18 09:38:38.676 | monsterz              | any      | t      | t         | haikuports_x86_gcc2
     389817 | 2023-08-18 09:35:55.716 | monsterz              | any      | t      | t         | haikuports_x86_64
     384498 | 2023-04-25 05:15:02.158 | monsterz              | x86_64   | t      | t         | haikuports_x86_64
     384347 | 2023-04-22 08:46:50.118 | monsterz              | x86_64   | t      | f         | haikuports_x86_64
     321530 | 2019-02-02 19:01:20.346 | monsterz              | x86_gcc2 | f      | t         | haikuports_x86_gcc2
     393678 | 2023-10-31 11:29:39.911 | pycharm_community_bin | x86_64   | t      | t         | haikuports_x86_64
     368493 | 2022-04-28 20:20:13.191 | pycharm_community_bin | any      | f      | t         | haikuports_x86_gcc2
     358653 | 2021-05-14 20:17:28.204 | pycharm_community_bin | any      | t      | t         | haikuports_x86_64
     335472 | 2020-05-24 20:17:45.359 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
     320892 | 2018-12-31 03:01:22.567 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
     320891 | 2018-12-30 20:01:20.769 | pycharm_community_bin | any      | f      | f         | haikuports_x86_gcc2
(11 rows)

@andponlin
Copy link
Collaborator

There is an anomaly with the pycharm_community_bin is that it looks like inconsistent packaging between x86_64 and x86gcc2. monsterz looks like it is being packaged under any for some reason. @diversys ; do you have time to communicate with the package owners to clarify what they are doing there?

Maybe for now we could fix the broken packages with;

UPDATE haikudepot.pkg_version SET active = false WHERE id IN (384498, 321530, 358653);

That will make those anomalous versions appear to be deleted.

@nielx
Copy link
Member

nielx commented Dec 17, 2023

I have applied the SQL to make them inactive. Let me know if I can be of further assistance.

@andponlin
Copy link
Collaborator

Thanks @nielx ; it looks like there is still a discrepancy. I will have to look into at again another time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants