Skip to content
This repository has been archived by the owner on Sep 26, 2022. It is now read-only.

Project Update v5 - Clean Slate #320

Closed
JMSwag opened this issue Dec 31, 2021 · 37 comments
Closed

Project Update v5 - Clean Slate #320

JMSwag opened this issue Dec 31, 2021 · 37 comments

Comments

@JMSwag
Copy link
Member

JMSwag commented Dec 31, 2021

Hey All,

v4 was a cluster of my own making. Essentially not giving the project enough of my time. My sinceriest apologies. There is some good news though. I've finally got over my fear of front end development and have been learning quiet a lot in that area and have some cool projects under my belt. I finally feel like a half way complete developer. 😅

I initially wrote this project while learning how to code. Shotly after going though learn python the hard way. Shout out to Zed Shaw.

Ok let's get down to business.

Lots of thinks I'd like to improve on.

PyUpdater v5 will be a clean break, saving the good stuff. Thinking in the since of a toolkit where parts are easily replaced.

Gaols

  • Remove all compatiabilty code
  • Remove as my 3rd party libraries as possible.
  • Only supported versions of python. We end when the PSF ends.
  • Support other hosting proviers. .i.e Github releases
  • Make a way to migrate from v3 PyUpdatr projects
  • Best effort on migrating from v4 PyUpdater projects, The config file could be broken beyound repair. Gather feedback during alpha/beta phases.

Questions / Comments / Concerns are very welccome.

@JMSwag
Copy link
Member Author

JMSwag commented Dec 31, 2021

@NicklasTegner @dennisvang

@JessicaTegner
Copy link
Contributor

Fantastic.
I have the following questions.

  • Do you want to rewrite it from scratch, or split apart what we already have.
  • By supported python and PyInstaller versions, I'm assuming we are talking about Python 3.x and PyInstaller 3x and 4x
  • Where do you want to begin?

In addition, if you need help approving code, managing the documentation, feel free to tag me or any of the sorts.
Let's get started :)

@JMSwag
Copy link
Member Author

JMSwag commented Jan 1, 2022

@NicklasTegner
Glad you're on board.

  • Ideally keep the good parts of what we have and improve on what's feasible. I'm fine with everything else being thrown out.
  • I've simplified my initial comment. Essentially we support what's supported by the PSF
  • Not quiet sure just yet. I do have a milestone created. Let me go over existing issues & the codebase. After analysis I'll start creating a few tickets / assigning existing tickets to the v5 milestone.

Will do a the help is greatly appreciated!

@JessicaTegner
Copy link
Contributor

@JMSwag nice. Looking forward to a roadmap / todo list of some kind.
May I suggest linking the milestone in the first post, and pinning this issue.

@dennisvang
Copy link

dennisvang commented Jan 3, 2022

@JMSwag Looks like a good plan. :)

I particularly like the idea of minimizing third-party dependencies and simplifying where possible.

Some steps towards simplifying the code base:

  • drop support for Python 3.6, which has reached end-of-life
  • replace os.path by pathlib equivalents, where possible
  • only create new directories and files in one place, to prevent side effects such as caused by PackageHandler._setup_work_dirs()
  • issue Config confusion #323

It might also be worth having a discussion about the data structure for the versions.gz file and config.pyu file, and the general workflow:

  • What needs to remain, and what can go? For example, are "platform" keys required in the versions.gz file, or would it make more sense only to create a separate versions file for each platform?
  • What would be the workflow for multi-platform development?
  • I also wonder if it is necessary to keep the channel variables/arguments/keys throughout the code. The "channel" information is already contained in the version string (or Version object), so this feels redundant. Getting rid of both "channel" and "platform" keys from the versions file, etc., would certainly simplify things.
  • I suppose the new release should at least be able to produce a versions.gz file (and keys.gz) that can be read by the Client from older releases. In my opinion, clients in the field should be able to update from 3.* to 4.* to 5.* without issue.

A clear definition of "release channels" would also help guide further development:

  • The concepts of "release cycle" and "release channel," as implemented in PyUpdater, should be clearly defined in the documentation.
  • Currently, it looks like release channels are treated as separate, isolated/parallel paths. Is that the intention? That would imply it is not possible to update from e.g. an alpha version to a beta version or stable version. The fact that there is only one patch counter, on the other hand, suggests the opposite.
  • The basic software release cycle (...->alpha->beta->release candidate->production (stable)->alpha->...) suggests a linear path where alpha/beta/rc are intermediate steps between the "stable" releases. If I'm on the alpha channel, I expect to get any available update, i.e. alpha, beta, rc, or stable, whichever is the latest. If I'm on the beta channel, I expect to get only beta, rc, and stable updates, and so on and so forth.
  • Instead of the current release channels "alpha", "beta", "stable", it would be more convenient to adhere to PEP440 terminology. Among other things, PEP440 distinguishes between "final release", i.e. our "stable", and "pre-release", i.e. "alpha", "beta", or "release candidate". I would propose to drop the "stable" terminology, and, instead, let channel=None (or channel="", if you prefer) correspond with a "final release". We should then support pre-release channels as "a", "b", "rc".

In the meantime I've started fixing some of the issues related to the 4.0 release. I think that's coming along reasonably well, but I've taken a lot of liberty there, as you can see e.g. in PR #316...

@JMSwag JMSwag pinned this issue Jan 4, 2022
@JMSwag JMSwag changed the title Project Update v5 Project Update v5 - Clean Slate Jan 4, 2022
@dennisvang
Copy link

@JMSwag Could you help me understand the reasoning behind the current structure of the version manifest, as in versions.gz?

Here's an example what it looks like, currently:

{
  "latest": {
    "Acme": {
      "alpha": {"win": "1.0.2.0.0"},
      "stable": {"win": "1.0.1.2.0"}
    }
  },
  "updates": {
    "Acme": {
      "1.0.0.2.0": {
        "win": {
          "file_hash": "***",
          "file_size": 279320368,
          "filename": "Acme-win-1.0.0.zip"
        }
      },
      "1.1.0.2.0": {
        "win": {
          "file_hash": "***",
          "file_size": 275047978,
          "filename": "Acme-win-1.1.0.zip",
          "patch_hash": "***",
          "patch_name": "Acme-win-stable-0",
          "patch_size": 3063604
        }
      },
      "1.2.0.0.0": {
        "win": {
          "file_hash": "***",
          "file_size": 301965712,
          "filename": "Acme-win-1.2.0-alpha.zip"
        }
      }
    }
  },
  "signature": "***"
}

Now, I'm wondering:

  • Is there a use case where the manifest file would contain multiple different app-name keys (like "Acme")?
  • Is there a use case where the manifest file would contain multiple different platform keys (like "win")?
  • Is the "latest" object really necessary? The actual latest version must be determined dynamically anyway, based on channel, as in get_highest_version().

The reason I'm asking is this:

Following the discussion around issue #255, I'm thinking of an alternative implementation, along the lines of separate manifest files for each app and supported platform, with a simplified, shallower, structure.

For example something like this:

{
  "app_name": "Acme",
  "platform": "win",
  "signature": "***",
  "archives": [
    {
      "version": "1.0.0",
      "hash": "***",
      "size": 279320368,
      "filename": "Acme-win-1.0.0.zip"
    },
    {
      "version": "1.1.0",
      "hash": "***",
      "size": 275047978,
      "filename": "Acme-win-1.1.0.zip"
    },
    {
      "version": "1.2.0a0",
      "hash": "***",
      "size": 301965712,
      "filename": "Acme-win-1.2.0a0.zip"
    }
  ],
  "patches": [
    {
      "version": "1.1.0",
      "hash": "***",
      "size": 3063604,
      "filename": "Acme-win-0"
    },
    {
      "version": "1.2.0a0",
      "hash": "***",
      "size": 2487417,
      "filename": "Acme-win-1"
    }
  ]
}

This puts the version string inside the archive/patch objects, instead of using the version as a key. This may be slightly less convenient, but it ensures that we always know which version corresponds with the object. In addition, JSON objects are unordered, whereas arrays are ordered.

We could then do things like:

import json
from packaging.version import Version

manifest = json.load(<manifest file defined above>)

current_version = Version("1.0.0")
latest_version = Version("1.2.0a0")

# we can easily collect the patches required to update from current to latest version
required_patches = [
    patch for patch in manifest["patches"]
    if current_version < Version(patch["version"]) <= latest_version
]

# patches can be sorted like this, whenever necessary
required_patches.sort(key=lambda patch: Version(patch["version"]))

To improve cohesion, I'm also thinking about something like a VersionManifest class which would handle (de)serialization of the version manifest, determining the latest version, collecting the required patches for an update, etc.

For backward compatibility, the structure above can be converted to a separate file in the old format.

What do you think?

@JessicaTegner
Copy link
Contributor

@dennisvang I have always wondered about that myself.
It might be of interest to check out the --split-version argument during signing e.g:
pyupdater pkg --sign --split-version
As it splits the versions.gz file up into platform specific ones.

As for the application names, that could (but I'm not sure) be because PyUpdater also supports tracking and updating of asset files, hence why I don't think your suggested implementation would work.
What do you think :)

@dennisvang
Copy link

@NicklasTegner Thanks, those are good points. :)
I never used asset updates before, so didn't think of those.
I'll have a closer look at them.

@JMSwag
Copy link
Member Author

JMSwag commented Jan 7, 2022

@dennisvang I coded around my workflow. At the time of writing, I was using PyUpdater by syncing source code to VMs for each supported platform, then building and packaging. Syncing artifacts back to my main dev machine then signing and uploading. This was before the CI/CD craze. When PyUpdater usage picked up and people started wanting to use it in CI, the --split-version option was added. Never needed it personally for my use case.

PyUpdater supports versioned assets with are treated as another app internally.

@dennisvang
Copy link

@JMSwag Thanks for the explanation. That provides some insight into possible workflows for multiple platforms.

@NicklasTegner I think it should be possible to use a simplified version manifest like the one I described above. Maybe not exactly the same, but similar.

At the very least I think we should aim to remove as much nesting as possible from the manifest (as in my example), or, alternatively, change the order of nesting so that app name and platform are at the top level.

I don't think the app name or platform can change within one "session", right? If so, then, both on the client side and on the package-creation side, the app name and platform should only be needed once, viz. at the start of the "session", to load the version data from the manifest for that specific combination of app name and platform.

This way, we don't need the app_name and platform keys every time we access the manifest data, e.g. here, and don't need to use things like EasyAccessDict, e.g. here.

For example, version information in the current manifest is nested like this

{"updates": {"Acme": {"1.0.0.2.0": {"win": {}}}}}

So we need to do version_data["updates"]["Acme"]["1.0.0.2.0"]["win"] every time we need info for "1.0.0.2.0", even though "Acme" and "win" are always the same in one session.

It would already be a great improvement if the order were different:

{"updates": {"Acme": {"win": {"1.0.0.2.0": {}}}}}

Then we could assign version_data = manifest["updates"]["Acme"]["win"] somewhere at the start, and only need to do version_data["1.0.0.2.0"] whenever we need it.

I'm convinced a simplification of the manifest structure would greatly improve code readability.

@JMSwag
Copy link
Member Author

JMSwag commented Jan 7, 2022

@dennisvang I concur.

@dennisvang
Copy link

@JMSwag Just a quick question related to removal of non-essential code: Is there a compelling reason to hide/unhide the files in AppUpdate._win_rename? See e.g. this line.

@JMSwag
Copy link
Member Author

JMSwag commented Jan 25, 2022

@dennisvang On Windows you cannot overwrite the exe from a running process. This is a workaround to simulate the behavior on Linux/Mac.

@dennisvang
Copy link

@JMSwag: Thanks. Does that mean it won't work if you don't hide the file on windows?

Another question, to help us understand the reasoning behind the cryptographic approach.

As I see it, currently, the versions.gz file contains hashes for the package files (archives and patches). The versions.gz file itself is signed. If the versions.gz file signature is verified, the hashes from that file can be trusted. As a result, these hashes can be used to verify not only integrity but also authenticity of the package files.

I wonder why this approach was chosen, instead of just signing all individual package files?
I assume it is a performance optimization, as calculating a hash is typically (much) faster than verifying a signed message.
Is that assumption correct?

I'm asking because I think removing the hash part, and just signing all files instead, would enable another significant simplification of the code (and workflow).

Based on a quick timing comparison, verifying a 1 GiB signed file takes approx. two to three times as long as verifying the hash for the same file, on my system: 5.5 s vs 2.4 s.
The signing itself takes approx. twice that.

The relative time difference is large, but it occurs only incidentally: before deploying a new update, on the dev side, and after downloading a new update, on the client side. Moreover, compared with e.g. download time, it can probably be neglected.

What do you think?

@JMSwag
Copy link
Member Author

JMSwag commented Jan 30, 2022

@dennisvang NP! It will still work but the old executable will be visible next to the current executable.

It was a solution copied form youtube-dl. I was a noob at the time and figured it was sufficient. If signing each file individually simplifies the implementation, I'm all for it!

@dennisvang
Copy link

@JMSwag Thanks again for clarifying. :-)

@JMSwag
Copy link
Member Author

JMSwag commented Feb 1, 2022

@dennisvang My pleasure 😉

@JessicaTegner
Copy link
Contributor

@JMSwag @dennisvang how would I start writing code/documentation or anything else for this?
I mean, it doesn't seem like we have a branch on here, to work against, or something like that?

Please correct me, if I'm wrong :)

@JessicaTegner
Copy link
Contributor

Also another suggestions. Would it maybe be a good idea to split PyUpdater app updates and PyUpdater assets updates (think it's called library updates) into separate packages, as I think (but I'm not sure) that the majority of people using PyUpdater are using it for the application update capabilities?

@dennisvang
Copy link

@JMSwag @dennisvang how would I start writing code/documentation or anything else for this? I mean, it doesn't seem like we have a branch on here, to work against, or something like that?

@NicklasTegner you're right, but you can create a new branch in your own fork, for the time being.

@dennisvang
Copy link

dennisvang commented Feb 17, 2022

Also another suggestions. Would it maybe be a good idea to split PyUpdater app updates and PyUpdater assets updates (think it's called library updates) into separate packages, as I think (but I'm not sure) that the majority of people using PyUpdater are using it for the application update capabilities?

@NicklasTegner Although I've never used the asset update capability myself, I'm not sure it is worth moving it into a separate repo. The asset updates do not require that much extra code.

On the client side, AppUpdate is actually a subclass of LibUpdate (i.e. the asset update class), see source.

On the dev/build/server side (whatever you call it in this context), the actual workflow for creating asset updates is slightly different from that for app updates, but the asset-specific code is mostly limited to ExternalLib (i.e. external asset) and create_asset_archive, see source.

True, there is some juggling with app-name/asset-name throughout the rest of the codebase, but that can be taken care of by a simplified implementation of the file manifest.

@dennisvang
Copy link

@JMSwag @NicklasTegner

I suppose it would be useful to create a list of minimum requirements:

  • what functionality is considered essential?
  • what functionality is optional? (asset updates? uploader? plugins?)
  • what functionality can be dropped altogether?

Based on that, we could set some targets, etc.

Also:

  • which python versions do we want to support? (3.6 is end-of-life, 3.7 still has life left, but imposes some minor limitations)
  • which OS/platforms do we want to support?
  • CI/CD support? (what are example use cases?)

@JessicaTegner
Copy link
Contributor

you're right, but you can create a new branch in your own fork, for the time being.

That just seems very problematic, if everyone starts writing their own implementation/code.

@JessicaTegner
Copy link
Contributor

Although I've never used the asset update capability myself, I'm not sure it is worth moving it into a separate repo.

I see your point with this comment, but it definitely neds a cleanup :)

@dennisvang
Copy link

That just seems very problematic, if everyone starts writing their own implementation/code.

Personally I think it would be very useful to have a few different ideas and implementations to compare.

As long as there is no clear tactical plan...

@JMSwag
Copy link
Member Author

JMSwag commented Feb 20, 2022

@JMSwag @NicklasTegner

I suppose it would be useful to create a list of minimum requirements:

  • what functionality is considered essential?
  • what functionality is optional? (asset updates? uploader? plugins?)
  • what functionality can be dropped altogether?

Based on that, we could set some targets, etc.

Also:

  • which python versions do we want to support? (3.6 is end-of-life, 3.7 still has life left, but imposes some minor limitations)
  • which OS/platforms do we want to support?
  • CI/CD support? (what are example use cases?)

I'd say that existing functionality is essential. Not sure of anything that could be dropped.

We'd want to support any supported version of Python. 3.7+ at this point in time.

At a minimum we should support Linux, Mac & Windows.

I don't think explicit support for a particular CI/CD is needed as long as PyUpdater is flexible enough to be used in CI/CD.

@JessicaTegner
Copy link
Contributor

any updates on this?

@dennisvang @JMSwag

@JonThom
Copy link

JonThom commented Mar 21, 2022

@JMSwag As a potential user, is the current release fit for use?

@dennisvang
Copy link

dennisvang commented Mar 21, 2022

@JonThom Version 4.0 has some issues. See e.g. PR #322 for pending fixes.

We've been using the 3.1 release for several years now without major issues, although I've not tried any of the fancy stuff such as plugins and assets.

@dennisvang
Copy link

@NicklasTegner While awaiting guidance here, I've been working on an alternative implementation that uses a version manifest structured as a flat JSON array of file-info objects, as follows:

[
    {"filename": "Acme-win-1.0.zip", "file_size": 12345, "file_hash": "...."},
    {"filename": "Acme-win-1.1a0.zip", "file_size": 13456, "file_hash": "...."},
    {"filename": "Acme-win-1.1a0.patch", "file_size": 1111, "file_hash": "...."},
    ...,
]

App/asset name, platform, version (and release channel), and file type are parsed from the filenames. App archives, patches, and asset archives, for different platforms and release channels, all co-exist peacefully in the same array. No nesting.

Although the work is far from finished, I am about ready to publish to my fork. At the very least, I'm hoping that will spark the discussion here.

@JMSwag
Copy link
Member Author

JMSwag commented Mar 21, 2022

@dennisvang @NicklasTegner I think I've responded to the outstanding questions. Please let me know if otherwise.

@dennisvang
Copy link

@JMSwag @dennisvang how would I start writing code/documentation or anything else for this? I mean, it doesn't seem like we have a branch on here, to work against, or something like that?

@JMSwag I guess above comment from @NicklasTegner is the only one that has not been answered yet. :-)

I think the main question is: How would you like to proceed? Do you have any specific tasks in mind that we could work on?

@dennisvang
Copy link

dennisvang commented Mar 23, 2022

@JMSwag @NicklasTegner It turns out python-tuf, i.e. the reference implementation for TUF (The Update Framework), has recently reached version 1.0.0, and is now production-ready.

I'm just wondering, would it be an option to build on top of that? It would allow pyupdater to focus on high-level functionality.

@JessicaTegner
Copy link
Contributor

@dennisvang seems like a fantastic idea

@Titan-Node
Copy link

Any updates on the progress of v5? Very interested in building on top of this! Thanks!

@its-monotype
Copy link

Any updates?

@its-monotype
Copy link

Project is dead

@JMSwag JMSwag closed this as completed Sep 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants