Dealing with page ID stability: being robust when renaming page titles #144

Open
feliksik opened this issue Jan 25, 2024 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@feliksik

feliksik commented Jan 25, 2024

As the source text files do not carry a Confluence page ID, page identity is matched on the page title, which is unique within a Confluence space.

But with the title as the ID, renaming a page leads to the existing page being deleted and a new page being created. As a consequence, incoming links break 🙁 (i.e. links from other Confluence pages/spaces not managed by the same text2confl project). It would be much better to update the existing page instead.

We could use the filename as the identifier (and optionally/additionally a self-made-up-identifier metadata field that can be even more stable, but has to be managed manually in the text file), and store it as metadata on the Confluence page. When uploading, we would read that metadata and match it against the files, thus building a bridge between the Confluence page ID and the corresponding asciidoc/md file. This would solve the delete-recreate issue.
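To make the matching idea concrete, here is a minimal sketch of how a sync step could pair files with pages by a stored identifier instead of by title. This is not how text2confl works today; `LocalDoc`, `RemotePage`, `plan_sync` and the `stable_id` field are all made-up names, and how the identifier is actually stored on and read from the Confluence page is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalDoc:
    stable_id: str  # hypothetical: file path or a manually chosen identifier
    title: str


@dataclass(frozen=True)
class RemotePage:
    page_id: str    # Confluence page ID
    stable_id: str  # hypothetical: identifier previously stored as page metadata
    title: str


def plan_sync(local_docs, remote_pages):
    """Return (updates, creates, deletes), matching on stable_id rather than title."""
    remote_by_id = {page.stable_id: page for page in remote_pages}
    updates, creates = [], []
    for doc in local_docs:
        page = remote_by_id.pop(doc.stable_id, None)
        if page is None:
            creates.append(doc)                   # no page carries this identifier yet
        else:
            updates.append((page.page_id, doc))   # same page, possibly with a new title
    # Pages whose identifier no longer matches any file are the only deletion candidates.
    deletes = [page.page_id for page in remote_by_id.values()]
    return updates, creates, deletes
```

With this kind of matching, a file whose title changed from "Intro" to "Introduction" would land in `updates` with its existing page ID, so incoming links would keep working.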

I suppose there are some details to work out, and alternative approaches to consider. I think this would be a very valuable feature for usage in a larger context, where the stability of incoming links is rather important. I'm happy to collaborate on making this work.

(Note: initially I also mentioned the page stealing issue from #142 here, but I now think it deserves a separate solution.)

@feliksik
Author

feliksik commented Jan 25, 2024

I have also thought about a confluence-page-id metadata field managed in the text file, instead of a self-made-up-identifier that has to be added to the text file and stored as metadata in Confluence.

Obviously this ID would only be known after the page is created in Confluence, so it would have to be added to the text file later: either manually, or by the tooling, e.g. by inserting it as the first line of the file.

However, this does not seem like a good idea:

  • If the tooling adds the ID: the deployment is probably run by a CI/CD pipeline, which is not a great moment to create new git commits.
  • I can imagine a workflow where the docs are deployed to MyProductionSpace in CI/CD, while I develop the documentation in MyTestSpace before the PR/merge. A self-made-up-identifier only has to be unique within a space, so the same value can be reused in both the Test and the Production space. In contrast, a confluence-page-id would break this workflow of having 2 deployments of the same document (unless we make it more advanced/complicated).
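A tiny illustration of the second point, with made-up space keys, identifiers and page IDs: because the lookup is scoped by space, one identifier in the text file can resolve to a different page in each deployment, which a single hard-coded confluence-page-id cannot do.

```python
# (space key, self-made identifier) -> Confluence page ID; all values are made up
page_index = {
    ("MyTestSpace", "install-guide"): "111",
    ("MyProductionSpace", "install-guide"): "999",
}


def resolve_page_id(space, doc_identifier):
    """Look up the page for a document within one particular space."""
    return page_index.get((space, doc_identifier))
```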

@feliksik
Author

feliksik commented Feb 12, 2024

New idea: provide an option --follow-git-renames. It would use something like git log --follow --diff-filter=A -- possibly-renamed.md to determine the hash of the commit in which a file was introduced, and use ${commitHash}-${originalFilename} as the identifier of the document, stored in the Confluence page metadata.

I think this will achieve exactly what I intend:

  • renaming the file will keep the same page id
  • changing the title will keep the same page id
  • no manual effort needed to maintain a self-made-up-id in the document metadata
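A rough sketch of how such an identifier could be derived, assuming the git CLI is available and the checkout has full history (a shallow CI clone may not contain the commit that added the file). The function names are made up, and I added --format=%H and --name-only to the command above so the output is easy to parse; the exact output shape is my assumption and should be verified.

```python
import subprocess


def parse_origin(log_output):
    """Extract (commit_hash, original_filename) of the oldest listed commit.

    Assumes output shaped like `git log --format=%H --name-only`: for each
    commit a hash line followed by the file name, with blank lines in between.
    """
    lines = [line.strip() for line in log_output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("no commit found that added this file")
    # git log lists newest first, so the last pair belongs to the original add.
    return lines[-2], lines[-1]


def stable_identifier(path, repo_dir="."):
    """Build the ${commitHash}-${originalFilename} identifier for a file."""
    result = subprocess.run(
        ["git", "log", "--follow", "--diff-filter=A",
         "--format=%H", "--name-only", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    commit_hash, original_filename = parse_origin(result.stdout)
    return f"{commit_hash}-{original_filename}"
```

Taking the oldest entry is a deliberate choice: if a file was deleted and later re-added, several "added" commits may show up, and the first one is the most stable anchor.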

@zeldigas I'm not sure how much time you spend on this project, but I may get to implementing this myself when time permits. Either way, it's useful to first align on this idea.

@zeldigas
Owner

@feliksik I believe that a file rename is not an issue at all for page renames when you have explicit page titles - all cleanup is done after the whole doc tree is processed, so even if a file was renamed or moved to another location it will be processed properly.

But for title renames it's challenging. You mentioned some metadata that can be associated with the page. While this can certainly be set, the main challenge would be to find the page by it - I did not dig deep into it, but I doubt that this is available out of the box, if it is possible at all. I see some docs, but they apply only to the server version and require server-side configuration: https://developer.atlassian.com/server/confluence/content-properties-in-the-rest-api/. And iterating over all the pages might not be a good idea at all.

That said, this idea needs some research and I really appreciate your help here, as I'm not sure that I'll be able to research this on my own within a reasonable time.

It's probably worth starting with research into whether it's possible to search for a page by some metadata.

Another thought that I have - with additional constraints it might be possible to do this without such a search, at the cost of additional load on Confluence: as we know the parent page, we can fetch information about all child pages (recursively) and use this page tree to find renamed pages - either based on the file name or based on the hash that you mentioned.
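Roughly what I mean by walking the tree, as a sketch only: `fetch_children` and `read_identifier` are hypothetical callables that would wrap whatever Confluence API calls turn out to be suitable, so nothing here reflects a real client API.

```python
def index_subtree(root_page_id, fetch_children, read_identifier):
    """Map stored identifier -> page ID for every page below root_page_id.

    fetch_children(page_id) -> iterable of child page IDs (hypothetical)
    read_identifier(page_id) -> stored identifier or None (hypothetical)
    """
    index = {}
    seen = set()
    stack = [root_page_id]
    while stack:
        page_id = stack.pop()
        if page_id in seen:
            continue  # guard against visiting a page twice
        seen.add(page_id)
        for child_id in fetch_children(page_id):
            identifier = read_identifier(child_id)
            if identifier is not None:
                index[identifier] = child_id
            stack.append(child_id)  # descend into the child's own children
    return index
```

This costs at least one request per page in the subtree, which is the additional load mentioned above, but it avoids relying on any search-by-metadata feature.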

@zeldigas zeldigas added the help wanted Extra attention is needed label Mar 23, 2024
@feliksik
Author

You are spot on in your analysis. This is not my highest priority, but I'll keep you posted when I make any progress.
