Dealing with page ID stability: being robust when renaming page titles #144

feliksik · 2024-01-25T09:18:39Z

As the source text files do not have a Confluence ID, page identity is matched on the page title. A page title is unique within a Confluence space.

But with the title as the ID, renaming a page causes the existing page to be deleted and a new page to be created. As a consequence, incoming links break 🙁 (i.e. links from other Confluence pages/spaces not managed by the same text2confl project). It would be much better to update the existing page instead.

We could use the filename (and optionally/additionally a self-made-up-identifier metadata field that can be even more stable, but has to be managed manually in the text file) and store it as metadata on the Confluence page. When uploading, we would read this metadata and match it with the file, creating a bridge between the Confluence page ID and the AsciiDoc/Markdown source file. This would solve the delete-recreate issue.
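A minimal sketch of the matching step, assuming the tool can write a per-page key (here the hypothetical `doc_key`, holding the source file path) when uploading and read it back when listing existing pages. `RemotePage`, `LocalDoc` and `plan_sync` are illustrative names, not text2confl APIs. Pages without a key (created before this metadata existed) fall back to title matching.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RemotePage:
    page_id: str
    title: str
    doc_key: Optional[str]  # hypothetical metadata written by the tool; None for older pages


@dataclass
class LocalDoc:
    path: str   # file path relative to the docs root, used as the stable key
    title: str


def plan_sync(local_docs: List[LocalDoc], remote_pages: List[RemotePage]):
    """Return (updates, creates, deletes): match by stored key first, then by title for untagged pages."""
    by_key = {p.doc_key: p for p in remote_pages if p.doc_key}
    by_title = {p.title: p for p in remote_pages if not p.doc_key}
    matched = set()
    updates: List[Tuple[str, str]] = []
    creates: List[str] = []
    for doc in local_docs:
        page = by_key.get(doc.path) or by_title.get(doc.title)
        if page is None:
            creates.append(doc.path)
        else:
            # Updating in place keeps the page id, so incoming links keep working.
            updates.append((page.page_id, doc.title))
            matched.add(page.page_id)
    deletes = [p.page_id for p in remote_pages if p.page_id not in matched]
    return updates, creates, deletes


# A title change on a keyed page becomes an update, not delete + create.
remote = [RemotePage("101", "Setup", "guides/setup.md")]
local = [LocalDoc("guides/setup.md", "Getting started")]
print(plan_sync(local, remote))  # ([('101', 'Getting started')], [], [])
```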

I suppose there are some details to work out, and alternative approaches to consider. I think this would be a very valuable feature for usage in a larger context, where the stability of incoming links is rather important. I'm happy to collaborate on making this work.

(Note: I initially also mentioned the page-stealing issue from #142 here, but I now think it deserves a separate solution.)


feliksik · 2024-01-25T09:59:48Z

I have also thought about a confluence-page-id metadata field managed in the text file, instead of a self-made-up-identifier that needs to be added to the text file and stored as metadata in Confluence.

Obviously this ID would only be known after the page is created in Confluence, so it would need to be added to the text later: either manually, or by the tooling inserting it as the first line of the text file.

However, this does not seem like a good idea:

If the tooling adds the ID: it's probably a CI/CD pipeline running the deployment, which is not a good moment to make new git commits.
I can imagine a workflow where CI/CD deploys the docs to MyProductionSpace, while I develop the documentation in MyTestSpace before the PR is merged. A self-made-up-identifier only needs to be unique per space, so it can be reused in both the Test and the Production space. A confluence-page-id, on the contrary, would break this workflow of having two deployments of the same document (unless we make it more advanced/complicated).
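To illustrate why a per-space identifier fits the Test/Production workflow, here is a sketch assuming the tool keeps (or reads from page metadata) a mapping from (space, self-made-up identifier) to page ID. `PageIndex`, `resolve_page` and the IDs shown are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical per-space state kept by the tool (e.g. read from page metadata):
# (space key, self-made-up identifier from the source file) -> Confluence page id.
PageIndex = Dict[Tuple[str, str], str]


def resolve_page(index: PageIndex, space: str, doc_key: str) -> Optional[str]:
    """Page id to update in this space, or None if the page still has to be created."""
    return index.get((space, doc_key))


index: PageIndex = {
    ("MyTestSpace", "setup-guide"): "2001",
    ("MyProductionSpace", "setup-guide"): "9001",
}

# The same source file (doc_key "setup-guide") resolves to a different page in each space,
# whereas a confluence-page-id written into the file could only point at one of them.
print(resolve_page(index, "MyTestSpace", "setup-guide"))        # 2001
print(resolve_page(index, "MyProductionSpace", "setup-guide"))  # 9001
print(resolve_page(index, "AnotherSpace", "setup-guide"))       # None
```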

feliksik · 2024-02-12T10:11:08Z

New idea: provide an option --follow-git-renames. It would use something like git log --follow --diff-filter=A -- possibly-renamed.md to determine the hash of the commit where a file was introduced, and use ${commitHash}-${originalFilename} as the document identifier, stored in the Confluence page metadata.

I think this will achieve exactly what I intend:

renaming the file will keep the same page id
changing the title will keep the same page id
no manual effort needed to maintain a self-made-up-id in the document metadata
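A sketch of how such an identifier could be derived by shelling out to git, assuming git is on the PATH and the file is tracked. The `commit:` prefix in `--format` is only there to make the output easy to parse; `parse_added_entries` and `stable_doc_id` are hypothetical helpers. This uses the path git reports for the add commit, which may differ from a bare `${originalFilename}` if only the basename was intended.

```python
import subprocess
from typing import List, Tuple


def parse_added_entries(log_output: str) -> List[Tuple[str, str]]:
    """Parse `git log --format=commit:%H --name-only` output into (commit, path) pairs, newest first."""
    entries = []
    commit = None
    for line in log_output.splitlines():
        line = line.strip()
        if line.startswith("commit:"):
            commit = line[len("commit:"):]
        elif line and commit is not None:
            entries.append((commit, line))
    return entries


def stable_doc_id(path: str, repo: str = ".") -> str:
    """Return '<commit>-<path at introduction>' for the commit that first added the file."""
    out = subprocess.run(
        ["git", "log", "--follow", "--diff-filter=A",
         "--format=commit:%H", "--name-only", "--", path],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    entries = parse_added_entries(out)
    if not entries:
        raise ValueError(f"{path} has no 'add' commit in git history (untracked?)")
    commit, original_path = entries[-1]  # git log lists newest first; the last entry is the oldest add
    return f"{commit}-{original_path}"
```

Because --follow tracks the file across renames and --diff-filter=A keeps only "added" commits, renaming the file or changing its title would leave the identifier unchanged.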

@zeldigas I'm not sure how much time you spend on this project, but I may get to implementing this myself when time permits. Either way, it's useful to first align on this idea.

zeldigas · 2024-03-23T15:43:08Z

@feliksik I believe that a file rename is not an issue at all when you have explicit page titles: any cleanup is done after the whole doc tree is processed, so even if a file is renamed, or even moved to another location, it will be processed properly.

But title renames are challenging. You mentioned some metadata that can be associated with the page. While this can certainly be set, the main challenge is finding the page by it: I have not dug deep, but I doubt that this is available out of the box, if it is possible at all. I see some docs that apply only to the Server version and require server-side configuration: https://developer.atlassian.com/server/confluence/content-properties-in-the-rest-api/. And iterating over all the pages might not be a good idea at all.

That said, this idea needs some research and I really appreciate your help here, as I'm not sure I'll be able to research it on my own within a reasonable time.

It's probably worth starting with research into whether it's possible to search for a page by some metadata.

Another thought: with additional constraints it might be possible to avoid this search, at the cost of additional load on Confluence. As we know the parent page, we can fetch information about all child pages (recursively) and use this page tree to find renamed pages, based either on the file name or on the hash that you mentioned.
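A sketch of the page-tree approach, assuming some way to list a page's direct children (injected here as `fetch_children`, since the concrete Confluence REST call is outside this sketch) and a way to read the stored key from a page (`key_of`). Both callables and `index_subtree` are hypothetical. It walks the tree below the known parent breadth-first and builds a key-to-page-ID index that could then be used for matching instead of titles.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def index_subtree(
    root_id: str,
    fetch_children: Callable[[str], List[dict]],
    key_of: Callable[[dict], Optional[str]],
) -> Dict[str, str]:
    """Breadth-first walk below root_id, mapping stored doc key -> page id."""
    index: Dict[str, str] = {}
    queue = deque([root_id])
    seen = {root_id}
    while queue:
        parent = queue.popleft()
        # One children request per visited page: this is the extra load on Confluence.
        for child in fetch_children(parent):
            if child["id"] in seen:
                continue
            seen.add(child["id"])
            key = key_of(child)
            if key is not None:
                index[key] = child["id"]
            queue.append(child["id"])
    return index


# In-memory stand-in for the page tree under parent page "1".
tree = {
    "1": [{"id": "2", "key": "a.md"}, {"id": "3", "key": None}],
    "3": [{"id": "4", "key": "b/c.md"}],
}
print(index_subtree("1", lambda pid: tree.get(pid, []), lambda p: p["key"]))
# {'a.md': '2', 'b/c.md': '4'}
```

The number of requests grows with the size of the managed subtree, which matches the trade-off mentioned above.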

feliksik · 2024-03-28T07:18:29Z

You are spot on in your analysis. This is not my highest priority, but I'll keep you posted when I make any progress.

feliksik mentioned this issue on Jan 25, 2024 in #142: Unclear error when a page title is identical to that of the parent-id (Closed)

zeldigas added the help wanted (Extra attention is needed) label on Mar 23, 2024
