Wrong order of chapters #279

vic-by · 2021-08-10T13:44:41Z

The order of chapters is messed up.

About The Author
Chapter 5
“About the Cover Illustration”
Chapter 1, 2, 3, 4
Chapter 6
<..>

Is this an issue only with this specific title, Marko Lukša's "Kubernetes in Action"?
I tried to run the following command twice with the same outcome both times:
python3 safaribooks.py --cred "xxx:yyy" --kindle 9781617293726

0Ky · 2021-10-26T18:04:50Z

I can confirm that it's not only this book; it happens with other books I've tested. The chapters are indeed out of order. The problem is in how the content.opf file is created. In the content.opf below, Chapter 05 (right after <itemref idref="Author"/>) doesn't belong on that line, and <itemref idref="resources"/> between Chapters 16 and 17 is also in the wrong place.

<spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="titl"/>
    <itemref idref="Copyright"/>
    <itemref idref="Dedication"/>
    <itemref idref="btoc"/>
    <itemref idref="toc"/>
    <itemref idref="Preface"/>
    <itemref idref="Acknowledgments"/>
    <itemref idref="Book"/>
    <itemref idref="Author"/>
    <itemref idref="05"/>  <!-- Chapter 5 is placed in the wrong order -->
    <itemref idref="Cover"/>
    <itemref idref="p1"/>
    <itemref idref="01"/>
    <itemref idref="02"/>
    <itemref idref="p2"/>
    <itemref idref="03"/>
    <itemref idref="04"/>
    <itemref idref="06"/>
    <itemref idref="07"/>
    <itemref idref="08"/>
    <itemref idref="09"/>
    <itemref idref="10"/>
    <itemref idref="p3"/>
    <itemref idref="11"/>
    <itemref idref="12"/>
    <itemref idref="13"/>
    <itemref idref="14"/>
    <itemref idref="15"/>
    <itemref idref="16"/>
    <itemref idref="resources"/> <!-- Resources is placed in the wrong order -->
    <itemref idref="17"/>
    <itemref idref="18"/>
    <itemref idref="A"/>
    <itemref idref="B"/>
    <itemref idref="C"/>
    <itemref idref="D"/>
    <itemref idref="Index"/>
    <itemref idref="Figures"/>
    <itemref idref="Tables"/>
    <itemref idref="Listings"/>
</spine>

Either the Python script parses the chapters in the wrong order when appending them, or some rearrangement afterwards causes the issue.
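To check other books for the same symptom, the spine order can be read straight out of a generated content.opf. The following is a minimal sketch, not part of safaribooks: the helper names `spine_order` and `out_of_order` are hypothetical, the sample is a shortened version of the spine above wrapped in a `package` element, and it assumes the file uses the standard OPF namespace and that chapters have purely numeric idrefs such as "05".

```python
import xml.etree.ElementTree as ET

# Standard OPF namespace; assumed to be what the generated content.opf declares.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Shortened, hypothetical sample based on the spine shown above.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <spine toc="ncx">
    <itemref idref="Author"/>
    <itemref idref="05"/>
    <itemref idref="Cover"/>
    <itemref idref="01"/>
    <itemref idref="02"/>
    <itemref idref="03"/>
    <itemref idref="04"/>
    <itemref idref="06"/>
  </spine>
</package>"""


def spine_order(opf_text):
    """Return the idref values of the spine in reading order."""
    root = ET.fromstring(opf_text)
    return [ref.get("idref") for ref in root.findall("opf:spine/opf:itemref", OPF_NS)]


def out_of_order(idrefs):
    """Return numeric chapter ids that appear before a smaller chapter id."""
    nums = [int(i) for i in idrefs if i.isdigit()]
    return [n for k, n in enumerate(nums) if any(m < n for m in nums[k + 1:])]


idrefs = spine_order(SAMPLE_OPF)
misplaced = out_of_order(idrefs)
print("spine:", idrefs)
print("misplaced chapters:", misplaced)
```

Non-numeric entries such as "resources" are ignored by this check, so it only catches misplaced numbered chapters.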

My Environment:

I'm running the latest commit (e016ad3) as of this post.

OS: Ubuntu 21.10 x86_64
Kernel: 5.13.0-20-generic
Shell: bash 5.1.8
Node: v12.22.5
npm: v8.1.1
Python3: v3.9.7

johnnywiller · 2021-10-29T12:49:44Z

I can confirm that this happens quite a lot for me too, in fact for basically every book I download. One book you can test with is Strategic Monoliths and Microservices: Driving Innovation Using Purposeful Architecture.

The generated content.opf for me is the following. Notice, for example, chapter three coming right after the preface.

<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
<item id="cover" href="cover.xhtml" media-type="application/xhtml+xml" />
<item id="pref00" href="pref00.xhtml" media-type="application/xhtml+xml" />
<item id="praise" href="praise.xhtml" media-type="application/xhtml+xml" />
<item id="halftitle" href="halftitle.xhtml" media-type="application/xhtml+xml" />
<item id="fm01" href="fm01.xhtml" media-type="application/xhtml+xml" />
<item id="title" href="title.xhtml" media-type="application/xhtml+xml" />
<item id="copyright" href="copyright.xhtml" media-type="application/xhtml+xml" />
<item id="contents" href="contents.xhtml" media-type="application/xhtml+xml" />
<item id="foreword" href="foreword.xhtml" media-type="application/xhtml+xml" />
<item id="preface" href="preface.xhtml" media-type="application/xhtml+xml" />
<item id="ch03" href="ch03.xhtml" media-type="application/xhtml+xml" />
<item id="acknowledgments" href="acknowledgments.xhtml" media-type="application/xhtml+xml" />
<item id="authors" href="authors.xhtml" media-type="application/xhtml+xml" />
<item id="part01" href="part01.xhtml" media-type="application/xhtml+xml" />
<item id="ch01" href="ch01.xhtml" media-type="application/xhtml+xml" />
<item id="ch02" href="ch02.xhtml" media-type="application/xhtml+xml" />
<item id="part02" href="part02.xhtml" media-type="application/xhtml+xml" />
<item id="ch04" href="ch04.xhtml" media-type="application/xhtml+xml" />
<item id="ch05" href="ch05.xhtml" media-type="application/xhtml+xml" />
<item id="ch06" href="ch06.xhtml" media-type="application/xhtml+xml" />
<item id="ch07" href="ch07.xhtml" media-type="application/xhtml+xml" />
<item id="part03" href="part03.xhtml" media-type="application/xhtml+xml" />
<item id="ch08" href="ch08.xhtml" media-type="application/xhtml+xml" />
<item id="ch09" href="ch09.xhtml" media-type="application/xhtml+xml" />
<item id="part04" href="part04.xhtml" media-type="application/xhtml+xml" />
<item id="ch10" href="ch10.xhtml" media-type="application/xhtml+xml" />
<item id="ch11" href="ch11.xhtml" media-type="application/xhtml+xml" />
<item id="ch12" href="ch12.xhtml" media-type="application/xhtml+xml" />
<item id="index" href="index.xhtml" media-type="application/xhtml+xml" />

0Ky · 2021-10-30T17:57:48Z

I'm sure someone with a better understanding of Python and the code base can explain this better than I can, but the problem looks like it's related to line #567 in safaribooks.py.

Problem

Taking a closer look at the original code causing the problem:

result.extend([c for c in response["results"] if "cover" in c["filename"] or "cover" in c["title"]])

The code above adds an item to the "result" list if the value of its "filename" or "title" key contains the word "cover". I'm certain that if you look at the chapters or sections that are incorrectly ordered, each one will contain the word "cover" in its title or filename. The problem is that the in operator does a substring match, so "cover" is also found inside words like dis𝘤𝘰𝘷𝘦𝘳y.
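The effect of the substring match can be reproduced in isolation. This is only a sketch: the three entries and their titles are made up for illustration, and the `reorder` helper is an assumption about the surrounding logic (matched entries placed ahead of the rest of the same API page), which is what the generated spine suggests but which is not quoted in this thread.

```python
# Hypothetical page of API results; the titles are illustrative only.
results = [
    {"filename": "ch04.xhtml", "title": "Chapter 4. Replication"},
    {"filename": "ch05.xhtml", "title": "Chapter 5. Services: enabling clients to discover pods"},
    {"filename": "cover.xhtml", "title": "Cover"},
]


def substring_match(c):
    # Same condition as the quoted line: substring test on filename and title.
    return "cover" in c["filename"] or "cover" in c["title"]


def reorder(page, is_cover):
    """Assumed behaviour: entries flagged as cover go first, the rest keep their order."""
    front = [c for c in page if is_cover(c)]
    rest = [c for c in page if not is_cover(c)]
    return front + rest


order = [c["filename"] for c in reorder(results, substring_match)]
print(order)
```

Chapter 5 is pulled to the front only because its title contains "discover", which is the same pattern as the misplaced entries reported above.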

Workaround?

result.extend([c for c in response["results"] if "cover" == c["title"].lower() or "cover.xhtml" == c["filename"].lower() or "titlepage.xhtml" == c["filename"].lower()])

I've changed the in operator to ==, which does an exact match of the string, and I've added titlepage.xhtml because there are books like API Security in Action (9781617296024) and The Art of Network Penetration Testing (9781617296826) that don't contain a cover.xhtml file; in those books it's called titlepage.xhtml.

That sorted out the wrong order of chapters, but now there's an issue with the books mentioned above where the cover image from titlepage.xhtml isn't downloaded when using the workaround code above.
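For testing the exact-match condition outside the script, it can be pulled into a small predicate. This is a sketch of the same check as the workaround, not a fix for the missing titlepage.xhtml cover image; the name `is_cover_entry`, the `COVER_FILENAMES` set, and the sample entries are hypothetical, and using `.get()` with a default is an extra guard that the one-line workaround does not have.

```python
# Filenames treated as the cover page, taken from the workaround above.
COVER_FILENAMES = {"cover.xhtml", "titlepage.xhtml"}


def is_cover_entry(chapter):
    """Exact-match version of the cover test: no substring matching."""
    title = chapter.get("title", "").lower()
    filename = chapter.get("filename", "").lower()
    return title == "cover" or filename in COVER_FILENAMES


# Hypothetical entries for a quick check.
samples = [
    {"filename": "cover.xhtml", "title": "Cover"},
    {"filename": "titlepage.xhtml", "title": "Title Page"},
    {"filename": "ch05.xhtml", "title": "Chapter 5. Services: enabling clients to discover pods"},
    {"filename": "about.xhtml", "title": "About the Cover Illustration"},
]

flags = [is_cover_entry(c) for c in samples]
print(flags)
```

With exact matching, neither a title containing "discover" nor "About the Cover Illustration" is treated as the cover, so both stay in their original position.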

lorenzodifuccia added the "help wanted" and "need more info" (Please provide more info to address the issue) labels · 2021-08-24
