Wrong order of chapters #279

Open
vic-by opened this issue Aug 10, 2021 · 3 comments
Labels
help wanted, need more info

Comments


vic-by commented Aug 10, 2021

The chapters in the generated book are out of order:

  • About The Author
  • Chapter 5
  • “About the Cover Illustration”
  • Chapter 1, 2, 3, 4
  • Chapter 6
    <..>

Is this an issue only with this specific title (Marko Lukša, "Kubernetes in Action")?
I ran the following command twice, with the same outcome both times:
python3 safaribooks.py --cred "xxx:yyy" --kindle 9781617293726

lorenzodifuccia added the "help wanted" and "need more info" labels on Aug 24, 2021

0Ky commented Oct 26, 2021

I can confirm that this is not limited to this book; it also happens with other books I've tested. The chapters are indeed out of order, and the problem appears to lie in how the content.opf file is generated. In the content.opf below, Chapter 05 (right after <itemref idref="Author"/>) is in the wrong position, and so is <itemref idref="resources"/>, which sits between Chapters 16 and 17.

<spine toc="ncx">
    <itemref idref="titlepage"/>
    <itemref idref="titl"/>
    <itemref idref="Copyright"/>
    <itemref idref="Dedication"/>
    <itemref idref="btoc"/>
    <itemref idref="toc"/>
    <itemref idref="Preface"/>
    <itemref idref="Acknowledgments"/>
    <itemref idref="Book"/>
    <itemref idref="Author"/>
    <itemref idref="05"/>  <!-- Chapter 5 is placed in the wrong order -->
    <itemref idref="Cover"/>
    <itemref idref="p1"/>
    <itemref idref="01"/>
    <itemref idref="02"/>
    <itemref idref="p2"/>
    <itemref idref="03"/>
    <itemref idref="04"/>
    <itemref idref="06"/>
    <itemref idref="07"/>
    <itemref idref="08"/>
    <itemref idref="09"/>
    <itemref idref="10"/>
    <itemref idref="p3"/>
    <itemref idref="11"/>
    <itemref idref="12"/>
    <itemref idref="13"/>
    <itemref idref="14"/>
    <itemref idref="15"/>
    <itemref idref="16"/>
    <itemref idref="resources"/> <!-- Resources is placed in the wrong order -->
    <itemref idref="17"/>
    <itemref idref="18"/>
    <itemref idref="A"/>
    <itemref idref="B"/>
    <itemref idref="C"/>
    <itemref idref="D"/>
    <itemref idref="Index"/>
    <itemref idref="Figures"/>
    <itemref idref="Tables"/>
    <itemref idref="Listings"/>
  </spine>

Either the Python script parses the chapters in the wrong order as it appends them, or some later rearrangement causes the issue.
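To check other books for the same symptom, the spine order can be read straight out of content.opf. This is only a rough sketch, not part of safaribooks.py: the helper names spine_idrefs and numeric_chapters_in_order are made up for illustration, and the check assumes chapters use purely numeric idrefs like the ones in the spine above.

```python
import xml.etree.ElementTree as ET

# Namespace used by OPF package documents.
OPF_NS = "{http://www.idpf.org/2007/opf}"


def spine_idrefs(root):
    """Return the idref of every <itemref>, in document order."""
    # Accept both namespaced and bare tags, since pasted snippets often lack the namespace.
    return [el.get("idref") for el in root.iter()
            if el.tag in ("itemref", OPF_NS + "itemref")]


def numeric_chapters_in_order(idrefs):
    """True if the purely numeric idrefs (assumed to be chapters) are ascending."""
    nums = [int(i) for i in idrefs if i.isdigit()]
    return nums == sorted(nums)


# Hypothetical, shortened sample modelled on the spine shown above.
sample = """<package xmlns="http://www.idpf.org/2007/opf"><spine toc="ncx">
<itemref idref="Author"/><itemref idref="05"/>
<itemref idref="Cover"/><itemref idref="01"/>
</spine></package>"""

ids = spine_idrefs(ET.fromstring(sample))
print(ids)                             # ['Author', '05', 'Cover', '01']
print(numeric_chapters_in_order(ids))  # False
```

For a real file, ET.parse("content.opf").getroot() can be passed to spine_idrefs in place of the sample.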

My Environment:

I'm running the latest commit (e016ad3) as of this post.

  • OS: Ubuntu 21.10 x86_64
  • Kernel: 5.13.0-20-generic
  • Shell: bash 5.1.8
  • Node: v12.22.5
  • npm: v8.1.1
  • Python3: v3.9.7


johnnywiller commented Oct 29, 2021

I can confirm that this happens quite often for me too; in fact, with basically every book I download. One book you can test with is Strategic Monoliths and Microservices: Driving Innovation Using Purposeful Architecture.

The content.opf generated for me is the following. Note, for example, that chapter three (ch03) comes right after the preface:

<item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml" />
<item id="cover" href="cover.xhtml" media-type="application/xhtml+xml" />
<item id="pref00" href="pref00.xhtml" media-type="application/xhtml+xml" />
<item id="praise" href="praise.xhtml" media-type="application/xhtml+xml" />
<item id="halftitle" href="halftitle.xhtml" media-type="application/xhtml+xml" />
<item id="fm01" href="fm01.xhtml" media-type="application/xhtml+xml" />
<item id="title" href="title.xhtml" media-type="application/xhtml+xml" />
<item id="copyright" href="copyright.xhtml" media-type="application/xhtml+xml" />
<item id="contents" href="contents.xhtml" media-type="application/xhtml+xml" />
<item id="foreword" href="foreword.xhtml" media-type="application/xhtml+xml" />
<item id="preface" href="preface.xhtml" media-type="application/xhtml+xml" />
<item id="ch03" href="ch03.xhtml" media-type="application/xhtml+xml" />
<item id="acknowledgments" href="acknowledgments.xhtml" media-type="application/xhtml+xml" />
<item id="authors" href="authors.xhtml" media-type="application/xhtml+xml" />
<item id="part01" href="part01.xhtml" media-type="application/xhtml+xml" />
<item id="ch01" href="ch01.xhtml" media-type="application/xhtml+xml" />
<item id="ch02" href="ch02.xhtml" media-type="application/xhtml+xml" />
<item id="part02" href="part02.xhtml" media-type="application/xhtml+xml" />
<item id="ch04" href="ch04.xhtml" media-type="application/xhtml+xml" />
<item id="ch05" href="ch05.xhtml" media-type="application/xhtml+xml" />
<item id="ch06" href="ch06.xhtml" media-type="application/xhtml+xml" />
<item id="ch07" href="ch07.xhtml" media-type="application/xhtml+xml" />
<item id="part03" href="part03.xhtml" media-type="application/xhtml+xml" />
<item id="ch08" href="ch08.xhtml" media-type="application/xhtml+xml" />
<item id="ch09" href="ch09.xhtml" media-type="application/xhtml+xml" />
<item id="part04" href="part04.xhtml" media-type="application/xhtml+xml" />
<item id="ch10" href="ch10.xhtml" media-type="application/xhtml+xml" />
<item id="ch11" href="ch11.xhtml" media-type="application/xhtml+xml" />
<item id="ch12" href="ch12.xhtml" media-type="application/xhtml+xml" />
<item id="index" href="index.xhtml" media-type="application/xhtml+xml" />


0Ky commented Oct 30, 2021

I'm sure someone with a better understanding of Python and the code base can explain this better than I can, but the problem appears to be related to line 567 in safaribooks.py.

Problem

Taking a closer look at the original code causing the problem:

result.extend([c for c in response["results"] if "cover" in c["filename"] or "cover" in c["title"]])

The code above appends an item to the "result" list if the value of its "filename" or "title" key contains the substring "cover". I suspect that the incorrectly ordered chapters or sections all have "cover" somewhere in their title or filename. The problem is that the in operator performs a substring match, and "cover" also appears inside words like "discovery".
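A quick way to see the substring match misfire, using made-up entries (the filenames and titles below are hypothetical, not taken from a real API response):

```python
# Hypothetical chapter entries shaped like the items in response["results"].
results = [
    {"filename": "cover.xhtml", "title": "Cover"},
    {"filename": "ch01.xhtml", "title": "Introduction"},
    {"filename": "ch05.xhtml", "title": "Service discovery"},
]

# Same condition as the original line: substring test on filename or title.
matched = [c["filename"] for c in results
           if "cover" in c["filename"] or "cover" in c["title"]]

print(matched)  # ['cover.xhtml', 'ch05.xhtml']
```

ch05.xhtml is picked up only because "discovery" contains "cover", so it gets pulled out of its normal position along with the real cover.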

Workaround?

result.extend([c for c in response["results"] if "cover" == c["title"].lower() or "cover.xhtml" == c["filename"].lower() or "titlepage.xhtml" == c["filename"].lower()])

I've changed the in operator to ==, which does an exact string match, and I've added titlepage.xhtml because some books, such as API Security in Action (9781617296024) and The Art of Network Penetration Testing (9781617296826), don't contain a cover.xhtml file; theirs is called titlepage.xhtml instead.

That fixed the chapter order, but with the workaround code above, the cover image from titlepage.xhtml is no longer downloaded for the books mentioned above.
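For readability, the exact-match condition could also be pulled into a small helper. This is just a sketch under assumptions: is_cover_entry and COVER_STEMS are hypothetical names, not something that exists in safaribooks.py, and unlike the one-liner above it compares the filename without its extension, so cover.html would match as well. It does not address the missing cover image.

```python
import os

# Assumption: the only cover-like file stems are the two mentioned in this thread.
COVER_STEMS = {"cover", "titlepage"}


def is_cover_entry(chapter):
    """Exact-match test for a cover entry, instead of a substring test."""
    filename = os.path.basename(chapter.get("filename", "")).lower()
    stem, _ext = os.path.splitext(filename)
    title = chapter.get("title", "").strip().lower()
    return title == "cover" or stem in COVER_STEMS


# Hypothetical entries for illustration.
print(is_cover_entry({"filename": "titlepage.xhtml", "title": "Title Page"}))   # True
print(is_cover_entry({"filename": "ch05.xhtml", "title": "Service discovery"}))  # False
```

The filter would then read result.extend([c for c in response["results"] if is_cover_entry(c)]).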
