History entry timestamps aren't accurate #119
Comments
Thanks for pointing this out. I think timestamps are fundamentally flawed as a unique identifier. The new design I'm working on makes them entirely optional and uses a sha256 of the URL instead, but it's going to be hard to change the folder layout of the archive to hashes if everyone's existing archives are timestamp-based. Related to: #74
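As a rough illustration of the hash-based identifier idea mentioned above, here is a minimal sketch. It assumes the identifier is simply the hex SHA-256 digest of the UTF-8 encoded URL; the function name `url_id` is hypothetical and is not taken from the ArchiveBox codebase.

```python
import hashlib


def url_id(url: str) -> str:
    """Return a stable identifier for a URL (hypothetical helper).

    Assumption: the identifier is the hex SHA-256 digest of the URL's
    UTF-8 bytes, with no normalization of the URL beforehand.
    """
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # The same URL always maps to the same 64-character identifier,
    # regardless of when it was bookmarked or visited.
    print(url_id("https://example.com"))
```

Unlike a timestamp, an identifier like this does not collide when two links share the same visit time, though two spellings of the same page (for example with and without a trailing slash) would get different identifiers unless the URL is normalized first.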
@kergoth a quick update: v0.3.0 adds some improvements to the timestamp parsing, but it's still not perfect. It doesn't yet handle Firefox's timestamps being off by 10x, and Chrome's timestamps aren't fixed from 1601 yet either, but it's a start: https://github.com/pirate/ArchiveBox/blob/dev/archivebox/util.py#L369
I think the latest
git checkout django
git pull
docker build . -t archivebox
docker run -v $PWD/output:/data archivebox init
docker run -v $PWD/output:/data archivebox add 'https://example.com'
docker run -v $PWD/output:/data archivebox remove --delete 'https://example.com'
docker run -v $PWD/output:/data archivebox update
Comment back here if you're still having trouble with timestamps being wildly off and I can reopen the ticket.
Firefox uses PRTime and Chrome uses WebKit timestamps, neither of which matches bookmark-archiver's timestamp expectations as is. Firefox's timestamps need to be multiplied by 10, otherwise this year's history entries show up as 1974, and Chrome's timestamps are in microseconds from 1601. To work around this, use (last_visit_time-11644446702000000)*10 rather than last_visit_time for Chrome, and last_visit_date*10 rather than last_visit_date for Firefox.
I'm also testing the addition of Safari history export, but the dates require more massaging than the other two, as they're Mac Absolute Time in <seconds from 2001>.<microseconds> form. Just multiplying to eliminate the decimal doesn't work, as the microseconds lack leading zero padding.
For reference, see:
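Separately, as a rough illustration of the three epochs discussed in this report, the sketch below converts each browser's value to Unix seconds rather than to the scaled integer form used in the workaround. The function names are hypothetical. It assumes Firefox's PRTime is microseconds since 1970, Chrome's value is microseconds since 1601, and Safari's value parses as a decimal number of seconds since 2001. It uses the commonly documented 1601-to-1970 offset of 11,644,473,600 seconds, which differs slightly from the constant quoted in the workaround.

```python
from datetime import datetime, timezone

# Seconds between 1601-01-01 (WebKit/Chrome epoch) and 1970-01-01 (Unix epoch).
WEBKIT_EPOCH_OFFSET_S = 11_644_473_600
# Seconds between 1970-01-01 (Unix epoch) and 2001-01-01 (Mac Absolute Time epoch).
MAC_EPOCH_OFFSET_S = 978_307_200


def chrome_to_unix(last_visit_time: int) -> float:
    """Chrome: microseconds since 1601 -> seconds since 1970."""
    return last_visit_time / 1_000_000 - WEBKIT_EPOCH_OFFSET_S


def firefox_to_unix(last_visit_date: int) -> float:
    """Firefox: PRTime, assumed to be microseconds since 1970 -> seconds."""
    return last_visit_date / 1_000_000


def safari_to_unix(visit_time) -> float:
    """Safari: Mac Absolute Time (seconds since 2001) -> seconds since 1970.

    Assumption: the exported value can be read as a decimal number of seconds.
    """
    return float(visit_time) + MAC_EPOCH_OFFSET_S


def to_datetime(unix_seconds: float) -> datetime:
    """Render Unix seconds as a timezone-aware UTC datetime for sanity checks."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


if __name__ == "__main__":
    # Each call should print a plausible recent date rather than 1974 or 1601.
    print(to_datetime(firefox_to_unix(1_500_000_000_000_000)))
    print(to_datetime(chrome_to_unix(13_144_473_600_000_000)))
    print(to_datetime(safari_to_unix("521692800.5")))
```

Converting everything to Unix seconds first keeps the per-browser quirks in one place; whatever scaling the archiver expects can then be applied once to the normalized value.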