Recovery data from chunks without metadata :) #572

Open
asyslinux opened this issue Mar 18, 2024 · 4 comments

@asyslinux

asyslinux commented Mar 18, 2024

Hello dear MooseFS developers. I know that on your website it is written that it is extremely difficult to recover data without metadata from chunks.

But it so happened that I only had chunks left; the server was hacked and the metadata and metadata backups were destroyed.

But I would like to recover only "jpg" files from the chunks. They are all less than 64MB.


Previous MooseFS 3.0.117 configuration:

goal = 1

hdd-1,3,5 = chunkserver 1 = LABEL A
hdd-2,4,6 = chunkserver 2 = LABEL B

I still have .chunkdb files left...


Since I am a developer myself, I can write a recovery utility myself, but I have a couple of questions:

  1. How can I read chunk files directly? Is there a defined format, such as byte or character delimiters between the files stored inside a chunk file? I am comfortable working at the byte level, so please tell me how to work with chunk files directly, if that is possible.

  2. How exactly are real files distributed among chunk files? Can a real file smaller than, say, 32MB be split, within one chunk server, across several chunk files on one hard drive, or even across chunk files on different hard drives?

If the chunk format has separators (some standard or custom marshaller/unmarshaller that you use), and if small files are each stored within a single chunk file on a single hdd rather than split into small parts across many chunk files, then I can easily write a utility that reads the bytes between delimiters in each chunk file, detects from the start of the data that it is, say, a “jpg” picture, and restores my photo archive of 10 years.

I hope you can help me with information about the chunk file format. Or at least point to your source code regarding the chunk file format, which I will need to pay attention to.

Thank you very much.

@deltabweb

Hi,
To quickly answer your first question, the chunks are prefixed with an 8KB header.
I had to recover jpg files recently as well; for small files, you can recover the data with tail -c +8193 chunk.mfs > image.jpg
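
For anyone scripting this over many chunks, the same header skip can be sketched in Python. This is only an illustration of the `tail` command above: the 8192-byte header size is taken from this comment and may differ for other MooseFS versions, and the function name `strip_chunk_header` is made up for the example.

```python
HEADER_SIZE = 8192  # 8 KB header as stated in this thread; an assumption for other versions


def strip_chunk_header(chunk_path, out_path, header_size=HEADER_SIZE):
    """Copy everything after the chunk header into out_path.

    Hypothetical helper, equivalent in effect to `tail -c +8193 chunk > out`.
    Returns the number of payload bytes written.
    """
    written = 0
    with open(chunk_path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(header_size)            # skip the header
        while True:
            block = src.read(1 << 20)    # copy in 1 MiB blocks
            if not block:
                break
            dst.write(block)
            written += len(block)
    return written
```

As with the `tail` one-liner, this only yields a usable file when the whole file fits in a single chunk.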

@asyslinux
Author

@deltabweb, thanks for the advice. https://github.com/asyslinux/irec is a utility that simply finds the start/end jpg bytes and then recovers jpg files smaller than 64MB from any raw/device/chunk file; in this case there is no need to skip the first 8KB header. The recovery process is now underway, and I will post the result in a few days.
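
The start/end-marker idea can be illustrated with a minimal Python sketch. This is not the irec code, just a naive carver written for this example: it looks for the JPEG start-of-image bytes (`FF D8 FF`) and the next end-of-image bytes (`FF D9`). The function name `carve_jpegs` and the 64MB cap parameter are assumptions.

```python
SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(data, max_size=64 * 1024 * 1024):
    """Return byte strings that look like JPEG files inside data.

    Naive sketch: each candidate runs from an SOI marker to the first EOI
    marker found within max_size bytes after it.
    """
    results = []
    pos = 0
    while True:
        start = data.find(SOI, pos)
        if start == -1:
            break
        end = data.find(EOI, start + len(SOI), start + max_size)
        if end == -1:
            pos = start + 1      # no end marker in range, keep scanning
            continue
        end += len(EOI)
        results.append(data[start:end])
        pos = end
    return results
```

A carver this simple stops at the first `FF D9` it sees, so a photo with an embedded thumbnail can be cut short at the thumbnail's end marker. Dedicated tools such as recoverjpeg or photorec handle such cases better.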

@asyslinux
Author

Updated information about recovering jpg files: my default recovery tool is now the standard recoverjpeg utility available in Linux distros plus a small shell script. This is a better tool for recovering jpg files than a custom script or program. New manual here: https://github.com/asyslinux/irec

@asyslinux
Author

Success story. I recovered most files smaller than 64MB from the MooseFS chunks: jpg, png, webp, docs, archives and much more. photorec can recover around 300+ different file types. The files were recovered without their real filenames.

I have released a manual for recovering all file types from MooseFS with photorec: https://github.com/asyslinux/irec
Maybe someone will find this useful. The developers can close the issue and keep this manual for other people.

Thanks.
