Force FSYNC #115

Open
guestisp opened this issue May 20, 2018 · 25 comments
@guestisp

Is it possible to force MooseFS to issue an fsync on every operation before returning an ACK to the client?

What if I want to be 100% sure that data is properly written to disk, even if the client is not asking for fsync?

@oszafraniec

https://moosefs.com/Content/Downloads/moosefs-2-0-users-manual.pdf
6.3.1 mfschunkserver.cfg
[...]
• HDD_FSYNC_BEFORE_CLOSE – enables/disables fsync before chunk closing; default is 0 (off)

I think this is what you want :)
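
A minimal sketch of the corresponding mfschunkserver.cfg entry (the option name is taken from the manual quoted above; the exact file layout may differ between versions):

    # call fsync() before every chunk close; default is 0 (off)
    HDD_FSYNC_BEFORE_CLOSE = 1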

@guestisp
Author

So, setting that to 1 means an fsync is called on every chunk close?

What if the client asks for fsync on its own and HDD_FSYNC_BEFORE_CLOSE is set to 0? Is the fsync honored, or ignored because of the config parameter?

@oszafraniec

@OXide94 can help here...

@pkonopelko
Member

pkonopelko commented May 20, 2018

Hi @guestisp and @oszafraniec,

The parameter @oszafraniec mentioned is about fsync before chunk closing, so let's not mix up two completely different things. What @guestisp wants to achieve (I believe) is to get an ACK after every write operation (so at the "Client <--> Chunkserver" level, not at the level of chunk writing).

In MooseFS a write is a transaction. Let's assume somebody is writing 2 MiB. These 2 mebibytes are divided into 64 KiB blocks anyway. In the current implementation the Client connects to the CS, sends 64 KiB, does not wait for an ACK, and sends the next 64 KiB.

We can consider adding such a parameter. The question is: @guestisp, would you like to have an fsync after the whole group of blocks (in this example 2 MiB) or after every 64 KiB? As stated above, when data is being sent, the Client does not wait for an ACK for every 64 KiB block, so ACKs are a bit delayed (milliseconds). Please keep in mind that it would probably slow down the whole transaction (maybe not that much, because the ACKs in this case would probably just reach the client a bit later because of the fsync (some more milliseconds)).

This is theory; we would probably need to make some comparison tests (just add this fsync in the code and see the performance differences with and without it).
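
To illustrate the trade-off being discussed, here is a minimal, hypothetical C sketch of a chunkserver-side write loop; the function name, constant, and structure are invented for illustration and are not the actual MooseFS code:

    #include <unistd.h>

    #define BLOCK_SIZE (64 * 1024)  /* MooseFS sends data in 64 KiB blocks */

    /* sync_each != 0: fsync after every 64 KiB block (variant A).
     * sync_each == 0: a single fsync after the whole group (variant B). */
    ssize_t write_blocks(int fd, const char *buf, size_t len, int sync_each)
    {
        size_t off = 0;
        while (off < len) {
            size_t n = (len - off > BLOCK_SIZE) ? BLOCK_SIZE : len - off;
            ssize_t w = write(fd, buf + off, n);
            if (w < 0)
                return -1;
            off += (size_t)w;
            if (sync_each && fsync(fd) < 0)  /* variant A: per-block flush */
                return -1;
        }
        if (!sync_each && fsync(fd) < 0)     /* variant B: one flush at the end */
            return -1;
        return (ssize_t)off;
    }

Variant B is what "fsync after the whole group" would amount to: the same durability point at the end of the transaction, but one flush instead of dozens.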

Thanks,
Peter / MooseFS Team

@pkonopelko pkonopelko added the feature and question labels May 20, 2018
@guestisp
Author

@OXide94 Preface: I'm not an expert.

There are, AFAIK, two ways to be sure that a write operation has really reached the disk and not just a cache buffer: opening the file with O_SYNC, or issuing fsync on a file handle.

Now, what this means in MooseFS, I don't know.

My questions are (obviously, I'm talking about files stored on MooseFS):

  1. What happens when opening a file with O_SYNC set?
  2. What happens when issuing an fsync after an fwrite and before fclose?
  3. What happens with neither of the two above, but with HDD_FSYNC_BEFORE_CLOSE set to 1?
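
For reference, the two userspace mechanisms from the first paragraph look roughly like this in plain POSIX C (a generic sketch, not MooseFS-specific):

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *buf = "hello";
        size_t len = strlen(buf);

        /* Way 1: open with O_SYNC; each write() returns only after
         * the data and metadata have reached stable storage. */
        int fd1 = open("file1", O_WRONLY | O_CREAT | O_SYNC, 0644);
        if (fd1 >= 0) {
            (void)write(fd1, buf, len);
            close(fd1);
        }

        /* Way 2: write normally, then flush explicitly. */
        int fd2 = open("file2", O_WRONLY | O_CREAT, 0644);
        if (fd2 >= 0) {
            (void)write(fd2, buf, len);
            fsync(fd2);  /* blocks until the data is on disk */
            close(fd2);
        }
        return 0;
    }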

"We can consider adding such a parameter"

From my point of view, anything aimed at improving data consistency should be added, so that any sysop is able to choose based on their requirements. If adding a flag is not an issue, yes, add it. If adding 2 flags (one for fsync after every 64 KiB, one for fsync after the whole group of blocks) is not an issue, then please add both.

Anyway, I don't think adding an fsync after every 64 KiB is useful as long as you force clients to wait for the final fsync. If the whole file can't be properly flushed to disk, you should block the write operation and notify the client (which is still there waiting for a write ACK). Why should you send an fsync after each 64 KiB?

For example, if you run dd if=/dev/zero of=test bs=100M count=5 conv=fsync, a single fsync is issued after all 5 blocks have been written.

If you run dd if=/dev/zero of=test bs=100M count=5 oflag=direct, no fsync is issued because the file is opened with O_DIRECT, which bypasses the page cache entirely.

Will MooseFS honor these two cases? Can we force one of these cases (or both) by setting a configuration parameter, even if the client is not asking for fsync or O_SYNC?

@guestisp
Author

Trying to figure out how this works.
Even with HDD_FSYNC_BEFORE_CLOSE set to 1, I don't see any fsync call when stracing the mfschunkserver.

Even when writing a file with the O_SYNC flag set, the flag seems to be ignored by MooseFS, as the chunk file is opened without O_SYNC:

    open("/mnt/moosefs//00/chunk_000000000000B072_00000001.mfs", O_RDWR|O_CREAT|O_TRUNC, 0666)

Did I miss something?

@guestisp
Author

Small correction: when HDD_FSYNC_BEFORE_CLOSE is set to 1, fsync is called, but the flags set by the client are still ignored. Thus, the client can't ask for a sync write.

So either we make all writes sync with HDD_FSYNC_BEFORE_CLOSE, or all writes are made async, regardless of what the client asks for. If a client asks for O_SYNC or similar, it should be honored, because that kind of data could be very valuable.

@acid-maker
Member

OK, this is my fault. I didn't know that FUSE passes flags such as O_SYNC to userspace, but I've just checked and it does. It passes O_SYNC, O_ASYNC, O_NONBLOCK and O_NOATIME. Now I need to think about how to take them into account. The most important one is probably O_SYNC. We have several options here:

  1. Do an internal fsync after each write - likely the worst choice (the safest one, but very slow: the expected write speed would be less than 1 MB/s).
  2. Pass the O_SYNC flag to the CS and open chunks with that flag (or do an fsync on the CS after each write - probably the same result). In this case the client's write will return immediately without syncing data, but each ACK will be sent back to the mfsmount only after a successful write of each portion of data.
  3. After sending all data to the CS, send a new "perform fsync" packet and wait for the ACK. Similar to the previous option - the main difference is that it would sync the whole stream from mfsmount to the CS once, not after each write. Likely the same result, but much more efficient.

In 2 and 3, a successful fsync/close (but not write) done by the client on a descriptor opened with O_SYNC will mean that your data is synced to the disks on the CS.

What do you think? In my opinion option 3 is the best (a rough sketch follows below). Is it safe enough?
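
For what it's worth, a minimal sketch of how option 3 might look on the chunkserver side; the packet handler, status codes and send_ack() helper are hypothetical, not the actual MooseFS wire protocol:

    #include <stdint.h>
    #include <unistd.h>

    #define STATUS_OK    0   /* hypothetical status codes */
    #define STATUS_IOERR 1

    void send_ack(int conn_fd, uint8_t status);  /* assumed reply primitive */

    /* Invoked when the CS receives a "perform fsync" packet after the
     * last data block of a write stream. */
    void handle_fsync_packet(int conn_fd, int chunk_fd)
    {
        /* One fsync for the whole stream, not one per 64 KiB block. */
        if (fsync(chunk_fd) < 0)
            send_ack(conn_fd, STATUS_IOERR);  /* client learns the flush failed */
        else
            send_ack(conn_fd, STATUS_OK);     /* data is on stable storage */
    }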

@zcalusic
Contributor

O_SYNC means "Write operations on the file will complete according to the requirements of synchronized I/O file integrity completion", so if you do the first part of option 2, "Pass this O_SYNC flag to CS and open chunks with such flag", it should be enough to cover the semantics, and no additional fsync() should be needed.

@zcalusic
Contributor

While at it, see if O_DSYNC can also be implemented; it's similar:

   O_SYNC provides synchronized I/O file integrity completion, meaning
   write operations will flush data and all associated metadata to the
   underlying hardware.  O_DSYNC provides synchronized I/O data
   integrity completion, meaning write operations will flush data to the
   underlying hardware, but will only flush metadata updates that are
   required to allow a subsequent read operation to complete
   successfully.  Data integrity completion can reduce the number of
   disk operations that are required for applications that don't need
   the guarantees of file integrity completion.

http://man7.org/linux/man-pages/man2/open.2.html
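
For comparison, the only difference at the call site is the open flag; a plain POSIX sketch, nothing MooseFS-specific:

    #include <fcntl.h>

    void open_examples(void)
    {
        /* O_SYNC: file integrity - data and all metadata are flushed
         * to the hardware before each write() returns. */
        int fd_sync = open("file", O_WRONLY | O_CREAT | O_SYNC, 0644);

        /* O_DSYNC: data integrity - data is flushed, metadata only when
         * needed for a later read (often fewer disk operations). */
        int fd_dsync = open("file", O_WRONLY | O_CREAT | O_DSYNC, 0644);

        (void)fd_sync;
        (void)fd_dsync;
    }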

@guestisp
Author

@zcalusic the problem is that MooseFS totally ignores this flag, and it is not passed along when opening the chunk file for writing...

@zcalusic
Contributor

@guestisp, please read the comments before replying; you have missed at least one from @acid-maker.

@guestisp
Author

Sorry, my fault.
I only received your email notification and not the first one.

@guestisp
Author

Anyway, O_SYNC should return to the client only when the write is properly stored on disk, not immediately as @acid-maker said, or the client will be unaware of any failures.

@zcalusic
Contributor

Yes, of course. But as the write() call is synchronous, it's just a matter of passing its return status to the caller. Hopefully that can be easily integrated with the current workflow; I don't know MooseFS internals well, but @acid-maker will. 😄

@guestisp
Author

@acid-maker wrote differently: "In such case client's write will return immediately without sync'ing data".

So writes won't be synchronous, but still in writeback.

If, as a client, I ask for O_SYNC, it is because I want to be 100% sure that data is really flushed to disk, so the write must return only after the real flush, even if it is much slower.
(Not all writes need to be sync, so the write penalty shouldn't be an issue.)

@zcalusic
Contributor

I see. I mostly ignored that part, thinking that opening the chunk with O_SYNC should be enough, and that write() already propagates its return code back upstream. I base my understanding on the following figure:

https://www.researchgate.net/profile/Weigang_Wu/publication/271464202/figure/fig1/AS:295235751038978@1447401096555/The-read-write-process-of-MooseFS-9.png

So, O_SYNC and similar flags would be passed via points 4/5/6, and write() would return its status code via point 7. Of course, that figure is much simplified, and the real world is certainly more complicated. :)

In any case, supporting these flags would bring MooseFS closer to POSIX compliance, so it would be great if they could be added and properly supported.

@acid-maker acid-maker self-assigned this May 25, 2018
@borkd
Collaborator

borkd commented Oct 4, 2018

@acid-maker: to follow up on our conversation - one idea was to keep the current fsync behavior as-is, but use an extended attribute to mark files or directory trees where FSYNC or DIRECT compliance is required, and honor that flag all the way down to the chunk writes.
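
A rough sketch of how such a check could look on Linux; the user.mfs.sync attribute name and the policy around it are invented for illustration, not an existing MooseFS feature:

    #include <string.h>
    #include <sys/xattr.h>

    /* Returns 1 if the file (or the tree it inherits from) was marked
     * as requiring synchronous writes via the hypothetical xattr. */
    int requires_sync(const char *path)
    {
        char val[2] = {0};
        ssize_t n = getxattr(path, "user.mfs.sync", val, sizeof(val) - 1);
        return n > 0 && strncmp(val, "1", 1) == 0;
    }

An admin could then mark a tree with something like setfattr -n user.mfs.sync -v 1 /mnt/mfs/important, and writes under it would be treated as if O_SYNC had been requested.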

@borkd borkd pinned this issue Dec 19, 2018
@borkd borkd added the data safety and performance labels Jan 30, 2019
@dumblob

dumblob commented Oct 7, 2021

Any news on this rather fundamental behavior?

@chogata
Member

chogata commented Oct 11, 2021

This is still on our roadmap.

@Motophan

Motophan commented Jun 3, 2022

Hi, please add core filesystem functionality for basic operation, thank you.

@guestisp
Author

guestisp commented Jun 3, 2022

They will never add anything useful. They promised, years ago, a v4 free for everyone; I even had binaries to test and use, with an awesome HA mode, but v4 is still closed source and unavailable. They promise a lot of things...

@guestisp
Author

guestisp commented Jun 3, 2022

It's a shame, because MFS is by far the best distributed storage available.

@Motophan

Motophan commented Feb 3, 2023

Hi, how is roadmap doing these days?

@guestisp
Author

guestisp commented Feb 8, 2023

"Hi, how is roadmap doing these days?"

They do nothing except bug fixes.
They promised a lot of things, like an open-source v4, years ago, but still nothing.

They talk, talk, talk, talk...
