[BUG] Slowdown after 160k files written #518
Do I understand correctly: you first wrote data from one physical machine (machine A) to one path on MooseFS (path X), and then from another machine (machine B) to another path on MooseFS (path Y)? And writing from A to X was much faster than writing from B to Y? If yes, the obvious answer would be that there is some problem with B's connection to your MooseFS instance. Do you have any messages in the logs (all of them: master, chunk servers, client B) about timeouts, disconnections, or a "long loop" message?
Correct
Both A and B have direct connections (WireGuard VPN) to the chunkservers (if I understand correctly, writes go directly from the writing machine to the chunkservers, right?). Weirdly enough, after B wrote to the MooseFS filesystem, all subsequent writes from A were also rather slow.
Yeah, after B finished writing, the write time at least doubled on all future writes, including the ones from A.
Can you describe the network side of your cluster in more detail, in addition to all relevant storage class definitions and chunkserver labels? Topology, bandwidth (iperf) and typical RTTs between all nodes with and without WireGuard, WireGuard in full mesh or something else, known bottlenecks, etc.? "Some kind of corruption" on your mfsmaster node sounds too vague; please be more specific.
There is no deduplication in MooseFS (i.e. the system does not in any way analyse the content of the data it writes; the only exception is trailing zeros, which are not physically written to disk).
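To illustrate the trailing-zeros point, here is a simplified sketch (not MooseFS's actual chunk-handling code): a writer can trim the run of zero bytes at the end of a chunk and record only the logical length, so the zero padding never reaches the disk, while a reader pads the stored bytes back out to the logical length.

```python
def trim_trailing_zeros(chunk: bytes) -> tuple[bytes, int]:
    """Return the chunk without its trailing zero bytes, plus the
    original logical length (so the zeros can be reconstructed on read).
    Simplified illustration; not the actual MooseFS implementation."""
    end = len(chunk)
    while end > 0 and chunk[end - 1] == 0:
        end -= 1
    return chunk[:end], len(chunk)

data = b"payload" + b"\x00" * 1024
stored, logical_len = trim_trailing_zeros(data)
print(len(stored), logical_len)  # 7 1031
```

Note that this optimization looks at nothing but zero bytes at the end of a chunk; two files with similar but nonzero content are stored entirely independently.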
Have you read through available documentation, open Github issues and Github Q&A Discussions?
Yes
System information
Your moosefs version and its origin (moosefs.com, packaged by distro, built from source, ...).
apt install moosefs-* 3.0.116
Operating system (distribution) and kernel version.
Ubuntu 22.04 LTS
Hardware / network configuration, and underlying filesystems on master, chunkservers, and clients.
Mostly 1 GbE connections between 22 servers.
20 chunkservers, each with 2x512 GB SSDs (software RAID1) using ext4 (non-dedicated disks).
How much data is tracked by moosefs master (order of magnitude)?
4.1 TB in ~160k files
Describe the problem you observed.
After I wrote about 1.2 TB of data to my MooseFS path at /opt/mfs/pub1/, I started writing similar data from another host to /opt/mfs/pub2/. I noticed that the writes from the second host were much slower, as the copy of its 1.2 TB (the same application data, as mentioned) was still running the next morning. Also, the first server (which happens to be the mfsmaster too) seemingly experienced some data corruption.
Is there chunk deduplication or something similar when two similar-looking files are written?
Can you reproduce it? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Yes, writes are still slow for some reason.
Troubleshooting steps:
Include any warning/errors/backtraces from the system logs.
Something seemingly happened around 08:30 that made the system slow down significantly.
Any help appreciated.