
[Feature]: Datanode support to manage both SSD and HDD disks #3239

Open
OpenPie-DTXLab opened this issue Mar 12, 2024 · 4 comments


Contact Details

[email protected]

Is there an existing issue for this?

  • I have searched all the existing issues

Is your feature request related to a problem? Please describe.

In our use case, there are only a few machines available to deploy a cfs cluster, e.g. 3 or 5, and each machine is equipped with both SSD and HDD disks. For some cold data, we want to migrate it from SSD to HDD.
As far as I know, the Hybrid-cloud branch under development supports a data cool-down feature, but it requires deploying at least two zones, and the nodes in each zone are limited to a single disk type; when migrating, cold data is transferred from the SSD zone to the HDD zone.
The current solution is not friendly for small cfs clusters, e.g. fewer than 6 nodes. Also, for machines with both SSD and HDD disks, the migration process cannot leverage locality to improve performance.

Describe the solution you'd like.

As a solution, I think the datanode should be able to manage both SSD and HDD disks, which seems more reasonable. As a result, even a cluster with 3 nodes (each with both SSD and HDD, one zone) can use the data cool-down feature. Meanwhile, CubeFS can optimize cold-data migration performance: when migrating cold data, prefer HDD directories (the destination) on the same node as the source data, which reduces network traffic and greatly improves migration performance.
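For illustration only, one way to express this is a per-disk media type in the datanode config. The `:SSD` / `:HDD` suffix below is a hypothetical sketch, not the current CubeFS config format (today each `disks` entry is just `path:reservedSpace`):

```json
{
  "role": "datanode",
  "listen": "17310",
  "disks": [
    "/cfs/ssd1:21474836480:SSD",
    "/cfs/hdd1:107374182400:HDD"
  ]
}
```

With something like this, a single datanode (and therefore a single zone) could expose both media types to the master for data-partition placement.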

Describe an alternate solution.

As an alternative solution, we can deploy cfs in containers, so we can deploy two zones on a three-node cluster, with the nodes in each zone managing a single type of disk. Configuring secondary IPs on the nodes can also help.
But this cannot use locality to reduce the network traffic caused by data migration.

Anything else? (Additional Context)

No response

OpenPie-DTXLab added the enhancement (New feature or request) label Mar 12, 2024

OpenPie-DTXLab (Author) commented Mar 12, 2024

We have a rough idea:

  1. A datanode can be configured with different types of disks, so a zone can also have multiple media types.

  2. When creating a volume, if two storage classes are specified, prefer to create the paired SSD and HDD data partitions on the same datanode.

  3. When lcNode triggers a migration task, first try to transition cold data from an SSD srcDp to an HDD dstDp located on the same datanode as the SSD srcDp (see the sketch after this list).
    a. get the extent list of the migrating file
    b. group the extents by data partition (srcDp)
    c. for each srcDp, select a dstDp that is co-located with the srcDp, then build an extentsLocalTransition request that contains the local transition context and will be sent to the datanode
    d. send the request to the target datanode where the first dp replica is located (repl protocol)
    e. all datanodes holding dp replicas perform the local transition, reading the extents from srcDp and writing them to extents on dstDp; the first-replica datanode returns the inode's migrated extent list to lcNode
    f. lcNode batch-updates the inode metadata of the migrating file

    (attached diagram: local transition workflow)

  4. If the local transition fails, fall back to the original workflow and migrate the data across nodes.
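To make step 3 more concrete, here is a minimal Go sketch of the locality decision in steps (b) and (c): group the file's extents by source data partition and pick an HDD dstDp whose replicas live on the same datanodes, falling back to cross-node migration when no such dp exists. The type names and helpers are illustrative assumptions, not the actual code in the draft.

```go
package main

import "fmt"

// Extent is a simplified view of one extent of the migrating file.
type Extent struct {
	PartitionID uint64 // srcDp currently holding this extent on SSD
	ExtentID    uint64
}

// DataPartition is a simplified view of a data partition and its replicas.
type DataPartition struct {
	PartitionID uint64
	Hosts       []string // replica datanodes; Hosts[0] is the first replica
	MediaType   string   // "SSD" or "HDD"
}

// sameHosts reports whether two partitions keep replicas on the same datanodes,
// i.e. whether a transition between them can stay node-local.
func sameHosts(a, b *DataPartition) bool {
	if len(a.Hosts) != len(b.Hosts) {
		return false
	}
	set := map[string]bool{}
	for _, h := range a.Hosts {
		set[h] = true
	}
	for _, h := range b.Hosts {
		if !set[h] {
			return false
		}
	}
	return true
}

// pickCoLocatedHDD returns an HDD partition co-located with src, or nil if none
// exists (the caller then falls back to the original cross-node workflow).
func pickCoLocatedHDD(src *DataPartition, all []*DataPartition) *DataPartition {
	for _, dp := range all {
		if dp.MediaType == "HDD" && sameHosts(src, dp) {
			return dp
		}
	}
	return nil
}

func main() {
	// toy cluster view: one SSD dp and one HDD dp sharing the same three hosts
	ssd := &DataPartition{PartitionID: 1, Hosts: []string{"node1", "node2", "node3"}, MediaType: "SSD"}
	hdd := &DataPartition{PartitionID: 9, Hosts: []string{"node1", "node2", "node3"}, MediaType: "HDD"}
	dpView := map[uint64]*DataPartition{ssd.PartitionID: ssd, hdd.PartitionID: hdd}

	// extents of the migrating file, all currently on the SSD dp
	exts := []Extent{{PartitionID: 1, ExtentID: 100}, {PartitionID: 1, ExtentID: 101}}

	// step (b): group extents by source data partition
	bySrc := map[uint64][]Extent{}
	for _, e := range exts {
		bySrc[e.PartitionID] = append(bySrc[e.PartitionID], e)
	}

	// step (c): choose a co-located HDD dp per group
	for srcID, group := range bySrc {
		dst := pickCoLocatedHDD(dpView[srcID], []*DataPartition{ssd, hdd})
		if dst == nil {
			fmt.Printf("srcDp %d: no co-located HDD dp, use cross-node migration\n", srcID)
			continue
		}
		fmt.Printf("srcDp %d -> dstDp %d: %d extents in the local transition request\n", srcID, dst.PartitionID, len(group))
	}
}
```

The per-group result would then go into the extentsLocalTransition request of step (d).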

OpenPie-DTXLab (Author) commented:

We have roughly implemented a draft version to verify the idea: #3243

The local transition process described above is faster than the original workflow and even faster than the file upload path. In my environment (4 cores, 8 GB RAM, 1000 Mbit network), local migration of a 4 GB file takes a little over 10 seconds (faster than an S3 put of the same file, which takes about 30 seconds), while the original cross-node migration takes about 90 seconds.
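As a rough sanity check on these numbers (my own back-of-the-envelope, not measured in the draft): a 1000 Mbit/s link moves at most about 125 MB/s, so pushing 4 GB over the network takes at least roughly 33 seconds per remote copy, which lines up with the ~90 second cross-node figure once replication and protocol overhead are included; the local SSD-to-HDD path skips the network entirely, so ~10 seconds (about 400 MB/s disk-to-disk) is plausible.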

true1064 (Contributor) commented:

This solution requires significant changes, and we can only consider whether to incorporate this approach after the completion of the first phase of the HybridCloud project.
