Describe the bug
We observed full GC and OOM behavior on a YARN NodeManager in one environment. After analyzing the heap dump, we found that the Alluxio client's ConfigHashSync and related classes were not being garbage collected.
We believe this is because the client is not being closed properly, so we added logging to the BaseFileSystem class in the client code to see where the client was created and closed:
public BaseFileSystem(FileSystemContext fsContext) {
  // ...
  LOG.info("constructing BaseFileSystem with context id {}", mFsContext.getId(),
      new Exception("print stack trace"));
}

public synchronized void close() throws IOException {
  // ...
  LOG.info("closing BaseFileSystem with context id {}", mFsContext.getId(),
      new Exception("print stack trace"));
}
In the end, we get the following log:
2024-01-02 11:17:19,817 [2146392] - INFO [LogAggregationService #56:BaseFileSystem@109] - constructing BaseFileSystem with context id app-2020615593602897283
java.lang.Exception: print stack trace
at alluxio.client.file.BaseFileSystem.<init>(BaseFileSystem.java:109)
at alluxio.client.file.FileSystem$Factory.create(FileSystem.java:146)
at alluxio.client.file.FileSystem$Factory.create(FileSystem.java:127)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:533)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:472)
at org.apache.hadoop.fs.DelegateToFileSystem.<init>(DelegateToFileSystem.java:52)
at alluxio.hadoop.AlluxioFileSystem.<init>(AlluxioFileSystem.java:50)
at sun.reflect.GeneratedConstructorAccessor64.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:135)
at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:173)
at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:258)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:342)
at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:339)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:339)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:465)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:491)
at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:461)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter$1.run(AggregatedLogFormat.java:476)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter$1.run(AggregatedLogFormat.java:473)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.initialize(AggregatedLogFormat.java:472)
at org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController.initializeWriter(LogAggregationTFileController.java:90)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:459)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:415)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:265)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
When FileContext.getFileContext is called, it creates an instance of alluxio.hadoop.AlluxioFileSystem to access Alluxio. However, the base class org.apache.hadoop.fs.AbstractFileSystem has no close method to release the resources it holds, so every such instance leaks its Alluxio client.
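One common direction for this kind of fix is to register a cleanup action that releases the client's resources when the owning object becomes unreachable, as a safety net for callers that never call close. Below is a minimal, self-contained sketch of that pattern using java.lang.ref.Cleaner; the names LeakFreeFileSystem and ClientState are hypothetical stand-ins, and the actual change in the Alluxio PR may differ.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicBoolean;

class LeakFreeFileSystem implements AutoCloseable {
  private static final Cleaner CLEANER = Cleaner.create();

  // The cleanup action must NOT capture `this`, or the owner would never
  // become phantom-reachable and the Cleaner would never fire.
  static final class ClientState implements Runnable {
    final AtomicBoolean closed = new AtomicBoolean(false);

    @Override
    public void run() {
      // Release sockets, heartbeat threads, metrics, etc. here.
      closed.set(true);
    }
  }

  final ClientState mState = new ClientState();
  private final Cleaner.Cleanable mCleanable = CLEANER.register(this, mState);

  @Override
  public void close() {
    // Explicit close still works; Cleanable.clean() runs the action at most
    // once, so the Cleaner safety net will not run it a second time.
    mCleanable.clean();
  }
}
```

With this pattern, well-behaved callers close the file system explicitly, while instances abandoned by APIs like FileContext (which never calls close) are still cleaned up eventually after they become unreachable.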
To Reproduce
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class Main {
  public static void main(String[] args) throws Exception {
    while (true) {
      Configuration conf = new Configuration();
      conf.set("fs.AbstractFileSystem.alluxio.impl", "alluxio.hadoop.AlluxioFileSystem");
      // every call will create a new file system but never close it
      FileContext fileContext =
          FileContext.getFileContext(URI.create("alluxio://localhost:19998/"), conf);
      FileStatus fileStatus = fileContext.getFileStatus(new Path("/"));
      System.out.println(fileStatus);
      Thread.sleep(200);
    }
  }
}
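Until the fix lands, one application-side mitigation is to create the context once per URI and reuse it, instead of calling FileContext.getFileContext inside the loop. The sketch below shows the reuse pattern with a generic cache; ContextCache is a hypothetical helper, not part of Hadoop or Alluxio, and the real FileContext would take the place of the String payload used here for illustration.

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

class ContextCache<T> {
  private final ConcurrentMap<URI, T> mCache = new ConcurrentHashMap<>();

  /** Returns the cached value for this URI, creating it at most once. */
  T getOrCreate(URI uri, Function<URI, T> factory) {
    // computeIfAbsent guarantees at most one instance per URI, so repeated
    // calls in a loop no longer pile up unclosed clients.
    return mCache.computeIfAbsent(uri, factory);
  }

  int size() {
    return mCache.size();
  }
}
```

In the reproduction above, the loop would then fetch the same cached context on every iteration, bounding the number of Alluxio clients to one per distinct URI.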
Expected behavior
No memory leak is produced.
Urgency
High.
Are you planning to fix it
Yes.
Additional context
### What changes are proposed in this pull request?
Fix #18479
### Why are the changes needed?
Fix the memory leak issue when the client accesses Alluxio through Hadoop FileContext API.
### Does this PR introduce any user facing changes?
No.
pr-link: #18480
change-id: cid-f36ec718ebcbc61fdee79cef2ecc8731d0ba2ee5
Alluxio Version:
2.x