Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Jupyter Kernel for SystemDS #1998

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

Conversation

kubieren
Copy link

Add Jupyter Kernel for SystemDS

Description

This pull request introduces a new Jupyter kernel for SystemDS, enabling seamless integration and execution of SystemDS scripts within Jupyter notebooks. The addition aims to provide a more interactive and user-friendly environment for data scientists and researchers working with SystemDS for machine learning and statistical data processing.

Benefits

  • Enhanced Usability: Makes it easier for users to test and run SystemDS scripts in an interactive notebook environment.
  • Increased Accessibility: Lowers the barrier to entry for new users and contributes to the SystemDS community by providing additional tools for exploration and experimentation.
  • Improved Development Workflow: Supports rapid prototyping and visualization of data and results within the same notebook, streamlining the development process for SystemDS scripts.

How to Use

Please refer to the updated README in the scripts/jupyterkernel/ directory for detailed instructions on installing and using the SystemDS kernel in your Jupyter environment.

Comment on lines +7 to +11

- Java JDK 11 or later
- Maven
- Git
- Jupyter Notebook or JupyterLab
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the operating system this README is written for.

Comment on lines +31 to +33
git clone https://github.com/kubieren/SystemDSKernel.git
cd SystemDSKernel/kernelsds
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the code from the main SystemDS repo, which will be available after this PR is merged.

Comment on lines +28 to +32
//JupyterSocket.JUPYTER_LOGGER.setLevel(Level.WARNING);
JupyterSocket.JUPYTER_LOGGER.setLevel(Level.WARNING);

KernelConnectionProperties connProps = KernelConnectionProperties.parse(contents);
JupyterConnection connection = new JupyterConnection(connProps);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add detailed comments for readability and debuggability, especially for the code that configure flags.

Comment on lines +37 to +52
.preferLong()
//Keywords from a great poem at https://stackoverflow.com/a/12114140
.withKeywords("let", "this", "long", "package", "float")
.withKeywords("goto", "private", "class", "if", "short")
.withKeywords("while", "protected", "with", "debugger", "case")
.withKeywords("continue", "volatile", "interface")
.withKeywords("instanceof", "super", "synchronized", "throw")
.withKeywords("extends", "final", "export", "throws")
.withKeywords("try", "import", "double", "enum")
.withKeywords("false", "boolean", "abstract", "function")
.withKeywords("implements", "typeof", "transient", "break")
.withKeywords("void", "static", "default", "do")
.withKeywords("switch", "int", "native", "new")
.withKeywords("else", "delete", "null", "public", "var")
.withKeywords("in", "return", "for", "const", "true", "char")
.withKeywords("finally", "catch", "byte")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these the keywords for dml? If so, we need to clean this up.

Comment on lines +104 to +106
matrix[i][j] = Double.parseDouble(values[j].trim()); // Trim to remove any leading or trailing spaces.
} catch (NumberFormatException e) {
// Handle the case where the string cannot be parsed as a double.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These parsing logic seem to make strong assumptions regarding the code statements. I like to see some examples of complex dml code. You can use our builtins.

Comment on lines +262 to +266
for (String out: outputArray
) {
//System.out.println("out: "+out);
expr = expr + ";\nwrite("+out+", './tmp/"+out+"');" ;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing the output objects to disk after each cell is extremely inefficient. We need to utilize the jmlc framework to maintain the variables in-memory.

Comment on lines +2 to +3
import org.apache.sysds.api.jmlc.*;
import org.apache.sysds.common.Types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to avoid wildcard imports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
2 participants