A plugin for Pentaho Data Integration (Kettle) that adds support for DuckDB table input/output.
The following should already be installed and configured on your local machine:
- Maven, version 3+
- Java JDK 11 (or OpenJDK)
- This settings.xml in your /.m2 directory
Begin by cloning this repository locally.
git clone https://github.com/forgineer/duckdb-kettle-plugin.git
After cloning, review the pom.xml
file for the project <version>
tag and ensure it is the same version as the Pentaho Data Integration (Kettle) version you are using. This version can be verified under the Maven repository.
...
<groupId>com.forgineer</groupId>
<artifactId>duckdb-kettle-plugin</artifactId>
<!-- The version should reflect the PDI (Kettle version) -->
<!-- Ex: 9.4.0.0-343, 9.3.0.0-428, etc. -->
<version>9.4.0.0-343</version>
...
Also, review the DuckDB version and update it to the desired version.
...
<dependency>
<groupId>org.duckdb</groupId>
<artifactId>duckdb_jdbc</artifactId>
<!-- The version should reflect JDBC driver used and vice versa -->
<version>0.9.2</version>
</dependency>
...
Lastly, update the JDBC driver jar file name and version to match the pom.xml
file. This file can be verified by downloading from Maven or directly from DuckDB.
@Override
public String[] getUsedLibraries() {
// The version should match POM
return new String[] {"duckdb_jdbc-0.9.2.jar"};
}
After reviewing and/or updating the above, proceed with packaging the jar file.
mvn package
This will create a jar and zip file inside of the target
directory:
- duckdb-kettle-plugin-x.x.x.x-xxx.jar
- duckdb-kettle-plugin-x.x.x.x-xxx.zip
Unpack the zip file into the plugins directory (\data-integration\plugins
) of your local Kettle install. The zip file should include the plugin and necessary JDBC driver for DuckDB.
data-integration\
plugins\
duckdb-kettle-plugin-x.x.x.x-xxx\
lib\
duckdb_jdbc-x.x.x.jar
duckdb-kettle-plugin-x.x.x.x-xxx.jar
Restart Kettle (Spoon).
DuckDB should now be available as a connection type from a table input or output step. Only the database name is required to create the DuckDB file.
✋ Use caution when using Kettle properties within your database name parameter. In recent testing, this has caused errors with the interpreted file path that Kettle generates.