Support partially specified writes from case classes #1139

Open · wants to merge 50 commits into b2.0
Conversation

@aashishs101

Currently, we cannot use rdd.saveToCassandra on an RDD[CaseClass] if CaseClass does not contain all of the fields of the target table; the write fails with:

java.lang.IllegalArgumentException: requirement failed: Columns not found in CaseClass: [missing_field1, missing_field2]

This is limiting when columns have been added to the table by an external process: any job that writes to the table then fails automatically. This PR makes the failure case more permissive: if the case class matches all of the first n columns of the target table, we attempt the write.
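A minimal sketch of the scenario (keyspace, table, and column names are invented for illustration):

```scala
import com.datastax.spark.connector._

// Assume table test.users was created with columns (id, name, created_at),
// where created_at was added later by an external process.
case class User(id: Int, name: String)  // no created_at field

val users = sc.parallelize(Seq(User(1, "alice"), User(2, "bob")))

// Before this PR: IllegalArgumentException, "Columns not found in User: [created_at]".
// With this PR: the case class matches the first n columns, so the write is attempted.
users.saveToCassandra("test", "users")
```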

RussellSpitzer and others added 30 commits February 28, 2017 14:12
SPARKC-476: SessionProxy Should proxy all Runtime Interfaces
SPARKC-466: Add a CassandraRDDMock for end users to use in Unit Testing
The CassandraRDDMock just passes through another RDD and pretends it is a CassandraRDD.
SPARKC-475: Add implicit RowWriterFactory for RDD[Row]
Previously, when a DataFrame was turned into an RDD by its `rdd` method, saveToCassandra and joinWithCassandraTable would fail for lack of an implicit RowWriterFactory. To fix this, we add an implicit RowWriterFactory for RDD[T <: Row], which maps to the SqlRowWriterFactory we already have (a usage sketch follows this commit list).
Fix doc links
Adds Python Dictionary as Kwargs Example
More examples, Common issues
Minor edits for consistency
Text and link edits
Refresh Documentation to Use Spark 2.X concepts
* SPARKC-492: Protect against Size Estimate Overflows

* SPARKC-491: add java.time classes support to converters and sparkSQL

* SPARKC-470: Allow Writes to Static Columns and Partition Keys
Link to spark-connector Slack channel at DataStax Academy Slack
For some reason the logic for finding the module names was not working (at least on my Mac). Here I simplify it by iterating over the two folders we know about, ensuring the output files exist in their own separate folders.
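A short sketch of the SPARKC-475 scenario mentioned above (keyspace, table, and DataFrame names are assumed):

```scala
import com.datastax.spark.connector._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row

val df = spark.read.table("source_table")   // any DataFrame
val rows: RDD[Row] = df.rdd                 // dropping down to the RDD API

// Previously this line failed to compile for lack of an implicit RowWriterFactory;
// the new implicit for RDD[T <: Row] routes it through SqlRowWriterFactory.
rows.saveToCassandra("ks", "tbl")
```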
@datastax-bot

Hi @aashishs101, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign the Spark Cassandra Connector CLA. It's all electronic and will take just minutes.

@datastax-bot

Thank you @aashishs101 for signing the Spark Cassandra Connector CLA.


it should "throw a meaningful exception when a column has an incorrect type" in {
val userTable = newTable(loginColumn, addressColumn)
val user = UserWithUnknownType("foo", new UnknownType)
Contributor

Why was this test changed?

val mappedColumns = getterMap.values.toSet
val unmappedColumns = selectedColumns.filterNot(mappedColumns)
require(unmappedColumns.isEmpty, s"Columns not found in $tpe: [${unmappedColumns.mkString(", ")}]")
require(selectedColumns.endsWith(unmappedColumns), s"Unmapped columns must be at end of table definition: [${unmappedColumns.mkString(", ")}]")
Contributor

Not sure what this means, seems like it would only match if unmappedColumns was in the same order as selectedColumns

@@ -19,13 +19,12 @@ class DefaultRowWriter[T : TypeTag : ColumnMapper](

  override def readColumnValues(data: T, buffer: Array[Any]) = {
    val row = converter.convert(data)
-   for (i <- columnNames.indices)
+   for (i <- row.columnValues.indices)
Contributor

This returns values in the order of the data and not in the order of the selected columns?

Author

It shouldn't change the order: these are just indices into the Seq, and the actual data access happens on the next line, where elements of the row and buffer are read and written. In most cases the number of table columns equals the number of values in the row. In the case we're interested in (an underspecified case class), though, the table has more columns than the row has values, so we want to iterate over the smaller count.
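A minimal sketch of the index change (names and sizes invented for illustration):

```scala
val columnNames  = IndexedSeq("id", "name", "created_at")  // 3 table columns selected
val columnValues = IndexedSeq[Any](1, "alice")             // underspecified case class: 2 values
val buffer = new Array[Any](columnNames.size)

// Old loop: for (i <- columnNames.indices) would index past the end of columnValues.
for (i <- columnValues.indices)  // 0 until 2: bounded by the row, not the table
  buffer(i) = columnValues(i)
```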

Contributor

makes sense

if (!isTopLevel)
  require(unmappedColumns.isEmpty, s"Columns not found in nested $tpe: [${unmappedColumns.mkString(", ")}]")
else
  require(selectedColumns.endsWith(unmappedColumns), s"Unmapped columns must be at end of table definition: [${unmappedColumns.mkString(", ")}]")
Author

@RussellSpitzer the comment you had here was removed after my most recent push, but I still wanted to address it. Because unmappedColumns is created as a filter on selectedColumns (line 109), the two sequences are in the same order. So this requirement essentially asserts that the first n columns of the table must be mapped by the case class, while subsequent columns can go unmapped.
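A toy illustration of that ordering argument (column names invented):

```scala
val selectedColumns = Seq("a", "b", "c", "d")  // order from the table definition
val mappedColumns   = Set("a", "b")            // getterMap.values.toSet in the real code
val unmappedColumns = selectedColumns.filterNot(mappedColumns)  // Seq("c", "d"): filterNot preserves order

selectedColumns.endsWith(unmappedColumns)      // true: only trailing columns are unmapped
// If "b" were unmapped instead, unmappedColumns would be Seq("b", "d"),
// endsWith would be false, and the require would fail.
```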

@aashishs101
Author

@RussellSpitzer, any word on this?

@RussellSpitzer
Contributor

Do we have a SPARKC ticket for this yet? I couldn't find one in Jira:
https://datastax-oss.atlassian.net/issues/

@aashishs101
Author

Sorry, I hadn't created one until now. Here it is: https://datastax-oss.atlassian.net/browse/JAVA-1676

@RussellSpitzer
Contributor

Going to try to trigger jenkins from here

@RussellSpitzer
Contributor

test this please

@RussellSpitzer
Contributor

Tests running!

@ds-jenkins-builds

Build against Scala 2.10 finished with success

@ds-jenkins-builds

Build against Scala 2.11 finished with success

@RussellSpitzer
Contributor

So are we sure we want to aim this patch at master? I don't think we are going to do another big public release, so if we want this capability it would be nice to find a way of back-porting it to b1.6 or b2.0. I know we slightly change one method signature, but since it's an internal API it should be OK even on the earlier branches... thoughts?

@aashishs101
Author

@RussellSpitzer, since it is an internal method with a default parameter, I'm guessing it shouldn't change much? I'm also not sure about all of the ramifications of doing a backport vs. a normal release. Would I need to make a new PR that reapplies these changes against b2.0?

@aashishs101
Author

Hey @RussellSpitzer, any update on this?

@RussellSpitzer
Contributor

Sorry yeah, you need to change the PR target to b2.0

@aashishs101 aashishs101 changed the base branch from master to b2.0 December 15, 2017 18:55
@aashishs101
Author

test this please

Labels: none yet
Projects: none yet
7 participants