Sum up an array of columns per row based on a condition.

we need to sum up an array of columns per row based on a condition. we found a way to do it but it does not seem like a good idea when I have 20+ columns to sum up since
Wanted result is: Sum of values for all columns that end with "_val" where value is either 0 or 1 (or <2, i just want to exclude the value 3 right now)

val df1 = Seq(
("pr1", 1, 0, 3),
("pr2", 0, 0, 3),
("pr3", 1, 1, 3))
.toDF("id", "bla_val", "blub_val", "bli_val")
our solution with the required result in the column sum

val prodNames = df1.schema.fieldNames.filter(_.endsWith("_val"))
val ch = prodNames.map(x => col(x+"_redval"))

val df2 = df1.select(col("") +: (prodNames.map(c =>
when(col(c) === 1, lit(1))
.otherwise(lit(0)).as(c+"_redval"))): _
)

val df3 = df2.withColumn("sum", ch.reduce(+))
df3.show()
Example output:

+---+-------+--------+-------+--------------+---------------+--------------+---+
| id|bla_val|blub_val|bli_val|bla_val_redval|blub_val_redval|bli_val_redval|sum|
+---+-------+--------+-------+--------------+---------------+--------------+---+
|id1| 1| 0| 3| 1| 0| 0| 1|
|id2| 0| 0| 3| 0| 0| 0| 0|
|id3| 1| 1| 3| 1| 1| 0| 2|
+---+-------+--------+-------+--------------+---------------+--------------+---+

PLease help us in this

Tagged:
Sign In or Register to comment.