How to read data from a Kafka topic into an RDD by specifying start and end offsets?
Comments
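Example: to read from Kafka between specific offsets, use a batch query (spark.read, not spark.readStream). The endingOffsets option applies only to batch queries, and a streaming DataFrame cannot be converted to an RDD with .rdd.
val df =
  spark.read
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
    .option("subscribe", "topic1,topic2")
    // Per-partition offsets as JSON: -2 means earliest, -1 means latest.
    .option("startingOffsets", """{"topic1":{"0":23,"1":-2},"topic2":{"0":-2}}""")
    .option("endingOffsets", """{"topic1":{"0":50,"1":-1},"topic2":{"0":-1}}""")
    .load()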
The above reads the records that fall within the given offsets for each partition. You can then cast the key and value columns to strings, cast offset and timestamp to longs, and map each row to your own Message class:
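import org.apache.spark.rdd.RDD
import org.apache.spark.sql.functions.col
import spark.implicits._ // provides the encoder needed by .as[Message]

// Assumed shape of Message; field names must match the selected column names.
case class Message(key: String, value: String, offset: Long, timestamp: Long)

val messageRDD: RDD[Message] =
  df.select(
    col("key").cast("string"),
    col("value").cast("string"),
    col("offset").cast("long"),
    col("timestamp").cast("long") // timestamp cast to long gives seconds since the epoch
  ).as[Message].rdd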
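
If you build the offset ranges at runtime, writing the JSON for startingOffsets and endingOffsets by hand gets error-prone. Below is a minimal sketch of a helper that renders a Scala Map into that JSON shape. OffsetJson, render, Earliest and Latest are hypothetical names made up for this example and are not part of Spark; the sketch assumes topic names need no JSON escaping, which holds for valid Kafka topic names.

```scala
// Hypothetical helper (not part of Spark): renders per-partition offsets
// in the JSON shape used by the startingOffsets / endingOffsets options.
object OffsetJson {
  val Earliest: Long = -2L // Kafka source convention for "earliest"
  val Latest: Long = -1L   // Kafka source convention for "latest"

  // offsets: topic -> (partition -> offset)
  def render(offsets: Map[String, Map[Int, Long]]): String =
    offsets.toSeq.sortBy(_._1).map { case (topic, parts) =>
      // Sort partitions so the output is deterministic.
      val body = parts.toSeq.sortBy(_._1)
        .map { case (partition, offset) => s""""$partition":$offset""" }
        .mkString(",")
      s""""$topic":{$body}"""
    }.mkString("{", ",", "}")

  def main(args: Array[String]): Unit = {
    val starting = render(Map(
      "topic1" -> Map(0 -> 23L, 1 -> Earliest),
      "topic2" -> Map(0 -> Earliest)
    ))
    println(starting) // prints {"topic1":{"0":23,"1":-2},"topic2":{"0":-2}}
  }
}
```

The resulting string can be passed directly, for example .option("startingOffsets", OffsetJson.render(...)), with a second call producing the endingOffsets value.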