How to read data in a kafka topic to an RDD by specifying start and end offsets?

How to read data in a kafka topic to an RDD by specifying start and end offsets?

Tagged:

Comments

  • Example To read from kafka with offsets:

    val df =
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
    .option("subscribe", "topic1,topic2")
    .option("startingOffsets", """{"topic1":{"0":23,"1":-2},"topic2":{"0":-2}}""")
    .option("endingOffsets", """{"topic1":{"0":50,"1":-1},"topic2":{"0":-1}}""")
    .load()
    The above will read the data available within the offsets, and then you can convert the columns to string, and cast to your object Message.

    val messageRDD: RDD[Message] =
    df.select(
    col("key").cast("string"),
    col("value").cast("string"),
    col("offset").cast("long"),
    col("timestamp").cast("long")
    ).as[Message].rdd

Sign In or Register to comment.