Kafka parallel consuming
For very large topics where you need more parallelism (especially on the insert side), you can use several tables with the same pipeline (before 20.9) or enable
kafka_thread_per_consumer (after 20.9).
kafka_num_consumers = N, kafka_thread_per_consumer=1
- the inserts will happen in parallel (without that setting, inserts happen linearly)
- enough partitions are needed (at least N, since within a consumer group each partition is read by only one consumer).
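As an illustration of the settings above, here is a minimal sketch of a Kafka engine table with four consumers, each flushing in its own thread. The table name, columns, broker address, topic, consumer group, and format are all hypothetical placeholders; only kafka_num_consumers and kafka_thread_per_consumer come from the text above.

```sql
-- Hypothetical example: names, columns, broker, topic, and format are placeholders.
CREATE TABLE kafka_events_queue
(
    ts DateTime,
    user_id UInt64,
    payload String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka-1:9092',
    kafka_topic_list = 'events',
    kafka_group_name = 'clickhouse_events',
    kafka_format = 'JSONEachRow',
    kafka_num_consumers = 4,        -- the topic should have at least 4 partitions
    kafka_thread_per_consumer = 1;  -- each consumer flushes and inserts in its own thread
```

A materialized view reading from this table into a MergeTree table completes the pipeline as usual; nothing about the view needs to change for parallel consumption.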
kafka_num_consumers is limited by the number of physical cores (half of vCPUs).
kafka_disable_num_consumers_limit can be used to override the limit.
background_message_broker_schedule_pool_size is 16 by default; you may need to increase it if you use more than 16 consumers.
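Before raising the pool size, it can help to check the value currently in effect. The queries below are a sketch: where the setting is exposed depends on the ClickHouse version, and the second query assumes a release that has the system.server_settings table.

```sql
-- Releases where the pool size is an ordinary (profile-level) setting.
SELECT name, value
FROM system.settings
WHERE name = 'background_message_broker_schedule_pool_size';

-- Assumption: newer releases that list server-level settings in a separate table.
SELECT name, value
FROM system.server_settings
WHERE name = 'background_message_broker_schedule_pool_size';
```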
Increasing kafka_num_consumers while keeping kafka_thread_per_consumer=0 may improve consumption and parsing speed, but flushing and committing are still done by a single thread (so inserts are linear).