Apache Druid: PostgreSQL as the Metadata Storage, and Replacing ZooKeeper with the Kubernetes Extension

10/05/2022

We continue with a series of posts about Apache Druid. In the first part, we took a look at Apache Druid itself – its architecture and monitoring; in the second part, we ran a PostgreSQL cluster and set up its monitoring.

Next tasks: switch Druid's metadata storage from the default DerbyDB to PostgreSQL, and replace ZooKeeper with the druid-kubernetes-extensions module.

Let's get started with PostgreSQL.

Configuring Apache Druid with PostgreSQL

See PostgreSQL Metadata Store and Metadata Storage.

PostgreSQL users

Let's go back to the manifests/minimal-master-replica-svcmonitor.yaml file, which was used to create the PostgreSQL cluster, and add a druid user and a druid database:

...
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
    druid:       
    - createdb
  databases:
    foo: zalando  # dbname: owner
    druid: druid
...

Upgrade the cluster:

[simterm]

$ kubectl apply -f manifests/minimal-master-replica-svcmonitor.yaml

[/simterm]

Retrieve the password of the druid user:

[simterm]

$ kubectl -n test-pg get secret druid.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d
Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY

[/simterm]
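The command above extracts the base64-encoded password field from the Secret with a jsonpath query and pipes it through `base64 -d`. The same decoding step can be sketched in Python (the value here is illustrative, not the real password):

```python
import base64

# A Kubernetes Secret's .data fields are base64-encoded; kubectl's jsonpath
# query extracts the raw field, and base64 -d decodes it. The equivalent:
def decode_secret_field(encoded: str) -> str:
    return base64.b64decode(encoded).decode("utf-8")

# Illustrative value, not the actual cluster password
encoded = base64.b64encode(b"s3cr3t-pa55").decode("ascii")
print(decode_secret_field(encoded))  # -> s3cr3t-pa55
```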

Open port to the PostgreSQL master:

[simterm]

$ kubectl -n test-pg port-forward acid-minimal-cluster-0 6432:5432 
Forwarding from 127.0.0.1:6432 -> 5432
Forwarding from [::1]:6432 -> 5432

[/simterm]

Connect:

[simterm]

$ psql -U druid -h localhost -p 6432
Password for user druid: 
psql (14.5, server 13.7 (Ubuntu 13.7-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

druid=>

[/simterm]

Check the database – it must be empty for now:

[simterm]

druid-> \dt
Did not find any relations.

[/simterm]

Apache Druid metadata.storage config

Use the same druid-operator/examples/tiny-cluster.yaml config, that was used to create the Apache Druid cluster (see Spin Up Druid Cluster).

Currently, it's configured for DerbyDB, so the data is stored on the local disk:

...
    druid.metadata.storage.type=derby
    druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
    druid.metadata.storage.connector.host=localhost
    druid.metadata.storage.connector.port=1527
    druid.metadata.storage.connector.createTables=true
...

For PostgreSQL, we need to specify the connectURI, so find the corresponding Kubernetes Service:

[simterm]

$ kubectl -n test-pg get svc
NAME                                       TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
acid-minimal-cluster                       ClusterIP   10.97.188.225   <none>        5432/TCP   14h

[/simterm]
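The JDBC URI is built from the Service name and its namespace using the standard in-cluster DNS pattern `<service>.<namespace>.svc.cluster.local`. A small sketch with the names from this post's cluster (note that the config below omits the port, since 5432 is the PostgreSQL default):

```python
def postgres_jdbc_uri(service: str, namespace: str, database: str, port: int = 5432) -> str:
    """Build a JDBC URI from a Kubernetes Service using in-cluster DNS."""
    host = f"{service}.{namespace}.svc.cluster.local"
    # Kept explicit here for clarity; 5432 may be omitted as the default port.
    return f"jdbc:postgresql://{host}:{port}/{database}"

print(postgres_jdbc_uri("acid-minimal-cluster", "test-pg", "druid"))
# -> jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local:5432/druid
```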

Edit the manifest – comment out the Derby lines, and add a new config with type=postgresql:

...
    # Extensions
    #
    druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
    # Metadata Store
    #druid.metadata.storage.type=derby
    #druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
    #druid.metadata.storage.connector.host=localhost
    #druid.metadata.storage.connector.port=1527
    #druid.metadata.storage.connector.createTables=true

    druid.metadata.storage.type=postgresql
    druid.metadata.storage.connector.connectURI=jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local/druid
    druid.metadata.storage.connector.user=druid
    druid.metadata.storage.connector.password=Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
    druid.metadata.storage.connector.createTables=true
...
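Keeping the password in plain text in the manifest is acceptable for a test setup; for anything real, Druid's password providers let the connector read it from an environment variable instead. The property value becomes a small JSON object; a sketch of how that line would be rendered (the variable name is an example):

```python
import json

# Druid password providers accept a JSON object in place of a literal string;
# the "environment" type reads the password from an environment variable.
provider = {"type": "environment", "variable": "METADATA_STORAGE_PASSWORD"}
line = "druid.metadata.storage.connector.password=" + json.dumps(provider)
print(line)
```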

Update the Druid cluster:

[simterm]

$ kubectl -n druid apply -f examples/tiny-cluster.yaml

[/simterm]

Check the data in the PostgreSQL database:

[simterm]

druid-> \dt
               List of relations
 Schema |         Name          | Type  | Owner 
--------+-----------------------+-------+-------
 public | druid_audit           | table | druid
 public | druid_config          | table | druid
 public | druid_datasource      | table | druid
 public | druid_pendingsegments | table | druid
 public | druid_rules           | table | druid
 public | druid_segments        | table | druid
 public | druid_supervisors     | table | druid

[/simterm]

Nice!

If you need to migrate data from Derby to PostgreSQL – see Metadata Migration.

Next, let's get rid of the need for a ZooKeeper instance by replacing it with Kubernetes itself.

Configuring Druid Kubernetes Service Discovery

Documentation for the module is here>>>.

Go back to the druid-operator/examples/tiny-cluster.yaml and update the config – turn off ZooKeeper, add the new druid-kubernetes-extensions extension, and set a few additional parameters:

...
    druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
    ...
    druid.zk.service.enabled=false
    druid.serverview.type=http
    druid.coordinator.loadqueuepeon.type=http
    druid.indexer.runner.type=httpRemote
    druid.discovery.type=k8s

    # Zookeeper
    #druid.zk.service.host=tiny-cluster-zk-0.tiny-cluster-zk
    #druid.zk.paths.base=/druid
    #druid.zk.service.compress=false
...
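Under the hood, druid-kubernetes-extensions discovers node roles by watching Pods through the Kubernetes API instead of reading ZooKeeper znodes. A rough stdlib-only sketch of the watch request such a provider would issue (the label name is hypothetical; the real extension derives its selectors from Pod metadata):

```python
from urllib.parse import quote

API_SERVER = "https://kubernetes.default.svc"  # in-cluster API endpoint

def pod_watch_url(namespace: str, label_selector: str) -> str:
    """Build a Kubernetes watch URL for Pods matching a label selector."""
    return (f"{API_SERVER}/api/v1/namespaces/{namespace}/pods"
            f"?watch=true&labelSelector={quote(label_selector)}")

print(pod_watch_url("druid", "nodeRole=broker"))
# -> https://kubernetes.default.svc/api/v1/namespaces/druid/pods?watch=true&labelSelector=nodeRole%3Dbroker
```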

Upgrade the cluster:

[simterm]

$ kubectl -n druid apply -f examples/tiny-cluster.yaml

[/simterm]

Druid RBAC Role

Add an RBAC Role and a RoleBinding, otherwise we will get authorization errors like this:

ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcherbroker] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher – Error while watching node type [BROKER]
org.apache.druid.java.util.common.RE: Expection in watching pods, code[403] and error[{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:druid:default\" cannot watch resource
\"pods\" in API group \"\" in the namespace \"druid\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}]

Create a manifest from the documentation:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-cluster
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - configmaps
  verbs:
  - '*'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-cluster
subjects:
- kind: ServiceAccount
  name: default
roleRef:
  kind: Role
  name: druid-cluster
  apiGroup: rbac.authorization.k8s.io
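The Role above grants all verbs (`'*'`) on pods and configmaps, which covers the `watch` the discovery extension needs. A tiny sketch of the matching logic that makes the earlier 403 go away (simplified; real RBAC evaluation also matches apiGroups, resource names, and so on):

```python
# Simplified RBAC rule check: a rule allows a verb on a resource if both
# are listed explicitly or covered by the '*' wildcard.
RULES = [{"apiGroups": [""], "resources": ["pods", "configmaps"], "verbs": ["*"]}]

def allowed(rules, verb: str, resource: str) -> bool:
    return any(
        (verb in r["verbs"] or "*" in r["verbs"])
        and (resource in r["resources"] or "*" in r["resources"])
        for r in rules
    )

print(allowed(RULES, "watch", "pods"))   # -> True  (the 403 is resolved)
print(allowed(RULES, "watch", "nodes"))  # -> False (not granted by this Role)
```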

Create new resources in Druid’s namespace:

[simterm]

$ kubectl -n druid apply -f druid-serviceaccount.yaml 
role.rbac.authorization.k8s.io/druid-cluster created
rolebinding.rbac.authorization.k8s.io/druid-cluster created

[/simterm]

And in a minute or two check the logs again:

[simterm]

...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [HISTORICAL]...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [HISTORICAL].
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [HISTORICAL].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [PEON]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [INDEXER]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [BROKER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [BROKER]...
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [BROKER].
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [BROKER].
...

[/simterm]

Done.