We continue our series of posts about Apache Druid. In the first part, we took a look at Apache Druid itself, its architecture and monitoring; in the second part, we ran a PostgreSQL cluster and set up its monitoring.
Next tasks:
- switch Druid to PostgreSQL as its metadata storage instead of Apache Derby
- get rid of Apache ZooKeeper and replace it with the druid-kubernetes-extensions module
Let’s get started with PostgreSQL.
Configuring Apache Druid with PostgreSQL
See PostgreSQL Metadata Store and Metadata Storage.
PostgreSQL users
Let’s go back to the manifests/minimal-master-replica-svcmonitor.yaml file, which was used to create the PostgreSQL cluster, and add a druid user and a druid database:
...
  users:
    zalando:  # database owner
    - superuser
    - createdb
    foo_user: []  # role for application foo
    druid:
    - createdb
  databases:
    foo: zalando  # dbname: owner
    druid: druid
...
Upgrade the cluster:
[simterm]
$ kubectl apply -f manifests/minimal-master-replica-svcmonitor.yaml
[/simterm]
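After the upgrade, the Zalando operator generates a Secret for each declared user. As a hint for the next step, the Secret name follows a predictable pattern (sketched here with our cluster’s values; the pattern is the operator’s naming convention):

```shell
# The Zalando postgres-operator creates one Secret per user; its name
# follows the pattern <user>.<cluster>.credentials.postgresql.acid.zalan.do
PG_USER=druid
PG_CLUSTER=acid-minimal-cluster
SECRET_NAME="${PG_USER}.${PG_CLUSTER}.credentials.postgresql.acid.zalan.do"
echo "$SECRET_NAME"
```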
Retrieve the password of the druid user:
[simterm]
$ kubectl -n test-pg get secret druid.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d
Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
[/simterm]
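Secret values come back base64-encoded, which is why the command above pipes the output through base64 -d. A quick local round-trip with a made-up value shows the idea:

```shell
# Encode a made-up password the way Kubernetes stores Secret values,
# then decode it back, mirroring the `| base64 -d` step above.
encoded=$(printf '%s' 'my-druid-password' | base64)
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"
```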
Open a port to the PostgreSQL master:
[simterm]
$ kubectl -n test-pg port-forward acid-minimal-cluster-0 6432:5432
Forwarding from 127.0.0.1:6432 -> 5432
Forwarding from [::1]:6432 -> 5432
[/simterm]
Connect:
[simterm]
$ psql -U druid -h localhost -p 6432
Password for user druid:
psql (14.5, server 13.7 (Ubuntu 13.7-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

druid=>
[/simterm]
Check the database, which must be empty for now:
[simterm]
druid-> \dt
Did not find any relations.
[/simterm]
Apache Druid metadata.storage config
Use the same druid-operator/examples/tiny-cluster.yaml config that was used to create the Apache Druid cluster (see Spin Up Druid Cluster).
Currently, it’s configured for Derby, so the metadata is stored on the local disk:
...
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
druid.metadata.storage.connector.host=localhost
druid.metadata.storage.connector.port=1527
druid.metadata.storage.connector.createTables=true
...
For PostgreSQL, we need to specify a new connectURI, so find the corresponding Kubernetes Service:
[simterm]
$ kubectl -n test-pg get svc
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
acid-minimal-cluster   ClusterIP   10.97.188.225   <none>        5432/TCP   14h
[/simterm]
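The connectURI is then built from the Service name, its namespace, and the database name, using the standard &lt;service&gt;.&lt;namespace&gt;.svc.cluster.local in-cluster DNS form (PostgreSQL’s default port 5432 can be omitted from the URI). A small sketch:

```shell
# Build the JDBC URI from the Service DNS name; inside the cluster,
# <service>.<namespace>.svc.cluster.local resolves to the ClusterIP.
SVC=acid-minimal-cluster
NS=test-pg
DB=druid
CONNECT_URI="jdbc:postgresql://${SVC}.${NS}.svc.cluster.local/${DB}"
echo "$CONNECT_URI"
```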
Edit the manifest: comment out the Derby lines, and add a new config with type=postgresql:
...
# Extensions
# druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
# Metadata Store
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527
#druid.metadata.storage.connector.createTables=true
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
druid.metadata.storage.connector.createTables=true
...
Update the Druid cluster:
[simterm]
$ kubectl -n druid apply -f examples/tiny-cluster.yaml
[/simterm]
Check the data in the PostgreSQL database:
[simterm]
druid-> \dt
                List of relations
 Schema |         Name          | Type  | Owner
--------+-----------------------+-------+-------
 public | druid_audit           | table | druid
 public | druid_config          | table | druid
 public | druid_datasource      | table | druid
 public | druid_pendingsegments | table | druid
 public | druid_rules           | table | druid
 public | druid_segments        | table | druid
 public | druid_supervisors     | table | druid
[/simterm]
Nice!
If you need to migrate existing data from Derby to PostgreSQL, see Metadata Migration.
Next, let’s get rid of the need for a ZooKeeper instance by replacing it with Kubernetes itself.
Configuring Druid Kubernetes Service Discovery
See the druid-kubernetes-extensions documentation for details on the module.
Go back to the druid-operator/examples/tiny-cluster.yaml and update the config: turn off ZooKeeper, add the new druid-kubernetes-extensions extension to the loadList, and set a few additional parameters:
...
druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
druid.zk.service.enabled=false
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
druid.discovery.type=k8s

# Zookeeper
#druid.zk.service.host=tiny-cluster-zk-0.tiny-cluster-zk
#druid.zk.paths.base=/druid
#druid.zk.service.compress=false
...
Upgrade the cluster:
[simterm]
$ kubectl -n druid apply -f examples/tiny-cluster.yaml
[/simterm]
Druid RBAC Role
Add an RBAC Role and a RoleBinding; otherwise, we will get authorization errors like this:
ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcherbroker] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Error while watching node type [BROKER]
org.apache.druid.java.util.common.RE: Expection in watching pods, code[403] and error[{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:druid:default\" cannot watch resource \"pods\" in API group \"\" in the namespace \"druid\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}]
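The message names the exact identity Kubernetes rejected, in the form system:serviceaccount:&lt;namespace&gt;:&lt;name&gt;. Splitting it shows which ServiceAccount the Role must be bound to; the can-i check at the end assumes cluster access and is shown as a comment:

```shell
# The rejected identity from the error message, split into its parts.
SUBJECT='system:serviceaccount:druid:default'
NS=$(echo "$SUBJECT" | cut -d: -f3)
SA=$(echo "$SUBJECT" | cut -d: -f4)
echo "namespace=$NS serviceaccount=$SA"
# With cluster access you could verify the missing permission directly:
#   kubectl -n "$NS" auth can-i watch pods --as "$SUBJECT"
```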
Create a manifest from the documentation:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-cluster
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - configmaps
  verbs:
  - '*'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-cluster
subjects:
- kind: ServiceAccount
  name: default
roleRef:
  kind: Role
  name: druid-cluster
  apiGroup: rbac.authorization.k8s.io
Create new resources in Druid’s namespace:
[simterm]
$ kubectl -n druid apply -f druid-serviceaccout.yaml
role.rbac.authorization.k8s.io/druid-cluster created
rolebinding.rbac.authorization.k8s.io/druid-cluster created
[/simterm]
And in a minute or two check the logs again:
[simterm]
...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [HISTORICAL]...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [HISTORICAL].
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [HISTORICAL].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [PEON]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [INDEXER]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [BROKER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [BROKER]...
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [BROKER].
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [BROKER].
...
[/simterm]
Done.