
We continue the series of posts about Apache Druid. In the first part, we took a look at Apache Druid itself – its architecture and monitoring; in the second part, we ran a PostgreSQL cluster and set up its monitoring.
Next tasks:
- switch Druid from Apache Derby to PostgreSQL as the metadata storage
- get rid of Apache ZooKeeper, replacing it with the druid-kubernetes-extensions
Let’s get started with PostgreSQL.
Contents
Configuring Apache Druid with PostgreSQL
See PostgreSQL Metadata Store and Metadata Storage.
PostgreSQL users
Let’s go back to the manifests/minimal-master-replica-svcmonitor.yaml file that was used to create the PostgreSQL cluster, and add a druid user and a druid database:
...
users:
zalando: # database owner
- superuser
- createdb
foo_user: [] # role for application foo
druid:
- createdb
databases:
foo: zalando # dbname: owner
druid: druid
...
Upgrade the cluster:
[simterm]
$ kubectl apply -f manifests/minimal-master-replica-svcmonitor.yaml
[/simterm]
Retrieve the password of the druid user:
[simterm]
$ kubectl -n test-pg get secret druid.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d
Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
[/simterm]
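Here, druid is the username and acid-minimal-cluster is the cluster name: the Zalando operator creates one Secret per user, named &lt;username&gt;.&lt;cluster&gt;.credentials.&lt;kind&gt;.&lt;api-group&gt;, and stores the password base64-encoded. A minimal sketch of the decode step, using a placeholder value instead of a real secret:

```shell
# The value returned by the jsonpath query is base64-encoded;
# piping it through `base64 -d` yields the plaintext password.
# 'my-db-password' is a placeholder, not the real secret value.
encoded=$(printf '%s' 'my-db-password' | base64)
echo "$encoded"                      # the encoded form kubectl returns
printf '%s' "$encoded" | base64 -d   # the decoded plaintext password
```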
Open a port to the PostgreSQL master:
[simterm]
$ kubectl -n test-pg port-forward acid-minimal-cluster-0 6432:5432
Forwarding from 127.0.0.1:6432 -> 5432
Forwarding from [::1]:6432 -> 5432
[/simterm]
Connect:
[simterm]
$ psql -U druid -h localhost -p 6432
Password for user druid:
psql (14.5, server 13.7 (Ubuntu 13.7-1.pgdg18.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

druid=>
[/simterm]
Check the database – it must be empty for now:
[simterm]
druid=> \dt
Did not find any relations.
[/simterm]
Apache Druid metadata.storage config
Use the same druid-operator/examples/tiny-cluster.yaml config that was used to create the Apache Druid cluster (see Spin Up Druid Cluster).
Currently, it’s configured for DerbyDB, so the data is stored on the local disk:
...
druid.metadata.storage.type=derby
druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
druid.metadata.storage.connector.host=localhost
druid.metadata.storage.connector.port=1527
druid.metadata.storage.connector.createTables=true
...
For PostgreSQL, we need to specify the connectURI, so find the corresponding Kubernetes Service:
[simterm]
$ kubectl -n test-pg get svc
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
acid-minimal-cluster   ClusterIP   10.97.188.225   <none>        5432/TCP   14h
[/simterm]
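Inside the cluster, this Service is reachable via the DNS name &lt;service&gt;.&lt;namespace&gt;.svc.cluster.local, which is what goes into the JDBC connectURI. A small sketch of how the URI is assembled, using the names from the output above:

```shell
# Kubernetes in-cluster DNS: <service>.<namespace>.svc.cluster.local
SVC=acid-minimal-cluster
NS=test-pg
DB=druid
URI="jdbc:postgresql://${SVC}.${NS}.svc.cluster.local/${DB}"
echo "$URI"
# → jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local/druid
```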
Edit the manifest – comment out the Derby lines, and add a new config with the type=postgresql:
...
# Extensions
#
druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
# Metadata Store
#druid.metadata.storage.type=derby
#druid.metadata.storage.connector.connectURI=jdbc:derby://localhost:1527/druid/data/derbydb/metadata.db;create=true
#druid.metadata.storage.connector.host=localhost
#druid.metadata.storage.connector.port=1527
#druid.metadata.storage.connector.createTables=true
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://acid-minimal-cluster.test-pg.svc.cluster.local/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=Zfqeb0oJnW3fcBCZvEz1zyAn3TMijIvdv5D8WYOz0Y168ym6fXahta05zJjnd3tY
druid.metadata.storage.connector.createTables=true
...
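Note that the password is stored here in plaintext. Druid also supports reading it via a password provider – for example, from an environment variable; a sketch, assuming a METADATA_STORAGE_PASSWORD variable is set on the Druid pods (e.g. populated from the same Kubernetes Secret):

```
druid.metadata.storage.connector.password={"type":"environment","variable":"METADATA_STORAGE_PASSWORD"}
```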
Update the Druid cluster:
[simterm]
$ kubectl -n druid apply -f examples/tiny-cluster.yaml
[/simterm]
Check the data in the PostgreSQL database:
[simterm]
druid=> \dt
List of relations
Schema | Name | Type | Owner
--------+-----------------------+-------+-------
public | druid_audit | table | druid
public | druid_config | table | druid
public | druid_datasource | table | druid
public | druid_pendingsegments | table | druid
public | druid_rules | table | druid
public | druid_segments | table | druid
public | druid_supervisors | table | druid
[/simterm]
Nice!
If you need to migrate existing data from Derby to PostgreSQL – see Metadata Migration.
Next, let’s get rid of the need for a ZooKeeper instance by replacing it with Kubernetes itself.
Configuring Druid Kubernetes Service Discovery
See the druid-kubernetes-extensions documentation.
Go back to the druid-operator/examples/tiny-cluster.yaml and update the config – turn off ZooKeeper, add the new druid-kubernetes-extensions extension, and set a few additional parameters:
...
druid.extensions.loadList=["druid-kafka-indexing-service", "postgresql-metadata-storage", "druid-kubernetes-extensions"]
...
druid.zk.service.enabled=false
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
druid.discovery.type=k8s
# Zookeeper
#druid.zk.service.host=tiny-cluster-zk-0.tiny-cluster-zk
#druid.zk.paths.base=/druid
#druid.zk.service.compress=false
...
Upgrade the cluster:
[simterm]
$ kubectl -n druid apply -f examples/tiny-cluster.yaml
[/simterm]
Druid RBAC Role
Add an RBAC Role and a RoleBinding, otherwise we will get authorization errors like this:
ERROR [org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcherbroker] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Error while watching node type [BROKER]
org.apache.druid.java.util.common.RE: Expection in watching pods, code[403] and error[{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:druid:default\" cannot watch resource \"pods\" in API group \"\" in the namespace \"druid\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
Create a manifest from the documentation:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-cluster
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - configmaps
  verbs:
  - '*'
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: druid-cluster
subjects:
- kind: ServiceAccount
  name: default
roleRef:
  kind: Role
  name: druid-cluster
  apiGroup: rbac.authorization.k8s.io
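The verbs: '*' wildcard from the documentation is the quickest way to get things working. If you want to narrow it down, the extension essentially needs to watch Pods and to manage ConfigMaps (used for announcements and leader election), so a tighter Role might look like this – a sketch, not verified against every Druid version:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: druid-cluster
rules:
# discovery: the extension watches Druid pods in its namespace
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
# announcements and leader election are kept in ConfigMaps
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```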
Create new resources in Druid’s namespace:
[simterm]
$ kubectl -n druid apply -f druid-serviceaccount.yaml
role.rbac.authorization.k8s.io/druid-cluster created
rolebinding.rbac.authorization.k8s.io/druid-cluster created
[/simterm]
And in a minute or two check the logs again:
[simterm]
...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [HISTORICAL]...
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [HISTORICAL].
2022-09-21T17:01:15,916 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [HISTORICAL].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [PEON]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [PEON].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [INDEXER]...
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [INDEXER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Creating NodeRoleWatcher for nodeRole [BROKER].
2022-09-21T17:01:15,917 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Starting NodeRoleWatcher for [BROKER]...
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider$NodeRoleWatcher - Started NodeRoleWatcher for [BROKER].
2022-09-21T17:01:15,918 INFO [main] org.apache.druid.k8s.discovery.K8sDruidNodeDiscoveryProvider - Created NodeRoleWatcher for nodeRole [BROKER].
...
[/simterm]
Done.