Vertica: a simple cluster setup and configuration on AWS EC2

By | 10/10/2019
 

Vertica is a data warehouse solution designed to store petabytes of data.

The task, for now, is to spin up a Proof of Concept for Vertica on an AWS EC2 instance, to take a closer look at its setup and to hand it over to the data analytics team to play with.

Vertica has good documentation here.

AWS

Check the list of EC2 instance types that can be used for Vertica here.

Find the Vertica AMI:

Let’s use a minimal instance type, m4.4xlarge:

As this is a PoC/Dev environment, there is no need to create and configure a VPC and a Placement Group, and the cluster will run on a single EC2 instance.

Besides that, the documentation says:

add a number of drives equal to the number of physical cores in your instance. For example, for a c3.8xlarge instance, 8 drives. For an r3.2xlarge, add 4 drives.

But again, not now: we will attach only one EBS volume for data, so no RAID-0 is needed:

The Security Group already has all the ports configured; you only need to replace the ALL (0.0.0.0/0) rule with your office/home IP:
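The same rule swap can be done from the command line. Below is a minimal sketch, assuming the AWS CLI is installed and configured; the function name restrict_port, the AWS_CLI override variable, the security group ID and the IP address are all hypothetical, and the post itself uses the AWS Console for this step.

```shell
# Sketch: swap a world-open ingress rule for one limited to a single address.
# AWS_CLI is a hypothetical override, handy for pointing at a stub in a dry run.

# Usage: restrict_port <security-group-id> <tcp-port> <ip-address>
restrict_port() {
    local sg_id="$1" port="$2" ip="$3"
    local aws="${AWS_CLI:-aws}"

    # Allow the single address first, so access is not lost in between.
    "$aws" ec2 authorize-security-group-ingress \
        --group-id "$sg_id" --protocol tcp --port "$port" --cidr "${ip}/32" || return 1

    # Then drop the rule that allowed everyone.
    "$aws" ec2 revoke-security-group-ingress \
        --group-id "$sg_id" --protocol tcp --port "$port" --cidr "0.0.0.0/0"
}

# Example with made-up values (not run here):
#   restrict_port sg-0123456789abcdef0 22 203.0.113.10
```

The function handles one port per call, so it would have to be repeated for every port opened in the group.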

Connect to the instance:

ssh -i vertica-dev-eu-west-1.pem dbadmin@34.242.8.164
Vertica Analytics Platform.
You can find documentation for using Vertica on Amazon Web Services here:
https://www.vertica.com/docs/latest/HTML/index.htm#Authoring/UsingVerticaOnAWS/UsingVerticaOnAWS.htm
You can also access the full documentation set for all releases here: https://www.vertica.com/documentation/vertica/
[dbadmin@ip-172-31-9-216 ~]$

License

The license file will be needed when you create a multi-node cluster:

[dbadmin@ip-172-31-9-216 ~]$ cat /opt/vertica/config/licensing/vertica_community_edition.license.key
Vertica Community Edition
2011-11-22
Perpetual
1TB CE Nodes 3
767***E87

A data volume configuration

For data we attached an additional EBS volume. Now we need to create a partition on it and mount it on the host.

Check disks:

[root@ip-172-31-9-216 dbadmin]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   30G  0 disk
└─xvda1 202:1    0   30G  0 part /
xvdf    202:16   0  100G  0 disk

xvdf is our EBS volume.
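Before running anything destructive, the lsblk check above can also be done programmatically. This is only a sketch: is_blank_disk is a hypothetical helper that reads the raw output of lsblk -nr -o NAME,TYPE,MOUNTPOINT and succeeds only for a whole disk with no partitions and no mountpoint.

```shell
# Reads `lsblk -nr -o NAME,TYPE,MOUNTPOINT <device>` output on stdin.
# Succeeds only if there is exactly one row, it is of type "disk",
# and it has no mountpoint column.
is_blank_disk() {
    awk '
        { rows++ }
        NR == 1 && ($2 != "disk" || NF > 2) { bad = 1 }
        END { exit !(rows == 1 && !bad) }
    '
}

# Demo on sample rows shaped like the listing above:
printf 'xvdf disk \n' | is_blank_disk && echo "xvdf: blank, safe to partition"
printf 'xvda disk \nxvda1 part /\n' | is_blank_disk || echo "xvda: in use, leave it alone"

# On the instance itself this could guard the partitioning step (not run here):
#   lsblk -nr -o NAME,TYPE,MOUNTPOINT /dev/xvdf | is_blank_disk && echo "ok to run sfdisk"
```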

Create one partition on it (be careful with sfdisk, as it rewrites the partition table):

[root@ip-172-31-9-216 dbadmin]# echo ';' | sfdisk /dev/xvdf
...
Device     Boot Start       End   Sectors  Size Id Type
/dev/xvdf1       2048 209715199 209713152  100G 83 Linux
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

Check it:

[root@ip-172-31-9-216 dbadmin]# lsblk /dev/xvdf
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvdf    202:16   0  100G  0 disk
└─xvdf1 202:17   0  100G  0 part

The partition is created; now format it as ext4:

[root@ip-172-31-9-216 dbadmin]# mkfs.ext4 /dev/xvdf1

Create a directory to mount the partition in:

[root@ip-172-31-9-216 dbadmin]# mkdir -p /data/vertica

Mount it:

[root@ip-172-31-9-216 dbadmin]# mount /dev/xvdf1 /data/vertica/
[root@ip-172-31-9-216 dbadmin]# ll /data/vertica/
total 16
drwx------ 2 root root 16384 Oct 10 08:38 lost+found
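A manual mount like this one does not survive a reboot. If the volume should come back automatically, an /etc/fstab entry is the usual way; the sketch below only builds such a line. The helper name fstab_entry, the sample UUID and the choice of the nofail option are assumptions, not something taken from the Vertica documentation.

```shell
# Prints an /etc/fstab entry for the given filesystem UUID, mount point and type.
# "nofail" lets the instance boot even if the volume is not attached.
fstab_entry() {
    printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$1" "$2" "$3"
}

# Demo with a made-up UUID:
fstab_entry 0a1b2c3d-0000-4000-8000-123456789abc /data/vertica ext4

# Hypothetical usage on the instance, as root (not run here):
#   uuid=$(blkid -s UUID -o value /dev/xvdf1)
#   fstab_entry "$uuid" /data/vertica ext4 >> /etc/fstab
#   mount -a    # verify the entry before rebooting
```

Using the UUID rather than /dev/xvdf1 keeps the entry valid if the device name changes after a stop/start.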

A cluster setup

Now run install_vertica, pointing it at the local host and the new data directory:

[root@ip-172-31-9-216 dbadmin]# /opt/vertica/sbin/install_vertica --hosts 127.0.0.1 --dba-user-password-disabled --data-dir /data/vertica/
Vertica Analytic Database 9.2.1-7 Installation Tool
...
>> Validating node and cluster prerequisites...
Prerequisites not fully met during local (OS) configuration for
verify-127.0.0.1.xml:
HINT (S0305): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0305
TZ is unset for dbadmin. Consider updating .profile or .bashrc
FAIL (S0020): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0020
Readahead size of xvda (/dev/xvdf1) is too low for typical systems: 256
< 2048
System prerequisites failed.  Threshold = WARN
Hint: Fix above failures or use --failure-threshold
Installation FAILED with errors.

Vertica and readahead size

The error says:

Readahead size of xvda (/dev/xvda1) is too low
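The installer output is long, and the failing check is easy to miss. As a rough sketch, a small filter can pull out only the FAIL and WARN checks together with their message line; failed_checks is a hypothetical helper, and it assumes the layout shown in the log above (a line with the check code, followed by a line with the message).

```shell
# Prints "LEVEL (CODE): message" for every FAIL or WARN check found on stdin.
failed_checks() {
    awk '
        /^[ \t]*(FAIL|WARN) \(/ {
            code = $1 " " $2          # e.g. "FAIL (S0020):"
            sub(/:$/, "", code)       # drop the trailing colon
            want = 1
            next
        }
        want {
            sub(/^[ \t]+/, "")        # trim indentation
            print code ": " $0
            want = 0
        }
    '
}

# Demo on the lines from the failed run above:
failed_checks <<'EOF'
HINT (S0305): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0305
TZ is unset for dbadmin. Consider updating .profile or .bashrc
FAIL (S0020): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0020
Readahead size of xvda (/dev/xvdf1) is too low for typical systems: 256
< 2048
EOF
```

On this sample it prints a single line, the S0020 readahead failure; HINT entries are skipped on purpose.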

Read the documentation here, then set readahead to 2048 in /etc/rc.local to make the setting persistent:

[root@ip-172-31-9-216 dbadmin]# echo '/sbin/blockdev --setra 2048 /dev/xvda' >> /etc/rc.local
[root@ip-172-31-9-216 dbadmin]# echo '/sbin/blockdev --setra 2048 /dev/xvdf' >> /etc/rc.local
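Appending with >> adds a duplicate line every time the command is repeated. A minimal sketch of an idempotent variant follows; append_once is a hypothetical helper, and the demo writes to a temporary file rather than to /etc/rc.local. Note also that on systemd-based distributions such as CentOS 7, /etc/rc.d/rc.local usually has to be executable before it runs at boot; whether that applies to this AMI is an assumption worth checking.

```shell
# Appends a line to a file only if an identical line is not already there.
append_once() {
    local line="$1" file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a temporary file: the second call is a no-op.
rc=$(mktemp)
append_once '/sbin/blockdev --setra 2048 /dev/xvdf' "$rc"
append_once '/sbin/blockdev --setra 2048 /dev/xvdf' "$rc"
cat "$rc"      # the line appears once
rm -f "$rc"
```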

And apply it manually right away:

[root@ip-172-31-9-216 dbadmin]# /sbin/blockdev --setra 2048 /dev/xvda
[root@ip-172-31-9-216 dbadmin]# /sbin/blockdev --setra 2048 /dev/xvdf

Check:

[root@ip-172-31-9-216 dbadmin]# blockdev --report /dev/xvda
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  2048   512  4096          0     32212254720   /dev/xvda
[root@ip-172-31-9-216 dbadmin]# blockdev --report /dev/xvdf
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  2048   512  4096          0    107374182400   /dev/xvdf

RA == 2048, OK.
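The same check can be scripted instead of read by eye. The sketch below parses blockdev --report output and lists every device whose RA value is below a threshold; low_readahead is a hypothetical helper, and the 256 row in the demo is illustrative, based on the value the installer complained about.

```shell
# Reads `blockdev --report` output on stdin and prints devices whose
# readahead (RA column) is below the given minimum. Returns 1 if any is found.
low_readahead() {
    awk -v min="${1:-2048}" '
        $1 == "RO" { next }                               # skip header rows
        $2 + 0 < min + 0 { print $NF ": RA=" $2; bad = 1 }
        END { exit bad + 0 }
    '
}

# Demo: one device is fine, one is still at 256.
low_readahead 2048 <<'EOF' || echo "readahead needs fixing"
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  2048   512  4096          0     32212254720   /dev/xvda
rw   256   512  4096          0    107374182400   /dev/xvdf
EOF

# On the instance (not run here):
#   blockdev --report /dev/xvda /dev/xvdf | low_readahead 2048
```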

Run the cluster installation again:

[root@ip-172-31-9-216 dbadmin]# /opt/vertica/sbin/install_vertica --hosts 127.0.0.1 --dba-user-password-disabled --data-dir /data/vertica/
Vertica Analytic Database 9.2.1-7 Installation Tool
AWS Detected. Using AWS defaults.
AWS Default: --point-to-point was not specified,  enabling point-to-point spread communication by default while on AWS
...
Default shell on nodes:
127.0.0.1 /bin/bash
...
>> Creating or validating DB Admin user/group...
Successful on hosts (1): 127.0.0.1
Provided DB Admin account details: user = dbadmin, group = verticadba, home = /home/dbadmin
Creating group... Group already exists
Validating group... Okay
Creating user... User already exists
Validating user... Okay
>> Validating node and cluster prerequisites...
Prerequisites not fully met during local (OS) configuration for
verify-127.0.0.1.xml:
HINT (S0305): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0305
TZ is unset for dbadmin. Consider updating .profile or .bashrc
System prerequisites passed.  Threshold = WARN
>> Establishing DB Admin SSH connectivity...
Installing/Repairing SSH keys for dbadmin
>> Setting up each node and modifying cluster...
Creating Vertica Data Directory...
Updating agent...
Creating node node0001 definition for host 127.0.0.1
... Done
>> Sending new cluster configuration to all nodes...
AWS node-hour pricing enabled
Starting agent...
>> Completing installation...
Running upgrade logic
No spread upgrade required: /opt/vertica/config/vspread.conf not found on any node
Installation complete.
...
To add or remove hosts, select Cluster Management from the Advanced Menu.

Create a database

Switch back to the dbadmin user and run the adminTools utility:

[dbadmin@ip-172-31-9-216 ~]$ /opt/vertica/bin/adminTools

Accept the agreement:

Choose Configuration Menu:

Create a database:

Choose the database mode (see the documentation here).

You can leave the default value, Enterprise, and set the database’s name:

Set a password:

Leave localhost, as we have only one node running in our cluster:

Set the directories to be used to store data:

Press OK here, as this is a Dev setup, not a Production one:

Confirm:

Wait a minute and check whether the port is already open:

[root@ip-172-31-9-216 dbadmin]# netstat -anp | grep 4804
udp        0      0 127.0.0.1:4804          0.0.0.0:*                           8612/spread
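A plain grep 4804 also matches a PID or a remote port that happens to contain the same digits. As a sketch, the check can be narrowed to the local address column; port_in_use is a hypothetical helper that reads netstat -an style lines on stdin.

```shell
# Succeeds if any tcp/udp line on stdin has a local address ending in :PORT.
port_in_use() {
    awk -v port="$1" '
        $1 ~ /^(tcp|udp)/ {
            n = split($4, a, ":")        # local address is the 4th column
            if (a[n] == port) found = 1
        }
        END { exit !found }
    '
}

# Demo on the line from the output above:
port_in_use 4804 <<'EOF' && echo "spread is listening on 4804"
udp        0      0 127.0.0.1:4804          0.0.0.0:*                           8612/spread
EOF

# On the instance this could replace the manual wait (not run here):
#   until netstat -an | port_in_use 4804; do sleep 2; done
```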

Now you can connect to the database as the dbadmin user using the /opt/vertica/bin/vsql tool:

[root@ip-172-31-9-216 dbadmin]# /opt/vertica/bin/vsql testdb dbadmin
Password:
Welcome to vsql, the Vertica Analytic Database interactive terminal.
Type:  \h or \? for help with vsql commands
\g or terminate with semicolon to execute query
\q to quit
testdb=>
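For scripts, vsql can also run a single statement and exit instead of opening the interactive prompt. A minimal sketch, assuming the testdb database and dbadmin user from above; run_query and the VSQL override variable are hypothetical names, and vsql will still prompt for the password.

```shell
# Runs one SQL statement through vsql and prints the bare result
# (-A: unaligned output, -t: rows only, -c: run the command and exit).
# VSQL is a hypothetical override, e.g. for a different install path.
run_query() {
    local db="$1" user="$2" sql="$3"
    "${VSQL:-/opt/vertica/bin/vsql}" -d "$db" -U "$user" -A -t -c "$sql"
}

# Example on the instance (not run here):
#   run_query testdb dbadmin 'SELECT version();'
```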

Done.

Read the Administrator’s Guide here.