Vertica: a simple cluster setup and configuration on AWS EC2

By | 10/10/2019
 

Vertica is a data warehouse solution intended to store petabytes of data.

The task, for now, is to spin up a Proof of Concept for Vertica on AWS EC2 to take a closer look at its setup and to hand it over to the data analytics team to play with.

Vertica has good documentation here.

AWS

Check the list of EC2 instance types that can be used for Vertica here>>>.

Find its AMI:

Let’s use a minimal instance type, m4.4xlarge:

As this is a PoC or Dev environment, there is no need to create and configure a VPC and Placement Group; also, we will run only one EC2 instance for our cluster.

Besides that, the documentation says:

add a number of drives equal to the number of physical cores in your instance. For example, for a c3.8xlarge instance, 8 drives. For an r3.2xlarge, add 4 drives.

But again, not now: we will attach only one EBS volume for data, so obviously no RAID-0 is needed:

The Security Group already has all ports configured; you can just replace the ALL (0.0.0.0) rule with your office/home IP:
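
A minimal sketch of how that rule could be narrowed from the command line, assuming the AWS CLI is installed and configured. The security group ID and the IP address below are placeholders, not values from this setup; ports 22 (SSH) and 5433 (Vertica's default client port) are picked as examples only. The function just prints the commands so they can be reviewed before being run.

```shell
# Hypothetical values: replace with your own security group ID and IP address.
SG_ID="sg-0123456789abcdef0"
MY_IP="203.0.113.10"

# Print (do not execute) an ingress rule that allows one TCP port from one IP.
allow_from() {
    ip="$1"
    port="$2"
    echo aws ec2 authorize-security-group-ingress \
        --group-id "$SG_ID" --protocol tcp --port "$port" --cidr "${ip}/32"
}

allow_from "$MY_IP" 22      # SSH
allow_from "$MY_IP" 5433    # Vertica client connections
```

Once the printed commands look right, they can be pasted into a shell that has AWS credentials; the existing open rule still has to be removed separately.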

Connect to the instance:

[simterm]

$ ssh -i vertica-dev-eu-west-1.pem [email protected]
Vertica Analytics Platform.
 
You can find documentation for using Vertica on Amazon Web Services here:
 
https://www.vertica.com/docs/latest/HTML/index.htm#Authoring/UsingVerticaOnAWS/UsingVerticaOnAWS.htm
 
You can also access the full documentation set for all releases here: https://www.vertica.com/documentation/vertica/

[dbadmin@ip-172-31-9-216 ~]$

[/simterm]

License

The license file will be needed when you create a multi-node cluster:

[simterm]

[dbadmin@ip-172-31-9-216 ~]$ cat /opt/vertica/config/licensing/vertica_community_edition.license.key 
Vertica Community Edition 
2011-11-22
Perpetual
0
1TB CE Nodes 3
767***E87

[/simterm]

A data volume configuration

For data, we attached an additional EBS volume; now we need to create a partition on it and mount it on the host.

Check disks:

[simterm]

[root@ip-172-31-9-216 dbadmin]# lsblk 
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   30G  0 disk 
└─xvda1 202:1    0   30G  0 part /
xvdf    202:16   0  100G  0 disk

[/simterm]

xvdf is our EBS volume.

Create one partition on it (be careful with sfdisk):

[simterm]

[root@ip-172-31-9-216 dbadmin]# echo ';' | sfdisk /dev/xvdf
...
Device     Boot Start       End   Sectors  Size Id Type
/dev/xvdf1       2048 209715199 209713152  100G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

[/simterm]
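
Since sfdisk overwrites the partition table without asking, a small guard can be put in front of it. This is only a sketch: the helper name has_partitions is made up for this example, and it relies on lsblk printing one TYPE value per line.

```shell
# Succeeds if the TYPE column read from stdin contains at least one "part" row.
has_partitions() {
    awk '$1 == "part" { found = 1 } END { exit !found }'
}

# Usage (as root), not executed here:
#   if lsblk -n -o TYPE /dev/xvdf | has_partitions; then
#       echo "/dev/xvdf already has partitions, refusing to touch it"
#   else
#       echo ';' | sfdisk /dev/xvdf
#   fi
```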

Check it:

[simterm]

[root@ip-172-31-9-216 dbadmin]# lsblk /dev/xvdf
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvdf    202:16   0  100G  0 disk 
└─xvdf1 202:17   0  100G  0 part

[/simterm]

The partition is created; now format it as ext4:

[simterm]

[root@ip-172-31-9-216 dbadmin]# mkfs.ext4 /dev/xvdf1

[/simterm]

Create a directory to mount the partition in:

[simterm]

[root@ip-172-31-9-216 dbadmin]# mkdir -p /data/vertica

[/simterm]

Mount it:

[simterm]

[root@ip-172-31-9-216 dbadmin]# mount /dev/xvdf1 /data/vertica/
[root@ip-172-31-9-216 dbadmin]# ll /data/vertica/
total 16
drwx------ 2 root root 16384 Oct 10 08:38 lost+found

[/simterm]
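
Note that a mount made this way does not survive a reboot. A minimal sketch of how an /etc/fstab entry could be generated for it; the helper name fstab_entry and the defaults,nofail mount options are assumptions for this example, not something taken from the Vertica documentation.

```shell
# Print an /etc/fstab line for an ext4 filesystem identified by its UUID.
fstab_entry() {
    uuid="$1"
    mountpoint="$2"
    printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$uuid" "$mountpoint"
}

# Usage (as root), not executed here:
#   fstab_entry "$(blkid -s UUID -o value /dev/xvdf1)" /data/vertica >> /etc/fstab
#   mount -a    # re-reads /etc/fstab and reports errors in the new entry
```

Using the UUID instead of /dev/xvdf1 keeps the entry valid even if the device name changes after a restart.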

A cluster set up

Run the next command – /opt/vertica/sbin/install_vertica --hosts 127.0.0.1 --dba-user-password-disabled --data-dir /data/vertica/:

[simterm]

[root@ip-172-31-9-216 dbadmin]# /opt/vertica/sbin/install_vertica --hosts 127.0.0.1 --dba-user-password-disabled --data-dir /data/vertica/
Vertica Analytic Database 9.2.1-7 Installation Tool
...
>> Validating node and cluster prerequisites...

Prerequisites not fully met during local (OS) configuration for
verify-127.0.0.1.xml:
    HINT (S0305): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0305
        TZ is unset for dbadmin. Consider updating .profile or .bashrc
    FAIL (S0020): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0020
        Readahead size of xvda (/dev/xvdf1) is too low for typical systems: 256
        < 2048

System prerequisites failed.  Threshold = WARN
        Hint: Fix above failures or use --failure-threshold

Installation FAILED with errors.

[/simterm]

Vertica and readahead size

The error says:

Readahead size of xvda (/dev/xvda1) is too low

Read the documentation here>>> and set readahead to 2048 in rc.local to make the setting persistent:

[simterm]

[root@ip-172-31-9-216 dbadmin]# echo '/sbin/blockdev --setra 2048 /dev/xvda' >> /etc/rc.local
[root@ip-172-31-9-216 dbadmin]# echo '/sbin/blockdev --setra 2048 /dev/xvdf' >> /etc/rc.local

[/simterm]

And set it right now manually:

[simterm]

[root@ip-172-31-9-216 dbadmin]# /sbin/blockdev --setra 2048 /dev/xvda
[root@ip-172-31-9-216 dbadmin]# /sbin/blockdev --setra 2048 /dev/xvdf

[/simterm]

Check:

[simterm]

[root@ip-172-31-9-216 dbadmin]# blockdev --report /dev/xvda
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  2048   512  4096          0     32212254720   /dev/xvda
[root@ip-172-31-9-216 dbadmin]# blockdev --report /dev/xvdf
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  2048   512  4096          0    107374182400   /dev/xvdf

[/simterm]

RA == 2048, OK.
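
The same check can be scripted, for example to run it after a reboot and confirm that the rc.local lines were applied. A sketch, assuming blockdev from util-linux is available; the function names are made up for this example, and 2048 is the threshold from the installer error above.

```shell
MIN_RA=2048   # minimum readahead (in 512-byte sectors) the installer asked for

# Succeeds if the given readahead value is at least the minimum.
ra_ok() {
    [ "$1" -ge "${2:-$MIN_RA}" ]
}

# Read the current readahead of a block device and report on it.
check_device() {
    dev="$1"
    ra="$(blockdev --getra "$dev")" || return 2
    if ra_ok "$ra"; then
        echo "$dev: readahead $ra OK"
    else
        echo "$dev: readahead $ra is below $MIN_RA"
        return 1
    fi
}

# Usage (as root), not executed here:
#   check_device /dev/xvda
#   check_device /dev/xvdf
```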

Run the cluster installation again:

[simterm]

[root@ip-172-31-9-216 dbadmin]# /opt/vertica/sbin/install_vertica --hosts 127.0.0.1 --dba-user-password-disabled --data-dir /data/vertica/
Vertica Analytic Database 9.2.1-7 Installation Tool

AWS Detected. Using AWS defaults.
    AWS Default: --point-to-point was not specified,  enabling point-to-point spread communication by default while on AWS
...
Default shell on nodes:
127.0.0.1 /bin/bash
...
>> Creating or validating DB Admin user/group...

Successful on hosts (1): 127.0.0.1
    Provided DB Admin account details: user = dbadmin, group = verticadba, home = /home/dbadmin
    Creating group... Group already exists
    Validating group... Okay
    Creating user... User already exists
    Validating user... Okay

>> Validating node and cluster prerequisites...

Prerequisites not fully met during local (OS) configuration for
verify-127.0.0.1.xml:
    HINT (S0305): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0305
        TZ is unset for dbadmin. Consider updating .profile or .bashrc

System prerequisites passed.  Threshold = WARN

>> Establishing DB Admin SSH connectivity...

Installing/Repairing SSH keys for dbadmin

>> Setting up each node and modifying cluster...

Creating Vertica Data Directory...

Updating agent...
Creating node node0001 definition for host 127.0.0.1
... Done

>> Sending new cluster configuration to all nodes...

AWS node-hour pricing enabled
Starting agent...

>> Completing installation...

Running upgrade logic
No spread upgrade required: /opt/vertica/config/vspread.conf not found on any node
Installation complete.
...

To add or remove hosts, select Cluster Management from the Advanced Menu.

[/simterm]
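
The remaining HINT (S0305) about TZ does not block the installation, but it can be cleared by exporting TZ in the dbadmin user's shell profile, as the hint itself suggests. A sketch that appends the line only once; the helper name ensure_line and the UTC value are assumptions, so use the time zone you actually need.

```shell
# Append a line to a file only if that exact line is not already there.
ensure_line() {
    file="$1"
    line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (as dbadmin), not executed here:
#   ensure_line "$HOME/.bashrc" 'export TZ="UTC"'
```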

Create a database

Switch back to the dbadmin user and run the adminTools utility:

[simterm]

[dbadmin@ip-172-31-9-216 ~]$ /opt/vertica/bin/adminTools

[/simterm]

Accept the license agreement:

Choose Configuration Menu:

Create a database:

Choose the database mode; check the documentation here>>>.

You can leave the default value, Enterprise, and set the database’s name:

Set a password:

Leave localhost as we have only one node running in our cluster:

Set directories to be used to store data:

This is okay here, as we are running Dev, not a Production solution:

Confirm:

Wait a minute and check whether the spread port is already open:

[simterm]

[root@ip-172-31-9-216 dbadmin]# netstat -anp | grep 4804
udp        0      0 127.0.0.1:4804          0.0.0.0:*                           8612/spread

[/simterm]

Now you can connect to the database as the dbadmin user using the /opt/vertica/bin/vsql tool:

[simterm]

[root@ip-172-31-9-216 dbadmin]# /opt/vertica/bin/vsql testdb dbadmin
Password: 
Welcome to vsql, the Vertica Analytic Database interactive terminal.

Type:  \h or \? for help with vsql commands
       \g or terminate with semicolon to execute query
       \q to quit

testdb=>

[/simterm]
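
For a quick non-interactive check, vsql can also run a single query and exit. The sketch below asks the nodes system table for each node's state and succeeds only if all of them are UP; the helper name all_nodes_up is made up for this example, and it assumes vsql's unaligned, tuples-only output (-A -t) with the default | field separator.

```shell
# Return 0 only if stdin has at least one "name|state" line and every state is UP.
all_nodes_up() {
    awk -F'|' 'NF >= 2 { seen = 1; if ($2 != "UP") bad = 1 }
               END { exit (seen && !bad) ? 0 : 1 }'
}

# Usage, not executed here (vsql will prompt for the dbadmin password):
#   /opt/vertica/bin/vsql -d testdb -U dbadmin -At \
#       -c "SELECT node_name, node_state FROM nodes;" | all_nodes_up \
#       && echo "all nodes are UP"
```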

Done.

Read the Administrator’s Guide here>>>.