Vertica is a data warehouse solution intended to store petabytes of data.
The task, for now, is to spin up a Proof of Concept for Vertica on an AWS EC2 instance to take a closer look at its setup and to give it to the data analytics team to play with a bit.
Vertica has good documentation here.
AWS
Check the list of EC2 instance types that can be used for Vertica here>>>.
Find its AMI:
Let’s use a minimal instance type, m4.4xlarge:
As this is a PoC or Dev environment, there is no need to create and configure a VPC and a Placement Group; also, we will run only one EC2 instance for our cluster.
Besides that, the documentation says:
add a number of drives equal to the number of physical cores in your instance. For example, for a c3.8xlarge instance, 8 drives. For an r3.2xlarge, add 4 drives.
But again, not now: we will attach only one EBS volume for data, so obviously no RAID-0 is needed:
The Security Group already has all the ports configured; you can just replace the ALL (0.0.0.0) rule with your office/home IP:
Connect to the instance:
[simterm]
$ ssh -i vertica-dev-eu-west-1.pem [email protected]
Vertica Analytics Platform.
You can find documentation for using Vertica on Amazon Web Services here:
https://www.vertica.com/docs/latest/HTML/index.htm#Authoring/UsingVerticaOnAWS/UsingVerticaOnAWS.htm
You can also access the full documentation set for all releases here:
https://www.vertica.com/documentation/vertica/
[dbadmin@ip-172-31-9-216 ~]$
[/simterm]
License
The license file will be needed when you create a multi-node cluster:
[simterm]
[dbadmin@ip-172-31-9-216 ~]$ cat /opt/vertica/config/licensing/vertica_community_edition.license.key
Vertica Community Edition
2011-11-22
Perpetual
0
1TB CE Nodes 3
767***E87
[/simterm]
A data volume configuration
For the data, we attached an additional EBS volume, so now we need to create a partition on it and mount it on the host.
Check disks:
[simterm]
[root@ip-172-31-9-216 dbadmin]# lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   30G  0 disk
└─xvda1 202:1    0   30G  0 part /
xvdf    202:16   0  100G  0 disk
[/simterm]
xvdf – here is our EBS.
Create one partition on it (be careful with sfdisk):
[simterm]
[root@ip-172-31-9-216 dbadmin]# echo ';' | sfdisk /dev/xvdf
...
Device     Boot Start       End   Sectors  Size Id Type
/dev/xvdf1       2048 209715199 209713152  100G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
[/simterm]
Check it:
[simterm]
[root@ip-172-31-9-216 dbadmin]# lsblk /dev/xvdf
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvdf    202:16   0  100G  0 disk
└─xvdf1 202:17   0  100G  0 part
[/simterm]
The partition is created; now format it as ext4:
[simterm]
[root@ip-172-31-9-216 dbadmin]# mkfs.ext4 /dev/xvdf1
[/simterm]
Create a directory to mount the partition in:
[simterm]
[root@ip-172-31-9-216 dbadmin]# mkdir -p /data/vertica
[/simterm]
Mount it:
[simterm]
[root@ip-172-31-9-216 dbadmin]# mount /dev/xvdf1 /data/vertica/
[root@ip-172-31-9-216 dbadmin]# ll /data/vertica/
total 16
drwx------ 2 root root 16384 Oct 10 08:38 lost+found
[/simterm]
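Note that a manual mount like this does not survive a reboot. As a minimal sketch, an /etc/fstab entry for the data volume could be generated as shown below. The fstab_line helper and the defaults,nofail options are assumptions made for illustration; they are not taken from the Vertica documentation.

```shell
# fstab_line is a hypothetical helper: it prints one /etc/fstab entry
# for the given device and mount point (fs type and options are optional).
fstab_line() {
  device="$1"
  mountpoint="$2"
  fstype="${3:-ext4}"
  options="${4:-defaults,nofail}"
  # Fields: device, mount point, fs type, options, dump flag, fsck order.
  printf '%s %s %s %s 0 2\n' "$device" "$mountpoint" "$fstype" "$options"
}

# Print the entry for the data volume; review it before adding it to /etc/fstab.
fstab_line /dev/xvdf1 /data/vertica
```

The printed line can then be appended to /etc/fstab as root and verified with mount -a. Using the filesystem UUID reported by blkid instead of /dev/xvdf1 is generally more robust, as device names can change between boots.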
A cluster set up
Run the following command – /opt/vertica/sbin/install_vertica --hosts 127.0.0.1 --dba-user-password-disabled --data-dir /data/vertica/:
[simterm]
[root@ip-172-31-9-216 dbadmin]# /opt/vertica/sbin/install_vertica --hosts 127.0.0.1 --dba-user-password-disabled --data-dir /data/vertica/
Vertica Analytic Database 9.2.1-7 Installation Tool
...
>> Validating node and cluster prerequisites...
Prerequisites not fully met during local (OS) configuration for
verify-127.0.0.1.xml:
HINT (S0305): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0305
TZ is unset for dbadmin. Consider updating .profile or .bashrc
FAIL (S0020): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0020
Readahead size of xvda (/dev/xvdf1) is too low for typical systems: 256 < 2048
System prerequisites failed. Threshold = WARN
Hint: Fix above failures or use --failure-threshold
Installation FAILED with errors.
[/simterm]
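When the installer output is long, it may help to pull out only the failed check codes. The sketch below assumes the output was saved to a file and that failures are printed as FAIL (Sxxxx), as in the run above; failed_checks and install.log are hypothetical names used only for illustration.

```shell
# failed_checks is a hypothetical helper: it reads installer output on stdin
# and prints the code of every line that starts with "FAIL (Sxxxx)".
failed_checks() {
  sed -n 's/^[[:space:]]*FAIL (\(S[0-9][0-9]*\)).*/\1/p'
}

# Example, assuming the installer output was saved to install.log:
# failed_checks < install.log
```

HINT lines are ignored on purpose, since in the run above they did not stop the installation.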
Vertica and readahead size
The error says:
Readahead size of xvda (/dev/xvda1) is too low
Read the documentation here>>>, and set readahead to 2048 in rc.local to make the setting persistent:
[simterm]
[root@ip-172-31-9-216 dbadmin]# echo '/sbin/blockdev --setra 2048 /dev/xvda' >> /etc/rc.local
[root@ip-172-31-9-216 dbadmin]# echo '/sbin/blockdev --setra 2048 /dev/xvdf' >> /etc/rc.local
[/simterm]
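Keep in mind that running echo with >> a second time adds the same line to rc.local again. A minimal sketch of an idempotent variant is below; append_once is a hypothetical helper name, not something shipped with Vertica.

```shell
# append_once is a hypothetical helper: it appends a line to a file
# only if exactly that line is not already present in it.
append_once() {
  file="$1"
  line="$2"
  # -x matches whole lines, -F treats the pattern as a fixed string.
  grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (as root):
# append_once /etc/rc.local '/sbin/blockdev --setra 2048 /dev/xvda'
# append_once /etc/rc.local '/sbin/blockdev --setra 2048 /dev/xvdf'
```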
And set it right now manually:
[simterm]
[root@ip-172-31-9-216 dbadmin]# /sbin/blockdev --setra 2048 /dev/xvda
[root@ip-172-31-9-216 dbadmin]# /sbin/blockdev --setra 2048 /dev/xvdf
[/simterm]
Check:
[simterm]
[root@ip-172-31-9-216 dbadmin]# blockdev --report /dev/xvda
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  2048   512  4096          0     32212254720   /dev/xvda
[root@ip-172-31-9-216 dbadmin]# blockdev --report /dev/xvdf
RO    RA   SSZ   BSZ   StartSec            Size   Device
rw  2048   512  4096          0    107374182400   /dev/xvdf
[/simterm]
RA == 2048, OK.
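With more than a couple of disks, the same check can be scripted. The sketch below filters blockdev --report output and prints only the devices whose readahead is below the required value; low_readahead is a hypothetical helper, and it assumes the column layout shown above (RA in the second column, device in the last one).

```shell
# low_readahead is a hypothetical helper: it reads `blockdev --report` output
# on stdin and prints "<device> <RA>" for every device below the minimum.
low_readahead() {
  min="${1:-2048}"
  # Skip header lines (first field "RO") and compare the RA column numerically.
  awk -v min="$min" '$1 != "RO" && NF >= 7 && ($2 + 0) < (min + 0) { print $NF, $2 }'
}

# Example (as root); empty output means every device meets the minimum:
# blockdev --report | low_readahead 2048
```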
Run the cluster installation again:
[simterm]
[root@ip-172-31-9-216 dbadmin]# /opt/vertica/sbin/install_vertica --hosts 127.0.0.1 --dba-user-password-disabled --data-dir /data/vertica/
Vertica Analytic Database 9.2.1-7 Installation Tool
AWS Detected. Using AWS defaults.
AWS Default: --point-to-point was not specified, enabling point-to-point spread communication by default while on AWS
...
Default shell on nodes:
127.0.0.1 /bin/bash
...
>> Creating or validating DB Admin user/group...
Successful on hosts (1): 127.0.0.1
Provided DB Admin account details: user = dbadmin, group = verticadba, home = /home/dbadmin
Creating group... Group already exists
Validating group... Okay
Creating user... User already exists
Validating user... Okay
>> Validating node and cluster prerequisites...
Prerequisites not fully met during local (OS) configuration for
verify-127.0.0.1.xml:
HINT (S0305): https://www.vertica.com/docs/9.2.x/HTML/index.htm#cshid=S0305
TZ is unset for dbadmin. Consider updating .profile or .bashrc
System prerequisites passed. Threshold = WARN
>> Establishing DB Admin SSH connectivity...
Installing/Repairing SSH keys for dbadmin
>> Setting up each node and modifying cluster...
Creating Vertica Data Directory...
Updating agent...
Creating node node0001 definition for host 127.0.0.1
... Done
>> Sending new cluster configuration to all nodes...
AWS node-hour pricing enabled
Starting agent...
>> Completing installation...
Running upgrade logic
No spread upgrade required: /opt/vertica/config/vspread.conf not found on any node
Installation complete.
...
To add or remove hosts, select Cluster Management from the Advanced Menu.
[/simterm]
Create a database
Switch back to the dbadmin user and run the adminTools utility:
[simterm]
[dbadmin@ip-172-31-9-216 ~]$ /opt/vertica/bin/adminTools
[/simterm]
Accept the agreement:
Choose Configuration Menu:
Create a database:
Choose the database mode; check the documentation here>>>.
You can leave the default value, Enterprise, and set the database’s name:
Set a password:
Leave localhost as we have only one node running in our cluster:
Set directories to be used to store data:
This is fine here, as we are running a Dev environment, not a Production one:
Confirm:
Wait a minute and check whether the port is already open:
[simterm]
[root@ip-172-31-9-216 dbadmin]# netstat -anp | grep 4804
udp        0      0 127.0.0.1:4804          0.0.0.0:*                           8612/spread
[/simterm]
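Instead of waiting and checking by hand, the check can be polled. A minimal sketch of a generic retry loop is below; retry is a hypothetical helper, and the attempt count and delay in the example are arbitrary values chosen for illustration.

```shell
# retry is a hypothetical helper: it runs a command up to <attempts> times,
# sleeping <delay> seconds between attempts, and stops at the first success.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: poll up to 12 times, 5 seconds apart, for the spread port seen above:
# retry 12 5 sh -c 'netstat -anu | grep -q "127.0.0.1:4804"'
```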
Now you can connect to the database as the dbadmin user using the /opt/vertica/bin/vsql tool:
[simterm]
[root@ip-172-31-9-216 dbadmin]# /opt/vertica/bin/vsql testdb dbadmin
Password:
Welcome to vsql, the Vertica Analytic Database interactive terminal.

Type:  \h or \? for help with vsql commands
       \g or terminate with semicolon to execute query
       \q to quit

testdb=>
[/simterm]
Done.
Read the Administrator’s Guide here>>>.