Neo4j: graph database – run with Docker and Cypher QL examples

By | 07/28/2020

In contrast to an RDBMS (Relational Database Management System), where the data objects themselves are the main part, in a graph database the relationships between data objects play the main role and are stored as dedicated objects. This gives better performance especially when you have a lot of small pieces of data tied to each other, as following a relationship does not require a join.

Neo4j, one of the first graph database systems, is the one we will examine in this post.

For queries, Neo4j uses the Cypher query language, which you can run with the cypher-shell tool. It also has a built-in web UI for accessing a database from a regular web browser, and supports a REST API.

Neo4j is distributed under a paid model, but it also has a free Community Edition with some limitations (no clustering, no online backups, only one user database, no scaling, etc.), plus the Aura SaaS offering. See their comparison here>>>.

So, in this post we will spin up a Neo4j Community Edition instance with Docker, take a brief look at its query language, and see how a backup and restore can be performed.

Running Neo4j with Docker

Let’s run a container with Docker on a work laptop to see how it works. See the documentation here>>>.

[simterm]

$ docker run --rm --name neo4j -p 7474:7474 -p 7687:7687 neo4j:latest
...
Directories in use:
  home:         /var/lib/neo4j
  config:       /var/lib/neo4j/conf
  logs:         /logs
  plugins:      /var/lib/neo4j/plugins
  import:       /var/lib/neo4j/import
  data:         /var/lib/neo4j/data
  certificates: /var/lib/neo4j/certificates
  run:          /var/lib/neo4j/run
Starting Neo4j.
...
2020-07-27 10:11:30.394+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2020-07-27 10:11:31.640+0000 INFO  Remote interface available at http://localhost:7474/
2020-07-27 10:11:31.640+0000 INFO  Started

[/simterm]
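If you script this, it may be handier to start the container in the background and wait until the web UI answers before connecting. Below is a minimal POSIX sh sketch, assuming curl is available on the host; retry and start_neo4j are hypothetical helper names, and the image is pinned to neo4j:4.1 (the version used in this post) instead of latest.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between failed attempts.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# start_neo4j NAME PASSWORD: run Neo4j detached with the same ports as above,
# then poll the web UI port for up to about a minute.
start_neo4j() {
    docker run -d --rm --name "$1" --env NEO4J_AUTH="neo4j/$2" \
        -p 7474:7474 -p 7687:7687 neo4j:4.1 >/dev/null || return 1
    retry 30 2 curl -sf -o /dev/null http://localhost:7474/
}
```

For example: start_neo4j neo4j pass && docker exec -ti neo4j cypher-shell -u neo4j -p pass. The limit of 30 attempts two seconds apart is an arbitrary choice.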

Check it: open a browser, navigate to http://localhost:7474, and log in with the default login/password neo4j:neo4j:

Admin password

To set a new password, pass the NEO4J_AUTH variable with --env, in the form neo4j/<password>:

[simterm]

$ docker run --rm --name neo4j --env NEO4J_AUTH=neo4j/pass -p 7474:7474 -p 7687:7687 neo4j:latest
Changed password for user 'neo4j'.
...

[/simterm]
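To keep the password out of the command line and the shell history, it can instead be generated and stored in a file passed with docker run --env-file. A minimal sketch, assuming /dev/urandom and head -c are available; gen_password and write_auth_env are hypothetical helpers, and the 20-character length is an arbitrary choice.

```shell
#!/bin/sh
# gen_password LENGTH: print LENGTH random letters and digits.
gen_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$1"
}

# write_auth_env FILE: write NEO4J_AUTH=neo4j/<random password> to a new FILE
# that only the current user can read (umask is changed in a subshell only).
write_auth_env() {
    (
        umask 077
        printf 'NEO4J_AUTH=neo4j/%s\n' "$(gen_password 20)" > "$1"
    )
}
```

For example: write_auth_env neo4j.env && docker run --rm --name neo4j --env-file neo4j.env -p 7474:7474 -p 7687:7687 neo4j:4.1, then look the password up in neo4j.env when connecting.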

cypher-shell

To work with databases, you can use the REST API or a local tool, cypher-shell.

Connect to the container and run the shell:

[simterm]

$ docker exec -ti neo4j cypher-shell -u neo4j -p pass
Connected to Neo4j 4.1.0 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> 

[/simterm]
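cypher-shell also reads statements from standard input when it is not attached to a terminal, which is handy for scripts. A minimal sketch, assuming the container name neo4j and the password pass from above; cypher and with_semicolon are hypothetical helpers, the latter only adds the trailing semicolon the shell asks for.

```shell
#!/bin/sh
# with_semicolon QUERY: print QUERY, appending ";" unless it already ends with one.
with_semicolon() {
    case $1 in
        *';') printf '%s\n' "$1" ;;
        *) printf '%s;\n' "$1" ;;
    esac
}

# cypher QUERY: run one statement through cypher-shell inside the container.
# -i without -t keeps stdin open so the query can be piped in;
# --format plain drops the table borders, which is easier to parse.
cypher() {
    with_semicolon "$1" |
        docker exec -i neo4j cypher-shell -u neo4j -p pass --format plain
}
```

For example: cypher 'MATCH (n) RETURN count(n) AS nodes'.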

Neo4j configuration file

In the container, the main configuration file is located at $NEO4J_HOME/conf/neo4j.conf, i.e. /var/lib/neo4j/conf/neo4j.conf:

[simterm]

root@65d8061ac13e:/var/lib/neo4j# head /var/lib/neo4j/conf/neo4j.conf 
#*****************************************************************
# Neo4j configuration
#
# For more details and a complete list of settings, please see
# https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/
#*****************************************************************

# The name of the default database
#dbms.default_database=neo4j

[/simterm]

To redefine any setting, mount a new config file into the /conf directory of the container.

All settings for neo4j.conf can be found here>>>.
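Besides mounting a file, the official Docker image also accepts settings as environment variables: the setting name gets the NEO4J_ prefix, each underscore in it is doubled, and then each dot becomes a single underscore. A minimal sketch of that conversion; neo4j_env is a hypothetical helper, and dbms.memory.heap.max_size is just an example 4.x setting.

```shell
#!/bin/sh
# neo4j_env SETTING VALUE: print the docker --env argument for a neo4j.conf setting.
# "_" is doubled first, so the later "." -> "_" step cannot be confused with it.
neo4j_env() {
    name=$(printf '%s' "$1" | sed -e 's/_/__/g' -e 's/\./_/g')
    printf 'NEO4J_%s=%s\n' "$name" "$2"
}
```

For example: docker run --rm --name neo4j --env "$(neo4j_env dbms.memory.heap.max_size 1G)" -p 7474:7474 -p 7687:7687 neo4j:4.1.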

To get the current config from the shell, use the dbms.listConfig() procedure:

[simterm]

neo4j@neo4j> CALL dbms.listConfig()
             YIELD name, value
             WHERE name STARTS WITH 'dbms.default'
             RETURN name, value
             ORDER BY name
             LIMIT 3;
+-------------------------------------------------+
| name                              | value       |
+-------------------------------------------------+
| "dbms.default_advertised_address" | "localhost" |
| "dbms.default_database"           | "neo4j"     |
| "dbms.default_listen_address"     | "0.0.0.0"   |
+-------------------------------------------------+

3 rows available after 216 ms, consumed after another 13 ms

[/simterm]
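The same procedure can be called over the HTTP API instead of cypher-shell. In Neo4j 4.x, the transactional endpoint of a database is /db/<database>/tx/commit, and it takes a JSON list of statements. A minimal sketch, assuming curl and the neo4j/pass credentials and port 7474 used above; tx_payload and http_cypher are hypothetical helpers, and the escaping covers only backslashes, double quotes and line breaks, which is enough for simple statements like this one.

```shell
#!/bin/sh
# tx_payload QUERY: wrap one Cypher statement in the HTTP API request body.
# Line breaks and tabs become spaces; backslashes and double quotes are escaped.
tx_payload() {
    escaped=$(printf '%s' "$1" | tr '\n\t\r' '   ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"statements":[{"statement":"%s"}]}\n' "$escaped"
}

# http_cypher QUERY: run QUERY against the default "neo4j" database over HTTP.
http_cypher() {
    tx_payload "$1" | curl -s -u neo4j:pass \
        -H 'Content-Type: application/json' \
        --data-binary @- \
        http://localhost:7474/db/neo4j/tx/commit
}
```

For example, http_cypher "CALL dbms.listConfig() YIELD name, value WHERE name STARTS WITH 'dbms.default' RETURN name, value" returns a JSON document with results and errors keys.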

cypher-shell && CQL

CREATE

Let’s play with data.

There is a great tutorial on the data types at Tutorialspoint here>>>.

Create a new node:

[simterm]

neo4j@neo4j> create (test);
0 rows available after 56 ms, consumed after another 0 ms
Added 1 nodes

[/simterm]

DELETE

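Delete it. Note that test is just a variable name and the pattern has no label, so MATCH (test) matches every node in the database; here that is only the one we've just created: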

[simterm]

neo4j@neo4j> MATCH (test) DETACH DELETE test;
0 rows available after 32 ms, consumed after another 0 ms
Deleted 1 nodes

[/simterm]

So the variable name is arbitrary, and the usual way to delete all records from a database is MATCH (n) with DETACH DELETE:

[simterm]

neo4j@neo4j> MATCH (n) detach delete n;

[/simterm]
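On a big database, deleting everything in one transaction may run out of memory. A common workaround that needs no plugins in 4.x is to delete in batches until a batch removes nothing. A minimal sketch, assuming the container name and password from above; batch_delete_query and delete_all_in_batches are hypothetical helpers.

```shell
#!/bin/sh
# batch_delete_query SIZE: Cypher that detaches and deletes up to SIZE nodes
# and reports how many it removed.
batch_delete_query() {
    printf 'MATCH (n) WITH n LIMIT %d DETACH DELETE n RETURN count(*) AS deleted;\n' "$1"
}

# delete_all_in_batches SIZE: run one batch per cypher-shell call (so one
# transaction each) until a batch deletes nothing. With --format plain the
# output is a header line and then the value, so the last line is the count.
delete_all_in_batches() {
    while :; do
        deleted=$(batch_delete_query "$1" |
            docker exec -i neo4j cypher-shell -u neo4j -p pass --format plain |
            tail -n 1)
        case $deleted in
            ''|*[!0-9]*) echo "unexpected output: $deleted" >&2; return 1 ;;
        esac
        echo "deleted $deleted nodes"
        [ "$deleted" -eq 0 ] && return 0
    done
}
```

For example: delete_all_in_batches 10000, where 10000 is an arbitrary batch size.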

Labels

Create a node with the label1 label and properties holding two keys, key1 and key2:

[simterm]

neo4j@neo4j> create (node1:label1 {key1: "value1", key2: "value2"} );
0 rows available after 47 ms, consumed after another 0 ms
Added 1 nodes, Set 2 properties, Added 1 labels

[/simterm]

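Check it. Note that node1 in MATCH is just a new variable, so this query returns all nodes (here, only one); to select by label, use MATCH (n:label1) RETURN n: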

[simterm]

neo4j@neo4j> MATCH (node1) RETURN node1;
+--------------------------------------------+
| node1                                      |
+--------------------------------------------+
| (:label1 {key1: "value1", key2: "value2"}) |
+--------------------------------------------+

[/simterm]

Or use RETURN to get the node right after its creation, in the same query:

[simterm]

neo4j@neo4j> CREATE (node2:label2 {key1: "value1", key2: "value2"} ) RETURN node2;
+--------------------------------------------+
| node2                                      |
+--------------------------------------------+
| (:label2 {key1: "value1", key2: "value2"}) |
+--------------------------------------------+

[/simterm]

Check from the browser, using MATCH (n) RETURN n to display all the records:

Relations

A new relationship can be created between new nodes or between already existing ones.

To create a relationship between new nodes, add -[r:RelationName]-> between them:

[simterm]

neo4j@neo4j> create (node3:label3 {key1: "value1", key2: "value2"}) -[r:RelationName]-> (node4:label4{key1: "value1", key2: "value2"}) RETURN node3, node4;
+-----------------------------------------------------------------------------------------+
| node3                                      | node4                                      |
+-----------------------------------------------------------------------------------------+
| (:label3 {key1: "value1", key2: "value2"}) | (:label4 {key1: "value1", key2: "value2"}) |
+-----------------------------------------------------------------------------------------+

1 row available after 88 ms, consumed after another 8 ms
Added 2 nodes, Created 1 relationships, Set 4 properties, Added 2 labels

[/simterm]

Check it:

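To create a relationship between already existing nodes, use MATCH to select those nodes first. Keep in mind that MATCH (node3:label3), (node4:label4) pairs every label3 node with every label4 node, so with more nodes you would also filter by a property: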

[simterm]

neo4j@neo4j> MATCH (node3:label3), (node4:label4) CREATE (node3) -[r:RelationName2]-> (node4) RETURN node3, node4;
+-----------------------------------------------------------------------------------------+
| node3                                      | node4                                      |
+-----------------------------------------------------------------------------------------+
| (:label3 {key1: "value1", key2: "value2"}) | (:label4 {key1: "value1", key2: "value2"}) |
+-----------------------------------------------------------------------------------------+

1 row available after 124 ms, consumed after another 9 ms
Created 1 relationships

[/simterm]
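For scripts that may run more than once, MERGE can be used instead of CREATE: it creates a pattern only if it doesn't exist yet, so re-running does not add duplicate nodes or relationships. A minimal sketch that feeds several statements to a single cypher-shell session, assuming the container name and password from above and reusing the placeholder labels from these examples; seed_statements and seed are hypothetical helpers.

```shell
#!/bin/sh
# seed_statements: print Cypher statements, one per line, each ending with ";".
seed_statements() {
    cat <<'EOF'
MERGE (a:label3 {key1: "value1"});
MERGE (b:label4 {key1: "value1"});
MATCH (a:label3 {key1: "value1"}), (b:label4 {key1: "value1"}) MERGE (a)-[:RelationName]->(b);
MATCH (a)-[r]->(b) RETURN labels(a), type(r), labels(b);
EOF
}

# seed: run all the statements above in one cypher-shell session.
seed() {
    seed_statements | docker exec -i neo4j cypher-shell -u neo4j -p pass
}
```

Running seed twice should leave the node and relationship counts unchanged after the first run.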

Backup && Restore

Data is stored in $NEO4J_HOME/data, which is actually a symlink to /data, see here>>>.

Check directories:

[simterm]

root@65d8061ac13e:/var/lib/neo4j# ls -l /var/lib/neo4j/data
lrwxrwxrwx 1 root root 5 Jul 23 09:01 /var/lib/neo4j/data -> /data

root@65d8061ac13e:/var/lib/neo4j# ls -l /data/
total 12
drwxrwxrwx 4 neo4j neo4j 4096 Jul 27 11:19 databases
drwxr-xr-x 2 neo4j neo4j 4096 Jul 27 11:19 dbms
drwxrwxrwx 4 neo4j neo4j 4096 Jul 27 11:19 transactions

[/simterm]

Database files are stored in the databases directory, which contains two default databases, system and neo4j. They can also be listed with SHOW DATABASES:

[simterm]

neo4j@neo4j>  show databases;
+------------------------------------------------------------------------------------------------+
| name     | address          | role         | requestedStatus | currentStatus | error | default |
+------------------------------------------------------------------------------------------------+
| "neo4j"  | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | TRUE    |
| "system" | "localhost:7687" | "standalone" | "online"        | "online"      | ""    | FALSE   |
+------------------------------------------------------------------------------------------------+

[/simterm]

The system database is used for… well, for the system itself, while neo4j is the default user database.

Neo4j dump

Create new directories that will hold our data:

[simterm]

$ mkdir -p /tmp/neo4/{data,logs}

[/simterm]

Restart the Neo4j container, mounting those directories into it:

[simterm]

$ docker run --rm --name neo4j --env NEO4J_AUTH=neo4j/pass -p 7474:7474 -p 7687:7687 -v /tmp/neo4/data/:/data -v /tmp/neo4/logs/:/logs neo4j:latest
Changed password for user 'neo4j'.
Directories in use:
  home:         /var/lib/neo4j
  config:       /var/lib/neo4j/conf
  logs:         /logs
  plugins:      /var/lib/neo4j/plugins
  import:       /var/lib/neo4j/import
  data:         /var/lib/neo4j/data
  certificates: /var/lib/neo4j/certificates
  run:          /var/lib/neo4j/run
...

[/simterm]

Check the data on the host:

[simterm]

$ ll /tmp/neo4/data/databases/
total 0
drwxr-xr-x 2 7474 7474 720 Jul 27 16:07 neo4j
-rw-r--r-- 1 7474 7474   0 Jul 27 16:07 store_lock
drwxr-xr-x 3 7474 7474 740 Jul 27 16:07 system

[/simterm]
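The 7474 owner here is the UID of the neo4j user inside the image, which is why copying these files later needs sudo. The image can also be started with docker run --user, so files it creates in fresh mounted directories get your own UID and GID instead. A minimal sketch; user_flag is a hypothetical helper.

```shell
#!/bin/sh
# user_flag: docker run option that runs the container with your own UID and
# GID, so files it creates in bind-mounted directories belong to you.
user_flag() {
    printf '%s\n' "--user=$(id -u):$(id -g)"
}
```

For example, with new empty directories: docker run --rm --name neo4j $(user_flag) --env NEO4J_AUTH=neo4j/pass -p 7474:7474 -p 7687:7687 -v /tmp/neo4/data/:/data -v /tmp/neo4/logs/:/logs neo4j:4.1.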

Connect and create a new record:

[simterm]

$ docker exec -ti neo4j cypher-shell -u neo4j -p pass
neo4j@neo4j> create (test:tobackup);
0 rows available after 131 ms, consumed after another 0 ms
Added 1 nodes

[/simterm]

To create a database dump, you first need to stop the database, as the Community Edition doesn’t support online backups. Otherwise, neo4j-admin refuses to run:

[simterm]

root@771f04312148:/var/lib/neo4j# neo4j-admin dump --database=neo4j --to=/data/backups/
The database is in use. Stop database 'neo4j' and try again.

[/simterm]

So, exit the container and stop it:

[simterm]

$ docker stop neo4j
neo4j

[/simterm]

Start it again, but this time pass bash as the command, so the Neo4j service doesn’t start:

[simterm]

$ docker run -ti --rm --name neo4j --env NEO4J_AUTH=neo4j/pass -p 7474:7474 -p 7687:7687 -v /tmp/neo4/data/:/data -v /tmp/neo4/logs/:/logs neo4j:latest bash
neo4j@6d4e9854bc1d:~$

[/simterm]

Create a dump:

[simterm]

neo4j@015ba14bdba2:~$ mkdir /data/backup
neo4j@015ba14bdba2:~$ neo4j-admin dump --database=neo4j --to=/data/backup/
Done: 34 files, 250.8MiB processed.

[/simterm]

Check it:

[simterm]

neo4j@015ba14bdba2:~$ ls -l /data/backup/
total 12
-rw-r--r-- 1 neo4j neo4j 9971 Jul 27 13:46 neo4j.dump

[/simterm]
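The manual steps above (stop, start a container without the server, dump) can be scripted. A minimal sketch, assuming the container is named neo4j, its data lives in /tmp/neo4/data as above, and the image is neo4j:4.1 (in Neo4j 5 this command moved to neo4j-admin database dump with different options); dump_path and backup_neo4j are hypothetical helpers.

```shell
#!/bin/sh
# dump_path DIR DATABASE STAMP: dump file name, e.g. DIR/neo4j-20200727.dump
dump_path() {
    printf '%s/%s-%s.dump\n' "${1%/}" "$2" "$3"
}

# backup_neo4j: offline dump of the "neo4j" database of the first instance
# into /tmp/neo4/data/backup on the host (/data/backup in the container).
backup_neo4j() {
    target=$(dump_path /data/backup neo4j "$(date +%Y%m%d-%H%M%S)")
    # The Community Edition can only dump a stopped database.
    docker stop neo4j || return 1
    # One-off container on the same data volume: runs neo4j-admin and exits,
    # like the interactive bash session above but without a shell prompt.
    docker run --rm -v /tmp/neo4/data:/data neo4j:4.1 \
        sh -c "mkdir -p /data/backup && neo4j-admin dump --database=neo4j --to=$target" \
        || return 1
    echo "dump written to /tmp/neo4/data${target#/data}"
}
```

Afterwards, start the server again with the same docker run command as before. The timestamp in the file name keeps older dumps from being overwritten.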

Restore

On the host, create a new set of directories for the second Neo4j instance:

[simterm]

$ mkdir -p /tmp/neo4-2/{data,logs}

[/simterm]

Copy the backup directory from the first one:

[simterm]

$ sudo cp -r /tmp/neo4/data/backup/ /tmp/neo4-2/data/

[/simterm]

Run the service as usual, mounting the /tmp/neo4-2 directories and changing the ports and the container name:

[simterm]

$ docker run --rm --name neo4j-2 --env NEO4J_AUTH=neo4j/pass -p 7475:7474 -p 7688:7687 -v /tmp/neo4-2/data/:/data -v /tmp/neo4-2/logs/:/logs neo4j:latest

[/simterm]

Connect and check the data:

[simterm]

$ docker exec -ti neo4j-2 cypher-shell -u neo4j -p pass 
Connected to Neo4j 4.1.0 at neo4j://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j@neo4j> match (n) return n;
+---+
| n |
+---+
+---+

[/simterm]

Okay, nothing is found here, as this is a brand new database.

Exit the container, stop it, and run it again with bash:

[simterm]

$ docker run -ti --rm --name neo4j-2 --env NEO4J_AUTH=neo4j/pass -p 7475:7474 -p 7688:7687 -v /tmp/neo4-2/data/:/data -v /tmp/neo4-2/logs/:/logs neo4j:latest bash
neo4j@b0f324cb7c9b:~$

[/simterm]

Load the dump into the database with the --force option, as the default neo4j database is already present:

[simterm]

neo4j@7bca892e9538:~$ neo4j-admin load --from=/data/backup/neo4j.dump --database=neo4j --force
Done: 34 files, 250.8MiB processed.

[/simterm]

Exit and restart the container in the normal way to start the Neo4j process:

[simterm]

$ docker run -ti --rm --name neo4j-2 --env NEO4J_AUTH=neo4j/pass -p 7475:7474 -p 7688:7687 -v /tmp/neo4-2/data/:/data -v /tmp/neo4-2/logs/:/logs neo4j:latest

[/simterm]

Connect and check:

[simterm]

neo4j@neo4j> match (n) return n;
+-------------+
| n           |
+-------------+
| (:tobackup) |
+-------------+

[/simterm]

Our record is in its place. All done.
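The restore can be scripted in the same way. A minimal sketch, assuming neo4j:4.1, the /tmp/neo4-2 directories, the neo4j-2 container name and the ports used above; host_path_for and restore_neo4j are hypothetical helpers. It refuses to run if the dump file is missing on the host, since load --force replaces the existing database.

```shell
#!/bin/sh
# host_path_for DATA_DIR PATH_IN_CONTAINER: map a path under /data to the host
# directory mounted there, e.g. /data/backup/x.dump -> DATA_DIR/backup/x.dump
host_path_for() {
    printf '%s%s\n' "${1%/}" "${2#/data}"
}

# restore_neo4j DUMP: load DUMP (a path inside the container) into the "neo4j"
# database of the second instance, replacing it, then start the server again.
restore_neo4j() {
    dump_on_host=$(host_path_for /tmp/neo4-2/data "$1")
    if [ ! -f "$dump_on_host" ]; then
        echo "no such dump: $dump_on_host" >&2
        return 1
    fi
    # Stop the running instance, if any; if it was started with --rm,
    # Docker also removes it.
    docker stop neo4j-2 >/dev/null 2>&1 || true
    # One-off container on the same data volume: runs only neo4j-admin load.
    docker run --rm -v /tmp/neo4-2/data:/data neo4j:4.1 \
        neo4j-admin load --from="$1" --database=neo4j --force || return 1
    docker run -d --rm --name neo4j-2 --env NEO4J_AUTH=neo4j/pass \
        -p 7475:7474 -p 7688:7687 \
        -v /tmp/neo4-2/data:/data -v /tmp/neo4-2/logs:/logs neo4j:4.1
}
```

For example: restore_neo4j /data/backup/neo4j.dump, after copying the backup directory as shown above.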