Install and configure Citus Data Extension in PostgreSQL 15 on Alma Linux 8



Node preparations:

It is assumed that all nodes have PostgreSQL 15 installed. If SSL was not configured during PostgreSQL installation, the Citus extension will set it up, as SSL has been required for inter-node communication since Citus 8.1. A multi-node installation needs a minimum of three nodes: two worker nodes and one coordinator. Ensure the nodes can reach one another over the network, and either allow inter-node traffic through the firewall on every node or disable the firewalls.
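As a sketch of that preparation, assuming AlmaLinux's default firewalld and the 192.168.56.0/24 subnet used later in this guide (the IP addresses below are placeholders; the hostnames are the ones used throughout this guide):

```shell
# Run on every node. Adjust the IPs to match your environment.
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.56.10 citus-coordinator
192.168.56.11 citus-worker1
192.168.56.12 citus-worker2
EOF

# Open the PostgreSQL port (5432) in firewalld...
sudo firewall-cmd --permanent --add-port=5432/tcp
sudo firewall-cmd --reload

# ...or disable the firewall entirely:
# sudo systemctl disable --now firewalld
```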


Install Citus on all nodes:

Execute on all nodes. 


Step-1: Install the Citus extension (v12.0) from Linux packages.

curl https://install.citusdata.com/community/rpm.sh > add-citus-repo.sh

sudo bash add-citus-repo.sh

sudo yum install -y citus120_15


Step-2: To add Citus to your local PostgreSQL database, add the following to postgresql.conf:

echo "shared_preload_libraries = 'citus'" | sudo tee -a /var/lib/pgsql/15/data/postgresql.conf


Step-3: Configure connection and authentication

sudo vi /var/lib/pgsql/15/data/postgresql.conf

listen_addresses = '*'

max_prepared_transactions = 200


sudo vi /var/lib/pgsql/15/data/pg_hba.conf

# required because the other nodes will connect as the postgres user

host  all    all     192.168.56.0/24      scram-sha-256


Step-4: Restart the postgres service for the changes to take effect.

sudo su - postgres 

/usr/pgsql-15/bin/pg_ctl restart  -D /var/lib/pgsql/15/data/


Step-5: Create citus extension.

sudo su - postgres

psql -U postgres -d postgres  -c "CREATE EXTENSION citus;"


Step-6: On each worker node, set the postgres user's password to match the coordinator node's password.

sudo su - postgres 

psql -U postgres -d postgres

ALTER USER postgres WITH PASSWORD 'coordinator_password';


Step-7: Create a .pgpass file in the home directory of the postgres user and add entries for the postgres user on all nodes. Do this on every node.

sudo su - postgres

vi .pgpass

localhost:5432:postgres:postgres:coordinator_password

citus-coordinator:5432:postgres:postgres:coordinator_password

citus-worker1:5432:postgres:postgres:coordinator_password

citus-worker2:5432:postgres:postgres:coordinator_password


Step-8: Restrict the .pgpass file's permissions on all nodes.

sudo su - postgres

chmod 600 .pgpass
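With the passwords and .pgpass entries in place, a quick way to confirm passwordless connectivity between nodes (a sketch; run as the postgres user from any node) is:

```shell
# Each node should answer without prompting for a password.
for node in citus-coordinator citus-worker1 citus-worker2; do
  psql -h "$node" -U postgres -d postgres -tAc "SELECT 1" >/dev/null \
    && echo "$node: OK" \
    || echo "$node: FAILED"
done
```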


Execute on the coordinator node:


Step-1: Add Coordinator node information.

Register the hostname that workers will use to connect to the coordinator node, supplying the coordinator's hostname and the default PostgreSQL port 5432.

psql -U postgres -d postgres  -c "SELECT citus_set_coordinator_host('citus-coordinator', 5432);"


Step-2: Add the worker nodes, supplying each worker node's hostname and port.

psql -U postgres -d postgres -c "SELECT * from citus_add_node('citus-worker1', 5432);"

psql -U postgres -d postgres -c "SELECT * from citus_add_node('citus-worker2', 5432);"


Step-3: Make sure the configuration is successful.

psql -U postgres -d postgres -c "SELECT * FROM citus_get_active_worker_nodes();"
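If both workers registered correctly, the query should return one row per worker, something like (exact formatting may differ):

```
   node_name   | node_port
---------------+-----------
 citus-worker1 |      5432
 citus-worker2 |      5432
(2 rows)
```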


Step-4: Show the shard nodes.

psql -U postgres -d postgres -c "SELECT * from pg_dist_node;"


Step-5: Check that the Citus maintenance daemon is running; run this on each node (coordinator and both workers).

psql -U postgres -d postgres -c "SELECT * FROM pg_stat_activity WHERE application_name = 'Citus Maintenance Daemon';"



Load sample data on the cluster:


Step-1: Connect to the Citus coordinator node over SSH as the postgres OS user.

Download the sample data files linked from the following tutorial pages:

https://docs.citusdata.com/en/v12.0/get_started/tutorial_multi_tenant.html#data-model-and-sample-data

https://docs.citusdata.com/en/v12.0/get_started/tutorial_realtime_analytics.html#data-model-and-sample-data


Step-2: Create tables and keys.

psql -U postgres -d postgres

CREATE TABLE companies (

    id bigint NOT NULL,

    name text NOT NULL,

    image_url text,

    created_at timestamp without time zone NOT NULL,

    updated_at timestamp without time zone NOT NULL

);

CREATE TABLE campaigns (

    id bigint NOT NULL,

    company_id bigint NOT NULL,

    name text NOT NULL,

    cost_model text NOT NULL,

    state text NOT NULL,

    monthly_budget bigint,

    blacklisted_site_urls text[],

    created_at timestamp without time zone NOT NULL,

    updated_at timestamp without time zone NOT NULL

);

CREATE TABLE ads (

    id bigint NOT NULL,

    company_id bigint NOT NULL,

    campaign_id bigint NOT NULL,

    name text NOT NULL,

    image_url text,

    target_url text,

    impressions_count bigint DEFAULT 0,

    clicks_count bigint DEFAULT 0,

    created_at timestamp without time zone NOT NULL,

    updated_at timestamp without time zone NOT NULL

);


Step-3: Distribute tables and load data.

SELECT create_distributed_table('companies', 'id');

SELECT create_distributed_table('campaigns', 'company_id');

SELECT create_distributed_table('ads', 'company_id');

\copy companies from 'companies.csv' with csv

\copy campaigns from 'campaigns.csv' with csv

\copy ads from 'ads.csv' with csv
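After loading, you can confirm that all three tables were distributed and that campaigns and ads are co-located on company_id (a sketch, assuming the citus_tables view available since Citus 10; tables sharing a colocation_id are co-located):

```shell
psql -U postgres -d postgres -c \
  "SELECT table_name, distribution_column, colocation_id FROM citus_tables;"
```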


Step-4: Run sample queries.

INSERT INTO companies VALUES (5000, 'New Company', 'https://randomurl/image.png', now(), now());

UPDATE campaigns

SET monthly_budget = monthly_budget*2

WHERE company_id = 5;

BEGIN;

DELETE FROM campaigns WHERE id = 46 AND company_id = 5;

DELETE FROM ads WHERE campaign_id = 46 AND company_id = 5;

COMMIT;
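Because campaigns and ads share the company_id distribution column, aggregates and joins on that column are pushed down and executed in parallel on the worker shards. A sketch of such a query, using only columns from the tables created above:

```shell
psql -U postgres -d postgres <<'SQL'
-- Total monthly budget per company, computed in parallel across shards.
SELECT company_id, sum(monthly_budget) AS total_budget
FROM campaigns
GROUP BY company_id
ORDER BY total_budget DESC
LIMIT 10;
SQL
```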


Step-5: Check the tables' shard information.

SELECT * FROM pg_dist_shard;             -- shard count and related metadata

SELECT * FROM citus_shards;              -- shard placement on the nodes

SELECT * FROM citus_dist_stat_activity;  -- distributed query activity





